10. Parsers

  Module Parser.XML

  CLASS Parser.XML.Validating

Inherits
  • Parser.XML.Simple

Method parse

array parse(string data, function(string:mixed) callback, mixed ... extra)

FIXME

Document this function


Method parse_dtd

array parse_dtd(string data, function(string:mixed) callback, mixed ... extra)

FIXME

Document this function

  CLASS Parser.XML.Simple


Method define_entity_raw

void define_entity_raw(string entity, string raw)


Method define_entity

void define_entity(string entity, string raw, function cb, mixed ... extras)


Method allow_rxml_entities

void allow_rxml_entities(int(0..1) yes_no)


Method parse_dtd

mixed parse_dtd(string dtd, function cb, mixed ... extras)


Method autoconvert

string autoconvert(string xml)

  Module Parser.XML.NSTree

Description

A namespace-aware version of Parser.XML.Tree. This implementation does as little validation as possible, so you can, for example, call your namespace xmlfoo without complaints.


Method parse_input

NSNode Parser.XML.NSTree.parse_input(string data)

Description

Takes an XML string, data, and produces a namespace node tree.

Throws

Throws an error when an error is encountered during XML parsing.


Method visualize

string Parser.XML.NSTree.visualize(Node n, void|string indent)

Description

Makes a visualization of a node graph suitable for printing out on a terminal.

Example

> object x = parse_input("<a><b><c/>d</b><b><e/><f>g</f></b></a>");
> write(visualize(x));
Node(ROOT)
  NSNode(ELEMENT,"a")
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"c")
      NSNode(TEXT)
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"e")
      NSNode(ELEMENT,"f")
        NSNode(TEXT)
Result 1: 201

  CLASS Parser.XML.NSTree.NSNode

Inherits
  • Node
Description

Namespace aware node.


Method get_ns

string get_ns()

Description

Returns the namespace in which the current element is defined.


Method get_default_ns

string get_default_ns()

Description

Returns the default namespace in the current scope.


Method get_defined_nss

mapping(string:string) get_defined_nss()

Description

Returns a mapping with all the namespaces defined in the current scope, except the default namespace.

Note

The returned mapping is the same as the one in the node, so destructive changes will affect the node.


Method get_ns_attributes

mapping(string:string) get_ns_attributes(string namespace)

Description

Returns the attributes in this node that are declared in the provided namespace.


Method get_ns_attributes

mapping(string:mapping(string:string)) get_ns_attributes()

Description

Returns all the attributes, in all namespaces, that are associated with this node.

Note

The returned mapping is the same as the one in the node, so destructive changes will affect the node.


Method add_namespace

void add_namespace(string ns, void|string symbol, void|int(0..1) chain)

Description

Adds a new namespace to this node. The preferred symbol to use to identify the namespace can be provided in the symbol argument. If chain is set, no attempts to overwrite an already defined namespace with the same identifier will be made.


Method diff_namespaces

mapping(string:string) diff_namespaces()

Description

Returns the difference between this node's namespaces and its parent's namespaces.


Method get_xml_name

string get_xml_name()

Description

Returns the element name as it occurs in XML files, e.g. "zonk:name" for the element "name" defined in a namespace denoted with "zonk". It will look up a symbol for the namespace in the symbol tables for the node and its parents. If none is found, a new label will be generated by hashing the namespace.


Method remove_child

void remove_child(NSNode child)

Description

remove_child has not been updated to take care of namespace issues. To properly remove all the parent's namespaces from the child, call remove_node in the child.

  Module Parser.XML.Tree


Constant STOP_WALK

constant Parser.XML.Tree.STOP_WALK


Constant XML_ROOT

constant Parser.XML.Tree.XML_ROOT


Constant XML_ELEMENT

constant Parser.XML.Tree.XML_ELEMENT


Constant XML_TEXT

constant Parser.XML.Tree.XML_TEXT


Constant XML_HEADER

constant Parser.XML.Tree.XML_HEADER


Constant XML_PI

constant Parser.XML.Tree.XML_PI


Constant XML_COMMENT

constant Parser.XML.Tree.XML_COMMENT


Constant XML_DOCTYPE

constant Parser.XML.Tree.XML_DOCTYPE


Constant XML_ATTR

constant Parser.XML.Tree.XML_ATTR

Description

Attribute nodes are created on demand


Constant XML_NODE

constant Parser.XML.Tree.XML_NODE


Method parse_xml_callback

mixed Parser.XML.Tree.parse_xml_callback(string type, string name, mapping attr, string|array contents, mixed location, mixed ... extra)


Method parse_input

Node Parser.XML.Tree.parse_input(string data, void|int(0..1) no_fallback, void|int(0..1) force_lowercase)

Description

Takes an XML string and produces a node tree.
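
As a minimal sketch of how the returned tree might be inspected, the following parses a string and looks at the top-level element using methods documented for Parser.XML.Tree.Node. The helper name summarize and the mapping keys are illustrative assumptions, not part of the module.

```pike
// Hypothetical helper: parse a document and report basic facts about its
// top-level element.
mapping(string:mixed) summarize(string xml)
{
  object root = Parser.XML.Tree.parse_input(xml);
  object top = root->get_first_element();   // 0 if there is no element child
  if (!top) return ([]);
  return ([ "name": top->get_tag_name(),
            "attributes": top->get_attributes(),
            "children": top->count_children() ]);
}
```

For the input "<doc lang='en'><p>x</p>y</doc>" the "name" entry would be "doc" and the element has two children, the p element and the text node.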


Method parse_file

Node Parser.XML.Tree.parse_file(string path)

Description

Loads the XML file path, creates a node tree representation and returns the root node.

  CLASS Parser.XML.Tree.AbstractNode


Method set_parent

void set_parent(AbstractNode parent)

Description

Sets the parent node to parent.


Method get_parent

AbstractNode get_parent()

Description

Returns the parent node.


Method get_children

array(AbstractNode) get_children()

Description

Returns all the node's children.


Method count_children

int count_children()

Description

Returns the number of children of the node.


Method clone

AbstractNode clone(void|int(-1..1) direction)

Description

Returns the corresponding node in a clone of the tree.


Method get_root

AbstractNode get_root()

Description

Follows all parent pointers and returns the root node.


Method get_last_child

AbstractNode get_last_child()

Description

Returns the last child node or zero.


Method `[]

AbstractNode `[](mixed pos)

Description

The [] operator indexes among the node's children, so node[0] returns the first node and node[-1] the last.

Note

The [] operator will select a node from all the node's children, not just its element children.
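
A short sketch of the indexing behaviour described above. The helper name first_and_last is hypothetical, and the check for an empty child list is a precaution added here, since the behaviour for out-of-range indices is not stated in this reference.

```pike
// Hypothetical helper: return the first and last child of the top-level
// element, whatever their node types are.
array(object) first_and_last(string xml)
{
  object root = Parser.XML.Tree.parse_input(xml);
  object top = root->get_first_element();
  if (!top || !top->count_children()) return ({});
  return ({ top[0], top[-1] });   // index 0 is the first child, -1 the last
}
```

With the input "<a><b/>tail</a>" the first entry is the element b and the last entry is the text node "tail", which shows that text nodes are included.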


Method add_child

AbstractNode add_child(AbstractNode c)

Description

Adds a child node to this node. The child node is added last in the child list and its parent reference is updated.

Returns

The updated child node is returned.
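
The following sketch builds a small tree by hand with add_child, assuming the Node constructor signature documented for Parser.XML.Tree.Node (type, name, attr, text). The helper name make_greeting and the element and attribute names are made up for illustration.

```pike
// Hypothetical helper: build <greeting lang="..."> with one text child.
object make_greeting(string lang, string text)
{
  object elem = Parser.XML.Tree.Node(Parser.XML.Tree.XML_ELEMENT,
                                     "greeting", ([ "lang": lang ]), "");
  object body = Parser.XML.Tree.Node(Parser.XML.Tree.XML_TEXT,
                                     "", ([]), text);
  elem->add_child(body);   // appended last; body's parent reference is updated
  return elem;
}
```

Calling render_xml() on the returned node should then give the serialized element.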


Method remove_child

void remove_child(AbstractNode c)

Description

Removes all occurrences of the provided node from the called node's list of children. The removed node's parent reference is set to null.


Method remove_node

void remove_node()

Description

Removes this node from its parent. The parent reference is set to null.


Method replace_children

void replace_children(array(AbstractNode) children)

Description

Replaces the node's children with the provided ones. All parent references are updated.


Method replace_child

AbstractNode replace_child(AbstractNode old, AbstractNode new)

Description

Replaces the first occurrence of the old node child with the new node child. All parent references are updated.

Returns

Returns the new child node.


Method replace_node

AbstractNode replace_node(AbstractNode new)

Description

Replaces this node with the provided one.

Returns

Returns the new node.


Method walk_preorder

int|void walk_preorder(function(AbstractNode:int|void) callback, mixed ... args)

Description

Traverses the node subtree in preorder (root node first, then subtrees from left to right), calling the callback function for every node. If the callback function returns Parser.XML.Tree.STOP_WALK, the traversal is aborted immediately and Parser.XML.Tree.STOP_WALK is returned.
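
A sketch of two typical uses of walk_preorder: visiting every node, and stopping early with STOP_WALK. The helper names element_names and find_first are hypothetical. The callbacks take a plain object argument so the sketch does not depend on the exact node class.

```pike
// Collect the tag names of all element nodes in document order.
array(string) element_names(string xml)
{
  object root = Parser.XML.Tree.parse_input(xml);
  array(string) names = ({});
  root->walk_preorder(lambda(object n) {
    if (n->get_node_type() == Parser.XML.Tree.XML_ELEMENT)
      names += ({ n->get_tag_name() });
    return 0;                               // continue the walk
  });
  return names;
}

// Return the first element named tag, or 0 if there is none.
object find_first(string xml, string tag)
{
  object root = Parser.XML.Tree.parse_input(xml);
  object hit;
  root->walk_preorder(lambda(object n) {
    if (n->get_node_type() == Parser.XML.Tree.XML_ELEMENT &&
        n->get_tag_name() == tag) {
      hit = n;
      return Parser.XML.Tree.STOP_WALK;     // abort the traversal
    }
    return 0;
  });
  return hit;
}
```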


Method walk_preorder_2

int|void walk_preorder_2(function(AbstractNode:int|void) callback_1, function(AbstractNode:int|void) callback_2, mixed ... args)

Description

Traverses the node subtree in preorder, root node first, then subtrees from left to right. For each node, callback_1 is called before iterating through its children, and then callback_2 is called (always, even if the walk is aborted earlier). If a callback function returns Parser.XML.Tree.STOP_WALK, the descent is aborted and Parser.XML.Tree.STOP_WALK is returned once all waiting callback_2 functions have been called.
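
Because callback_1 and callback_2 bracket each node's children, they can be used to track nesting depth. The sketch below prints an indented outline of the element names. The helper name outline and the two-space indentation are arbitrary choices made for the example.

```pike
// Hypothetical helper: one line per element, indented by nesting depth.
string outline(string xml)
{
  object root = Parser.XML.Tree.parse_input(xml);
  int depth = -1;                  // the root node itself brings this to 0
  array(string) lines = ({});
  root->walk_preorder_2(
    lambda(object n) {             // callback_1: entering n
      if (n->get_node_type() == Parser.XML.Tree.XML_ELEMENT)
        lines += ({ "  " * depth + n->get_tag_name() });
      depth++;
      return 0;
    },
    lambda(object n) {             // callback_2: leaving n
      depth--;
      return 0;
    });
  return lines * "\n";
}
```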


Method walk_inorder

int|void walk_inorder(function(AbstractNode:int|void) callback, mixed ... args)

Description

Traverses the node subtree in inorder (left subtree first, then root node, and finally the remaining subtrees), calling the callback function for every node. If the callback function returns Parser.XML.Tree.STOP_WALK, the traversal is aborted immediately and Parser.XML.Tree.STOP_WALK is returned.


Method walk_postorder

int|void walk_postorder(function(AbstractNode:int|void) callback, mixed ... args)

Description

Traverses the node subtree in postorder (first subtrees from left to right, then the root node), calling the callback function for every node. If the callback function returns Parser.XML.Tree.STOP_WALK, the traversal is aborted immediately and Parser.XML.Tree.STOP_WALK is returned.


Method iterate_children

int|void iterate_children(function(AbstractNode:int|void) callback, mixed ... args)

Description

Iterates over the node's children from left to right, calling the callback function for every node. If the callback function returns Parser.XML.Tree.STOP_WALK, the iteration is aborted immediately and Parser.XML.Tree.STOP_WALK is returned.


Method get_preceding_siblings

array(AbstractNode) get_preceding_siblings()

Description

Returns all preceding siblings, i.e. all siblings present before this node in the parent's children list.


Method get_following_siblings

array(AbstractNode) get_following_siblings()

Description

Returns all following siblings, i.e. all siblings present after this node in the parent's children list.


Method get_siblings

array(AbstractNode) get_siblings()

Description

Returns all siblings, including this node.


Method get_ancestors

array(AbstractNode) get_ancestors(int(0..1) include_self)

Description

Returns a list of all ancestors, with the top node last. The list will start with this node if include_self is set.
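
Since the list is ordered from the node itself up to the top node, a path from the top can be built by prepending. This is a sketch only. The helper name element_path and the "/" separator are assumptions made for the example.

```pike
// Hypothetical helper: slash-separated element path from the top element
// down to the given node.
string element_path(object node)
{
  array(string) names = ({});
  foreach (node->get_ancestors(1), object n)
    if (n->get_node_type() == Parser.XML.Tree.XML_ELEMENT)
      names = ({ n->get_tag_name() }) + names;   // top node comes last
  return names * "/";
}
```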


Method get_descendants

array(AbstractNode) get_descendants(int(0..1) include_self)

Description

Returns a list of all descendants in document order. Includes this node if include_self is set.


Method get_preceding

array(AbstractNode) get_preceding()

Description

Returns all preceding nodes, excluding this node's ancestors.


Method get_following

array(AbstractNode) get_following()

Description

Returns all the nodes that follow the current one.

  CLASS Parser.XML.Tree.Node

Inherits
  • AbstractNode
Description

Node in XML tree


Method clone

Node clone(void|int(-1..1) direction)

Description

Clones the node, optionally connected to parts of the tree. If direction is -1, the cloned node's parent will be set. If direction is 1, the cloned node's children will be set.


Method get_attributes

mapping get_attributes()

Description

Returns this node's attributes. The returned mapping can be altered destructively to change the node's attributes.


Method get_node_type

int get_node_type()

Description

Returns the node type. See defined node type constants.


Method get_text

string get_text()

Description

Returns text content in node.


Method get_doc_order

int get_doc_order()


Method set_doc_order

void set_doc_order(int o)


Method get_tag_name

string get_tag_name()

Description

Returns the name of the element node, or the nearest element above if an attribute node.


Method get_any_name

string get_any_name()

Description

Returns the name of the tag, or the name of the attribute if this is an attribute node.


Method get_attr_name

string get_attr_name()

Description

Returns the name of the attribute node.


Method create

void Parser.XML.Tree.Node(int type, string name, mapping attr, string text)


Method value_of_node

string value_of_node()

Description

If the node is an attribute node or a text node, its value is returned. Otherwise the child text nodes are concatenated and returned.


Method get_first_element

AbstractNode get_first_element(void|string name)

Description

Returns the first element child of this node. If a name is provided, the first element child with that name is returned. Returns 0 if no matching node was found.


Method get_elements

array(AbstractNode) get_elements(void|string name)

Description

Returns all element children of this node. If a name is provided, only elements with that name are returned.
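
A small sketch combining get_first_element, get_elements and value_of_node to pull text out of repeated child elements. The helper name child_values and the sample element names are hypothetical.

```pike
// Hypothetical helper: text values of all <child> elements directly under
// the top-level <parent> element.
array(string) child_values(string xml, string parent, string child)
{
  object root = Parser.XML.Tree.parse_input(xml);
  object p = root->get_first_element(parent);
  if (!p) return ({});
  return map(p->get_elements(child),
             lambda(object n) { return n->value_of_node(); });
}
```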


Method cast

mixed cast(string to)

Description

It is possible to cast a node to a string, which will return Parser.XML.Tree.Node.render_xml for that node.


Method render_xml

string render_xml()

Description

Creates an XML representation of the node's subtree.


Method get_attribute_nodes

array(Node) get_attribute_nodes()

Description

Creates and returns an array of new nodes. They will not be added as proper children to the parent node, but the parent link in each node is set so that upwards traversal is possible.

  CLASS Parser.HTML

Description

This is a simple parser for SGML structured markups. It's not really HTML, but it's useful for that purpose.

The simple way to use it is to give it some information about available tags and containers, and which callbacks they should invoke.

The object is easily reused, by calling the Parser.HTML.clone() function.

See also

Parser.HTML.add_tag, Parser.HTML.add_container, Parser.HTML.clone


Method add_tag
Method add_container
Method add_entity
Method add_quote_tag
Method add_tags
Method add_containers
Method add_entities

Parser.HTML add_tag(string name, mixed to_do)
Parser.HTML add_container(string name, mixed to_do)
Parser.HTML add_entity(string entity, mixed to_do)
Parser.HTML add_quote_tag(string name, mixed to_do, string end)
Parser.HTML add_tags(mapping(string:mixed) tags)
Parser.HTML add_containers(mapping(string:mixed) containers)
Parser.HTML add_entities(mapping(string:mixed) entities)

Description

Registers the actions to take when parsing various things. Tags, containers, entities are as usual. add_quote_tag() adds a special kind of tag that reads any data until the next occurrence of the end string immediately before a tag end.

to_do can be:

  • a function to be called. The function is of the form

    mixed tag_callback(Parser.HTML parser,mapping args,mixed ... extra)
    mixed container_callback(Parser.HTML parser,mapping args,string content,mixed ... extra)
    mixed entity_callback(Parser.HTML parser,mixed ... extra)
    mixed quote_tag_callback(Parser.HTML parser,string content,mixed ... extra)

    depending on what realm the function is called by.

  • a string. This tag/container/entity is then replaced by the string. The string is normally not reparsed, i.e. it's equivalent to writing a function that returns the string in an array (but a lot faster). If Parser.HTML.reparse_strings is set the string will be reparsed, though.

  • an array. The first element is a function as above. It will receive the rest of the array as extra arguments. If extra arguments are given by Parser.HTML.set_extra(), they will appear after the ones in this array.

  • zero. If there is a tag/container/entity with the given name in the parser, it's removed.

The callback function can return:

  • a string; this string will be pushed on the parser stack and be parsed. Be careful not to return anything in this way that could lead to an infinite recursion.

  • an array; the element(s) of the array are the result of the function. This will not be parsed. This is useful for avoiding infinite recursion. The array can be of any size, which means the empty array is the most efficient thing to return if you don't care about the result. If the parser is operating in Parser.HTML.mixed_mode, the array can contain anything. Otherwise only strings are allowed.

  • zero; this means "don't do anything", i.e. the item that generated the callback is left as it is, and the parser continues.

  • one; reparse the last item again. This is useful to parse a tag as a container, or vice versa: just add or remove callbacks for the tag and return this to jump to the right callback.

Returns

the called object

See also

tags, containers, entities
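
A minimal sketch of registering a container callback and a string replacement. The helper name shout and the choice of tags are illustrative assumptions. The callback returns an array so that its result is inserted without being reparsed.

```pike
// Hypothetical helper: upper-case the content of <b> containers and turn
// <br> tags into newlines.
string shout(string html)
{
  Parser.HTML p = Parser.HTML();
  p->add_container("b",
    lambda(Parser.HTML parser, mapping args, string content) {
      return ({ upper_case(content) });   // array: result is not reparsed
    });
  p->add_tag("br", "\n");                 // string: the tag is replaced by it
  return p->finish(html)->read();
}
```

For example, shout("say <b>hi</b><br>now") gives "say HI" followed by a newline and "now".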


Method at
Method at_line
Method at_char
Method at_column

array(int) at()
int at_line()
int at_char()
int at_column()

Description

Returns the current position. Characters and columns count from 0, lines count from 1.

Parser.HTML.at() gives an array consisting of ({line,char,column}), in that order.


Method case_insensitive_tag
Method ignore_tags
Method ignore_unknown
Method lazy_argument_end
Method lazy_entity_end
Method match_tag
Method max_parse_depth
Method mixed_mode
Method reparse_strings
Method ws_before_tag_name
Method xml_tag_syntax

int case_insensitive_tag(void|int value)
int ignore_tags(void|int value)
int ignore_unknown(void|int value)
int lazy_argument_end(void|int value)
int lazy_entity_end(void|int value)
int match_tag(void|int value)
int max_parse_depth(void|int value)
int mixed_mode(void|int value)
int reparse_strings(void|int value)
int ws_before_tag_name(void|int value)
int xml_tag_syntax(void|int value)

Description

Functions to query or set flags. These set the associated flag to the value, if any is given, and return the old value.

The flags are:

  • case_insensitive_tag: All tags and containers are matched case insensitively, and argument names are converted to lowercase. Tags added with Parser.HTML.add_quote_tag() are not affected, though. Switching to case insensitive mode and back won't preserve the case of registered tags and containers.

  • ignore_tags: Do not look for tags at all. Normally tags are matched even when there are no callbacks for them at all. When this is set, the tag delimiters '<' and '>' will be treated as any normal character.

  • ignore_unknown: Treat unknown tags and entities as text data, continuing parsing for tags and entities inside them.

  • lazy_argument_end: A '>' in a tag argument closes both the argument and the tag, even if the argument is quoted.

  • lazy_entity_end: Normally, the parser searches indefinitely for the entity end character (i.e. ';'). When this flag is set, the characters '&', '<', '>', '"', ''', and any whitespace break the search for the entity end, and the entity text is then ignored, i.e. treated as data.

  • match_tag: Unquoted nested tag starters and enders will be balanced when parsing tags. This is the default.

  • max_stack_depth: Maximum recursion depth during parsing. Recursion occurs when a tag/container/entity/quote tag callback function returns a string to be reparsed. The default value is 10.

  • mixed_mode: Allow callbacks to return arbitrary data in the arrays, which will be concatenated in the output.

  • reparse_strings: When a plain string is used as a tag/container/entity/quote tag callback, it's not reparsed if this flag is unset. Setting it causes all such strings to be reparsed.

  • ws_before_tag_name: Allow whitespace between the tag start character and the tag name.

  • xml_tag_syntax: Whether or not to use XML syntax to tell empty tags and container tags apart:
    0: Use HTML syntax only. If there's a '/' last in a tag, it's just treated as any other argument.
    1: Use HTML syntax, but ignore a '/' if it comes last in a tag. This is the default.
    2: Use XML syntax, but when a tag that does not end with '/>' is found which only has a non-container tag callback, treat it as a non-container (i.e. don't start to seek for the container end).
    3: Use XML syntax only. If a tag has both container and non-container callbacks, the non-container callback is called when the empty element form (i.e. the one ending with '/>') is used, and the container callback otherwise. If only a container callback exists, it gets the empty string as content when there's none to be parsed. If only a non-container callback exists, it will be called (without the content argument) for both kinds of tags.

Note

When functions are specified with Parser.HTML._set_tag_callback() or Parser.HTML._set_entity_callback(), all tags or entities, respectively, are considered known. However, if one of those functions returns 1 and ignore_unknown is set, they are treated as text data instead of making another call to the same function again.
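
As a sketch of how a flag changes matching, the following sets case_insensitive_tag before registering a tag, so that both <br> and <BR> are handled. The helper name breaks_to_newlines is hypothetical. The flag is set first because, as noted above, switching modes does not preserve the case of already registered tags.

```pike
// Hypothetical helper: replace <br> tags, in any letter case, with newlines.
string breaks_to_newlines(string html)
{
  Parser.HTML p = Parser.HTML();
  p->case_insensitive_tag(1);   // sets the flag; the old value is returned
  p->add_tag("br", "\n");
  return p->finish(html)->read();
}
```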


Method clear_tags
Method clear_containers
Method clear_entities
Method clear_quote_tags

Parser.HTML clear_tags()
Parser.HTML clear_containers()
Parser.HTML clear_entities()
Parser.HTML clear_quote_tags()

Description

Removes all registered definitions in the different categories.

Returns

the called object

See also

Parser.HTML.add_tag, Parser.HTML.add_tags, Parser.HTML.add_container, Parser.HTML.add_containers, Parser.HTML.add_entity, Parser.HTML.add_entities


Method clone

Parser.HTML clone(mixed ... args)

Description

Clones the Parser.HTML object. A new object of the same class is created, filled with the parse setup from the old object.

This is the simplest way of flushing a parse feed/output.

The arguments to clone are sent to the new object, simplifying work for custom classes that inherit Parser.HTML.

Returns

the new object.

Note

create is called _before_ the setup is copied.


Method tags
Method containers
Method entities
Method quote_tags

mapping tags()
mapping containers()
mapping entities()
mapping quote_tags()

Description

Returns the current callback settings. For quote_tags, the values are arrays ({callback, end_quote}).

Note that when matching is done case insensitively, all names will be returned in lowercase.

Implementation note: With the exception of quote_tags(), these run in constant time since they return copy-on-write mappings. However, quote_tags() allocates a new mapping and thus runs in linear time.

See also

Parser.HTML.add_tag, Parser.HTML.add_tags, Parser.HTML.add_container, Parser.HTML.add_containers, Parser.HTML.add_entity, Parser.HTML.add_entities


Method context

string context()

Description

Returns the current output context as a string:

  • "data": In top level data. This is always returned when called from tag or container callbacks.

  • "arg": In an unquoted argument.

  • A single character string: In a quoted argument. The string contains the starting quote character.

  • "splice_arg": In a splice argument.

This function is typically only useful in entity callbacks, which can be called both from text and argument values of different sorts.

See also

Parser.HTML.splice_arg


Method current

string current()

Description

Gives the current range of data, i.e. the whole tag/entity/etc. being parsed in the current callback. Returns zero if there's no current range, i.e. when the function is not called in a callback.


Method feed

Parser.HTML feed()
Parser.HTML feed(string s)
Parser.HTML feed(string s, int do_parse)

Description

Feed new data to the Parser.HTML object. This will start a scan and may result in callbacks. Note that not all the data fed may have been processed when the call returns. To process everything, call Parser.HTML.finish().

If the function is called without arguments, no data is fed, but the parser is run.

If the string argument is followed by a 0, ->feed(s,0);, the string is fed, but the parser isn't run.

Returns

the called object

See also

Parser.HTML.finish, Parser.HTML.read, Parser.HTML.feed_insert
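
The sketch below feeds a document in arbitrary chunks, which may split tags in the middle, and calls finish() before reading the result. The helper name parse_chunks and the <b> callback are assumptions made for the example.

```pike
// Hypothetical helper: parse input that arrives in pieces.
string parse_chunks(array(string) chunks)
{
  Parser.HTML p = Parser.HTML();
  p->add_container("b",
    lambda(Parser.HTML parser, mapping args, string content) {
      return ({ upper_case(content) });
    });
  foreach (chunks, string chunk)
    p->feed(chunk);     // incomplete tags stay buffered until more data comes
  p->finish();          // process whatever is still unparsed
  return p->read();
}
```

For example, parse_chunks(({ "x <b>he", "llo</", "b> y" })) gives "x HELLO y", the same result as feeding the whole string at once.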


Method feed_insert

Parser.HTML feed_insert(string s)

Description

This pushes a string on the parser stack. (I'll write more about this mechanism later.)

Returns

the called object


Method finish

Parser.HTML finish()
Parser.HTML finish(string s)

Description

Finish a parser pass. A string may be sent here, similar to feed().

Returns

the called object


Method get_extra

array get_extra()

Description

Gets the extra arguments set by Parser.HTML.set_extra().

Returns

the extra arguments, as an array


Method parse_tag_args

mapping parse_tag_args(string tag)

Description

Parses the tag arguments from a tag string without the name and surrounding brackets, i.e. a string of the form "some="tag" args".

Returns

a mapping containing the tag arguments

See also

Parser.HTML.tag_args


Method parse_tag_name

string parse_tag_name(string tag)

Description

Parses the tag name from a tag string without the surrounding brackets, i.e. a string of the form "tagname some="tag" args".

Returns

the tag name or an empty string if none


Method read

string|array(mixed) read()
string|array(mixed) read(int max_elems)

Description

Read parsed data from the parser object.

Returns

a string of parsed data if the parser isn't in Parser.HTML.mixed_mode, an array of arbitrary data otherwise.


Method set_extra

Parser.HTML set_extra(mixed ...args)

Description

Sets the extra arguments passed to all tag, container and entity callbacks.

Returns

the called object


Method splice_arg

string splice_arg(void|string name)

Description

If given a string, it sets the splice argument name to it.

If a splice argument name is set, it's parsed in all tags, both those with callbacks and those without. Wherever it occurs, its value (after being parsed for entities in the normal way) is inserted directly into the tag. E.g:

     <foo arg1="val 1" splice="arg2='val 2' arg3" arg4>
     
becomes
     <foo arg1="val 1" arg2='val 2' arg3 arg4>
     
if "splice" is set as the splice argument name.

Returns

the old splice argument name.


Method tag
Method tag_name
Method tag_args
Method tag_content

array tag()
string tag_name()
mapping(string:mixed) tag_args()
string tag_content()
array tag(mixed default_value)
string tag_args(mixed default_value)

Description

These give parsed information about the current thing being parsed, e.g. the current tag, container or entity. They return zero if they're not applicable.

tag_name gives the name of the current tag. If used from an entity callback, it gives the string inside the entity.

tag_args gives the arguments of the current tag, parsed to a convenient mapping consisting of key:value pairs. If the current thing isn't a tag, it gives zero. default_value is used for arguments which have no value in the tag. If default_value isn't given, the value is set to the same string as the key.

tag_content gives the content of the current tag, if it's a container or quote tag.

tag() gives the equivalent of ({tag_name(),tag_args(), tag_content()}).


Method write_out

Parser.HTML write_out(mixed ... args)

Description

Send data to the output stream, i.e. it won't be parsed and it won't be sent to the data callback, if any.

Any data is allowed when the parser is running in Parser.HTML.mixed_mode. Only strings are allowed otherwise.

Returns

the called object


Method _inspect

mapping _inspect()

Description

This is a low-level way of debugging a parser. This gives a mapping of the internal state of the Parser.HTML object.

The format and contents of this mapping may change without further notice.


Method _set_tag_callback
Method _set_entity_callback
Method _set_data_callback

Parser.HTML _set_tag_callback(function to_call)
Parser.HTML _set_entity_callback(function to_call)
Parser.HTML _set_data_callback(function to_call)

Description

These functions set up the parser object to call the given callbacks upon tags, entities and/or data.

The callbacks will only be called if there isn't another tag/container/entity handler for these.

The function will be called with the parser object as first argument, and the active string as second.

Note that no parsing of the contents has been done. The tag callback is called for both end tags and normal tags, and there is no container parsing.

The return values from the callbacks are handled in the same way as the return values from callbacks registered with Parser.HTML.add_tag and similar functions.

The data callback will be called as seldom as possible with the longest possible string, as long as it doesn't get called out of order with any other callback. It will never be called with a zero length string.

Returns

the called object
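
A sketch of a catch-all tag callback that uses tag_name() and tag_args() to inspect each tag without registering individual handlers. The helper name collect_links is hypothetical. The callback returns zero, so every tag is left in the output as it is.

```pike
// Hypothetical helper: collect the href arguments of all <a> tags.
array(string) collect_links(string html)
{
  array(string) links = ({});
  Parser.HTML p = Parser.HTML();
  p->_set_tag_callback(lambda(Parser.HTML parser, string raw) {
    if (lower_case(parser->tag_name()) == "a") {
      mapping args = parser->tag_args();
      if (args->href) links += ({ args->href });
    }
    return 0;           // leave the tag untouched and continue
  });
  p->finish(html);
  return links;
}
```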

  Module Parser


Method get_xml_parser

Parser.HTML Parser.get_xml_parser()

Description

Returns a Parser.HTML initialized for parsing XML. It has all the flags set properly for XML syntax and callbacks to ignore comments, CDATA blocks and unknown PI tags, but it has no registered tags and doesn't decode any entities.


Method html_entity_parser
Method parse_html_entities

HTML Parser.html_entity_parser()
string Parser.parse_html_entities(string in)

Description

Parse any HTML entities in the string to Unicode characters. html_entity_parser returns a complete parser (to build on or use), while parse_html_entities parses a string. An error is thrown if there is an unrecognized entity in the string.

Note

Currently using XHTML 1.0 tables.
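
Since an unrecognized entity causes an error, callers handling untrusted text may want to catch it. The sketch below falls back to the original string in that case. The helper name safe_decode and the fallback policy are assumptions for illustration.

```pike
// Hypothetical helper: decode entities, returning the input unchanged if
// it contains an entity that is not recognized.
string safe_decode(string s)
{
  string res;
  if (catch { res = Parser.parse_html_entities(s); })
    return s;
  return res;
}
```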


Method decode_numeric_xml_entity

string Parser.decode_numeric_xml_entity(string chref)

Description

Decodes the numeric XML entity chref, e.g. "&#x34;", and returns the character as a string. chref is the name part of the entity, i.e. without the leading '&' and trailing ';'. Returns zero if chref isn't in a recognized form or if the character number is too large to be represented in a string.
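
Because the function expects only the name part, a complete entity has to be stripped of its delimiters first. A minimal sketch, with the hypothetical helper name decode_entity:

```pike
// Hypothetical helper: accept a full entity such as "&#x34;" and decode it.
string decode_entity(string entity)
{
  // Remove the leading '&' and trailing ';' if both are present.
  if (has_prefix(entity, "&") && has_suffix(entity, ";"))
    entity = entity[1..sizeof(entity) - 2];
  return Parser.decode_numeric_xml_entity(entity);
}
```

Here decode_entity("&#x34;") gives "4" and decode_entity("&#65;") gives "A", while a named entity such as "&amp;" yields zero because it is not numeric.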

  CLASS Parser.RCS

Description

An RCS file parser that reads an RCS *,v file and presents its contents as Pike data structures.


Variable head

string head

Description

Version number of the head version of the file


Variable branch

string|int(0..0) branch

Description

The default branch (or revision) if present, 0 otherwise


Variable access

array(string) access

Description

The usernames listed in the ACCESS section of the RCS file


Variable comment

string|int(0..0) comment

Description

The RCS file comment if present, 0 otherwise


Variable expand

string expand

Description

The keyword expansion options (as named by RCS) if present, 0 otherwise


Variable description

string description

Description

The RCS file description


Variable locks

mapping(string:string) locks

Description

Maps from username to revision for users that have acquired locks on this file


Variable strict_locks

int(0..1) strict_locks

Description

1 if strict locking is set, 0 otherwise


Variable tags

mapping(string:string) tags

Description

Maps tag names (indices) to tagged revision numbers (values)


Variable branches

mapping(string:string) branches

Description

Maps branch numbers (indices) to branch names (values)


Variable revisions

mapping(string:Revision) revisions

Description

Data for all revisions of the file. The indices of the mapping are the revision numbers, whereas the values are the data from the corresponding revision.


Variable trunk

array(mapping) trunk

Description

Data for all revisions on the trunk, sorted in the same order as the RCS file stored them - ie descending, most recent first, I'd assume (rcsfile(5), of course, fails to state such irrelevant information).


Method create

void Parser.RCS(string|void file_name, string|int(0..0)|void file_contents)

Description

Initializes the RCS object.

Parameter file_name

The path to the raw RCS file (includes trailing ",v"). Used mainly for error reporting (truncated RCS file).

Parameter file_contents

If a string is provided, that string will be parsed to initialize the RCS object. If a zero (0) is sent, no initialization will be performed at all. If no value is given at all, but file_name was provided, that file will be loaded and parsed for object initialization.


Method parse_admin_section

string parse_admin_section(string raw)

Description

Lower-level API function for parsing only the admin section (the initial chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After running Parser.RCS.parse_admin_section, the RCS object will be initialized with the values for Parser.RCS.head, Parser.RCS.branch, Parser.RCS.access, Parser.RCS.branches, Parser.RCS.tags, Parser.RCS.locks, Parser.RCS.strict_locks, Parser.RCS.comment and Parser.RCS.expand.

Parameter raw

The unprocessed RCS file.

Returns

The rest of the RCS file, admin section removed.

See also

Parser.RCS.parse_delta_sections, Parser.RCS.parse_deltatext_sections, Parser.RCS.parse, Parser.RCS.create

FIXME

Does not handle rcsfile(5) newphrase skipping.


Method parse_delta_sections

string parse_delta_sections(string raw)

Description

Lower-level API function for parsing only the delta sections (the second chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After running Parser.RCS.parse_delta_sections, the RCS object will be initialized with the value of Parser.RCS.description and populated Parser.RCS.revisions mapping and Parser.RCS.trunk array. Their Parser.RCS.Revision members are however only populated with the members Parser.RCS.Revision.revision, Parser.RCS.Revision.branch, Parser.RCS.Revision.time, Parser.RCS.Revision.author, Parser.RCS.Revision.state, Parser.RCS.Revision.branches, Parser.RCS.Revision.rcs_next, Parser.RCS.Revision.ancestor and Parser.RCS.Revision.next.

Parameter raw

The unprocessed RCS file, with admin section removed. (See Parser.RCS.parse_admin_section.)

Returns

The rest of the RCS file, delta sections removed.

See also

Parser.RCS.parse_admin_section, Parser.RCS.parse_deltatext_sections, Parser.RCS.parse, Parser.RCS.create

FIXME

Does not handle rcsfile(5) newphrase skipping.


Method parse_deltatext_sections

void parse_deltatext_sections(string raw, void|function(string:void) progress_callback, array|void callback_args)

Description

Lower-level API function for parsing only the deltatext sections (the final and typically largest chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After a Parser.RCS.parse_deltatext_sections run, the RCS object will be fully populated.

Parameter raw

The unprocessed RCS file, with admin and delta sections removed. (See Parser.RCS.parse_admin_section and Parser.RCS.parse_delta_sections.)

Parameter progress_callback

This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).

Parameter callback_args

Optional extra trailing arguments to be sent to progress_callback

See also

Parser.RCS.parse_admin_section, Parser.RCS.parse_delta_sections, Parser.RCS.parse, Parser.RCS.create

FIXME

Does not handle rcsfile(5) newphrase skipping.
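
The three lower-level functions are meant to be chained, each consuming the remainder returned by the previous one. The following is a hedged sketch of that chain; the path and the show_progress function are hypothetical names introduced for illustration.

```pike
// Hypothetical progress callback: receives the revision whose
// deltatext is about to be parsed.
void show_progress(string revision)
{
  werror("parsing deltatext for revision %s\n", revision);
}

int main()
{
  string path = "project/foo.c,v";   // hypothetical path
  string raw = Stdio.read_file(path);
  if (!raw) {
    werror("cannot read %s\n", path);
    return 1;
  }

  // Passing 0 as file_contents skips the automatic parsing.
  Parser.RCS rcs = Parser.RCS(path, 0);

  raw = rcs->parse_admin_section(raw);    // rest after the admin section
  raw = rcs->parse_delta_sections(raw);   // rest after the delta sections
  rcs->parse_deltatext_sections(raw, show_progress);

  write("%d revisions parsed\n", sizeof(rcs->revisions));
  return 0;
}
```

Splitting the work this way is useful when only the admin or delta data is needed, since the deltatext sections are typically the largest part of the file.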


Method parse

this_program parse(string raw, void|function(string:void) progress_callback)

Description

Parse the RCS file raw and fully initialize all members of this object.

Parameter raw

The unprocessed RCS file.

Parameter progress_callback

Passed on to Parser.RCS.parse_deltatext_sections.

Returns

The fully initialized object (only returned for API convenience; the object itself is destructively modified to match the data extracted from raw)

See also

Parser.RCS.parse_admin_section, Parser.RCS.parse_delta_sections, Parser.RCS.parse_deltatext_sections, Parser.RCS.create

  CLASS Parser.RCS.DeltatextIterator

Description

Iterator for the deltatext sections of the RCS file. Typical usage:

Example

string raw = Stdio.read_file(my_rcs_filename);
Parser.RCS rcs = Parser.RCS(my_rcs_filename, 0);
raw = rcs->parse_delta_sections(rcs->parse_admin_section(raw));
foreach(rcs->DeltatextIterator(raw); int n; Parser.RCS.Revision rev)
  do_something(rev);


Method create

void Parser.RCS.DeltatextIterator(string deltatext_section, void|function(string:void) progress_callback, void|array(mixed) progress_callback_args)

Parameter deltatext_section

the deltatext section of the RCS file in its entirety

Parameter progress_callback

This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).

Parameter progress_callback_args

Optional extra trailing arguments to be sent to progress_callback

See also

the rcsfile(5) manpage outlines the sections of an RCS file


Method read_next

int(0..1) read_next()

Description

drops the leading whitespace before next revision's deltatext entry and sets this_rev to the revision number we're about to read.

Note

this method requires that raw starts with a valid deltatext entry


Method index

int index()

Returns

the number of deltatext entries processed so far (0..N-1, N being the total number of revisions in the rcs file)


Method value

Revision value()

Returns

the Parser.RCS.Revision whose deltatext entry the iterator is currently at, updated with the data from that entry


Method `!

int(0..1) `!()

Returns

1 if the iterator has processed all deltatext entries, 0 otherwise.


Method next

int(0..1) next()

Description

like `+=(1), but returns 0 if the iterator is finished


Method first

int(0..1) first()

Description

Restart not implemented; always returns 0 (==failed)


Method parse_deltatext_section

string parse_deltatext_section(string raw)

Description

Chops off the first deltatext section from the string raw and returns the rest of the string, or the value 0 (zero) if we had already visited the final deltatext entry. The deltatext's data is stored destructively in the appropriate entry of the Parser.RCS.revisions mapping.

Note

raw must start with a deltatext entry for this method to work

FIXME

does not handle rcsfile(5) newphrase skipping

FIXME

if the rcs file is truncated, this method writes a descriptive error to stderr and then returns 0 - some nicer error handling wouldn't hurt

  CLASS Parser.RCS.Revision

Description

All data tied to a particular revision of the file.


Variable revision

string revision

Description

the revision number (i.e. Parser.RCS->revisions["1.1"]->revision == "1.1")


Variable author

string author

Description

the name of the user that committed the revision


Variable branches

array(string) branches

Description

when there are branches from this revision, an array of the revision numbers where each branch starts, otherwise 0


Variable state

string state

Description

the state of the revision - typically "Exp" or "dead"


Variable time

Calendar.ISO.Second time

Description

the (UTC) date and time when the revision was committed


Variable branch

string branch

Description

the name of the branch on which this revision was committed (calculated according to how cvs manages branches)


Variable rcs_next

string rcs_next

Description

the revision stored next in the rcs file, or 0 if none exists


Variable ancestor

string ancestor

Description

the revision of the ancestor of this revision, or 0 if this was the initial revision


Variable next

string next

Description

the revision that succeeds this revision, or 0 if none exists


Variable log

string log

Description

the log message associated with the revision


Variable lines

int lines

Description

the number of lines this revision contained, altogether (not of particular interest for binary files)


Variable added

int added

Description

the number of lines that were added from the previous revision to make this revision (for the initial revision too)


Variable removed

int removed

Description

the number of lines that were removed from the previous revision to make this revision


Method get_contents

string get_contents()

Description

Returns the file contents from this revision, without performing any keyword expansion.

See also

Parser.RCS.Revision.expand_keywords


Method expand_keywords

string expand_keywords(string|void text, int|void override_binary)

Description

Expand keywords and return the resulting text according to the expansion rules set for the file.

Parameter text

If supplied, keywords are substituted in that text instead, using the values that would apply for this revision. Otherwise, the contents of this revision are used.

Parameter override_binary

Perform expansion even if the file was checked in as binary.

Note

The Log keyword (which lacks sane quoting rules) is not expanded. Keyword expansion rules set in CVSROOT/cvswrappers are ignored. Only implements the -kkv and -kb expansion modes.

See also

Parser.RCS.Revision.get_contents
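
As an illustration of the members documented above, the following sketch lists every revision of a file and then prints the head revision with keywords expanded. The path is a hypothetical placeholder, and the sketch assumes that Parser.RCS.head holds the revision number of the head revision.

```pike
int main()
{
  string path = "project/foo.c,v";   // hypothetical path
  Parser.RCS rcs = Parser.RCS(path);

  // Plain string sort: "1.10" sorts before "1.2", which is good
  // enough for a listing but is not true revision order.
  foreach (sort(indices(rcs->revisions)), string id) {
    Parser.RCS.Revision rev = rcs->revisions[id];
    write("%s by %s (%s): +%d -%d lines\n",
          rev->revision, rev->author, rev->state,
          rev->added, rev->removed);
  }

  // Assumption: rcs->head is the revision number of the head revision.
  Parser.RCS.Revision head = rcs->revisions[rcs->head];
  if (head) {
    // get_contents() would return the same text without expansion.
    write("%s", head->expand_keywords() || "");
  }
  return 0;
}
```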

  CLASS Parser.SGML

Description

This is a handy, simple parser for SGML-like syntaxes such as HTML. It doesn't do anything more advanced than finding the corresponding end-tags.

It's used like this:

array res=Parser.SGML()->feed(string)->finish();

The resulting structure is an array of atoms, where the atom can be a string or a tag. A tag contains a similar array, as data.

Example

A string "<gat>&nbsp;<gurka>&nbsp;</gurka>&nbsp;<banan>&nbsp;<kiwi>&nbsp;</gat>" results in

    ({
        tag "gat" object with data:
        ({
            tag "gurka" object with data:
            ({
                " "
            })
            tag "banan" object with data:
            ({
                " "
                tag "kiwi" object with data:
                ({
                    " "
                })
            })
        })
    })
	

i.e., simple "tags" (not containers) are not detected, but containers are ended implicitly by a surrounding container _with_ an end tag.

The 'tag' is an object with the following variables:

	 string name;           - name of tag
	 mapping args;          - arguments to tag
	 int line,char,column;  - position of tag
	 string file;           - filename (see Parser.SGML.create)
	 array(SGMLatom) data;  - contained data
     


Method create

void Parser.SGML()
void Parser.SGML(string filename)

Description

The object is created with the given filename, which is passed to all created tags for debugging and tracing purposes.

Note

The object does not read the file itself; supply the data with Parser.SGML.feed.


Method feed
Method finish
Method result

object feed(string s)
array(SGMLatom|string) finish()
array(SGMLatom|string) result(string s)

Description

Feed new data to the object, or finish the stream. No result can be used until Parser.SGML.finish is called.

Both Parser.SGML.finish and Parser.SGML.result return the computed data.

Parser.SGML.feed returns the called object.
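
A minimal sketch of feeding data in several chunks and walking the resulting atom tree. The dump function and the file name "example.html" are hypothetical; the sketch relies only on the tag members listed above and on Parser.SGML.finish returning the computed data.

```pike
// Hypothetical helper: recursively print an atom tree.
// Strings are text atoms; everything else is treated as a tag object.
void dump(array atoms, string indent)
{
  foreach (atoms, mixed atom) {
    if (stringp(atom))
      write("%stext %O\n", indent, atom);
    else {
      write("%stag %O (line %d)\n", indent, atom->name, atom->line);
      dump(atom->data || ({}), indent + "  ");
    }
  }
}

int main()
{
  // The file name is only recorded in the tags; nothing is read from disk.
  object sgml = Parser.SGML("example.html");

  // Data may arrive in several pieces.
  sgml->feed("<ul><li>one</li>");
  sgml->feed("<li>two</li></ul>");

  // No result is available until finish() has been called.
  array res = sgml->finish();
  dump(res, "");
  return 0;
}
```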

  Module Parser.LR

  CLASS Parser.LR.Priority

Description

Specifies the priority and associativity of a rule.


Variable value

int value

Description

Priority value


Variable assoc

int assoc

Description

Associativity

-1   Left
 0   None
 1   Right



Method create

void Parser.LR.Priority(int p, int a)

Description

Create a new priority object.

Parameter p

Priority.

Parameter a

Associativity.
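
A short sketch of creating priority objects with the associativity codes listed above. The priority values 1, 2 and 3 and the variable names are illustrative assumptions only; how the values are compared is decided by the parser, not by this class.

```pike
int main()
{
  // Illustrative values; -1 = left, 0 = none, 1 = right.
  Parser.LR.Priority plus  = Parser.LR.Priority(1, -1);
  Parser.LR.Priority times = Parser.LR.Priority(2, -1);
  Parser.LR.Priority power = Parser.LR.Priority(3, 1);

  mapping(int:string) names = ([ -1: "left", 0: "none", 1: "right" ]);

  foreach (({ plus, times, power }), Parser.LR.Priority p)
    write("value=%d assoc=%s\n", p->value, names[p->assoc]);
  return 0;
}
```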

  CLASS Parser.LR.Rule

Description

This object is used to represent a BNF-rule in the LR parser.


Variable nonterminal

int nonterminal

Description

Non-terminal this rule reduces to.


Variable symbols

array(string|int) symbols

Description

The actual rule


Variable action

function|string action

Description

Action to do when reducing this rule. function - call this function. string - call this function by name in the object given to the parser. The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will be the value of this non-terminal. The default rule is to return the first argument.


Variable has_tokens

int has_tokens

Description

This rule contains tokens


Variable num_nonnullables

int num_nonnullables

Description

This rule has this many non-nullable symbols at the moment.


Variable number

int number

Description

Sequence number of this rule (used for conflict resolution). Also used to identify the rule.


Variable pri

Priority pri

Description

Priority and associativity of this rule.


Method create

void Parser.LR.Rule(int nt, array(string|int) r, function|string|void a)

Description

Create a BNF rule.

Example

The rule

rule : nonterminal ":" symbols ";" { add_rule };

might be created as

Parser.LR.Rule(4, ({ 9, ":", 5, ";" }), "add_rule");

where 4 corresponds to the nonterminal "rule", 9 to "nonterminal" and 5 to "symbols", and the function "add_rule" is to be called when this rule is reduced.

Parameter nt

Non-terminal to reduce to.

Parameter r

Symbol sequence that reduces to nt.

Parameter a

Action to do when reducing according to this rule. function - Call this function. string - Call this function by name in the object given to the parser. The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will become the value of this non-terminal. The default rule is to return the first argument.
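
The three forms of the action argument could be sketched as follows. The add_rule function here is a hypothetical stand-in, and the symbol numbers are the same illustrative ones used in the example above.

```pike
// Hypothetical action: receives the values of the rule's elements
// and returns the value of the reduced non-terminal.
mixed add_rule(mixed ... values)
{
  return values;
}

int main()
{
  array(string|int) symbols = ({ 9, ":", 5, ";" });

  // Action given by name: looked up in the object given to the parser.
  Parser.LR.Rule by_name = Parser.LR.Rule(4, symbols, "add_rule");

  // Action given as a function: called directly.
  Parser.LR.Rule by_function = Parser.LR.Rule(4, symbols, add_rule);

  // No action: the default is to return the first argument.
  Parser.LR.Rule no_action = Parser.LR.Rule(4, symbols);

  foreach (({ by_name, by_function, no_action }), Parser.LR.Rule r)
    write("nonterminal=%d symbols=%d action=%O\n",
          r->nonterminal, sizeof(r->symbols), r->action);
  return 0;
}
```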

  CLASS Parser.LR.ErrorHandler

Description

Class handling reporting of errors and warnings.


Variable verbose

optional int(-1..1) verbose

Description

Verbosity level

-1   Just errors.
 0   Errors and warnings.
 1   Also notices.



Method create

void Parser.LR.ErrorHandler(int(-1..1)|void verbosity)

Description

Create a new error handler.

Parameter verbosity

Level of verbosity.

See also

Parser.LR.ErrorHandler.verbose

  CLASS Parser.LR.Parser

Description

This object implements an LALR(1) parser and compiler.

Normal use of this object would be:

 set_error_handler
 {add_rule, set_priority, set_associativity}*
 set_symbol_to_string
 compile
 {parse}*
 


Variable grammar

mapping(int:array(