e4Graph and XML

The e4XML library enables XML input to be stored in an e4_Storage object. The XML input is parsed using James Clark's expat parser, and stored in nodes and vertices under a given e4_Node object in an e4_Storage object. This facility is provided by the e4_XMLParser class.

The e4XML library also provides facilities for producing a character string representing the XML encoding of a node and all its vertices. This functionality is provided by the e4_XMLGenerator class.

The e4_XMLParser Class

The e4XML library provides a class, e4_XMLParser, which can be used to parse a given string of XML input into an e4Graph graph of objects. A new instance of e4_XMLParser must be created for each parse; each instance can be used for only one parse and must be deleted afterwards.

Each instance of e4_XMLParser is associated with an instance of e4_Node. The association must be formed before parsing can commence, either during the construction of the parser, or afterwards by assigning a node to be associated with an existing parser using the SetNode method. The associated node can later be retrieved using the GetNode method, and interim parsing state can be queried through the regular operations on e4_Node instances.

A parse either succeeds or fails. The HasError method returns true if an error has been encountered, and the ErrorString method returns a NULL terminated string describing the reason for the error. The ClearError method attempts to clear the error state. Clearing the error state does not always succeed, and the parser may be unable to continue parsing; if that happens, the parser will immediately enter a new error state.

A parse is either in progress or has finished. The current state may be queried using the Finished method, which returns true when the parse is finished.

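Thus, there are four distinct states, one for each combination of the two queries: in progress without an error, in progress with an error, finished without an error, and finished with an error.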
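The two queries combine into those four states. The sketch below simply names each combination; the ParseState enum and the Classify and Describe functions are hypothetical helpers for illustration and are not part of e4XML. With a real parser you would pass the results of parser->Finished() and parser->HasError().

```cpp
// Hypothetical helper, not part of e4XML: one value per combination of
// Finished() and HasError().
enum class ParseState {
    InProgressOk,     // not finished, no error: more input may be supplied
    InProgressError,  // not finished, error: ClearError() may be attempted
    FinishedOk,       // finished, no error
    FinishedError     // finished, error
};

// Maps the two boolean queries onto one of the four states.
ParseState Classify(bool finished, bool hasError) {
    if (finished) {
        return hasError ? ParseState::FinishedError : ParseState::FinishedOk;
    }
    return hasError ? ParseState::InProgressError : ParseState::InProgressOk;
}

// Human-readable label for logging.
const char *Describe(ParseState s) {
    switch (s) {
        case ParseState::InProgressOk:    return "in progress, no error";
        case ParseState::InProgressError: return "in progress, error";
        case ParseState::FinishedOk:      return "finished, no error";
        case ParseState::FinishedError:   return "finished, error";
    }
    return "unknown";
}
```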

Parsing is done through the Parse method, which takes a buffer of input and its length; the buffer need not be NULL terminated. Each call to Parse advances the parse using the provided input.
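As a sketch, assuming (as "advances the parse" suggests) that Parse may be called repeatedly with successive pieces of a single document, input can be fed in fixed-size chunks. FeedInChunks is a hypothetical helper, written as a template so that it relies only on the documented Parse and HasError methods. It copies each chunk into a writable buffer because Parse takes a non-const char *. How the parser detects the end of the document is not described here, so check Finished() after the last chunk.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of e4XML: feeds `data` to any parser type that
// offers bool Parse(char *buf, size_t len) and bool HasError(), `chunk` bytes
// at a time. Returns false as soon as a chunk fails to parse.
template <typename Parser>
bool FeedInChunks(Parser &parser, const char *data, std::size_t len,
                  std::size_t chunk) {
    if (chunk == 0) {
        return false;  // a zero chunk size would never make progress
    }
    std::vector<char> buf(chunk);  // writable copy, since Parse takes char *
    for (std::size_t off = 0; off < len; off += chunk) {
        std::size_t n = (len - off < chunk) ? (len - off) : chunk;
        std::memcpy(buf.data(), data + off, n);  // no NUL terminator needed
        if (!parser.Parse(buf.data(), n) || parser.HasError()) {
            return false;
        }
    }
    return true;
}
```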

The XML input is parsed as follows:

Note that the XML parser is very naive and may cause a stack overflow when parsing deeply nested XML input. Unfortunately this is an artifact of the underlying parser, expat. There is no way to fix this problem in programs that use expat; it must be fixed in expat itself, after which all programs using expat will be able to handle XML input of arbitrary nesting depth.

The following code snippet shows how the e4_XMLParser class might be used in a C++ program:

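#include <stdio.h>    // fprintf
#include <stdlib.h>
#include "e4xml.h"
...
e4_XMLParser *parser;
e4_Node n;            // node under which the parsed XML will be stored
char *buf;            // XML input; need not be NULL terminated
size_t len;           // number of bytes in buf
...
parser = new e4_XMLParser(n);   // a new parser is needed for each parse
...
if (!parser->Parse(buf, len)) {
    fprintf(stderr, "Parsing encountered an error: \"%s\"\n",
            parser->ErrorString());
}
delete parser;                  // a parser cannot be reused

e4_XMLParser Methods and Constructors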
   
e4_XMLParser() Constructor. Creates an empty parser that does not have an associated node.
e4_XMLParser(e4_Node nn) Constructor. Creates a parser associated with the node nn, which must be a valid node.
~e4_XMLParser() Destructor. Destroys the parser and any associated transient state information.
   
void SetNode(e4_Node nn) Associates the node nn with this parser. New input will be stored in new vertices added to the end of the list of vertices in the node nn.
bool GetNode(e4_Node &nn) If there is an associated node, retrieves it into nn and returns true; otherwise returns false.
bool Finished() Returns true when the parse has finished successfully, false otherwise.
bool HasError() Returns true if the parse has encountered an error.
const char *ErrorString() Returns a NULL terminated string describing the error, if any, encountered by the parser.
void ClearError() Attempts to clear the error situation and recover; may fail, in which case a subsequent attempt to parse more input will reenter an error state.
bool Parse(char *buf, size_t len) Attempts to parse the XML input in the buffer buf of length len. If the entire buffer was parsed successfully, the operation returns true. If an error was encountered, false is returned.
const unsigned char *Base64_Decode(const char *base64Str, int *nbytes) As a convenience, the parser provides a method to decode a character string encoded in BASE64 (RFC 1341) into a binary value. The output argument nbytes receives the length of the returned byte sequence. The memory occupied by the returned binary value is managed by the parser; your application must copy it if it needs to keep the value.
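Because the decoded bytes belong to the parser, a caller that needs them later should copy them right away. The sketch below does that with a hypothetical template helper, DecodeToVector, which uses only the Base64_Decode signature listed above. Treating a NULL return as failure is an assumption, since the behavior on malformed input is not specified here.

```cpp
#include <vector>

// Hypothetical helper, not part of e4XML: calls Base64_Decode on any
// parser-like object and copies the result into caller-owned storage.
template <typename Parser>
bool DecodeToVector(Parser &parser, const char *base64Str,
                    std::vector<unsigned char> &out) {
    int nbytes = 0;
    const unsigned char *bytes = parser.Base64_Decode(base64Str, &nbytes);
    if (bytes == nullptr || nbytes < 0) {
        return false;  // assumption: NULL signals a decoding failure
    }
    out.assign(bytes, bytes + nbytes);  // copy out of parser-owned memory
    return true;
}
```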
   

The e4_XMLGenerator Class

The e4XML library provides a class, e4_XMLGenerator, which can be used for creating an XML string representing a given node and all its vertices, recursively. A new instance of e4_XMLGenerator must be used each time you want to generate XML output from a node.

Each instance of e4_XMLGenerator is associated with an instance of e4_Node, the node from which XML output is generated. The association must be formed before XML generation can occur, by either giving the associated node in the constructor of e4_XMLGenerator, or by using the SetNode method. The current associated node is returned by the GetNode method.

XML output generated from the associated node is wrapped by an XML tag determined either at construction time or set later by using the SetElementName method. The currently set wrapping XML tag is returned by the GetElementName method.

XML output is generated with the Generate method, which takes no arguments and returns the XML output as a NULL terminated string. You can get the XML output string from a previous invocation of Generate using the Get method. The memory occupied by the string returned by Generate and Get is owned by the XML generator; if your program wishes to keep the value around, it must be copied.
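A minimal sketch of copying the output immediately: GenerateToString is a hypothetical template helper (not part of e4XML) that relies only on the Generate method and its documented NULL return when the generator is not ready or an error occurs.

```cpp
#include <string>

// Hypothetical helper, not part of e4XML: runs Generate() on any
// generator-like object and copies the result into a caller-owned string.
template <typename Generator>
bool GenerateToString(Generator &gen, std::string &out) {
    const char *xml = gen.Generate();
    if (xml == nullptr) {
        return false;  // no node or element name set, or generation failed
    }
    out.assign(xml);  // copy: the generator owns the returned buffer
    return true;
}
```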

The generated XML output string reverses the process of parsing XML input using the e4_XMLParser class described above. Specifically:

Note that the generator is very naive and may cause a stack overflow while descending into the associated node's reachable graph structure. A future version of the generator may fix this problem by using iteration instead of recursive descent to visit all reachable nodes and vertices. However, the generator is guaranteed to finish generating output in bounded time; it will not recurse infinitely given circular data structures.
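To illustrate the two properties discussed here, the sketch below walks a possibly cyclic graph with an explicit stack instead of recursion, and with a visited set so that each node is processed only once. The Node type is hypothetical and unrelated to e4_Node, and this is not a description of how e4_XMLGenerator is actually implemented.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical graph type for illustration only; it is not e4_Node.
struct Node {
    std::vector<Node *> children;
};

// Visits every node reachable from root exactly once. The explicit stack
// means deep nesting cannot overflow the call stack, and the visited set
// means cycles cannot cause endless work. Returns the number of distinct
// nodes reached.
std::size_t CountReachable(Node *root) {
    if (root == nullptr) {
        return 0;
    }
    std::unordered_set<const Node *> visited;
    std::vector<Node *> stack{root};
    while (!stack.empty()) {
        Node *n = stack.back();
        stack.pop_back();
        if (!visited.insert(n).second) {
            continue;  // already seen: this is where a cycle is cut off
        }
        for (Node *child : n->children) {
            stack.push_back(child);
        }
    }
    return visited.size();
}
```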

The following code snippet shows how you might use the e4_XMLGenerator class in your C++ programs:

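#include <stdio.h>          // fprintf
#include "e4xml.h"
...
e4_XMLGenerator *gen;
e4_Node n;                  // node to generate XML from
char tag[] = "hello";       // wrapping element name; the constructor takes a non-const char *
char *xml;
...
gen = new e4_XMLGenerator(n, tag);
...
xml = gen->Generate();      // owned by gen; copy it if it must be kept
...
if (xml != NULL) {          // Generate returns NULL on error
    fprintf(stderr, "Generated XML: \"%s\"\n", xml);
}
...
delete gen;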

 
e4_XMLGenerator Constructors and Methods
   
e4_XMLGenerator() Creates an empty instance. Subsequently you should associate an instance of e4_Node with this XML generator using the SetNode method, and a wrapping XML element tag name, using the SetElementName method.
e4_XMLGenerator(e4_Node n, char *elementName) Creates an instance with an associated instance of e4_Node and a wrapping XML element tag name.
~e4_XMLGenerator() Destructor. Frees memory associated with this XML generator.
   
void SetNode(e4_Node n) Sets the associated node for this XML generator.
void SetElementName(char *elementName) Sets the wrapping XML element tag name for this XML generator.
void SetElementNameAndNode(e4_Node n, char *elementName) Sets both the associated node and the wrapping XML element tag name for this XML generator.
void GetNode(e4_Node &n) const Retrieves the associated node.
char *GetElementName() const Retrieves the wrapping XML element tag name. The returned pointer refers to the string used by the generator itself, not to a copy.
char *Generate() Generates and returns the XML output representing the associated node, wrapped in the wrapping XML tag name. The returned string's memory is owned by the generator, so your application must copy it if the value should be preserved. If an error occurs or the generator is not ready to produce XML (no associated node or wrapping XML tag name was previously given) then NULL is returned.
char *Get() Retrieves the XML output previously generated by a successful call to Generate. The returned string's memory is managed by the generator and must be copied by your application if the value should be preserved.
char *Base64_Encode(unsigned char *bytes, int nbytes) const As a convenience, the class provides a method to encode binary data of a given length as a BASE64 character string. The memory occupied by the returned string is owned by the generator and must be copied by your application.
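For reference, a self-contained Base64 encoder is sketched below to show the output format: each group of 3 input bytes maps to 4 characters from a 64-character alphabet, with '=' padding for a final partial group. EncodeBase64 is a hypothetical stand-in, not e4XML's implementation; it returns a std::string owned by the caller and does not insert the line breaks MIME uses for long values.

```cpp
#include <cstddef>
#include <string>

// Standalone Base64 encoder (RFC 1341 / RFC 2045 alphabet, '=' padding),
// shown only to illustrate the encoding. Not e4XML's Base64_Encode.
std::string EncodeBase64(const unsigned char *bytes, std::size_t nbytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((nbytes + 2) / 3) * 4);
    std::size_t i = 0;
    // Each full group of 3 input bytes (24 bits) becomes 4 six-bit characters.
    for (; i + 2 < nbytes; i += 3) {
        unsigned v = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += kAlphabet[(v >> 6) & 0x3F];
        out += kAlphabet[v & 0x3F];
    }
    // One or two leftover bytes are padded with '=' to a 4-character group.
    std::size_t rest = nbytes - i;
    if (rest == 1) {
        unsigned v = bytes[i] << 16;
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        unsigned v = (bytes[i] << 16) | (bytes[i + 1] << 8);
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += kAlphabet[(v >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

For example, the three bytes of "Man" encode to TWFu, and the five bytes of "hello" encode to aGVsbG8=.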