The Document Object Model (DOM) is a programming interface for HTML and XML documents. It is a cross-platform and language-independent convention for representing and interacting with objects in HTML, XHTML and XML documents. Aspects of the DOM (such as its "Elements") may be addressed and manipulated within the syntax of the programming language in use. The public interface of a DOM is specified in its Application Programming Interface (API), which defines the way a document can be accessed and manipulated. Using a DOM, a programmer can create a document, navigate its structure, and add, modify, or delete its elements. Because the DOM is a W3C specification, one important objective has been to provide a standard programming interface that can be used in a wide variety of environments and applications.
In other words: The XML DOM is a standard for how to get, change, add, or delete XML elements.
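The four operations named above (get, change, add, delete) can be sketched in Java with the standard JAXP and org.w3c.dom classes. This is only a minimal illustration: the class name DomSketch and the element names book, title, draft and author are made up for the example and do not come from any particular document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: parses a small XML string into a DOM tree and edits it.
class DomSketch {
    static Document edit(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();

            // Get: look up elements by tag name.
            NodeList titles = root.getElementsByTagName("title");

            // Change: replace the text of the first <title>.
            titles.item(0).setTextContent("XML Basics");

            // Add: create a new element and append it to the root.
            Element author = doc.createElement("author");
            author.setTextContent("unknown");
            root.appendChild(author);

            // Delete: remove the first <draft> element from its parent.
            Node draft = root.getElementsByTagName("draft").item(0);
            draft.getParentNode().removeChild(draft);

            return doc;
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked exceptions.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = edit("<book><title>Old</title><draft/></book>");
        System.out.println(
            doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```

Note that the whole document is held in memory as a tree before any of these calls run, which is the cost the SAX discussion later in this post is concerned with.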
SAX PARSER

What is SAX?

The Simple API for XML, SAX, was invented in late 1997/early 1998 when Peter Murray-Rust and several authors of XML parsers written in Java decided there wasn’t much point to maintaining multiple similar yet incompatible APIs to do exactly the same thing. Murray-Rust was the first to suggest what he called “YAXPAPI”. The reason Murray-Rust wanted Yet Another XML Parser API was that he was thoroughly sick of supporting multiple, incompatible XML parsers for his parser-client application JUMBO. Instead, he wanted a standard API everyone could agree on. Parser authors Tim Bray and David Megginson quickly signed on to the project, and work began in public on the xml-dev mailing list where many people participated. Megginson wrote the initial draft of SAX. After a short beta period, SAX 1.0 was released on May 11, 1998.

SAX was designed around abstract interfaces rather than concrete classes so it could be layered on top of parsers’ existing native APIs. SAX is not the most sophisticated XML API imaginable, but that’s part of its beauty. The ease with which SAX could be implemented by many parser vendors with very different architectures contributed to its success and rapid standardization.

SAX (Simple API for XML) is a sequential access parser API for XML. SAX provides a mechanism for reading data from an XML document. It is a popular alternative to the Document Object Model (DOM).

Introduction

This web page publishes SAX Parser code that reads XML formatted data into Java objects. A class is included that will allocate and initialize the SAX Parser. If a boolean flag is true, the parser will be initialized as a validating parser.
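A class of that kind might look roughly like the sketch below, which uses the standard javax.xml.parsers.SAXParserFactory. The class name SaxParserSetup and the method newParser are hypothetical and are not the names used in the published code. One caveat: setValidating(true) only switches on DTD validation, so validating against an XML schema would need extra setup (for example SAXParserFactory.setSchema) that is not shown here.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: allocates and initializes a SAX parser.
class SaxParserSetup {
    // If 'validating' is true, the returned parser validates against the
    // document's DTD; XML Schema validation needs additional configuration.
    static SAXParser newParser(boolean validating) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(validating);
            return factory.newSAXParser();
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked exceptions.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newParser(true).isValidating());
    }
}
```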
The XML schema that the XML documents are validated against is published here as well.

SAX: Ass Backward Parsing

With SAX and XML Schema validation as examples, I am left with the impression that the people who developed these technologies never took a compiler implementation class, or if they did, the class left no impression on them.

Parsing is usually done by two logical components: a parser and a scanner. The scanner reads the text and classifies it as "tokens". A token is a category that is recognized by the parser. For example, a scanner for the Java programming language might return tokens that include: identifier, integer, for (a reserved word), mult (an operator). An important point, relative to SAX, is that the parser calls the scanner. As the parser processes the tokens returned by the scanner it performs operations, like building a syntax tree. An example of a parser that reads assignment statements and arithmetic expressions and builds XML can be found here. This is part of the DOM parsing software mentioned above.

In the case of SAX, the scanner (the SAXParser object) calls the parser. This makes parsing with SAX needlessly awkward and complicates the architecture of the software. For this reason, the DOMParser is frequently used for parsing complicated XML documents.

SAX is not without its virtues (maybe)

The SAXParser does have two notable advantages over the DOMParser: the SAXParser is faster and it uses less memory. While the SAXParser is difficult to use for processing complex XML documents, perhaps it is appropriate for processing simple XML documents? This web page grew out of an experiment to see if this is true.

A Prototype Application

The prototype code published on this web page is motivated by a real application. This is a software system I call a Trade Engine, which is diagrammed in Figure 1. The Trade Engine is designed to process order and control messages for trading applications.
These might be computer-driven trading programs for the stock, options or foreign exchange markets. The trading applications submit XML formatted orders and control messages to the Trade Engine. The Trade Engine parses and validates these messages and builds internal Java objects. The market orders are called "aim orders" because they specify a trading goal. Depending on the processing instructions, the Trade Engine may execute the order over a period of time (e.g., the trading day).

Figure 1

Parsing Trade Engine Messages using SAX

SAX uses "call backs". When the SAXParser object recognizes a component in an XML document (e.g., a start Element, an end Element, the characters between tags), it calls a method that may be supplied by the application to process the XML component. In Java this is done by subclassing a handler class, like the DefaultHandler. This can be seen in the method signature in the javax.xml.parsers.SAXParser object for the parse method used in this example:

parse(InputStream is, DefaultHandler dh)

In this example a MessageProcessor subclass is derived from the DefaultHandler class. The MessageProcessor class overrides the methods associated with the XML components that are of interest. For example, startElement() is overridden, but the processingInstruction() method is not. The MessageProcessor class is diagrammed in Figure 2.

Figure 2

Figure 3

A Trade Engine message may enclose multiple sub-messages. For example, one Trade Engine message may include multiple aim orders. When the MessageProcessor class recognizes the start of a sub-message it allocates a sub-message processor. The MessageProcessor then calls the sub-message processor methods to process each of the XML components and build the object. The class diagram for the sub-message processors is shown in Figure 4. Again, this is just a prototype, so there are only two message processors.
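The call back style described above can be illustrated with a much smaller handler than the MessageProcessor. The sketch below is not the Trade Engine code: the class SymbolCollector and the element names message, aimOrder and symbol are invented for the example. It subclasses DefaultHandler, overrides only the call backs it cares about, and passes itself to SAXParser.parse(InputStream, DefaultHandler).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <symbol> element.
class SymbolCollector extends DefaultHandler {
    final List<String> symbols = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // A new element begins: discard text gathered so far.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("symbol".equals(qName)) {
            symbols.add(text.toString().trim());
        }
    }

    static List<String> collect(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SymbolCollector handler = new SymbolCollector();
            InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            // The parser drives the handler: it calls back for each component.
            parser.parse(in, handler);
            return handler.symbols;
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked exceptions.
            throw new IllegalStateException(e);
        }
    }
}
```

Even in this small case the handler has to keep its own state (the text buffer) between call backs, because the parser, not the application, decides when each method runs. That is the source of the awkwardness discussed in this post.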
Figure 4

The MessageBaseMessage base class processes the common data fields in the MessageBase object, which is the base class for the Control and AimOrder objects.

Conclusion

The software published here builds message objects from XML formatted data. In theory using the SAXParser for this is faster than using the DOMParser to build a DOM object and then traversing the DOM tree to build a message object. But the call back architecture of SAX introduces complexity that does not exist for a parser which calls a scanner. The awkwardness of SAX and the overhead of DOM parsing are some of the motivations behind the XML Pull Parser, which is called by a parsing application. An example that applies XML Pull Parsing to the Trade Engine messages described on this web page can be found here.

One advantage that both the SAX and DOM parsers have is that they are validating. The structure of the XML document can be verified against an XML schema. However, the computational cost of this validation is unknown (at least to me). SAX validation may reduce the computational advantage of the SAX parser compared to the DOM parser.

At its core, SAX, the Simple API for XML, is based on just two interfaces, the XMLReader interface that represents the parser and the ContentHandler interface that receives data from the parser. These two interfaces alone suffice for 90% of what you need to do with SAX. This chapter shows the basic operation of XMLReader and discusses ContentHandler in detail. The next chapter explores a variety of ways to customize the parsing process through the more advanced features of the XMLReader interface.
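The pairing of XMLReader and ContentHandler can be sketched as follows. This is an illustrative example only: the class ElementCounter is hypothetical, and it extends DefaultHandler (which implements ContentHandler with empty methods) so that only one call back has to be written. The XMLReader is obtained from the standard JAXP SAXParser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical ContentHandler: counts the elements in a document.
class ElementCounter extends DefaultHandler {
    int count;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        count++;
    }

    static int countElements(String xml) {
        try {
            // The XMLReader represents the parser.
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // The ContentHandler receives data from the parser.
            ElementCounter counter = new ElementCounter();
            reader.setContentHandler(counter);
            reader.parse(new InputSource(new StringReader(xml)));
            return counter.count;
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked exceptions.
            throw new IllegalStateException(e);
        }
    }
}
```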
Topics covered: client-side programming, including an overview of JavaScript, objects in JavaScript, regular expressions, an overview of XML, DTD and XML Schema, DOM and SAX parsers, CSS, and XSLT.
Wednesday, October 13, 2010
DOM AND SAX PARSERS