Pages

Saturday, October 16, 2010

CSS

XML and CSS

The simplicity of document creation was a key element in the astonishingly rapid development of the Web. This article describes XML and CSS: the "one-two" punch that will not only bring back that level of simplicity, but also enable the construction of complex applications which are either difficult or impossible using HTML. In this article we outline the steps for using an CSS style sheet in an XML document; we discuss the limitations of CSS in complex applications; and we present a real life example.
HTML provides limited possibilities for the explicit formatting and positioning of text. The mechanisms that are provided--such as the FONT element or the ALIGN attribute--force the page designer to embed presentation-specific information within the document; a fact that makes it difficult to prepare documents for a variety of screen sizes, presentation modalities, and types of audiences. Because these limited features are not sufficient to achieve the formatting results desired by many Web designers, they commonly resort to using tables and various HTML coding "tricks." This presents many negative consequences, particularly because it is so difficult to maintain information content in HTML documents; the content is inextricably interwined with the format-related encoding. More sophisticated formatting capabilities have long been needed to support the many document types, ranging from marketing froufrou to legal documents to scientific journals.
Cascading Style Sheets (CSS) is a style sheet mechanism specifically developed to meet the needs of Web designers and users. CSS provides HTML with far greater control over document presentation in a way that is independent of document content. CSS style sheets can be used to set fonts, colors, whitespace, positioning, backgrounds, and many other presentational aspects of a document. It is also possible for several documents to share the same style sheet, which allows users to maintain consistent presentation within a collection of related documents without having to modify each document separately.
The rationale for XML is discussed at length in other papers in this issue; we will thus restrict our introduction to the observation that, in combination, XML and CSS can once again simplify document creation. XML uses markup to describe the structure and data content of a document, making it easy both for authors to write it and for computer programs to process it. CSS, on the other hand, makes it possible to present that document to the user in a browser. CSS or some type of style sheet mechanism is, in fact, a requisite for browsing XML on the Web. If CSS works well with HTML, it will work wonders with XML.
In this article we will illustrate the use of CSS and XML with two examples. We do not describe the syntax of CSS, but refer the reader to the CSS documentation on the W3C Web site.[1]

A Thespian Example

The steps in the following sections illustrate the use of a CSS style sheet for an XML document (an extract from the Shakespeare play "Much Ado about Nothing").


Figure 1

Step 1: The Document Source

Figure 1 shows much_ado.xml; preparing the document source is the first step in using a CSS style sheet for an XML document.

Step 2: Define Style Sheet Rules

The style rules for a document can be stored either within the document itself, or within a separate text file. In this example, we will save the style sheet as a separate file, shaksper.css, so that it can be easily applied to other Shakespeare plays we might like to publish.
You can create the style sheet manually using your favorite text editor, as shown in Example 1.
Alternatively, you can create the style sheet using an editor that supports CSS, such as Grif's Symposia, as shown in Figure 2.
Example 1
PLAY { background-color : white }
FM { font-style : italic;
     font-size : 14;
     color : #400040;
     text-align : right }
SPEAKER { font-weight : bold;
          color : #ff0080 }
LINE { color : #800040;
       left : 15 }
PERSONA { font-style : italic;
          font-size : -1 }
PERSONAE { color : #800040 }
SCNDESCR { margin-top : 30;
           font-size : 18;
           color : #0000a0;
           height : 20 }
STAGEDIR { font-weight : bold;
           font-style : italic;
           height : 20 }
PLAYSUBT { font-weight : bold;
           font-size : 16;
           text-decoration : underline;
           height : 20 }
SPEECH { margin-top : 5 }



Figure 2

Step 3: Link the Style Sheet
to the Document

At the time of writing, there is not yet a standardized method for linking an XML document with a style sheet. This subject forms part of the work, currently in progress by the W3C, that is due to be published in December 1997 as part III of the XML specification: XML-Style.
The method we will use in this example therefore is based on a draft proposal for stylesheet linking in XML which consists of inserting the XML processing instruction <?XML-stylesheet?> at the top of the document. The processing instruction has two required attributes type and href which respectively specify the type of stylesheet and its address. In our example, we thus need to add the following line to our XML document:

<?XML-stylesheet type="text/css" href="shaksper.css"?> 
Symposia supports this XML style sheet linking mechanism and inserts the processing instruction for you automatically when you specify an external style sheet using the "Create new Style Sheet/External" command.

Step 4: Publish the Document
and its Style Sheet

Once the style sheet is linked to the document, it can be published. When opened in an XML browser (or in Symposia, as shown in Figure 3), the style rules are applied to the different XML elements in the document.
Figure 3


 

Limitations of CSS for
Complex Applications

Although CSS style sheets can be very effective for improving the presentation of HTML documents, the CSS1 standard has a number of important omissions which can limit the effectiveness of CSS style sheets for more complex applications. The following list, taken from Jon Bosak's presentation at WWW6 on April 11, 1997[2] describes just a few of the major limitations of the CSS standard:

  • CSS cannot grab an item (such as a chapter title) from one place and use it again in another place (such as a page header).
  • CSS has no concept of sibling relationships. For example, it is impossible to write a CSS stylesheet that will render every other paragraph in bold.
  • CSS is not a programming language; it does not support decision structures and cannot be extended by the stylesheet designer.
  • CSS cannot calculate quantities or store variables. This means, at the very least, that it cannot store commonly used parameters in one location that is easy to update.
  • CSS cannot generate text (page numbers, etc.)
  • CSS uses a simple box-oriented formatting model that works for current Web browsers but will not extend to more advanced applications of the markup, such as multiple column sets.
  • CSS is oriented toward Western languages and assumes a horizontal writing direction.
It is for these reasons that the W3C working group responsible for the XML standard is concentrating its efforts on the implementation of a simplified version of the DSSSL standard, DSSSL Online (DSSSL-O). DSSSL-O promises to provide far more layout and document presentation features than CSS, but is a considerably more complex standard and may prove difficult to implement.
What is clear, however, is that the limitations of CSS1 do represent a serious hinderance for its use in the more complex types of application that are possible with XML. As the following section shows, even for a somewhat simple XML application, the CSS standard needs to acquire a certain number of essential features if it is realize its full potential.

A Real Life Example

The following example is taken from a Marketing Contact application which is actually in use at Grif. The Marketing Contact application allows our staff to record information and update information about our business contacts, and to extract reports and find existing records based on specific criteria using a search engine. Though this type of application is often implemented using a database such as Access, a more Web-centric approach would be to create an interface to a database using HTML forms. We choose instead to simply create our database records directly using Symposia. Our search engine is XML-capable so the equivalent of field-specific searches can be performed within XML elements. Among the advantages of this approach is that our application is entirely Intranet-based; and as a result we were neither required to write a line of CGI code, nor to develop HTML or database forms.
The DTD for our application is shown in Example 2.
Example 2
<!XML version="1.0">
<!DocType ContactRec [
<!Element ContactRec (Name, Company, Address, 
Product, Contacts)>
<!Element Name (Honorific?, First,
Middle?, Last)>
<!Element Company (JobTitle?, CompanyName)>
<!Element Address (Street+, City, Region?,
PostCode, Country, Phone,
Internet)>
<!Element Product  Empty>
<!AttList Product
SGMLEditor (Yes|No) #REQUIRED
SGMLEditorKorean (Yes|No) #REQUIRED
SGMLEditorJapanese (Yes|No) #REQUIRED
ActiveViews (Yes|No) #REQUIRED
SymposiaPro (Yes|No) #REQUIRED
SymposiaDocPlus (Yes|No) #REQUIRED
XMLProducts (Yes|No) #REQUIRED
General (Yes|No) #REQUIRED>
<!Element Contacts (Language, History)>
<!Element Honorific (#PCDATA)>
<!AttList Honorific
Title (Mr|Ms|Mrs|Miss|Dr|Professor|M|Mme|Mlle|SeeContent) "SeeContent">
<!Element First (#PCDATA)>
<!Element Middle (#PCDATA)>
<!Element Last (#PCDATA)>
<!Element JobTitle (#PCDATA)>
<!Element CompanyName (#PCDATA)>
<!Element Street (#PCDATA)>
<!Element City (#PCDATA)>
<!Element Region (#PCDATA)>
<!Element PostCode (#PCDATA)>
<!Element Country (#PCDATA)>
<!Element Phone (DayTime, Fax?)>
<!Element Internet (Email, Web)>
<!Element Language  EMPTY>
<!AttList Language
Preference (English|French) "English">
<!Element History (Events+)>
<!Element DayTime (#PCDATA)>
<!Element Fax (#PCDATA)>
<!Element Email (#PCDATA)>
<!Element Web (#PCDATA)>
<!Element Events (Date, Venue, Notes)>
<!Element Date (Day, Month, Year)>
<!Element Venue (#PCDATA)>
<!Element Notes (#PCDATA)>
<!Element Day (#PCDATA)>
<!Element Month (#PCDATA)>
<!Element Year (#PCDATA)>
]>
A form-based interface provides many user-friendly features that make data-entry easier for the user, such as the ability to select from a predetermined list of options, or the possibility to select or unselect different options, as appropriate.
To provide an equivalent level of "comfort," we need to provide a predefined document template in which the user need only fill in the required data. Using the DTD as our guide, we can produce a document instance that contains all required document elements. Without a style sheet, however, this template is not of great value. The editor will simply display the document as a list of tags. Users need to know what information to insert where without displaying these tags--the only way we can accomplish this is by using our style sheet.
With the HTML form, it was possible to simply label each input zone with some text. Unfortunately, the CSS standard does not provide a means of displaying predefined text before or after an element. This style sheet feature was left out of CSS1 because of the inherent difficulty of implementing this feature in today's Web browsers (which, of course, by the time you read these words, will already be yesterday's Web browsers). Structure based editor/browsers such as Symposia, or indeed the thoroughbred XML structured editors to come, would have no problem implementing this type of feature, if it were included in the CSS standard.
The only viable solution for the moment seems to be to provide default text content for each data field that is then replaced by the user as he/she enters the information in the page (see Figure 4).
Figure 4


This layout is still not very satisfactory, however. Figure 5 improves the layout of the page using our style sheet to highlight certain elements and group related items together.
While we can achieve quite a pleasing presentation for our data entry screen using the style sheet. However there are two elements in the DTD which we cannot display in this same way: Product and Language. These two elements are empty and have no text content.

  • The Product element takes a number of optional attributes to indicate the contact's interest in the company's different products (the value of each attribute is "Yes" or "No," accordingly.
  • The Language element takes the Preference attribute, which can take the values "English" or "French," and for which the default value is "English"--indicating that any communication with or documentation sent to the contact should be in English.
One way around this problem, although not very elegant, would be to specify a background image for each element using the following style rule:

Product {background-image:image.gif} 
Using our authoring tool, we would then be able to select the element by clicking on the image and pull up a list of the element's attributes for modification, as shown in Figure 6.
Figure 5


Figure 6


Of course, we are still left with the following problem: using CSS, it is not possible to display the attribute names or values for an element in the document itself. In our example, it would have been nice to be able to pull up the list of attributes for the Product element, supply a value for each attribute, then see these changes appear in the document (display some text or an image to indicate which products the contact was interested in, for example). Neither does CSS provide the possibility to apply conditional style rules based on an element's attribute values (one might imagine a different image to be displayed for "Yes" or for "No" values).
Despite the limitations of CSS described here we have found that the language comes close to meeting our needs. This, combined with the fact that CSS is already implemented for HTML in the major Web browsers--and our sense that the simplicity of CSS will appeal to Web designers over more complex (albeit powerful) approaches--leads us to believe that CSS will be the dominant mechanism for displaying XML documents on the Web.

Table of contents

For further notes visit

http://www.w3schools.com/xml/default.asp
http://developer.apple.com/internet/webcontent/xmltransformations.html
http://www.quackit.com/xml/tutorial/xml_css.cfm

Wednesday, October 13, 2010

DOM AND SAX PARSERS

The Document Object Model (DOM) is a cross-platform and language-
independent convention for representing and interacting with objects in
HTML, XHTML and XML documents. Aspects of the DOM (such as its "Elements")
may be addressed and manipulated within the syntax of the programming l
anguage in use. The public interface of a DOM is specified in its
Application Programming Interface (API).
The DOM is a programming interface for HTML and XML documents.
It defines the way a document can be accessed and manipulated.
Using a DOM, a programmer can create a document, navigate its structure,
and add, modify, or delete its elements.
As a W3C specification, one important objective for the DOM has been to
provide a standard programming interface that can be used in a wide variety
 of environments and applications

What is the DOM?

The W3C Document Object Model (DOM) is a platform and language
 neutral interface that allow programs and scripts to dynamically access
and update the content,style and structure of a document."
The W3C DOM provides a standard set of objects for representing HTML
and XML documents, and a standard interface for accessing and manipulating
them.
The DOM is separated into different parts (Core,XML and HTML) and
also have different levels (DOM Level 1/2/3):
  • Core DOM - define a standard set of objects for any structured documents
  • XML DOM - defines a standard set of objects for XML documents only.
  • HTML DOM - define a standard set of objects for HTML documents only

What is the XML DOM?

  • The XML DOM is the Document Object Model for XML only.
  • The XML DOM is language- and platform-independent
  • The XML DOM define a standard set of objects for XML
  • The XML DOM define a standard way to access XML documents
  • The XML DOM define a standard way to manipulate XML documents
  • The XML DOM is W3C standard
The DOM view XML documents as a tree-structure. All elements; their containing
text and their attributes, can be accessed through the DOM tree. Their contents
can be deleted or modified, and new elements can be created. The elements,
their text, and their attributes are all known as nodes.
The XML DOM is:
  • A standard object model for XML
  • A standard programming interface for XML
  • Platform- and language-independent
  • A W3C standard
The XML DOM defines the objects and properties of all XML elements, and the methods (interface) to access them.
In other words: The XML DOM is a standard for how to get, change, add, or delete XML elements.




What You Should Already Know

Before you goes to this tutorial you should have a basic understanding of the following:
  • HTML / XHTML
  • XML
  • JavaScript
                                                                    

XML DOM Tutorial


DOM Nodes
DOM Node Tree
DOM Parser
DDOM Methods
DOM Accessing
DOM Node Info
DOM OM Load Function
Node List
DOM Traversing
DOM Browsers
DOM Navigating

Manipulate Nodes

XML DOM Reference

DOM Node Types
DOM Node
DOM NodeList
DOM NamedNodeMap
DOM Document
DOM DocumentImpl
DOM DocumentType
DOM ProcessingInstr
DOM Element
DOM Attribute
DOM Text
DOM CDATA
DOM Comment
DOM XMLHttpRequest
DOM ParseError Obj
DOM Parser Errors
DOM Summary

XML DOM Examples
DOM Examples
DOM Validator
                          SAX PARSER
What is SAX?

The Simple API for XML, SAX, was invented in late 1997/early 1998 when Peter Murray-Rust and several authors of XML parsers written in Java decided there wasn’t much point to maintaining multiple similar yet incompatible APIs to do exactly the same thing. Murray-Rust was the first to suggest what he called “YAXPAPI”. The reason Murray-Rust wanted Yet Another XML Parser API was that he was thoroughly sick of supporting multiple, incompatible XML parsers for his parser-client application JUMBO. Instead, he wanted a standard API everyone could agree on. Parser authors Tim Bray and David Megginson quickly signed on to the project, and work began in public on the xml-dev mailing list where many people participated. Megginson wrote the initial draft of SAX. After a short beta period, SAX 1.0 was released on May 11, 1998.
SAX was designed around abstract interfaces rather than concrete classes so it could be layered on top of parsers’ existing native APIs. SAX is not the most sophisticated XML API imaginable, but that’s part of its beauty. The ease with which SAX could be implemented by many parser vendors with very different architectures contributed to its success and rapid standardization.
SAX (Simple API for XML) is a sequential access parser API for XML. SAX provides a mechanism for reading data from an XML document. It is a popular alternative to the Document Object Model (DOM).

Introduction

This web page publishes SAX Parser code that reads XML formatted data into Java objects. A class is included that will allocate and initialize the SAX Parser. If a boolean flag is true, the parser will be initialized as a validating parser. The XML schema that the XML documents are validated against is published here as well.

SAX: Ass Backward Parsing

With SAX and XML Schema validation as examples, I am left with the impression that the people who developed these technologies never took a compiler implementation class, or if they did, the class left no impression on them.
Parsing is usually done by two logical components: a parser and a scanner. The scanner reads the text and classifies it as "tokens". A token is a catagory that is recognized by the parser. For example, a scanner for the Java programming language might return the tokens that include: identifier, integer, for (a reserved word), mult (an operator). An important point, relative to SAX, is that the parser calls the scanner. As the parser processes the tokens returned by the scanner it performs operations, like building a syntax tree. An example of a parser that reads assignment statements and arithmetic expressions and builds XML can be found here. The is part of the DOM parsing software mentioned above.
In the case of SAX, the scanner (the SAXParser object) calls the parser. This makes parsing with SAX needlessly awkward and complicates the architecture of the software. For this reason, the DOMParser is frequently used for parsing complicated XML documents.

SAX is not without its virtues (maybe)

The SAXParser does have two notable advantages over the DOMParser: the SAXParser is faster and it uses less memory. While the SAXParser is difficult to use for processing complex XML documents, perhaps it is appropriate for processing simple XML documents? This web page grew out of an experiment to see if this is true.

A Prototype Application

The prototype code published on this web page is motivated by a real application. This is a software system I call a Trade Engine, which is diagrammed in Figure 1. The Trade Engine is designed to process order and control messages for trading applications. These might be computer driven trading programs for the stock, options or foreign exchange markets. The trading applications submit XML formatted orders and control messages to the Trade Engine. The Trade Engine parses and validates these messages and builds internal Java objects. The market orders are called "aim orders" because they specify a trading goal. Depending on the processing instructions, the Trade Engine may execute the order over a period of time (e.g., the trading day).

Figure 1

Parsing Trade Engine Messages using SAX

SAX uses "call backs". When the SAXParser object recognizes a component in an XML document (e.g., a start Element, an end Element, the characters between tags), it calls a method that may be supplied by the application to process the XML component. In Java this is done by subclassing a handler class, like the DefaultHandler. This can be seen in the method signature in the javax.xml.parsers.SAXParser object for the parse method used in this example:
parse(InputStream is, DefaultHandler dh) 
In this example a MessageProcessor subclass is derived from the DefaultHandler class. The MessageProcessor class overrides the methods associated with the XML components that are of interest. For example, startElement is overridden, but the processingInstruction() method is not. The MessageProcessor class is diagrammed in Figure 2.





Figure 2
The result of the SAXParser calling the methods overridden by the MessageProcessor class is to build an object from the data. All Trade Engine objects share a common set of fields indicating the product that sent the message, the user and a local ID that is used by the product. The specific messages have unique data that is associated with that message type. This is experimental prototype code, so only two message types, Control and AimOrder are supported. This is shown in the Figure 3.

Figure 3
A Trade Engine message may enclose multiple sub-messages. For example, one Trade Engine message may include multiple aim orders. When the MessageProcessor class recognizes the start of a sub-message it allocates a sub-message processor. The MessageProcessor then calls the sub-message processor methods to process each of the XML components and build the object. The class diagram for the sub-message processors is shown in Figure 4. Again, this is just a prototype, so there are only two message processors.

Figure 4
The MessageBaseMessage base class processes the common data fields in the MessageBase object, which is the base class for the Control and AimOrder objects.

Conclusion

The software published here builds message objects from XML formatted data. In theory using the SAXParser for this is faster than using the DOMParser to build a DOM object and then traversing the DOM tree to build a message object. But the call back architecture of SAX introduces complexity that does not exist for a parser which calls a scanner. The awkwardness of SAX and the overhead of DOMParsing are some of the motivations behind the XML Pull Parser, which is called by a parsing application. An example that applies XML Pull Parsing to the Trade Engine messages described on this web page can be found here.
One advantage that both the SAX and DOM parsers have is that they are validating. The structure of the XML document can be verified against an XML schema. However, the computational cost of this validation is unknown (at least to me). SAX validation may reduce the computational advantage of the SAX parser compared to the DOM parser.

At its core, SAX, the Simple API for XML, is based on just two interfaces, the XMLReader interface that represents the parser and the ContentHandler interface that receives data from the parser. These two interfaces alone suffice for 90% of what you need to do with SAX. This chapter shows the basic operation of XMLReader and discusses ContentHandler in detail. The next chapter explores a variety of ways to customize the parsing process through the more advanced features of the XMLReader interface.