Programmer: Lifelong Learning: Interview Questions

Interview Questions - XML

DOM, SAX, StAX

http://geekexplains.blogspot.com/2009/04/sax-vs-dom-differences-between-dom-and.html

http://sharat.wordpress.com/2006/09/27/83-what-are-the-differences-between-sax-and-dom-parser/

There are three distinct approaches to parsing XML documents:

• DOM parsing

• Push parsing - SAX

• Pull parsing – StAX

DOM stands for Document Object Model, SAX for Simple API for XML, and StAX for Streaming API for XML.

DOM was developed by the W3C. SAX was developed by an informal group of participants of the XML-DEV mailing list, and StAX was specified through the Java Community Process as JSR 173.

Each approach has advantages and disadvantages, and the right choice depends on the situation.

DOM Parsing

The DOM approach has the following notable aspects:

1. An in-memory DOM tree representation of the complete document is constructed before the document structure and content can be accessed or manipulated.

2. Document nodes can be accessed randomly and do not have to be accessed strictly in document order.

3. Random access to any tree node is fast and flexible, but parsing the complete document before accessing any node can reduce parsing efficiency.

4. If an XML document needs to be navigated randomly or if the document content and structure needs to be manipulated, the DOM parsing approach is the most practical approach.

5. DOM is convenient when applications need to traverse the document multiple times.

6. DOM supports XPath.

7. For large documents ranging from hundreds of megabytes to gigabytes in size, the in-memory DOM tree structure can exhaust all available memory, making it impossible to parse such large documents under the DOM approach.
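The points above can be illustrated with a minimal sketch using the JAXP DOM API. The sample document, the class name DomExample, and the helper names parse and titleAt are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomExample {
    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<catalog>"
      + "<book id=\"b1\"><title>First</title></book>"
      + "<book id=\"b2\"><title>Second</title></book>"
      + "</catalog>";

    // Parses the complete document into an in-memory tree before anything is read.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the title of the book at the given position; positions can be visited in any order.
    static String titleAt(Document doc, int index) {
        NodeList books = doc.getElementsByTagName("book");
        Element book = (Element) books.item(index);
        return book.getElementsByTagName("title").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // Random access: read the second book before the first.
        System.out.println(titleAt(doc, 1));
        System.out.println(titleAt(doc, 0));
        // Manipulation: change an attribute on the in-memory tree.
        Element first = (Element) doc.getElementsByTagName("book").item(0);
        first.setAttribute("id", "b1-revised");
        System.out.println(first.getAttribute("id"));
    }
}
```

The whole tree stays in memory for as long as doc is reachable, which is what makes repeated traversal cheap and very large inputs costly.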

Push Approach -- SAX

1. SAX was developed by an informal group of participants of the XML-DEV mailing list.

2. Under the push parsing approach, a push parser generates synchronous events as a document is parsed, and these events can be processed by an application using a callback handler model.

3. SAX offers no random access. It can be used only for sequential processing of an XML document, traversing it from top to bottom.

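4. SAX doesn't report all the information in the underlying XML document by default. Comments, for example, are not delivered to the basic content handler (they require the optional LexicalHandler extension), whereas DOM retains almost all the information.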

5. SAX doesn't support XPath.

6. Compared with DOM, SAX is more efficient and consumes less memory, because it does not build an in-memory tree.
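The callback handler model can be sketched as follows, using the JAXP SAX API. The class names SaxExample and TitleHandler and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxExample {
    // Callback handler: the parser pushes events into these methods in document order.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a <title> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("title".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks, so append.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    // Collects the text of every <title> element in one sequential pass.
    static List<String> titles(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleHandler handler = new TitleHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        String xml = "<catalog><book><title>First</title></book>"
                   + "<book><title>Second</title></book></catalog>";
        System.out.println(titles(xml));
    }
}
```

Only the handler's own fields hold state here. Nothing of the document is kept once an event has been delivered, which is why the application must track context (such as "currently inside title") itself.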

Pull Approach -- StAX

1. Under the pull approach, events are pulled from an XML document under the control of the application using the parser.

2. StAX is similar to the SAX API in that both offer event-based APIs. However, StAX differs from SAX in how events reach the application:

• In SAX, data is pushed via events to application code handlers.

• In StAX, the application "pulls" the data from the XML data stream at its convenience. Application code can filter, skip tags, or stop parsing at any time. The application, not the parser, is in control, which enables a more intuitive way to process data.

3. StAX offers two event-based APIs: a cursor-based API and an iterator-based API.

4. Unlike the SAX API, the StAX API can be used both for reading and for writing XML documents.
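A minimal sketch of the pull approach, using the cursor-based StAX API (XMLStreamReader) for reading and XMLStreamWriter for writing. The class name StaxExample, the method names, and the element names are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class StaxExample {
    // Pulls events one at a time and stops as soon as the first <title> is found.
    static String firstTitle(String xml) throws Exception {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // the application asks for the next event
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    return reader.getElementText(); // rest of the document is never read
                }
            }
            return null; // no <title> element present
        } finally {
            reader.close();
        }
    }

    // Writes a small document with the same API family.
    static String writeCatalog(String title) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("catalog");
        writer.writeStartElement("title");
        writer.writeCharacters(title);
        writer.writeEndElement(); // closes <title>
        writer.writeEndElement(); // closes <catalog>
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = writeCatalog("First");
        System.out.println(xml);
        System.out.println(firstTitle(xml));
    }
}
```

The early return in firstTitle is the practical difference from SAX: the loop belongs to the application, so leaving it ends the parse without any exception-based workaround.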

SAX and StAX are good choices for dealing with large documents, because neither builds the whole document in memory.

XPath

XPath is a language for addressing node sets within an XML document.

The XPath data model treats an XML document as a tree of various node types, such as an element node, an attribute node, and a text node.
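As a minimal sketch, the standard javax.xml.xpath API can evaluate XPath expressions that select element, attribute, and text nodes. The class name XPathExample, the helper names, and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathExample {
    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<catalog>"
      + "<book id=\"b1\"><title>First</title></book>"
      + "<book id=\"b2\"><title>Second</title></book>"
      + "</catalog>";

    // Evaluates an expression and returns its string value.
    static String evalString(String expression, String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    // Evaluates an expression as a node set and returns how many nodes it selects.
    static int count(String expression, String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Element node selected through a predicate on an attribute node.
        System.out.println(evalString("/catalog/book[@id='b2']/title", XML));
        // Attribute nodes.
        System.out.println(count("//book/@id", XML));
        // Text nodes.
        System.out.println(count("//title/text()", XML));
    }
}
```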

XSLT

XSLT specifies a language for transforming XML documents into other XML documents.

XSLT language constructs are completely based on XML. Therefore, transformations written in XSLT exist as well-formed XML documents. An XML document containing XSLT transformations is commonly referred to as a style sheet.

An XSLT style sheet merely specifies a set of transformations. Therefore, you need an XSLT processor to apply these transformations to a given XML document. An XSLT processor takes an XML document and an XSLT style sheet as inputs, and it transforms the given XML document to its target output, according to transformations specified in the style sheet. The target output of XSLT transformations is typically an XML document but could be an HTML document or any type of text document. Two commonly used XSLT processors are Xalan-Java and Saxon.

To use an XSLT processor from Java, you need a set of Java APIs. TrAX (the Transformation API for XML, found in the javax.xml.transform package) is precisely such an API set.
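A minimal sketch of applying a style sheet through the javax.xml.transform API. The style sheet, the input document, and the class name XsltExample are assumptions made for illustration. The style sheet uses the text output method, one of the non-XML targets mentioned above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltExample {
    // Hypothetical style sheet: emits each book title followed by a semicolon.
    static final String STYLE_SHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/catalog\">"
      + "<xsl:for-each select=\"book\">"
      + "<xsl:value-of select=\"title\"/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given style sheet to the given document and returns the output as a string.
    static String transform(String xml, String styleSheet) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(styleSheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input document.
        String xml = "<catalog><book><title>First</title></book>"
                   + "<book><title>Second</title></book></catalog>";
        System.out.println(transform(xml, STYLE_SHEET));
    }
}
```

The code never names a specific processor: TransformerFactory.newInstance() returns whichever implementation is configured, so the same code should run against another conforming processor placed on the classpath.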

XML Libraries

Xerces, JDOM, Sun’s XML parser

JAXP - Java API for XML Processing

Sun packages its XML APIs as the Java API for XML Processing (JAXP). JAXP is included in JDK 5.0 and later.

Its bundled XSLT processor is derived from Apache Xalan.

Using factory classes, JAXP allows you to plug in any conforming XML parser or XSLT processor.
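The factory mechanism can be observed with a short sketch that asks each JAXP factory which implementation class it resolved to. The class name JaxpFactories is an assumption for illustration, and the printed class names depend on the JDK and classpath in use.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

class JaxpFactories {
    // Returns the concrete implementation class behind each abstract JAXP factory.
    static String[] implementations() {
        return new String[] {
            DocumentBuilderFactory.newInstance().getClass().getName(),
            SAXParserFactory.newInstance().getClass().getName(),
            TransformerFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        // Application code depends only on the abstract factories.
        // A different implementation can be selected without code changes,
        // for example with a system property named after the factory class,
        // such as javax.xml.parsers.DocumentBuilderFactory.
        for (String name : implementations()) {
            System.out.println(name);
        }
    }
}
```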

JAXB - Object Binding

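JAXB (Java Architecture for XML Binding) binds an XML Schema to a Java representation, so that XML documents can be unmarshalled into Java objects and Java objects marshalled back into XML.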

Resources:

Pro XML Development with Java Technology
