VTD-XML Benchmark Report for XPath Evaluation

Objective

This articles reports the measurement of XPath evaluation performance for VTD-XML version 1.9, and compares with Xerces 2.7.1 DOM and XPath functions bundled with JDK 1.5 (which is Xalan), as well as Jaxen 1.1.

Methodology Discussion

What is measured?

This benchmark report focuses on XPath evaluation performance only, i.e. the test code parses the document into internal structure, then repetitively evaluates a single pre-compiled XPath over the entire document. Our test measures the time (aka. latency) it takes to evaluate the XPath over the entire document. 

How to measure?

The benchmark code performs the following steps:

  1. Read XML file into an in-memory buffer.
  2. Parse the document once.
  3. Evaluate a single pre-compiled XPath expression for a large number of iterations.

All benchmark programs first loop thru the XPath evaluation code a number of iterations so the server JVM compile them into native code to obtain optimal performance, before the real measurement of the average latency of subsequent XPath evaluations.

Test Setup

Hardware

Software

The XML files

Three XML files of similar structure, but different sizes, are used for the test.

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>
    <billTo country="US">
        <name> Robert Smith </name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>
    <comment>Hurry, my lawn is going wild!</comment>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity></quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <USPrice>39.98</USPrice>
            <shipDate>1999-05-21</shipDate>
        </item>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity><![CDATA[1]]></quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <USPrice>39.98</USPrice>
            <shipDate>1999-05-21</shipDate>
         </item>
               ...
    </items>
</purchaseOrder>

The respective file sizes are:

 

The following XPath expressions are used for the test

The Code Sample

A snippet of the DOM/JDK benchmark code is shown below:

Document d = null;
k = total;
d = parser.parse(bais);
while (k > 0) {
        NodeList nodeList = (NodeList) xPathExpression.evaluate(d,
        XPathConstants.NODESET);
        k--;
}
long l=0,lt=0;
for (int j = 0; j < 10; j++) {
        l = System.currentTimeMillis();
        for (int i = 0; i < total; i++) {
            NodeList nodeList = (NodeList) xPathExpression.evaluate(d,
            XPathConstants.NODESET);
}
long l2 = System.currentTimeMillis();
lt = lt + (l2 - l);
}
System.out.println(" average XPath latency ==> "
+ ((double) (lt) / total / 10) + " ms");

 

Jaxen's code looks like the following:

 XPath expression = new org.jaxen.dom.DOMXPath(xpe);
System.out.println("====> update_jaxen1 ==>"+xpe);
//parse and xpath eval

Document d = null;
d = parser.parse(bais);
k = total;
while (k > 0) {
   List results = expression.selectNodes(d);
   k--;
}
long l=0,lt=0;
for (int j = 0; j < 10; j++) {
l = System.currentTimeMillis();
for (int i = 0; i < total; i++) {
   List results = expression.selectNodes(d);
}
long l2 = System.currentTimeMillis();
lt = lt + (l2 - l);
}
System.out.println(" average XPath latency ==> "
+ ((double) (lt) / total / 10) + " ms");

In comparison, a sample snippet of VTD-XML benchmark code is shown below:

k=total;
VTDNav vn = null;
while(k>0){
        while((i1=ap.evalXPath())!=-1){
        }
        ap.resetXPath();
        k--;
}
for (int j=0;j<10;j++){
        l = System.currentTimeMillis();
        for(int i = 0;i<total;i++)
        {
            while((i1=ap.evalXPath())!=-1){
            }
            ap.resetXPath();
        }
        long l2 = System.currentTimeMillis();
        lt = lt + (l2 - l);
        //System.out.println("j ==> "+j);
}
System.out.println(" average XPath latency ==> "+
((double)(lt)/total/10) + " ms");

Results

Notice that the table includes # of nodes returned from XPath evaluation.

Latency Number Comparison

/*/*/*[position() mod 2 = 0]

  Jaxen (ms) JDK (ms) VTD-XML (ms)
po_small.xml 0.0709 1.465 0.0205
po_medium.xml 0.937 11.5 0.266
po_big.xml 12.933 94.146 2.809

/purchaseOrder/items/item[USPrice <100]

  Jaxen (ms) JDK (ms) VTD-XML (ms)
po_small.xml 0.1465 1.55 0.0564
po_medium.xml 2.649 13.11 0.927
po_big.xml 33.48 111.04 10.66

/*/*/*/quantity/text()

  Jaxen (ms) JDK (ms) VTD-XML (ms)
po_small.xml 0.1365 1.426 0.0394
po_medium.xml 2.1866 12.06 0.634
po_big.xml 35.626 103.7 6.95

//item/comment

  Jaxen (ms) JDK (ms) VTD-XML (ms)
po_small.xml 0.47 1.615 0.123
po_medium.xml 17.062 15.41 1.846
po_big.xml 219.84 138.44 24.84

//item/comment/../quantity

  Jaxen (ms) JDK (ms) VTD-XML (ms)
po_small.xml 0.494 1.689 0.147
po_medium.xml 18.541 15.923 2.27825
po_big.xml 229.81 143.18 22.943

/*/*/*/* [position() mod 2 = 0]

  Jaxen (ms) JDK(ms) VTD-XML(ms)
po_small.xml 0.25 1.672 0.0673
po_medium.xml 5.29 15.582 1.167
po_big.xml 79.184 128.19 12.58

/purchaseOrder/items/item[USPrice > 100]/comment/text()

  Jaxen (ms) JDK (ms) VTD-XML (ms)
po_small.xml 0.196 1.626 0.0727
po_medium.xml 3.48 14.11 1.225
po_big.xml 47.618 118.13 13.55

 

Graphical View

/*/*/*[position() mod 2 = 0]

/purchaseOrder/items/item[USPrice <100]

/*/*/*/quantity/text()

//item/comment

//item/comment/../quantity

/*/*/*/* [position() mod 2 = 0]

/purchaseOrder/items/item[USPrice > 100]/comment/text()

Conclusion

As readers can see from the benchmark, it is particularly troubling that the JDK's bundled XPath engine (rumored to be Xalan) is quite slow, especially for small documents, whose XPath performance is over twice as slow as parsing. Something seems to be extraordinarily wrong because although DOM's parsing is slow, one would expect that the random access nature of DOM would enable a speedy XPath evaluation. But it is not the case. In contrast, VTD-XML's XPath is far better!