This articles reports the measurement of XPath evaluation performance for VTD-XML version 1.9, and compares with Xerces 2.7.1 DOM and XPath functions bundled with JDK 1.5 (which is Xalan), as well as Jaxen 1.1.
This benchmark report focuses on XPath evaluation performance only, i.e. the test code parses the document into internal structure, then repetitively evaluates a single pre-compiled XPath over the entire document. Our test measures the time (aka. latency) it takes to evaluate the XPath over the entire document.
The benchmark code performs the following steps:
All benchmark programs first loop thru the XPath evaluation code a number of iterations so the server JVM compile them into native code to obtain optimal performance, before the real measurement of the average latency of subsequent XPath evaluations.
Three XML files of similar structure, but different sizes, are used for the test.
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Mill Valley</city>
<state>CA</state>
<zip>90952</zip>
</shipTo>
<billTo country="US">
<name> Robert Smith </name>
<street>8 Oak Avenue</street>
<city>Old Town</city>
<state>PA</state>
<zip>95819</zip>
</billTo>
<comment>Hurry, my lawn is going wild!</comment>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity></quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<USPrice>39.98</USPrice>
<shipDate>1999-05-21</shipDate>
</item>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity><![CDATA[1]]></quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<USPrice>39.98</USPrice>
<shipDate>1999-05-21</shipDate>
</item>
...
</items>
</purchaseOrder>
The respective file sizes are:
The following XPath expressions are used for the test
A snippet of the DOM/JDK benchmark code is shown below:
Document d = null;
k = total;
d = parser.parse(bais);
while (k > 0) {
NodeList nodeList = (NodeList) xPathExpression.evaluate(d,
XPathConstants.NODESET);
k--;
}
long l=0,lt=0;
for (int j = 0; j < 10; j++) {
l = System.currentTimeMillis();
for (int i = 0; i < total; i++) {
NodeList nodeList = (NodeList) xPathExpression.evaluate(d,
XPathConstants.NODESET);
}
long l2 = System.currentTimeMillis();
lt = lt + (l2 - l);
}
System.out.println(" average XPath latency ==> "
+ ((double) (lt) / total / 10) + " ms");
Jaxen's code looks like the following:
XPath expression = new org.jaxen.dom.DOMXPath(xpe);
System.out.println("====> update_jaxen1 ==>"+xpe);
//parse and xpath eval
Document d = null;
d = parser.parse(bais);
k = total;
while (k > 0) {
List results = expression.selectNodes(d);
k--;
}
long l=0,lt=0;
for (int j = 0; j < 10; j++) {
l = System.currentTimeMillis();
for (int i = 0; i < total; i++) {
List results = expression.selectNodes(d);
}
long l2 = System.currentTimeMillis();
lt = lt + (l2 - l);
}
System.out.println(" average XPath latency ==> "
+ ((double) (lt) / total / 10) + " ms");
In comparison, a sample snippet of VTD-XML benchmark code is shown below:
k=total;
VTDNav vn = null;
while(k>0){
while((i1=ap.evalXPath())!=-1){
}
ap.resetXPath();
k--;
}
for (int j=0;j<10;j++){
l = System.currentTimeMillis();
for(int i = 0;i<total;i++)
{
while((i1=ap.evalXPath())!=-1){
}
ap.resetXPath();
}
long l2 = System.currentTimeMillis();
lt = lt + (l2 - l);
//System.out.println("j ==> "+j);
}
System.out.println(" average XPath latency ==> "+
((double)(lt)/total/10) + " ms");
Notice that the table includes # of nodes returned from XPath evaluation.
/*/*/*[position() mod 2 = 0]
Jaxen (ms) | JDK (ms) | VTD-XML (ms) | |
po_small.xml | 0.0709 | 1.465 | 0.0205 |
po_medium.xml | 0.937 | 11.5 | 0.266 |
po_big.xml | 12.933 | 94.146 | 2.809 |
/purchaseOrder/items/item[USPrice <100]
Jaxen (ms) | JDK (ms) | VTD-XML (ms) | |
po_small.xml | 0.1465 | 1.55 | 0.0564 |
po_medium.xml | 2.649 | 13.11 | 0.927 |
po_big.xml | 33.48 | 111.04 | 10.66 |
/*/*/*/quantity/text()
Jaxen (ms) | JDK (ms) | VTD-XML (ms) | |
po_small.xml | 0.1365 | 1.426 | 0.0394 |
po_medium.xml | 2.1866 | 12.06 | 0.634 |
po_big.xml | 35.626 | 103.7 | 6.95 |
//item/comment
Jaxen (ms) | JDK (ms) | VTD-XML (ms) | |
po_small.xml | 0.47 | 1.615 | 0.123 |
po_medium.xml | 17.062 | 15.41 | 1.846 |
po_big.xml | 219.84 | 138.44 | 24.84 |
//item/comment/../quantity
Jaxen (ms) | JDK (ms) | VTD-XML (ms) | |
po_small.xml | 0.494 | 1.689 | 0.147 |
po_medium.xml | 18.541 | 15.923 | 2.27825 |
po_big.xml | 229.81 | 143.18 | 22.943 |
/*/*/*/* [position() mod 2 = 0]
Jaxen (ms) | JDK(ms) | VTD-XML(ms) | |
po_small.xml | 0.25 | 1.672 | 0.0673 |
po_medium.xml | 5.29 | 15.582 | 1.167 |
po_big.xml | 79.184 | 128.19 | 12.58 |
/purchaseOrder/items/item[USPrice > 100]/comment/text()
Jaxen (ms) | JDK (ms) | VTD-XML (ms) | |
po_small.xml | 0.196 | 1.626 | 0.0727 |
po_medium.xml | 3.48 | 14.11 | 1.225 |
po_big.xml | 47.618 | 118.13 | 13.55 |
/*/*/*[position() mod 2 = 0]
/purchaseOrder/items/item[USPrice <100]
/*/*/*/quantity/text()
//item/comment
//item/comment/../quantity
/*/*/*/* [position() mod 2 = 0]
/purchaseOrder/items/item[USPrice > 100]/comment/text()
As readers can see from the benchmark, it is particularly troubling that the JDK's bundled XPath engine (rumored to be Xalan) is quite slow, especially for small documents, whose XPath performance is over twice as slow as parsing. Something seems to be extraordinarily wrong because although DOM's parsing is slow, one would expect that the random access nature of DOM would enable a speedy XPath evaluation. But it is not the case. In contrast, VTD-XML's XPath is far better!