A Quick Overview of Virtual Token Descriptor (VTD)
- VTD is a binary format specification, not an API specification
- A VTD record is a primitive data type (an integer multiple of 32 bits) that encodes the following parameters of a token in an XML file:
  - Starting offset
  - Length
  - Nesting depth
  - Token type
- VTD requires that the XML document be kept intact in memory, because each record locates its token by offset and length instead of copying the token's text
- Our current VTD record layout further specifies the following:
  - Uses 64 bits as the primitive type (b63 ~ b0)
  - Big endian
  - Starting offset: 30 bits (b29 ~ b0), maximum value 2^30 - 1 = 1G - 1
  - Length: 20 bits (b51 ~ b32), maximum value 2^20 - 1 = 1M - 1
    - For some token types, the length field is split into:
      - Prefix length: 9 bits (b51 ~ b43), maximum value 2^9 - 1 = 511
      - Qname length: 11 bits (b42 ~ b32), maximum value 2^11 - 1 = 2047
  - Depth: 8 bits (b59 ~ b52), maximum value 2^8 - 1 = 255
  - Token type: 4 bits (b63 ~ b60)
  - Reserved: 2 bits (b31 ~ b30)
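The 64-bit layout above can be sketched as a small set of pack/unpack helpers. This is a minimal Python sketch, not a reference implementation: the helper names (`encode`, `encode_qname`, `get`, `to_bytes`) and the numeric token-type value used in the example are hypothetical, and the reserved bits b31 ~ b30 are simply left at zero.

```python
import struct

# Field widths and bit positions from the layout above (b63 is the most significant bit).
OFFSET_BITS, OFFSET_SHIFT = 30, 0    # b29 ~ b0
LENGTH_BITS, LENGTH_SHIFT = 20, 32   # b51 ~ b32
DEPTH_BITS, DEPTH_SHIFT = 8, 52      # b59 ~ b52
TYPE_BITS, TYPE_SHIFT = 4, 60        # b63 ~ b60
PREFIX_BITS, PREFIX_SHIFT = 9, 43    # b51 ~ b43 (split-length tokens only)
QNAME_BITS, QNAME_SHIFT = 11, 32     # b42 ~ b32 (split-length tokens only)
# b31 ~ b30 are reserved; this sketch always leaves them at zero.


def _field(value: int, bits: int, shift: int) -> int:
    """Range-check a field value, then move it into its bit position."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value << shift


def encode(token_type: int, depth: int, offset: int, length: int) -> int:
    """Pack one token into a 64-bit record that uses the plain 20-bit length field."""
    return (_field(token_type, TYPE_BITS, TYPE_SHIFT)
            | _field(depth, DEPTH_BITS, DEPTH_SHIFT)
            | _field(length, LENGTH_BITS, LENGTH_SHIFT)
            | _field(offset, OFFSET_BITS, OFFSET_SHIFT))


def encode_qname(token_type: int, depth: int, offset: int,
                 prefix_len: int, qname_len: int) -> int:
    """Pack a token whose length field is split into prefix and qname lengths."""
    return (_field(token_type, TYPE_BITS, TYPE_SHIFT)
            | _field(depth, DEPTH_BITS, DEPTH_SHIFT)
            | _field(prefix_len, PREFIX_BITS, PREFIX_SHIFT)
            | _field(qname_len, QNAME_BITS, QNAME_SHIFT)
            | _field(offset, OFFSET_BITS, OFFSET_SHIFT))


def get(record: int, bits: int, shift: int) -> int:
    """Extract one field from a packed record."""
    return (record >> shift) & ((1 << bits) - 1)


def to_bytes(record: int) -> bytes:
    """Serialize a record as 8 bytes in big-endian order."""
    return struct.pack(">Q", record)


if __name__ == "__main__":
    rec = encode(token_type=1, depth=2, offset=10, length=3)
    print(hex(rec))  # 0x102000030000000a
```

Because every field has a fixed position, decoding is a shift and a mask, with no parsing. Note that reading the plain length field of a split-length record returns `(prefix_len << 11) | qname_len`, so a reader has to know the token type before it interprets b51 ~ b32.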
Why VTD
- VTD is compact: every bit is put to good use
- VTD is persistent
- VTD is fixed in length
  - This makes array-like bulk storage allocation possible, avoiding or minimizing per-object overhead
  - Records are addressable using integers (so hierarchical information is also persistent)
- VTD can be implemented in both software and hardware
  - VTD turns XML processing into a DES decryption problem (well suited for dedicated hardware implementation)
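The fixed-length and persistence points can be illustrated with a flat array of 64-bit records. This minimal Python sketch assumes the bit layout described earlier, a hand-built token list for a tiny document, and hypothetical token-type codes (`START_TAG`, `TEXT`) and depth values; a real VTD parser would produce the records itself.

```python
import sys
from array import array

START_TAG, TEXT = 0, 1  # hypothetical token-type codes, for illustration only


def encode(token_type: int, depth: int, offset: int, length: int) -> int:
    """Pack fields using the 64-bit layout above (range checks omitted for brevity)."""
    return (token_type << 60) | (depth << 52) | (length << 32) | offset


def token_text(doc: bytes, records: array, i: int) -> str:
    """Recover token i by slicing the intact document: the record is just an address."""
    rec = records[i]
    offset = rec & ((1 << 30) - 1)
    length = (rec >> 32) & ((1 << 20) - 1)
    return doc[offset:offset + length].decode("utf-8")


def persist(records: array) -> bytes:
    """Serialize the whole record array in big-endian order, as the layout specifies."""
    out = array("Q", records)  # copy so the in-memory array keeps native order
    if sys.byteorder == "little":
        out.byteswap()
    return out.tobytes()


def load(blob: bytes) -> array:
    """Rebuild the record array from persisted bytes; no re-parsing of the XML needed."""
    records = array("Q")
    records.frombytes(blob)
    if sys.byteorder == "little":
        records.byteswap()
    return records


doc = b"<a><b>hi</b></a>"  # must stay unmodified in memory for the offsets to stay valid

# One contiguous buffer of 64-bit integers instead of one object per token.
records = array("Q", [
    encode(START_TAG, 0, 1, 1),  # "a"
    encode(START_TAG, 1, 4, 1),  # "b"
    encode(TEXT, 1, 6, 2),       # "hi" (the depth convention here is an assumption)
])

print(records.itemsize * len(records))  # 24
print([token_text(doc, records, i) for i in range(len(records))])  # ['a', 'b', 'hi']
```

Since every record is the same size, token `i` lives at byte offset `8 * i` in the array, which is what makes integer addressing and a single bulk allocation possible. Because records hold offsets rather than pointers, the persisted bytes stay valid when reloaded alongside the same document.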