Abstract:
Techniques are provided for ensuring lexical fidelity when an XML document is stored in a binary format. Operations, on the XML data, that would cause the loss of lexical fidelity between the original XML document and the binary-encoded version of the XML document are not performed. Such operations include the removal of unnecessary whitespace characters, certain data type conversions, CRLF normalization, the “collapsing” of two-tag empty elements into a single tag empty element, and the replacing of entity references or numeric character references with another value. An XML schema, to which the XML document conforms, may indicate that the XML document is to be stored in a lexical fidelity mode. Additionally, or alternatively, the database statement that (when executed) causes the XML document to be stored in a binary format may so indicate.