Technical Report Number
Data, Computing Methodologies, Information Systems
XML documents are self-describing by design. As a consequence, XML data is verbose and highly repetitive. Although techniques already exist to compress XML, and text in general, most do not keep the data in a form that users can work with directly. We present a technique that exploits recurring structures within an XML document to compress the file in a way that can achieve better compression than other query-friendly compression techniques while keeping the data in a form that supports both querying and indexing. We also present an example implementation of the technique, complete with an index-building mechanism and query-processing capabilities.
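To make the idea of exploiting recurring structure concrete, the following is a minimal illustrative sketch and not the algorithm of this report. It assumes a simplified scheme in which each distinct root-to-leaf tag path is stored once in a dictionary and every leaf value is recorded against a small integer path identifier. The function names `encode` and `query` are hypothetical, and the sketch ignores attributes, mixed content, and element grouping, so it is not a lossless compressor.

```python
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Split a document into a dictionary of distinct tag paths and a
    list of (path_id, text) records, one per leaf element.

    Illustrative only: attributes, mixed content, and sibling grouping
    are discarded, so the original document cannot be rebuilt from this.
    """
    root = ET.fromstring(xml_text)
    path_ids = {}  # tag path -> small integer id (each path stored once)
    records = []   # (path id, text content) in document order

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        children = list(elem)
        if not children:
            # A repeated path reuses its existing id instead of
            # repeating the tag names.
            pid = path_ids.setdefault(path, len(path_ids))
            records.append((pid, (elem.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(root, "")
    return path_ids, records


def query(path_ids, records, path):
    """Return the text of every leaf stored under the given tag path,
    without reconstructing the original XML."""
    pid = path_ids.get(path)
    if pid is None:
        return []
    return [text for p, text in records if p == pid]


doc = (
    "<lib>"
    "<book><title>A</title><year>1999</year></book>"
    "<book><title>B</title><year>2005</year></book>"
    "</lib>"
)
path_ids, records = encode(doc)
titles = query(path_ids, records, "/lib/book/title")
```

In this toy document the two `book` elements share the same structure, so only two paths are stored for four leaf values, and a path lookup can be answered directly from the encoded records. This is the general sense in which a structure-aware encoding can remain queryable.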