XQuery Utilities Findings

Wilfred Bonney

Grahame Grieve

October 12, 2005

Table of Contents

1.0 Introduction

2.0 Functional Requirements for the Repository

3.0 Open Source Software

3.1 XQEngine

3.2 eXist

3.3 Galax

3.4 Berkeley DB XML

3.5 Berkeley NUX

3.6 Qizx/open

3.7 MonetDB/XQuery

4.0 Commercial Software

4.1 XML SPY 2006

4.2 DataDirect XQuery

4.3 NeoCore XMS

4.4 QuiLogic

5.0 In-House Software Development

6.0 Conclusions

1.0 Introduction

The continuous evolution of HL7 tools to satisfy a growing number of tooling requirements has resulted in a more systemic, global and virtual approach to HL7 tools development.[1] These requirements make it necessary to give the growing user communities the ability to manage and access all their HL7 tools easily in a single environment.

This document aims to assist the HL7 Tooling Committee (TC) in selecting a repository to maintain, manage, control and access the numerous artifacts generated by the HL7 version 3 generator tool. Because most of the generated artifacts are in XML formats, a native XML database or repository that supports XQuery or XPath is called for.

XQuery is to XML what SQL is to database tables: it was designed to query XML data. XQuery is also known as XML Query.[2] XQuery can be used to:

• Extract information to use in a Web Service

• Generate summary reports

• Transform XML data to XHTML

• Search Web documents for relevant information
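As a rough illustration of the SQL analogy, the sketch below selects and sorts values from a small XML document, with a roughly equivalent XQuery expression shown in a comment. It uses Python's standard-library ElementTree, which supports only a limited XPath subset and is not an XQuery engine. The element names, attribute names and sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: all names and values are invented for illustration.
SAMPLE = """
<artifacts>
  <artifact type="schema"><name>PersonMessage</name></artifact>
  <artifact type="model"><name>ActModel</name></artifact>
  <artifact type="schema"><name>EncounterMessage</name></artifact>
</artifacts>
"""


def schema_names(xml_text):
    """Return the sorted names of all artifacts whose type is 'schema'.

    Roughly equivalent XQuery (illustrative only):
        for $a in //artifact[@type = "schema"]
        order by $a/name
        return string($a/name)
    """
    root = ET.fromstring(xml_text)
    # The predicate plays the role of a SQL WHERE clause.
    matches = root.findall(".//artifact[@type='schema']")
    # sorted() plays the role of ORDER BY.
    return sorted(a.findtext("name") for a in matches)


if __name__ == "__main__":
    print(schema_names(SAMPLE))
```

The path expression selects nodes much as a SQL query selects rows, which is the sense in which the two languages are comparable.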

XQuery is compatible with several W3C standards, such as XML, Namespaces, XSLT, XPath, and XML Schema. However, XQuery 1.0 is not yet a W3C Recommendation (XQuery is a Working Draft[3]).

2.0 Functional Requirements for the Repository

The XQuery utility should support the following functional requirements:

1. The local repository and its associated tools should support non-technical users without requiring them to learn the underlying technologies used to meet the various requirements for the repository (such as CVS)

2. The local repository and its associated tools should assist users in maintaining, and preferably enforce, a correctly structured repository while they work.

3. The local repository and its associated tools should make it easy for users to synchronize their work with master repositories, and to control the scope of the synchronization so that incomplete work is not accidentally committed to the master repository

4. The local repository and its associated tools will cooperate with master repositories to apply business rules that prevent corruption of the master repository and ensure correct positioning of the data within the master repository

5. The local repository infrastructure must support multiple different concurrently active HL7 Projects / repositories on a desktop, including multiple different versions

6. The underlying technologies used to provide the solution must be free for use by HL7 members in developing HL7 standards

7. The local repository must be able to run on a desktop with no network connections using locally stored content

8. The local repository must support quickly moving projects and repositories or subsets thereof between different systems using CDs, memory sticks, etc

9. The local repository must provide search mechanisms to allow users to search by HL7 specific concepts or relationships. Further searching requirements are attached below

10. The contents of the local repository must be available to developers of both Eclipse plug-ins and unrelated products such as .NET and COM based tooling

11. The repository access software must support the following operating systems:

· Windows 2000

· Windows XP

· Mac OS X +

· Linux

12. It should be possible to install and run the repository tools without special access authority beyond that typically granted to most users (i.e., it should not require 'Admin'/'root' authority to install and/or run)

13.

14. Changes must have transactional integrity

15. The local repository installation should be simple, preferably part of a larger install of HL7 tools

Based on the requirements, three different categories of XQuery utilities were identified. The categories are made up of open-source software, commercial software and in-house software development.

 

3.0 Open Source Software

The open source utilities identified include XQEngine, eXist, Galax, Berkeley DB XML, Berkeley Nux, Qizx/open and MonetDB/XQuery.

3.1 XQEngine

XQEngine is a full-text search engine for XML documents. Utilizing XQuery as its front-end query language, it lets you interrogate collections of XML documents for boolean combinations of keywords, much as Google and other search engines let you do for HTML.[4]

|Query Capabilities |Provides much more powerful search capabilities than equivalent HTML-based engines, since its XPath component lets you specify constraints on attributes and element hierarchies, in addition to the specific word content you're searching on.[5] Query speed is fast. |

|Storage Capabilities |Uses data stored in XML files; pre-indexing type of engine |

|License |Open Source; GNU General Public License (GPL) |

|Platform |Mac OS X, Linux, Solaris and Windows |

|Stability/reliability/readiness |Ongoing conformance and enhancement work |

|Other features |Works with CVS; handles collections of multiple documents, up to a maximum limit of 2,147,000,000 documents, memory constraints notwithstanding.[6] |
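The combination of keyword search with structural constraints described above can be sketched as follows. This is a minimal illustration in Python's standard library, not XQEngine's API. It scans documents directly rather than using a pre-built index, and the document names, element names and attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document collection, invented for illustration.
DOCS = {
    "a.xml": "<message status='draft'><title>Patient admission</title></message>",
    "b.xml": "<message status='final'><title>Patient discharge</title></message>",
    "c.xml": "<note status='final'><title>Patient admission</title></note>",
}


def search(docs, path, all_words):
    """Return the sorted names of documents in which some element matched
    by `path` contains every word in `all_words` (a boolean AND)."""
    wanted = {w.lower() for w in all_words}
    hits = []
    for name, text in docs.items():
        # Wrap the document root so that `path` can constrain the root itself.
        wrapper = ET.Element("doc")
        wrapper.append(ET.fromstring(text))
        for elem in wrapper.findall(path):
            words = set("".join(elem.itertext()).lower().split())
            if wanted <= words:
                hits.append(name)
                break  # one matching element is enough for this document
    return sorted(hits)


if __name__ == "__main__":
    # Keyword plus attribute and hierarchy constraints.
    print(search(DOCS, "message[@status='final']/title", ["patient"]))
    # Keywords only, anywhere under a title element.
    print(search(DOCS, ".//title", ["patient", "admission"]))
```

A plain keyword engine could only express the second kind of query; the path argument is what adds the attribute and hierarchy constraints.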

3.2 eXist

eXist is an Open Source native XML database featuring efficient, index-based XQuery processing, automatic indexing, extensions for full-text search, XUpdate support and tight integration with existing XML development tools. [7]

|Query Capabilities |Has its own optimized XQuery engine, featuring efficient, index-based query processing. It relies on fast path join algorithms to compute node relationships. Path joins can be applied to the entire node sequence in a single step. [8] |

|Storage Capabilities |Uses data stored in files; produces numeric indexing |

|License |Open Source; GNU LGPL License |

|Platform |Tested on Linux, and Windows 2000/XP/XP Server Edition |

|Stability/reliability/readiness |Ongoing conformance and enhancement work |

|Other features |Works with CVS. The maximum number of documents that can be stored in the database is 2^31. No transaction support yet; it is planned for a future release. |

3.3 Galax

Galax is an open-source implementation of XQuery 1.0, the W3C XML Query Language.[9] It is a fully functional implementation of XQuery 1.0. Galax supports XPath 2.0, including operations on document order, forward and backward axes, support for namespaces, operations on types, and user-defined functions.[10]

|Query Capabilities |Supports the optional XQuery features such as XML Schema import, static type checking, and modules. |

|Storage Capabilities |Uses data stored in files |

|License |Open Source; |

|Platform |Mac OS X, Linux, Solaris and Windows |

|Stability/reliability/readiness |Ongoing conformance and enhancement work |

|Other features |Allows advanced manipulations and transformations of XML data to be written clearly and concisely; takes advantage of schema descriptions and enables early detection of errors in programs.[11] |

3.4 Berkeley DB XML

Berkeley DB XML provides XQuery access into a database of document containers. XML documents are stored and indexed in their native format using Berkeley DB as the transactional database engine. Berkeley DB XML is not a client/server database management system; it is a C++ library linked into a given application. Berkeley DB and Berkeley DB XML can seamlessly work together to store XML and non-XML data with full support for transactions and recovery.[12]

|Query Capabilities |BDB XML retrieves documents that match a given XQuery query. BDB XML query results are always returned as a set. The set can contain either matching documents, or a set of values from those matching documents.[13] |

|Storage Capabilities |Stores and indexes XML files in containers |

|License |Open Source; BSD style license |

|Platform |Windows, Linux, UNIX and other O/S |

|Stability/reliability/readiness |Stable |

|Other features |Ability to manage databases up to 256 terabytes in size.[14] |

3.5 Berkeley NUX

Nux is an open-source Java toolkit making efficient and powerful XML processing easy. It is geared towards embedded use in high-throughput XML messaging middleware such as large-scale Peer-to-Peer infrastructures, message queues, publish-subscribe and matchmaking systems for Blogs/newsfeeds, text chat, data acquisition and distribution systems, application level routers, firewalls, classifiers, etc.[15]

|Query Capabilities |For simple and complex continuous queries and/or transformations over very large or infinitely long XML input, a convenient streaming path filter API combines full XQuery and XPath support with straightforward filtering. |

|Storage Capabilities |Uses data stored in files |

|License |Open Source; BSD style license |

|Platform |Mac OS X, Linux, Solaris and Windows 2000 or higher. |

|Stability/reliability/readiness |Stable |

|Other features |Targets fulltext search of huge numbers of queries over comparatively small transient realtime data (prospective search), e.g. 100000-500000 queries/sec ballpark.[16] |
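The idea of filtering very large XML input as a stream, rather than loading it whole, can be sketched as follows. Nux is a Java toolkit and this is not its API. The sketch uses Python's standard-library incremental parser as a stand-in, and the feed structure and the `priority` attribute are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical feed, invented for illustration.
FEED = b"""<feed>
<item priority="high"><title>Server down</title></item>
<item priority="low"><title>Newsletter</title></item>
<item priority="high"><title>Disk full</title></item>
</feed>"""


def stream_titles(source, tag, predicate):
    """Yield the title of each `tag` element that satisfies `predicate`,
    clearing every such element once it has been examined."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue
        keep = predicate(elem)
        title = elem.findtext("title")
        elem.clear()  # drop the element's children and text after use
        if keep:
            yield title


if __name__ == "__main__":
    def is_high(e):
        return e.get("priority") == "high"

    for title in stream_titles(io.BytesIO(FEED), "item", is_high):
        print(title)
```

Because each item is examined and cleared as soon as its closing tag is read, the filter never needs the whole input in hand at once.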

3.6 Qizx/open

Qizx/open is a high-performance implementation of the XQuery language developed by the W3C.[17]

|Query Capabilities |Allows calling the XQuery engine from Java applications, and provides a high-level query interface, based on XQuery expressions. This is the XQuery equivalent of JDBC for SQL.[18] |

|Storage Capabilities |Uses data stored in XML files |

|License |Open Source; Licensed under the Mozilla Public License (MPL) |

|Platform |Windows and Unix |

|Stability/reliability/readiness |Reduced version of a commercial product (XQuest[19]) |

3.7 MonetDB/XQuery

MonetDB/XQuery provides a full-fledged XQuery implementation, which adheres to all the typing rules prescribed in the W3C standard.[20]

|Query Capabilities |The XQuery processor is designed in a classical front-end/back-end style. The front-end XQuery compiler, dubbed Pathfinder, receives XQuery queries. From this input, Pathfinder derives a relational-algebra style program which is then fed into the back-end relational database system, MonetDB. MonetDB executes the program and finally invokes your choice of several possible serialization routines to serialize the final query result.[21] |

|Storage Capabilities |The system comes with a built-in XML document shredder which will import XML files (no DTD or XML Schema description required) and convert them into relations in the MonetDB table space.[22] |

|License |Open Source; Released under the Pathfinder Public License Version 1.1. |

|Platform |Linux (32 and 64 bits), Windows, Mac OS X, and Unix variants (Solaris, IRIX, AIX) |

|Stability/reliability/readiness |Ongoing enhancement and development |
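To make the idea of shredding XML into relations more concrete, the sketch below stores one row per element using a well-known relational encoding (pre-order rank, subtree size, level) and answers a descendant query with a simple range predicate. Whether this matches the tables MonetDB actually builds is an assumption; the function names and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
SAMPLE = "<a><b><c/><c/></b><d><c/></d></a>"


def shred(xml_text):
    """Return one (pre, size, level, tag) row per element, in document order.

    pre   = pre-order rank of the element
    size  = number of descendants of the element
    level = depth below the root (root is 0)
    """
    rows = []

    def visit(elem, level):
        pre = len(rows)
        rows.append(None)  # reserve the slot until the subtree size is known
        for child in elem:
            visit(child, level + 1)
        size = len(rows) - pre - 1
        rows[pre] = (pre, size, level, elem.tag)

    visit(ET.fromstring(xml_text), 0)
    return rows


def descendants(rows, pre, tag):
    """Select descendants of node `pre` with the given tag.

    The descendant axis becomes a range selection on the pre column:
    pre < r.pre <= pre + size.
    """
    size = rows[pre][1]
    return [r for r in rows if pre < r[0] <= pre + size and r[3] == tag]


if __name__ == "__main__":
    table = shred(SAMPLE)
    for row in table:
        print(row)
    print([r[0] for r in descendants(table, 1, "c")])
```

Once the document is in this form, path steps reduce to selections and joins over ordinary tables, which is the kind of program a relational back end can execute.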

4.0 Commercial Software

The commercial software identified includes XML SPY 2006, DataDirect XQuery, NeoCore XMS and QuiLogic.

4.1 XML SPY 2006

XMLSpy® 2006 provides native support for XQuery 1.0 development and execution with its built-in, standards-conformant XQuery engine.

|Query Capabilities |Altova XMLSpy® 2006 provides powerful support for XQuery and XPath 2.0 and includes an XQuery debugger. |

|Storage Capabilities |Uses data stored in files |

|License |Commercial; 30-day trial use period |

|Platform |Windows; Tested on Mac OS X using Virtual PC 6 from Microsoft; Tested on Red Hat Linux 8.0 using Wine 20030115 |

|Stability/reliability/readiness | |

4.2 DataDirect XQuery

DataDirect XQuery is the first embeddable component for XQuery that implements the XQuery API for Java™ (XQJ). It supports all major relational databases on any Java platform. Figure 1 shows the design architecture of DataDirect XQuery.

|Query Capabilities |Allows you to query XML, relational databases, or a combination of the two, integrating the results for XML-based data exchange, XML-driven Web sites, and other applications that require or leverage the power of XML.[23] |

|Storage Capabilities |Uses data stored in files; does not produce indexes |

|License |Commercial; 30-day evaluation use period |

|Platform |Linux, Unix and Windows |

|Stability/reliability/readiness | |

|Other features |Installs easily; does not require its own server and is scalable from desktop to enterprise applications. DataDirect XQuery is designed for software developers and independent software vendors (ISVs) who need to manage heterogeneous data sources in XML applications.[24] |

Figure 1 Design architecture of DataDirect XQuery

4.3 NeoCore XMS

NeoCore XMS is a completely self-constructing, fully transactional native XML database.

|Query Capabilities |Uses patented techniques to break XML into its raw informational components, context and content, and stores them in what are called “information couplets”. These “couplets” are indexed using a patented pattern-recognition technology that allows high speed retrieval and manipulation at any level while keeping overhead and storage requirements at a minimum.[25] |

|Storage Capabilities |Uses data stored in files |

|License |Commercial; free unlimited download for development purposes. |

|Platform |Windows 2000, XP, and 2003 in 32-bit mode, Red Hat Linux in 32-bit mode, and Solaris 64-bit for the utmost scalability, performance, and reliability |

|Stability/reliability/readiness | |

|Other features |Provides fine-grained access, transactional capabilities, and access control. XMS also scales to large multi-CPU systems, and takes full advantage of system RAM. [26] |

4.4 QuiLogic

QuiLogic has developed SQL/XML-IMDB, a high performance in-memory native XML database component with SQL and XQuery interfaces, and transaction and multi-threading support. It is available as a .NET assembly, ActiveX, DLL and LIB component, including ANSI as well as Unicode libraries.[27] The design architecture of QuiLogic is shown in Figure 2.

|Query Capabilities |Executes SQL or XQuery commands by calling two simple functions. A cursor is automatically opened and the result set is accessed through cursor-based API calls, much like in ADO. Built-in XML Data-Binding facility for connecting XML input streams (SOAP, File) to in-memory objects.[28] |

|Storage Capabilities |Uses data stored in XML files; memory-optimized index structures for outstanding search performance.[29] |

|License |Commercial; free evaluation version requires an application restart every hour |

|Platform |Windows |

|Stability/reliability/readiness | |

|Other features |Ability to create XML views over relational data and to create, manipulate and transfer data between XML and SQL tables. Supports UPDATE, DELETE and INSERT operations on SQL and XML data. [30] Capability to store and manage up to 2 billion XML nodes and SQL rows.[31] |

Figure 2 Design architecture of QuiLogic

5.0 In-House Software Development

In-house development has the potential to provide a system that is tailored to the specific needs of the HL7 Tooling Committee. However, this advantage is offset by the substantial investment of time, money and resources needed to build a fully functioning system that is stable and relatively error-free. This can be a costly venture.

Beyond the demands of designing, building, testing and documenting the system, there is a degree of uncertainty involved in its ongoing maintenance. A significant portion of the total cost of ownership will be in supporting and updating this custom-built system over its functional life.

 

6.0 Conclusions

From the tools presented in this document, we should download, configure and evaluate a few to see whether they really meet all the requirements of the repository. We can make conclusive recommendations for the best repository only after conducting a thorough evaluation of the tools mentioned above.

At the moment, in-house development does not seem to be the preferred alternative, because it is a risky and time-consuming venture, considering the cost involved in design, development, deployment and maintenance.

-----------------------

[1] Jane Curry & Ben Van De Walle, “Requirements, Configuration and Deployment Management Software Selection Consideration”, December 2004

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]
