WO2005101245A1 - Efficient query processing of xml data using xml index - Google Patents


Info

Publication number
WO2005101245A1
Authority
WO
WIPO (PCT)
Prior art keywords
template
index
query
path
generating
Application number
PCT/US2005/011762
Other languages
French (fr)
Inventor
Ashish Thusoo
Ravi Murthy
Sivasankaran Chandrasekar
Nipun Agarwal
Eric Sedlar
Original Assignee
Oracle International Corporation
Application filed by Oracle International Corporation
Publication of WO2005101245A1

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
                        • G06F16/84 Mapping; Conversion
                        • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
                • G06F40/00 Handling natural language data
                    • G06F40/10 Text processing
                        • G06F40/12 Use of codes for handling textual entities
                            • G06F40/123 Storage facilities
                            • G06F40/131 Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
                            • G06F40/14 Tree-structured documents
                                • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
            • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
                • Y10S707/00 Data processing: database and file management or data structures
                    • Y10S707/99931 Database or file accessing
                        • Y10S707/99932 Access augmentation or optimizing
                        • Y10S707/99933 Query processing, i.e. searching
                            • Y10S707/99935 Query augmenting and refining, e.g. inexact access
                    • Y10S707/99941 Database schema or data structure
                        • Y10S707/99944 Object-oriented database structure
                            • Y10S707/99945 Object-oriented database structure processing

Definitions

  • the present invention relates to accessing an XML index and, more specifically, to translating expressions and constructs into SQL for accessing an XML index.
  • XML data: Extensible Markup Language data.
  • a path expression is any expression that specifies a path through the hierarchical structure of an XML document.
  • the portion of an XML document identified by a path expression is the portion that resides, within the structure of the XML document, at the end of any path that matches the path expression.
  • a query that uses a path expression to identify one or more specific pieces of XML data is referred to herein as a path-based query.
  • the process of determining which XML data corresponds to the path designated in a path-based query is referred to as "evaluating" the path expression.
  • FIG. 1 is a block diagram of a system upon which the techniques described herein may be implemented.
  • An XML index provides a mechanism for indexing paths, values, and order information in XML documents.
  • the actual XML data itself can reside in any form, like CLOB (character large object storing the actual XML text), O-R (object relational structured form in the presence of an XML schema), or BLOB (binary large object storing some binary form of the XML data).
  • an XML index consists of three logical structures that include a path index, an order index, and a value index, and can reside in a single table, hereinafter path_table.
  • po1.xml and po2.xml are merely two examples of XML documents.
  • the techniques described herein are not limited to XML documents having any particular types, structure or content. Examples shall be given hereafter of how such documents would be indexed and accessed according to various embodiments of the invention.
  • an XML index is a domain index that improves the performance of queries that include XPath-based predicates and/or XPath-based fragment extraction.
  • An XML index can be built, for example, over both XML Schema-based and schema-less XMLType columns, which are stored either as CLOB or in structured storage.
  • an XML index is a logical index that results from the cooperative use of a path index, a value index, and an order index.
  • the path index provides the mechanism to lookup fragments based on simple (navigational) path expressions.
  • the value index provides the lookup based on value equality or range. There could be multiple secondary value indexes - one per datatype.
  • the order index associates hierarchical ordering information with indexed nodes. The order index is used to determine parent-child, ancestor-descendant and sibling relationships between XML nodes.
  • When the user submits a query involving XPaths (as a predicate or fragment identifier), the user XPath is decomposed into a SQL query that accesses the XML index table.
  • the generated query typically performs a set of path, value and order-constrained lookups and merges their results appropriately.
  • a logical XML index includes a PATH table, and a set of secondary indexes.
  • each indexed XML document may include many indexed nodes.
  • the PATH table contains one row per indexed node. For each indexed node, the PATH table row for the node contains various pieces of information associated with the node.
  • the information contained in the PATH table includes (1) a PATHID that indicates the path to the node, (2) "location data" for locating the fragment data for the node within the base structures, and (3) "hierarchy data" that indicates the position of the node within the structural hierarchy of the XML document that contains the node.
  • the PATH table may also contain value information for those nodes that are associated with values. Each of these types of information shall be described in greater detail below.
  • PATHS: The structure of an XML document establishes parent-child relationships between the nodes within the XML document.
  • the "path" for a node in an XML document reflects the series of parent-child links, starting from a "root" node, to arrive at the particular node.
  • the path to the "User" node in po2.xml is /PurchaseOrder/Actions/Action/User, since the "User" node is a child of the "Action" node, the "Action" node is a child of the "Actions" node, and the "Actions" node is a child of the "PurchaseOrder" node.
  • indexed XML documents: The set of XML documents that an XML index indexes is referred to herein as the "indexed XML documents".
  • an XML index may be built on all of the paths within all of the indexed XML documents, or a subset of the paths within the indexed XML documents. Techniques for specifying which paths are indexed are described hereafter.
  • the set of paths that are indexed by a particular XML index are referred to herein as the "indexed XML paths".
  • each of the indexed XML paths is assigned a unique path ID.
  • the paths that exist in po1.xml and po2.xml may be assigned path IDs as illustrated in the following table:
  • Various techniques may be used to identify paths and assign path IDs to paths. For example, a user may explicitly enumerate paths, and specify corresponding path IDs for the paths thus identified.
  • the database server may parse each XML document as the document is added to the set of indexed XML documents. During the parsing operation, the database server identifies any paths that have not already been assigned a path ID, and automatically assigns new path IDs to those paths.
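The automatic assignment described above can be sketched with Python's standard-library ElementTree parser. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary `path_ids` stands in for the stored pathid-to-path mapping, IDs are handed out sequentially in document order, attribute paths are written with an `@` step, namespaces are ignored, and both purchase-order documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def assign_path_ids(doc_text: str, path_ids: dict) -> dict:
    """Give every path in doc_text that has no ID yet the next free path ID.

    path_ids is the shared path -> ID mapping (hypothetical stand-in for the
    stored pathid-to-path metadata); paths seen in earlier documents keep
    the IDs they were given then.
    """
    def visit(elem: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{elem.tag}"
        if path not in path_ids:
            path_ids[path] = len(path_ids) + 1
        # Attribute nodes get their own paths, written with XPath's '@' step.
        for name in elem.attrib:
            attr_path = f"{path}/@{name}"
            if attr_path not in path_ids:
                path_ids[attr_path] = len(path_ids) + 1
        for child in elem:
            visit(child, path)

    visit(ET.fromstring(doc_text), "")
    return path_ids


# Two small, hypothetical purchase orders that share only part of their paths.
ids = {}
assign_path_ids(
    "<PurchaseOrder><Reference/><Actions><Action><User>KING</User>"
    "</Action></Actions></PurchaseOrder>", ids)
assign_path_ids("<PurchaseOrder><Reference/><Requestor/></PurchaseOrder>", ids)
for path, pathid in ids.items():
    print(pathid, path)
```

The second document only adds /PurchaseOrder/Requestor; its other paths reuse existing IDs, which is why each document ends up touching just a subset of the assigned pathids.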
  • the pathid-to-path mapping may be stored within the database in a variety of ways. According to one embodiment, the pathid-to-path mapping is stored as metadata separate from the XML indexes themselves.
  • the same access structures are used for XML documents that conform to different schemas. Because the indexed XML documents may conform to different schemas, each XML document will typically only contain a subset of the paths to which pathids have been assigned.
  • LOCATION DATA: The location data associated with a node indicates where the XML document that contains the node resides within the base structures. Thus, the nature of the location data will vary from implementation to implementation based on the nature of the base structures. Depending on how the actual XML document is stored, the location data may also include a locator or logical pointer to point into the XML document.
  • the logical pointer may be used for extracting fragments that are associated with nodes identified by XPaths.
  • for the purpose of explanation, it is assumed that (1) the base structures are tables within a relational database, and (2) each indexed XML document is stored in a corresponding row of a base table.
  • the location data for a node may include, for example, (1) the rowid of the row, within the base table, in which the XML document containing the node is stored, and (2) a locator that provides fast access, within the XML document, to the fragment data that corresponds to the node.
  • the PATH table row for a node also includes information that indicates where the node resides within the hierarchical structure of the XML document containing the node. Such hierarchical information is referred to herein as the "OrderKey" of the node.
  • the hierarchical order information is represented using a Dewey-type value. Specifically, in one embodiment, the OrderKey of a node is created by appending a value to the OrderKey of the node's immediate parent, where the appended value indicates the position, among the children of the parent node, of that particular child node.
  • a particular node D is the child of a node C, which itself is a child of a node B that is a child of a node A.
  • node D has the OrderKey 1.2.4.3.
  • the final "3" in the OrderKey indicates that the node D is the third child of its parent node C.
  • the 4 indicates that node C is the fourth child of node B.
  • the 2 indicates that Node B is the second child of node A.
  • the leading 1 indicates that node A is the root node (i.e. has no parent).
  • the OrderKey of a child may be easily created by appending to the OrderKey of the parent a value that corresponds to the number of the child.
  • the OrderKey of the parent is easily derived from the OrderKey of the child by removing the last number in the OrderKey of the child.
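A small Python sketch of the Dewey-style OrderKey rules above, representing an OrderKey as a tuple of positive integers (a representation chosen here for illustration only). `child_key`, `parent_key`, `is_ancestor` and `show` are hypothetical helper names; `is_ancestor` shows how the prefix property lets an order index answer ancestor-descendant questions.

```python
from typing import Tuple

OrderKey = Tuple[int, ...]


def child_key(parent: OrderKey, position: int) -> OrderKey:
    """OrderKey of the position-th child (1-based): append the position."""
    return parent + (position,)


def parent_key(child: OrderKey) -> OrderKey:
    """OrderKey of the immediate parent: drop the last component."""
    if len(child) < 2:
        raise ValueError("the root node has no parent")
    return child[:-1]


def is_ancestor(a: OrderKey, d: OrderKey) -> bool:
    """True if a is a proper prefix of d, i.e. node a is an ancestor of node d."""
    return len(a) < len(d) and d[:len(a)] == a


def show(key: OrderKey) -> str:
    return ".".join(str(part) for part in key)


# The example from the text: D is the 3rd child of C, C the 4th child of B,
# and B the 2nd child of the root A.
a = (1,)
b = child_key(a, 2)
c = child_key(b, 4)
d = child_key(c, 3)
print(show(d))              # 1.2.4.3
print(show(parent_key(d)))  # 1.2.4
print(is_ancestor(b, d))    # True
```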
  • the composite numbers represented by each OrderKey are converted into byte-comparable values, so that a mathematical comparison between two OrderKeys indicates the relative position, within the structural hierarchy of an XML document, of the nodes to which the OrderKeys correspond.
  • the node associated with the OrderKey 1.2.7.7 precedes the node associated with the OrderKey 1.3.1 in the hierarchical structure of an XML document.
  • the database server uses a conversion mechanism that converts OrderKey 1.2.7.7 to a first value and OrderKey 1.3.1 to a second value, where the first value is less than the second value. By comparing the second value to the first value, the database server can easily determine that the node associated with the first value precedes the node associated with the second value.
  • Various conversion techniques may be used to achieve this result, and the invention is not limited to any particular conversion technique.
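Because the text leaves the conversion technique open, the following is just one possible scheme, assumed here for illustration: each component is written as a length byte followed by its big-endian bytes, and the pieces are concatenated. Each component's encoding is self-delimiting and larger numbers never get fewer bytes, so plain byte comparison reproduces document order, which a naive string comparison of "1.10" and "1.9" does not.

```python
def encode_component(value: int) -> bytes:
    """Order-preserving, self-delimiting encoding of one positive component.

    A length byte comes first, so a number with more bytes always compares
    higher; equal-length numbers then compare by their big-endian bytes.
    """
    if value < 1:
        raise ValueError("Dewey components are positive integers")
    n = (value.bit_length() + 7) // 8
    return bytes([n]) + value.to_bytes(n, "big")


def encode_order_key(key) -> bytes:
    """Concatenate the component encodings into one byte-comparable key."""
    return b"".join(encode_component(v) for v in key)


# 1.2.7.7 precedes 1.3.1 in document order, and the encoded bytes agree.
print(encode_order_key((1, 2, 7, 7)) < encode_order_key((1, 3, 1)))  # True
# A naive string comparison gets multi-digit components wrong...
print("1.10" < "1.9")  # True, yet the 10th child follows the 9th
# ...while the encoded form keeps the correct order.
print(encode_order_key((1, 9)) < encode_order_key((1, 10)))  # True
# An ancestor's key is a prefix of its descendants' keys, so it sorts first.
print(encode_order_key((1, 2)) < encode_order_key((1, 2, 1)))  # True
```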
  • VALUE INFORMATION: Some nodes within an indexed document may be attribute nodes or nodes that correspond to simple elements. According to one embodiment, for attribute nodes and simple elements, the PATH table row also stores the actual value of the attributes and elements. Such values may be stored, for example, in a "value column" of the PATH table. The secondary "value indexes", which shall be described in greater detail hereafter, are built on the value column.
  • the PATH table includes columns defined as specified in the following table:
  • the PATHID is a number assigned to the node, and uniquely represents a fully expanded path to the node.
  • the ORDER_KEY is a system representation of the Dewey ordering number associated with the node. According to one embodiment, the internal representation of the order key also preserves document ordering.
  • the VALUE column stores the effective text value for simple element (i.e. no element children) nodes and attribute nodes. According to one embodiment, adjacent text nodes are coalesced by concatenation. As shall be described in greater detail hereafter, a mechanism is provided to allow a user to customize the effective text value that gets stored in VALUE column by specifying options during index creation e.g.
  • the user can store the VALUE column in any number of formats, including a bounded RAW column or a BLOB. If the user chooses bounded storage, then any overflow during index creation is flagged as an error.
  • the following table is an example of a PATH table that (1) has the columns described above, and (2) is populated with entries for po1.xml and po2.xml. Specifically, each row of the PATH table corresponds to an indexed node of either po1.xml or po2.xml. In this example, it is assumed that po1.xml and po2.xml are respectively stored at rows R1 and R2 of a base table.
  • the rowid column stores a unique identifier for each row of the PATH table.
  • the rowid column may be an implicit column.
  • the disk location of a row may be used as the unique identifier for the row.
  • the secondary Order and Value indexes use the rowid values of the PATH table to locate rows within the PATH table.
  • the PATHID, ORDERKEY and VALUE of a node are all contained in a single table. In an alternative embodiment, separate tables may be used to map the PATHID, ORDERKEY and VALUE information to corresponding location data (e.g. the base table Rid and Locator).
  • the PATH table includes the information required to locate the XML documents, or XML fragments, that satisfy a wide range of queries. However, without secondary access structures, using the PATH table to satisfy such queries will often require full scans of the PATH table. Therefore, according to one embodiment, a variety of secondary indexes are created by the database server to accelerate the queries that (1) perform path lookups and/or (2) identify order-based relationships. According to one embodiment, the following secondary indexes are created on the PATH table.
  • PATHID_INDEX: The PATHID_INDEX is built on the pathid and rid columns of the PATH table. Thus, entries in the PATHID_INDEX are in the form (keyvalue, rowid), where keyvalue is a composite value representing a particular pathid/rid combination, and rowid identifies a particular row of the PATH table.
  • the PATHID_INDEX may be used to quickly locate the row, within the PATH table, for the node. For example, based on the key value "3.R1", the PATHID_INDEX may be traversed to find the entry that is associated with the key value "3.R1". Assuming that the PATH table is populated as illustrated above, the index entry would have a rowid value of 3. The rowid value of 3 points to the third row of the PATH table, which is the row for the node associated with the pathid 3 and the rid R1.
  • ORDERKEY_INDEX is built on the rid and orderkey columns of the PATH table.
  • entries in the ORDERKEY_INDEX are in the form (keyvalue, rowid), where keyvalue is a composite value representing a particular rid/orderkey combination, and rowid identifies a particular row of the PATH table.
  • the ORDERKEY_INDEX may be used to quickly locate the row, within the PATH table, for the node. For example, based on the key value "R1.'1.2'", the ORDERKEY_INDEX may be traversed to find the entry that is associated with the key value "R1.'1.2'". Assuming that the PATH table is populated as illustrated above, the index entry would have a rowid value of 3. The rowid value of 3 points to the third row of the PATH table, which is the row for the node associated with the orderkey 1.2 and the rid R1.
  • the NUMBER value index is used to handle number-based comparisons within user XPaths.
  • Entries in the NUMBER_INDEX may, for example, be in the form (number, rowid), where the rowid points to a row, within the PATH table, for a node associated with the value of "number".
  • entries within the STRING_INDEX may have the form (string, rowid).
  • entries within the TIMESTAMP_INDEX may have the form (timestamp, rowid).
  • the format of the values in the PATH table may not correspond to the native format of the data type. Therefore, when using the value indexes, the database server may call conversion functions to convert the value bytes from stored format to the specified datatype. In addition, the database server applies any necessary transformations, as shall be described hereafter. According to one embodiment, the conversion functions operate on both RAW and BLOB values and return NULL if the conversion is not possible.
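As a rough illustration of conversion functions that return NULL when a value cannot be converted, the Python sketch below assumes that the stored VALUE bytes are UTF-8 text and models SQL NULL as `None`. The function names and the ISO 8601 timestamp format are assumptions for illustration, not the stored format the index actually uses.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Optional


def _text(raw: bytes) -> Optional[str]:
    """Decode stored bytes as UTF-8 (an assumption), or None if they are not."""
    try:
        return raw.decode("utf-8").strip()
    except UnicodeDecodeError:
        return None


def to_number(raw: bytes) -> Optional[Decimal]:
    """Stored bytes -> NUMBER, or None (SQL NULL) if not a finite number."""
    text = _text(raw)
    if text is None:
        return None
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    # Reject NaN and Infinity, which Decimal accepts but a NUMBER index should not.
    return value if value.is_finite() else None


def to_timestamp(raw: bytes) -> Optional[datetime]:
    """Stored bytes -> TIMESTAMP, or None if not an ISO 8601 timestamp."""
    text = _text(raw)
    if text is None:
        return None
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


print(to_number(b"42.50"))                    # 42.50
print(to_number(b"King"))                     # None
print(to_timestamp(b"2001-02-03T04:05:06"))   # 2001-02-03 04:05:06
print(to_timestamp(b"\xff"))                  # None
```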
  • the value indexes are created when the XML index is created. However, users can suppress the creation of one or more of value indexes based on the knowledge of query workload. For example, if all XPath predicates involve string comparisons only, the NUMBER and TIMESTAMP value indexes can be avoided.
  • the set of secondary indexes built on the PATH table includes a PARENT_ORDERKEY_INDEX. Similar to the ORDERKEY_INDEX, the PARENT_ORDERKEY_INDEX is built on the rid and order_key columns of the PATH table. Consequently, the index entries of the PARENT_ORDERKEY_INDEX have the form (keyvalue, rowid), where keyvalue is a composite value that corresponds to a particular rid/order_key combination. However, unlike the ORDERKEY_INDEX, the rowid in a PARENT_ORDERKEY_INDEX entry does not point to the PATH table row that has the particular rid/order_key combination. Rather, the rowid of each PARENT_ORDERKEY_INDEX entry points to the PATH table row of the node that is the immediate parent of the node associated with the rid/order_key combination.
  • the rid/order_key combination "R1.'1.2'" corresponds to the node in row 3 of the PATH table.
  • the immediate parent of the node in row 3 of the PATH table is the node represented by row 1 of the PATH table. Consequently, the PARENT_ORDERKEY_INDEX entry associated with the "R1.'1.2'" key value would have a rowid that points to row 1 of the PATH table.
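The lookups described for the PATHID_INDEX, ORDERKEY_INDEX and PARENT_ORDERKEY_INDEX can be mimicked in SQLite through Python's `sqlite3` module. This sketch rests on several assumptions: the column names, the hypothetical rows (arranged so that row 3 is pathid 3 and order key 1.2 of document R1, matching the examples), and a keyed side table standing in for the parent index, since an ordinary SQLite index cannot carry a rowid that points to a different row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE path_table (
  rid       TEXT,     -- base-table row holding the document (R1, R2, ...)
  pathid    INTEGER,
  order_key TEXT,     -- Dewey key kept as text; adequate for equality lookups
  value     TEXT
);
CREATE INDEX pathid_index   ON path_table (pathid, rid);
CREATE INDEX orderkey_index ON path_table (rid, order_key);
-- Emulated PARENT_ORDERKEY_INDEX: keyed by (rid, order_key) like the
-- ORDERKEY_INDEX, but its payload is the row of the node's parent.
CREATE TABLE parent_orderkey_index (
  rid        TEXT,
  order_key  TEXT,
  parent_row INTEGER,
  PRIMARY KEY (rid, order_key)
);
""")

# Hypothetical rows, arranged so that row 3 is pathid 3 / order key 1.2 of R1.
conn.executemany(
    "INSERT INTO path_table (rowid, rid, pathid, order_key, value) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, "R1", 1, "1", None), (2, "R1", 2, "1.1", None),
     (3, "R1", 3, "1.2", None), (4, "R1", 4, "1.2.1", "KING")])

# Populate the parent index: the parent's key is the child's key minus its last step.
for rid, key in conn.execute("SELECT rid, order_key FROM path_table").fetchall():
    if "." in key:
        (parent,) = conn.execute(
            "SELECT rowid FROM path_table WHERE rid = ? AND order_key = ?",
            (rid, key.rsplit(".", 1)[0])).fetchone()
        conn.execute("INSERT INTO parent_orderkey_index VALUES (?, ?, ?)",
                     (rid, key, parent))


def rows_for_pathid(pathid, rid):
    """PATHID_INDEX-style lookup: key pathid.rid -> matching PATH table rows."""
    return [r for (r,) in conn.execute(
        "SELECT rowid FROM path_table WHERE pathid = ? AND rid = ?",
        (pathid, rid))]


def row_for_orderkey(rid, order_key):
    """ORDERKEY_INDEX-style lookup: key rid.order_key -> the node's own row."""
    (r,) = conn.execute(
        "SELECT rowid FROM path_table WHERE rid = ? AND order_key = ?",
        (rid, order_key)).fetchone()
    return r


def parent_row(rid, order_key):
    """PARENT_ORDERKEY_INDEX-style lookup: key rid.order_key -> the parent's row."""
    (r,) = conn.execute(
        "SELECT parent_row FROM parent_orderkey_index "
        "WHERE rid = ? AND order_key = ?", (rid, order_key)).fetchone()
    return r


print(rows_for_pathid(3, "R1"))       # [3]
print(row_for_orderkey("R1", "1.2"))  # 3
print(parent_row("R1", "1.2"))        # 1
```

Storing the parent's row directly means a child-to-parent step is one keyed probe, with no need to compute and look up the parent's order key at query time.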
  • an XML index indexes nodes, within XML documents, based on the paths to the nodes.
  • the following are examples of path expressions that a path-based query may include: /po/action/action; /po[id = "abc"]; /po//action
  • the path components contained in the XML index may be used to efficiently evaluate path expressions. Typically, evaluating path expressions from the indexed path components is much faster than evaluating path expressions against the base tables, which would result in a complete scan of the original XML documents.
  • the techniques involve identifying a path specified in the input query, identifying a template that corresponds to the format of the specified path, and generating, based on rules associated with the template, an "index-enabled" query that uses the XML index to locate the XML data that corresponds to the specified path.
  • the index-enabled query may be, for example, a SQLX query (a SQL query that may include XML-specific operators).
  • the techniques may involve (1) decomposing a generic path expression into simpler components such as simple paths, predicates, and structural joins; (2) generating a SQL query against tables of the XML index, which may involve expressing the structural joins using SQL predicates on Dewey order keys of the indexed path components; and (3) efficiently extracting fragments using locators that point to the original data.
  • the database server uses the XML index to return the locators, and the actual XML data at the locators, that need to be read and supplied to the user.
  • the following templates define how, in one embodiment, index-enabled queries are generated based on path expressions, where the index-enabled queries access the path_table of the XML index.
  • the path expression of a path-based query, or fragments thereof, are matched against templates.
  • Each template is associated with a rule. When a fragment of a specified path is in a format that matches a template, the corresponding rule is then used to generate SQL for an index-enabled query.
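The matching half of this template mechanism can be sketched with regular expressions. The template names, the simplified grammar (element names only, a single equality filter, no namespaces or functions) and the function `match_template` are assumptions for illustration; a real implementation would parse XPath properly and attach an SQL-generating rule to each template.

```python
import re

STEP = r"[A-Za-z_][\w.-]*"         # one element name (namespaces ignored)
SIMPLE = rf"(?:/{STEP})+"           # absolute simple path: /a/b/c
REL = rf"{STEP}(?:/{STEP})*"        # relative simple path: a/b
LIT = r"'[^']*'|\"[^\"]*\""         # string literal: 'x' or "x"

# Ordered templates: the first pattern that matches the whole path wins.
TEMPLATES = [
    ("filter", re.compile(
        rf"(?P<P1>{SIMPLE})\[(?P<P2>{REL})\s*=\s*(?P<lit>{LIT})\]")),
    ("descendant", re.compile(rf"(?P<P1>{SIMPLE})//(?P<P2>{REL})")),
    ("wildcard", re.compile(
        rf"(?P<P1>{SIMPLE})(?P<stars>(?:/\*)+)/(?P<P2>{REL})")),
    ("simple", re.compile(rf"(?P<P1>{SIMPLE})")),
]


def match_template(path: str):
    """Return (template name, bindings) for the first template that fits."""
    for name, pattern in TEMPLATES:
        m = pattern.fullmatch(path.strip())
        if m:
            return name, m.groupdict()
    return None


for path in ("/PurchaseOrder/Actions/Action[User = 'King']",
             "/PurchaseOrder//User", "/PurchaseOrder/*/Action", "/Po/id"):
    print(path, "->", match_template(path))
```

The bindings (P1, P2, the literal, the run of `*` steps) are exactly the pieces a template's rule would need to emit its SQL.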
  • Templates, and the corresponding rules, of one embodiment are described in detail hereafter.
  • the templates given as examples hereafter include templates that correspond to simple path expressions, filter expressions, descendant axis expressions, wildcard expressions, logical expressions, relational expressions, literals, casting nodesets to Boolean expressions, and text functions.
  • the translation of path expressions into SQL for accessing the path_table of the XML index is not limited to the specific examples given hereafter.
  • the templates are simply examples of how, in one embodiment, such translation may be performed.
  • pathid() denotes an internal function used to look up the pathid associated with the concerned path.
  • Filter expressions are expressions of the type "P1[F(P2)]", where P1 is a path expression and F is a filter defined on the relative path P2.
  • An example of a filter expression using the XML documents described above is /PurchaseOrder/Actions/Action[User = "King"].
  • P1 would correspond to "/PurchaseOrder/Actions/Action" and P2 would correspond to "User".
  • Filter expressions, such as "P1[F(P2)]", are rewritten, according to one embodiment, to the following SQL: P1[F(P2)] -> select pt1.
  • Maxkey() is an internal function that takes an order key of a node as input and generates a key that is greater than the key of any descendant of the input order key.
  • /Po corresponds to P1 of the template
  • id corresponds to P2 of the template
  • sql1 is the SQL generated, by application of the other rules described herein, for evaluating the path expression /Po
  • sql2 is the SQL generated, by application of the other rules described herein, for evaluating the path expression /Po/id
  • P1, i.e. /Po
  • P1/P2, i.e. /Po/id
  • the rows produced by sql1 shall be referred to herein as the sql1 rows.
  • the rows produced by sql2 shall be referred to herein as the sql2 rows.
  • the SQL generated by the rule of this template therefore selects from the sql1 rows only those rows that have an order key corresponding to nodes that are ancestors (parents) of nodes returned by sql2.
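The filter rule can be exercised end to end in SQLite. This sketch is an illustration under assumptions, not the rewrite the patent prints: order keys use a length-prefixed byte encoding (one possible byte-comparable scheme), `maxkey()` is modelled by appending a 0xFF byte, and the outer query keeps each sql1 row that has an sql2 row in the same document whose key lies strictly between the sql1 key and its `maxkey()`. The pathids and rows are hypothetical.

```python
import sqlite3


def encode_order_key(key) -> bytes:
    """Byte-comparable Dewey key: per component, a length byte + big-endian bytes."""
    out = b""
    for v in key:
        n = (v.bit_length() + 7) // 8
        out += bytes([n]) + v.to_bytes(n, "big")
    return out


def maxkey(order_key: bytes) -> bytes:
    """Key above every descendant of order_key but below its next sibling.

    Every encoded component starts with a length byte below 0xFF, so a
    trailing 0xFF outranks any descendant suffix.
    """
    return order_key + b"\xff"


conn = sqlite3.connect(":memory:")
conn.create_function("maxkey", 1, maxkey)
conn.execute("CREATE TABLE path_table "
             "(rid TEXT, pathid INTEGER, order_key BLOB, value TEXT)")

# Hypothetical pathids: 1 = /Po, 2 = /Po/Action, 3 = /Po/Action/User.
rows = [
    ("R1", 1, (1,), None), ("R1", 2, (1, 1), None), ("R1", 3, (1, 1, 1), "King"),
    ("R1", 2, (1, 2), None), ("R1", 3, (1, 2, 1), "Smith"),
    ("R2", 1, (1,), None), ("R2", 2, (1, 1), None), ("R2", 3, (1, 1, 1), "Smith"),
    ("R2", 2, (1, 2), None), ("R2", 3, (1, 2, 1), "King"),
]
conn.executemany("INSERT INTO path_table VALUES (?, ?, ?, ?)",
                 [(rid, pid, encode_order_key(k), v) for rid, pid, k, v in rows])

# /Po/Action[User = "King"]: sql1 evaluates P1, sql2 evaluates P1/P2 with the
# filter, and the outer query keeps only sql1 rows that are ancestors of some
# sql2 row in the same document.
SQL1 = "SELECT rid, order_key FROM path_table WHERE pathid = :p1"
SQL2 = ("SELECT rid, order_key FROM path_table "
        "WHERE pathid = :p1p2 AND value = :lit")
FILTER = f"""
SELECT s1.rid, s1.order_key FROM ({SQL1}) AS s1
WHERE EXISTS (
  SELECT 1 FROM ({SQL2}) AS s2
  WHERE s2.rid = s1.rid
    AND s2.order_key > s1.order_key
    AND s2.order_key < maxkey(s1.order_key))
ORDER BY s1.rid, s1.order_key
"""
result = conn.execute(FILTER, {"p1": 2, "p1p2": 3, "lit": "King"}).fetchall()
print(result == [("R1", encode_order_key((1, 1))),
                 ("R2", encode_order_key((1, 2)))])  # True
```

SQLite compares two BLOBs byte by byte, so the range test on encoded keys is the structural "is ancestor of" join, with no need to decode keys inside the query.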
  • Descendant axis expressions are expressions of the type "P1//P2", where P1 and P2 are path expressions.
  • a simple example of a descendant axis expression is /PurchaseOrder//User, which selects all the User elements that are descendants (whether a child element, grandchild element, etc.) of the PurchaseOrder element.
  • Descendant axis expressions are rewritten, according to one embodiment, using the SQL shown below: P1//P2 -> select pt2.pathid, pt2.rid, pt2.order_key, pt2.
  • :B1 pathid(P2), in which P2 is a simple path, where sql corresponds to the rewritten SQL for expression P1, and where sys_xdbpathsuffix() is a table function that generates path ids corresponding to all nodes whose paths have P2 as a suffix, i.e. all //P2 paths.
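A pure-Python sketch of the descendant-axis rewrite, with `path_suffix_ids` standing in for `sys_xdbpathsuffix()` and PATH table rows modelled as `(rid, pathid, order_key)` tuples with tuple order keys. The helper names, the data and the depth check (which keeps the first step of P2 strictly below the P1 node, as `//` requires) are assumptions of this illustration.

```python
def path_suffix_ids(path_ids, suffix):
    """Stand-in for sys_xdbpathsuffix(): IDs of every indexed path ending in /suffix."""
    tail = "/" + suffix.strip("/")
    return {pid for path, pid in path_ids.items() if path.endswith(tail)}


def descendant_axis(rows, path_ids, p1, p2):
    """Evaluate P1//P2 for simple paths P1 and P2 over (rid, pathid, order_key) rows."""
    anchors = [(rid, key) for rid, pid, key in rows if pid == path_ids[p1]]
    wanted = path_suffix_ids(path_ids, p2)
    steps = len(p2.strip("/").split("/"))
    hits = []
    for rid, pid, key in rows:
        if pid not in wanted:
            continue
        for a_rid, a_key in anchors:
            # Same document, inside the P1 node's subtree, and deep enough
            # that the first step of P2 lies strictly below the P1 node.
            if (rid == a_rid and key[:len(a_key)] == a_key
                    and len(key) - len(a_key) >= steps):
                hits.append((rid, key))
                break
    return sorted(hits)


# Hypothetical pathids; R2 conforms to a different (invoice) schema.
path_ids = {
    "/PurchaseOrder": 1, "/PurchaseOrder/Actions": 2,
    "/PurchaseOrder/Actions/Action": 3, "/PurchaseOrder/Actions/Action/User": 4,
    "/PurchaseOrder/User": 5, "/Invoice": 6, "/Invoice/User": 7,
}
rows = [
    ("R1", 1, (1,)), ("R1", 2, (1, 1)), ("R1", 3, (1, 1, 1)),
    ("R1", 4, (1, 1, 1, 1)), ("R1", 5, (1, 2)),
    ("R2", 6, (1,)), ("R2", 7, (1, 1)),
]
print(descendant_axis(rows, path_ids, "/PurchaseOrder", "User"))
# [('R1', (1, 1, 1, 1)), ('R1', (1, 2))]
```

The suffix lookup alone also returns /Invoice/User; it is the order-key check against the P1 nodes that discards it.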
  • Wildcard expressions are expressions of the type "P1/*/P2".
  • a simple example of this expression is "/PurchaseOrder/*/Action” which selects all the Action elements that are grandchild elements (and only grandchild elements) of the PurchaseOrder element.
  • there may be multiple wildcards (*) in the expression, such as "P1/*/*/*/P2", which indicates that the first element of path P2 is the great-great-grandchild of the last element of P1.
  • Wildcard expressions are rewritten, according to one embodiment, using the SQL shown below: P1/*/P2 -> select pt2.pathid, pt2.rid, pt2.
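Wildcards can be evaluated much like the descendant axis, except that the depth gap between the P1 node and the result is fixed rather than open-ended: with k wildcard steps and a P2 of m steps, the result lies exactly k + m levels below the P1 node. The sketch below models PATH table rows as `(rid, pathid, order_key)` tuples with tuple order keys; `wildcard_axis` and the pathids and rows are hypothetical.

```python
def wildcard_axis(rows, path_ids, p1, wildcards, p2):
    """Evaluate P1/*/P2 (with `wildcards` '*' steps) over (rid, pathid, order_key) rows."""
    anchors = [(rid, key) for rid, pid, key in rows if pid == path_ids[p1]]
    tail = p2.strip("/").split("/")
    # Paths whose last steps spell out P2; the '*' steps may be any element.
    wanted = {pid for path, pid in path_ids.items()
              if path.strip("/").split("/")[-len(tail):] == tail}
    gap = wildcards + len(tail)  # exact depth of the result below the P1 node
    return sorted(
        (rid, key) for rid, pid, key in rows
        if pid in wanted and any(
            rid == a_rid and len(key) == len(a_key) + gap
            and key[:len(a_key)] == a_key
            for a_rid, a_key in anchors))


# Hypothetical pathids: Action appears as a child, grandchild and
# great-grandchild of PurchaseOrder.
path_ids = {
    "/PurchaseOrder": 1, "/PurchaseOrder/Actions": 2,
    "/PurchaseOrder/Actions/Action": 3, "/PurchaseOrder/Action": 4,
    "/PurchaseOrder/History": 5, "/PurchaseOrder/History/Log": 6,
    "/PurchaseOrder/History/Log/Action": 7,
}
rows = [
    ("R1", 1, (1,)), ("R1", 2, (1, 1)), ("R1", 3, (1, 1, 1)), ("R1", 3, (1, 1, 2)),
    ("R1", 4, (1, 2)), ("R1", 5, (1, 3)), ("R1", 6, (1, 3, 1)),
    ("R1", 7, (1, 3, 1, 1)),
]
# /PurchaseOrder/*/Action: grandchildren only.
print(wildcard_axis(rows, path_ids, "/PurchaseOrder", 1, "Action"))
# [('R1', (1, 1, 1)), ('R1', (1, 1, 2))]
```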
  • Logical expressions are expressions of the type "E1 op E2", where E1 and E2 are expressions and op is either a logical AND or a logical OR.
  • the predicate "(sql1 > 0) op (sql2 > 0)" is therefore true only if either (1) op is "AND" and both E1 and E2 are true, or (2) op is "OR" and at least one of E1 and E2 is true. If the predicate "(sql1 > 0) op (sql2 > 0)" is true, then the "select 1" statement causes a 1 to be returned (indicating true). If the predicate "(sql1 > 0) op (sql2 > 0)" is false, then nothing is returned.
  • Boolean false() -> select 0 as value from dual
  • Boolean true() -> select 1 as value from dual, where dual is a dummy table that contains no information but is used for proper SQL syntax.
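The logical-expression rule and the Boolean literals can be tried in SQLite. SQLite has no DUAL table, so this sketch creates a one-row stand-in; `logical` and `is_true` are hypothetical helper names, and the sub-expressions here are only the true() and false() literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no built-in DUAL, so create a one-row stand-in.
conn.executescript("CREATE TABLE dual (dummy TEXT); INSERT INTO dual VALUES ('X');")

TRUE_SQL = "SELECT 1 AS value FROM dual"   # Boolean true()
FALSE_SQL = "SELECT 0 AS value FROM dual"  # Boolean false()


def logical(sql1: str, op: str, sql2: str) -> str:
    """E1 op E2 -> a query returning one row (value 1) if true, no rows if false."""
    if op not in ("AND", "OR"):
        raise ValueError("op must be AND or OR")
    return f"SELECT 1 AS value FROM dual WHERE (({sql1}) > 0) {op} (({sql2}) > 0)"


def is_true(logical_sql: str) -> bool:
    """A logical-expression query is true exactly when it returns a row."""
    return conn.execute(logical_sql).fetchone() is not None


print(is_true(logical(TRUE_SQL, "AND", FALSE_SQL)))  # False
print(is_true(logical(TRUE_SQL, "OR", FALSE_SQL)))   # True
```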
  • CASTING NODESETS TO BOOLEANS: Another feature that is helpful in translating paths into corresponding SQL for querying the XML index is the casting operator. Nodesets are converted to Booleans when a cast operator appears in the path expression. For example, a cast operator is implicit in an expression of the form /a[b], which selects all the "a" elements that have a "b" element.
  • the following SQL is used to generate a Boolean: /a[b] -> select count(*) as value from (sql), where sql is the SQL generated from path "/a/b", and where count() is an internal function that counts the number of nodes that are returned from sql. If the number of nodes is greater than zero, this SQL statement will return a positive value; otherwise it will return zero.
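A minimal SQLite sketch of the node-set-to-Boolean cast, using a hypothetical PATH table in which pathid 2 stands for /a/b and pathid 3 for /a/c; `to_boolean` is an illustrative name, and a count greater than zero is read as true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE path_table (rid TEXT, pathid INTEGER, order_key TEXT);
-- Hypothetical pathids: 1 = /a, 2 = /a/b, 3 = /a/c (no /a/c nodes indexed).
INSERT INTO path_table VALUES ('R1', 1, '1'), ('R1', 2, '1.1'), ('R1', 2, '1.2');
""")


def to_boolean(sql: str) -> str:
    """Cast a node-set query to a Boolean: count(*) > 0 means true."""
    return f"SELECT count(*) AS value FROM ({sql}) AS nodes"


sql_a_b = "SELECT rid, order_key FROM path_table WHERE pathid = 2"  # /a/b
sql_a_c = "SELECT rid, order_key FROM path_table WHERE pathid = 3"  # /a/c
print(conn.execute(to_boolean(sql_a_b)).fetchone()[0])  # 2 -> /a[b] is true
print(conn.execute(to_boolean(sql_a_c)).fetchone()[0])  # 0 -> /a[c] is false
```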
  • the existsNode operator determines whether a particular node, specified by a path, exists in an XML document. If the node is located in an XML document, and consequently in the XML index, then a "1", signifying true, is returned.
  • sql is the SQL obtained after applying the rules, defined in the previous sections, on the path expression P, and where xmltab is the base table that contains the XML documents.
  • This rule states in the where clause that the row id of the base table, where the XML document is found, is the same as the Rid of at least one tuple in the results indicated by sql. This condition ensures that only the particular document a user is considering is searched. Because the XML index spans multiple XML documents, it is important to ensure that only the applicable XML document in the base table is searched, and not all the XML documents in the base table.
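The document-scoping condition can be illustrated in SQLite, where every table has an implicit `rowid`. In this sketch the base table `xmltab`, the pathids and both documents are hypothetical, and `sql` stands for the already-rewritten SQL for P; each base-table row gets 1 only if some tuple from `sql` carries that row's rowid as its rid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xmltab (doc TEXT);
INSERT INTO xmltab (doc) VALUES
  ('<PurchaseOrder><Actions><Action><User>KING</User></Action></Actions></PurchaseOrder>'),
  ('<PurchaseOrder><Actions/></PurchaseOrder>');
-- rid holds the rowid of the xmltab row that contains the node.
CREATE TABLE path_table (rid INTEGER, pathid INTEGER);
-- Hypothetical pathids: 1 = /PurchaseOrder, 2 = .../Actions,
-- 3 = .../Actions/Action, 4 = .../Actions/Action/User.
INSERT INTO path_table VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2);
""")

# Rewritten SQL for P = /PurchaseOrder/Actions/Action/User.
sql = "SELECT rid FROM path_table WHERE pathid = 4"

# existsNode per base-table row: 1 if some tuple from sql belongs to this row.
EXISTS_NODE = f"""
SELECT T.rowid,
       EXISTS (SELECT 1 FROM ({sql}) AS s WHERE s.rid = T.rowid) AS exists_node
FROM xmltab AS T
ORDER BY T.rowid
"""
print(conn.execute(EXISTS_NODE).fetchall())  # [(1, 1), (2, 0)]
```

Without the `s.rid = T.rowid` correlation, the second document would also report 1, because the index holds a User node for the first document.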
  • the extractValue operator, given a path expression, returns a single value from the XML index.
  • EXTRACT OPERATOR: In contrast to the extractValue operator, the extract operator, given a path expression, is used to generate an XML type tree.
  • This rule generates all the rows that are indicated by expression P. For each row, the fragment is retrieved from the base table, T, and aggregated into a single XML type tree.
  • the output of select extract(value(T), '/PurchaseOrder/Actions') from xmltab T would, for example, have the form: <Actions>
  • XMLSEQUENCE OPERATOR: XMLSequence is an operator that returns a collection of XML instances corresponding to the root elements in the input fragment. For example, XMLSequence(extract(value(T),'/PurchaseOrder/LineItems')) returns a collection of XML instances corresponding to the individual LineItems.
  • P1 is a path expression
  • sql is the SQL obtained after applying the rules, defined in the previous sections, on the path expression P.
  • FIG. 1 is a block diagram that illustrates a computer system 100 upon which an embodiment of the invention may be implemented.
  • Computer system 100 includes a bus 102 or other communication mechanism for communicating information, and a processor 104 coupled with bus 102 for processing information.
  • Computer system 100 also includes a main memory 106, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 102 for storing information and instructions to be executed by processor 104.
  • Main memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104.
  • Computer system 100 further includes a read only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104.
  • a storage device 110 such as a magnetic disk or optical disk, is provided and coupled to bus 102 for storing information and instructions.
  • Computer system 100 may be coupled via bus 102 to a display 112, such as a cathode ray tube (CRT), for displaying information to a computer user.
  • An input device 114 is coupled to bus 102 for communicating information and command selections to processor 104.
  • another type of user input device is cursor control 116.
  • cursor control 116 such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112.
  • This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • the invention is related to the use of computer system 100 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another machine-readable medium, such as storage device 110. Execution of the sequences of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
  • machine-readable medium refers to any medium that participates in providing data that causes a machine to operate in a specific fashion.
  • various machine-readable media are involved, for example, in providing instructions to processor 104 for execution.
  • Such a medium may take many forms, including but not limited to, non- olatile media, volatile media, and transmission media.
  • Non- volatile media includes, for example, optical or magnetic disks, such as storage device 110.
  • Volatile media includes dynamic memory, such as main memory 106.
  • Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • Machine-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
  • Various forms of machine-readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer.
  • The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • A modem local to computer system 100 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
  • An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 102.
  • Bus 102 carries the data to main memory 106, from which processor 104 retrieves and executes the instructions.
  • The instructions received by main memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.
  • Computer system 100 also includes a communication interface 118 coupled to bus 102.
  • Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to a local network 122.
  • For example, communication interface 118 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line.
  • As another example, communication interface 118 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links may also be implemented.
  • Communication interface 118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • Network link 120 typically provides data communication through one or more networks to other data devices.
  • For example, network link 120 may provide a connection through local network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126.
  • ISP 126 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 128.
  • Internet 128 uses electrical, electromagnetic or optical signals that carry digital data streams.
  • The signals through the various networks and the signals on network link 120 and through communication interface 118, which carry the digital data to and from computer system 100, are exemplary forms of carrier waves transporting the information.
  • Computer system 100 can send messages and receive data, including program code, through the network(s), network link 120 and communication interface 118.
  • For example, a server 130 might transmit requested code for an application program through Internet 128, ISP 126, local network 122 and communication interface 118.
  • The received code may be executed by processor 104 as it is received, and/or stored in storage device 110 or other non-volatile storage for later execution. In this manner, computer system 100 may obtain application code in the form of a carrier wave.

Abstract

A method and apparatus are provided for translating queries, such as path expressions and SQL/XML constructs, into SQL statements that are executed against an XML index. This reduces processing time compared with applying path expressions directly to the original XML documents to extract the desired information. Simple path expressions, filter expressions, descendant axes, wildcards, logical expressions, relational expressions, literals, and other path expressions are all translated into SQL for efficient querying of an XML index. Similarly, rules for translating SQL/XML constructs into SQL are provided.

Description

EFFICIENT QUERY PROCESSING OF XML DATA USING XML INDEX
PRIORITY CLAIM
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 60/560,927, entitled XML INDEX FOR XML DATA STORED IN VARIOUS STORAGE FORMATS, filed on April 9, 2004, the contents of which are herein incorporated by reference in their entirety for all purposes.
[0002] This application claims priority to U.S. Provisional Patent Application No. 60/580,445, entitled XML INDEX FOR XML DATA STORED IN VARIOUS STORAGE FORMATS, filed on June 16, 2004, the contents of which are herein incorporated by reference in their entirety for all purposes.
[0003] This application claims priority to U.S. Provisional Patent Application Serial No.
60/582,706, entitled TECHNIQUES FOR PROCESSING XQUERY QUERIES IN A
RELATIONAL DATABASE MANAGEMENT SYSTEM, filed on June 23, 2004, the contents of which are herein incorporated by reference in their entirety for all purposes.
[0004] This application claims priority to and is a continuation in part of U.S. Patent
Application Serial No. 10/884,311, entitled INDEX FOR ACCESSING XML DATA, filed on July 2, 2004, the contents of which are herein incorporated by reference in their entirety for all purposes.
[0005] This application is related to U.S. Patent Application Serial No. 10/944,171, entitled MECHANISM FOR EFFICIENTLY EVALUATING OPERATOR TREES, filed on
September 16, 2004, the contents of which are herein incorporated by reference in their entirety for all purposes.
[0006] This application is related to U.S. Patent Application Serial No. 10/944,177, entitled INDEX MAINTENANCE FOR OPERATIONS INVOLVING INDEXED XML
DATA, filed on September 16, 2004, the contents of which are herein incorporated by reference in their entirety for all purposes.
[0007] This application is related to U.S. Patent Application Serial No. 11/044,472, entitled PROCESSING QUERIES IN A CHOSEN ABSTRACT SYNTAX, filed on January 26, 2005, the contents of which are herein incorporated by reference in their entirety for all purposes.
[0008] This application is related to U.S. Patent Application Serial No. 10/948,523, entitled TECHNIQUES FOR OPTIMIZING MID-TIER XQUERY AGAINST SQL/XML ENABLED RDBMS, filed on September 22, 2004, the contents of which are herein incorporated by reference in their entirety for all purposes.
FIELD OF THE INVENTION
[0009] The present invention relates to accessing an XML index and, more specifically, to translating expressions and constructs into SQL for accessing an XML index.
BACKGROUND
[0010] Many database systems allow storage and querying of Extensible Markup Language data ("XML data"). Although there are many evolving standards for querying XML, all of them include some variation of XPath. XPath allows XML data to be queried based on path expressions. A path expression is any expression that specifies a path through the hierarchical structure of an XML document. The portion of an XML document identified by a path expression is the portion that resides, within the structure of the XML document, at the end of any path that matches the path expression.
[0011] A query that uses a path expression to identify one or more specific pieces of XML data is referred to herein as a path-based query. The process of determining which XML data corresponds to the path designated in a path-based query is referred to as "evaluating" the path expression.
[0012] Unfortunately, even database systems that have built-in support for storing XML data are usually not optimized to handle path-based queries, and their query performance leaves much to be desired. In specific cases where an XML schema definition is available, the structure and data types used in XML instance documents may be known. However, in cases where an XML schema definition is not available, and the documents to be searched do not conform to any schema, there are no efficient techniques for querying using path-based queries.
[0013] Without XML indexes, path expressions were directly evaluated against the base tables. As a result, the processing of these expressions involved a complete scan of the base tables. Each scanned row was tested to ascertain whether it satisfied the path expression. Moreover, the evaluation of the path expression was typically done in a functional manner by constructing a DOM (an in-memory data structure) and traversing the DOM tree while evaluating the path.
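The index-free evaluation described in [0013] can be illustrated with a minimal Python sketch. It assumes a hypothetical base table held in a dictionary, with row ids R1 and R2 holding the two purchase-order documents used as examples in this description. It uses the standard-library xml.etree.ElementTree module, whose limited XPath subset stands in for a full XPath engine. Every call scans every row and builds an in-memory tree for each document, which is the cost an XML index is meant to avoid.

```python
import xml.etree.ElementTree as ET

# Hypothetical base table: one XML document stored per row, keyed by row id.
BASE_TABLE = {
    "R1": "<PurchaseOrder><Reference>SBELL-2002100912333601PDT</Reference>"
          "<Actions><Action><User>SVOLLMAN</User></Action></Actions></PurchaseOrder>",
    "R2": "<PurchaseOrder><Reference>ABEL-20021127121040897PST</Reference>"
          "<Actions><Action><User>ZLOTKEY</User></Action>"
          "<Action><User>KING</User></Action></Actions></PurchaseOrder>",
}


def full_scan(path):
    """Evaluate an absolute path such as /PurchaseOrder/Actions/Action/User
    against every stored document with no index: scan, parse, walk."""
    root_step, _, rest = path.lstrip("/").partition("/")
    matches = []
    for rid, text in BASE_TABLE.items():      # complete scan of the base table
        root = ET.fromstring(text)            # in-memory tree built for every row
        if root.tag != root_step:
            continue
        # ElementTree evaluates the remaining steps relative to the root element.
        nodes = root.findall(rest) if rest else [root]
        matches.extend((rid, node.text) for node in nodes)
    return matches


print(full_scan("/PurchaseOrder/Actions/Action/User"))
# [('R1', 'SVOLLMAN'), ('R2', 'ZLOTKEY'), ('R2', 'KING')]
print(full_scan("/PurchaseOrder/Actions/Action[User='KING']/User"))
# [('R2', 'KING')]
```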
[0014] Based on the foregoing, there is a clear need to improve the processing time of path-based queries by providing a way for path-based queries to retrieve data from XML documents without incurring the problems associated with a complete scan of the base tables and construction of expensive memory data structures.
[0015] The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The present invention is illustrated by way of example, and not by way of limitation, in the figure of the accompanying drawing and in which like reference numerals refer to similar elements and in which:
[0017] FIG. 1 is a block diagram of a system upon which the techniques described herein may be implemented.
DETAILED DESCRIPTION
[0018] In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
XML INDEXES
[0019] U.S. Patent Application Serial No. 10/884,311, entitled INDEX FOR ACCESSING XML DATA, filed on July 2, 2004, describes various embodiments of an index that may be used to efficiently access XML documents, managed by a relational database server, based on path-based queries. Indexes for accessing XML data, such as the indexes disclosed in '311, shall be referred to herein as XML indexes.
[0020] An XML index provides a mechanism for indexing paths, values, and order information in XML documents. The actual XML data itself can reside in any form, such as CLOB (a character large object storing the actual XML text), O-R (an object-relational structured form used in the presence of an XML schema), or BLOB (a binary large object storing some binary form of the XML data). In one embodiment, an XML index consists of three logical structures, namely a path index, an order index, and a value index, which can reside in a single table, hereinafter path_table.
[0021] For the purpose of explanation, the techniques described herein are described in a context in which an XML index, as described in U.S. Patent Application Serial No. 10/884,311, is used to index the XML documents. However, the techniques are not limited to any specific index structure or mechanism.
EXAMPLE XML DOCUMENTS
[0022] For the purpose of explanation, examples shall be given hereafter with reference to the following two XML documents:
po1.xml
<PurchaseOrder>
  <Reference>SBELL-2002100912333601PDT</Reference>
  <Actions>
    <Action>
      <User>SVOLLMAN</User>
    </Action>
  </Actions>
</PurchaseOrder>
po2.xml
<PurchaseOrder>
  <Reference>ABEL-20021127121040897PST</Reference>
  <Actions>
    <Action>
      <User>ZLOTKEY</User>
    </Action>
    <Action>
      <User>KING</User>
    </Action>
  </Actions>
</PurchaseOrder>
[0023] As indicated above, po1.xml and po2.xml are merely two examples of XML documents. The techniques described herein are not limited to XML documents having any particular types, structure or content. Examples shall be given hereafter of how such documents would be indexed and accessed according to various embodiments of the invention.
THE XML INDEX
[0024] According to one embodiment, an XML index is a domain index that improves the performance of queries that include XPath-based predicates and/or XPath-based fragment extraction. An XML index can be built, for example, over both XML Schema-based and schema-less XMLType columns, which are stored either as CLOB or as structured storage. In one embodiment, an XML index is a logical index that results from the cooperative use of a path index, a value index, and an order index.
[0025] The path index provides the mechanism to look up fragments based on simple (navigational) path expressions. The value index provides lookups based on value equality or range. There could be multiple secondary value indexes, one per datatype. The order index associates hierarchical ordering information with indexed nodes. The order index is used to determine parent-child, ancestor-descendant and sibling relationships between XML nodes.
[0026] When the user submits a query involving XPaths (as predicate or fragment identifier), the user XPath is decomposed into a SQL query that accesses the XML index table. The generated query typically performs a set of path, value and order-constrained lookups and merges their results appropriately.
THE PATH TABLE
[0027] According to one embodiment, a logical XML index includes a PATH table and a set of secondary indexes. As mentioned above, each indexed XML document may include many indexed nodes. The PATH table contains one row per indexed node. For each indexed node, the PATH table row for the node contains various pieces of information associated with the node.
[0028] According to one embodiment, the information contained in the PATH table includes (1) a PATHID that indicates the path to the node, (2) "location data" for locating the fragment data for the node within the base structures, and (3) "hierarchy data" that indicates the position of the node within the structural hierarchy of the XML document that contains the node. Optionally, the PATH table may also contain value information for those nodes that are associated with values. Each of these types of information shall be described in greater detail below.
PATHS
[0029] The structure of an XML document establishes parent-child relationships between the nodes within the XML document. The "path" for a node in an XML document reflects the series of parent-child links, starting from a "root" node, to arrive at the particular node. For example, the path to the "User" node in po2.xml is /PurchaseOrder/Actions/Action/User, since the "User" node is a child of the "Action" node, the "Action" node is a child of the "Actions" node, and the "Actions" node is a child of the "PurchaseOrder" node.
[0030] The set of XML documents that an XML index indexes is referred to herein as the "indexed XML documents". According to one embodiment, an XML index may be built on all of the paths within all of the indexed XML documents, or a subset of the paths within the indexed XML documents. Techniques for specifying which paths are indexed are described hereafter. The set of paths that are indexed by a particular XML index are referred to herein as the "indexed XML paths".
PATH IDS
[0031] According to one embodiment, each of the indexed XML paths is assigned a unique path ID. For example, the paths that exist in po1.xml and po2.xml may be assigned path IDs as illustrated in the following table:
[Table: path IDs assigned to the paths in po1.xml and po2.xml (image not reproduced)]
[0032] Various techniques may be used to identify paths and assign path IDs to paths. For example, a user may explicitly enumerate paths, and specify corresponding path IDs for the paths thus identified. Alternatively, the database server may parse each XML document as the document is added to the set of indexed XML documents. During the parsing operation, the database server identifies any paths that have not already been assigned a path ID, and automatically assigns new path IDs to those paths. The pathid-to-path mapping may be stored within the database in a variety of ways. According to one embodiment, the pathid-to-path mapping is stored as metadata separate from the XML indexes themselves.
[0033] According to one embodiment, the same access structures are used for XML documents that conform to different schemas. Because the indexed XML documents may conform to different schemas, each XML document will typically contain only a subset of the paths to which pathids have been assigned.
LOCATION DATA
[0034] The location data associated with a node indicates where the XML document that contains the node resides within the base structures. Thus, the nature of the location data will vary from implementation to implementation based on the nature of the base structures. Depending on how the actual XML document is stored, the location data may also include a locator or logical pointer to point into the XML document. The logical pointer may be used for extracting fragments that are associated with nodes identified by XPaths.
[0035] For the purpose of explanation, it shall be assumed that (1) the base structures are tables within a relational database, and (2) each indexed XML document is stored in a corresponding row of a base table.
In such a context, the location data for a node may include, for example, (1) the rowid of the row, within the base table, in which the XML document containing the node is stored, and (2) a locator that provides fast access, within the XML document, to the fragment data that corresponds to the node.
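The automatic path ID assignment described in [0032] can be sketched as follows. The database server parses each newly added document and assigns the next unused ID to any path it has not seen before. This is a minimal Python sketch, not the patented implementation, and it makes three assumptions. It covers element paths only and ignores attributes. It keeps the mapping in an ordinary dictionary, which stands in for the separately stored metadata. The resulting IDs are illustrative, because the path ID table above is reproduced only as an image.

```python
import xml.etree.ElementTree as ET

PO1 = ("<PurchaseOrder><Reference>SBELL-2002100912333601PDT</Reference>"
       "<Actions><Action><User>SVOLLMAN</User></Action></Actions></PurchaseOrder>")
PO2 = ("<PurchaseOrder><Reference>ABEL-20021127121040897PST</Reference>"
       "<Actions><Action><User>ZLOTKEY</User></Action>"
       "<Action><User>KING</User></Action></Actions></PurchaseOrder>")


def assign_path_ids(documents, path_ids=None):
    """Parse each document and give every root-to-element path that has not
    been seen before the next unused path ID (element paths only)."""
    path_ids = {} if path_ids is None else path_ids

    def visit(elem, parent_path):
        path = parent_path + "/" + elem.tag
        if path not in path_ids:
            path_ids[path] = len(path_ids) + 1   # next unused ID
        for child in elem:
            visit(child, path)

    for text in documents:
        visit(ET.fromstring(text), "")
    return path_ids


mapping = assign_path_ids([PO1])      # po1.xml is added first
assign_path_ids([PO2], mapping)       # po2.xml introduces no new paths
for path, pid in mapping.items():
    print(pid, path)
```

Because po2.xml contains only paths already seen in po1.xml, adding it leaves the mapping unchanged. This illustrates why, per [0033], each document typically uses only a subset of the assigned path IDs.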
HIERARCHY DATA
[0036] The PATH table row for a node also includes information that indicates where the node resides within the hierarchical structure of the XML document containing the node. Such hierarchical information is referred to herein as the "OrderKey" of the node.
[0037] According to one embodiment, the hierarchical order information is represented using a Dewey-type value. Specifically, in one embodiment, the OrderKey of a node is created by appending a value to the OrderKey of the node's immediate parent, where the appended value indicates the position, among the children of the parent node, of that particular child node.
[0038] For example, assume that a particular node D is the child of a node C, which itself is a child of a node B that is a child of a node A. Assume further that node D has the OrderKey 1.2.4.3. The final "3" in the OrderKey indicates that node D is the third child of its parent node C. Similarly, the 4 indicates that node C is the fourth child of node B. The 2 indicates that node B is the second child of node A. The leading 1 indicates that node A is the root node (i.e. has no parent).
[0039] As mentioned above, the OrderKey of a child may be easily created by appending to the OrderKey of the parent a value that corresponds to the number of the child. Similarly, the OrderKey of the parent is easily derived from the OrderKey of the child by removing the last number in the OrderKey of the child.
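The Dewey-style OrderKey arithmetic of [0037] through [0039] can be sketched in a few lines of Python. Keys are modeled here as tuples of integers, which is an illustrative choice rather than the system representation. Positions count element children only, and the root element is given the key 1, as in the example above.

```python
import xml.etree.ElementTree as ET


def child_key(parent, position):
    """OrderKey of the position-th child (1-based): append position to the parent's key."""
    return parent + (position,)


def parent_key(key):
    """OrderKey of the immediate parent: drop the last component."""
    return key[:-1]


def as_text(key):
    return ".".join(str(c) for c in key)


def order_keys(xml_text):
    """(tag, OrderKey) for every element in document order; the root element gets 1."""
    out = []

    def visit(elem, key):
        out.append((elem.tag, as_text(key)))
        for pos, child in enumerate(elem, start=1):   # element children only
            visit(child, child_key(key, pos))

    visit(ET.fromstring(xml_text), (1,))
    return out


PO2 = ("<PurchaseOrder><Reference>ABEL-20021127121040897PST</Reference>"
       "<Actions><Action><User>ZLOTKEY</User></Action>"
       "<Action><User>KING</User></Action></Actions></PurchaseOrder>")

print(as_text(parent_key((1, 2, 4, 3))))   # parent C of node D: 1.2.4
for tag, key in order_keys(PO2):
    print(key, tag)
```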
[0040] According to one embodiment, the composite numbers represented by each OrderKey are converted into byte-comparable values, so that a mathematical comparison between two OrderKeys indicates the relative position, within the structural hierarchy of an XML document, of the nodes to which the OrderKeys correspond.
[0041] For example, the node associated with the OrderKey 1.2.7.7 precedes the node associated with the OrderKey 1.3.1 in the hierarchical structure of an XML document. Thus, the database server uses a conversion mechanism that converts OrderKey 1.2.7.7 to a first value and converts OrderKey 1.3.1 to a second value, where the first value is less than the second value. By comparing the two values, the database server can easily determine that the node associated with the first value precedes the node associated with the second value. Various conversion techniques may be used to achieve this result, and the invention is not limited to any particular conversion technique.
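One way to obtain the byte-comparable values of [0040] and [0041] is sketched below. The encoding is an assumption chosen for simplicity: each component becomes a fixed 4-byte, big-endian unsigned integer. It is not the conversion used by any particular product, and the text notes that many techniques are possible. Because every component has the same width, ordinary byte-string comparison follows document order. An ancestor's key is a byte prefix of its descendants' keys, so it sorts before them.

```python
import struct


def encode(order_key):
    """Byte-comparable form of a dotted OrderKey such as "1.2.7.7".

    Assumed scheme: each component becomes a 4-byte big-endian unsigned
    integer, so components must be below 2**32.
    """
    return b"".join(struct.pack(">I", int(c)) for c in order_key.split("."))


def precedes(a, b):
    """True if the node with OrderKey a comes before the node with OrderKey b."""
    return encode(a) < encode(b)


print(precedes("1.2.7.7", "1.3.1"))   # True
print(precedes("1.9", "1.10"))        # True, unlike a plain string comparison
print(sorted(["1.3.1", "1.2", "1.10", "1.2.7.7", "1"], key=encode))
# ['1', '1.2', '1.2.7.7', '1.3.1', '1.10']
```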
VALUE INFORMATION
[0042] Some nodes within an indexed document may be attribute nodes or nodes that correspond to simple elements. According to one embodiment, for attribute nodes and simple elements, the PATH table row also stores the actual value of the attributes and elements. Such values may be stored, for example, in a "value column" of the PATH table. The secondary "value indexes", which shall be described in greater detail hereafter, are built on the value column.
PATH TABLE EXAMPLE
[0043] According to one embodiment, the PATH table includes columns defined as specified in the following table:
[Table: column definitions of the PATH table (image not reproduced)]
[0044] As explained above, the PATHID is a number assigned to the node, and uniquely represents a fully expanded path to the node. The ORDER_KEY is a system representation of the Dewey ordering number associated with the node. According to one embodiment, the internal representation of the order key also preserves document ordering.
[0045] The VALUE column stores the effective text value for simple element (i.e. no element children) nodes and attribute nodes. According to one embodiment, adjacent text nodes are coalesced by concatenation. As shall be described in greater detail hereafter, a mechanism is provided to allow a user to customize the effective text value that gets stored in the VALUE column by specifying options during index creation; for example, the handling of mixed text, whitespace, and case sensitivity can be customized. The user can store the VALUE column in any number of formats, including a bounded RAW column or a BLOB. If the user chooses bounded storage, then any overflow during index creation is flagged as an error.
[0046] The following table is an example of a PATH table that (1) has the columns described above, and (2) is populated with entries for po1.xml and po2.xml. Specifically, each row of the PATH table corresponds to an indexed node of either po1.xml or po2.xml. In this example, it is assumed that po1.xml and po2.xml are respectively stored at rows R1 and R2 of a base table.
POPULATED PATH TABLE
[Table: PATH table populated with entries for po1.xml and po2.xml (image not reproduced)]
[0047] In this example, the rowid column stores a unique identifier for each row of the PATH table. Depending on the database system in which the PATH table is created, the rowid column may be an implicit column. For example, the disk location of a row may be used as the unique identifier for the row. As shall be described in greater detail hereafter, the secondary Order and Value indexes use the rowid values of the PATH table to locate rows within the PATH table.
[0048] In the embodiment illustrated above, the PATHID, ORDERKEY and VALUE of a node are all contained in a single table. In an alternative embodiment, separate tables may be used to map the PATHID, ORDERKEY and VALUE information to corresponding location data (e.g. the base table Rid and Locator).
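A PATH table like the one described in [0043] through [0047] can be populated by walking each base-table document once. The sketch below uses Python's built-in sqlite3 module as a stand-in database, with SQLite's implicit rowid playing the role of the rowid column. It makes several assumptions. Only element nodes are indexed, not attributes. The LOCATOR column is omitted. Path IDs are assigned in the order paths are first seen. OrderKeys use a 4-byte-per-component byte encoding.

```python
import sqlite3
import struct
import xml.etree.ElementTree as ET

PO1 = ("<PurchaseOrder><Reference>SBELL-2002100912333601PDT</Reference>"
       "<Actions><Action><User>SVOLLMAN</User></Action></Actions></PurchaseOrder>")
PO2 = ("<PurchaseOrder><Reference>ABEL-20021127121040897PST</Reference>"
       "<Actions><Action><User>ZLOTKEY</User></Action>"
       "<Action><User>KING</User></Action></Actions></PurchaseOrder>")


def encode(key):
    """Byte-comparable OrderKey: 4-byte big-endian per component (assumed scheme)."""
    return b"".join(struct.pack(">I", c) for c in key)


def build_path_table(conn, base_rows):
    """Create and populate a PATH table with one row per element node."""
    conn.execute("CREATE TABLE path_table "
                 "(pathid INTEGER, rid TEXT, order_key BLOB, value TEXT)")
    path_ids = {}

    def visit(elem, rid, parent_path, key):
        path = parent_path + "/" + elem.tag
        pathid = path_ids.setdefault(path, len(path_ids) + 1)
        children = list(elem)
        # Only simple elements (no element children) carry a VALUE.
        value = None if children else "".join(elem.itertext())
        conn.execute("INSERT INTO path_table VALUES (?, ?, ?, ?)",
                     (pathid, rid, encode(key), value))
        for pos, child in enumerate(children, start=1):
            visit(child, rid, path, key + (pos,))

    for rid, text in base_rows:
        visit(ET.fromstring(text), rid, "", (1,))


conn = sqlite3.connect(":memory:")
build_path_table(conn, [("R1", PO1), ("R2", PO2)])
# SQLite supplies the implicit rowid column mentioned in [0047].
for row in conn.execute("SELECT rowid, pathid, rid, value FROM path_table"):
    print(row)
```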
SECONDARY INDEXES
[0049] The PATH table includes the information required to locate the XML documents, or XML fragments, that satisfy a wide range of queries. However, without secondary access structures, using the PATH table to satisfy such queries will often require full scans of the PATH table. Therefore, according to one embodiment, a variety of secondary indexes are created by the database server to accelerate queries that (1) perform path lookups and/or (2) identify order-based relationships. According to one embodiment, the following secondary indexes are created on the PATH table:
• PATHID_INDEX on (pathid, rid)
• ORDERKEY_INDEX on (rid, order_key)
• VALUE INDEXES
• PARENT_ORDERKEY_INDEX on (rid, SYS_DEWEY_PARENT(order_key))
PATHID_INDEX
[0050] The PATHID_INDEX is built on the pathid and rid columns of the PATH table. Thus, entries in the PATHID_INDEX are in the form (keyvalue, rowid), where keyvalue is a composite value representing a particular pathid/rid combination, and rowid identifies a particular row of the PATH table.
[0051] When (1) the pathid of a node and (2) the base table row are known, the PATHID_INDEX may be used to quickly locate the row, within the PATH table, for the node. For example, based on the key value "3.R1", the PATHID_INDEX may be traversed to find the entry that is associated with the key value "3.R1". Assuming that the PATH table is populated as illustrated above, the index entry would have a rowid value of 3. The rowid value of 3 points to the third row of the PATH table, which is the row for the node associated with the pathid 3 and the rid R1.
THE ORDERKEY_INDEX
[0052] The ORDERKEY_INDEX is built on the rid and orderkey columns of the PATH table. Thus, entries in the ORDERKEY_INDEX are in the form (keyvalue, rowid), where keyvalue is a composite value representing a particular rid/orderkey combination, and rowid identifies a particular row of the PATH table.
[0053] When (1) the base table row and (2) the orderkey of a node are known, the ORDERKEY_INDEX may be used to quickly locate the row, within the PATH table, for the node. For example, based on the key value "R1.'1.2'", the ORDERKEY_INDEX may be traversed to find the entry that is associated with the key value "R1.'1.2'". Assuming that the PATH table is populated as illustrated above, the index entry would have a rowid value of 3. The rowid value of 3 points to the third row of the PATH table, which is the row for the node associated with the orderkey 1.2 and the rid R1.
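The PATHID_INDEX and ORDERKEY_INDEX lookups of [0050] through [0053] map naturally onto ordinary composite B-tree indexes. The sketch below recreates them in SQLite with hard-coded rows for po1.xml (R1) and po2.xml (R2). The path IDs, the OrderKey encoding and the index names are illustrative assumptions. EXPLAIN QUERY PLAN can show whether SQLite's planner actually uses a given index; the sketch does not depend on it.

```python
import sqlite3
import struct


def encode(dewey):
    """Byte-comparable OrderKey: 4-byte big-endian per component (assumed scheme)."""
    return b"".join(struct.pack(">I", int(c)) for c in dewey.split("."))


# (pathid, rid, OrderKey, value) in document order; the path IDs are hypothetical.
ROWS = [
    (1, "R1", "1", None), (2, "R1", "1.1", "SBELL-2002100912333601PDT"),
    (3, "R1", "1.2", None), (4, "R1", "1.2.1", None), (5, "R1", "1.2.1.1", "SVOLLMAN"),
    (1, "R2", "1", None), (2, "R2", "1.1", "ABEL-20021127121040897PST"),
    (3, "R2", "1.2", None), (4, "R2", "1.2.1", None), (5, "R2", "1.2.1.1", "ZLOTKEY"),
    (4, "R2", "1.2.2", None), (5, "R2", "1.2.2.1", "KING"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE path_table (pathid INTEGER, rid TEXT, order_key BLOB, value TEXT)")
conn.executemany("INSERT INTO path_table VALUES (?, ?, ?, ?)",
                 [(p, r, encode(k), v) for p, r, k, v in ROWS])
# Secondary indexes mirroring PATHID_INDEX and ORDERKEY_INDEX.
conn.execute("CREATE INDEX pathid_index ON path_table (pathid, rid)")
conn.execute("CREATE INDEX orderkey_index ON path_table (rid, order_key)")


def rows_for_path(pathid, rid):
    """PATH table rowids for a known pathid within one base-table row."""
    sql = "SELECT rowid FROM path_table WHERE pathid = ? AND rid = ? ORDER BY rowid"
    return [r[0] for r in conn.execute(sql, (pathid, rid))]


def row_for_orderkey(rid, order_key):
    """PATH table rowid for a known base-table row and OrderKey, or None."""
    sql = "SELECT rowid FROM path_table WHERE rid = ? AND order_key = ?"
    hit = conn.execute(sql, (rid, encode(order_key))).fetchone()
    return hit[0] if hit else None


print(rows_for_path(3, "R1"))          # [3], as in the "3.R1" example
print(row_for_orderkey("R1", "1.2"))   # 3, as in the "R1.'1.2'" example
print(rows_for_path(4, "R2"))          # [9, 11]: both Action nodes of po2.xml
```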
THE VALUE INDEXES
[0054] Just as queries based on path lookups can be accelerated using the PATHID_INDEX, queries based on value lookups can be accelerated by indexes built on the value column of the PATH table. However, the value column of the PATH table can hold values for a variety of data types. Therefore, according to one embodiment, a separate value index is built for each data type stored in the value column. Thus, in an implementation in which the value column holds strings, numbers and timestamps, the following value (secondary) indexes are also created:
• STRING_INDEX on SYS_XMLVALUE_TO_STRING(value)
• NUMBER_INDEX on SYS_XMLVALUE_TO_NUMBER(value)
• TIMESTAMP_INDEX on SYS_XMLVALUE_TO_TIMESTAMP(value)
[0055] These value indexes are used to perform datatype-based comparisons (equality and range). For example, the NUMBER value index is used to handle number-based comparisons within user XPaths. Entries in the NUMBER_INDEX may, for example, be in the form (number, rowid), where the rowid points to a row, within the PATH table, for a node associated with the value of "number". Similarly, entries within the STRING_INDEX may have the form (string, rowid), and entries within the TIMESTAMP_INDEX may have the form (timestamp, rowid).
[0056] The format of the values in the PATH table may not correspond to the native format of the data type. Therefore, when using the value indexes, the database server may call conversion functions to convert the value bytes from the stored format to the specified datatype. In addition, the database server applies any necessary transformations, as shall be described hereafter. According to one embodiment, the conversion functions operate on both RAW and BLOB values and return NULL if the conversion is not possible.
[0057] By default, the value indexes are created when the XML index is created. However, users can suppress the creation of one or more value indexes based on knowledge of the query workload. For example, if all XPath predicates involve only string comparisons, the NUMBER and TIMESTAMP value indexes need not be created.
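The per-datatype value indexes of [0054] through [0056] can be sketched as sorted lists of (typed value, rowid) pairs. The conversion functions below are hypothetical stand-ins for SYS_XMLVALUE_TO_NUMBER and SYS_XMLVALUE_TO_TIMESTAMP. Like the functions described above, they return None (the Python counterpart of NULL) when a stored value cannot be converted, and such rows are simply left out of that datatype's index. Timestamps are assumed to be stored in ISO 8601 form, and the sample values are invented for illustration.

```python
import bisect
from datetime import datetime


def to_number(raw):
    """Stored value -> number, or None when the text is not numeric."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def to_timestamp(raw):
    """Stored value -> timestamp, or None when the text is not an ISO 8601 timestamp."""
    try:
        return datetime.fromisoformat(raw)
    except (TypeError, ValueError):
        return None


def build_value_index(path_rows, convert):
    """Sorted (typed value, rowid) entries; rows whose value does not convert are left out."""
    entries = []
    for rowid, raw in path_rows:
        typed = convert(raw)
        if typed is not None:
            entries.append((typed, rowid))
    entries.sort()
    return entries


def range_lookup(index, low, high):
    """Rowids whose typed value lies in the closed range [low, high]."""
    keys = [typed for typed, _ in index]
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high)
    return [rowid for _, rowid in index[start:stop]]


# Hypothetical (rowid, VALUE) pairs from a PATH table; None marks a complex element.
PATH_VALUES = [(1, None), (2, "42"), (3, "SBELL"), (4, "7.5"), (5, "2004-04-09T10:00:00")]

number_index = build_value_index(PATH_VALUES, to_number)
timestamp_index = build_value_index(PATH_VALUES, to_timestamp)
print(number_index)                          # [(7.5, 4), (42.0, 2)]
print(range_lookup(number_index, 10, 100))   # [2]
```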
PARENT_ORDERKEY_INDEX
[0058] According to one embodiment, the set of secondary indexes built on the PATH table includes a PARENT_ORDERKEY_INDEX. Similar to the ORDERKEY_INDEX, the PARENT_ORDERKEY_INDEX is built on the rid and order_key columns of the PATH table. Consequently, the index entries of the PARENT_ORDERKEY_INDEX have the form (keyvalue, rowid), where keyvalue is a composite value that corresponds to a particular rid/order_key combination. However, unlike the ORDERKEY_INDEX, the rowid in a PARENT_ORDERKEY_INDEX entry does not point to the PATH table row that has the particular rid/order_key combination. Rather, the rowid of each PARENT_ORDERKEY_INDEX entry points to the PATH table row of the node that is the immediate parent of the node associated with the rid/order_key combination.
[0059] For example, in the populated PATH table illustrated above, the rid/order_key combination "R1.'1.2'" corresponds to the node in row 3 of the PATH table. The immediate parent of the node in row 3 of the PATH table is the node represented by row 1 of the PATH table. Consequently, the PARENT_ORDERKEY_INDEX entry associated with the "R1.'1.2'" key value would have a rowid that points to row 1 of the PATH table.
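The parent lookup of [0058] and [0059] can be sketched as a self-join on the parent's OrderKey. In the sketch, dewey_parent is a hypothetical Python stand-in for SYS_DEWEY_PARENT, registered with SQLite through create_function. It assumes the same 4-byte-per-component OrderKey encoding used in the other examples. It also computes the parent key at query time instead of maintaining a PARENT_ORDERKEY_INDEX, so it shows the relationship the index captures, not the index itself.

```python
import sqlite3
import struct

WIDTH = 4  # bytes per OrderKey component in the assumed encoding


def encode(dewey):
    return b"".join(struct.pack(">I", int(c)) for c in dewey.split("."))


def dewey_parent(order_key):
    """Hypothetical stand-in for SYS_DEWEY_PARENT: drop the last component.
    Returns None (SQL NULL) for a root key, which has no parent."""
    return order_key[:-WIDTH] if len(order_key) > WIDTH else None


# (pathid, rid, OrderKey, value) for po1.xml (R1) and po2.xml (R2); hypothetical path IDs.
ROWS = [
    (1, "R1", "1", None), (2, "R1", "1.1", "SBELL-2002100912333601PDT"),
    (3, "R1", "1.2", None), (4, "R1", "1.2.1", None), (5, "R1", "1.2.1.1", "SVOLLMAN"),
    (1, "R2", "1", None), (2, "R2", "1.1", "ABEL-20021127121040897PST"),
    (3, "R2", "1.2", None), (4, "R2", "1.2.1", None), (5, "R2", "1.2.1.1", "ZLOTKEY"),
    (4, "R2", "1.2.2", None), (5, "R2", "1.2.2.1", "KING"),
]

conn = sqlite3.connect(":memory:")
conn.create_function("dewey_parent", 1, dewey_parent)
conn.execute("CREATE TABLE path_table (pathid INTEGER, rid TEXT, order_key BLOB, value TEXT)")
conn.executemany("INSERT INTO path_table VALUES (?, ?, ?, ?)",
                 [(p, r, encode(k), v) for p, r, k, v in ROWS])

PARENT_SQL = """
    SELECT parent.rowid
    FROM path_table child
    JOIN path_table parent
      ON parent.rid = child.rid
     AND parent.order_key = dewey_parent(child.order_key)
    WHERE child.rowid = ?
"""


def parent_row(child_rowid):
    """Rowid of the immediate parent's PATH table row, or None for a root node."""
    hit = conn.execute(PARENT_SQL, (child_rowid,)).fetchone()
    return hit[0] if hit else None


print(parent_row(3))    # 1, as in the "R1.'1.2'" example
print(parent_row(12))   # 11: the second Action node of po2.xml
```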
USING THE XML INDEX FOR PATH EXPRESSION EVALUATION
[0060] As described above, an XML index indexes nodes, within XML documents, based on the paths to the nodes. The following are examples of path expressions that a path-based query may include:
• /po/action/action
• /po[id="abc"]
• /po//action
[0061] The path components contained in the XML index may be used to efficiently evaluate path expressions. Typically, evaluating path expressions from the indexed path components is much faster than evaluating path expressions against the base tables, which would require a complete scan of the original XML documents.
[0062] As shall be described in more detail hereafter, techniques are provided for rewriting path expressions, such as those contained within path-based queries, to access the XML index. According to one embodiment, the techniques involve identifying a path specified in the input query, identifying a template that corresponds to the format of the specified path, and generating, based on rules associated with the template, an "index-enabled" query that uses the XML index to locate the XML data that corresponds to the specified path. The index-enabled query may be, for example, a SQLX query (a SQL query that may include XML-specific operators).
[0063] More specifically, the techniques may involve (1) decomposing a generic path expression into simpler components such as simple paths, predicates, and structural joins; (2) generating a SQL query against tables of the XML index, which may involve expressing the structural joins using SQL predicates on the Dewey order keys of the indexed path components; and (3) efficiently extracting fragments using locators that point to the original data. During fragment extraction, the database server uses the XML index to return the locators; the actual XML data at those locators is then read and supplied to the user.
TRANSLATION OF PATH EXPRESSIONS
[0064] The following templates define how, in one embodiment, index-enabled queries are generated based on path expressions, where the index-enabled queries access the path_table of the XML index. As mentioned above, the path expression of a path-based query, or fragments thereof, are matched against templates. Each template is associated with a rule. When a fragment of a specified path is in a format that matches a template, the corresponding rule is used to generate SQL for an index-enabled query.
[0065] Templates, and the corresponding rules, of one embodiment are described in detail hereafter. The templates given as examples hereafter include templates that correspond to simple path expressions, filter expressions, descendant axis expressions, wildcard expressions, logical expressions, relational expressions, literals, casting nodesets to Boolean expressions, and text functions. The translation of path expressions into SQL for accessing the path_table of the XML index is not limited to the specific examples given hereafter. The templates are simply examples of how, in one embodiment, such translation may be performed.
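The template-matching step of [0064] can be pictured as a dispatcher that classifies the top-level shape of a path expression. The regular expressions below are assumptions made for illustration. They recognize only the simple, filter, descendant and wildcard shapes named in [0065] and do not parse nested expressions. The result is a sketch of the dispatch idea, not a real XPath parser.

```python
import re

# A simple path: one or more "/name" child steps, with no filters or wildcards.
SIMPLE = re.compile(r"^(?:/[A-Za-z_][\w.-]*)+$")
# A filter at the end of the expression: P1[F(P2)].
FILTER = re.compile(r"^(?P<p1>[^\[\]]+)\[(?P<filter>.+)\]$")


def match_template(expr):
    """Return (template name, parts) for the top-level shape of a path expression."""
    if SIMPLE.match(expr):
        return "simple", {"path": expr}
    m = FILTER.match(expr)
    if m:
        return "filter", m.groupdict()
    if "//" in expr:
        return "descendant", {"path": expr}
    if "*" in expr:
        return "wildcard", {"path": expr}
    raise ValueError(f"no template matches {expr!r}")


for e in ["/po/action/action", '/po[id="abc"]', "/po//action", "/po/*"]:
    print(e, "->", match_template(e))
```

Each name returned here would select the corresponding rule, such as the simple-path and filter rules that follow, to generate SQL against the path_table.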
SIMPLE PATH EXPRESSIONS
[0066] Simple paths are expressions of the type "/a/b/c" that have no axes other than child axes and no filters on the axis steps. Simple paths, such as "/a/b/c", are rewritten, according to one embodiment, to the following SQL:
/a/b/c ->
  select pt1.pathid, pt1.rid, pt1.order_key, pt1.locator, pt1.value
  from path_table pt1
  where pt1.pathid = :B1
where :B1 = pathid('/a/b/c'), and where pathid() denotes an internal function used to look up the pathid associated with the given path.
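As a runnable illustration of the simple-path rule, the following sketch uses Python's built-in sqlite3 module as a stand-in for the database server. The table layouts, the path_ids lookup table, the zero-padded dotted order keys, and the sample rows are all hypothetical; only the SELECT statement follows the rule above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: path_table mirrors the columns used by the rules;
# path_ids stands in for the internal pathid() dictionary.
conn.executescript("""
    create table path_ids (pathid integer primary key, path text unique);
    create table path_table (pathid integer, rid integer, order_key text,
                             locator text, value text);
    create index path_table_pathid on path_table (pathid);
""")
conn.executemany("insert into path_ids values (?, ?)",
                 [(1, "/a"), (2, "/a/b"), (3, "/a/b/c")])
conn.executemany("insert into path_table values (?, ?, ?, ?, ?)", [
    (1, 10, "01", None, None),          # document rid=10: <a>
    (2, 10, "01.01", None, None),       #   <b>
    (3, 10, "01.01.01", None, "x"),     #     <c>x</c>
    (3, 10, "01.01.02", None, "y"),     #     <c>y</c>
    (1, 11, "01", None, None),          # document rid=11: <a>
    (2, 11, "01.01", None, None),
    (3, 11, "01.01.01", None, "z"),
])


def pathid(path: str):
    row = conn.execute("select pathid from path_ids where path = ?",
                       (path,)).fetchone()
    return row[0] if row else None


# The simple-path rule: a single indexed lookup on pathid, no document scan.
SIMPLE_PATH_SQL = """
    select pt1.pathid, pt1.rid, pt1.order_key, pt1.locator, pt1.value
    from path_table pt1
    where pt1.pathid = :B1
"""
rows = conn.execute(SIMPLE_PATH_SQL, {"B1": pathid("/a/b/c")}).fetchall()
for row in rows:
    print(row)
```

The query returns the three c nodes from both documents; an unknown path binds NULL and returns no rows.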
FILTER EXPRESSIONS
[0067] Filter expressions are expressions of the type "P1[F(P2)]", where P1 is a path expression and F is a filter defined on the relative path P2. An example of a filter expression using the XML documents described above is /PurchaseOrder/Actions/Action[User = "King"]. In this example, P1 corresponds to "/PurchaseOrder/Actions/Action" and P2 corresponds to "User". Filter expressions, such as "P1[F(P2)]", are rewritten, according to one embodiment, to the following SQL:
P1[F(P2)] ->
  select pt1.pathid, pt1.rid, pt1.order_key, pt1.locator, pt1.value
  from (sql1) pt1
  where exists (select pt2.pathid, pt2.rid, pt2.order_key, pt2.locator, pt2.value
    [portion of rule shown in Figure imgf000018_0001]
    where pt2.order_key > pt1.order_key and pt2.order_key < maxkey(pt1.order_key) and
    [portion of rule shown in Figure imgf000018_0002]
[0068] where sql1 and sql2 represent the SQL generated for evaluating P1 and P1/P2, respectively. maxkey() is an internal function that takes the order key of a node as input and generates a key that is greater than the key of any descendant of that node.
[0069] An example of a filter expression that conforms to this template is /Po[id = "1"]. In this example, /Po corresponds to P1 of the template, "id" corresponds to P2, and "= 1" corresponds to F. Thus, in this example:
  o sql1 is the SQL generated, by application of the other rules described herein, for evaluating the path expression /Po;
  o similarly, sql2 is the SQL generated, by application of the other rules described herein, for evaluating the path expression /Po/id.
[0070] In this example, both P1 (i.e., /Po) and P1/P2 (i.e., /Po/id) are simple path expressions. Therefore, the rule for simple path expressions, described above, is used to determine the SQL for sql1 and sql2.
[0071] For the purpose of explanation, the rows produced by sql1 are referred to herein as the sql1 rows, and the rows produced by sql2 as the sql2 rows. The SQL generated by the rule of this template therefore selects, from the sql1 rows, only those rows whose order keys correspond to nodes that are ancestors (in this example, parents) of nodes returned by sql2.
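The following sqlite3 sketch runs the filter rule for /Po[id = "1"]. Two details are assumptions, because the corresponding lines of the rule appear only in the figures: sql2 is taken to apply the filter F (value = '1') to the /Po/id rows, and pt2 is taken to range over the sql2 rows of the same document (pt1.rid = pt2.rid). Order keys are modeled as zero-padded dotted strings so that SQLite's string comparison follows document order, and maxkey() is a hypothetical Python user-defined function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table path_table (pathid integer, rid integer,
                order_key text, locator text, value text)""")
# Hypothetical pathids: 1 = /Po, 2 = /Po/id. Two documents with identical
# structure (and therefore identical order keys) but different id values.
conn.executemany("insert into path_table values (?, ?, ?, ?, ?)", [
    (1, 10, "01", None, None), (2, 10, "01.01", None, "1"),
    (1, 11, "01", None, None), (2, 11, "01.01", None, "2"),
])


def maxkey(key: str) -> str:
    """Hypothetical maxkey() for zero-padded dotted keys: bump the last
    component, giving a key greater than every descendant of `key`."""
    parts = key.split(".")
    parts[-1] = f"{int(parts[-1]) + 1:02d}"
    return ".".join(parts)


conn.create_function("maxkey", 1, maxkey)

SQL1 = "select * from path_table where pathid = 1"                   # /Po
SQL2 = "select * from path_table where pathid = 2 and value = '1'"   # /Po/id with F (assumed)

FILTER_SQL = f"""
    select pt1.pathid, pt1.rid, pt1.order_key, pt1.locator, pt1.value
    from ({SQL1}) pt1
    where exists (select pt2.pathid from ({SQL2}) pt2
                  where pt2.order_key > pt1.order_key
                    and pt2.order_key < maxkey(pt1.order_key)
                    and pt1.rid = pt2.rid)
"""
print(conn.execute(FILTER_SQL).fetchall())  # [(1, 10, '01', None, None)]
```

Dropping the rid comparison would also make the key range test succeed for document 11, since its order keys coincide with those of document 10; only the filter on value keeps it out here.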
DESCENDANT AXIS EXPRESSIONS
[0072] Descendant axis expressions are expressions of the type "P1//P2", where P1 and P2 are path expressions. A simple example of a descendant axis expression is /PurchaseOrder//User, which selects all the User elements that are descendants (child, grandchild, and so on) of the PurchaseOrder element. Descendant axis expressions are rewritten, according to one embodiment, using the following SQL:
P1//P2 ->
  select pt2.pathid, pt2.rid, pt2.order_key, pt2.locator, pt2.value
  from path_table pt2
  where pt2.pathid in (select * from sys_xdbpathsuffix(:B1))
    and exists (select pt1.* from (sql) pt1
                where pt2.order_key > pt1.order_key
                  and pt2.order_key < maxkey(pt1.order_key)
                  and pt1.rid = pt2.rid)
where :B1 = pathid(P2), in which P2 is a simple path; sql corresponds to the rewritten SQL for expression P1; and sys_xdbpathsuffix() is a table function that generates the path ids of all nodes whose paths have P2 as a suffix, i.e., all //P2 paths.
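A pure-Python sketch of the descendant axis rule follows. path_suffix_ids() is a hypothetical stand-in for sys_xdbpathsuffix(), the path dictionary and rows are invented sample data, and order keys are modeled as integer tuples. Candidates are first narrowed by path suffix, then kept only if they lie strictly inside some P1 node of the same document.

```python
from typing import Dict, List, NamedTuple, Tuple

OrderKey = Tuple[int, ...]

# Hypothetical path dictionary (pathid -> path).
PATHS: Dict[int, str] = {
    1: "/PurchaseOrder",
    2: "/PurchaseOrder/User",
    3: "/PurchaseOrder/Actions",
    4: "/PurchaseOrder/Actions/Action",
    5: "/PurchaseOrder/Actions/Action/User",
}


class Row(NamedTuple):
    pathid: int
    rid: int
    order_key: OrderKey


# Hypothetical index rows for one document (rid 10).
ROWS = [Row(1, 10, (1,)), Row(2, 10, (1, 1)), Row(3, 10, (1, 2)),
        Row(4, 10, (1, 2, 1)), Row(5, 10, (1, 2, 1, 1))]


def path_suffix_ids(suffix: str) -> List[int]:
    """Stand-in for sys_xdbpathsuffix(): pathids of all //suffix paths."""
    want = suffix.strip("/").split("/")
    return [pid for pid, path in PATHS.items()
            if path.strip("/").split("/")[-len(want):] == want]


def maxkey(key: OrderKey) -> OrderKey:
    return key[:-1] + (key[-1] + 1,)


def descendant_axis(p1_rows: List[Row], p2: str) -> List[Row]:
    """P1//P2: rows on a //P2 path lying strictly inside some P1 node
    of the same document."""
    ids = set(path_suffix_ids(p2))
    return [r for r in ROWS if r.pathid in ids and any(
        a.rid == r.rid and a.order_key < r.order_key < maxkey(a.order_key)
        for a in p1_rows)]


po = [r for r in ROWS if r.pathid == 1]       # rows for /PurchaseOrder
actions = [r for r in ROWS if r.pathid == 3]  # rows for /PurchaseOrder/Actions
print([r.pathid for r in descendant_axis(po, "User")])       # [2, 5]
print([r.pathid for r in descendant_axis(actions, "User")])  # [5]
```

The wildcard rule in the next section adds a depth() comparison to the same range test, so that only nodes at a fixed number of levels below P1 qualify.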
WILDCARD EXPRESSIONS
[0073] Wildcard expressions are expressions of the type "P1/*/P2". A simple example is "/PurchaseOrder/*/Action", which selects all the Action elements that are grandchildren (and only grandchildren) of the PurchaseOrder element. An expression may also contain multiple wildcards (*), such as "P1/*/*/*/P2", which indicates that the first element of path P2 is a great-great-grandchild of the last element of P1. Wildcard expressions are rewritten, according to one embodiment, using the following SQL:
P1/*/P2 ->
  select pt2.pathid, pt2.rid, pt2.order_key, pt2.locator, pt2.value
  from path_table pt2, (sql) pt1
  where pt2.pathid in (select * from sys_xdbpathsuffix(:B1))
    and exists (select pt1.*
      [portion of rule shown in Figure imgf000020_0001]
      where pt2.order_key > pt1.order_key and pt2.order_key < maxkey(pt1.order_key)
        and depth(pt2.order_key) = depth(pt1.order_key) + 1 and
      [portion of rule shown in Figure imgf000020_0002]
where :B1 equals pathid(//P2), in which P2 is a simple expression; sql corresponds to the rewritten SQL for the expression P1; and depth() is an internal function that, given the order key of a node, computes the depth of the node.
LOGICAL EXPRESSIONS
[0074] Logical expressions are expressions of the type "E1 op E2", where E1 and E2 are expressions and op is either a logical AND or a logical OR. Logical expressions, such as "E1 op E2", are rewritten, according to one embodiment, using the following rule:
E1 op E2 -> select 1 as value from dual where (sql1 > 0) op (sql2 > 0)
where sql1 and sql2 represent the SQL generated for E1 and E2, respectively. In this rule, dual is a dummy table that contains no information but is needed for proper SQL syntax.
[0075] Since E1 and E2 are combined by a Boolean operator, E1 and E2 should be expressions that produce Boolean values. In the present example, the value 1 represents true and the value 0 represents false. Thus, the condition "sql1 > 0" is true if E1 evaluates to 1 and false if E1 evaluates to 0. Similarly, the condition "sql2 > 0" is true if E2 evaluates to 1 and false if E2 evaluates to 0.
[0076] The predicate "(sql1 > 0) op (sql2 > 0)" is therefore true only if either (1) op is AND and both E1 and E2 are true, or (2) op is OR and at least one of E1 and E2 is true. If the predicate is true, the "select 1" statement returns a 1 (indicating true). If the predicate is false, no row is returned.
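The 1/0 Boolean encoding can be checked with a small sqlite3 sketch, taking E1 and E2 from Boolean literal rules like those given under LITERALS. The one-row dual table is created only because SQLite lacks Oracle's DUAL, and the logical() and run() helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no Oracle-style DUAL table, so create a one-row stand-in.
conn.execute("create table dual (dummy text)")
conn.execute("insert into dual values ('X')")

TRUE_SQL = "select 1 as value from dual"   # Boolean true()
FALSE_SQL = "select 0 as value from dual"  # Boolean false()


def logical(sql1: str, op: str, sql2: str) -> str:
    """E1 op E2 -> select 1 as value from dual where (sql1 > 0) op (sql2 > 0)."""
    return (f"select 1 as value from dual "
            f"where (({sql1}) > 0) {op} (({sql2}) > 0)")


def run(sql: str):
    return conn.execute(sql).fetchall()


print(run(logical(TRUE_SQL, "and", FALSE_SQL)))  # []
print(run(logical(TRUE_SQL, "or", FALSE_SQL)))   # [(1,)]
```

Because a false result is an empty row set, a nested logical subquery evaluates to NULL; NULL > 0 is never true, so a false inner expression still behaves as false when it is combined again with AND or OR.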
RELATIONAL EXPRESSIONS
[0077] Relational expressions are expressions of the type "E1 op E2", where E1 and E2 are expressions and op is a relational operator that maps to one of =, !=, >, <, and so on. Relational expressions are rewritten, according to one embodiment, using the following rule:
E1 op E2 ->
  select (case when a1.value op a2.value then 1 else 0 end) as value
  from (sql1) a1, (sql2) a2
where sql1 and sql2 represent the SQL generated for E1 and E2, respectively. If the condition in the CASE WHEN clause is true, 1 is returned; otherwise, 0 is returned.
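A sqlite3 sketch of the relational rule, with both operands supplied by literal rules like those in the next section. The helper names and the one-row dual table are assumptions made so the SQL runs in SQLite. If sql1 or sql2 returned several rows (a nodeset), the cross join would yield one 0/1 row per pair of values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table dual (dummy text)")  # stand-in for Oracle's DUAL
conn.execute("insert into dual values ('X')")


def number_literal(n) -> str:
    return f"select {n} as value from dual"


def string_literal(s: str) -> str:
    escaped = s.replace("'", "''")  # double embedded quotes for SQL
    return f"select '{escaped}' as value from dual"


def relational(sql1: str, op: str, sql2: str) -> str:
    """E1 op E2 -> 1 when a1.value op a2.value holds, else 0."""
    return (f"select (case when a1.value {op} a2.value then 1 else 0 end) "
            f"as value from ({sql1}) a1, ({sql2}) a2")


print(conn.execute(relational(number_literal(3), ">", number_literal(2))).fetchall())
print(conn.execute(relational(string_literal("King"), "=", string_literal("Scott"))).fetchall())
```

The first query returns [(1,)] and the second [(0,)].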
LITERALS
[0078] Literals are stand-alone values, such as numbers, strings, and the Booleans true and false. Literals are rewritten, according to one embodiment, using the following rules:
Number literal n -> select n as value
  [portion of rule shown in Figure imgf000022_0001]
String literal s -> select 's' as value from dual
Boolean false() -> select 0 as value from dual
Boolean true() -> select 1 as value from dual
where dual is a dummy table that contains no information but is needed for proper SQL syntax.
CASTING NODESETS TO BOOLEANS
[0079] Another feature that helps in translating paths into SQL for querying the XML index is the casting operator. Nodesets are converted to Booleans when a cast operator appears in the path expression. For example, a cast is implicit in an expression of the form /a[b], which selects all the "a" elements that have a "b" child element. In these cases, the following SQL, according to one embodiment, is used to generate a Boolean:
/a[b] -> select count(*) as value from (sql)
where sql is the SQL generated from the path "/a/b", and count() is a function that counts the number of nodes returned by sql. If that number is greater than zero, the statement returns a positive value; otherwise, it returns zero.
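The cast rule can be sketched with sqlite3; the schema, sample rows, and helper functions are hypothetical. As the rule is written, the count covers every row returned by sql, so a positive result means at least one matching node exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table path_table (pathid integer, rid integer,
                order_key text, locator text, value text)""")
# Hypothetical pathids: 1 = /a, 2 = /a/b, 3 = a path with no nodes.
# Only document 10 has a <b> child.
conn.executemany("insert into path_table values (?, ?, ?, ?, ?)", [
    (1, 10, "01", None, None), (2, 10, "01.01", None, None),
    (1, 11, "01", None, None),
])


def simple_path(pathid: int) -> str:
    return ("select pt1.pathid, pt1.rid, pt1.order_key, pt1.locator, pt1.value "
            f"from path_table pt1 where pt1.pathid = {pathid}")


def cast_to_boolean(sql: str) -> str:
    """Nodeset -> Boolean: a positive count means true, zero means false."""
    return f"select count(*) as value from ({sql})"


print(conn.execute(cast_to_boolean(simple_path(2))).fetchone())  # (1,)
print(conn.execute(cast_to_boolean(simple_path(3))).fetchone())  # (0,)
```

The result plugs directly into the logical rule, whose "(sql > 0)" test treats any positive count as true.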
TEXT FUNCTION
[0080] Another useful expression to convert to SQL is the text function. Expressions of the form "P1/text()" are rewritten, according to one embodiment, to the following SQL expression:
P1/text() -> select pt.value from (sql) pt
where sql is the rewritten SQL corresponding to path P1.
TRANSLATION OF SQL/XML CONSTRUCTS
[0081] To take further advantage of an XML index, it is useful not only to translate path expressions into SQL for querying the XML index, but also to translate SQL/XML constructs so that they are applied against the XML index. This allows for more user-friendly SQL coding. Therefore, in another embodiment of the invention, four SQL/XML constructs are translated: the existsNode, extractValue, extract, and XMLSequence operators. The invention is not limited to the specific examples given hereafter for each SQL/XML construct; each translation illustrates one way such translation may be performed.
EXISTSNODE OPERATOR
[0082] The existsNode operator determines whether a particular node, specified by a path, exists in an XML document. If the node is located in an XML document, and consequently in the XML index, a 1, signifying true, is returned. The existsNode operator is rewritten, according to one embodiment, using the following rule:
select ... from xmltab T where existsNode(value(T), P) = 1
->
select ... from xmltab T
where exists (select 1
  [portion of rule shown in Figure imgf000024_0001]
  where Q.rid = T.rowid)
where sql is the SQL obtained by applying the rules defined in the previous sections to the path expression P, and xmltab is the base table that contains the XML documents. The where clause of this rule requires that the rowid of the base-table row containing the XML document equal the rid of at least one tuple in the results of sql. Because the XML index spans multiple XML documents, this condition ensures that only the XML document currently being considered is searched, rather than all the XML documents in the base table.
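The sketch below replays the existsNode rewrite in SQLite, whose tables expose an implicit rowid much like Oracle's ROWID. It assumes the omitted line of the rule ranges Q over the rows of sql; the schema, pathids, and sample documents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table xmltab (doc text);
    create table path_table (pathid integer, rid integer, order_key text,
                             locator text, value text);
""")
conn.executemany("insert into xmltab (rowid, doc) values (?, ?)",
                 [(1, "<po><id>1</id></po>"), (2, "<po/>")])
# Hypothetical pathids: 1 = /po, 2 = /po/id; rid refers to xmltab.rowid.
conn.executemany("insert into path_table values (?, ?, ?, ?, ?)", [
    (1, 1, "01", None, None), (2, 1, "01.01", None, "1"),
    (1, 2, "01", None, None),
])

SQL_P = "select * from path_table where pathid = 2"   # rewrite of P = /po/id

# existsNode(value(T), '/po/id') = 1, rewritten as a correlated EXISTS
EXISTSNODE_SQL = f"""
    select T.rowid from xmltab T
    where exists (select 1 from ({SQL_P}) Q where Q.rid = T.rowid)
"""
print(conn.execute(EXISTSNODE_SQL).fetchall())  # [(1,)]
```

Only the first document has an id node, and the Q.rid = T.rowid correlation keeps that index row from qualifying the second document.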
EXTRACTVALUE OPERATOR
[0083] The extractValue operator, given a path expression, returns a single value from the XML index. The extractValue operator is rewritten, according to one embodiment, using the following rule:
select extractValue(value(T), P) from xmltab T
->
select (select Q.value
  [portion of rule shown in Figure imgf000024_0002]
  where Q.rid = T.rowid)
from xmltab T
where sql is the SQL obtained by applying the rules defined in the previous sections to the path expression P.
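A matching sqlite3 sketch of the extractValue rewrite, again assuming the omitted line ranges Q over the rows of sql and using hypothetical data. The correlated scalar subquery yields one value per document, or NULL when the path is absent. One caveat of the stand-in: SQLite silently takes the first row of a multi-row scalar subquery, whereas Oracle raises an error, which matches extractValue's single-value contract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table xmltab (doc text);
    create table path_table (pathid integer, rid integer, order_key text,
                             locator text, value text);
""")
conn.executemany("insert into xmltab (rowid, doc) values (?, ?)",
                 [(1, "<po><id>1</id></po>"), (2, "<po/>")])
# Hypothetical pathids: 1 = /po, 2 = /po/id; rid refers to xmltab.rowid.
conn.executemany("insert into path_table values (?, ?, ?, ?, ?)", [
    (1, 1, "01", None, None), (2, 1, "01.01", None, "1"),
    (1, 2, "01", None, None),
])

SQL_P = "select * from path_table where pathid = 2"   # rewrite of P = /po/id

# extractValue(value(T), '/po/id') as a correlated scalar subquery
EXTRACTVALUE_SQL = f"""
    select T.rowid,
           (select Q.value from ({SQL_P}) Q where Q.rid = T.rowid) as value
    from xmltab T
    order by T.rowid
"""
print(conn.execute(EXTRACTVALUE_SQL).fetchall())  # [(1, '1'), (2, None)]
```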
EXTRACT OPERATOR
[0084] In contrast to the extractValue operator, the extract operator, given a path expression, is used to generate an XML type tree. The extract operator is rewritten, according to one embodiment, using the following rule:
select extract(value(T), P) from xmltab T
->
select (select xmlagg(get_frag(Q.rid, Q.locator))
  [portion of rule shown in Figure imgf000025_0001]
  where Q.rid = T.rowid)
from xmltab T
where sql is the SQL obtained by applying the rules defined in the previous sections to the path expression P; get_frag is an operator that reads a fragment from the base table, given a row and a locator; and xmlagg is an operator that concatenates the fragments generated by get_frag. This rule generates all the rows indicated by expression P. For each row, the fragment is retrieved from the base table T, and the fragments are aggregated into a single XML type tree. The output of select extract(value(T), '/PurchaseOrder/Actions') from xmltab T would, for example, have the form:
<Actions>
</Actions>
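The fragment-extraction step can be sketched in Python, assuming a locator is simply a (character offset, length) pair into the stored document text. The actual locator format is not specified here, and XMLTAB, IndexRow, and the helper functions are hypothetical. The sketch sorts by order key so the aggregated fragments come out in document order, a choice made explicitly here rather than left to the aggregate.

```python
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical base table: rid -> stored document text.
DOC = ("<PurchaseOrder><Actions>"
       "<Action><User>KING</User></Action>"
       "<Action><User>SCOTT</User></Action>"
       "</Actions></PurchaseOrder>")
XMLTAB: Dict[int, str] = {1: DOC}


class IndexRow(NamedTuple):
    rid: int
    order_key: Tuple[int, ...]
    locator: Tuple[int, int]   # assumed: (character offset, length)


def locate(doc: str, fragment: str, start: int = 0) -> Tuple[int, int]:
    return (doc.index(fragment, start), len(fragment))


a1 = "<Action><User>KING</User></Action>"
a2 = "<Action><User>SCOTT</User></Action>"
off1 = locate(DOC, a1)
off2 = locate(DOC, a2, off1[0] + off1[1])
# Rows that sql for P = /PurchaseOrder/Actions/Action might return (unordered).
ROWS_P = [IndexRow(1, (1, 1, 2), off2), IndexRow(1, (1, 1, 1), off1)]


def get_frag(rid: int, locator: Tuple[int, int]) -> str:
    """Read one fragment from the base table given a row id and a locator."""
    start, length = locator
    return XMLTAB[rid][start:start + length]


def xmlagg(fragments: List[str]) -> str:
    """Concatenate fragments into one XML value."""
    return "".join(fragments)


def extract(rid: int, rows: List[IndexRow]) -> str:
    # Correlate on rid (Q.rid = T.rowid), then aggregate in order_key order.
    mine = sorted((r for r in rows if r.rid == rid), key=lambda r: r.order_key)
    return xmlagg([get_frag(r.rid, r.locator) for r in mine])


print(extract(1, ROWS_P))
```

The printed value is the two Action fragments concatenated in document order; only the base-table text at each locator is read, not the whole document.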
XMLSEQUENCE OPERATOR
[0085] XMLSequence is an operator that returns a collection of XML instances corresponding to the root elements in the input fragment. For example, XMLSequence(extract(value(T), '/PurchaseOrder/LineItems')) returns a collection of XML instances corresponding to the individual LineItems. The XMLSequence operator is rewritten, according to one embodiment, using the following rule:
select ... from xmltab T, table(xmlsequence(extract(value(T), P))) T1
where existsNode(value(T1), P1) = 1
->
select ... from xmltab T, (select Q.* from (sql) Q where Q.rid = T.rowid) T1
where exists (select 1
  [portion of rule shown in Figure imgf000026_0001]
  where Q1.rid = T1.rid and Q1.order_key > T1.order_key
    and Q1.order_key < maxkey(T1.order_key)
    and depth(Q1.order_key) = depth(T1.order_key) + 1)
where P1 is a path expression, and sql is the SQL obtained by applying the rules defined in the previous sections to the path expression P.
[0086] In this example, the general existsNode rewrite rule described above is not applied, because the first operand of the existsNode operator (value(T1)) is generated by the XMLSequence operator.
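The correlation in the XMLSequence rewrite can be sketched over integer-tuple order keys. Which rows Q1 ranges over is given only in the figure, so the sketch simply takes a hypothetical list of candidate nodes; what it shows is that each collection root T1 is kept only if some Q1 node in the same document is its direct child. The Row type, helpers, and data are assumptions.

```python
from typing import List, NamedTuple, Tuple

OrderKey = Tuple[int, ...]


class Row(NamedTuple):
    rid: int
    order_key: OrderKey


def maxkey(key: OrderKey) -> OrderKey:
    return key[:-1] + (key[-1] + 1,)


def depth(key: OrderKey) -> int:
    return len(key)


def filter_sequence(t1_rows: List[Row], q1_rows: List[Row]) -> List[Row]:
    """Keep each T1 row (a root of the XMLSequence collection) that has a Q1
    row as a direct child in the same document."""
    return [t1 for t1 in t1_rows if any(
        q1.rid == t1.rid
        and t1.order_key < q1.order_key < maxkey(t1.order_key)
        and depth(q1.order_key) == depth(t1.order_key) + 1
        for q1 in q1_rows)]


# Hypothetical data: LineItem roots in two documents; only the first
# LineItem of document 1 has a matching child node.
t1 = [Row(1, (1, 3, 1)), Row(1, (1, 3, 2)), Row(2, (1, 3, 1))]
q1 = [Row(1, (1, 3, 1, 2))]
print(filter_sequence(t1, q1))  # [Row(rid=1, order_key=(1, 3, 1))]
```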
QUERY GENERATION RESTRICTIONS
[0087] If query generation fails, the functional approach described earlier is used instead. One situation in which query generation fails is when a path expression contains constructs for which no conversion rules are specified. Another is when the rewritten SQL does not have the path_table, or xmltab, depending on the context, as a top-level object in the from clause.
HARDWARE OVERVIEW
[0088] Figure 1 is a block diagram that illustrates a computer system 100 upon which an embodiment of the invention may be implemented. Computer system 100 includes a bus 102 or other communication mechanism for communicating information, and a processor 104 coupled with bus 102 for processing information. Computer system 100 also includes a main memory 106, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 102 for storing information and instructions to be executed by processor 104. Main memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104. Computer system 100 further includes a read only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104. A storage device 110, such as a magnetic disk or optical disk, is provided and coupled to bus 102 for storing information and instructions.
[0089] Computer system 100 may be coupled via bus 102 to a display 112, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104. Another type of user input device is cursor control 116, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
[0090] The invention is related to the use of computer system 100 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another machine-readable medium, such as storage device 110. Execution of the sequences of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
[0091] The term "machine-readable medium" as used herein refers to any medium that participates in providing data that causes a machine to operate in a specific fashion. In an embodiment implemented using computer system 100, various machine-readable media are involved, for example, in providing instructions to processor 104 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 110. Volatile media includes dynamic memory, such as main memory 106. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
[0092] Common forms of machine-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
[0093] Various forms of machine-readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 100 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 102. Bus 102 carries the data to main memory 106, from which processor 104 retrieves and executes the instructions. The instructions received by main memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.
[0094] Computer system 100 also includes a communication interface 118 coupled to bus 102. Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to a local network 122. For example, communication interface 118 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 118 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
[0095] Network link 120 typically provides data communication through one or more networks to other data devices. For example, network link 120 may provide a connection through local network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126. ISP 126 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 128. Local network 122 and Internet 128 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 120 and through communication interface 118, which carry the digital data to and from computer system 100, are exemplary forms of carrier waves transporting the information.
[0096] Computer system 100 can send messages and receive data, including program code, through the network(s), network link 120 and communication interface 118. In the Internet example, a server 130 might transmit a requested code for an application program through Internet 128, ISP 126, local network 122 and communication interface 118.
[0097] The received code may be executed by processor 104 as it is received, and/or stored in storage device 110, or other non-volatile storage for later execution. In this manner, computer system 100 may obtain application code in the form of a carrier wave.
[0098] In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims

What is claimed is:
1. A method for handling path-based queries, the method comprising the steps of: receiving a path-based query that specifies a path associated with data in an XML document; generating, based on the path, an index-enabled query that accesses an XML index that indexes a plurality of XML documents, including said XML document; and executing the index-enabled query to use said XML index to retrieve said data required by the path-based query.
2. The method of claim 1 wherein the steps of receiving, generating and executing are performed by a relational database server that manages access to the plurality of XML documents.
3. The method of claim 2 wherein: the step of generating an index-enabled query includes generating a SQL query; and the step of executing the index-enabled query is performed by the relational database server executing the SQL query.
4. The method of claim 1 wherein the step of generating an index-enabled query includes: identifying a template, of a plurality of available templates, that corresponds to a portion of the path-based query; and generating at least a portion of the index-enabled query based on a rule associated with the template that corresponds to the portion of the path-based query.
5. The method of claim 4 wherein: the template is a first template of the plurality of templates; the portion of the path-based query includes a subportion; the step of generating an index-enabled query further includes identifying a second template, of the plurality of templates, that corresponds to the subportion of the path-based query; and the portion of the index-enabled query that is based on the rule associated with the first template includes query content based on a rule associated with the second template.
6. The method of Claim 4 wherein: the portion of the path-based query is a simple path expression; the step of identifying a template includes identifying a template for simple path expressions; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template for simple path expressions.
7. The method of Claim 6 wherein the step of generating at least a portion of the index-enabled query based on the template for simple path expressions includes generating query content that selects from the index based on a pathid associated with the simple path expression.
8. The method of Claim 4 wherein: the portion of the path-based query is a filter expression; the step of identifying a template includes identifying a template for filter expressions; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template for filter expressions.
9. The method of Claim 4 wherein: the portion of the path-based query is a descendant axes expression; the step of identifying a template includes identifying a template for descendant axes expressions; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template for descendant axes expressions.
10. The method of Claim 4 wherein: the portion of the path-based query is a wildcard expression; the step of identifying a template includes identifying a template for wildcard expressions; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template for wildcard expressions.
11. The method of Claim 4 wherein: the portion of the path-based query is a logical expression; the step of identifying a template includes identifying a template for logical expressions; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template for logical expressions.
12. The method of Claim 4 wherein: the portion of the path-based query is a relational expression; the step of identifying a template includes identifying a template for relational expressions; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template for relational expressions.
13. The method of Claim 4 wherein: the portion of the path-based query is a literal; the step of identifying a template includes identifying a template for literals; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template for literals.
14. The method of Claim 4 wherein: the portion of the path-based query is a cast expression; the step of identifying a template includes identifying a template for cast expressions; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template for cast expressions.
15. The method of Claim 4 wherein: the portion of the path-based query is a text function; the step of identifying a template includes identifying a template for text functions; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template for text functions.
16. The method of Claim 4 wherein: the portion of the path-based query includes an operator for determining whether a node in the XML document exists; the step of identifying a template includes identifying a template associated with said operator; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template associated with said operator.
17. The method of Claim 4 wherein: the portion of the path-based query is an operator for extracting a single value from the XML document; the step of identifying a template includes identifying a template associated with said operator; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template associated with said operator.
18. The method of Claim 4 wherein: the portion of the path-based query is an operator for extracting one or more elements in the XML document; the step of identifying a template includes identifying a template associated with the operator; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template associated with the operator.
19. The method of Claim 4 wherein: the portion of the path-based query is an operator for generating a collection of XML documents corresponding to root elements of an XML fragment; the step of identifying a template includes identifying a template associated with said operator; and the step of generating at least a portion of the index-enabled query based on a rule associated with the template includes generating at least a portion of the index-enabled query based on the template associated with said operator.
20. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, causes the one or more processors to perform the method recited in any one of Claims 1 to 19.
PCT/US2005/011762 2004-04-09 2005-04-06 Efficient query processing of xml data using xml index WO2005101245A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US56092704P 2004-04-09 2004-04-09
US60/560,927 2004-04-09
US10/944,170 US7398265B2 (en) 2004-04-09 2004-09-16 Efficient query processing of XML data using XML index
US10/944,170 2004-09-16

Publications (1)

Publication Number Publication Date
WO2005101245A1 (en) 2005-10-27

Family

ID=34966456

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/011762 WO2005101245A1 (en) 2004-04-09 2005-04-06 Efficient query processing of xml data using xml index

Country Status (2)

Country Link
US (1) US7398265B2 (en)
WO (1) WO2005101245A1 (en)

Families Citing this family (117)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7519948B1 (en) * 2002-12-26 2009-04-14 Judson Ames Cornish Platform for processing semi-structured self-describing data
US8229932B2 (en) 2003-09-04 2012-07-24 Oracle International Corporation Storing XML documents efficiently in an RDBMS
US8694510B2 (en) 2003-09-04 2014-04-08 Oracle International Corporation Indexing XML documents efficiently
US7366735B2 (en) * 2004-04-09 2008-04-29 Oracle International Corporation Efficient extraction of XML content stored in a LOB
US7493305B2 (en) * 2004-04-09 2009-02-17 Oracle International Corporation Efficient queribility and manageability of an XML index with path subsetting
US7440954B2 (en) 2004-04-09 2008-10-21 Oracle International Corporation Index maintenance for operations involving indexed XML data
US7499915B2 (en) * 2004-04-09 2009-03-03 Oracle International Corporation Index for accessing XML data
US7603347B2 (en) 2004-04-09 2009-10-13 Oracle International Corporation Mechanism for efficiently evaluating operator trees
US7930277B2 (en) 2004-04-21 2011-04-19 Oracle International Corporation Cost-based optimizer for an XML data repository within a database
DE602005022069D1 (en) 2004-06-23 2010-08-12 Oracle Int Corp EFFICIENT EVALUATION OF QUESTIONS BY TRANSLATION
US7516121B2 (en) * 2004-06-23 2009-04-07 Oracle International Corporation Efficient evaluation of queries using translation
US7885980B2 (en) 2004-07-02 2011-02-08 Oracle International Corporation Mechanism for improving performance on XML over XML data using path subsetting
US8566300B2 (en) * 2004-07-02 2013-10-22 Oracle International Corporation Mechanism for efficient maintenance of XML index structures in a database system
US7668806B2 (en) * 2004-08-05 2010-02-23 Oracle International Corporation Processing queries against one or more markup language sources
JP4301513B2 (en) * 2004-11-26 2009-07-22 International Business Machines Corporation Judgment method of access control effect using policy
US7921076B2 (en) 2004-12-15 2011-04-05 Oracle International Corporation Performing an action in response to a file system event
US7685203B2 (en) * 2005-03-21 2010-03-23 Oracle International Corporation Mechanism for multi-domain indexes on XML documents
US8346737B2 (en) * 2005-03-21 2013-01-01 Oracle International Corporation Encoding of hierarchically organized data for efficient storage and processing
US20060235839A1 (en) * 2005-04-19 2006-10-19 Muralidhar Krishnaprasad Using XML as a common parser architecture to separate parser from compiler
US8762410B2 (en) * 2005-07-18 2014-06-24 Oracle International Corporation Document level indexes for efficient processing in multiple tiers of a computer system
US8073841B2 (en) 2005-10-07 2011-12-06 Oracle International Corporation Optimizing correlated XML extracts
US8949455B2 (en) 2005-11-21 2015-02-03 Oracle International Corporation Path-caching mechanism to improve performance of path-related operations in a repository
US7913223B2 (en) * 2005-12-16 2011-03-22 Dialogic Corporation Method and system for development and use of a user-interface for operations, administration, maintenance and provisioning of a telecommunications system
US7933928B2 (en) 2005-12-22 2011-04-26 Oracle International Corporation Method and mechanism for loading XML documents into memory
US7730032B2 (en) 2006-01-12 2010-06-01 Oracle International Corporation Efficient queriability of version histories in a repository
US20070198479A1 (en) * 2006-02-16 2007-08-23 International Business Machines Corporation Streaming XPath algorithm for XPath expressions with predicates
US9229967B2 (en) * 2006-02-22 2016-01-05 Oracle International Corporation Efficient processing of path related operations on data organized hierarchically in an RDBMS
US7644066B2 (en) * 2006-03-31 2010-01-05 Oracle International Corporation Techniques of efficient XML meta-data query using XML table index
US8510292B2 (en) * 2006-05-25 2013-08-13 Oracle International Corporation Isolation for applications working on shared XML data
US10318752B2 (en) * 2006-05-26 2019-06-11 Oracle International Corporation Techniques for efficient access control in a database system
US20070288489A1 (en) * 2006-06-09 2007-12-13 Mark John Anderson Apparatus and Method for Autonomic Index Creation, Modification and Deletion
US8838574B2 (en) * 2006-06-09 2014-09-16 International Business Machines Corporation Autonomic index creation, modification and deletion
US8838573B2 (en) * 2006-06-09 2014-09-16 International Business Machines Corporation Autonomic index creation
US20080033967A1 (en) * 2006-07-18 2008-02-07 Ravi Murthy Semantic aware processing of XML documents
US20080033940A1 (en) * 2006-08-01 2008-02-07 Hung The Dinh Database Query Enabling Selection By Partial Column Name
KR100779395B1 (en) * 2006-08-31 2007-11-23 Dongbu Electronics Co., Ltd. Semiconductor device and method for manufacturing thereof
US8635242B2 (en) * 2006-10-11 2014-01-21 International Business Machines Corporation Processing queries on hierarchical markup data using shared hierarchical markup trees
US8108765B2 (en) * 2006-10-11 2012-01-31 International Business Machines Corporation Identifying and annotating shared hierarchical markup document trees
US7797310B2 (en) 2006-10-16 2010-09-14 Oracle International Corporation Technique to estimate the cost of streaming evaluation of XPaths
US7627566B2 (en) * 2006-10-20 2009-12-01 Oracle International Corporation Encoding insignificant whitespace of XML data
US7739251B2 (en) * 2006-10-20 2010-06-15 Oracle International Corporation Incremental maintenance of an XML index on binary XML data
US8010889B2 (en) * 2006-10-20 2011-08-30 Oracle International Corporation Techniques for efficient loading of binary XML data
US8478760B2 (en) * 2006-11-17 2013-07-02 Oracle International Corporation Techniques of efficient query over text, image, audio, video and other domain specific data in XML using XML table index with integration of text index and other domain specific indexes
US9436779B2 (en) * 2006-11-17 2016-09-06 Oracle International Corporation Techniques of efficient XML query using combination of XML table index and path/value index
US20080120283A1 (en) * 2006-11-17 2008-05-22 Oracle International Corporation Processing XML data stream(s) using continuous queries in a data stream management system
US7840590B2 (en) * 2006-12-18 2010-11-23 Oracle International Corporation Querying and fragment extraction within resources in a hierarchical repository
US20080147615A1 (en) * 2006-12-18 2008-06-19 Oracle International Corporation Xpath based evaluation for content stored in a hierarchical database repository using xmlindex
US7716210B2 (en) 2006-12-20 2010-05-11 International Business Machines Corporation Method and apparatus for XML query evaluation using early-outs and multiple passes
US7552119B2 (en) * 2006-12-20 2009-06-23 International Business Machines Corporation Apparatus and method for skipping XML index scans with common ancestors of a previously failed predicate
US8078611B2 (en) * 2007-01-03 2011-12-13 Oracle International Corporation Query modes for translation-enabled XML documents
US7860899B2 (en) * 2007-03-26 2010-12-28 Oracle International Corporation Automatically determining a database representation for an abstract datatype
US7814117B2 (en) * 2007-04-05 2010-10-12 Oracle International Corporation Accessing data from asynchronously maintained index
US7693911B2 (en) * 2007-04-09 2010-04-06 Microsoft Corporation Uniform metadata retrieval
US7836098B2 (en) 2007-07-13 2010-11-16 Oracle International Corporation Accelerating value-based lookup of XML document in XQuery
US7840609B2 (en) * 2007-07-31 2010-11-23 Oracle International Corporation Using sibling-count in XML indexes to optimize single-path queries
US7979420B2 (en) * 2007-10-16 2011-07-12 Oracle International Corporation Handling silent relations in a data stream management system
US8296316B2 (en) * 2007-10-17 2012-10-23 Oracle International Corporation Dynamically sharing a subtree of operators in a data stream management system operating on existing queries
US7996388B2 (en) * 2007-10-17 2011-08-09 Oracle International Corporation Adding new continuous queries to a data stream management system operating on existing queries
US8073826B2 (en) 2007-10-18 2011-12-06 Oracle International Corporation Support for user defined functions in a data stream management system
US8521867B2 (en) * 2007-10-20 2013-08-27 Oracle International Corporation Support for incrementally processing user defined aggregations in a data stream management system
US7991766B2 (en) * 2007-10-20 2011-08-02 Oracle International Corporation Support for user defined aggregations in a data stream management system
US8090731B2 (en) 2007-10-29 2012-01-03 Oracle International Corporation Document fidelity with binary XML storage
US10089361B2 (en) * 2007-10-31 2018-10-02 Oracle International Corporation Efficient mechanism for managing hierarchical relationships in a relational database system
US7991768B2 (en) 2007-11-08 2011-08-02 Oracle International Corporation Global query normalization to improve XML index based rewrites for path subsetted index
US8250062B2 (en) 2007-11-09 2012-08-21 Oracle International Corporation Optimized streaming evaluation of XML queries
US7870124B2 (en) * 2007-12-13 2011-01-11 Oracle International Corporation Rewriting node reference-based XQuery using SQL/SML
US7996444B2 (en) * 2008-02-18 2011-08-09 International Business Machines Corporation Creation of pre-filters for more efficient X-path processing
US8868482B2 (en) * 2008-03-20 2014-10-21 Oracle International Corporation Inferring schemas from XML document collections
US7865502B2 (en) * 2008-04-10 2011-01-04 International Business Machines Corporation Optimization of extensible markup language path language (XPATH) expressions in a database management system configured to accept extensible markup language (XML) queries
US20100030727A1 (en) * 2008-07-29 2010-02-04 Sivasankaran Chandrasekar Technique For Using Occurrence Constraints To Optimize XML Index Access
US8073843B2 (en) * 2008-07-29 2011-12-06 Oracle International Corporation Mechanism for deferred rewrite of multiple XPath evaluations over binary XML
US7958112B2 (en) 2008-08-08 2011-06-07 Oracle International Corporation Interleaving query transformations for XML indexes
US20100057737A1 (en) 2008-08-29 2010-03-04 Oracle International Corporation Detection of non-occurrences of events using pattern matching
US8126932B2 (en) * 2008-12-30 2012-02-28 Oracle International Corporation Indexing strategy with improved DML performance and space usage for node-aware full-text search over XML
US8219563B2 (en) * 2008-12-30 2012-07-10 Oracle International Corporation Indexing mechanism for efficient node-aware full-text search over XML
US8352517B2 (en) * 2009-03-02 2013-01-08 Oracle International Corporation Infrastructure for spilling pages to a persistent store
US8145859B2 (en) 2009-03-02 2012-03-27 Oracle International Corporation Method and system for spilling from a queue to a persistent store
US8364714B2 (en) * 2009-06-08 2013-01-29 International Business Machines Corporation Servicing query with access path security in relational database management system
US8321450B2 (en) 2009-07-21 2012-11-27 Oracle International Corporation Standardized database connectivity support for an event processing server in an embedded context
US8387076B2 (en) 2009-07-21 2013-02-26 Oracle International Corporation Standardized database connectivity support for an event processing server
US8527458B2 (en) 2009-08-03 2013-09-03 Oracle International Corporation Logging framework for a data stream processing server
US8386466B2 (en) 2009-08-03 2013-02-26 Oracle International Corporation Log visualization tool for a data stream processing server
US9477778B2 (en) * 2009-08-21 2016-10-25 Oracle International Corporation XML query optimization with order analysis of XML schema
CN102033885B (en) * 2009-09-29 2013-10-02 International Business Machines Corporation Method and system for XPath execution in XML (extensible markup language) data storage bank
US9305057B2 (en) 2009-12-28 2016-04-05 Oracle International Corporation Extensible indexing framework using data cartridges
US8959106B2 (en) 2009-12-28 2015-02-17 Oracle International Corporation Class loading using java data cartridges
US9430494B2 (en) 2009-12-28 2016-08-30 Oracle International Corporation Spatial data cartridge for event processing systems
US9165086B2 (en) 2010-01-20 2015-10-20 Oracle International Corporation Hybrid binary XML storage model for efficient XML processing
US8346813B2 (en) * 2010-01-20 2013-01-01 Oracle International Corporation Using node identifiers in materialized XML views and indexes to directly navigate to and within XML fragments
US8417714B2 (en) * 2010-01-22 2013-04-09 Oracle International Corporation Techniques for fast and scalable XML generation and aggregation over binary XML
US8838637B2 (en) * 2010-02-10 2014-09-16 Agfa Healthcare Inc. Systems and methods for processing consumer queries in different languages for clinical documents
US8447785B2 (en) 2010-06-02 2013-05-21 Oracle International Corporation Providing context aware search adaptively
US8566343B2 (en) 2010-06-02 2013-10-22 Oracle International Corporation Searching backward to speed up query
US8713049B2 (en) 2010-09-17 2014-04-29 Oracle International Corporation Support for a parameterized query/view in complex event processing
US9189280B2 (en) 2010-11-18 2015-11-17 Oracle International Corporation Tracking large numbers of moving objects in an event processing system
US8880508B2 (en) * 2010-12-30 2014-11-04 Sap Se Processing database queries using format conversion
US8990416B2 (en) 2011-05-06 2015-03-24 Oracle International Corporation Support for a new insert stream (ISTREAM) operation in complex event processing (CEP)
US9329975B2 (en) 2011-07-07 2016-05-03 Oracle International Corporation Continuous query language (CQL) debugger in complex event processing (CEP)
US9563663B2 (en) 2012-09-28 2017-02-07 Oracle International Corporation Fast path evaluation of Boolean predicates
US11288277B2 (en) 2012-09-28 2022-03-29 Oracle International Corporation Operator sharing for continuous queries over archived relations
US10956422B2 (en) 2012-12-05 2021-03-23 Oracle International Corporation Integrating event processing with map-reduce
US9043290B2 (en) * 2013-01-14 2015-05-26 International Business Machines Corporation Rewriting relational expressions for different type systems
US9098587B2 (en) 2013-01-15 2015-08-04 Oracle International Corporation Variable duration non-event pattern matching
US10298444B2 (en) 2013-01-15 2019-05-21 Oracle International Corporation Variable duration windows on continuous data streams
US9390135B2 (en) 2013-02-19 2016-07-12 Oracle International Corporation Executing continuous event processing (CEP) queries in parallel
US9047249B2 (en) 2013-02-19 2015-06-02 Oracle International Corporation Handling faults in a continuous event processing (CEP) system
US9063916B2 (en) * 2013-02-27 2015-06-23 Oracle International Corporation Compact encoding of node locations
US9418113B2 (en) 2013-05-30 2016-08-16 Oracle International Corporation Value based windows on relations in continuous data streams
US9934279B2 (en) 2013-12-05 2018-04-03 Oracle International Corporation Pattern matching across multiple input data streams
US9244978B2 (en) 2014-06-11 2016-01-26 Oracle International Corporation Custom partitioning of a data stream
US9712645B2 (en) 2014-06-26 2017-07-18 Oracle International Corporation Embedded event processing
US9886486B2 (en) 2014-09-24 2018-02-06 Oracle International Corporation Enriching events with dynamically typed big data for event processing
US10120907B2 (en) 2014-09-24 2018-11-06 Oracle International Corporation Scaling event processing using distributed flows and map-reduce operations
US9940351B2 (en) 2015-03-11 2018-04-10 International Business Machines Corporation Creating XML data from a database
WO2017018901A1 (en) 2015-07-24 2017-02-02 Oracle International Corporation Visually exploring and analyzing event streams
US10769209B1 (en) * 2017-01-13 2020-09-08 Marklogic Corporation Apparatus and method for template driven data extraction in a semi-structured document database
US11392607B2 (en) 2020-01-30 2022-07-19 International Business Machines Corporation Automatic feature engineering during online scoring phase

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
WO2001042881A2 (en) * 1999-12-06 2001-06-14 B-Bop Associates, Inc. System and method for the storage, indexing and retrieval of xml documents using relational databases
US20030212662A1 (en) * 2002-05-08 2003-11-13 Samsung Electronics Co., Ltd. Extended markup language (XML) indexing method for processing regular path expression queries in a relational database and a data structure thereof

Family Cites Families (106)

Publication number Priority date Publication date Assignee Title
US8700458B2 (en) 1989-05-01 2014-04-15 Catalina Marketing Corporation System, method, and database for processing transactions
US5257365A (en) 1990-03-16 1993-10-26 Powers Frederick A Database system with multi-dimensional summary search tree nodes for reducing the necessity to access records
JPH0667951A (en) 1992-05-20 1994-03-11 Nec Corp Database management system
US5630125A (en) 1994-05-23 1997-05-13 Zellweger; Paul Method and apparatus for information management using an open hierarchical data structure
CA2167790A1 (en) 1995-01-23 1996-07-24 Donald S. Maier Relational database system and method with high data availability during table data restructuring
SE504472C2 (en) 1995-06-22 1997-02-17 Abb Flexible Automation As Color feeding system for spray painting robot
US5960194A (en) 1995-09-11 1999-09-28 International Business Machines Corporation Method for generating a multi-tiered index for partitioned data
US5893109A (en) 1996-03-15 1999-04-06 Inso Providence Corporation Generation of chunks of a long document for an electronic book system
US5893104A (en) 1996-07-09 1999-04-06 Oracle Corporation Method and system for processing queries in a database system using index structures that are not native to the database system
US6208993B1 (en) 1996-07-26 2001-03-27 Ori Software Development Ltd. Method for organizing directories
US5924088A (en) 1997-02-28 1999-07-13 Oracle Corporation Index selection for an index access path
US5983215A (en) 1997-05-08 1999-11-09 The Trustees Of Columbia University In The City Of New York System and method for performing joins and self-joins in a database system
US6141655A (en) 1997-09-23 2000-10-31 At&T Corp Method and apparatus for optimizing and structuring data by designing a cube forest data structure for hierarchically split cube forest template
US5974407A (en) 1997-09-29 1999-10-26 Sacks; Jerome E. Method and apparatus for implementing a hierarchical database management system (HDBMS) using a relational database management system (RDBMS) as the implementing apparatus
US6772350B1 (en) 1998-05-15 2004-08-03 E.Piphany, Inc. System and method for controlling access to resources in a distributed environment
US6487546B1 (en) 1998-08-27 2002-11-26 Oracle Corporation Apparatus and method for aggregate indexes
US6330573B1 (en) 1998-08-31 2001-12-11 Xerox Corporation Maintaining document identity across hierarchy and non-hierarchy file systems
US6253195B1 (en) 1998-09-21 2001-06-26 Microsoft Corporation Optimized query tree
US6366902B1 (en) 1998-09-24 2002-04-02 International Business Machines Corp. Using an epoch number to optimize access with rowid columns and direct row access
US6631366B1 (en) 1998-10-20 2003-10-07 Sybase, Inc. Database system providing methodology for optimizing latching/copying costs in index scans on data-only locked tables
US6279007B1 (en) 1998-11-30 2001-08-21 Microsoft Corporation Architecture for managing query friendly hierarchical values
US6427123B1 (en) 1999-02-18 2002-07-30 Oracle Corporation Hierarchical indexing for accessing hierarchically organized information in a relational system
US7366708B2 (en) 1999-02-18 2008-04-29 Oracle Corporation Mechanism to efficiently index structured data that provides hierarchical access in a relational database system
US6341289B1 (en) 1999-05-06 2002-01-22 International Business Machines Corporation Object identity and partitioning for user defined extents
US6496842B1 (en) 1999-05-28 2002-12-17 Survol Interactive Technologies Navigating heirarchically organized information
US6470344B1 (en) 1999-05-29 2002-10-22 Oracle Corporation Buffering a hierarchical index of multi-dimensional data
WO2000079379A1 (en) 1999-06-19 2000-12-28 Kent Ridge Digital Labs A system of organising catalog data for searching and retrieval
US7181438B1 (en) 1999-07-21 2007-02-20 Alberti Anemometer, Llc Database access system
US6438562B1 (en) 1999-08-24 2002-08-20 Oracle Corporation Parallel index maintenance
US6665684B2 (en) 1999-09-27 2003-12-16 Oracle International Corporation Partition pruning with composite partitioning
US6826727B1 (en) 1999-11-24 2004-11-30 Bitstream Inc. Apparatus, methods, programming for automatically laying out documents
US6721727B2 (en) 1999-12-02 2004-04-13 International Business Machines Corporation XML documents stored as column data
US7089239B1 (en) 2000-01-21 2006-08-08 International Business Machines Corporation Method and system for preventing mutually exclusive content entities stored in a data repository to be included in the same compilation of content
US7043488B1 (en) 2000-01-21 2006-05-09 International Business Machines Corporation Method and system for storing hierarchical content objects in a data repository
US6604100B1 (en) 2000-02-09 2003-08-05 At&T Corp. Method for converting relational data into a structured document
US7031956B1 (en) 2000-02-16 2006-04-18 Verizon Laboratories Inc. System and method for synchronizing and/or updating an existing relational database with supplemental XML data
US7213017B2 (en) 2000-03-17 2007-05-01 Microsoft Corporation Systems and methods for transforming query results into hierarchical information
US6934712B2 (en) 2000-03-21 2005-08-23 International Business Machines Corporation Tagging XML query results over relational DBMSs
US6782380B1 (en) 2000-04-14 2004-08-24 David Victor Thede Method and system for indexing and searching contents of extensible mark-up language (XML) documents
US6915304B2 (en) 2000-05-23 2005-07-05 Kenneth A. Krupa System and method for converting an XML data structure into a relational database
US7043472B2 (en) 2000-06-05 2006-05-09 International Business Machines Corporation File system with access and retrieval of XML documents
US7024413B2 (en) 2000-07-26 2006-04-04 International Business Machines Corporation Method of externalizing legacy database in ASN.1-formatted data into XML format
US6654734B1 (en) 2000-08-30 2003-11-25 International Business Machines Corporation System and method for query processing and optimization for XML repositories
US7024425B2 (en) 2000-09-07 2006-04-04 Oracle International Corporation Method and apparatus for flexible storage and uniform manipulation of XML data in a relational database system
US20020184401A1 (en) 2000-10-20 2002-12-05 Kadel Richard William Extensible information system
US6785718B2 (en) 2000-10-23 2004-08-31 Schneider Logistics, Inc. Method and system for interfacing with a shipping service
US20030105732A1 (en) 2000-11-17 2003-06-05 Kagalwala Raxit A. Database schema for structure query language (SQL) server
EP1211610A1 (en) 2000-11-29 2002-06-05 Lafayette Software Inc. Methods of organising data and processing queries in a database system
US7174534B2 (en) 2001-01-22 2007-02-06 Symbol Technologies, Inc. Efficient system and method for running and analyzing multi-channel, multi-modal applications
US6959416B2 (en) 2001-01-30 2005-10-25 International Business Machines Corporation Method, system, program, and data structures for managing structured documents in a database
US7162467B2 (en) 2001-02-22 2007-01-09 Greenplum, Inc. Systems and methods for managing distributed database resources
JP4529063B2 (en) 2001-03-30 2010-08-25 Renesas Electronics Corporation System simulator, simulation method, and simulation program
US6778977B1 (en) 2001-04-19 2004-08-17 Microsoft Corporation Method and system for creating a database table index using multiple processors
US7028028B1 (en) 2001-05-17 2006-04-11 Enosys Markets,Inc. System for querying markup language data stored in a relational database according to markup language schema
US7581170B2 (en) 2001-05-31 2009-08-25 Lixto Software Gmbh Visual and interactive wrapper generation, automated information extraction from Web pages, and translation into XML
US7117216B2 (en) 2001-06-07 2006-10-03 Sun Microsystems, Inc. Method and apparatus for runtime merging of hierarchical trees
US7043716B2 (en) 2001-06-13 2006-05-09 Arius Software Corporation System and method for multiple level architecture by use of abstract application notation
US6886046B2 (en) 2001-06-26 2005-04-26 Citrix Systems, Inc. Methods and apparatus for extendible information aggregation and presentation
US7107521B2 (en) 2001-07-03 2006-09-12 International Business Machines Corporation XSL dynamic inheritance
US6795821B2 (en) 2001-07-17 2004-09-21 Trendium, Inc. Database systems, methods and computer program products including primary key and super key indexes for use with partitioned tables
US7047253B1 (en) 2001-09-28 2006-05-16 Oracle International Corporation Mechanisms for storing content and properties of hierarchically organized resources
AU2002334721B2 (en) 2001-09-28 2008-10-23 Oracle International Corporation An index structure to access hierarchical data in a relational database system
US6836857B2 (en) 2001-10-18 2004-12-28 Sun Microsystems, Inc. Mechanism for debugging a computer process
US6928449B2 (en) 2001-10-18 2005-08-09 Sun Microsystems, Inc. Mechanism for facilitating backtracking
US7181489B2 (en) 2002-01-10 2007-02-20 International Business Machines Corporation Method, apparatus, and program for distributing a document object model in a web server cluster
US6732222B1 (en) 2002-02-01 2004-05-04 Silicon Motion, Inc. Method for performing flash memory file management
US6965894B2 (en) 2002-03-22 2005-11-15 International Business Machines Corporation Efficient implementation of an index structure for multi-column bi-directional searches
CA2382712A1 (en) 2002-04-19 2003-10-19 Ibm Canada Limited-Ibm Canada Limitee Detection and prevention of writing conflicts within nested query statements
US7548935B2 (en) 2002-05-09 2009-06-16 Robert Pecherer Method of recursive objects for representing hierarchies in relational database systems
US7457810B2 (en) 2002-05-10 2008-11-25 International Business Machines Corporation Querying markup language data sources using a relational query processor
EP1552427A4 (en) 2002-06-13 2009-12-16 Mark Logic Corp Parent-child query indexing for xml databases
EP1552426A4 (en) 2002-06-13 2009-01-21 Mark Logic Corp A subtree-structured xml database
AUPS300402A0 (en) 2002-06-17 2002-07-11 Canon Kabushiki Kaisha Indexing and querying structured documents
US7162485B2 (en) * 2002-06-19 2007-01-09 Georg Gottlob Efficient processing of XPath queries
US7574652B2 (en) 2002-06-20 2009-08-11 Canon Kabushiki Kaisha Methods for interactively defining transforms and for generating queries by manipulating existing query data
US6917935B2 (en) 2002-06-26 2005-07-12 Microsoft Corporation Manipulating schematized data in a database
US20040010752A1 (en) 2002-07-09 2004-01-15 Lucent Technologies Inc. System and method for filtering XML documents with XPath expressions
US6915392B2 (en) * 2002-07-12 2005-07-05 Intel Corporation Optimizing memory usage by vtable cloning
US7120645B2 (en) 2002-09-27 2006-10-10 Oracle International Corporation Techniques for rewriting XML queries directed to relational database constructs
GB2394800A (en) 2002-10-30 2004-05-05 Hewlett Packard Co Storing hierarchical documents in a relational database
US7124137B2 (en) 2002-12-19 2006-10-17 International Business Machines Corporation Method, system, and program for optimizing processing of nested functions
US20040143581A1 (en) 2003-01-15 2004-07-22 Bohannon Philip L. Cost-based storage of extensible markup language (XML) data
US20040148278A1 (en) 2003-01-22 2004-07-29 Amir Milo System and method for providing content warehouse
US7490097B2 (en) 2003-02-20 2009-02-10 Microsoft Corporation Semi-structured data storage schema selection
US7062507B2 (en) 2003-02-24 2006-06-13 The Boeing Company Indexing profile for efficient and scalable XML based publish and subscribe system
US20040193575A1 (en) 2003-03-25 2004-09-30 Chia-Hsun Chen Path expressions and SQL select statement in object oriented language
US7181680B2 (en) 2003-04-30 2007-02-20 Oracle International Corporation Method and mechanism for processing queries for XML documents using an index
US7146352B2 (en) 2003-06-23 2006-12-05 Microsoft Corporation Query optimizer system and method
US7383255B2 (en) 2003-06-23 2008-06-03 Microsoft Corporation Common query runtime system and application programming interface
US7519577B2 (en) 2003-06-23 2009-04-14 Microsoft Corporation Query intermediate language method and system
US7143078B2 (en) 2003-06-27 2006-11-28 Microsoft Corporation System and method for managed database query pre-optimization
US20050038688A1 (en) * 2003-08-15 2005-02-17 Collins Albert E. System and method for matching local buyers and sellers for the provision of community based services
US7174328B2 (en) 2003-09-02 2007-02-06 International Business Machines Corp. Selective path signatures for query processing over a hierarchical tagged data structure
US7634498B2 (en) * 2003-10-24 2009-12-15 Microsoft Corporation Indexing XML datatype content system and method
US7454428B2 (en) 2003-10-29 2008-11-18 Oracle International Corp. Network data model for relational database management system
US7315852B2 (en) * 2003-10-31 2008-01-01 International Business Machines Corporation XPath containment for index and materialized view matching
US7512615B2 (en) 2003-11-07 2009-03-31 International Business Machines Corporation Single pass workload directed clustering of XML documents
JP2005141650A (en) 2003-11-10 2005-06-02 Seiko Epson Corp Structured document encoding device, structured document encoding method and program thereof
US7287023B2 (en) 2003-11-26 2007-10-23 International Business Machines Corporation Index structure for supporting structural XML queries
US7290012B2 (en) 2004-01-16 2007-10-30 International Business Machines Corporation Apparatus, system, and method for passing data between an extensible markup language document and a hierarchical database
JP4227033B2 (en) 2004-01-20 2009-02-18 Fujitsu Limited Database integrated reference device, database integrated reference method, and database integrated reference program
US7386541B2 (en) 2004-03-18 2008-06-10 Microsoft Corporation System and method for compiling an extensible markup language based query
US7499915B2 (en) * 2004-04-09 2009-03-03 Oracle International Corporation Index for accessing XML data
US7516121B2 (en) * 2004-06-23 2009-04-07 Oracle International Corporation Efficient evaluation of queries using translation
US20050289138A1 (en) * 2004-06-25 2005-12-29 Cheng Alex T Aggregate indexing of structured and unstructured marked-up content
US7921076B2 (en) * 2004-12-15 2011-04-05 Oracle International Corporation Performing an action in response to a file system event


Non-Patent Citations (2)

Title
J. MCHUGH ET AL.: "Query Optimization for XML", PROCEEDINGS OF THE 25TH VLDB CONFERENCE, 7 September 1999 (1999-09-07), EDINBURGH, SCOTLAND, pages 315 - 325, XP002333353, Retrieved from the Internet <URL:http://www.vldb.org/conf/1999/P32.pdf> [retrieved on 20050624] *
YOSHIKAWA M ET AL: "XREL: A PATH-BASED APPROACH TO STORAGE AND RETRIEVAL OF XML DOCUMENTS USING RELATIONAL DATABASES", ACM TRANSACTIONS ON INTERNET TECHNOLOGY, ACM, NEW YORK, NY, US, vol. 1, no. 1, August 2001 (2001-08-01), pages 110 - 141, XP001143686, ISSN: 1049-3301 *

Also Published As

Publication number Publication date
US20050229158A1 (en) 2005-10-13
US7398265B2 (en) 2008-07-08

Similar Documents

Publication Publication Date Title
US7398265B2 (en) Efficient query processing of XML data using XML index
US7499915B2 (en) Index for accessing XML data
US7493305B2 (en) Efficient queribility and manageability of an XML index with path subsetting
US7921101B2 (en) Index maintenance for operations involving indexed XML data
AU2005264926B2 (en) Efficient extraction of XML content stored in a LOB
US8209352B2 (en) Method and mechanism for efficient storage and query of XML documents based on paths
US8219563B2 (en) Indexing mechanism for efficient node-aware full-text search over XML
US7840590B2 (en) Querying and fragment extraction within resources in a hierarchical repository
US7885980B2 (en) Mechanism for improving performance on XML over XML data using path subsetting
US8126932B2 (en) Indexing strategy with improved DML performance and space usage for node-aware full-text search over XML
US20080010256A1 (en) Element query method and system
US20050055343A1 (en) Storing XML documents efficiently in an RDBMS
US20070250527A1 (en) Mechanism for abridged indexes over XML document collections
AU2005234002B2 (en) Index for accessing XML data
JP4866844B2 (en) Efficient extraction of XML content stored in a LOB
US20080147615A1 (en) Xpath based evaluation for content stored in a hierarchical database repository using xmlindex

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

WWW WIPO information: withdrawn in national office

Country of ref document: DE

122 EP: PCT application non-entry in European phase