CN101984439A - Method for optimizing a data source Extensible Markup Language (XML) query system based on subqueries - Google Patents

Info

Publication number
CN101984439A
Authority
CN
China
Prior art keywords
xml
inquiry
query
subquery
data source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010580677
Other languages
Chinese (zh)
Inventor
文纬
杨昆
严营
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Gongjin Communication Technology Co Ltd
Original Assignee
Shanghai Gongjin Communication Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Gongjin Communication Technology Co Ltd
Priority to CN 201010580677
Publication of CN101984439A
Legal status: Pending

Abstract

The invention relates to a method for optimizing a data source Extensible Markup Language (XML) query system based on subqueries. The method comprises the following steps: receiving XML query input; performing lexical and syntactic analysis on the input and verifying its correctness and validity; if parsing succeeds and verification passes, generating an XML parse tree; translating the XML parse tree, thereby converting the XML query input into an intermediate logical representation; rewriting that representation to generate a target query expression; and invoking a query engine that supports the target query expression to execute the query and return the results. With this method, query rewriting converts procedural queries into declarative ones and optimizes subqueries; in particular, subquery merging rewrites certain subqueries as equivalent joins over multiple tables, which reduces the nesting depth of the query statement as far as possible and simplifies plan optimization. The method works stably and reliably and has a wide range of application.

Description

Method for optimizing a data source XML query system based on subqueries
Technical field
The present invention relates to the field of computer external information processing, in particular to heterogeneous data source query technology, and specifically to a method for optimizing a data source XML query system based on subqueries.
Background art
As enterprises and institutions have adopted a variety of separate application systems, efficiency has improved locally, but the mutual independence of these systems hinders overall management: they lack a unified interface and have no information channels connecting them. The data are usually locked away in different databases, hosts and file servers, and only a few users with privileged access can see them.
Over time and with the development of technology, a series of information islands has formed. Each department or unit is a data source, and each data source is heterogeneous, so their information and organization all differ; together they constitute a huge heterogeneous data environment. Yet the dispersed data are often linked in countless ways and do not sit on unrelated islands. In practice, users often need the dispersed data to be exchanged in some way in order to understand the overall situation. Moreover, for historical reasons the internal systems that departments developed at different times are mutually incompatible, and discarding them altogether would be wasteful, so achieving effective information integration and sharing across these heterogeneous databases has become an urgent task. Integration of heterogeneous databases relies on XML, a currently popular technology, and the continuing emergence of new XML technologies has brought new vitality to data integration. Heterogeneous data are converted into a unified XML format, and information is shared across heterogeneous databases by querying the XML data. How to speed up XML query access is therefore the key issue and the bottleneck of heterogeneous data sharing.
To improve XML query efficiency, query rewriting can be applied to the query text or to the parse tree, but it is most conveniently applied to the logical query plan. The most commonly used technique at present is view rewriting, which has some drawbacks. SQL is a declarative, non-procedural language: when writing an SQL statement, the user need not know how the data are stored or which steps are needed to process them, because the query processor handles this automatically. However, the database objects referenced by a query may be views as well as base tables. If the query processor handles a view directly, the only execution plan the query optimizer can generate is to evaluate the view definition first and then use the view's result as a temporary table in the rest of the query, which is extremely inefficient in most cases.
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art and to provide a method for optimizing a data source XML query system based on subqueries that markedly improves XML query efficiency, reduces the nesting depth of query statements as far as possible, simplifies plan optimization, works stably and reliably, and has a wide range of application.
To achieve this object, the invention is constituted as follows:
The method for optimizing a data source XML query system based on subqueries is applied to heterogeneous data sources and is principally characterized in that it comprises the following steps:
(1) the system performs an initialization operation;
(2) the system receives the corresponding XML query input according to the user's input operation;
(3) the system performs lexical parsing and syntax parsing on the XML query input and verifies its correctness and validity;
(4) if parsing succeeds and verification passes, the system generates the corresponding XML parse tree;
(5) the system translates the XML parse tree, converting the XML query input into an intermediate logical representation;
(6) the intermediate logical representation is rewritten, and the rewritten target query expression is generated;
(7) the system invokes a query engine that supports the target query expression to execute the query over the heterogeneous data sources, and obtains and outputs the corresponding query result.
In this method, generating the corresponding XML parse tree comprises the following steps:
(11) applying dynamic interval encoding to the XML query input;
(12) generating the dynamic interval encoding table of the target XML query input;
(13) storing as much of the information in the XML query input as possible in a relational database, thereby obtaining the XML parse tree.
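Steps (11) to (13) can be illustrated with a minimal sketch. The patent does not specify the encoding, so this assumes a plain interval encoding in which a depth-first traversal gives each node a (start, end) pair that encloses the intervals of its descendants; the "dynamic" aspect (for example, leaving gaps for later insertions) is not modeled. The table name `node`, its columns, and the helper `encode` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def encode(root):
    """Return (start, end, level, tag) rows from a depth-first traversal."""
    rows = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter
        for child in elem:
            visit(child, level + 1)
        counter += 1
        rows.append((start, counter, level, elem.tag))

    visit(root, 0)
    return sorted(rows)


doc = ET.fromstring("<lib><book><title/><author/></book><book><title/></book></lib>")
rows = encode(doc)

# Store the encoding table in a relational database (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (start INTEGER PRIMARY KEY, end_ INTEGER, level INTEGER, tag TEXT)"
)
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)

# An ancestor/descendant test becomes an interval-containment join:
# every title element that lies inside some book element.
titles = conn.execute(
    "SELECT d.start FROM node a JOIN node d "
    "ON d.start > a.start AND d.end_ < a.end_ "
    "WHERE a.tag = 'book' AND d.tag = 'title' ORDER BY d.start"
).fetchall()
```

The point of such an encoding is that structural relationships in the tree can be answered with ordinary relational comparisons, which is what makes the later translation to SQL possible.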
In this method, converting the XML query input into an intermediate logical representation is specifically:
generating a corresponding relational view template for each query component in the XML query input.
In this method, the query components comprise the various operators and, within FLWOR expressions, built-in functions, assignment expressions, conditional expressions and loop expressions.
In this method, rewriting the intermediate logical representation comprises a subquery merging operation and an equivalent predicate rewriting operation.
In this method, the subquery merging operation comprises the following steps:
(21) the system joins the FROM clause of the subquery in the intermediate logical representation with that of the outer query into a single FROM clause, and modifies the corresponding operation parameters;
(22) the system modifies the predicate symbol of the subquery in the intermediate logical representation accordingly;
(23) the system treats the WHERE condition of the subquery in the intermediate logical representation as a whole, merges it with the WHERE condition of the outer query, and connects the two with the AND conjunction.
In this method, modifying the predicate symbol accordingly is specifically:
changing the predicate symbol "IN" to "=".
In this method, the equivalent predicate rewriting operation comprises the following steps:
(31) the system converts predicate expressions connected by multiple ORs in the intermediate logical representation into an ANY expression;
(32) the system converts ANY expressions or ALL expressions in the intermediate logical representation into a single value;
(33) the system converts BETWEEN expressions in the intermediate logical representation into predicate expressions connected by AND;
(34) the system expands IN predicate expressions in the intermediate logical representation into predicate expressions connected by OR.
In this method, generating the rewritten target query expression is specifically:
the system combines the relational view templates and constructs an SQL query expression equivalent to them.
In this method, the XML query input is an XQuery document.
In this method, the XML parse tree is an XQuery parse tree.
With this method, because the queries supplied by users contain procedural parts, query rewriting is used to convert procedural queries into declarative ones and to rewrite views, replacing references to a view with references to the base tables that the view involves. Subqueries are optimized at the same time, yielding a fully semantically equivalent query that the optimizer can optimize further. In addition, subquery merging rewrites certain subqueries as equivalent joins over multiple tables, which reduces the nesting depth of the query statement as far as possible and simplifies plan optimization. The method works stably and reliably and has a wide range of application.
Description of drawings
Fig. 1 is a schematic diagram of the XML query processing flow in the method of the present invention for optimizing a data source XML query system based on subqueries.
Fig. 2 is a schematic diagram of the XML query rewriting flow in the method of the present invention for optimizing a data source XML query system based on subqueries.
Embodiment
To make the technical content of the invention easier to understand, the following embodiments are described in detail.
Referring to Fig. 1 and Fig. 2, XQGM (XML Query Graph Model) is the Extensible Markup Language query graph model, and heterogeneous point tuples (tuples) denote tuples of heterogeneous database information points.
The method for optimizing a data source XML query system based on subqueries is applied to heterogeneous data sources and comprises the following steps:
(1) the system performs an initialization operation;
(2) the system receives the corresponding XML query input according to the user's input operation;
(3) the system performs lexical parsing and syntax parsing on the XML query input and verifies its correctness and validity;
(4) if parsing succeeds and verification passes, the system generates the corresponding XML parse tree, which comprises the following steps:
(a) applying dynamic interval encoding to the XML query input;
(b) generating the dynamic interval encoding table of the target XML query input;
(c) storing as much of the information in the XML query input as possible in a relational database, thereby obtaining the XML parse tree;
(5) the system translates the XML parse tree, converting the XML query input into an intermediate logical representation, specifically:
generating a corresponding relational view template for each query component in the XML query input;
the query components comprise the various operators and, within FLWOR expressions, built-in functions, assignment expressions, conditional expressions and loop expressions;
(6) the intermediate logical representation is rewritten, and the rewritten target query expression is generated; rewriting the intermediate logical representation comprises a subquery merging operation and an equivalent predicate rewriting operation, wherein the subquery merging operation comprises the following steps:
(a) the system joins the FROM clause of the subquery in the intermediate logical representation with that of the outer query into a single FROM clause, and modifies the corresponding operation parameters;
(b) the system modifies the predicate symbol of the subquery in the intermediate logical representation accordingly, specifically:
changing the predicate symbol "IN" to "=";
(c) the system treats the WHERE condition of the subquery in the intermediate logical representation as a whole, merges it with the WHERE condition of the outer query, and connects the two with the AND conjunction;
the equivalent predicate rewriting operation comprises the following steps:
(a) the system converts predicate expressions connected by multiple ORs in the intermediate logical representation into an ANY expression;
(b) the system converts ANY expressions or ALL expressions in the intermediate logical representation into a single value;
(c) the system converts BETWEEN expressions in the intermediate logical representation into predicate expressions connected by AND;
(d) the system expands IN predicate expressions in the intermediate logical representation into predicate expressions connected by OR; generating the rewritten target query expression is specifically:
the system combines the relational view templates and constructs an SQL query expression equivalent to them;
(7) the system invokes a query engine that supports the target query expression to execute the query over the heterogeneous data sources, and obtains and outputs the corresponding query result.
In this method, the XML query input is an XQuery document, and the XML parse tree is an XQuery parse tree.
In practice, to optimize XML queries and improve XML query efficiency, the invention mainly involves two methods of optimizing subqueries: the first is subquery merging, and the second is equivalent predicate rewriting. Subquery merging rewrites certain subqueries as equivalent joins over multiple tables. In general, plan optimization evaluates and optimizes only the plans generated for queries at the same nesting level; the effect of subquery merging is to reduce the nesting depth of the query statement as far as possible, which simplifies plan optimization.
Subquery merging should follow these three principles:
(1) If the result of the outer query contains no duplicates, that is, the SELECT clause contains [term missing in the source], its subquery can be merged, and the DISTINCT flag should be added before the SELECT clause of the merged query;
(2) If the SELECT clause of the outer query carries the DISTINCT flag, the subquery can be merged directly;
(3) If the result of the inner query contains no duplicate tuples, the subquery can be merged.
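The role of DISTINCT in these principles can be checked on a small example. The sketch below uses SQLite through Python's `sqlite3` module; the tables `dept` and `emp` and their contents are made up for illustration, and the outer query selects the primary key `id` so that its own result is duplicate-free (an assumption about the situation principle (1) has in mind).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, dept_id INTEGER, city TEXT);
    INSERT INTO dept VALUES (1, 'R&D'), (2, 'Sales'), (3, 'Ops');
    INSERT INTO emp  VALUES (10, 1, 'Shanghai'), (11, 1, 'Shanghai'), (12, 2, 'Beijing');
""")

# Original nested form: the inner result contains dept_id 1 twice.
nested = (
    "SELECT id, name FROM dept "
    "WHERE id IN (SELECT dept_id FROM emp WHERE city = 'Shanghai')"
)
# Merged into a join without DISTINCT: each matching emp row yields one output row.
merged_plain = (
    "SELECT dept.id, name FROM dept, emp "
    "WHERE dept.id = emp.dept_id AND emp.city = 'Shanghai'"
)
# Merged into a join with DISTINCT, as principle (1) requires.
merged_distinct = (
    "SELECT DISTINCT dept.id, name FROM dept, emp "
    "WHERE dept.id = emp.dept_id AND emp.city = 'Shanghai'"
)

nested_rows = conn.execute(nested).fetchall()
plain_rows = conn.execute(merged_plain).fetchall()
distinct_rows = conn.execute(merged_distinct).fetchall()
```

Here the join without DISTINCT returns the R&D department twice, because the inner query has duplicate tuples; with DISTINCT the merged query returns the same single row as the nested form.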
The concrete steps of subquery merging are as follows:
(1) Join the FROM clauses of the subquery and the outer query into a single FROM clause, and modify the corresponding operation parameters;
(2) Modify the predicate symbol of the subquery accordingly, for example by changing "IN" to "=";
(3) Treat the WHERE condition of the subquery as a whole, merge it with the WHERE condition of the outer query, and connect the two with the AND conjunction, so that the newly generated predicate has the same meaning in context as the original predicate and forms a single whole.
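The three steps can be sketched on a toy query representation. This is an illustration under stated assumptions, not the patent's implementation: the classes `Select` and `InSubquery` and the functions `merge_in_subquery` and `to_sql` are hypothetical, column names are assumed to be fully qualified so that no renaming is needed in step (1), and DISTINCT is always added to the merged query.

```python
from dataclasses import dataclass, field


@dataclass
class Select:
    columns: list
    tables: list
    where: list = field(default_factory=list)  # conditions joined by AND
    distinct: bool = False


@dataclass
class InSubquery:
    column: str       # outer column, e.g. "dept.id"
    subquery: Select  # must select exactly one column


def merge_in_subquery(outer, pred):
    """Flatten `outer ... WHERE pred.column IN (pred.subquery)` into one join."""
    sub = pred.subquery
    tables = outer.tables + sub.tables                # step 1: a single FROM clause
    join = f"{pred.column} = {sub.columns[0]}"        # step 2: IN becomes =
    conditions = list(outer.where) + [join]
    if sub.where:                                     # step 3: inner WHERE kept as a unit
        conditions.append("(" + " AND ".join(sub.where) + ")")
    return Select(outer.columns, tables, conditions, distinct=True)


def to_sql(q):
    sql = "SELECT " + ("DISTINCT " if q.distinct else "") + ", ".join(q.columns)
    sql += " FROM " + ", ".join(q.tables)
    if q.where:
        sql += " WHERE " + " AND ".join(q.where)
    return sql


outer = Select(["dept.id", "dept.name"], ["dept"], ["dept.name <> 'Ops'"])
inner = Select(["emp.dept_id"], ["emp"], ["emp.city = 'Shanghai'", "emp.id > 0"])
merged = merge_in_subquery(outer, InSubquery("dept.id", inner))
merged_sql = to_sql(merged)
```

Parenthesizing the inner WHERE condition is what keeps it "as a whole" in step (3): if the inner condition contained an OR, joining it to the outer condition without parentheses would change its meaning.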
The second method is equivalent predicate rewriting. Because the execution engine processes different predicates differently, rewriting a predicate as a logically equivalent but more efficient expression is an effective way to improve efficiency.
The predicate transformation rules proposed here are:
(1) Convert predicates connected by multiple ORs into an ANY expression;
(2) Convert ANY or ALL into a single value;
(3) Convert BETWEEN into predicates connected by AND;
(4) Expand IN predicate expressions into predicate expressions connected by OR.
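The four rules can be sketched as small functions over predicates written as nested tuples. The tuple encoding and function names are hypothetical. For rule (2) the patent does not say how the single value is obtained; the sketch assumes the usual reduction of an ordering comparison against a non-empty, NULL-free list of values to a comparison with its minimum or maximum.

```python
def between_to_and(col, lo, hi):
    """Rule 3: col BETWEEN lo AND hi  ->  col >= lo AND col <= hi."""
    return ("and", (">=", col, lo), ("<=", col, hi))


def in_to_or(col, values):
    """Rule 4: col IN (v1, v2, ...)  ->  col = v1 OR col = v2 OR ..."""
    return ("or",) + tuple(("=", col, v) for v in values)


def or_to_any(pred):
    """Rule 1: col = v1 OR col = v2 ...  ->  col = ANY (v1, v2, ...).
    Predicates of any other shape are returned unchanged."""
    if pred[0] != "or":
        return pred
    terms = pred[1:]
    cols = {t[1] for t in terms}
    if all(t[0] == "=" for t in terms) and len(cols) == 1:
        return ("= any", cols.pop(), tuple(t[2] for t in terms))
    return pred


def quantified_to_single(op, quantifier, col, values):
    """Rule 2 (assumed reading): col > ANY (vs) -> col > min(vs),
    col > ALL (vs) -> col > max(vs), and the mirror image for < and <=.
    Assumes `values` is non-empty and contains no NULLs."""
    if op in (">", ">="):
        bound = min(values) if quantifier == "any" else max(values)
    elif op in ("<", "<="):
        bound = max(values) if quantifier == "any" else min(values)
    else:
        raise ValueError("only ordering comparisons reduce to a single value")
    return (op, col, bound)


age_range = between_to_and("age", 18, 30)
city_any = or_to_any(in_to_or("city", ["Shanghai", "Beijing"]))
gt_any = quantified_to_single(">", "any", "salary", [3000, 5000, 9000])
lt_all = quantified_to_single("<", "all", "salary", [3000, 5000, 9000])
```

Rules (1) and (4) run in opposite directions, so an engine would apply whichever form it evaluates more efficiently; the sketch only shows that each transformation is mechanical.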
In general, predicate rewriting helps improve query execution efficiency; in particular, when several rewriting rules all apply, the improvement can often reach a hundredfold or even a thousandfold. Cases that call for subquery rewriting, on the other hand, are fewer. In fact, subquery rewriting mainly serves to turn a query into a single query statement, reducing the nesting depth as far as possible in preparation for the plan optimization stage.
Query rewriting has two goals:
(1) Make the query as declarative as possible. Queries written by users are declarative on the surface, but some parts are in fact procedural, which can prevent the database's query optimizer from choosing the optimal execution plan. A main goal of query rewriting is therefore to convert procedural queries into equivalent declarative ones.
(2) Perform natural heuristic optimizations. Some heuristic rules that are widely regarded as valuable can be carried out by query rewriting, such as converting inefficient predicates into more efficient ones. Such rules can significantly improve execution efficiency and reduce execution time.
Rewriting queries using views was proposed only within roughly the last decade, and published results and patents on it are still few. Rewriting over partially materialized views is perhaps an idea that is easy to implement and fairly practical. Intuitively, the idea is to find a mapping between the variables in the view definition and the variables in the query. If a mapping exists for all variables, the selections of the query can be "pushed" into the view definition, and the rewritten (hopefully smaller) view is then materialized in order to evaluate the query.
Rewriting queries using views is somewhat less efficient than traditional query rewriting, because it requires partial materialization of the view and execution of the original query on that materialization; traditional query rewriting, however, has to optimize a query that may be very large, which is sometimes impractical. A drawback of rewriting queries using views is that, when subqueries are answered on the original views, one may sometimes want to reuse these view fragments. This is difficult to achieve with partially materialized views, and is even more challenging here than in relational databases.
First, some concepts used in query rewriting are introduced. Queries are defined here in Datalog notation, in the following form:
$$q(X) \mathrel{:-} p_1(Y_1), \ldots, p_n(Y_n) \tag{1-1}$$
In formula (1-1), $q$ and $p_1, \ldots, p_n$ are predicate names; $q$ denotes the query result relation, and $p_1, \ldots, p_n$ refer to relations in the database. $q(X)$ is called the query head and the remainder the query body; $p_1(Y_1), \ldots, p_n(Y_n)$ are called the basic forms (also known as subgoals or atoms) of the query body. The tuples $X, Y_1, \ldots, Y_n$ contain only variables or constants. If
$$X \subseteq Y_1 \cup \cdots \cup Y_n,$$
that is, every variable that appears in the query head also appears in the query body, the query is said to be safe.
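The safety condition is a simple set inclusion and can be checked mechanically. In this minimal sketch a query body is a list of (predicate name, argument tuple) pairs and, as an assumed convention, variables are strings starting with an uppercase letter while constants start with a lowercase letter or digit; the function name `is_safe` is hypothetical.

```python
def is_variable(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def is_safe(head_args, body):
    """True if every variable in the query head also occurs in the query body."""
    head_vars = {t for t in head_args if is_variable(t)}
    body_vars = {t for _, args in body for t in args if is_variable(t)}
    return head_vars <= body_vars


# q(X, U) :- p(X, Y), p2(Y, U)   is safe
safe = is_safe(("X", "U"), [("p", ("X", "Y")), ("p2", ("Y", "U"))])
# q(X, Z) :- p(X, Y)             is not safe: Z never occurs in the body
unsafe = is_safe(("X", "Z"), [("p", ("X", "Y"))])
```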
The query body may also contain basic forms with comparison predicates (<, ≤, =, ≠, ≥, >). In this case we require that if a variable x appears in a basic form containing a comparison predicate, it must also appear in an ordinary basic form. In addition, $Q(D)$ is used in the discussion to denote the result of evaluating the query $Q$ on a database instance $D$.
Definition 1.1: query containment and query equivalence. If, for every database instance $D$, the result set of a query $Q_1$ is a subset of the result set of another query $Q_2$, that is,
$$Q_1(D) \subseteq Q_2(D),$$
then $Q_1$ is said to be contained in $Q_2$, written
$$Q_1 \subseteq Q_2.$$
If
$$Q_1 \subseteq Q_2$$
and
$$Q_2 \subseteq Q_1,$$
then $Q_1$ and $Q_2$ are said to be equivalent, written $Q_1 = Q_2$.
Definition 1.2: equivalent rewriting, locally minimal rewriting, globally minimal rewriting and complete rewriting. Given a query $Q$ and a view set $V = (V_1, \ldots, V_m)$, if a query $E$ satisfies:
(1) $E$ contains only basic forms that appear in $V$, or comparison predicates;
(2) $E$ and $Q$ are equivalent, that is, $E$ and $Q$ produce the same result on every database instance, then $E$ is called an equivalent rewriting of $Q$ based on $V$.
Provided that $E$ and $Q$ remain equivalent, if no further basic form can be removed from $E$, the rewriting $E$ is said to be locally minimal; under the same condition, if $E$ is the rewriting with the fewest basic forms, the rewriting $E$ is said to be globally minimal.
Example 1.1: Consider a query $Q$ and a view $V$ of the following form:
$$Q:\ q(X, U) \mathrel{:-} p(X, Y),\ p_0(Y, Z),\ p_1(X, W),\ p_2(W, U)$$
$$V:\ v(A, B) \mathrel{:-} p(A, C),\ p_0(C, B),\ p_1(A, D)$$
The following query $E$ is an equivalent rewriting of the query $Q$ based on the view $V$:
$$E:\ q(X, U) \mathrel{:-} v(X, Z),\ p_1(X, W),\ p_2(W, U)$$
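The equivalence claimed in Example 1.1 can be spot-checked by evaluating both queries on random database instances. The sketch below is a naive nested-loop evaluator for conjunctive queries whose arguments are all variables; the function names and the random test data are made up, and agreement on random instances is only a sanity check, not a proof of equivalence.

```python
import random


def evaluate(head, body, db):
    """Evaluate a conjunctive query; db maps predicate names to sets of tuples."""
    results = set()

    def extend(i, binding):
        if i == len(body):
            results.add(tuple(binding[v] for v in head))
            return
        name, args = body[i]
        for row in db.get(name, ()):
            new = dict(binding)
            # Bind each variable, or check consistency with its earlier binding.
            if all(new.setdefault(a, val) == val for a, val in zip(args, row)):
                extend(i + 1, new)

    extend(0, {})
    return results


HEAD = ("X", "U")
Q_BODY = [("p", ("X", "Y")), ("p0", ("Y", "Z")), ("p1", ("X", "W")), ("p2", ("W", "U"))]
V_BODY = [("p", ("A", "C")), ("p0", ("C", "B")), ("p1", ("A", "D"))]
E_BODY = [("v", ("X", "Z")), ("p1", ("X", "W")), ("p2", ("W", "U"))]


def random_db(rng):
    return {
        name: {(rng.randrange(4), rng.randrange(4)) for _ in range(6)}
        for name in ("p", "p0", "p1", "p2")
    }


rng = random.Random(0)
agree = True
for _ in range(50):
    db = random_db(rng)
    # Materialize the view v(A, B), then evaluate E over the extended instance.
    db_with_view = dict(db, v=evaluate(("A", "B"), V_BODY, db))
    agree = agree and evaluate(HEAD, Q_BODY, db) == evaluate(HEAD, E_BODY, db_with_view)
```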
If a rewriting $E$ of a query $Q$ based on a view set $V$ contains only basic forms from $V$ or built-in predicates, $E$ is called a complete rewriting of $Q$.
Example 1.2: For the query $Q$ and the view $V$ of Example 1.1, given the following view $V_1$:
$$V_1:\ v_1(A, B) \mathrel{:-} p_1(A, C),\ p_2(C, B),\ p_0(D, E)$$
the following query $E$ is a complete rewriting of $Q$ based on $V$ and $V_1$:
$$E:\ q(X, U) \mathrel{:-} v(X, Z),\ v_1(X, U)$$
Note that the rewriting is not carried out one step at a time; that is, it is not the case that $Q$ is first rewritten using the view $V$ and the result is then combined with $V_1$ to obtain the final rewriting $E$. The rewriting with the two views $V$ and $V_1$ proceeds in parallel.
Definition 1.3: containment mapping. Formally, a containment mapping from a query $Q_1$ to another query $Q_2$ is a mapping from the variables of $Q_1$ to the variables of $Q_2$ such that every basic form of $Q_1$ is mapped onto a basic form of $Q_2$. Strictly speaking, to show that $Q_1$ contains $Q_2$, a containment mapping should also map the head of $Q_1$ to the head of $Q_2$. For convenience of discussion, however, a containment mapping here refers only to the mapping from the body of $Q_1$ to the body of $Q_2$.
Containment mappings are mostly used to express containment relations between conjunctive queries, and query containment is closely related to finding a rewriting of a query. The correctness of the rewriting in Example 1.1 can be proved by the containment mapping {A->X, B->Z, C->Y, D->W}.
When neither $Q_1$ nor $Q_2$ contains built-in predicates, the existence of a containment mapping from $Q_1$ to $Q_2$ is a necessary and sufficient condition for $Q_1$ to contain $Q_2$, and finding one is an NP-complete problem; this conclusion also holds when $Q_2$ contains built-in predicates. When $Q_1$ contains built-in predicates, however, a containment mapping from $Q_1$ to $Q_2$ is only a sufficient condition for $Q_1$ to contain $Q_2$.
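A containment mapping can be found by backtracking search over the basic forms. The sketch below is a brute-force illustration for queries without built-in predicates and without constants; the function name `containment_mapping` and its `map_heads` flag are hypothetical. With `map_heads=False` it searches for a body-to-body mapping only, which is the sense used in the text.

```python
def containment_mapping(q1, q2, map_heads=True):
    """Search for a mapping from the variables of q1 to those of q2 that sends
    every body atom of q1 onto a body atom of q2 (and, if map_heads is set,
    the head of q1 onto the head of q2). Each query is a pair
    (head_args, [(predicate, args), ...]). Returns a dict or None."""
    (head1, body1), (head2, body2) = q1, q2

    def unify(args1, args2, mapping):
        new = dict(mapping)
        for a, b in zip(args1, args2):
            if new.setdefault(a, b) != b:
                return None
        return new

    def search(i, mapping):
        if i == len(body1):
            return mapping
        name, args = body1[i]
        for name2, args2 in body2:
            if name2 == name and len(args2) == len(args):
                new = unify(args, args2, mapping)
                if new is not None:
                    found = search(i + 1, new)
                    if found is not None:
                        return found
        return None  # no atom of q2 fits: backtrack

    start = {}
    if map_heads:
        if len(head1) != len(head2):
            return None
        start = unify(head1, head2, {})
        if start is None:
            return None
    return search(0, start)


Q = (("X", "U"), [("p", ("X", "Y")), ("p0", ("Y", "Z")), ("p1", ("X", "W")), ("p2", ("W", "U"))])
V = (("A", "B"), [("p", ("A", "C")), ("p0", ("C", "B")), ("p1", ("A", "D"))])
# Expansion of the rewriting E from Example 1.1 (view atom replaced by its definition).
E_EXPANDED = (("X", "U"), [("p", ("X", "C")), ("p0", ("C", "Z")), ("p1", ("X", "D")),
                           ("p1", ("X", "W")), ("p2", ("W", "U"))])

view_to_query = containment_mapping(V, Q, map_heads=False)
equivalent = (containment_mapping(Q, E_EXPANDED) is not None
              and containment_mapping(E_EXPANDED, Q) is not None)
```

For the view and query of Example 1.1 the body-only search returns the mapping given in the text, and containment mappings exist in both directions between Q and the expansion of E, which is how the equivalence is established. The search is exponential in the worst case, in line with the NP-completeness noted above.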
Several theorems on the existence and computation of rewritings are given below:
Theorem 1.1: Let the query $Q$ and the view $V$ both be conjunctive queries containing built-in predicates. Then a rewriting of $Q$ based on $V$ exists if and only if
$$\pi_{\emptyset}(Q) \subseteq \pi_{\emptyset}(V),$$
that is, the projection of $Q$ onto the empty set of columns is contained in the projection of $V$ onto the empty set of columns. Here,
$$\pi_{\emptyset}(Q) \subseteq \pi_{\emptyset}(V)$$
means that, for a given database instance, if the query result of $V$ is empty, then the query result of $Q$ is also empty.
Theorem 1.2: Given a query $Q$ of form (1-1) and a view set $V$, neither of which contains built-in predicates:
(1) if $E$ is a locally minimal rewriting of $Q$ based on $V$, then the set of database-relation basic forms in $E$ is isomorphic to a subset of the basic forms of $Q$;
(2) if $Q$ has a rewriting based on $V$ of the form $q(X) \mathrel{:-} p_1(Y_1), \ldots, p_n(Y_n), v_1(Z_1), \ldots, v_k(Z_k)$, then there also exists a rewriting of $Q$ based on $V$ of the form $q(X) \mathrel{:-} p_1(Y_1), \ldots, p_n(Y_n), v_1(U_1), \ldots, v_k(U_k)$, where
$$U_1 \cup \cdots \cup U_k \subseteq Y_1 \cup \cdots \cup Y_n,$$
that is, this rewriting introduces no new variables;
(3) if $Q$ and $V$ contain built-in predicates, a rewriting similar to the one in (2) exists; the only difference is that the rewriting may be a union (disjunction) of conjunctive queries.
Theorem 1.2 shows that a rewriting that introduces no new variables can be found for a query. Hence, when computing a rewriting of a query, rewritings containing database-relation basic forms or variables that did not appear in the original query need not be considered; it suffices to consider rewritings made up of view basic forms and a subset of the basic forms of the original query. When the views contain built-in predicates, however, part (2) of Theorem 1.2 does not hold, as Example 1.3 illustrates.
Example 1.3: For the query $Q:\ q(X, Y, U, W) \mathrel{:-} p(X, Y),\ r(U, W),\ r(W, U)$ and the view $V:\ v(A, B, C, D) \mathrel{:-} p(A, B),\ r(C, D),\ C \le D$, there is no conjunctive-query rewriting of $Q$ based on $V$ that introduces no new variables.
However, $E:\ q(X, Y, U, W) \mathrel{:-} v(X, Y, C, D),\ r(U, W),\ r(W, U)$ is a rewriting of $Q$ based on $V$. Furthermore, the disjunctive rewriting of $Q$ based on $V$ that introduces no new variables is:
$$Q':\ q(X, Y, U, W) \mathrel{:-} v(X, Y, U, W),\ r(W, U)$$
$$Q':\ q(X, Y, U, W) \mathrel{:-} v(X, Y, W, U),\ r(U, W)$$
An important purpose of rewriting queries using views is to reduce the cost of evaluating the original query. Therefore, to optimize a query it is not enough to find a rewriting of it; a minimal rewriting should also be found. The following discusses how to reduce the number of redundant basic forms in the rewritten query, methods for finding a minimal rewriting, and their complexity. Finally, two independent sources of complexity in finding a minimal rewriting are identified.
Theorem 1.3: Let $Q$ be a query of form (1-1) and $V$ a view set, neither of which contains built-in predicates. If $Q$ has $p$ basic forms and $E$ is a locally minimal complete rewriting of $Q$ based on the view set $V$, then $E$ has at most $p$ basic forms.
Proof: Let $E'$ be the expansion of the rewriting $E$ of $Q$, in which every view basic form of $E$ is replaced by the corresponding view definition. Consider a containment mapping $M$ from $Q$ to $E'$. Each of the basic forms $L_1, \ldots, L_p$ of $Q$ is mapped into the expansion of at most one view basic form of $E'$. If $E$ has more than $p$ view basic forms, then the expansion of some view basic form in $E'$ does not intersect the image of $M$. Since it does not intersect, that view basic form can be eliminated while preserving the equivalence of $E$ and $Q$. Hence a rewriting exists that has at most $p$ view basic forms.
Because views identical to the database relations can be found in $V$, this theorem guarantees that every locally minimal rewriting of $Q$ has at most $p$ basic forms, and shows that a minimal rewriting of $Q$ based on the view set $V$ does not increase the number of basic forms in the query.
To obtain a minimal rewriting of a query, the redundant basic forms in the rewritten query must be eliminated as far as possible, thereby determining the set of necessary basic forms of $Q$ (the set of redundant basic forms of $Q$ and its set of necessary basic forms are complements of each other). How to determine the redundant basic forms in a rewritten query is described below. Let the query $Q$ be of form (1-1) and let the view be defined as follows:
V:v(U):-r 1(W 1),...,r m(W m)......(1-2)
Comprise mapping if h is from v to q, it is as follows to inquiring about the result who obtains among the Q to increase basic form suitable among the view V:
q(X):-P 1(y 1),...,p n(Y n),V(Z)......(1-3)
Here Z=h (U).With the view basic form in the view definition alternate form (1-3) in the formula (1-2), in replacement process, variable in the rename formula according to the following rules (1-2): appear at each variable T among the U by RNTO h (T), do not appeared at variable among the U by P of RNTO i(Y i) in the new variables that do not have.The result is as follows:
q(X):-p 1(Y 1),...,P n(Y n),r 1(V 1),...,r m(V m)......(1-4)
Illustrate that in the whole process, variable is unique p that both appeared among the Z i(Y i) in appear at r again j(V j) in variable.
Given mapping h exists shine upon Φ comprising naturally of being defined as follows from formula (1-4) to formula (1-1), in this mapping Φ, and each basic form P i(Y i) be mapped to itself, each basic form r j(V j) be mapped on the same basic form in the formula (1-1) that under mapping h, is mapped to.Here, each variable that comprises among the mapping Φ mapping Z arrives himself.
Each basic form p in the formula (1-1) i(Y i) reflection under Φ all is himself or r j(W j) in some basic forms.We claim to be mapped to P under Φ i(Y i) basic form r j(w j) be p i(Y i) related basic form.If h does not shine upon r j(w j) in the same basic form of two basic forms in the formula (1-1) on, each P then i(Y i) correlation arranged at most.
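The renaming rule used to go from formula (1-3) to formula (1-4) can be illustrated with a minimal Python sketch. The representation of a basic form as a (predicate, arguments) pair, the function name expand_with_view, and the fresh-variable prefix _N are assumptions made only for this illustration; they are not part of the described method.

```python
from itertools import count

def expand_with_view(query_body, view_head_vars, view_body, h):
    """Return (Z, body of formula (1-4)) for one view and one mapping h.

    query_body:     list of (predicate, args) atoms p_i(Y_i)
    view_head_vars: the tuple U from the view head v(U)
    view_body:      list of (predicate, args) atoms r_j(W_j)
    h:              dict mapping view variables to query variables
    """
    used = {arg for _, args in query_body for arg in args}
    fresh_names = (f"_N{i}" for i in count())
    renaming = {}

    def rename(var):
        # Variables of U are renamed to h(T).
        if var in view_head_vars:
            return h[var]
        # Other variables get a new name that no p_i(Y_i) uses.
        if var not in renaming:
            name = next(fresh_names)
            while name in used:
                name = next(fresh_names)
            renaming[var] = name
        return renaming[var]

    z = tuple(h[t] for t in view_head_vars)  # Z = h(U)
    expanded = [(pred, tuple(rename(a) for a in args)) for pred, args in view_body]
    return z, list(query_body) + expanded

# q(X, Y) :- p(X, A), p(A, Y)   and   v(S, T) :- p(S, B), p(B, T)
query_body = [("p", ("X", "A")), ("p", ("A", "Y"))]
view_body = [("p", ("S", "B")), ("p", ("B", "T"))]
z, body = expand_with_view(query_body, ("S", "T"), view_body,
                           {"S": "X", "T": "Y", "B": "A"})
```

In this example the only variables shared between the original basic forms and the expanded view basic forms are X and Y, that is, the variables of Z.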
The specific steps of subquery merging are as follows:
(1) the FROM clauses of the subquery and the outer query are merged into a single FROM clause, and the corresponding operation parameters are modified;
(2) the predicate symbol of the subquery is modified accordingly, for example "IN" is changed to "=";
(3) the WHERE condition of the subquery, taken as a whole, is merged with the WHERE condition of the outer query and connected to it with the AND conjunction, so that the newly generated predicate has the same meaning in context as the original predicate and the two form a single condition.
The equivalent predicate rewriting rules are:
(1) predicates connected by multiple ORs are converted into an ANY expression;
(2) an ANY or ALL expression is converted into a single value;
(3) BETWEEN is converted into predicates connected by AND;
(4) an IN predicate expression is expanded into predicate expressions connected by OR.
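Rules (3) and (4) can be checked with a short Python sketch. The helper names between_to_and and in_to_or and the table t are hypothetical, and the helpers build SQL text from integer literals purely for illustration. Rules (1) and (2) are not shown because SQLite, which is used here as the test engine, does not provide the ANY and ALL quantifiers.

```python
import sqlite3

def between_to_and(column, low, high):
    """Rule (3): BETWEEN low AND high becomes two comparisons joined by AND."""
    return f"({column} >= {low} AND {column} <= {high})"

def in_to_or(column, values):
    """Rule (4): IN (v1, v2, ...) becomes equality tests joined by OR."""
    return "(" + " OR ".join(f"{column} = {v}" for v in values) + ")"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

def select(predicate):
    """Run the predicate against table t and return the matching values."""
    sql = f"SELECT x FROM t WHERE {predicate} ORDER BY x"
    return [row[0] for row in conn.execute(sql)]

between_original = select("x BETWEEN 3 AND 6")
between_rewritten = select(between_to_and("x", 3, 6))
in_original = select("x IN (2, 5, 7)")
in_rewritten = select(in_to_or("x", [2, 5, 7]))
```

Each rewritten predicate selects the same rows as the original one, which is the equivalence that the rewriting rules rely on.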
Query optimization currently comprises two stages. The first stage is called query rewriting: it analyzes the internal representation of the query and applies equivalence transformations as needed. Its purpose is, on the one hand, to transform the query into a more efficient form and, on the other hand, to make the necessary preparation for the second stage of query optimization. The second stage is called plan optimization. It is the main stage of optimization and determines the join order and join methods in the query execution plan, as well as which access methods to use. Query rewriting is an equivalence transformation of a query: it converts one Boolean query plan into a different but semantically equivalent Boolean query plan, with the primary aim that the rewritten plan executes more efficiently than the plan before rewriting. In addition, the query optimizer is constructed on the principle of combining query rewriting with plan optimization.
XML query rewriting is an NP problem and can be studied from several directions; for example, one can discuss how to complete the rewriting process efficiently and quickly, or how to use views to answer as many queries as possible. In general, as shown in Figure 1, an XML query rewriting is implemented through the following steps:
(1) perform lexical and syntax parsing on the user's XQuery document, verify the correctness and validity of the XQuery document, and produce an XQuery parse syntax tree;
(2) translate the parsed XQuery syntax tree, expressing the XQuery query as an intermediate logical representation;
(3) rewrite the intermediate logical representation from (2) to generate the desired rewritten target query expression;
(4) use a query computation engine that supports the rewritten target query expression to carry out the query computation and obtain the query result.
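The four steps can be sketched as a pipeline of four functions. The Python example below is only a toy: it handles a small child-step path language instead of XQuery, its rewrite rule (dropping "." steps) is chosen for brevity, and it uses the standard xml.etree.ElementTree module as the query engine. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

STEP = re.compile(r"[A-Za-z_][\w.-]*|\.")

def parse(query):
    """Step (1): lexical and syntax check; returns a parse tree (list of steps)."""
    if not query.startswith("/"):
        raise ValueError("query must start with '/'")
    steps = query[1:].split("/")
    for step in steps:
        if not STEP.fullmatch(step):
            raise ValueError(f"invalid step: {step!r}")
    return steps

def translate(tree):
    """Step (2): turn the parse tree into an intermediate logical representation."""
    return [("self",) if s == "." else ("child", s) for s in tree]

def rewrite(plan):
    """Step (3): equivalent rewrite; 'self' steps select nothing new and are dropped."""
    return [op for op in plan if op[0] != "self"]

def execute(plan, root):
    """Step (4): evaluate a rewritten plan (child steps only) with ElementTree."""
    nodes = [root] if plan and plan[0] == ("child", root.tag) else []
    for _, name in plan[1:]:
        nodes = [child for node in nodes for child in node.findall(name)]
    return [node.text for node in nodes]

doc = ET.fromstring(
    "<library><book><title>A</title></book><book><title>B</title></book></library>"
)
plan = rewrite(translate(parse("/library/./book/./title")))
titles = execute(plan, doc)
```

Keeping the intermediate representation separate from both the parse tree and the engine is what allows the rewriting step to be changed without touching parsing or execution.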
Because XML data is mostly semi-structured, its data model differs from that of a traditional database, so the query rewriting techniques of traditional databases cannot be applied directly to XML queries. Although traditional query rewriting methods cannot be applied directly to XML query rewriting, their rewriting ideas and existing research results can be drawn upon. For example, a view-based approach can be used to rewrite or answer an XML query. In the dynamic interval encoding approach of the present invention, an XML document is first dynamic-interval encoded to generate the dynamic interval encoding table of the target XML document, so that as much of the information in the XML document as possible is stored in a relational database. Then, a corresponding relational view template is generated for each query component of the user's XQuery query (for example, the various operators, the built-in functions in FLWOR expressions, assignment expressions, conditional expressions, and loop expressions). Finally, these templates are combined into an equivalent SQL query, which is evaluated by the relational database engine.
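The idea of storing an interval-encoded XML document in a relational table and answering a path query with SQL can be sketched as follows. This Python example uses a plain, static start/end interval numbering, which is an assumption made for illustration; the details of the dynamic interval encoding are not given in this passage. The table node and its columns lo, hi, tag and content are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def interval_encode(root):
    """Assign each element a (lo, hi) interval by a depth-first walk."""
    rows, counter = [], 0

    def visit(elem):
        nonlocal counter
        counter += 1
        lo = counter
        for child in elem:
            visit(child)
        counter += 1
        rows.append((lo, counter, elem.tag, (elem.text or "").strip()))

    visit(root)
    return sorted(rows)

doc = ET.fromstring(
    "<library><book><title>A</title><year>2001</year></book>"
    "<book><title>B</title></book></library>"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (lo INTEGER, hi INTEGER, tag TEXT, content TEXT)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", interval_encode(doc))

# Relational counterpart of the path query //book//title: a title is a
# descendant of a book when its interval lies strictly inside the book's.
titles = [row[0] for row in conn.execute("""
    SELECT d.content FROM node a, node d
    WHERE a.tag = 'book' AND d.tag = 'title'
      AND a.lo < d.lo AND d.hi < a.hi
    ORDER BY d.lo
""")]
```

With this encoding the ancestor/descendant relationship becomes a pair of integer comparisons, so a structural XML query turns into an ordinary join that a relational engine can optimize.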
With the above method for optimizing a data source XML query system based on subqueries, since the query supplied by the user contains procedural elements, query rewriting can be used to convert the procedural query into a descriptive one. Views are also rewritten, so that references to a view are rewritten as references to the base tables on which the view is defined, and subqueries are optimized at the same time. The result is a fully semantically equivalent query that the optimizer can optimize further. In addition, subquery merging rewrites certain specific subqueries into equivalent join operations over multiple tables, which reduces the nesting level of the query statement as far as possible and facilitates plan optimization. The method offers stable and reliable performance and a comparatively wide range of application.
In this specification, the invention has been described with reference to specific embodiments thereof. It is evident, however, that various modifications and changes may be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded as illustrative rather than restrictive.

Claims (11)

1. A method for optimizing a data source XML query system based on subqueries, the method being applied to heterogeneous data sources, characterized in that the method comprises the following steps:
(1) the system performs an initialization operation;
(2) the system receives the corresponding XML query input information according to the user's input operation;
(3) the system performs lexical parsing and syntax parsing on the XML query input information, and verifies the correctness and validity of the XML query input information;
(4) if the parsing succeeds and the verification passes, the system generates the corresponding XML parse syntax tree;
(5) the system translates the XML parse syntax tree and converts the XML query input information into an intermediate logical representation;
(6) the intermediate logical representation is rewritten, and the rewritten target query expression is generated;
(7) the system calls a query computation engine that supports the target query expression to perform the query computation over the heterogeneous data sources, and obtains and outputs the corresponding query result.
2. The method for optimizing a data source XML query system based on subqueries according to claim 1, characterized in that generating the corresponding XML parse syntax tree comprises the following steps:
(11) performing dynamic interval encoding on the XML query input information;
(12) generating the dynamic interval encoding table of the target XML query input information;
(13) storing as much of the information in the XML query input information as possible in a relational database, thereby obtaining the XML parse syntax tree.
3. The method for optimizing a data source XML query system based on subqueries according to claim 2, characterized in that converting the XML query input information into the intermediate logical representation is specifically:
generating a corresponding relational view template for each query component in the XML query input information.
4. The method for optimizing a data source XML query system based on subqueries according to claim 3, characterized in that the query components comprise the various operators, the built-in functions in FLWOR expressions, assignment expressions, conditional expressions, and loop expressions.
5. The method for optimizing a data source XML query system based on subqueries according to claim 1, characterized in that rewriting the intermediate logical representation comprises a subquery merging operation and an equivalent predicate rewriting operation.
6. The method for optimizing a data source XML query system based on subqueries according to claim 5, characterized in that the subquery merging operation comprises the following steps:
(21) the system merges the FROM clauses of the subquery and the outer query in the intermediate logical representation into a single FROM clause, and modifies the corresponding operation parameters;
(22) the system modifies the predicate symbol of the subquery in the intermediate logical representation accordingly;
(23) the system merges the WHERE condition of the subquery in the intermediate logical representation, taken as a whole, with the WHERE condition of the outer query, and connects them with the AND conjunction.
7. The method for optimizing a data source XML query system based on subqueries according to claim 6, characterized in that modifying the predicate symbol accordingly is specifically:
changing the predicate symbol "IN" to "=".
8. The method for optimizing a data source XML query system based on subqueries according to claim 5, characterized in that the equivalent predicate rewriting operation comprises the following steps:
(31) the system converts predicate expressions connected by multiple ORs in the intermediate logical representation into an ANY expression;
(32) the system converts an ANY expression or ALL expression in the intermediate logical representation into a single value;
(33) the system converts a BETWEEN expression in the intermediate logical representation into predicate expressions connected by AND;
(34) the system expands an IN predicate expression in the intermediate logical representation into predicate expressions connected by OR.
9. The method for optimizing a data source XML query system based on subqueries according to claim 3, characterized in that generating the rewritten target query expression is specifically:
the system combines the relational view templates to form an SQL query expression equivalent to the relational view templates.
10. The method for optimizing a data source XML query system based on subqueries according to any one of claims 1 to 9, characterized in that the XML query input information is an XQuery document.
11. The method for optimizing a data source XML query system based on subqueries according to claim 10, characterized in that the XML parse syntax tree is an XQuery parse syntax tree.
CN 201010580677 2010-12-09 2010-12-09 Method for realizing optimization of data source extensive makeup language (XML) query system based on sub-queries Pending CN101984439A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010580677 CN101984439A (en) 2010-12-09 2010-12-09 Method for realizing optimization of data source extensive makeup language (XML) query system based on sub-queries

Publications (1)

Publication Number Publication Date
CN101984439A true CN101984439A (en) 2011-03-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20110309