CA2411459A1 - Document conversion system, document conversion method and computer readable recording medium storing document conversion program - Google Patents


Info

Publication number
CA2411459A1
Authority
CA
Canada
Prior art keywords
document
conversion
schema
structured
identifier
Legal status
Abandoned
Application number
CA002411459A
Other languages
French (fr)
Inventor
Hideharu Suzuki
Norihiro Ishikawa
Hidetoshi Ueno
Hiromitsu Sumino
Takeshi Kato
Current Assignee
NTT Docomo Inc
Original Assignee
NTT Docomo Inc
Application filed by NTT Docomo Inc
Publication of CA2411459A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G06F40/151 Transformation
    • G06F40/154 Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets
    • G06F40/16 Automatic learning of transformation rules, e.g. from examples

Abstract

This invention aims at reducing the total time required for document conversion by outputting appropriate document data which matches the document type definition after conversion, so that the validity verification step in document structure conversion can be omitted.
Specifically, this invention provides a document conversion method for converting a first structured document F1, formed based on a first document type definition D1, to a second structured document F3, formed based on a second document type definition D2. The document conversion method comprises analyzing the document type definition D1 and the document type definition D2 and extracting the portions in which they differ; generating, based on the results of the analysis, a conversion template T2 describing a conversion rule which prevents the structured document F3, the result of the document conversion process, from contradicting the document type definition D2; and performing the document conversion process using the conversion template T2.

Description

TITLE OF THE INVENTION
DOCUMENT CONVERSION SYSTEM, DOCUMENT CONVERSION METHOD AND COMPUTER READABLE RECORDING MEDIUM STORING DOCUMENT CONVERSION PROGRAM
CROSS REFERENCE TO RELATED APPLICATIONS
This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. P2001-346736, filed on November 12, 2001, the entire contents of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a document conversion system for converting a first structured document formed by a first document schema into a second structured document formed by a second document schema, to a document conversion method, and to a computer readable recording medium storing a document conversion program.
2. Description of the Related Art
Conventionally, structured documents have been proposed which not only handle the text data of text document files as a mere character string but are also capable of expressing the logical structure of the document, its layout, attributes, etc. For example, SGML, specified by International Organization for Standardization (ISO) standard 8879, and XML, specified by the World Wide Web Consortium (W3C), are currently available. In SGML and XML, the logical structure of a document is specified by a document type definition (DTD), and the roles of document component elements such as the title, the author's name, the preface and the text can be expressed using identifiers for structure elements called document tags.
In a structured document, a specific meaning, role, etc. may need to be assigned to an identifier, and additional information (attributes) can be added to the identifier to express this characteristic.
Further, formats for stylesheets describing the style of a document, which are required for displaying a structured document on a screen and printing it on paper, have been proposed. Available stylesheet formats include, for example, the Document Style Semantics and Specification Language (DSSSL) of ISO standard 10179 and the Extensible Stylesheet Language (XSL) specified by W3C.
DSSSL and XSL describe the document style by specifying a pattern expressing a condition on the identifiers constituting an SGML or XML document, together with an action for the identifiers which satisfy that pattern.
The stylesheet both provides the document style and converts the structure of the document. The part of XSL that specifies how to extract particular patterns from a structured document is called XSL Transformations (XSLT). XSLT enables an XML document to be converted according to predetermined conditions and output in a different format, such as HTML.
A structured document is produced by dividing document data (text) into structurally meaningful units and marking these units with elements and attributes. In XML, the method of defining the structure of document data is called a schema, and a document type definition (DTD) is generally used to define the schema. The schema defines which elements the document content should contain, in what order and how many times, and which attributes they should have. Since the structured document itself contains no such definition, it cannot automatically detect an error even if data is missing for some reason. Thus, a document type definition is needed in order to display or exchange data, and the document must be described according to that definition.
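To make the idea of a schema concrete, the following sketch (Python, standard library only) models a DTD-style content model as a regular expression over the sequence of child element names. The CONTENT_MODELS table and the validate function are hypothetical illustrations, not part of the patent, and they ignore attributes, mixed content and the other checks a real validating parser performs.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models: element name -> regex over the child element names,
# each name followed by a space. "(li )+" loosely mirrors <!ELEMENT ul (li)+>.
CONTENT_MODELS = {
    "ul": r"(li )+",
    "li": r"",  # no child elements allowed (text only)
}

def validate(elem: ET.Element) -> list[str]:
    """Return error messages for elem and its descendants; [] means valid."""
    errors = []
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        errors.append(f"undeclared element <{elem.tag}>")
    else:
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, children) is None:
            shown = children.strip() or "(none)"
            errors.append(f"<{elem.tag}> has invalid children: {shown}")
    for child in elem:
        errors.extend(validate(child))
    return errors

print(validate(ET.fromstring("<ul><li>one</li><li>two</li></ul>")))  # []
print(validate(ET.fromstring("<ul></ul>")))
```

An empty ul fails because the model requires at least one li, which is the kind of "missing data" the schema makes detectable.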
Fig. 1 shows an example flow of a conventional document conversion process for a structured document F1 described in XML. As shown in the figure, the conversion of a structured document generally consists of two steps: conversion of the document structure S101 and its validity verification process S102.
The conversion of the document structure S101 is a step of generating a new document by extracting elements and attributes using a pattern matching technique and replacing them with new elements and attributes, or by adding new elements, attributes and text. This process is performed based on a conversion rule described in a conversion template T1. The conversion template T1 contains a structure conversion rule which is prepared in advance as an XSL file. Existing software (e.g., Xalan-C++) can be used as the XSLT conversion engine for the document structure conversion process S101.
The validity verification process S102 is a step of verifying whether the output of the XSLT conversion process (structured document F2) follows the post-conversion document type definition D2, and it is performed using the document type definition D2. The validity verification process S102 can be performed by existing software (e.g., XML4C). If the result of the validity verification process S102 is acceptable, a new structured document F3 is generated. If it is not acceptable, a document structure correction process S104 is performed on the structured document F2 based on the error content, and the validity verification process S102 is performed again on the corrected structured document F2.
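The conventional flow of Fig. 1 can be sketched as a loop: convert once (S101), then verify (S102) and correct (S104) until the document is acceptable. In this minimal sketch the three steps are passed in as plain functions; convert_with_verification and its parameters are hypothetical names, and the max_rounds cap is an added assumption. In the conventional method described here, S101 would be an XSLT engine such as Xalan-C++, and the correction S104 may be manual.

```python
from typing import Callable

def convert_with_verification(
    source: str,
    convert: Callable[[str], str],
    validate: Callable[[str], list[str]],
    correct: Callable[[str, list[str]], str],
    max_rounds: int = 10,
) -> tuple[str, int]:
    """Conventional flow: S101 once, then S102/S104 until the document is valid.

    Returns the accepted document (F3) and the number of verification passes.
    """
    document = convert(source)                   # S101: structure conversion with T1
    for passes in range(1, max_rounds + 1):
        errors = validate(document)              # S102: full check against D2
        if not errors:
            return document, passes              # acceptable: document becomes F3
        document = correct(document, errors)     # S104: correct F2 using the errors
    raise RuntimeError("document still invalid after max_rounds corrections")

# Toy usage: "valid" means at most three characters; each correction drops one.
doc, passes = convert_with_verification(
    "ABCDE",
    convert=str.lower,
    validate=lambda d: ["too long"] if len(d) > 3 else [],
    correct=lambda d, errors: d[:-1],
)
print(doc, passes)  # abc 3
```

Each extra round repeats a full verification pass over the document, which is the cost the invention aims to remove.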
Fig. 2A is a diagram showing a conventional example of converting the structured document F1, defined by the document type definition D1, into the structured document F3 based on the conversion template T1. In the figure, the structured document F2 obtained by a first conversion (i) contradicts the document type definition D2, and the structured document F3 is the document in which the contradictions are corrected. In the document example of Fig. 2A, the UL element and the ul element define an unnumbered list (unordered list), and each list item is defined with the LI element or the li element, which are the lower-order elements of the UL and ul elements.
After the conversion, the ul element and the li element correspond to the UL element and the LI element. The structured document F1 describes a list comprising three items. In the structured document F2 containing contradictions, the corresponding elements are simply replaced.
If the document type definition D2 specifies a rule that only one li element may be defined under a ul element, each li element in the structured document F2 must become a sub-element of its own ul element (each li element is enclosed by a ul tag). F2 is thereby corrected into an appropriate structured document F3 which satisfies the document type definition D2.
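The Fig. 2A example can be reproduced with Python's xml.etree.ElementTree. The sketch below assumes a three-item list as sample content for F1 and encodes the D2 rule from the text (only one li per ul) as a hypothetical check; it first performs the conventional one-to-one replacement to obtain F2 and then applies the correction that yields F3.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of F1: an unordered list with three items
F1 = "<UL><LI>apple</LI><LI>banana</LI><LI>cherry</LI></UL>"

def naive_convert(elem: ET.Element) -> ET.Element:
    """Conventional step: replace each element by its counterpart (UL->ul, LI->li)."""
    new = ET.Element(elem.tag.lower())
    new.text, new.tail = elem.text, elem.tail
    new.extend([naive_convert(child) for child in elem])
    return new

def violates_single_li_rule(ul: ET.Element) -> bool:
    """Assumed D2 rule from the example: a ul may contain only one li."""
    return len(ul.findall("li")) > 1

def correct(ul: ET.Element) -> list[ET.Element]:
    """Correction step: enclose each li in its own ul."""
    fixed = []
    for li in ul.findall("li"):
        wrapper = ET.Element("ul")
        wrapper.append(li)
        fixed.append(wrapper)
    return fixed

f2 = naive_convert(ET.fromstring(F1))
print(ET.tostring(f2, encoding="unicode"))
# <ul><li>apple</li><li>banana</li><li>cherry</li></ul>
f3 = correct(f2) if violates_single_li_rule(f2) else [f2]
print("".join(ET.tostring(u, encoding="unicode") for u in f3))
# <ul><li>apple</li></ul><ul><li>banana</li></ul><ul><li>cherry</li></ul>
```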
Fig. 2B is an example description of a conventional conversion template T1. As shown in the figure, the conversion template T1 acts as the conversion rule for converting the structured document F1 into the structured document F2 (i) containing contradictions. The conversion template T1 consists of a pattern specifying part and a template specifying part. During the conversion process, a document pattern (tag) defined by the pattern specifying part is extracted from the structured document. Then addition, deletion and replacement are performed on the extracted document pattern according to the template specifying part in order to generate a new document.
In the conventional conversion template T1, each of <xsl:template match>, <xsl:apply-templates> and <xsl:value-of> is an element defined by the XSL specification.
(1) and (3), which use <xsl:template match>, specify patterns: (1) means extraction of the UL element, while (3) means extraction of the LI element. (2) and (4) specify templates. The UL element is extracted according to the pattern specifying of (1), and then the template of (2) is specified.
The template specifying of (2) means describing the start tag of ul and, after the process of applying a template rule to the LI elements has been performed, describing the end tag of ul. The template rule for the LI element consists of (3) and (4), and the LI element is extracted according to the pattern specifying of (3).
Further, as the template specifying of (4), the start tag of li is described, the content under the LI element is converted to text, and finally the end tag of li is described. Since there are three LI elements in the structured document F1, three portions corresponding to the pattern specifying of (3) are extracted. The template specifying of (4) is applied to each of them, and then the process is complete.
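The pattern/template mechanism of T1 can be imitated in Python by registering one function per pattern and dispatching on element names. This is a simplified analogue, not XSLT: the RULES table, the rule decorator and apply_templates are hypothetical names, apply_templates only visits child elements (XSLT's built-in rules for text nodes are omitted), and <xsl:value-of> is approximated by joining the element's text.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical analogue of template T1: pattern (source tag) -> template (function)
Template = Callable[[ET.Element], ET.Element]
RULES: dict[str, Template] = {}

def rule(pattern: str):
    """Register a template for elements whose tag equals `pattern`."""
    def register(fn: Template) -> Template:
        RULES[pattern] = fn
        return fn
    return register

def apply_templates(elem: ET.Element) -> list[ET.Element]:
    """Rough counterpart of <xsl:apply-templates/>: process matching child elements."""
    return [RULES[child.tag](child) for child in elem if child.tag in RULES]

@rule("UL")                                    # pattern (1)
def ul_template(elem: ET.Element) -> ET.Element:
    out = ET.Element("ul")                     # template (2): <ul> ... </ul>
    out.extend(apply_templates(elem))
    return out

@rule("LI")                                    # pattern (3)
def li_template(elem: ET.Element) -> ET.Element:
    out = ET.Element("li")                     # template (4): <li>value-of</li>
    out.text = "".join(elem.itertext())
    return out

f1 = ET.fromstring("<UL><LI>a</LI><LI>b</LI><LI>c</LI></UL>")
f2 = RULES[f1.tag](f1)
print(ET.tostring(f2, encoding="unicode"))  # <ul><li>a</li><li>b</li><li>c</li></ul>
```

The output keeps all three li elements under one ul, which is exactly the contradictory F2 of Fig. 2A.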
However, as described above, where the document type definition D1 contradicts the document type definition D2 (e.g., D1 allows a structure which is prohibited in the document type definition D2), a contradiction with the document type definition D2 remains if the conversion only extracts elements/attributes according to the conversion template T1 and replaces (converts) them with the corresponding elements/attributes or adds such elements/attributes.
In the conventional structured document conversion method, both the document structure conversion process S101 and the validity verification process S102 search elements/attributes from the root element to the last element of the document data. Therefore, there is a problem that document conversion takes longer each time the document correction process S104 is required.
Further, there is a problem that if the result of the validity verification process S102 is not acceptable, an operator must manually perform the document correction process S104 off-line based on the result of the validity verification process S102.
BRIEF SUMMARY OF THE INVENTION
It is therefore an object of the present invention to reduce the total time required for document conversion by outputting appropriate document data which matches the document type definition after conversion, so that a validity verification step in document structure conversion can be omitted.
A feature of the present invention is that, when a first structured document formed based on a first document schema is converted into a second structured document formed based on a second document schema, the first document schema and the second document schema are analyzed and the portions in which they differ are extracted; a conversion template is generated, based on the result of the analysis, describing a conversion rule which prevents the second structured document (the result of the document conversion process) from contradicting the second document schema; and the document conversion process is performed using the conversion template.
According to the present invention, if there is output logic which does not satisfy the document type definition after conversion (the second document schema), a process for correcting the contradiction is reflected in the conversion template, so that the second structured document resulting from the document structure conversion process conforms to the document type definition after conversion. As a result, the validity verification step after conversion, which is performed conventionally, can be omitted, thereby reducing the total time required for document conversion.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Fig. 1 is a schematic diagram showing the outline of a conventional document conversion method;
Figs. 2A and 2B are diagrams showing an example of generation of a conventional conversion template;
Fig. 3 is a schematic diagram showing the outline of a document conversion method of an embodiment of the present invention;
Figs. 4A and 4B are diagrams showing an example description of the conversion template according to the embodiment of the present invention;
Figs. 5A and 5B are diagrams showing an example of generation of another conversion template of the embodiment of the present invention;
Figs. 6A and 6B are diagrams showing an example of generation of another conversion template of the embodiment of the present invention;
Fig. 7 is a schematic diagram showing the outline of the document conversion method according to a modification of the embodiment of the present invention;
Fig. 8 is a block diagram showing the configuration of a computer in which the document conversion program of the embodiment is installed;
Fig. 9 is a flowchart showing the process of the computer in which the document conversion program of the embodiment is installed;
Fig. 10 is a perspective view showing a computer readable recording medium in which the document conversion program of the embodiment is stored;
Fig. 11 is a schematic diagram showing the process of the computer in which the document conversion program of the embodiment is installed;
Fig. 12 is a schematic diagram showing the process of document conversion via a communication network using a computer in which the document conversion program of the embodiment is installed; and
Fig. 13 is a table showing the identifier correspondence table and conversion rule relating to the embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Document Conversion Method
Hereinafter, embodiments of the document conversion method of the present invention will be described. Fig. 3 is a schematic diagram showing the outline of the document conversion method of this embodiment. As shown in the figure, a conversion template T2 describes an appropriate conversion rule, obtained by interpreting the document type definition D1 (first document schema) applied before the conversion and the document type definition D2 (second document schema) applied after the conversion, so that the output conforms to the document type definition D2. In the document structure conversion process S101, the document structure of the structured document F1 (first structured document), which is the document before conversion, is converted according to the description of the conversion template T2 in order to generate a new structured document F3 (second structured document).
Such a conversion template T2 can be generated by the following procedure. In this embodiment, the document type definition D1 and the document type definition D2 apply to document data, such as XML and HTML, having identifiers (markup tags) that define the logical structure of the character strings of the document.
Here, an identifier correspondence table and a conversion rule are generated. Fig. 13 is a table showing the identifier correspondence table and the conversion rule relating to this embodiment.
As shown in Fig. 13, the identifier correspondence table indicates the relationship between elements that define the same logical structure, such as the UL element and the ul element. The conversion rule consists of a replacement template defining the logical structure after conversion and the conditions for applying that template.
The identifier correspondence table is generated based on the relationships between elements written in capital and small letters, between elements using arguments with the same content, or between elements having the same function. Using this identifier correspondence table, the logical structures before and after conversion are compared and the portions that differ between them are detected. For example, as shown in Fig. 2, the document type definition of the logical structure formed of the UL element and LI element in the structured document F1 and the document type definition of the logical structure formed of the ul element and li element in the structured document F3 are compared in order to detect the differing portions.
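As a minimal sketch of this comparison, the code below builds an identifier correspondence table from case-insensitive name matches only (the text also allows matches by argument or by function, which are not modelled), then renames the identifiers inside each D1 content model and reports the elements whose models still differ from D2. The content models are plain strings and the D1/D2 dictionaries are hypothetical simplifications of real DTDs; the D2 model "(li)" encodes the rule that a ul holds exactly one li.

```python
import re

# Hypothetical, simplified element declarations: name -> content model string.
D1 = {"UL": "(LI)+", "LI": "(#PCDATA)"}   # before conversion: one or more LI
D2 = {"ul": "(li)", "li": "(#PCDATA)"}     # after conversion: exactly one li

def build_correspondence(d1: dict, d2: dict) -> dict:
    """Pair identifiers whose names differ only in letter case (UL <-> ul)."""
    by_lower = {name.lower(): name for name in d2}
    return {name: by_lower[name.lower()] for name in d1 if name.lower() in by_lower}

def differing_portions(d1: dict, d2: dict, table: dict) -> dict:
    """Map each target element to (renamed D1 model, D2 model) where they differ."""
    diffs = {}
    for src, dst in table.items():
        # Rewrite names inside the D1 model via the table; "#PCDATA" is left alone.
        renamed = re.sub(r"[A-Za-z]+", lambda m: table.get(m.group(0), m.group(0)), d1[src])
        if renamed != d2[dst]:
            diffs[dst] = (renamed, d2[dst])
    return diffs

table = build_correspondence(D1, D2)
print(table)                              # {'UL': 'ul', 'LI': 'li'}
print(differing_portions(D1, D2, table))  # {'ul': ('(li)+', '(li)')}
```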
Next, the conditions of the detected differing portions are analyzed. In the example shown in Fig. 2B, if there are plural LI elements (two or more), a separate ul element is nested around each LI element. Therefore, in this example, (LI >= 2) is adopted as the condition. A conversion rule is then generated based on the conditions of the differing portions and the corresponding logical structure after conversion, and the conversion rule is reflected in the conversion template T2.
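A conversion rule of this kind pairs a pattern with a condition and a replacement structure. The following sketch represents one rule as a small dataclass; ConversionRule, nest_each_item, rename_only and convert_list are hypothetical names, and falling back to plain renaming when the condition (LI >= 2) does not hold is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable

@dataclass
class ConversionRule:
    """Hypothetical record for one conversion rule (cf. Fig. 13)."""
    pattern: str                                        # identifier to convert
    condition: Callable[[ET.Element], bool]             # when the replacement applies
    replace: Callable[[ET.Element], list[ET.Element]]   # post-conversion structure

def nest_each_item(ul: ET.Element) -> list[ET.Element]:
    """Replacement structure: a separate ul around each LI."""
    result = []
    for li in ul.findall("LI"):
        new_ul = ET.Element("ul")
        ET.SubElement(new_ul, "li").text = "".join(li.itertext())
        result.append(new_ul)
    return result

def rename_only(ul: ET.Element) -> list[ET.Element]:
    """Plain correspondence UL -> ul, LI -> li (used when the condition is false)."""
    new_ul = ET.Element("ul")
    for li in ul.findall("LI"):
        ET.SubElement(new_ul, "li").text = "".join(li.itertext())
    return [new_ul]

UL_RULE = ConversionRule("UL", lambda ul: len(ul.findall("LI")) >= 2, nest_each_item)

def convert_list(ul: ET.Element, rule: ConversionRule) -> str:
    """Apply the rule if its pattern and condition match, else rename only."""
    if ul.tag != rule.pattern:
        raise ValueError(f"rule for {rule.pattern} cannot convert <{ul.tag}>")
    parts = rule.replace(ul) if rule.condition(ul) else rename_only(ul)
    return "".join(ET.tostring(p, encoding="unicode") for p in parts)

print(convert_list(ET.fromstring("<UL><LI>a</LI><LI>b</LI></UL>"), UL_RULE))
# <ul><li>a</li></ul><ul><li>b</li></ul>
print(convert_list(ET.fromstring("<UL><LI>a</LI></UL>"), UL_RULE))
# <ul><li>a</li></ul>
```

Both outputs satisfy the one-li-per-ul rule without a separate verification pass.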
In this embodiment, the conversion template T2 consists of pattern specifying and template specifying. The pattern specifies an identifier to be converted; here, that identifier is one described in the identifier correspondence table. The template specifying reflects the conversion rule of Fig. 13 and consists of a replacement template which defines the logical structure after the conversion, together with the condition for applying the replacement template.
Figs. 4A and 4B show the template rules T12 and T22 as examples of the description of the conversion template T2 of this embodiment. These examples correct the contradiction shown in Fig. 2, so that the structured document F3 is output by a single conversion (Fig. 2(iii)). In the template rule T12 of this embodiment, (5) and (7) indicate the pattern specifying: (5) describes the extraction of the UL element, while (7) describes the extraction of the LI element. Further, (6) and (8) describe the template specifying.
In the example shown in Fig. 4A, first, the UL element is extracted according to the pattern specifying of (5), and the template of (6) is specified. The template specifying of (6) means shifting the object to which a template is applied from the current element (UL) to its sub-elements (LI). The template rule for the LI element is indicated by (7) and (8).
Next, each LI element is extracted by the pattern specifying of (7). Then, by the template specifying of (8), the start tag for ul is described, the start tag for li is described, the content of the LI element is converted to text and described, and finally the end tags of li and ul are described. Since the structured document F1 before conversion has three LI elements as shown in Fig. 3, three portions corresponding to the pattern specifying of (7) are extracted, and the template specifying of (8) is performed for each of them to complete the conversion process.
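The behaviour of template rule T12 can be approximated in Python: the UL template emits nothing itself and hands its LI children on, while the LI template writes the ul and li tags around each item. The function names below are hypothetical and the sample F1 content is assumed.

```python
import xml.etree.ElementTree as ET

def template_ul(elem: ET.Element) -> list[ET.Element]:
    """Pattern (5) + template (6): match UL and move on to its LI children."""
    results = []
    for child in elem:
        if child.tag == "LI":
            results.extend(template_li(child))
    return results

def template_li(elem: ET.Element) -> list[ET.Element]:
    """Pattern (7) + template (8): wrap each item as <ul><li>text</li></ul>."""
    ul = ET.Element("ul")
    ET.SubElement(ul, "li").text = "".join(elem.itertext())
    return [ul]

# Hypothetical F1 with three LI elements
f1 = ET.fromstring("<UL><LI>a</LI><LI>b</LI><LI>c</LI></UL>")
f3 = "".join(ET.tostring(e, encoding="unicode") for e in template_ul(f1))
print(f3)  # <ul><li>a</li></ul><ul><li>b</li></ul><ul><li>c</li></ul>
```

Compared with the T1 analogue, the only change is where the ul tag is written, yet the single pass now produces the corrected F3 directly.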
In the template rule T22 shown in Fig. 4B, <xsl:for-each> is one of the elements defined by the XSL specification. (9) is the pattern specifying, which specifies the extraction of the UL element. (10) is the template specifying, which specifies repeated processing of plural LI elements. In that process, the start tag for ul is described, the start tag for li is described, the content of the LI element is converted to text and described, and then the end tags for li and ul are described. Since the structured document F1 contains three LI elements, the process of the <xsl:for-each> element in the template specifying of (10) is repeated for the three elements, and then the process is complete.
Next, an example of another conversion template will be described. Figs. 5A and 5B are diagrams showing an example of conversion of the body element and the blockquote element. Fig. 5A shows the structured document F31 (first structured document), which is the document before the conversion; the structured document F32, which is the document after the conversion containing contradictions; and the structured document F33 (second structured document), in which the contradiction is corrected. Fig. 5B shows a conventional conversion template T31 and the conversion template T32 of this embodiment.

In the example document, the body element and the BODY element indicate the main body of a document, and the blockquote element and the BLOCKQUOTE element specify the display of a block of character strings as a quotation. Although the div element specifies a block to which a stylesheet is applied, a stylesheet does not always have to be applied.
In this embodiment, as shown in Fig. 13, the div element is used as an element that can be placed within the body element and the blockquote element to hold their content. Before and after conversion, the body element and the blockquote element correspond to the BODY element and the BLOCKQUOTE element, respectively.
The structured document F31 marks a character string below the BODY element as the main body of the document, and a block of character strings below the BLOCKQUOTE element as a quotation. The structured document F32 containing the contradictions simply replaces the corresponding elements.
If the document type definition D2 specifies a rule that a character string cannot be described directly below the body element and the blockquote element, the structured document F32 contradicts the document type definition D2. The structured document F33 corrects that contradiction in the structured document F32 so as to satisfy the document type definition D2 by placing a div element inside each of the body element and the blockquote element.
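The D2 rule in this example (no character string directly below body or blockquote) can be checked mechanically. The sketch below works under stated assumptions: the sample F32/F33 markup is invented to match the description, NO_DIRECT_TEXT and the helper functions are hypothetical, and has_direct_text treats any non-whitespace text outside child elements as a violation.

```python
import xml.etree.ElementTree as ET

NO_DIRECT_TEXT = {"body", "blockquote"}   # assumed D2 rule from the text

def has_direct_text(elem: ET.Element) -> bool:
    """True if elem holds non-whitespace text outside any child element."""
    chunks = [elem.text] + [child.tail for child in elem]
    return any(c and c.strip() for c in chunks)

def contradictions(root: ET.Element) -> list[str]:
    """Tags of restricted elements that directly contain character data."""
    return [e.tag for e in root.iter() if e.tag in NO_DIRECT_TEXT and has_direct_text(e)]

# Hypothetical samples shaped like F32 (contradictory) and F33 (corrected)
F32 = "<body>Main text<blockquote>Quoted text</blockquote></body>"
F33 = "<body><div>Main text<blockquote><div>Quoted text</div></blockquote></div></body>"
print(contradictions(ET.fromstring(F32)))  # ['body', 'blockquote']
print(contradictions(ET.fromstring(F33)))  # []
```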
Fig. 5B is an example description of the conversion template rules. The conventional conversion template rule T31 describes the conversion rule for converting the structured document F31 into the document F32 after conversion (iv), as shown in Fig. 5A. The corrected conversion template rule T32 describes the conversion rule for converting the structured document F31 into the structured document F33 in which the contradiction is corrected (vi).
In the conventional conversion template rule T31, (11) and (13) are the pattern specifying: (11) specifies extraction of the BODY element, while (13) specifies extraction of the BLOCKQUOTE element. (12) and (14) are the corresponding template specifying.
First, the BODY element is extracted according to the pattern specifying of (11), and then the template of (12) is specified. Second, in the template of (12), the start tag for body is described and the object to which the template is applied is shifted from the current element (BODY) to the sub-element (BLOCKQUOTE). The template specifying of (12) means that the end tag for body is described after the template rule for the sub-element (BLOCKQUOTE) has been processed.
The template rule for the BLOCKQUOTE element is indicated by (13) and (14). The BLOCKQUOTE element is extracted according to the pattern specifying of (13), and the template of (14) is specified. In the template of (14), the start tag for blockquote is described and the object to which the template is applied is shifted from the current element (BLOCKQUOTE) to its sub-element. Further, the template of (14) specifies describing the end tag for blockquote after the template rule for the sub-element has been processed.
With the conventional conversion template rule T31, the BODY element and the BLOCKQUOTE element are simply converted to the body element and the blockquote element.
In the conversion template rule T32 of this embodiment, (15) and (17) are the pattern specifying: (15) specifies extraction of the BODY element, while (17) specifies extraction of the BLOCKQUOTE element. (16) and (18) are the corresponding template specifying. First, the BODY element is extracted according to the pattern specifying of (15), and then the template of (16) is specified. Second, in the template of (16), the start tag for body is described, the start tag for div is described, and the object to which the template is applied is shifted from the current element (BODY) to the sub-element (BLOCKQUOTE). The template specifying of (16) means describing the end tags for div and body, as shown in Fig. 5B, after the template rule for the sub-element (BLOCKQUOTE) has been processed.

The template rule for the BLOCKQUOTE element is indicated by (17) and (18). The BLOCKQUOTE element is extracted according to the pattern specifying of (17), and the template of (18) is specified. In the template of (18), the start tag for blockquote is described, the start tag for div is described, and the object to which the template is applied is shifted from the current element (BLOCKQUOTE) to its sub-element. Further, the template specifying of (18) means describing the end tags for div and blockquote, as shown in Fig. 5B, after the template rule for the sub-element has been processed. By using the conversion template T32, the BODY element and the BLOCKQUOTE element are converted to the body element and the blockquote element, respectively, and a div element is placed inside each of the body element and the blockquote element.
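A rough Python analogue of T32 follows. It renames BODY and BLOCKQUOTE and, like templates (16) and (18), places their entire content inside a new div, while other elements are simply lower-cased. The convert function, the WRAP_IN_DIV table and the sample input are hypothetical; character data is copied along the way, roughly as XSLT's built-in text rule would do.

```python
import xml.etree.ElementTree as ET

# Templates (16) and (18): source tag -> output tag whose content goes into a div
WRAP_IN_DIV = {"BODY": "body", "BLOCKQUOTE": "blockquote"}

def convert(elem: ET.Element) -> ET.Element:
    """Rename elements; for BODY/BLOCKQUOTE put the whole content inside a div."""
    out = ET.Element(WRAP_IN_DIV.get(elem.tag, elem.tag.lower()))
    target = ET.SubElement(out, "div") if elem.tag in WRAP_IN_DIV else out
    target.text = elem.text                  # leading character data
    for child in elem:                       # like <xsl:apply-templates/>
        converted = convert(child)
        converted.tail = child.tail          # keep text that follows the child
        target.append(converted)
    return out

# Hypothetical F31
F31 = "<BODY>Main text<BLOCKQUOTE>Quoted text</BLOCKQUOTE></BODY>"
print(ET.tostring(convert(ET.fromstring(F31)), encoding="unicode"))
# <body><div>Main text<blockquote><div>Quoted text</div></blockquote></div></body>
```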
Further, another example of generating the conversion template rule T2 according to this embodiment will be described. Figs. 6A and 6B are schematic diagrams of conversion examples involving the ol element and the li element. Fig. 6A shows the structured document F41, which is the document before conversion (first structured document); the structured document F42, which is the document after conversion containing a contradiction; and the structured document F43 after conversion (second structured document), in which the contradiction is corrected. Fig. 6B shows the conventional conversion template T41 and the conversion template T42 of this embodiment. The ol element and the OL element generate a numbered list (ordered list), and each list item is defined by the li element or the LI element, which is at the level below the ol or OL element. The document F41 shows an example containing both a portion in which an LI element exists below the OL element and a portion in which no LI element exists below the OL element.
As shown in Fig. 6A, the structured document F42 containing contradictions simply replaces the corresponding elements. If the document type definition after conversion specifies a rule that at least one li element is required below the ol element, the structured document F42 contradicts the document type definition after conversion.
The structured document F43 corrects the contradictions in the structured document F42 so as to satisfy the document type definition, by replacing each ol element that has no li element with a div element.
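The rule assumed for the post-conversion definition here (at least one li below every ol) can be checked with a few lines of ElementTree. The empty_lists helper is hypothetical, and the sample F42 and F43 fragments are invented for illustration and wrapped in a body element so that each is a single XML tree.

```python
import xml.etree.ElementTree as ET

def empty_lists(root: ET.Element) -> list[ET.Element]:
    """Assumed rule: every ol must contain at least one li; return offenders."""
    return [ol for ol in root.iter("ol") if ol.find("li") is None]

# Hypothetical samples shaped like F42 (contradictory) and F43 (corrected)
F42 = "<body><ol><li>first</li></ol><ol>no items here</ol></body>"
F43 = "<body><ol><li>first</li></ol><div>no items here</div></body>"
print(len(empty_lists(ET.fromstring(F42))))  # 1
print(len(empty_lists(ET.fromstring(F43))))  # 0
```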
Fig. 6B shows an example of the conversion template rule T42. The conventional conversion template rule T41 shown in Fig. 6B describes the conversion rule for converting the structured document F41 into the structured document F42 after conversion (vii), as shown in Fig. 6A. The conversion template rule T42 shown in Fig. 6B describes the conversion rule for converting the structured document F41 into the structured document F43 (ix).
As shown in Fig. 6, the conventional conversion template rule T41 also consists of patterns specifying the extraction of the OL element and the LI element, and a template corresponding to each pattern. With this conventional conversion template rule T41, the OL element and the LI element are simply converted to the ol element and the li element.
In the conversion template rule T42 of this embodiment, (19) and (21) are the pattern specifying: (19) specifies extraction of the OL element, while (21) specifies extraction of the LI element. (20) and (22) are the corresponding templates. First, the OL element is extracted according to the pattern specifying of (19), and then the template of (20) is specified.
Each of <xsl:choose>, <xsl:when> and <xsl:otherwise> in Fig. 6B is an element defined by the XSL specification. The process is performed based on a combination of these three elements.
If the result of the conditional expression ("count(LI) != '0'") described in the test attribute is true, the process in the <xsl:when> element is performed; if the result is false, the process in the <xsl:otherwise> element is performed.
Under the conditional expression ("count(LI) != '0'"), the number of LI elements is counted, and if one or more LI elements exist, the result is true. In this case, the start tag of ol is output according to the template of the <xsl:when> element, and then the template rule for the LI element is applied. After that, the end tag of ol is output.
Further, according to the conditional expression ("count(LI) != '0'"), if the number of LI elements is 0, the result is false. In this case, the start tag of div is output according to the template of the <xsl:otherwise> element, and then the object to which the template is applied is shifted from the current element (OL) to its sub-elements. After the template rule for the sub-elements is applied, the end tag of div is output. Thus, according to the conversion template rule T42, if no li element exists below the ol element, the ol element can be replaced with the div element.
The document conversion method of this embodiment described above can be modified as shown in Fig. 7. Fig. 7 shows an example of the conversion process in a case where a structured document that does not follow XML (for example, a compact-HTML document for the i-mode information service for cellular phones via the Internet) is used as the structured document before conversion (first structured document). In this modification, a shaping process S201 using a shaping tool is added to the above-described embodiment.
In this modification, a document needs to follow the document type definition (DTD) of XML in order to activate the XSLT engine as a document structure conversion tool. The XML document needs to have a declaration statement such as an XML declaration, and all the elements need to be described exactly in the nesting structure. The shaping process S201 is performed in order to shape a structured document F1 which is not based on XML so that it follows the XML specification (well-formed). In the shaping process S201, the following processes are performed: correcting the nesting of start tags and end tags, adding an end tag if the end tag is not attached, and so on. Further, the processes include inserting '/' if an empty element exists (e.g., <BR/>), enclosing an attribute value in double quotation marks, adding an attribute value if the attribute value has been omitted, converting element names and attribute names to lowercase letters, and so on.
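The shaping rules listed above can be illustrated with a small Python sketch built on the standard `html.parser` module. It is a stand-in for a tool such as HTML Tidy, not a reproduction of it; the class name `Shaper`, the `VOID` set of empty elements, and the policy of silently dropping stray end tags are assumptions. `HTMLParser` already lowercases tag and attribute names, so that rule comes for free.

```python
from html import escape
from html.parser import HTMLParser

# Assumed set of HTML empty (void) elements; not taken from the patent.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Shaper(HTMLParser):
    """Minimal sketch of shaping process S201: re-serialize loose HTML as well-formed XML."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default
        self.out: list[str] = []
        self.stack: list[str] = []

    def _start(self, tag, attrs, empty):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # omitted value, e.g. <input checked>
                value = name
            parts.append(f'{name}="{escape(value, quote=True)}"')  # always double-quoted
        self.out.append("<" + " ".join(parts) + ("/>" if empty else ">"))

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self._start(tag, attrs, empty=True)  # <BR> becomes <br/>
        else:
            self._start(tag, attrs, empty=False)
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, empty=True)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: drop it
        while self.stack:  # close elements left open inside this one (fixes nesting)
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def shape(self, text: str) -> str:
        self.feed(text)
        self.close()
        while self.stack:  # add end tags missing at the end of the input
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


print(Shaper().shape("<P>Hello<BR>World<B>bold</P><IMG SRC=a.gif>"))
# <p>Hello<br/>World<b>bold</b></p><img src="a.gif"/>
```

A production shaper would also need to emit an XML declaration and handle comments, doctypes and character encodings, which this sketch ignores.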
As shown in Fig. 7, the shaping process S201 is performed in order to shape the structured document F1 before conversion so that it follows the XML specification. In the shaping process S201, free software (e.g., HTML Tidy) can be used. Document structure conversion S101 is performed on the document shaped by the shaping process S201 in order to generate a new structured document F3.
The conversion template T2 describes an appropriate conversion rule, obtained by interpreting the document type definition D1 before conversion and the document type definition D2, in order to output a result that conforms to the document type definition D2 after conversion. The process is complete once the document structure conversion S101 has been performed to convert the "shaped" structured document F1 to a new structured document F3.
Document Conversion Program and Document Conversion System
The above-mentioned document conversion method can be achieved by a personal computer or workstation on which a program written in an appropriate computer language is installed. In a case where such a document conversion program is installed on a computer, that computer functions as a document conversion system.
Fig. 8 is a block diagram showing the configuration of a computer 1 on which the document conversion program is installed.
As shown in the figure, the computer 1 comprises a hard disk 11, a printer interface 12, a display interface 13, an I/O device 14, a memory 15, a communication device 16, a CPU 17, a bus 18 for connecting these devices, etc.
The hard disk 11 is a recording medium which stores various kinds of data. Various kinds of data read via the I/O device 14 are stored on the hard disk 11, and the data is output to the memory 15 or the CPU 17 according to a request by the CPU 17. Further, data resulting from processes in each device is also stored on the hard disk 11. This hard disk 11 stores a document conversion program P1, and the document conversion program P1 is activated and controlled under the control of the CPU 17.
The printer interface 12 is a device for connecting the computer to an external printer, etc., and performs file printing in response to a request from the CPU 17, etc. The display interface 13 displays images based on display data generated by the CPU 17, such as images for controlling the document conversion program P1 or the results of various processes.
The communication device 16 is a communication unit such as a LAN card or a modem, which connects the computer 1 to a communication network 20 such as the Internet via a communication line so as to transmit/receive data. The computer 1 is capable of receiving data from an external terminal or transmitting a converted document file through the communication device 16.
The I/O device 14 is a device for reading/writing data from/to an external recording medium, such as a flexible disk drive or a CD-ROM drive. According to this embodiment, the conversion template T2, the document type definitions D1 and D2, and the structured documents F1/F3 are input/output through it.
The memory 15 is a main memory device for storing data temporarily when the CPU 17 executes processes. The memory 15 holds data read out from the hard disk 11 or the results of processes executed by the CPU 17.

The CPU 17 is a central processing unit which, by executing the document conversion program P1 read out from the hard disk 11, functions as a document type definition analyzer 17a, a conversion template generator 17b, a document structure converter 17c, a shaper 17d, a file I/O unit 17e, a communication processor 17f, a display data generator 17g and a printing processor 17h.
The document type definition analyzer 17a analyzes the document type definition D1 before conversion and the document type definition D2 after conversion, and extracts the differences between these document type definitions. According to this embodiment, the document type definition analyzer 17a comprises: an identifier correspondence table storing unit for storing an identifier correspondence table in which the identifiers of the document type definition before conversion are linked to the identifiers of the document type definition after conversion; a logical structure extracting unit for extracting a first logical structure defined by an identifier of the document type definition D1 as well as a second logical structure defined by an identifier of the document type definition D2; and a condition detector which compares the first logical structure with the second logical structure according to the identifier correspondence table and analyzes the conditions arising from the portions that differ between the two structures.
The identifier correspondence table storing unit can be implemented with a cache memory inside the CPU 17, and the hard disk 11 or the memory 15 can also be used as an auxiliary means.

The logical structure extracting unit reads the data contained in the document type definitions D1 and D2 sequentially and checks the data against the identifiers described in the identifier correspondence table. When a matching identifier is detected, the logical structure extracting unit extracts its pattern by referring to the logical structure existing below that identifier.
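The extraction step can be sketched in Python for a simplified, XML-style DTD. This is an illustration under assumptions, not the embodiment's parser: the function `extract_structures`, the dictionary used as the identifier correspondence table, and the D1/D2 fragments are all hypothetical, and real HTML DTDs written in SGML (with tag-omission flags such as `- -` and parameter entities) are not handled.

```python
import re

# Matches simple XML-style declarations such as <!ELEMENT OL (LI)*>.
# Assumption: no parameter entities, comments or SGML tag-omission flags ("- -").
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.-]+)\s+(.*?)>", re.S)


def extract_structures(dtd_text: str, identifiers) -> dict:
    """Return the content model (logical structure) declared for each identifier of interest."""
    found = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        if name in identifiers:
            found[name] = " ".join(model.split())  # normalise whitespace
    return found


# Hypothetical identifier correspondence table and DTD fragments
TABLE = {"OL": "ol", "LI": "li"}
D1 = "<!ELEMENT OL (LI)*>\n<!ELEMENT LI (#PCDATA)>"
D2 = "<!ELEMENT ol (li)+>\n<!ELEMENT li (#PCDATA)>"
first = extract_structures(D1, TABLE.keys())
second = extract_structures(D2, TABLE.values())
print(first, second)
# {'OL': '(LI)*', 'LI': '(#PCDATA)'} {'ol': '(li)+', 'li': '(#PCDATA)'}
```

Only identifiers listed in the table are kept, which corresponds to checking each declaration against the identifier correspondence table before extracting its pattern.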
The condition detector compares the rules specified by the document type definitions D1 and D2 before and after conversion so as to detect a condition that generates a difference. For example, the condition detector detects a condition in which a difference in pattern occurs depending on how many LI elements exist below the UL.
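One way to picture the condition detector is to compare occurrence indicators (`?`, `*`, `+`) of mapped child elements in the two content models. The Python sketch below flags the case in the example above: D1 allows zero LI children while D2 requires at least one li. The functions `occurrence`, `min_count` and `detect_conditions`, and the `count(LI) = 0` condition string, are hypothetical; only simple models such as `(li)+` are interpreted correctly, and grouped alternatives like `(LI | P)*` are not.

```python
import re


def occurrence(model: str, child: str):
    """Return the occurrence indicator ('', '?', '*' or '+') of child in a simple content model, or None."""
    # First try a parenthesised single child such as "(li)+"
    m = re.search(r"\(\s*" + re.escape(child) + r"\s*\)([?*+]?)", model)
    if m is None:
        # Otherwise look for the bare name, e.g. "li*" inside a sequence
        m = re.search(r"(?<![\w.-])" + re.escape(child) + r"(?![\w.-])([?*+]?)", model)
    return None if m is None else m.group(1)


def min_count(indicator: str) -> int:
    """Minimum number of occurrences allowed by an indicator."""
    return 0 if indicator in ("?", "*") else 1


def detect_conditions(src: dict, dst: dict, table: dict) -> list:
    """Report (parent, child, condition) where D1 allows zero children but D2 requires at least one."""
    conditions = []
    for src_parent, dst_parent in table.items():
        if src_parent not in src or dst_parent not in dst:
            continue
        for src_child, dst_child in table.items():
            if src_child == src_parent:
                continue
            src_occ = occurrence(src[src_parent], src_child)
            dst_occ = occurrence(dst[dst_parent], dst_child)
            if src_occ is None or dst_occ is None:
                continue
            if min_count(src_occ) == 0 and min_count(dst_occ) >= 1:
                conditions.append((src_parent, src_child, f"count({src_child}) = 0"))
    return conditions


print(detect_conditions({"OL": "(LI)*", "LI": "(#PCDATA)"},
                        {"ol": "(li)+", "li": "(#PCDATA)"},
                        {"OL": "ol", "LI": "li"}))
# [('OL', 'LI', 'count(LI) = 0')]
```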
The conversion template generator 17b generates a conversion template T1 according to the result of the document type definition analyzer 17a. The conversion template T1 describes a conversion rule which prevents the structured document F2 resulting from the document conversion from contradicting the document type definition D2. According to this embodiment, the conversion template generator 17b generates a conversion rule based on the aforementioned condition about the differing portions and the corresponding logical structure after conversion (the pattern extracted from D2). The conversion template generator 17b then correlates the identifier correspondence table with the conversion rule and converts them to the format of the conversion template.
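As an illustration of how a generator might turn one detected condition into template text, the sketch below emits XSLT in the style of rule T42 for a single parent/child pair. The function `generate_rule` and its parameters are hypothetical; using `div` as the fallback element is taken from the Fig. 6 example rather than derived from D2, and reading "the template rule to the LI element" as `select="LI"` in the `xsl:when` branch is an interpretation.

```python
def generate_rule(src_parent: str, src_child: str, dst_parent: str, dst_child: str,
                  fallback: str = "div") -> str:
    """Emit an XSLT template pair in the style of rule T42 (sketch; element names are inputs)."""
    return f"""<xsl:template match="{src_parent}">
  <xsl:choose>
    <xsl:when test="count({src_child}) != '0'">
      <{dst_parent}><xsl:apply-templates select="{src_child}"/></{dst_parent}>
    </xsl:when>
    <xsl:otherwise>
      <{fallback}><xsl:apply-templates/></{fallback}>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
<xsl:template match="{src_child}">
  <{dst_child}><xsl:apply-templates/></{dst_child}>
</xsl:template>"""


print(generate_rule("OL", "LI", "ol", "li"))
```

Wrapped in an `xsl:stylesheet` element that declares the XSLT namespace, the emitted text could be run by any XSLT 1.0 processor.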
The document structure converter 17c performs the document conversion using the conversion template. The document structure converter replaces the identifiers described in the identifier correspondence table and converts the arguments attached to each identifier. Further, the document structure converter 17c adds, deletes and converts the logical structure of an identifier which matches the aforementioned condition, according to the template for replacement.
The shaper 17d shapes the first structured document F1 so that it can be converted by the document structure converter 17c, and corrects erroneous descriptions in the structured document F1 (this is not required for an already shaped document, e.g., XML). More specifically, the shaper 17d corrects the nesting of start tags and end tags, and adds an end tag if the end tag is not already attached. Further, the shaper 17d inserts '/' if an empty element exists (e.g., <BR/>), encloses attribute values in double quotation marks, adds an attribute value if the attribute value has been omitted, converts element names and attribute names to lowercase letters, and so on.
The file I/O unit 17e controls the input/output of files and the operation of the hard disk 11 as well as the I/O device 14. More specifically, the file I/O unit 17e reads the structured document F1, the conversion template T2, the identifier correspondence table, etc. The file I/O unit 17e also stores the structured document F3 on the hard disk 11 and writes it onto a flexible disk, a CD-R, etc. through the I/O device 14. Further, the file I/O unit 17e inputs or outputs each file to/from the communication processor 17f or the printing processor 17h as required.
The communication processor 17f controls the communication device 16 and is connected to the network 20 through the communication device 16 so as to transmit/receive the structured document F1 and the structured document F3 to/from an external terminal. The communication processor 17f also receives file conversion requests from other terminals through the communication device 16.
The display data generator 17g generates image data for display on a screen and controls the display interface 13. The image data is displayed on an external display unit through the display interface 13. This display data includes graphic data generated according to the document conversion program P1, and it is used to display images for controlling each process and for reviewing each file.
The printing processor 17h controls the printer interface 12 to print the structured document F3 on an external printer.
Operation
The document conversion system can be achieved by executing the document conversion program described above on a personal computer, etc. The operation of this document conversion system will be described with reference to Fig. 9. Fig. 9 is a flowchart showing the process of the document conversion system.
As shown in Fig. 9, the document type definition D1 before conversion is read out and analyzed (S201). More specifically, a file is read out from the I/O device 14 or the hard disk 11 and analyzed by the document type definition analyzer 17a.

Similarly, the document type definition D2 after conversion is read out and analyzed (S202). After that, the conversion template is generated (S203). More specifically, the document type definition analyzer 17a analyzes the document type definitions D1/D2 and extracts the differences between them.
Next, the structured document F1 is read out (S204), the read-out structured document F1 is shaped (S205) if shaping is required, and the document structure of the shaped document is converted (S206).
Then, the converted structured document F3 is output (S207). This output includes writing it to the I/O device 14 or the hard disk 11, transmitting it to the network 20 through the communication device 16, and printing it out through the printer interface 12.
Computer Readable Recording Medium Storing Document Conversion Program
The above-described document conversion program can be stored in a recording medium readable by the computer 1. As shown in Fig. 10, this computer readable recording medium includes a flexible disk 216, a CD-ROM 217, a ROM 218, a magnetic tape 219, etc.
As shown in Fig. 11, a computer readable recording medium storing such a document conversion program enables document conversion using a computer 30 such as a notebook personal computer, a desktop personal computer or a workstation. For example, in a case where the structured document F1 to be converted is stored in a file as shown in Fig. 11, the structured document stored on a local disk is converted by the computer 30, on which the above-described document conversion program is installed, acting as a document converter.
Although the above embodiment has been described for a case where both the hard disk 11 for storing the structured documents F1 and F3 and the CPU 17 for arithmetic operations, etc. are incorporated in a single computer, the present invention is not restricted to this example. For example, the above-described devices can be distributed over plural computers.
Fig. 12 is a schematic diagram showing a case where the above-described devices are distributed over plural computers. As shown in the figure, the structured document F1 to be converted is stored in a content server 401 which is connected to the World Wide Web (WWW). The structured document F1 can be converted by a conversion server 402 in response to a conversion request issued by a client terminal 403.
In this case, the conversion server 402, on which the above-described document conversion program is installed, is used. The conversion server 402 is connected to the communication network (e.g., the Internet). The conversion server 402 comprises a receiving unit for receiving a conversion request from the client terminal 403 via the communication network and obtaining the structured document F1 from the content server 401. The conversion server 402 also comprises a transmitting unit for transmitting the structured document F3 after conversion to the client terminal 403 via the communication network. The above-described communication device 16 can be used to function as the transmitting unit and the receiving unit.
As explained above, according to the present invention, the step of verifying validity against the document type definition after conversion is omitted by instead using an appropriate conversion template when converting the structured document, so the total time for the document structure conversion can be reduced.
The present invention has been described in detail with reference to the embodiments. It is obvious to those skilled in the art that the present invention is not restricted to the embodiments mentioned above. The present invention may be carried out as a corrected or modified embodiment without departing from the gist and scope specified by the claims. Therefore, the description in this specification is intended to present examples and does not limit the present invention in any way.

Claims (15)

1. A document conversion system for converting a first structured document formed based on a first document schema into a second structured document formed based on a second document schema, the document conversion system comprising:
a document type definition analyzer for analyzing the first document schema and the second document schema and extracting a different document type definition;
a conversion template generator for generating a conversion template having described therein a conversion rule which prevents the second structured document, which is the result of a document conversion process, from being contradictory to the second document schema based on the results of the analysis performed by the document type definition analyzer;
and a document structure converter for performing document conversion process using the conversion template.
2. The document conversion system according to claim 1,
wherein the first document schema and the second document schema each have an identifier for defining the logical structure of a character string constituting a document,
the document type definition analyzer comprises:
an identifier correspondence table storing unit for storing an identifier correspondence table which makes a correspondence between the identifier of the first document schema and the identifier of the second document schema;
a logical structure extracting unit for extracting a first logical structure defined by the identifier of the first document schema and a second logical structure defined by the identifier of the second document schema; and a condition detector for detecting portions that differ between the first logical structure and the second logical structure by comparing both structures according to the identifier correspondence table and analyzing conditions generated by the detected differing portions, and wherein the conversion template generator generates a conversion rule based on the condition of the detected differing portions and their corresponding second logical structure.
3. The document conversion system according to claim 1 further comprising a file recorder for storing the first structured document and the second structured document as file data, wherein the document structure converter converts the first structured document read out from the file recorder.
4. The document conversion system according to claim 1 further comprising:
a receiver which is connected to a communication network for acquiring a conversion request and the first structured document from the communication network; and a transmitter for transmitting the second structured document converted by the document structure converter to the communication network.
5. The document conversion system according to claim 1 further comprising a shaper for correcting errors in the description of the first structured document so that the first structured document can be read by the document structure converter.
6. A document conversion method for converting a first structured document formed based on a first document schema into a second structured document formed based on a second document schema, the document conversion method comprising the steps of: (A) analyzing the first document schema and the second document schema and extracting a different document type definition;

(B) generating a conversion template having described therein a conversion rule which prevents the second structured document, which is the result of a document conversion process, from being contradictory to the second document schema, based on the results of the analysis; and (C) performing document conversion process using the conversion template.
7. The document conversion method according to claim 6, wherein the first document schema and the second document schema each have an identifier for defining the logical structure of a character string constituting a document,
the step (A) comprises the steps of:
(A-1) extracting a first logical structure defined by the identifier of the first document schema and a second logical structure defined by the identifier of the second document schema;
(A-2) detecting portions that differ between the first logical structure and the second logical structure by comparing both structures according to an identifier correspondence table which makes a correspondence between the identifier of the first document schema and the identifier of the second document schema;
and (A-3) analyzing conditions which are generated by the detected differing portions, and the step (B) is for generating a conversion rule based on the condition of the detected differing portions and their corresponding second logical structure.
8. The document conversion method according to claim 6, wherein the first structured document and the second structured document are stored in a file recorder as file data, and the step (C) is for converting the first structured document read from the file recorder.
9. The document conversion method according to claim 6 further comprising:
a step of acquiring a conversion request and the first structured document from a communication network, and a step of transmitting a converted second structured document to the communication network in the step (C).
10. The document conversion method according to claim 6, wherein the step (C) includes a step of correcting errors in the description of the first structured document so that the first structured document can be read.
11. A computer readable recording medium storing a document conversion program which converts a first structured document formed based on a first document schema into a second structured document formed based on a second document schema and makes a computer execute a process comprising the steps of:
(A) analyzing the first document schema and the second document schema and extracting a different document type definition;
(B) generating a conversion template having described therein a conversion rule which prevents the second structured document, which is the result of a document conversion process, from being contradictory to the second document schema, based on the results of the analysis; and (C) performing document conversion process using the conversion template.
12. The computer readable recording medium storing the document conversion program according to claim 11, wherein the first document schema and the second document schema each have an identifier for defining the logical structure of a character string constituting a document,
the step (A) comprises the steps of:
(A-1) extracting a first logical structure defined by the identifier of the first document schema and a second logical structure defined by the identifier of the second document schema;
(A-2) detecting portions that differ between the first logical structure and the second logical structure by comparing both structures according to an identifier correspondence table which makes a correspondence between the identifier of the first document schema and the identifier of the second document schema;

and (A-3) analyzing conditions which are generated by the detected differing portions, and the step (B) is for generating the conversion rule based on the condition of the detected differing portions and their corresponding second logical structure.
13. The computer readable recording medium storing the document conversion program according to claim 11, wherein the first structured document and the second structured document are stored in a file recorder as file data and the step (C) is for converting the first structured document read from the file recorder.
14. The computer readable recording medium storing the document conversion program according to claim 11 further comprising:
a step of acquiring a conversion request and the first structured document from a communication network, and a step of transmitting a converted second structured document to the communication network in the step (C).
15. The computer readable recording medium storing the document conversion program according to claim 11, wherein the step (C) includes a step of correcting errors in the description of the first structured document so that the first structured document can be read.
CA002411459A 2001-11-12 2002-11-08 Document conversion system, document conversion method and computer readable recording medium storing document conversion program Abandoned CA2411459A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2001346736A JP2003150586A (en) 2001-11-12 2001-11-12 Document converting system, document converting method and computer-readable recording medium with document converting program recorded thereon
JP2001-346736 2001-11-12

Publications (1)

Publication Number Publication Date
CA2411459A1 (en) 2003-05-12

Family

ID=19159847

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002411459A Abandoned CA2411459A1 (en) 2001-11-12 2002-11-08 Document conversion system, document conversion method and computer readable recording medium storing document conversion program

Country Status (8)

Country Link
US (1) US7139975B2 (en)
EP (1) EP1313032A1 (en)
JP (1) JP2003150586A (en)
KR (1) KR100486138B1 (en)
CN (2) CN1612136A (en)
AU (1) AU2002301951B2 (en)
CA (1) CA2411459A1 (en)
TW (1) TWI267004B (en)


Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5299304A (en) * 1991-04-16 1994-03-29 International Business Machines Corporation Method and apparatus for identifying multiple stage document format transformations
US5491628A (en) 1993-12-10 1996-02-13 Xerox Corporation Method and apparatus for document transformation based on attribute grammars and attribute couplings
JPH07319917A (en) * 1994-05-24 1995-12-08 Fuji Xerox Co Ltd Document data base managing device and document data base system
US5915259A (en) * 1996-03-20 1999-06-22 Xerox Corporation Document schema transformation by patterns and contextual conditions
JP3605941B2 (en) * 1996-05-20 2004-12-22 富士ゼロックス株式会社 Document structure creation device and document structure creation method
JPH10307818A (en) * 1997-05-08 1998-11-17 Nec Corp Document translation system, its method and recording medium recording document translating program
US6182092B1 (en) * 1997-07-14 2001-01-30 Microsoft Corporation Method and system for converting between structured language elements and objects embeddable in a document
JP3843574B2 (en) 1998-01-26 2006-11-08 富士ゼロックス株式会社 Document conversion rule generation device, document conversion rule generation method, and computer-readable recording medium recording a document conversion rule generation program
US6115686A (en) * 1998-04-02 2000-09-05 Industrial Technology Research Institute Hyper text mark up language document to speech converter
US6473805B2 (en) * 1998-06-08 2002-10-29 Telxon Corporation Method and apparatus for intergrating wireless and non-wireless devices into an enterprise computer network using an interfacing midware server
US6424980B1 (en) * 1998-06-10 2002-07-23 Nippon Telegraph And Telephone Corporation Integrated retrieval scheme for retrieving semi-structured documents
US6336124B1 (en) 1998-10-01 2002-01-01 Bcl Computers, Inc. Conversion data representing a document to other formats for manipulation and display
KR100415996B1 (en) * 1998-10-12 2004-01-31 삼성전자주식회사 Method of rendering documents by server
CA2255047A1 (en) * 1998-11-30 2000-05-30 Ibm Canada Limited-Ibm Canada Limitee Comparison of hierarchical structures and merging of differences
US6535896B2 (en) 1999-01-29 2003-03-18 International Business Machines Corporation Systems, methods and computer program products for tailoring web page content in hypertext markup language format for display within pervasive computing devices using extensible markup language tools
JP2000339312A (en) * 1999-05-31 2000-12-08 Toshiba Corp System for editing document and method for generating tag information management table
US6502112B1 (en) * 1999-08-27 2002-12-31 Unisys Corporation Method in a computing system for comparing XMI-based XML documents for identical contents
FR2811782B1 (en) 2000-07-12 2003-09-26 Jaxo Europ DOCUMENT CONVERSION SYSTEM WITH TREE STRUCTURE BY SELECTIVE PATHWAY OF SAID STRUCTURE
US6681223B1 (en) * 2000-07-27 2004-01-20 International Business Machines Corporation System and method of performing profile matching with a structured document
US6694338B1 (en) * 2000-08-29 2004-02-17 Contivo, Inc. Virtual aggregate fields
US7970437B2 (en) * 2000-11-29 2011-06-28 Nokia Corporation Wireless terminal device with user interaction system
US7152205B2 (en) * 2000-12-18 2006-12-19 Siemens Corporate Research, Inc. System for multimedia document and file processing and format conversion
JP2002259362A (en) 2001-02-28 2002-09-13 Fujitsu Ltd Document conversion definition generating program
US6964025B2 (en) * 2001-03-20 2005-11-08 Microsoft Corporation Auto thumbnail gallery
US6725231B2 (en) * 2001-03-27 2004-04-20 Koninklijke Philips Electronics N.V. DICOM XML DTD/schema generator
US20020184263A1 (en) * 2001-05-17 2002-12-05 Pierre Perinet Method and system for converting usage data to extensive markup language
US7099885B2 (en) * 2001-05-25 2006-08-29 Unicorn Solutions Method and system for collaborative ontology modeling
US20030145305A1 (en) * 2001-11-16 2003-07-31 Mario Ruggier Method for developing and managing large-scale web user interfaces (WUI) and computing system for said WUI
US8032828B2 (en) * 2002-03-04 2011-10-04 Hewlett-Packard Development Company, L.P. Method and system of document transformation between a source extensible markup language (XML) schema and a target XML schema
US7069497B1 (en) * 2002-09-10 2006-06-27 Oracle International Corp. System and method for applying a partial page change
US20040181748A1 (en) * 2003-03-10 2004-09-16 International Business Machines Corporation Thin client framework deployment of spreadsheet applications in a web browser based environment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005001708A1 (en) * 2003-06-27 2005-01-06 Common Ground Publishing Pty Ltd Method and apparatus for the creation, location and formatting of digital content
AU2004252575B2 (en) * 2003-06-27 2009-05-21 Common Ground Publishing Pty Ltd Method and apparatus for the creation, location and formatting of digital content
US7886225B2 (en) 2003-06-27 2011-02-08 Common Ground Publishing Pty. Ltd. Method and apparatus for the creation, location and formatting of digital content

Also Published As

Publication number Publication date
CN1419211A (en) 2003-05-21
KR20030040113A (en) 2003-05-22
EP1313032A1 (en) 2003-05-21
TWI267004B (en) 2006-11-21
KR100486138B1 (en) 2005-04-28
JP2003150586A (en) 2003-05-23
AU2002301951B2 (en) 2007-07-12
US20030093760A1 (en) 2003-05-15
US7139975B2 (en) 2006-11-21
TW200300233A (en) 2003-05-16
CN1612136A (en) 2005-05-04

Similar Documents

Publication Publication Date Title
CA2411459A1 (en) Document conversion system, document conversion method and computer readable recording medium storing document conversion program
EP1406181B1 (en) Document revision support
US7249316B2 (en) Importing and exporting markup language data in a spreadsheet application document
US7617444B2 (en) File formats, methods, and computer program products for representing workbooks
EP1672543A2 (en) File formats, methods, and computer program products for representing presentations
US20050149861A1 (en) Context-free document portions with alternate formats
US20040172591A1 (en) Method and system for inferring a schema from a hierarchical data structure for use in a spreadsheet
US20090019064A1 (en) Document processing device and document processing method
US6883139B2 (en) Manual processing system
US20070174307A1 (en) Graphic object themes
RU2579888C2 (en) Universal presentation of text to support various formats of documents and text subsystem
US9298675B2 (en) Smart document import
US20090083300A1 (en) Document processing device and document processing method
EP1830274A1 (en) Server device and name space issuing method
EP1405207B1 (en) Defining layout files by markup language documents
US7310771B2 (en) Method and computer-readable medium for providing page and table formatting services
Ott Strategies and tools for textual scholarship: the Tübingen System of Text Processing Programs (TUSTEP)
KR100733054B1 (en) Document Conversion System Using Synchronization of Structured Documents, And It's Method
Lemnitzer et al. Representing human and machine dictionaries in Markup languages
US5640581A (en) CD-ROM information editing apparatus
Sinclair 4.2 Corpus processing
KR20020057709A (en) XML builder
JPH1021227A (en) Device and method for document structure conversion
Ribarov Towards Intelligent Written Cultural Heritage Processing-Lexical processing.
JP2001117919A (en) Device and method for automatically pre-editing natural language sentence and storage medium to be utilized for the same

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued