|Publication number||USRE39184 E1|
|Application number||US 10/601,267|
|Publication date||Jul 11, 2006|
|Filing date||Jun 19, 2003|
|Priority date||Nov 13, 1998|
|Also published as||US6249844|
|Publication number||10601267, 601267, US RE39184 E1, US RE39184E1, US-E1-RE39184, USRE39184 E1, USRE39184E1|
|Inventors||Robert Schloss, Philip Shi-lung Yu|
|Original Assignee||International Business Machines Corporation|
The present invention relates generally to the analysis of the content of a digital document and in particular to the creation and maintenance of persistent fragment identities to facilitate caching.
With the rapid growth of the Internet, the need for efficient document exchange becomes increasingly important. In addition to the hypertext markup language (HTML), the Extensible Markup Language (XML) is becoming available, providing a meta-language for authors to design their own markup languages.
On the other hand, the proliferation of various non-PC computing devices, including: handheld devices; palmtop devices; and various other Microsoft WINDOWS CE™-based devices; set-top boxes; WEB TV; smart phones; and so-called Internet appliances, (hereinafter all referred to as Internet appliances) further complicates the presentation of a Web document to a client device. In a Web document based on HTML, images are treated as separate objects pointed to by the Web document. A proxy/Web server may generate a lower resolution version or a black and white version of a color image to accommodate the limited capability of the Internet appliance. Nonetheless, these images are named persistent objects (i.e., they have separate identities which are their URLs). The proxy or Web server is merely trying to provide different versions of a named entity based on the capability of a receiving device. This is independent of any caching issues at the proxy or Web server to improve object access time.
Various work exists to provide different versions of a named object in the Web environment to support Internet appliance access to the Web. For example, PRISM from Spyglass (see e.g., http://www.spyglass.com) provides different versions of images to the Internet appliance. It can also dynamically translate richly formatted Web documents into simplified Web pages to accommodate the requirements of the receiving devices. A means for performing on-demand data type-specific lossy compression on semantically typed data and tailoring content to the specific constraints of the clients is described in “Adapting to Network and Client Variability via On-Demand Dynamic Distillation,” by A. Fox, et al., Proc. 7th Intl. Conference on Architectural Support for Programming Languages and Operating Systems, Oct. 1996.
Using formal descriptors, such as a markup language, to describe a digital document provides tremendous flexibility. In the Internet environment, more powerful markup languages such as XML, or a subset of the Standard Generalized Markup Language (SGML) (see e.g., ISO 8879:1986; and Designing XML Internet Applications, by M. Leventhal, et al., Prentice Hall, 1998), are being defined to augment HTML. The markup language description can provide rich information on the document structure and the final document to be generated. In fact, XML is a language that allows users to define their own language. For example, chemists can define a chemical markup language to describe a molecular structure. Mathematicians or scientists can define a math markup language to describe complex mathematical formulas. The interpretation of the markup language description and generation of the object can thus be complex. It is desirable to avoid regenerating the same description repeatedly. Since Web pages, objects or documents on a common subject, or from the same company/division/department or authors often have parts in common, there is a need to go beyond recognizing just the repeated references to named entities (i.e., subjects that already have a name, e.g., a URL) to recognizing subparts of named entities.
However, proxy or Web servers and client browsers today do not interpret the markup language to decompose a document or object into components, nor do they provide persistent identities and tracking mechanisms to facilitate caching and recognition of repeated occurrences of components of a named object. They mainly provide caching or processing service for named objects as a whole. For example, as mentioned previously, in HTML the text documents and images (which are separated out from the text documents by the authors) are all named objects and hence cacheable entities. Another problem is that if a document includes dynamic content, caching is not meaningful, as the next reference to the same document URL can result in a different version of the document. Thus a document is not cached even if only a small fraction of its content is dynamic. This is an issue for HTML documents today and is expected to become more severe for XML documents, which are more flexible and make it easier to incorporate various types of dynamic information, such as data from a database.
Thus, the need remains for a system and method for identifying and creating one or more persistent object fragments from a named object, for example to facilitate caching. The present invention addresses this need.
In accordance with the aforementioned needs, the present invention is directed to a method and apparatus for identifying and creating persistent object fragments from a named object. In one example, the present invention is directed to a method and apparatus for dynamically parsing a digital content description of a named digital object, creating and maintaining fragment identities to facilitate caching. Examples of named digital objects include but are not limited to: Web pages described in XML, SGML, and HTML.
The present invention has features which can parse/analyze the object description, identify object fragments and create persistent object fragment identities, and revise the object description by replacing each object fragment with its newly created persistent identity and send the revised object description to the requesting node. Depending upon the properties of a fragment, this can either enable the fragment to be cacheable (which can be at the content/proxy server and the client device in the Web environment), or make the revised object description cacheable at the server and client device. For example, consider the object description of a purchase order which contains a dynamic part to retrieve the current price of a product from the database. This dynamic part may be a small portion of the purchase order, but would prevent the object from being cached. According to one feature of the present invention for recognizing and treating the dynamic part as a separate fragment from the object description, the revised document becomes static and therefore cacheable. Furthermore, fragments can be nested.
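The purchase order example above can be illustrated with a minimal sketch. It is not the claimed implementation: the `<dynamic>` and `<fragment-ref>` tag names, the hash-based identity scheme, and the function names are all assumptions introduced here for illustration. The sketch replaces each dynamic segment with a reference to a persistent fragment identity, leaving a static description.

```python
import hashlib
import re

# Hypothetical marker for dynamic segments; the source does not define tag names.
DYNAMIC_SEGMENT = re.compile(r"<dynamic>(.*?)</dynamic>", re.DOTALL)


def fragment_identity(segment: str) -> str:
    """Derive an identity from the segment description (assumed scheme).

    The same description always yields the same identity, so a repeated
    occurrence of the segment in another object maps to the same fragment.
    """
    return "frag-" + hashlib.sha1(segment.encode("utf-8")).hexdigest()[:12]


def revise_description(description: str):
    """Replace each dynamic segment with a reference to its fragment identity.

    Returns the revised (now static) description and a table mapping each
    fragment identity to the fragment description it stands for.
    """
    fragments = {}

    def substitute(match):
        segment = match.group(0)
        identity = fragment_identity(segment)
        fragments[identity] = segment
        return f'<fragment-ref id="{identity}"/>'

    revised = DYNAMIC_SEGMENT.sub(substitute, description)
    return revised, fragments


purchase_order = (
    "<order><item>widget</item>"
    "<dynamic>SELECT price FROM products WHERE name='widget'</dynamic>"
    "</order>"
)
revised, fragments = revise_description(purchase_order)
```

In this sketch the revised purchase order no longer contains the database lookup, so it could be cached as a whole, while the price fragment is requested separately through its identity.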
A method is also provided to determine which part/segment of a named object to recognize as a fragment identity, based on its properties, which can include its size, processing cost to generate that segment of the object from its description, and other properties such as static vs. dynamic.
The present invention has yet other features to determine which fragments to cache and replace. The cache manager takes into account the fragment size and processing cost to generate the fragment.
The present invention has still other features which allow different versions to be generated for a fragment upon request. The version created can be determined by the property of the requesting devices and the fragment description. Different generators can be maintained for each type of descriptors or markup tags to generate different versions for different types of devices.
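One way to picture per-descriptor, per-device generators is a lookup table keyed by markup tag and device type. The sketch below is only an assumption about how such a table might look; the tag name `image`, the device type names, and the output strings are hypothetical and do not come from the source.

```python
# Registry of generators keyed by (markup tag, device type). All names are hypothetical.
GENERATORS = {}


def generator(tag, device_type):
    """Decorator that registers a generator for one tag on one device type."""
    def register(func):
        GENERATORS[(tag, device_type)] = func
        return func
    return register


@generator("image", "desktop")
def color_image(description):
    return f"color rendering of {description}"


@generator("image", "handheld")
def monochrome_image(description):
    return f"black-and-white rendering of {description}"


def generate_version(tag, device_type, description):
    """Pick the generator matching the fragment's tag and the requesting device."""
    try:
        make = GENERATORS[(tag, device_type)]
    except KeyError:
        raise LookupError(f"no generator for tag {tag!r} on device {device_type!r}")
    return make(description)
```

Calling `generate_version("image", "handheld", "logo")` selects the black-and-white generator, while the same fragment description requested by a desktop client selects the color one.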
An example of a method for identifying object fragments in an object having features of the present invention comprises the steps of: analyzing an object description to identify one or more persistent object fragments associated with the object; creating the one or more persistent object fragments, in response to said analyzing; and creating a persistent object fragment identity for a persistent object fragment, based on one or more of formal descriptors or an object fragment property. In one embodiment the object description is revised by replacing at least one object fragment with an associated persistent object fragment identity to enable the fragment to be cacheable at one or more of a server and a client; and the revised object description is sent to the client. The client receives the revised object description; and processes and/or caches the revised object description. The client can also receive a version of the one or more object fragments associated with the fragment identity, wherein the version is generated at the server and is based on the capability of the client (e.g., whether it is a handheld device, a set top box, or an Internet appliance).
These, and further, objects, advantages, and features of the invention will be more apparent from the following detailed description of a preferred embodiment and the appended drawings wherein:
In a preferred embodiment, an XML-like document will be used as an example of a document described using some formal language, such as a markup language.
The first fragment creation eligibility criterion is to recognize and separate out a segment as an object fragment so as to make the remaining document cacheable at the server or client device and/or processable/interpretable at the client device. An example is to recognize a dynamic segment as an object fragment. Consider another example where a segment cannot be rendered from the markup language description by a simple client device such as a WINDOWS CE™-based Internet appliance. By recognizing the segment as a separate object fragment, the client can process and/or cache the remaining document and let the proxy server interpret the markup language describing the fragment and generate an appropriate version for the client. This limitation on the client devices can be due to a limitation on the processing power or storage capacity of the client device to interpret the markup language and generate the object fragment, a limitation on the bandwidth available to the client device to retrieve the DTD of the fragment, or other limitations.
The second criterion is based on the tradeoffs among processing, storage, and bandwidth requirements: a segment is recognized and separated out as an object fragment so that it can be cached separately and reused, which avoids interpreting the markup language description of the object to generate it again. This will improve response time and reduce server load on fragment re-references. Each fragment, once separated out, may need to be requested separately with additional requests from the client. Thus, preferably, only a segment or group of segments that meets a certain threshold on the processing requirements of interpreting the markup language description to generate the object segment is recognized as a fragment. Another consideration is the additional storage requirement to store the rendered segment. For example, consider two cases. In a first case, the processing time is 100 seconds of CPU time to generate the segment from the description, and the size of the rendered segment is 10 Kbytes. In a second case, the processing time is 1 second of CPU time to generate the segment from the description, and the size of the rendered segment is 1000 Kbytes. In the first case, the savings on CPU time is substantial while the additional storage cost is minimal. The opposite is true for the second case. In other words, only in the first case is it worthwhile to recognize the segment as a separate fragment for caching. In the preferred embodiment, for an object O, let P(O) be its processing cost to generate a segment from its description and S(O) be the additional storage requirement to store the segment. A value function, F(P(O), S(O)), based on processing costs and storage requirements is used to determine the value of recognizing a fragment. An example of a value function (F) is the processing cost (in seconds) divided by the square root of the additional storage requirement (in 100 Kbyte increments).
When the value function exceeds a given threshold (say 5), the segment will be recognized as a fragment.
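The example value function and threshold can be checked against the two cases given above. The following is a minimal sketch, assuming that storage is expressed as a fractional number of 100 Kbyte units (the text does not say whether increments are rounded); the function and constant names are illustrative only.

```python
import math

THRESHOLD = 5.0  # the example threshold from the text


def fragment_value(processing_seconds: float, storage_kbytes: float) -> float:
    """Example value function F: processing cost divided by the square root
    of the additional storage, measured in units of 100 Kbytes."""
    storage_units = storage_kbytes / 100.0  # assumption: fractional units allowed
    return processing_seconds / math.sqrt(storage_units)


def is_fragment(processing_seconds: float, storage_kbytes: float,
                threshold: float = THRESHOLD) -> bool:
    """A segment is recognized as a fragment when F exceeds the threshold."""
    return fragment_value(processing_seconds, storage_kbytes) > threshold


case_1 = fragment_value(100, 10)    # 100 s of CPU, 10 Kbytes: about 316
case_2 = fragment_value(1, 1000)    # 1 s of CPU, 1000 Kbytes: about 0.32
```

Under these assumptions the first case scores far above the threshold of 5 and the second far below it, which matches the conclusion that only the first segment is worth recognizing as a separate fragment.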
To facilitate garbage collection of fragment descriptions that are no longer in use, an object-fragment table can be maintained which tracks the fragments created for each object, and a fragment-object table can be maintained to track all objects containing a common fragment. After an object is updated, on its next reference, the object parser may detect that the object now contains some new fragments and some fragments previously contained in the object are no longer in it. It will then check for each fragment no longer in use by the object whether there is any other object containing it based on the fragment-object table. If so, the fragment description element in
In step 1320, if the input is an object (e.g., a server response from a previous object request), the object is rendered and displayed to the user in step 1330. Recall that persistent object fragments have been recognized to make the revised object document cacheable at the server or client device and/or processable/interpretable at the client device. Consider the example where a segment cannot be rendered from the markup language description by a simple client device such as a WINDOWS CE™-based Internet appliance. According to the present invention, by recognizing the segment as a separate object fragment, the client can process and/or cache the revised document and allow the server to interpret the markup language describing the fragment and generate an appropriate version for the client. Examples of the limitations on the client device include but are not limited to the processing power or storage capacity of the client device to interpret the markup language and generate the object fragment; and/or the bandwidth available to the client device to retrieve the description of the fragment. Recall also that the recognition and revision of an object to remove segments qualifying as object fragments enables the object fragment to be cached separately and reused, which avoids interpreting the markup language description of the object to generate it again. This will improve response time and reduce server load on fragment re-references. Each fragment, once removed, may need to be requested separately with additional requests from the client. Thus, preferably, only a segment or group of segments that meets a certain threshold on the processing requirements of interpreting the markup language description to generate the object segment is recognized as a fragment by the server.
In step 1335, the client determines whether the object is cacheable. Recall that any dynamic object or object exceeding a certain size will be deemed not cacheable at the client device, which often has limited caching capacity. According to the present invention, the server uses persistent object fragment identifiers to replace persistent object fragments (such as dynamic objects or large segments) in a Web object. The revised object is thus more cacheable at the client device, since the server has removed the dynamic or large objects from the object and reduced the size of the object. For example, recall the example of an object description for a purchase order that includes a dynamic part for retrieving the current price of a product from the database. This dynamic part may be a small portion of the purchase order, but would prevent the object from being cached. According to one feature of the present invention for recognizing and treating the dynamic part as a separate fragment from the object description, the revised document becomes static and therefore cacheable. In step 1340, if the object is cacheable, the object is cached at the local client cache. In step 1325, a miscellaneous routine is invoked to handle other types of input, such as a pager message.
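The client-side handling in steps 1330 through 1340 can be sketched as follows. This is an illustration under stated assumptions, not the described embodiment: the `WebObject` type, the 64 Kbyte size limit, the placeholder `render` function, and the dictionary used as a cache are all hypothetical.

```python
from dataclasses import dataclass

MAX_CACHEABLE_BYTES = 64 * 1024  # assumed client limit; the text gives no figure


@dataclass
class WebObject:
    url: str
    body: str
    is_dynamic: bool


def render(obj: WebObject) -> str:
    """Placeholder for step 1330 (render and display); here it returns the body."""
    return obj.body


def is_cacheable(obj: WebObject, max_bytes: int = MAX_CACHEABLE_BYTES) -> bool:
    """Step 1335: dynamic or oversized objects are deemed not cacheable."""
    return not obj.is_dynamic and len(obj.body.encode("utf-8")) <= max_bytes


def handle_object(obj: WebObject, cache: dict) -> str:
    """Render the object, then cache it locally if it qualifies (step 1340)."""
    rendered = render(obj)
    if is_cacheable(obj):
        cache[obj.url] = obj
    return rendered
```

A revised object whose dynamic or large segments were replaced by fragment identities at the server would pass the `is_cacheable` check here, whereas the original object would not.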
A preferred embodiment of the present invention includes features implemented as software tangibly embodied on a computer program product or program storage device for execution on a processor (not shown) provided with the client (60 . . . 63) and/or server (30 . . . 33). For example, software implemented in a popular object-oriented computer executable code such as JAVA provides portability across different platforms. Those skilled in the art will appreciate that other procedure-oriented and object-oriented (OO) programming environments, including but not limited to C++ and Smalltalk can also be employed.
Those skilled in the art will also appreciate that methods of the present invention may be implemented as software for execution on a computer or other processor-based device. The software may be embodied on a magnetic, electrical, optical, or other persistent program and/or data storage device, including but not limited to: magnetic disks, DASD, bubble memory; tape; optical disks such as CD-ROMs; and other persistent (also called nonvolatile) storage devices such as core, ROM, PROM, flash memory, or battery backed RAM. Those skilled in the art will appreciate that within the spirit and scope of the present invention, one or more of the components instantiated in the memory of the clients (60 . . . 63) or server (30 . . . 33) could be accessed and maintained directly via disk (260), the network 25, another server, or could be distributed across a plurality of servers.
Now that a preferred embodiment of the present invention has been described, with alternatives, various modifications and improvements will occur to those skilled in the art. Thus, the detailed description should be understood as an example and not as a limitation. The proper scope of the invention is defined by the appended claims.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5924116 *||Apr 2, 1997||Jul 13, 1999||International Business Machines Corporation||Collaborative caching of a requested object by a lower level node as a function of the caching status of the object at a higher level node|
|US5946697 *||Apr 22, 1997||Aug 31, 1999||Microsoft Corporation||Rapid transfer of HTML files|
|US6012126 *||Oct 29, 1996||Jan 4, 2000||International Business Machines Corporation||System and method for caching objects of non-uniform size using multiple LRU stacks partitions into a range of sizes|
|US6026413 *||Aug 1, 1997||Feb 15, 2000||International Business Machines Corporation||Determining how changes to underlying data affect cached objects|
|US6065058 *||May 9, 1997||May 16, 2000||International Business Machines Corp.||Dynamic push filtering based on information exchanged among nodes in a proxy hierarchy|
|US6122666 *||Feb 23, 1998||Sep 19, 2000||International Business Machines Corporation||Method for collaborative transformation and caching of web objects in a proxy network|
|US6128627 *||Apr 15, 1998||Oct 3, 2000||Inktomi Corporation||Consistent data storage in an object cache|
|US6138141 *||Oct 18, 1996||Oct 24, 2000||At&T Corp||Server to client cache protocol for improved web performance|
|US6178461 *||Dec 8, 1998||Jan 23, 2001||Lucent Technologies Inc.||Cache-based compaction technique for internet browsing using similar objects in client cache as reference objects|
|1||*||"Spyglass Prism Allow Non-PC Devices to Display Content Up to Four Times faster", http://www.spyglass.com/newsflash/releases/091697 prismperf.html, 3 pages, printed Sep. 19, 1997.|
|2||*||"Spyglass: Making Devices Work With The Web", Products and Services, http://www.spyglass.com/product/wp, 7 pages, printed Sep. 19, 1997.|
|3||*||Armando Fox et al., "Adapting to Network and Client Variability via On-Demand Dynamic Distillation", University of California at Berkeley, 11 pages, published in Proc. 7th Intl. Conference on Architectural Support for Programming Languages and Operating Systems, (Oct. 1996).|
|4||*||Benoit Marchal from Pineapplesoft sprl, "An Introduction to SGML", http://www.pineapplesoft.com/reports/sgml/preface.html, 4 pages, (last modified Sep. 25, 1997).|
|5||*||Charu Aggarwal et al., "Caching on the World Wide Web", IEEE Transactions on Knowledge and Data Engineering, vol. 11, No. 1, pp. 94-107, Jan./Feb. 1999.|
|6||*||ISO 8879:1986, http://www.iso.ch/cate/d16387.html, Table of Contents, 1 page, (Last updated on May 8, 1999).|
|7||*||Jadau et al. "Caching of Large Database Objects in Web Server", IEEE Jun. 1007, pp 10-19.|
|8||*||Michael Leventhal et al., "Designing XML Internet Applications", Prentice Hall PTR, Table of Contents, 18 pages, (1998).|
|9||*||"Spyglass Ships Spyglass Prism 1.0 Dynamic Content Conversion Solution; Revolutionary Product Delivers Existing Web Content to Non-PC Devices", http://www.spyglass.com/newsflash/releases/091697 prismships.html, 3 pages, printed Sep. 19, 1997.|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US7349977||Dec 17, 2004||Mar 25, 2008||Microsoft Corporation||Fast dynamic measurement of bandwidth in a TCP network environment|
|US7353286||Dec 17, 2004||Apr 1, 2008||Microsoft Corporation||Fast dynamic measurement of bandwidth in a TCP network environment|
|US7391717||Jun 30, 2003||Jun 24, 2008||Microsoft Corporation||Streaming of variable bit rate multimedia content|
|US7548948 *||Nov 4, 2005||Jun 16, 2009||Microsoft Corporation||Client-side caching of streaming media content|
|US7594025||Aug 30, 2004||Sep 22, 2009||Microsoft Corporation||Startup methods and apparatuses for use in streaming content|
|US7634373||Mar 21, 2006||Dec 15, 2009||Microsoft Corporation||Midstream determination of varying bandwidth availability|
|US7650421||Dec 30, 2002||Jan 19, 2010||Microsoft Corporation||Adaptable accelerated content streaming|
|US7725557||Jun 24, 2002||May 25, 2010||Microsoft Corporation||Client-side caching of streaming media content|
|US7783772||Jul 21, 2006||Aug 24, 2010||Microsoft Corporation||Session description message extensions|
|US7809851||Dec 13, 2005||Oct 5, 2010||Microsoft Corporation||Session description message extensions|
|US8275940 *||Mar 27, 2006||Sep 25, 2012||Streamezzo||Method and device for optimisation of the management of a server cache which may be consulted by client terminals with differing characteristics|
|US8316132 *||Sep 8, 2005||Nov 20, 2012||Nokia Corporation||Method to determine the completeness of a service guide|
|US9053199 *||Mar 7, 2012||Jun 9, 2015||Google Inc.||Uniquely identifying script files by appending a unique identifier to a URL|
|US9195638||Jul 6, 2011||Nov 24, 2015||Alibaba Group Holding Limited||Method and apparatus of processing nested fragment caching of a web page|
|US20030236906 *||Jun 24, 2002||Dec 25, 2003||Klemets Anders E.||Client-side caching of streaming media content|
|US20040264489 *||Jun 30, 2003||Dec 30, 2004||Klemets Anders E.||Streaming of variable bit rate multimedia content|
|US20050044166 *||Aug 30, 2004||Feb 24, 2005||Microsoft Corporation||Startup methods and apparatuses for use in streaming content|
|US20050100014 *||Dec 17, 2004||May 12, 2005||Microsoft Corporation||Fast dynamic measurement of bandwidth in a TCP network environment|
|US20050108420 *||Dec 17, 2004||May 19, 2005||Microsoft Corporation||Fast dynamic measurement of bandwidth in a TCP network environment|
|US20060059223 *||Nov 4, 2005||Mar 16, 2006||Microsoft Corporation||Client-side caching of streaming media content|
|US20060092822 *||Dec 13, 2005||May 4, 2006||Microsoft Corporation||Session Description Message Extensions|
|US20060168295 *||Mar 21, 2006||Jul 27, 2006||Microsoft Corporation||Midstream Determination of Varying Bandwidth Availability|
|US20070055786 *||Sep 8, 2005||Mar 8, 2007||Nokia Corporation||Method to determine the completeness of a service guide|
|US20080288722 *||Mar 27, 2006||Nov 20, 2008||Streamezzo||Method for Optimization of the Management of a Server Cache Which May be Consulted by Client Terminals with Differing Characteristics|
|US20130238970 *||Mar 7, 2012||Sep 12, 2013||Google Inc.||Uniquely Identifying Script Files|
|WO2012009191A1 *||Jul 6, 2011||Jan 19, 2012||Alibaba Group Holding Limited||Method and apparatus of processing nested fragment caching of a web page|
|U.S. Classification||711/122, 709/203, 711/118, 715/234|
|International Classification||G06F17/30, G06F12/00, G06F15/16|
|Oct 15, 2008||FPAY||Fee payment|
Year of fee payment: 8
|Aug 14, 2012||FPAY||Fee payment|
Year of fee payment: 12