WO2001027799A1 - Data extractor - Google Patents


Info

Publication number
WO2001027799A1
Authority
WO
WIPO (PCT)
Application number
PCT/US2000/028084
Other languages
French (fr)
Other versions
WO2001027799A9 (en)
Inventor
Naphtali Rishe
Original Assignee
Leilandbridge Holdings Ltd.
Helfgott & Karas, P.C.
Application filed by Leilandbridge Holdings Ltd. and Helfgott & Karas, P.C.
Priority to AU10788/01A (AU1078801A)
Publication of WO2001027799A1
Publication of WO2001027799A9


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/912 Applications of a database
    • Y10S707/944 Business related
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99934 Query formulation, input preparation, or translation
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation


Abstract

A data extractor system for the extraction, deformatting, and postformatting of data available on the WWW. A user (102) connected to the Internet/Web contacts a fee-based intermediate data service, which provides an interface to determine various aspects of the user's query, including the output format. The intermediate data service (106) generates a Java stripping agent, which is sent to the user's browser (110) to interface with a third-party data provider. The Java stripping agent contains the knowledge to strip (116) away the formatting of user interfaces. The system provides buffering and streamlining between the user and web data providers; conversion of the visual presentation of information into data for further processing; translation of one data request into a cascade of data requests whose results are pasted together; filtering of data output; a variety of presentations of data different from the original presentation; and optional dataflow between the user's applications and the third-party data providers that bypasses interactive interfaces.

Description

DATA EXTRACTOR
BACKGROUND OF THE INVENTION
Field of Invention
The present invention relates generally to the field of data retrieval. More specifically, the present invention relates to a business model that provides a fee-based, real-time intermediary service, including a method of extracting data from third-party providers, removing existing formatting information, and returning the data to the requester in a desired format.
Discussion of Prior Art
The proliferation of the Internet and World Wide Web (WWW) has produced a deluge of information, often in formats the average user cannot manage. To assist the user, various search engines have been developed which work through the user's browser to keyword-search various indexed data sources. While search results consisting of text Web pages may be easy to manage, search results consisting of structured data are not. Typically, database results are returned preformatted in HTML, text or spreadsheet form, and the user has no means of selecting a format not envisioned by the data supplier. For example, the user may want data output only in spreadsheet format for direct integration into locally stored table structures. Most users cannot perform such a conversion because of software or hardware limitations, and certainly not in real time. What is needed is an intermediate service provider through which a user can enhance their data retrieval by customizing the data output without having to create complex algorithms or mapping structures locally on their PC.
The following prior art describes various attempts to extract data from database sources located on the Web. The patent to Schofield (5,860,072), assigned to Tandem Computers Incorporated, provides for a Method and Apparatus for Transporting Interface Definition Language-Defined Data Structures Between Heterogeneous Systems. Data strings are stored locally in a receiving computer's buffer, after which the data structure is extracted, realigned and stored. Column 4, lines 37-39 suggest an Internet embodiment.
The patent to Horvitz et al. (5,864,848), assigned to Microsoft Corporation, provides for a Goal-Driven Information Interpretation and Extraction System. Column 1, lines 47-52 suggest the extraction of data from Internet web pages.
The web page entitled, "Visual Design and Cross-Platform Execution", provides for a technical overview of the software product "Cambio." Cambio extracts the desired data fields (which can be spread across multiple lines in a text file) and assembles those fields into a flat record of data. These records are presented in the conventional row/column, tabular format (see http://www.datajunction.com/products/cambio_technical.html).
The web page entitled, "GlimpseGate", provides for context searching of html web documents with data strings (see http://phones.cybercell.net/~hsf/sources/glimpse gate/).
Additional data extractors can be found in the following patents, web pages and articles:
US patents: 5,761,656 to Ben-Shachar; 5,819,265 to Ravin et al.; 5,870,746 to Knutson et al.; 5,881,232 to Cheng et al.; and 5,892,908 to Hughes et al.
Web sites:
4.1 Overview - http://skwww.enc.iis.sinica.edu.tw/user-manual/node42.html
Help on Citibase Data Extraction - http://biscu.its.yale.edu/SSDA helpfiles/citihelp.html
HTML Presentation - http://www.fortnet.org/FortNet/HTML/Presentation/stats/
HTML2TEXT v1.51 - http://www.telekabel.nl/sprinter/wieger/html2txt.htm
HTMLess 2.0 - http://elanor.sci.muni.cz/ar/ar407_Sections/newsl9.html
NeXtract - http://www.nextract.com
Articles:
SAC Software Agent Corporation Presents The Search Agent - http://www.io.com/~sac/
Lawrence, Steve et al., "Context and Page Analysis for Improved Web Search", IEEE Internet Computing, July-August 1998, pp. 38-46.
Whatever the precise merits, features and advantages of the above cited references, none of them achieves or fulfills the purposes of the present invention, one of which is specifically to provide an E-commerce business model and system including an intermediate service provider through which a user can enhance their WWW data retrieval by customizing the data output in real time without creating and maintaining complex data mapping algorithms. The prior art shows that both stripping algorithms and Java agents are known; however, neither has been used to dispatch intermediary agents for real-time extraction of structured data from HTML pages accessed by the user and for arbitrary post-processing of third-party data.
These and other objectives are achieved by the detailed description that follows.
SUMMARY OF THE INVENTION
A data extractor system for the extraction, deformatting, and postformatting of data available on the WWW. The system enables buffering and streamlining between the user and web data providers; conversion of the visual presentation of information into data for further processing; translation of one data request into a cascade of data requests whose results are pasted together; filtering of data output; a variety of presentations of data different from the original presentation; and optional dataflow between the user's applications and the third-party data providers, thereby bypassing interactive interfaces.
A user connected to the Internet/Web contacts an intermediate data service, which provides an interface to determine various aspects of the user's query, including the output format. The intermediate data service generates a stripping agent, such as a Java program, which is sent to the user's browser to interface with a third-party data provider.
The Java stripping agent contains the knowledge needed to strip away the formatting of user interfaces such as HTML, and to reformat, reorganize, filter and present the data in real time in a user-selected format. The present invention:
1. Embeds all user input in a standardized way in a URL (CGI), hiding from the user various data entry protocols such as POST data, JavaScript data entry forms, etc. This allows the user to: a. bookmark this URL with predefined input data; b. embed this URL in various user scripts.
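As an illustration of point 1, the sketch below normalizes a query into a single GET URL that can be bookmarked or pasted into a script. The endpoint `https://dataservice.example/strip` and the parameter names `service`, `keywords`, `format` and `lang` are hypothetical; the source does not specify the URL scheme.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint of the intermediate data service (not from the source).
BASE_URL = "https://dataservice.example/strip"


def build_query_url(service: str, keywords: str, fmt: str = "tsv", lang: str = "en") -> str:
    """Encode every user input as a CGI query string, so the URL alone
    reproduces the request (bookmarkable, embeddable in scripts)."""
    params = {"service": service, "keywords": keywords, "format": fmt, "lang": lang}
    return BASE_URL + "?" + urlencode(params)


def parse_query_url(url: str) -> dict:
    """Recover the original inputs from a bookmarked URL."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


if __name__ == "__main__":
    url = build_query_url("BellSouth", "Sears")
    print(url)
    print(parse_query_url(url))
```

However the third-party site expects its input (POST body, script-driven form), the user only ever sees this one URL. Translating it into the provider's own protocol would be the agent's job.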
2. Converts the formatted data retrieved from the third-party data provider into an ASCII file, one line per result, with tabs separating fields, eliminating all graphics and irrelevant text and leaving only data. This allows: a. convenient downloading of data into user applications; b. compact results; c. development of embedded applications.
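Point 2 can be sketched with Python's standard `html.parser`. This version assumes the provider returns its records as reasonably well-formed HTML table rows. It keeps only the text of `<td>`/`<th>` cells and drops images, styling, scripts and surrounding page text. A real stripping agent would need per-site rules, for example from a knowledge base, and this is not the patent's Java implementation.

```python
from html.parser import HTMLParser


class TableStripper(HTMLParser):
    """Collect the text of <td>/<th> cells row by row, ignoring all other
    markup (images, fonts, scripts, text outside table cells)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None # text fragments of the cell currently being read
        self._skip = 0    # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace (including tabs) so fields stay on one line.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if any(self._row):
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None and not self._skip:
            self._cell.append(data)


def html_to_tsv(html: str) -> str:
    """One line per result, tabs separating fields."""
    parser = TableStripper()
    parser.feed(html)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)


if __name__ == "__main__":
    # Hypothetical provider page; the phone number is invented.
    page = """<html><body><img src="logo.gif"><h1>Results</h1>
    <table><tr><th>Name</th><th>Phone</th></tr>
    <tr><td><b>Sears</b> Roebuck</td><td>555-0100</td></tr></table></body></html>"""
    print(html_to_tsv(page))
```

Nested markup such as `<b>` inside a cell is flattened into the cell text. The logo, heading and any other page text outside the table are discarded.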
3. When a third-party site gives a few records at a time and a "next" button, the present invention recursively dispatches an agent to call the third-party data provider, giving the user a large volume of data in one operation.
4. In addition to plain ASCII output by default, the user will be able to parametrically specify additional forms of output:
formatted ASCII (72 characters per line, aligned spaces instead of tabs; one field can continue on several lines)
RTF
HTML tables
PDF
PostScript
and others.
The present invention delivers standardized extracted graphic files of spatial data: maps, remote-sensing images, etc.
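The "next" button handling in point 3 can be sketched as follows. The sketch assumes a page fetcher that returns the records on a page together with the URL of the next page, or `None` on the last page. The fetcher is passed in as a parameter so the example runs offline; it is a hypothetical interface, not one the source defines. The repeated calls are written as a loop rather than literal recursion. The effect is the same, and a loop avoids recursion-depth limits on long result sets.

```python
from typing import Callable, List, Optional, Tuple

# A fetcher returns (records on this page, URL of the "next" page or None).
# In a real agent it would download and strip one third-party page.
PageFetcher = Callable[[str], Tuple[List[str], Optional[str]]]


def collect_all_pages(start_url: str, fetch: PageFetcher, max_pages: int = 50) -> List[str]:
    """Follow "next" links until the provider has no more pages, a page
    repeats, or the safety limit is reached; return all records together."""
    records: List[str] = []
    url: Optional[str] = start_url
    seen = set()
    pages = 0
    while url is not None and url not in seen and pages < max_pages:
        seen.add(url)
        page_records, url = fetch(url)
        records.extend(page_records)
        pages += 1
    return records


if __name__ == "__main__":
    # Hypothetical three-page result set.
    fake_site = {"p1": (["a", "b"], "p2"), "p2": (["c"], "p3"), "p3": (["d"], None)}
    print(collect_all_pages("p1", lambda u: fake_site[u]))
```

The `seen` set and `max_pages` limit guard against providers whose "next" link loops back to itself.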
5. The user can specify a parameter EGREP_SCREEN giving a regular expression to screen the output or a simplified parameter KEYWORDS_SCREEN. (Note: this is post-processing of results after they are received from third-party providers)
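The EGREP_SCREEN and KEYWORDS_SCREEN post-filters could work roughly as sketched below. The source names the parameters but not their exact semantics. This sketch assumes EGREP_SCREEN keeps lines that match the regular expression anywhere. It assumes KEYWORDS_SCREEN keeps lines that contain every whitespace-separated keyword, ignoring case. Python's `re` syntax is close to, but not identical to, egrep's POSIX extended regular expressions.

```python
import re
from typing import Iterable, List, Optional


def screen_output(lines: Iterable[str],
                  egrep_screen: Optional[str] = None,
                  keywords_screen: Optional[str] = None) -> List[str]:
    """Post-filter result lines after they come back from the provider.

    egrep_screen: regular expression; keep a line if it matches anywhere.
    keywords_screen: keywords; keep a line only if it contains all of them
    (case-insensitive). This meaning is an assumption.
    """
    pattern = re.compile(egrep_screen) if egrep_screen else None
    keywords = keywords_screen.lower().split() if keywords_screen else []
    kept = []
    for line in lines:
        if pattern is not None and not pattern.search(line):
            continue
        if keywords and not all(k in line.lower() for k in keywords):
            continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical tab-delimited results.
    rows = ["Sears\tMiami\t305-555-0100",
            "Sears Outlet\tTampa\t813-555-0199",
            "Kmart\tMiami\t305-555-0142"]
    print(screen_output(rows, egrep_screen=r"\t305-"))
    print(screen_output(rows, keywords_screen="sears miami"))
```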
6. In an alternative embodiment, the intermediate data service subscribes to a variety of pay-per-use services and re-delivers information to paying customers. The end user's convenience, in addition to repackaging, is that the user does not have to subscribe to many services, only to the intermediate data service (the charge includes a small mark-up, or no mark-up if wholesale rates are obtained).
7. In an alternative embodiment, the system performs merges and joins between data from more than one server.
8. In an alternative embodiment, certain joins will be allowed within the same site, e.g., by traversing pointers from a product list to product details.
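Points 7 and 8 describe joins over stripped results. One common way to do this is a hash join on a shared field, sketched below. The source does not say which join algorithm is used, and the field names and sample records are hypothetical. The same function applies whether the second result set came from another server or from detail pages reached through links in a product list.

```python
from typing import Dict, List

Record = Dict[str, str]


def hash_join(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Inner equi-join of two stripped result sets on a shared field."""
    index: Dict[str, List[Record]] = {}
    for row in right:
        if key in row:
            index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row.get(key), []):
            merged = dict(row)
            merged.update(match)  # fields from `right` win on name clashes
            joined.append(merged)
    return joined


if __name__ == "__main__":
    # Hypothetical: business listings from one server, forecasts from another.
    listings = [{"name": "Sears", "zip": "33199"}, {"name": "Kmart", "zip": "33174"}]
    weather = [{"zip": "33199", "forecast": "sunny"}]
    print(hash_join(listings, weather, "zip"))
```

Indexing the right-hand side first makes the join linear in the combined size of both result sets, rather than comparing every pair of rows.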
9. In an alternative embodiment, the system includes a virtual conceptual semantic schema of all WWW information accessible to the user via the service, and allows the user to specify complex database queries against that schema without knowing which third-party sites need to be accessed or joined to perform the query.
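Under a conceptual schema like the one in point 9, part of the work is deciding which sites a query needs. A minimal sketch follows, assuming the schema is just a mapping from each known site to the conceptual attributes it can supply. The site names and attributes are hypothetical. The planner uses a greedy set-cover heuristic, which is not guaranteed to find the smallest set of sites and is not taken from the source.

```python
from typing import Dict, List, Set

# Hypothetical conceptual schema: which sites supply which attributes.
SCHEMA: Dict[str, Set[str]] = {
    "yellow_pages": {"business_name", "phone", "address", "zip"},
    "weather_service": {"zip", "forecast"},
    "geo_server": {"zip", "city", "area_code"},
}


def plan_sites(requested: Set[str], schema: Dict[str, Set[str]] = SCHEMA) -> List[str]:
    """Greedily pick sites until every requested attribute is covered,
    so the user never has to name a site."""
    remaining = set(requested)
    plan: List[str] = []
    while remaining:
        best = max(schema, key=lambda site: len(schema[site] & remaining))
        gained = schema[best] & remaining
        if not gained:
            raise ValueError(f"no known site provides: {sorted(remaining)}")
        plan.append(best)
        remaining -= gained
    return plan


if __name__ == "__main__":
    print(plan_sites({"business_name", "forecast"}))
```

The chosen sites would then be joined on the attributes they share (here `zip`).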
10. The program can employ Java-agent technology, in which the agent performs all the activities at the user site. This reduces traffic on the intermediate data service and will also protect the intermediate service provider from possible claims by third-party data providers regarding reselling or storing of their data contrary to license or copyright provisions.
11. The program will allow a number of post-formatting options, including:
an audio file produced after adding connecting words to properly delineate fields (a meaningful audio file cannot be produced without first stripping the output and delimiting fields with connecting words);
smart translation into other languages, in which the present invention decides which fields should be translated and which should not, exercising its knowledge of the semantics of the data source.
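The two post-formatting options in point 11 could look roughly like the sketch below. It assumes each data source has a hypothetical list of connecting phrases and a hypothetical set of translatable fields. The output sentence would be handed to a text-to-speech engine, which is not shown. Names, phone numbers and other proper values stay untranslated.

```python
from typing import Dict, List, Tuple

# Hypothetical per-source connecting phrases: (field name, phrase spoken before it).
CONNECTORS: List[Tuple[str, str]] = [
    ("name", ""),
    ("address", ", located at"),
    ("city", " in"),
    ("phone", ", can be reached at"),
]

# Hypothetical: only descriptive fields are translated.
TRANSLATABLE = {"category", "description"}


def record_to_speech(record: Dict[str, str], connectors=CONNECTORS) -> str:
    """Join stripped fields with connecting words, so speech output forms a
    sentence instead of an undelimited run of values."""
    parts = []
    for field, phrase in connectors:
        value = record.get(field)
        if value:
            parts.append(f"{phrase} {value}" if phrase else value)
    return "".join(parts) + "."


def fields_to_translate(record: Dict[str, str]) -> Dict[str, str]:
    """Select the fields a translator should see."""
    return {k: v for k, v in record.items() if k in TRANSLATABLE}


if __name__ == "__main__":
    rec = {"name": "Sears", "address": "123 Main St", "city": "Miami",
           "phone": "305 555 0100", "category": "Department Stores"}
    print(record_to_speech(rec))
    print(fields_to_translate(rec))
```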
12. The program is written so that the definitions of third-party web site protocols are kept outside the program, in a Knowledge Base, where low-skilled staff can easily maintain and change them.
13. The intermediate data service maintains a large database or references to data-providing sites whose input/output stripping instructions are known.
14. When no parameters are given, the present invention replies with a list of the third-party services it knows how to query, the kind of information they provide, and their field names.
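Points 12 to 14 suggest a data-driven design in which site definitions live in an editable file rather than in code. The sketch below uses a hypothetical JSON entry format with a URL template, a description and field names. The same knowledge base both builds provider requests and produces the no-parameters service listing. Every URL, key and service name here is illustrative, and a real knowledge base would also hold the per-site stripping instructions.

```python
import json
from typing import Dict, List
from urllib.parse import quote_plus

# Hypothetical knowledge-base file contents, editable without touching code.
KB_JSON = """
{
  "bellsouth_yp": {
    "description": "Yellow Pages business listings",
    "url_template": "https://provider.example/search?q={keywords}",
    "fields": ["name", "address", "city", "zip", "phone"]
  },
  "weather": {
    "description": "Weather forecasts by zip code",
    "url_template": "https://weather.example/forecast?zip={zip}",
    "fields": ["zip", "forecast"]
  }
}
"""


def load_kb(text: str = KB_JSON) -> Dict[str, dict]:
    return json.loads(text)


def describe_services(kb: Dict[str, dict]) -> List[str]:
    """Reply used when no parameters are given: each known service, the kind
    of information it provides, and its field names."""
    return [f"{name}\t{entry['description']}\t{', '.join(entry['fields'])}"
            for name, entry in sorted(kb.items())]


def request_url(kb: Dict[str, dict], service: str, **inputs: str) -> str:
    """Fill the service's URL template with URL-encoded user inputs."""
    encoded = {k: quote_plus(str(v)) for k, v in inputs.items()}
    return kb[service]["url_template"].format(**encoded)


if __name__ == "__main__":
    kb = load_kb()
    print("\n".join(describe_services(kb)))
    print(request_url(kb, "bellsouth_yp", keywords="auto parts"))
```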
15. Examples of services to be supported are:
various white and yellow phone directories
business directories and classification (SIC) - zip2.com
weather services
stock quotes (input: a list of ticker symbols)
public English dictionaries, bilingual dictionaries, and thesauri
web search engines (Dog Metafind; Yahoo!; Infoseek)
geographic text servers (zipcode <-> city, address <-> area code <-> airport code)
online translators
airline schedules and flight info (airline-specific sites)
professional directories: doctors, lawyers
Microsoft aerial photography maps
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates a flowchart of the present invention.
Figure 2 illustrates an enhanced information services interface.
Figure 3 illustrates an enhanced keyword search interface.
Figure 4 illustrates an example output in text (tab-delimited) format.
Figure 5 illustrates an example output in HTML format.
Figure 6 illustrates an SQL interface.
Figures 7a-7c collectively illustrate an SQL example.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
While this invention is illustrated and described in a preferred embodiment, the device may be produced in many different configurations, forms and materials. There is depicted in the drawings, and will herein be described in detail, a preferred embodiment of the invention, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and the associated functional specifications of the materials for its construction and is not intended to limit the invention to the embodiment illustrated. Those skilled in the art will envision many other possible variations within the scope of the present invention.
Figure 1 illustrates a flow diagram of a user 102 connecting to a data supplier 104 to perform a search during a typical search session using known Internet/WWW search engines such as Lycos®, Excite®, Snap®, Infoseek®, Webcrawler®, etc. User 102 represents a PC owner with Internet access and a browser 110 (e.g., Netscape® or Microsoft Explorer®), WebTV®, or other Internet/WWW access methods.
The present invention provides for an intermediate data service enhancement 106 enabling the user to strip 116 away the formatting of user interfaces such as HTML used by either the data provider 104 or browser 110, and to reformat, reorganize, filter and present the data 126 in a user-selected format. User 102 connects 108 to intermediate data service enhancement 106 through their browser 110. Intermediate data service enhancement 106 provides the user with a search enhancement interface (Figures 2-7c) to determine the choice of data supplier 104, the return data format, and the query. Intermediate data service enhancement 106 returns 109 a Java strip class algorithm 116 to the user's system to enable real-time local enhancement. The strip algorithm 116 retrieves the requested data 118/120, strips the non-data formatting, and reformats, reorganizes, filters and presents the data in the user-selected format 126.
Figure 2 illustrates a typical user interface 200 provided by the intermediate data service enhancement 106. User 102 first selects an Information Service 202 such as LawStreet® (shown), Bellsouth®, Excite®, Webcrawler®, Lycos®, Snap®, Goto®, Scrubtheweb®, MSN®, or a generic search engine, and actuates this selection by selecting "Go". Instructions provided include:
a. With this application you may get enhanced data from various Information Service Providers - 206
b. You can save the results of your query in various formats - 208. You can make advanced ad-hoc queries - 210
c. Working example(s) - 212 in formats 214 (text (tab-delimited), HTML, Excel®, Microsoft Access®).
Upon actuation of the "Go" button 204, the user receives the next user interface 300, as shown in Figure 3. Entry box 302 enables the user to enter the keyword(s) normally used during a search. Drop-down menu 304 enables the user to select a desired output format such as "plain text (tab-delimited)" (shown), HTML, Excel®, Microsoft Access®, or other known data formats. Drop-down menu 306 enables a selection of language, e.g., English (shown), French, German, Italian, etc., for the returned data. After selecting keywords, output format and language, the user can start the creation of a Java strip algorithm by selecting "Go", or change information service providers at 202. In addition, the user can bookmark 310 the result for future access. Intermediate data service enhancement 106 returns a Java strip class algorithm (Java strip agent) which works locally with the user's browser 110 to return a "data only" result in the format chosen through the selections described above. All Java® strip agents 116 created by the intermediate data service enhancement 106 system are retained by it for quick future access by a requesting user.
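Retaining every generated strip agent for quick reuse is, in effect, a cache keyed by the user's selections. The sketch assumes those selections (service, keywords, output format, language) fully determine the agent. It represents an agent as a plain string produced by a hypothetical `generate` callback, standing in for the Java class the source describes.

```python
from typing import Callable, Dict, NamedTuple


class AgentSpec(NamedTuple):
    """User selections assumed to fully determine a generated strip agent."""
    service: str
    keywords: str
    output_format: str
    language: str


class AgentCache:
    """Keep every generated agent so a repeated request is served from the
    cache instead of being regenerated."""

    def __init__(self, generate: Callable[[AgentSpec], str]):
        self._generate = generate
        self._agents: Dict[AgentSpec, str] = {}
        self.generated = 0  # number of agents actually built

    def get(self, spec: AgentSpec) -> str:
        if spec not in self._agents:
            self._agents[spec] = self._generate(spec)
            self.generated += 1
        return self._agents[spec]


if __name__ == "__main__":
    cache = AgentCache(lambda s: f"agent:{s.service}:{s.keywords}")
    spec = AgentSpec("BellSouth", "Sears", "tsv", "English")
    print(cache.get(spec), cache.get(spec), cache.generated)
```

A `NamedTuple` is hashable, so the full set of selections can be used directly as the dictionary key.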
Figures 4 and 5 illustrate two possible outputs of the example 212. In this example, the user selected "BellSouth" as the service provider 202, "Sears" as the keyword 302, "English" as the output language 306 and "plain text (tab-delimited)" as the output shown in Figure 4 and HTML as shown in Figure 5.
In addition to simple keyword searches, advanced ad-hoc queries 210 can be made with relative ease. Figure 6 illustrates the SQL (Structured Query Language) query interface 600 with BellSouth Yellow Pages chosen as the service provider. The user is given guidance instructions 602-622 describing basic SQL procedures related to BellSouth databases, as follows:
BASIC PARAMETERS OF THE DATABASE
• The user can issue one or many SQL queries to BellSouth Yellow Pages - 602
• The name of the table containing the Yellow Pages is "AllBell" - 604
• Each SQL query should have the semicolon (;) marker at the end - 606
• The results of the last query can be displayed on the screen - 608
• The user should specify the list of field names for getting these results - 610
• The number of output lines is limited to 500 lines - 612
EXAMPLES
Please look into examples of SQL queries and feel free to modify them or put your own SQL query: - 614
• Businesses which are located in some zip-code - 616
• Count all businesses, which are located in some zip-code (count example) - 618
• Select phones like "348%" 620, then show all distinct cities - 622
As each hypertext example 616-622 is selected, the interfaces shown in Figures 7a-7c reveal the actual SQL query entered into entry box 626 as a series of SQL statements. Figure 7a illustrates the SQL entry 701, "Select * from AllBell where zip=33199;" - 716, corresponding to the text example 616. Figure 7b illustrates the example 618, which produces the SQL entry 702, "Select count(*) CountResults from AllBell where zip=33174;" - 718. Figure 7c illustrates the examples 620/622, which produce the SQL entries 703, "Select * from AllBell where Phone Like 348%;" - 720, and "Select distinct city from Allbell;" - 722.
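The example queries can be tried locally against an in-memory SQLite stand-in for the "AllBell" table. Only the table name and the `zip`, `Phone` and `city` columns come from the figures. The `name` column and all rows are invented for illustration. In the described system the SQL is compiled into a strip agent run against the live provider, not executed in a local database. String literals are quoted here, as standard SQL requires for `LIKE '348%'`, and the 500-line limit from guidance item 612 is applied with `fetchmany`.

```python
import sqlite3

# Invented sample rows for a stand-in "AllBell" table.
ROWS = [
    ("Sears", "Miami", "33199", "3485550100"),
    ("Kmart", "Miami", "33174", "3055550142"),
    ("Sears Outlet", "Hialeah", "33174", "3485550199"),
]
MAX_OUTPUT_LINES = 500  # output limit from guidance item 612

EXAMPLES = [
    "SELECT * FROM AllBell WHERE zip = '33199';",
    "SELECT count(*) AS CountResults FROM AllBell WHERE zip = '33174';",
    "SELECT * FROM AllBell WHERE Phone LIKE '348%';",
    "SELECT DISTINCT city FROM AllBell;",
]


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AllBell (name TEXT, city TEXT, zip TEXT, Phone TEXT)")
    conn.executemany("INSERT INTO AllBell VALUES (?, ?, ?, ?)", ROWS)
    return conn


def run_query(conn: sqlite3.Connection, sql: str):
    """Run one semicolon-terminated query and return at most 500 rows."""
    cursor = conn.execute(sql.strip().rstrip(";"))
    return cursor.fetchmany(MAX_OUTPUT_LINES)


if __name__ == "__main__":
    conn = make_demo_db()
    for sql in EXAMPLES:
        print(sql, run_query(conn, sql))
```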
The remaining SQL selections are the output columns 624 desired for the data output and the output format 628. Clicking "Go" 628 starts the SQL process, creating the Java® Strip Class algorithm 116 that corresponds to the SQL query and the enhanced data output selections; the algorithm is then returned to the user 102.
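The step of turning the user's selections into an agent that runs at the user's location can be pictured as a closure built from a specification. This is a Python sketch under stated assumptions, not the Java® Strip Class itself:
- AgentSpec and generate_agent are hypothetical names;
- fetch is a hypothetical stand-in for contacting the provider;
- the output format is reduced to a delimiter choice.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical bundle of the selections made in the SQL interface."""
    provider: str                  # chosen service provider
    sql: str                       # contents of the SQL entry box
    output_columns: Sequence[str]  # output columns selected by the user
    delimiter: str = "\t"          # output format: tab-delimited by default


def generate_agent(spec: AgentSpec,
                   fetch: Callable[[str, str], List[Record]]) -> Callable[[], str]:
    """Return a closure the end user can run locally; `fetch` is a
    hypothetical stand-in for sending the query to the provider."""
    def agent() -> str:
        records = fetch(spec.provider, spec.sql)
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter=spec.delimiter, lineterminator="\n")
        writer.writerow(spec.output_columns)
        for record in records:
            # keep only the requested columns, in the requested order
            writer.writerow([record.get(col, "") for col in spec.output_columns])
        return buf.getvalue()
    return agent
```

The specification is a plain immutable value. It could therefore be stored and reused later, much as the claims describe agents being retained for future use.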
The dispatched agent retrieves and reformats the data. Limited amounts of results are delivered to the user at no charge; larger amounts are charged according to the amount of data retrieved. The user has the option to order a preview and sample of the data before the full set is delivered and the charge is made. In an alternative embodiment, the user purchases a license for unlimited use of the service. Additionally, the service can be provided free to the user, with payments made by advertisers or other third parties. Furthermore, in some situations, no charges are incurred at all.
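The charging alternatives above can be condensed into a small function. This is only a sketch: the free allowance of 50 rows and the rate of 1 cent per row are hypothetical placeholders, since the text gives no figures. Integer cents avoid floating-point rounding.

```python
def charge_cents(rows_delivered: int, free_rows: int = 50, cents_per_row: int = 1,
                 licensed: bool = False, sponsored: bool = False) -> int:
    """Hypothetical usage-based charge in integer cents."""
    if rows_delivered < 0:
        raise ValueError("rows_delivered must be non-negative")
    if licensed or sponsored:
        # unlimited-use license, or service paid for by advertisers/third parties
        return 0
    # a limited amount is free; the rest is charged by the amount retrieved
    billable = max(0, rows_delivered - free_rows)
    return billable * cents_per_row
```

With the placeholder defaults, 50 rows cost nothing and 150 rows cost 100 cents.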
Format of the default ASCII output:
Row separator: newline (plus carriage return if the parameter DOS=y is given)
Field separator: tab or another user-specified delimiter
Structure:
<document titles>
<lines of column headers>
<data>
<informational messages, including sites contacted, queries performed, time stamps, statistics>
<error messages>
<optional promotional material and paid advertisements>
<links to third party services used>
The above enhancements for data extraction and their described functional elements are implemented in various computing environments. For example, the present invention may be implemented on a conventional IBM PC or equivalent, a multi-nodal system (e.g., LAN) or a networking system (e.g., Internet, WWW). All programming, Java strip agent algorithms, GUIs, display panels and dialog box templates, and data related thereto are stored in computer memory, static or dynamic, and may be retrieved by the user of the present invention in conventional computer storage, display (e.g., CRT) and/or hardcopy (e.g., printed) formats. The programming of the present invention may be implemented by one of skill in the art of database, Internet-related and E-commerce programming.
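The default ASCII output format described above can be sketched as a single formatting function. The sketch makes these assumptions:
- the function name and keyword parameters are hypothetical;
- dos=True stands in for the DOS=y parameter and uses the conventional CR LF line ending;
- the column headers occupy one line.

```python
from typing import Iterable, Sequence


def format_default_ascii(titles: Sequence[str], columns: Sequence[str],
                         rows: Iterable[Sequence[object]],
                         messages: Sequence[str] = (), errors: Sequence[str] = (),
                         promotions: Sequence[str] = (), links: Sequence[str] = (),
                         dos: bool = False, delimiter: str = "\t") -> str:
    """Assemble titles, header, data, then the trailing sections, in order."""
    newline = "\r\n" if dos else "\n"  # dos=True models the DOS=y parameter
    lines = list(titles)
    lines.append(delimiter.join(columns))
    lines.extend(delimiter.join(str(v) for v in row) for row in rows)
    # messages, errors, promotional material, and third-party links follow the data
    for section in (messages, errors, promotions, links):
        lines.extend(section)
    return newline.join(lines) + newline
```

Keeping the informational and error sections after the data lets a consumer read the header and data rows without first skipping over diagnostics.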
CONCLUSION
A system and method have been shown in the above embodiments for the effective implementation of a data extractor. While various preferred embodiments have been shown and described, it will be understood that there is no intent to limit the invention by such disclosure, but rather, it is intended to cover all modifications and alternate constructions falling within the spirit and scope of the invention as defined in the appended claims. For example, the present invention should not be limited by computer operating system, database management system, database management model, directory structure, DBMS-file linking technology, type of user interface, computer hardware platform, network operating system, programming language of the agents, archiving software, or archiving hardware. In addition, the present invention can be implemented locally on a single PC, on connected workstations (e.g., a networked LAN), across extended networks such as the Internet, or using portable equipment such as laptop computers or wireless equipment (RF, microwave, infrared, photonic, etc.).

Claims

1. An E-commerce system generating revenues by providing a data extraction service for an end user comprising the method: receiving a request from said end user comprising a database query to a data supplier; providing an interface to said end user, said user interface requesting at least data output formatting requirements; generating an agent based on at least said database query and formatting requirements; communicating said agent to said end user; said end user implementing said agent to extract data from said data supplier, said data returned to said end user in said format specified, and billing said end user for use of the created agents.
2. An E-commerce system generating revenues by providing a data extraction service for an end user as per claim 1, wherein said data extraction includes stripping formatting information and reformatting in the requested format.
3. An E-commerce system generating revenues by providing a data extraction service for an end user as per claim 2, wherein said step of stripping formatting information includes stripping HTML formatting.
4. An E-commerce system generating revenues by providing a data extraction service for an end user as per claim 2, wherein said step of stripping formatting information includes stripping graphics from HTML data.
5. An E-commerce system generating revenues by providing a data extraction service for an end user as per claim 2, wherein said stripping and reformatting is performed locally at the end user's location.
6. An E-commerce system generating revenues by providing a data extraction service for an end user as per claim 5, wherein said stripping and reformatting is performed in real-time.
7. An E-commerce system generating revenues by providing a data extraction service for an end user as per claim 1, wherein said agents are retained by said data extraction service for future use.
8. An E-commerce system generating revenues by providing a data extraction service for an end user as per claim 1, wherein said user interface further includes a selection from known data suppliers.
9. An E-commerce system generating revenues by providing a data extraction service for an end user as per claim 1, wherein said user interface further includes SQL query capability.
10. An E-commerce system generating revenues by providing a data extraction service for an end user as per claim 1, wherein said user interface further requests a language format to return data to said end user.
11. A Web based system for data extraction comprising: a data requestor connected to the Web; a third party data provider; an intermediate service provider, said intermediate service provider receiving a request from said requestor for data located at said third party data provider; said intermediate service provider providing a user interface to said requestor, said user interface requesting at least data formatting requirements; generating an agent based on at least said request for data and formatting requirements; communicating said agent to said requestor; said end user implementing said agent to extract data from said data supplier, and said requested data returned to said end user in said format specified.
12. A Web based system for data extraction as per claim 11, wherein said requestor is billed for the services provided by said intermediate service provider.
13. A Web based system for data extraction as per claim 11, wherein said data extraction includes stripping formatting information and reformatting in the requested format.
14. A Web based system for data extraction as per claim 13, wherein said step of stripping formatting information includes stripping HTML formatting.
15. A Web based system for data extraction as per claim 13, wherein said step of stripping formatting information includes stripping graphics from HTML data.
16. A Web based system for data extraction as per claim 13, wherein said stripping and reformatting is performed locally at the end user's location.
17. A Web based system for data extraction as per claim 11, wherein said stripping and reformatting is performed in real-time.
18. A Web based system for data extraction as per claim 11, wherein said agents are retained by said intermediate service provider for future use.
19. A Web based system for data extraction as per claim 11, wherein said graphical user interface further includes a selection from known data suppliers.
20. A Web based system for data extraction as per claim 11, wherein said graphical user interface further includes SQL capability.
21. A Web based system for data extraction as per claim 11, wherein said graphical user interface further requests a language parameter to return data to said end user.
22. A computer-based method of extracting data in a selected format from third party data providers comprising: a computer user, connected to a computer network, contacting a data extraction service provider; said data extraction service provider requesting at least a source and output format of a data query, and said data extraction service provider returning to said computer user a stripping agent to enable said computer user to obtain said data from said data source in the requested output format.
23. A computer-based method of extracting data in a selected format from third party data providers as per claim 22, wherein said stripping agent strips formatting information including HTML.
24. A computer-based method of extracting data in a selected format from third party data providers as per claim 22, wherein said stripping agent strips formatting information including graphics from HTML data.
25. A computer-based method of extracting data in a selected format from third party data providers as per claim 22, wherein said stripping and formatting is performed locally at the computer user's location.
26. A computer-based method of extracting data in a selected format from third party data providers as per claim 22, wherein said stripping and formatting is performed in real-time.
27. A computer-based method of extracting data in a selected format from third party data providers as per claim 22, wherein said stripping agents are retained by said data extraction service provider for future use.
28. A computer-based method of extracting data in a selected format from third party data providers as per claim 22, wherein said step of said data extraction service provider requesting at least a source and output format of a data query is provided by a user interface.
29. A computer-based method of extracting data in a selected format from third party data providers as per claim 28, wherein said user interface further includes SQL capability.
30. A computer-based method of extracting data in a selected format from third party data providers as per claim 28, wherein said graphical user interface further requests a language parameter to return data to said end user.
31. A computer-based method of extracting data in a selected format from third party data providers as per claim 28, wherein said graphical user interface further includes a selection from known data suppliers.
32. A computer-based method of extracting data in a selected format from third party data providers as per claim 22, wherein said requestor is billed for the services provided by said data extraction service provider.
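Several claims recite stripping HTML formatting and graphics: claims 2-4, 13-15 and 23-24. The idea can be illustrated with a minimal sketch that assumes the provider returns results as an HTML table. It uses Python's standard html.parser module as a swapped-in technique, not the patented Java strip agent. The class and function names are hypothetical. The parser keeps the text of table cells row by row. It drops all other tags (formatting such as <b> and <font>, and graphics such as <img>) and ignores script and style contents.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableStripper(HTMLParser):
    """Collect the text of <td>/<th> cells row by row, discarding all markup."""
    SKIP = {"script", "style"}  # element contents that are never data

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: "&amp;" arrives as "&"
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None
        self._skip = 0

    def _close_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            # collapse whitespace left behind by removed tags
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self) -> None:
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []
        # every other tag (<b>, <font>, <img>, ...) is simply dropped

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip = max(0, self._skip - 1)
        elif tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None and not self._skip:
            self._cell.append(data)


def strip_table(html: str) -> List[List[str]]:
    parser = TableStripper()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The resulting rows can then be reformatted in whatever output format was requested. Tolerating missing closing tags reflects the loosely written HTML that is common on provider pages.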
PCT/US2000/028084 1999-10-12 2000-10-11 Data extractor WO2001027799A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU10788/01A AU1078801A (en) 1999-10-12 2000-10-11 Data extractor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/415,998 1999-10-12
US09/415,998 US6339773B1 (en) 1999-10-12 1999-10-12 Data extractor

Publications (2)

Publication Number Publication Date
WO2001027799A1 (en) 2001-04-19
WO2001027799A9 (en) 2002-09-26

Family

ID=23648102

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/028084 WO2001027799A1 (en) 1999-10-12 2000-10-11 Data extractor

Country Status (3)

Country Link
US (1) US6339773B1 (en)
AU (1) AU1078801A (en)
WO (1) WO2001027799A1 (en)

Also Published As

Publication number Publication date
WO2001027799A9 (en) 2002-09-26
US6339773B1 (en) 2002-01-15
AU1078801A (en) 2001-04-23

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: C2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 1/9-9/9, DRAWINGS, REPLACED BY NEW PAGES 1/9-9/9; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP