WO2005029365A2 - Surveillance, monitoring and real-time events platform - Google Patents

Publication number
WO2005029365A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
information
processing system
rdf
query
Prior art date
Application number
PCT/US2004/021671
Other languages
French (fr)
Other versions
WO2005029365A3 (en)
Inventor
Colin P. Britton
Howard Greenblatt
Alan Greenblatt
Original Assignee
Metatomix, Inc.
Priority date
Filing date
Publication date
Application filed by Metatomix, Inc. filed Critical Metatomix, Inc.
Priority to EP04809476A priority Critical patent/EP1690210A2/en
Priority to US11/064,438 priority patent/US7890517B2/en
Publication of WO2005029365A2 publication Critical patent/WO2005029365A2/en
Publication of WO2005029365A3 publication Critical patent/WO2005029365A3/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu

Definitions

  • the invention pertains to surveillance, monitoring and real-time event processing. It has application in public health & bioterrorism, border and port security, public and community safety, and government data integration, to name a few.
  • Today, national, state, and local governments are challenged to achieve unprecedented levels of cooperation in and among agencies and organizations charged with protecting the safety of communities.
  • Many of these organizations use either proprietary or incompatible technology infrastructures that need to be integrated in order to provide real-time, critical information for effective event monitoring and coordinated emergency response. Information must be shared instantaneously and among numerous entities to effectively identify and respond to a potential threat or emergency-related event.
  • the Centers for Disease Control and Prevention (CDC) of the U.S. Department of Health and Human Services has launched several initiatives toward forming nationwide networks of shared health-related information that, when fully implemented, will facilitate the rapid identification of, and response to, health and bioterrorism threats.
  • the CDC plans the Health Alert Network (HAN), for example, to provide infrastructure supporting the distribution of health alerts, disease surveillance, and laboratory reporting.
  • the Public Health Information Network (PHIN) is another CDC initiative that will provide detailed specifications for the acquisition, management, analysis and dissemination of health-related information, building upon the HAN and other CDC initiatives, such as the National Electronic Disease Surveillance System (NEDSS).
  • An object of this invention is to provide improved methods and apparatus for surveillance, monitoring and real-time events processing.
  • a related object is to provide such methods and apparatus as can be applied in public health and bioterrorism, e.g., to facilitate CDC initiatives in this area.
  • a further object of the invention is to provide such methods and apparatus as can be applied to border and port security, public and community safety, and government data integration.
  • a still further object of the invention is to provide such methods and apparatus as can be implemented inexpensively, incrementally or otherwise without interruption of IT functions that they bring together.
  • systems and methods described herein provide a surveillance, monitoring and real-time events platform to (i) enable the integration and communication of information between government agencies and organizations specifically tasked with ensuring the security and safety of our nation and its communities, (ii) to integrate information systems from federal, state and/or local agencies (from disparate data sources if necessary) in order to obtain a single, real-time view of the entire organization, and (iii) to extract more complete, actionable information from their existing systems, thereby dramatically improving decision making speed and accuracy.
  • the platform has application in a variety of areas, including, public health & bioterrorism, border and port security, public and community safety, and government data integration, to name a few.
  • Effective and timely surveillance and monitoring of health-related events is essential for early detection and management of public health threats, whether a naturally occurring disease, such as West Nile Virus, or a biological or chemical attack.
  • State and local public health officials must have the ability to identify the specific nature and scope of an event and launch a tightly coordinated response, all in real-time.
  • the surveillance, monitoring and real-time events platform is adapted for use, e.g., as a local, state or federal node, in a network conforming to the Public Health Information Network (PHIN) initiative of the Centers for Disease Control and Prevention (CDC) of the U.S. Department of Health and Human Services, or as an infrastructure element of that network.
  • Systems and methods according to this aspect of the invention are designed for multiple purposes. They function as a real-time surveillance system, a bioterrorism detection and response system and a collaborative network for distance learning and communication.
  • Border and port security represent complex security challenges. These sites represent vulnerable points of entry and require monitoring of ocean vessel arrivals and departures, assessing potentially hazardous cargo, responding to immigration challenges, terrorist threats and managing the proximity risk to civilians and land-based targets such as nuclear facilities, dams, power plants, gas lines, and other biological and chemical facilities. Due to the complex and porous nature of borders and ports, many distinct organizations are required to work in close cooperation and effectively share critical information.
  • the surveillance, monitoring and real-time events platform is adapted for border and port security applications, providing:
  • GIS geo-spatial mapping
  • the surveillance, monitoring and real-time events platform can be deployed in applications designed to identify community threats or security breaches in a wide range of settings including inter-agency solutions for superior security surveillance and response. This platform provides:
  • GIS geo-spatial mapping
  • the surveillance, monitoring and real-time events platform provides a single point of access to all state security-related IT systems (Justice Dept, Law Enforcement, Dept of Health) to expedite identifying potential threats.
  • the platform can also provide information visibility across an organization's systems. The platform:
  • Figure 1 depicts a surveillance, monitoring and real-time events system 100 according to the invention suitable for the adaptation to a public health & bioterrorism application, e.g., as part of PHIN, HAN or NEDSS-compatible networks;
  • Figure 2A depicts an architecture for a hologram data store used in the system of Figure 1;
  • Figure 2B depicts the tables in a model store and a triples store of the hologram data store of Figure 2A;
  • Figure 3 depicts an expert engine to identify information in the data store or from the other information in the system of Figure 1;
  • Figures 4-16 depict a visual display used in the system of Figure 1 to call alerts and other information to the attention of the user.
  • FIG. 1 depicts a surveillance, monitoring and real-time events system 100 according to the invention suitable for the adaptation to a public health & bioterrorism application, e.g., as part of PHIN, HAN or NEDSS networks.
  • Illustrated system 100 represents a data processing station (or stations) resident at a node in such a network, such as, for example, a clinical care provider, a laboratory, a local or state health department, the CDC headquarters, a local or national law enforcement office, or otherwise.
  • Though the illustrated system is used in a public health & bioterrorism application, it will be appreciated that a similar such system can be applied in border & port security, public & community safety, and government data integration applications, described above, among others.
  • Illustrated system 100, which can be embodied in conventional digital data processing apparatus (including attendant processor(s), display units, storage units, and communications devices) of the type conventional in the art, comprises connectors 108 that provide software interfaces to legacy and other databases, data streams, and sources of information (collectively, databases 140) in clinical care facilities or other entities (such as agency field offices or laboratories), organizations (such as governmental agencies) or enterprises, such as the PHIN network, the HAN network or otherwise.
  • a "hologram" data store 114 (hereinafter, "data store" or "hologram data store"), which is coupled to the databases 140 via the connectors 108, stores data from those databases 140.
  • a framework server 116 accesses the data store 114, presenting selected data to (and permitting queries from) a user browser 118.
  • the server 116 can also permit updates to data in the data store 114 and, thereby, in the databases 140. These updates can include both the addition of new data and the modification of old data.
  • databases 140 include a database 140a maintained with a Sybase® database management system and a database 140b maintained with an Oracle® database management system.
  • the "databases" 140 also include a data stream 140c providing information from other nodes 100b, 100c, 100d, 100e, of the PHIN, HAN, NEDSS or other network 120.
  • Those other nodes can be constructed and operated in the manner of system 100 (as suggested in the illustration by their depiction using like silhouettes) or in any other manner consistent with PHIN, HAN, NEDSS or other network operations.
  • the network 120 represents the Internet, wide area network or other medium or collection of media that permit the transfer of information (continuous, periodic or otherwise) between the nodes in a manner consistent with requirements of PHIN, HAN, NEDSS or other applicable network standards.
  • Features of illustrated databases 140 are that they provide access to information of actual or potential interest to the node in which system 100 resides and that they can be accessed via application program interfaces (API) or other mechanisms dictated by the PHIN, HAN, NEDSS or other applicable network.
  • Connectors 108 serve as interfaces to databases, streams and other information sources 140. Each connector applies requests to, and receives information from, a respective database, using that database's API or other interface mechanism, e.g., as dictated by the PHIN, HAN or otherwise. Thus, for example, connector 108a applies requests to database 140a using the corresponding Sybase API; connector 108b applies requests to database 140b using the Oracle API; and connector 108c applies requests to and/or receives information from the stream or information source 140c using PHIN-appropriate, HAN-appropriate, NEDSS-appropriate or other stream or network-appropriate requests. Thus, by way of non-limiting example, the connector 108c can generate requests to the network 120 to obtain data from health care institutions and other nodes on the network.
  • the requests can be simple queries, such as SQL queries and the like (e.g., depending on the type of the underlying database and its API) or more complex sets of queries, such as those commonly used in data mining.
  • one or more of the connectors can use decision trees, statistical techniques or other query and analysis mechanisms known in the art of data mining to extract information from the databases.
  • Specific queries and analysis methodologies can be specified by the hologram data store 114 or the framework server 116 for application by the connectors.
  • the connectors themselves can construct specific queries and methodologies from more general queries received from the data store 114 or server 116. For example, request-specific items can be "plugged" into query templates thereby effecting greater speed and efficiency.
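  • By way of illustration only, the "plugging" of request-specific items into a query template might be sketched as follows in Python. The table name, column names and the `run_template` helper are hypothetical and do not come from the specification; SQLite stands in for whatever database a connector actually addresses.

```python
import sqlite3

# Hypothetical query template held by a connector. The table and column
# names are illustrative assumptions, not taken from the specification.
CASE_QUERY_TEMPLATE = (
    "SELECT patient_id, diagnosis, reported_on "
    "FROM case_reports "
    "WHERE diagnosis = :diagnosis AND reported_on >= :since"
)


def run_template(conn, template, **items):
    """Bind request-specific items to a stored template and run it."""
    return conn.execute(template, items).fetchall()


def demo():
    """Build a small in-memory database and apply the template to it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE case_reports "
        "(patient_id TEXT, diagnosis TEXT, reported_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO case_reports VALUES (?, ?, ?)",
        [
            ("p1", "West Nile Virus", "2004-06-01"),
            ("p2", "Influenza", "2004-06-03"),
            ("p3", "West Nile Virus", "2004-07-10"),
        ],
    )
    return run_template(
        conn,
        CASE_QUERY_TEMPLATE,
        diagnosis="West Nile Virus",
        since="2004-07-01",
    )
```

  • Binding the items as named parameters, rather than concatenating strings, lets the same stored template be reapplied with different values and avoids quoting errors.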
  • the requests can be stored in the connectors 108 for application and/or reapplication to the respective databases 140 to provide one-time or periodic data store updates.
  • Connectors can use expiration date information to determine which of a plurality of similar data to return to the data store, or if dates are absent, the connectors can mark returned data as being of lower confidence levels.
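  • One possible reading of this selection step is sketched below in Python. The `Record` type, its field names and the tie-breaking policy (prefer the unexpired record with the latest expiry, otherwise return an undated record marked as low confidence) are assumptions made for illustration; the specification does not prescribe them.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical shape of a datum returned by a connector."""
    key: str
    value: str
    expires: Optional[date] = None
    confidence: str = "normal"


def select_record(candidates, today):
    """Pick one record from a group of similar candidates.

    Assumed policy: prefer the unexpired record with the latest expiry
    date; if no such record exists, fall back to an undated record and
    mark it as lower confidence; otherwise return None.
    """
    dated = [r for r in candidates if r.expires is not None and r.expires >= today]
    if dated:
        return max(dated, key=lambda r: r.expires)
    undated = [r for r in candidates if r.expires is None]
    if undated:
        return replace(undated[0], confidence="low")
    return None
```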
  • the connector 108c (and/or other functionality not shown) provides for the automated exchange of data between public health partners, as required of nodes in the PHIN network.
  • the connector 108c (and/or other functionality) comprises an ebXML compliant SOAP web service that can be reached via an HTTPS connection after appropriate authentication and comprises,
  • the connector 108c also provides for translation of messages received from the network 120 into a format compatible with the NEDSS and/or other requisite data models specified by the PHIN standards for storage in the data store 114 as detailed further below. And, the connector 108c (or other functionality) facilitates the exchange and management of specimen and lab result information, as required under the PHIN standard.
  • Systems 100 according to the invention used as part of HAN or NEDSS-compatible networks provide similar functionality, as particularly required under those initiatives.
  • Data and other information generated by the databases, streams and other information sources 140 in response to the requests are routed by connectors to the hologram data store 114. That other information can include, for example, expiry or other adjectival data for use by the data store in caching, purging, updating and selecting data.
  • the messages can be cached by the connectors 108, though, they are preferably immediately routed to the store 114.
  • Any triples implicated by the change are created or changed in store 114C, as are the corresponding RDF document objects in store 114A.
  • An indication of these changes can be forwarded to the respective databases, streams or other information sources 140 via the connectors 108, which utilize the corresponding API (or other interface mechanisms) to alert those sources 140 of updates.
  • changes made directly to the store 114C e.g., using a WebDAV client or otherwise, can be forwarded by the connector 108 to the respective sources 140.
  • the hologram data store 114 stores data from the databases 140 (and from the framework server 116, as discussed below) as RDF triples.
  • the data store 114 can be embodied on any digital data processing system or systems that are in communications coupling (e.g., as defined above) with the connectors 108 and the framework server 116.
  • the data store 114 is embodied in a workstation or other high-end computing device with high capacity storage devices or arrays, though, this may not be required for any given implementation.
  • Though the hologram data store 114 may be contained on an optical storage device, this is not the sense in which the term "hologram" is used. Rather, it refers to its storage of data from multiple sources (e.g., the databases 140) in a form which permits that data to be queried and coalesced from a variety of perspectives, depending on the needs of the user and the capabilities of the framework server 116.
  • a preferred data store 114 stores the data from the databases 140 in subject-predicate-object form, e.g., RDF triples, though those of ordinary skill in the art will appreciate that other forms may be used as well, or instead.
  • RDF is a way of expressing the properties of items of data. Those items are referred to as subjects. Their properties are referred to as predicates. And, the values of those properties are referred to as objects.
  • an expression of a property of an item is referred to as a triple, a convenience reflecting that the expression contains three parts: subject, predicate and object.
  • RDF triples here, expressed in extensible markup language (XML) syntax.
  • the listing shows only a sampling of the triples in a data store 114, which typically would contain tens of thousands or more of such triples.
  • the second line of the listing defines a subject as a resource named "postal://zip#02886." That subject has predicates and objects that follow the subject declaration.
  • One predicate, <town>, is associated with a value "Warwick".
  • Another predicate, <state>, is associated with a value "RI".
  • the listing shows properties for the subject "postal://zip#02901," namely, <town> "Providence," <state> "RI," <country> "US" and <zip> "02901."
  • URIs Uniform Resource Identifiers
  • <scheme> is "postal"
  • <path> is "zip"
  • <fragment> is, for example, "02886" and "02901."
  • predicates are expressed in the form <scheme>://<path>#<fragment>, as is evident to those of ordinary skill in the art.
  • the <scheme> for the predicates is "http" and <path> is "www.metatomix.com/postalCode/1.0."
  • the <fragment> portions are <town>, <state>, <country> and <zip>, respectively. It is important to note that the listing is in some ways simplistic in that each of its objects is a literal value. Commonly, an object may itself be another subject, with its own objects and predicates. In such cases, a resource can be both a subject and an object, e.g., an object to all "upstream" resources and a subject to all "downstream" resources and properties. Such "branching" allows for complex relationships to be modeled within the RDF triple framework.
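  • The subject, predicate and object values described in the preceding paragraphs can be restated as plain Python tuples, as in the sketch below. This is not the RDF/XML listing from the specification; the `Triple` type and the `split_uri` and `properties_of` helpers are illustrative assumptions that simply decompose the <scheme>://<path>#<fragment> form.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Namespace shared by the predicates in the postal code example.
NS = "http://www.metatomix.com/postalCode/1.0#"

# Triples restated from the values given in the text.
TRIPLES = [
    Triple("postal://zip#02886", NS + "town", "Warwick"),
    Triple("postal://zip#02886", NS + "state", "RI"),
    Triple("postal://zip#02901", NS + "town", "Providence"),
    Triple("postal://zip#02901", NS + "state", "RI"),
    Triple("postal://zip#02901", NS + "country", "US"),
    Triple("postal://zip#02901", NS + "zip", "02901"),
]


def split_uri(uri):
    """Split <scheme>://<path>#<fragment> into its three parts."""
    scheme, rest = uri.split("://", 1)
    path, fragment = rest.split("#", 1)
    return scheme, path, fragment


def properties_of(subject, triples):
    """Collect a subject's properties, keyed by predicate fragment."""
    return {
        split_uri(t.predicate)[2]: t.obj
        for t in triples
        if t.subject == subject
    }
```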
  • Figure 2A depicts an architecture for a preferred hologram data store 114 according to the invention.
  • the illustrated store 114 includes a model document store 114A and a model document manager 114B. It also includes a relational triples store 114C, a relational triples store manager 114D, and a parser 114E interconnected as shown in the drawing.
  • RDF triples maintained by the store 114 are received (from the databases 140, via connectors 108, and/or from time-based data reduction module 150, described below) in the form of document objects, e.g., of the type generated from a Document Object Model (DOM) in a JAVA, C++ or other application.
  • these are stored in the model document store 114A as such (i.e., document objects) particularly, using the tables and inter-table relationships shown in Figure 2B (see dashed box labelled 114B).
  • the model document manager 114B manages storage/retrieval of the document object to/from the model document store 114A.
  • WebDAV protocol allows for adding, updating and deleting RDF document objects using a variety of WebDAV client tools (e.g., Microsoft Windows Explorer, Microsoft Office, XML Spy or other such tools available from a variety of vendors), in addition to adding, updating and deleting document objects via connectors 108 and/or time-based data reduction module 150.
  • This also allows for presenting the user with a view of a traversable file system, with RDF documents that can be opened directly in XML editing tools or from Java programs supporting WebDAV protocols, or from processes on remote machines via any HTTP protocol on which WebDAV is based.
  • RDF triples received by the store 114 are also stored to a relational database, here, store 114C, that is managed and accessed by a conventional relational database management system (RDBMS) 114D, operating in accord with the teachings hereof.
  • the triples are divided into their constituent components (subject, predicate, and object), which are indexed and stored to respective tables in the manner of a "hashed with origin" approach.
  • a parser 114E extracts its triples and conveys them to the RDBMS 114D with a corresponding indicator that they are to be added, updated or deleted from the relational database.
  • Such a parser 114E operates in the conventional manner known in the art for extracting triples from RDF documents.
  • the illustrated database store 114C has five tables interrelated as particularly shown in Figure 2B (see dashed box labelled 114C). In general, these tables rely on indexes generated by hashing the triples' respective subjects, predicates and objects using a 64-bit hashing algorithm based on cyclical redundancy codes (CRCs), though it will be appreciated that the indexes can be generated by other techniques as well, industry-standard, proprietary or otherwise.
  • the "triples" table 534 maintains one record for each stored triple.
  • Each record contains an aforementioned hash code for each of the subject, predicate and object that make up the respective triple, along with a resource flag ("resource_flg") indicating whether that object is of the resource or literal type.
  • Each record also includes an aforementioned hash code ("m_hash").
  • the values of the subjects, predicates and objects are not stored in the triples table. Rather, those values are stored in the resources table 530, namespaces table 532 and literals table 536. Particularly, the resources table 530, in conjunction with the namespaces table 532, stores the subjects, predicates and resource-type objects; whereas, the literals table 536 stores the literal-type objects.
  • the resources table 530 maintains one record for each unique subject, predicate or resource-type object. Each record contains the value of the resource, along with its aforementioned 64-bit hash. It is the latter on which the table is indexed. To conserve space, portions of those values common to multiple resources (e.g., common <scheme>://<path> identifiers) are stored in the namespaces table 532. Accordingly the field, "r_value," contained in each record of the resources table 530 reflects only the unique portion (e.g., <fragment> identifier) of each resource.
  • the namespaces table 532 maintains one record for each unique common portion referred to in the prior paragraph (hereinafter, "namespace"). Each record contains the value of that namespace, along with its aforementioned 64-bit hash. As above, it is the latter on which this table is indexed.
  • the literals table 536 maintains one record for each unique literal-type object. Each record contains the value of the object, along with its aforementioned 64-bit hash. Each record also includes an indicator of the type of that literal (e.g., integer, string, and so forth). Again, it is the hash on which this table is indexed.
  • the models table 538 maintains one record for each RDF document object contained in the model document store 114A.
  • Each record contains the URI of the corresponding document object ("uri_string”), along with its aforementioned 64-bit hash ("m_hash"). It is the latter on which this table is indexed.
  • each record of the models table 538 also contains the ID of the corresponding document object in the store 114A. That ID can be assigned by the model document manager 114B, or otherwise.
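  • The five-table arrangement described above might be approximated as in the following Python and SQLite sketch. Only the names `r_value`, `resource_flg`, `m_hash` and `uri_string` come from the text; every other column name, the choice of the ECMA-182 CRC-64 polynomial, the hex-string encoding of hashes and the helper functions are assumptions made so the example is self-contained. Figure 2B, not this sketch, defines the actual relationships.

```python
import sqlite3

CRC64_POLY = 0x42F0E1EBA9EA3693  # ECMA-182 polynomial; an assumed choice
MASK64 = 0xFFFFFFFFFFFFFFFF


def crc64(text):
    """Bitwise, MSB-first CRC-64 of a string, as 16 hex digits."""
    crc = 0
    for byte in text.encode("utf-8"):
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ CRC64_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return f"{crc:016x}"


# Hashes are stored as hex text to sidestep SQLite's signed 64-bit integers.
SCHEMA = """
CREATE TABLE namespaces (ns_hash TEXT PRIMARY KEY, ns_value TEXT);
CREATE TABLE resources  (r_hash TEXT PRIMARY KEY, ns_hash TEXT, r_value TEXT);
CREATE TABLE literals   (l_hash TEXT PRIMARY KEY, l_value TEXT, l_type TEXT);
CREATE TABLE models     (m_hash TEXT PRIMARY KEY, uri_string TEXT);
CREATE TABLE triples    (subject TEXT, predicate TEXT, object TEXT,
                         resource_flg INTEGER, m_hash TEXT);
"""


def add_resource(conn, uri):
    """Store a resource as shared namespace plus unique fragment."""
    namespace, fragment = uri.rsplit("#", 1)
    conn.execute("INSERT OR IGNORE INTO namespaces VALUES (?, ?)",
                 (crc64(namespace), namespace))
    conn.execute("INSERT OR IGNORE INTO resources VALUES (?, ?, ?)",
                 (crc64(uri), crc64(namespace), fragment))
    return crc64(uri)


def add_literal_triple(conn, subject, predicate, value, model_uri):
    """Store one triple whose object is a string literal."""
    conn.execute("INSERT OR IGNORE INTO models VALUES (?, ?)",
                 (crc64(model_uri), model_uri))
    conn.execute("INSERT OR IGNORE INTO literals VALUES (?, ?, ?)",
                 (crc64(value), value, "string"))
    conn.execute("INSERT INTO triples VALUES (?, ?, ?, 0, ?)",
                 (add_resource(conn, subject), add_resource(conn, predicate),
                  crc64(value), crc64(model_uri)))


def find_objects(conn, subject, predicate):
    """Look up literal objects for a subject and predicate by hash."""
    rows = conn.execute(
        "SELECT l.l_value FROM triples t "
        "JOIN literals l ON l.l_hash = t.object "
        "WHERE t.subject = ? AND t.predicate = ? AND t.resource_flg = 0",
        (crc64(subject), crc64(predicate)),
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Load the two 02886 triples from the postal code example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    ns = "http://www.metatomix.com/postalCode/1.0#"
    model = "models/postal.rdf"  # hypothetical document URI
    add_literal_triple(conn, "postal://zip#02886", ns + "town", "Warwick", model)
    add_literal_triple(conn, "postal://zip#02886", ns + "state", "RI", model)
    return conn
```

  • Because the triples table holds only hashes, a lookup hashes the query's subject and predicate once and then joins to the literals table to recover the stored value.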
  • relational triples store 114C is a schema-less structure for storing RDF triples.
  • triples maintained in that store can be reconstituted via an
  • RDF documents and, more generally, objects maintained in the store 114 can be contained in other stores (structured relationally, hierarchically or otherwise) in addition to or instead of stores 114A and 114C.
  • the maintenance of data in the store 114 is accomplished in a manner compatible with the applicable PHIN standards, e.g., for the use of electronic clinical data for event detection.
  • data storage is compatible with the applicable logical data model(s), can associate incoming data with appropriate existing data (e.g., a report of a disease in a person who had another condition previously reported), permits potential cases to be "linked" and traceable from detection via electronic sources of clinical data or manual entry of potential case data through confirmation via laboratory result reporting, and permits data to be accessed for report-
  • a system 100 according to the invention used as part of the PHIN network provides directories of public health and clinical personnel accessible as required under the PHIN standards.
  • Systems 100 according to the invention used as part of HAN or NEDSS-compatible networks provide similar functionality, as particularly required under those initiatives.
  • the relational triples store manager 114D supports SQL queries such as the one exemplified above (for extracting a triple with the subject "postal://zip#02886", the predicate "http://www.metatomix.com/postalCode/1.0#town", and the object
  • the data store 114 can likewise include a time-wise data reduction component of the type described in commonly assigned United States Patent Application Serial No. 10/302,727, filed November 21, 2002, entitled METHODS AND APPARATUS FOR STATISTICAL DATA ANALYSIS AND REDUCTION FOR AN ENTERPRISE APPLICATION, now published as PCT WO 03046769 (Application WO2002US0037727), the teachings of which are incorporated herein by reference (see, specifically, for example, Figure 3 thereof and the accompanying text), a copy of which may be attached as an appendix hereto (and, if so, as Appendix B), to perform a time-wise reduction on data from the database, streams or other sources 140.
  • data store 114 includes a graph generator that uses RDF triples to generate directed graphs in response to queries (made, e.g., by a user accessing the store via the browser 118 and server 116, by a surveillance, monitoring and real-time events application executing on the server 116 or in connection with the browser 118, by another node on the network 120 and received electronically or otherwise, or made otherwise) for information reflected by triples originating from data in one or more of the databases, streams or other sources 140.
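  • As a rough illustration of how triples map onto a directed graph, the Python sketch below treats each triple as an edge from subject to object, labelled by the predicate, and then walks the edges from a starting resource. It is a conventional adjacency-list construction, not the referenced co-pending method, and the function names are hypothetical.

```python
from collections import defaultdict


def build_graph(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node reachable from start by following edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        # graph.get avoids creating empty entries for leaf nodes.
        for _predicate, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen
```

  • An object that is itself a subject of other triples becomes an interior node of the graph, which is how the "branching" of resources noted earlier shows up in a traversal.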
  • Such generation of directed graphs from triples can be accomplished in any conventional manner known in the art (e.g., as appropriate to RDF triples or other manner in which the information is stored) or, preferably, in the manner described in co-pending, commonly assigned United States Patent Application Serial No. 10/138,725, filed May 3,
  • the data store 114 can utilize genetic, self-adapting, algorithms to traverse the RDF triples in response to such queries.
  • the data store utilizes a genetic algorithm that performs several searches, each utilizing a different methodology but all based on the underlying query from the framework server, against the RDF triples. It compares the results of the searches quantitatively to discern which produce(s) the best results and reapplies that search with additional terms or further granularity.
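  • The compare-and-reapply loop just described can be sketched in Python as follows. This shows only that loop; it is not a full genetic algorithm (there is no mutation or crossover), and the two search strategies and the scoring function are hypothetical stand-ins for whatever methodologies and quantitative measure an implementation would use.

```python
def exact_match(triples, terms):
    """Keep triples in which every term equals some part exactly."""
    return [t for t in triples if all(term in t for term in terms)]


def substring_match(triples, terms):
    """Keep triples in which every term occurs, ignoring case, in some part."""
    return [
        t for t in triples
        if all(any(term.lower() in part.lower() for part in t) for term in terms)
    ]


def score(results):
    """Assumed measure: empty results score zero, tighter sets score higher."""
    return 1.0 / len(results) if results else 0.0


def adaptive_search(triples, terms, extra_terms, strategies):
    """Run every strategy, pick the best scoring, reapply it with more terms."""
    best = max(strategies, key=lambda s: score(s(triples, terms)))
    return best.__name__, best(triples, terms + extra_terms)
```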
  • surveillance, monitoring and real-time events applications executing on the connectors 108, the server 116, the browser and/or the data store 114 utilize an expert engine-based system 8 of the type shown in Figure 3 to identify information in the data store 114 and/or from sources 140 responsive to queries and/or otherwise for presentation via browser 118, e.g., in the form of alerts, reports, or otherwise.
  • the information so identified can, instead or in addition, form the basis of further processing, e.g., by such surveillance, monitoring and real-time events applications, in the form of broadcasts or messages to other nodes in the network 120, or otherwise, consistent with requirements of PHIN, HAN or other applicable standards.
  • the system 8 can be used to process data incoming from the sources 140 to determine whether it should be ignored, stored, logged for alert or classified otherwise.
  • Data reaching a certain classification limit can be displayed via the browser 118 and, more particularly, the dashboard discussed below, e.g., along with a map of the state, country or other relevant geographic region and/or along with other similar data.
  • the expert engine-based system 8 can be used to detect the numbers of instances occurring over time and, if the number exceeds a threshold, to generate a report, e.g., for display via a dashboard window,
  • the expert engine can also be used to subset data used for display or reporting in connection with the collaborative function, e.g., specified under the CDC's HAN guidelines.
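  • The detection of instances occurring over time and exceeding a threshold, mentioned above, might look like the following sliding-window counter in Python. The `ThresholdMonitor` class, the window length and the assumption that events arrive in time order are illustrative; the specification leaves these details open.

```python
from collections import deque
from datetime import datetime, timedelta


class ThresholdMonitor:
    """Count instances inside a sliding time window (hypothetical helper)."""

    def __init__(self, window, threshold):
        self.window = window        # length of the look-back period
        self.threshold = threshold  # count that must be exceeded
        self.events = deque()       # timestamps, assumed to arrive in order

    def record(self, when):
        """Record one instance; return True if a report should be generated."""
        self.events.append(when)
        cutoff = when - self.window
        # Drop instances that have fallen out of the window.
        while self.events and self.events[0] < cutoff:
            self.events.popleft()
        return len(self.events) > self.threshold
```

  • A caller would feed each incoming case report's timestamp to `record` and raise a dashboard report whenever it returns True.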
  • the system 8 includes a module 12 that executes a set of rules 18 with respect to a set of facts 16 representing criteria in order to (i) generate a subset 20 of a set of facts 10 representing an input data set, (ii) trigger a further rule, and/or (iii) generate an alert, broadcast, message, or otherwise.
  • the set of facts 16 representing criteria is referred to as "criteria" or "criteria 16," and the set of facts 10 representing the input data set is referred to as "data" or "data 10."
  • Illustrated module 12 is an executable program (compiled, interpreted or otherwise) embodying the rules 18 and operating in the manner described herein for identifying subsets of directed graphs.
  • module 12 is implemented in Jess (Java Expert System Shell), a rule-based expert system shell, commercially available from Sandia National Laboratories. However, it can be implemented using any other "expert system" engine, if-then-else network, or other software, firmware and/or hardware environment (whether or not expert system-based) suitable for adaptation in accord with the teachings hereof.
  • the module 12 embodies the rules 18 in a network representation 14, e.g., an if-then-else network, or the like, native to the Jess environment.
  • the network nodes are preferably executed so as to effect substantially parallel operation of the rules 18, though they can be executed so as to effect serial and/or iterative operation as well or in addition.
  • the rules are represented in accord with the specifics of the corresponding engine, if-then-else network, or other software, firmware and/or hardware environment on which the embodiment is implemented. These likewise preferably effect parallel execution of the rules 18, though they may effect serial or iterative execution instead or in addition.
  • the data set 10 can comprise any directed graph, e.g., a collection of nodes representing data and directed arcs connecting nodes to one another, though in the illustrated embodiment it comprises RDF triples contained in the data store 114 and/or generated from information received from the sources 140 via connectors 108.
  • the data set can comprise data structures representing a meta directed graph of the type disclosed in co-pending, commonly assigned United States Patent Application Serial No.
  • Criteria 16 contains expressions including, for example, literals, wildcards, Boolean operators and so forth, against which nodes in the data set are tested.
  • the criteria can specify subject, predicate and/or object values or other attributes.
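  • By way of non-limiting example, testing triples against criteria that combine literals and wildcards might be sketched as follows. The sketch uses glob-style patterns from the Python standard library and combines criteria with a Boolean OR; the pattern syntax, the names and the sample data are assumptions, and this is not the Jess rule set referred to herein.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def matches(triple: Triple, criterion: Triple) -> bool:
    """A criterion is a Triple of glob patterns; '*' matches any value."""
    return all(fnmatchcase(value, pattern)
               for value, pattern in zip(triple, criterion))


def select(data, criteria):
    """Return the subset of data matching ANY criterion (Boolean OR)."""
    return [t for t in data if any(matches(t, c) for c in criteria)]


DATA = [
    Triple("case:1", "diagnosis", "influenza"),
    Triple("case:2", "diagnosis", "pneumonia"),
    Triple("case:1", "zip", "02886"),
]
CRITERIA = [
    Triple("*", "diagnosis", "influenza*"),  # any subject diagnosed with influenza
    Triple("case:2", "*", "*"),              # everything known about case:2
]
subset = select(DATA, CRITERIA)
```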
  • the criteria can be input by a user, e.g., via browser 118, e.g., on an ad hoc basis. Alternatively or in addition, they can be generated by surveillance, monitoring and real-time events applications executing on the connectors 108, the server 116, the browser and/or the data store 114.
  • Rules 18 define the tests for identifying data in the data set 20 that match the criteria or, where applicable, are related thereto. These are expressed in terms of the types and values of the data items as well as their interrelationships or connectedness.
  • a set of rules applicable to a data set comprised of RDF triples for identifying triples that match or are related to the criteria is disclosed in aforementioned incorporated-by-reference United States Patent Application Serial No. 60/416,616, filed October 7, 2002, entitled METHODS AND APPARATUS FOR IDENTIFYING RELATED NODES IN A DIRECTED GRAPH HAVING NAMED ARCS (see Appendix C hereof).
  • the data 20 output or otherwise generated by module 12 represents those triples matching (or, where applicable, related to) the criteria as determined by exercise of the rules.
  • the data 20 can be output as triples or some alternate form, e.g., pointers or other references to identified data within the data set 10, depending on the needs of the surveillance, monitoring and real-time events application that invoked the system 8.
  • instead of or in addition to outputting data 20, the module 12 triggers execution of further rules, generates alerts, broadcasts, messages, or otherwise, consistent with requirements of PHIN, HAN or other applicable standards.
  • the framework server 116 presents information from the data store 114 and/or sources 140 via browser 118. This can be based on requests entered directly by the user, e.g., in response to selections/responses to questions, dialog boxes or other user-input controls generated by a surveillance, monitoring and real-time events application executing on the server 116 or in connection with the browser 118. It can also be based, for example, on information obtained from the database 114 and/or sources 140 by the expert engine-based system 8 described above.
  • a surveillance, monitoring and real-time events application includes a "dashboard" with display windows or panels that provide comprehensive real-time displays of information gathered from the data store 114 or other sources 140, as well as “alerts” resulting from anomalous situations detected by the surveillance, monitoring and real-time events application.
  • the dashboard and alerts can be generated by an application executing on the server 116 and/or the browser 118 or otherwise.
  • monitoring and real-time events dashboards can display information and alerts that are specific to predefined categories, such as border and port security, health and bioterrorism, or public and community safety. These can be configured by users to display information from ad hoc combinations of data sources and user-defined alerts. For the purpose of describing the structure and operation of the surveillance, monitoring and real-time events dashboards, reference will be made to two representative examples (border/port security and health/bioterrorism), although these descriptions apply to other predefined and user-defined categories of information.
  • FIG. 4 illustrates a border/port security dashboard 400.
  • the dashboard displays several panels 402, 404, 406, 408, 410, 412 and 414.
  • Panel 402 can be used to display information relating to an alert, if one has been issued by the surveillance, monitoring and real-time events application or by an external system. Panel 402 is described in more detail below.
  • Each panel 404-414 displays information from a particular data source or an aggregation of data from several data sources.
  • panel 404 can contain real-time radar data from the US Coast Guard superimposed on a satellite image of Boston's inner harbor.
  • the panel 404 display can be augmented with other Coast Guard data.
  • global positioning system (GPS) data from US Coast Guard vessels and vehicles can be used to identify and then look up information related to these units.
  • the unit identities can be superimposed on the image displayed in panel 404, as shown at 416, 418 and 420. Double-clicking on one of these units can cause the surveillance, monitoring and real-time events application to display information about the unit.
  • This information can include, for example, contact information (e.g. frequency, call sign, name of person in charge, etc.), capabilities (e.g. maximum speed, crew size, weaponry, fire-fighting equipment, etc.) and status (e.g. docked, patrolling, busy intercepting a vessel, etc.).
  • Panel 406 can contain real-time data from a port authority superimposed on a map of the inner harbor.
  • port authority data can include information related to the inner harbor that is different than information provided by the US Coast Guard.
  • the port authority data can include information on vessels traveling or docked within the inner harbor.
  • the port authority data can relate to more than just the inner harbor.
  • the port authority data can include information related to an airport and a rail yard.
  • Panels 410 and 412 can display information from other data sources, such as US Customs and local or state police.
  • Panel 408 displays a current Homeland Security Advisory System threat level.
  • Panel 414 displays contact information for agencies, such as the US Coast Guard, US Customs, port authority and state police, that might be invoked in case of an alert.
  • a user can double-click on any panel to display a separate window containing the panel. By this mechanism, the user can enlarge any panel.
  • the user can zoom in on a portion of the image displayed by a panel. For example, the user can select a point on the panel display to re-center the display to the selected point and zoom in on that point.
  • the user can select a rectangular portion of the panel display using a "rubber band" cursor and instruct the system to fill the entire panel with the selected portion.
  • Figure 5 illustrates an example of such a window 500 displaying the port authority panel 406 of Figure 4.
  • a user can, for example, double-click on a vessel 502 to display information about the vessel.
  • Figure 6 illustrates an example of a pop-up window 600 that displays information about the selected vessel.
  • although panels 402-414 contain graphical displays, other panels can contain textual or numeric data.
  • panels containing shipping schedules, airline schedules, port volume statistics, recent headlines, weather forecasts, etc. can be available for display.
  • other graphical panels such as current meteorological data for various portions of the world, can also be available.
  • the surveillance, monitoring and real-time events application can make available more panels than can be displayed at one time on the dashboard 400 (Figure 4).
  • the dashboard 400 can display a default set of panels, such as panels 404-414.
  • the user can select which panels to display in the dashboard 400, as well as arrange the panels within the dashboard and control the size of each panel. If it is deemed desirable to display more panels than can be displayed at one time, some or all of the desired panels can be displayed on a round-robin basis.
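  • A minimal sketch of round-robin panel display follows, assuming the dashboard has a fixed number of display slots. The function name and the slot count are hypothetical; the panel numbers are used only as labels.

```python
from itertools import cycle, islice


def round_robin_pages(panels, slots):
    """Yield successive groups of panels to show, cycling through all of them."""
    if slots <= 0 or not panels:
        raise ValueError("need at least one panel and one display slot")
    slots = min(slots, len(panels))  # never show the same panel twice at once
    source = cycle(panels)
    while True:
        yield list(islice(source, slots))


# Five desired panels, but room for only two at a time.
pages = list(islice(round_robin_pages([404, 406, 408, 410, 412], 2), 3))
```

  • Each yielded group would be displayed for some interval before the next is requested; the timing is left to the caller.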
  • the surveillance, monitoring and real-time events application can include rules and/or heuristics to automatically detect anomalies and alert users to these anomalies.
  • the surveillance, monitoring and real-time events application preferably can select one or more panels containing particularly relevant information and display or enlarge those panels.
  • the selected panels need not be ones that the user could select.
  • the surveillance, monitoring and real-time events application can create a new panel that includes a combination of data from several sources, the sources being selected by rule(s) that caused the alert to be issued.
  • the surveillance, monitoring and real-time events application can include rules describing permitted, required and/or prohibited behavior of vessels in these shipping lanes 700 and 702. Some rules can apply to all vessels. Other rules can apply to only certain vessels, for example according to the vessels' types, cargos, speeds, country of registry, as well as according to data unrelated to the vessels, such as time of day, day of week, season, Homeland Security Advisory System threat level, amount of other harbor traffic or amount or schedule of non-harbor traffic, such as aircraft at an adjacent airport. Other rules can apply to docked vessels, vessels under tow, etc.
  • rules can apply to aircraft, vehicles, or any measurable quantity, such as air quality in a subway station, seismic data, voltage in a portion of a power grid or vibration in a building, bridge or other structure. Rules can also apply to data entered by humans, such as the number of reported cases of food poisoning or quantities of antibiotics prescribed, ordered or on hand during a selected period of time.
  • the dashboard 400 displays a default set of panels or a set of panels selected by the user, as previously described. If, for example, the previously mentioned tanker vessel 502 (Figure 7) carrying a hazardous cargo, such as liquefied natural gas (LNG), deviates 704 from a prescribed course, the surveillance, monitoring and real-time events application can issue an alert.
  • other vessels can trigger the alert. For example, if the LNG tanker 502 is traveling within its prescribed course, but a high-speed vessel (not shown) or an aircraft is on a collision course with the LNG tanker, the surveillance, monitoring and real-time events application can issue an alert.
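  • By way of non-limiting illustration, a rule detecting deviation from a prescribed course might be sketched as follows. The sketch assumes positions already projected onto a flat x/y plane, a course given as a list of waypoints, and a hypothetical tolerance; none of these details come from the description above.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def deviates(position, course, tolerance):
    """True if the position is farther than `tolerance` from every course leg."""
    nearest = min(distance_to_segment(position, a, b)
                  for a, b in zip(course, course[1:]))
    return nearest > tolerance


COURSE = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]  # hypothetical waypoints
on_course = deviates((5.0, 1.0), COURSE, tolerance=2.0)
off_course = deviates((5.0, 4.0), COURSE, tolerance=2.0)
```

  • A collision-course rule of the kind mentioned above would need velocities as well as positions and is not covered by this sketch.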
  • the surveillance, monitoring and real-time events application displays the alert panel 402 (Figure 4) and an alert message 422.
  • the surveillance, monitoring and real-time events application can automatically notify a predetermined list of people or agencies.
  • the particular people or agencies can depend on factors, such as the time
  • the surveillance, monitoring and real-time events application can notify other users at other nodes, such as nodes 100b, 100c, 100d and/or 100e (Figure 1).
  • Information displayed on dashboards (not shown) at these other nodes 100b-e need not be the same as information displayed on the dashboard 400.
  • the information displayed on these other nodes 100b-e can be more or less detailed than the information displayed on the dashboard 400.
  • summary information, such as an icon displayed on a map of the United States, can be displayed at a command/control node to indicate an alert in Boston, without necessarily displaying all details related to the alert.
  • a user at the command/control node can double-click on the icon to obtain more detailed information.
  • Figures 8-16 illustrate an exemplary dashboard that can be used in a health and bioterrorism context.
  • Figure 8 illustrates a dashboard 800 that contains several panels 802, 804, 806, 808 and 810.
  • Panel 802 contains a map of the United States with icons 812, 814, 816 indicating locations of three alerts.
  • Panel 804 contains emergency contact information that is relevant to the alerts.
  • Panel 806 contains hyperlinks to discussion forums, in which agency representatives and other authorized groups and people can post messages and replies, as is well known in the art.
  • Panel 808 contains hyperlinks to information that is relevant to the alerts.
  • Panel 810 displays the current Homeland Security Advisory System threat level. These panels will be described in more detail below.
  • the icons 812, 814 and 816 represent medical care providers that have experienced noteworthy events or levels of activity.
  • an alert can be issued if, for example, the number of cases of disease, such as influenza, exceeds a predetermined threshold.
  • Provider 3 has encountered patients with pneumonia that does not respond to antibiotics.
  • the other alerts could relate to other anomalous events or levels of activity.
  • Clicking the icon 816 causes the system to display information 818 related to the selected alert.
  • Clicking on a link 820 causes the system to display more detailed information about the alert.
  • Figure 9 illustrates two panels 902 and 904, as well as a user selection area 906, that can be displayed.
  • Panel 902 contains a more detailed map of the area in which the event occurred.
  • Panel 904 lists the number of cases by zip code of the patients.
  • User selection area 906 enables the user to select one or more of the alerts, thereby selecting or aggregating data from the selected provider(s) for display in panels 902 and 904.
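  • A minimal sketch of the aggregation behind such a panel follows, assuming each case record carries a provider name and a patient zip code. The record layout and the sample values are hypothetical.

```python
from collections import Counter


def cases_by_zip(cases, selected_providers):
    """Count cases per patient zip code, restricted to the selected providers."""
    return Counter(zip_code
                   for provider, zip_code in cases
                   if provider in selected_providers)


# Hypothetical case records: (provider, patient zip code).
CASES = [
    ("Provider 1", "02886"),
    ("Provider 3", "02139"),
    ("Provider 3", "02139"),
    ("Provider 3", "02140"),
]
counts = cases_by_zip(CASES, {"Provider 3"})
```

  • Selecting several providers at once simply widens the set passed as the second argument, which aggregates their data.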
  • panel 804 contains icons for government agencies and other individuals or organizations (collectively “responders”) that might be called upon to respond to or manage biological, nuclear, foodborne or other situations identified by the expert engine-based system 8 (e.g., as where the number of instances matching a specified criterion exceeds a threshold).
  • Clicking link 822 displays a window containing emergency contact
  • Panel 1002 contains several emergency callout options, by which the user can manage the alerts. For example, clicking "Message Board" link 1004 displays a window containing messages posted in relation to this alert, as shown in Figure 11 at 1100. This message board enables users and responders to communicate with each other in relation to the alert. An "Initiate a new Callout" link 1102 enables the user to initiate a new situation, as shown in Figure 12.
  • In response to an alert, the surveillance, monitoring and real-time events application automatically performs searches of the Internet and responder intranets for information relevant to the alert.
  • panel 808 (Figure 8) contains hyperlinks to information that is relevant to the alerts, including results from these searches and predefined information sources that have been identified as relevant.
  • the surveillance, monitoring and real-time events application can, for example, have a database of information sources catalogued according to alert type. As shown in Figure 13, clicking on one of the hyperlinks in the panel 808 opens a new window 1300 displaying contents identified by the hyperlink.
  • the user can select a module via a pull-down list 824.
  • the user can select "Reports", in which case the system displays a window similar to that shown in Figure 14.
  • the system displays a report in a report panel 1406.
  • Figure 15 illustrates another graphical display 1500, by which the system can display an alert.
  • the system displays information, such as proximity of the outbreak to the nearest residential area, as well as the population of the residential area, proximity to the nearest emergency medical center and the number of free beds in the medical center.
  • the surveillance, monitoring and real-time events application can query those hospital systems and display relevant information, as shown in Figure 16.
  • the invention pertains to digital data processing and, more particularly, to methods and apparatus for enterprise business visibility and insight using real-time reporting tools.
  • a major impediment to enterprise business visibility is the consolidation of data from these disparate legacy databases with one another and with that from newer e-commerce databases. For instance, inventory on-hand data gleaned from a legacy ERP system may be difficult to combine with customer order data gleaned from web servers that support e-commerce (and other web-based) transactions. This is not to mention difficulties, for example, in consolidating resource scheduling data from the ERP system with the forecasting data from the marketing database system.
  • An object of this invention is to provide improved methods and apparatus for digital data processing and, more particularly, for enterprise business visibility and insight (hereinafter, "enterprise business visibility").
  • a further object is to provide such methods and apparatus as can rapidly and accurately retrieve information responsive to user inquiries.
  • a further object of the invention is to provide such methods and apparatus as can be readily and inexpensively integrated with legacy, current and future database management systems.
  • a still further object of the invention is to provide such methods and apparatus as can be implemented incrementally or otherwise without interruption of enterprise operation.
  • Yet a still further object of the invention is to provide such methods and apparatus as to facilitate ready access to up-to-date enterprise data, regardless of its underlying source.
  • Yet still a further object of the invention is to provide such methods and apparatus as permit flexible presentation of enterprise data in an easily understood manner.
  • the invention provides, in one aspect, a method of searching an RDF triples data store of the type in which the triples are maintained in accord with a first storage schema.
  • the method includes inputting a first query based, for example, on a user request, specifying RDF triples that are to be identified in the data store. That first query assumes either (i) that the triples are stored in a schema-less manner (i.e., with no storage schema) or (ii) that the triples are maintained in accord with a second storage schema that differs from the first.
  • the method further includes generating, from the first query, a second query that specifies those same RDF triples, yet that reflects the first storage schema. That second query can be applied to the RDF triples data store in order to identify and/or retrieve the desired data.
  • the invention provides, in further aspects, a method as described above including the steps of examining the first query for one or more tokens that represent data to be used in generating the second query. It also includes dispatching context-specific grammar events containing that data.
  • A related aspect of the invention provides for dispatching events that represent any of declarations and constraints specified in the first query.
  • a still further related aspect provides for dispatching declaration events specifying RDF documents from which triples are to be identified and constraint events specifying the triples themselves.
  • Further aspects of the invention provide methods as described above that include the steps of extracting statement data from the first query and associating that statement data with at least a portion of the second query. That second query can be generated, according to related aspects of the invention, in the form of an SQL SELECT statement.
  • the associating step can include associating statement data from the first query with one or more clauses of the SELECT statement, to wit, the SELECT clause, the FROM clause, the WHERE clause and the ORDER BY clause.
  • aspects of the invention provide a method of translating a schema-less input query in a first language to an output query in a second language.
  • the method includes examining the schema-less input query for one or more tokens that represent data to be used in generating the output query; dispatching context-specific grammar events containing that data; and populating portions of the output query according to the events and data.
  • the method further includes generating the output query in the second language comprising those populated portions, where the output query embodies a schema of a relational database storing RDF triples.
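  • By way of non-limiting illustration, the tokenize, dispatch and populate steps might be sketched as follows. The line-oriented input language (`triple` and `order` statements, with `*` as a wildcard), the single `triples(subject, predicate, object)` table and all function names are assumptions made for the example; they are not the input language or the storage schema of the illustrated embodiment.

```python
import shlex
import sqlite3

COLUMNS = ("subject", "predicate", "object")  # assumed storage schema


def tokenize(query):
    """Yield one grammar event per statement of the hypothetical input language."""
    for line in query.splitlines():
        if not line.strip():
            continue
        keyword, *args = shlex.split(line)
        if keyword == "triple" and len(args) == 3:
            yield ("constraint", *args)
        elif keyword == "order" and len(args) == 1 and args[0] in COLUMNS:
            yield ("order", args[0])
        else:
            raise ValueError(f"unrecognized statement: {line!r}")


def translate(query):
    """Populate the WHERE and ORDER BY portions of a SELECT from the events."""
    where, params, order = [], [], None
    for event, *data in tokenize(query):
        if event == "constraint":
            terms = [f"{col} = ?" for col, val in zip(COLUMNS, data) if val != "*"]
            params += [val for val in data if val != "*"]
            where.append("(" + " AND ".join(terms) + ")" if terms else "(1 = 1)")
        else:
            order = data[0]  # validated against COLUMNS by tokenize
    sql = "SELECT subject, predicate, object FROM triples"
    if where:
        sql += " WHERE " + " OR ".join(where)
    if order:
        sql += f" ORDER BY {order}"
    return sql, params


def run(sql, params, rows):
    """Apply the generated query to an in-memory table holding sample rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

  • In this sketch multiple `triple` statements are combined with OR, and values are passed as bound parameters rather than being spliced into the SQL text.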
  • a related aspect of the invention provides methods as described above in which the dispatching step includes generating any of a logical condition event, a selection term declaration event, and a triple declaration event.
  • a further related aspect of the invention includes generating a logical condition event containing data which, when applied to the relational database via the output query, identifies RDF triples according to a specific Boolean condition.
  • a further related aspect of the invention includes generating an event containing data which, when applied to the relational database via the output query, identifies RDF triples including a specified term.
  • a still further related aspect of the invention includes generating an event containing data which, when applied to the relational database via the output query, identifies RDF triples having a specified subject, predicate and/or object.
  • Figure 1 depicts an improved enterprise business visibility and insight system according to the invention;
  • Figure 1A depicts an architecture for a hologram data store according to the invention, e.g., in the system of claim 1;
  • Figure 1B depicts the tables in a model store and a triples store of the hologram data store of Figure 1A;
  • Figure 2 depicts a directed graph representing data triples of the type maintained in a data store according to the invention.
  • Figure 3 is a functional block diagram of a query translator module in a system according to the invention.
  • FIG. 1 depicts a real-time enterprise business visibility and insight system according to the invention.
  • the illustrated system 100 includes connectors 108 that provide software interfaces to legacy, e-commerce and other databases 140 (hereinafter, collectively, “legacy databases”).
  • a “hologram” database 114 (hereinafter, “data store” or “hologram data store”), which is coupled to the legacy databases 140 via the connectors 108, stores data from those databases 140.
  • a framework server 116 accesses the data store 114, presenting selected data to (and permitting queries from) a user browser 118.
  • the server 116 can also permit updates to data in the data store 114 and, thereby, in the legacy databases 140.
  • Legacy databases 140 represent existing (and future) databases and other sources of information (including data streams) in a company, organization or other entity (hereinafter "enterprise").
  • these include a retail e-commerce database (e.g., as indicated by the cloud and server icons adjacent database 140c) maintained with a Sybase® database management system, an inventory database maintained with an Oracle® database management system and an ERP database maintained with a SAP® Enterprise Resource Planning system.
  • Connectors 108 serve as an interface to legacy database systems 140. Each connector applies requests to, and receives information from, a respective legacy database, using that database's API or other interface mechanism. Thus, for example, connector 108a applies requests to legacy database 140a using the corresponding SAP API; connector 108b, to legacy database 140b using Oracle API; and connector 108c, to legacy database 140c using the corresponding Sybase API.
  • the requests can be simple queries, such as SQL queries and the like (e.g., depending on the type of the underlying database and its API) or more complex sets of queries, such as those commonly used in data mining.
  • one or more of the connectors can use decision trees, statistical techniques or other query and analysis mechanisms known in the art of data mining to extract information from the databases.
  • Specific queries and analysis methodologies can be specified by the hologram data store 114 or the framework server 116 for application by the connectors.
  • the connectors themselves can construct specific queries and methodologies from more general queries received from the data store 114 or server 116. For example, request-specific items can be "plugged" into query templates thereby effecting greater speed and efficiency.
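  • A minimal sketch of plugging request-specific items into a query template follows. The template text, the `inventory` table and its column names are hypothetical; identifiers are checked against an allow-list, and the value is left as a bound parameter for the database API to supply.

```python
from string import Template

# Hypothetical template held by a connector.
QUERY_TEMPLATE = Template("SELECT $columns FROM $table WHERE $key = ?")

# Assumed allow-list of tables and columns the connector may query.
ALLOWED = {"inventory": {"sku", "on_hand", "warehouse"}}


def build_query(table, columns, key, value):
    """Fill the template with checked identifiers; keep the value as a parameter."""
    known = ALLOWED.get(table)
    if known is None or key not in known or not set(columns) <= known:
        raise ValueError("unknown table or column")
    sql = QUERY_TEMPLATE.substitute(columns=", ".join(columns), table=table, key=key)
    return sql, (value,)


sql, params = build_query("inventory", ["sku", "on_hand"], "warehouse", "BOS")
```

  • Reusing one prepared template for many requests is one way the speed and efficiency mentioned above could arise; the sketch does not measure it.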
  • the requests can be stored in the connectors 108 for application and/or reapplication to the respective legacy databases 140 to provide one-time or periodic data store updates.
  • Connectors can use expiration date information to determine which of a plurality of similar data to return to the data store, or if dates are absent, the connectors can mark returned data as being of lower confidence levels.
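  • By way of non-limiting example, selection among similar data by expiration date might be sketched as follows. The policy shown (prefer the unexpired datum that expires last; otherwise return an undated datum marked as lower confidence) is an assumption, as are the class and field names.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Datum:
    value: str
    expires: Optional[date] = None
    confidence: str = "normal"


def choose(candidates: Sequence[Datum], today: date) -> Optional[Datum]:
    """Pick one of several similar data to return to the data store."""
    unexpired = [d for d in candidates
                 if d.expires is not None and d.expires >= today]
    if unexpired:
        return max(unexpired, key=lambda d: d.expires)
    undated = [d for d in candidates if d.expires is None]
    if undated:
        # No usable date information: return the datum, marked lower confidence.
        return replace(undated[0], confidence="low")
    return None  # everything on offer has expired


today = date(2004, 7, 1)
a = Datum("12 units", date(2004, 7, 15))
b = Datum("15 units", date(2004, 8, 1))
c = Datum("9 units")
picked = choose([a, b, c], today)
```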
  • Data and other information generated by the databases 140 in response to the requests are routed by connectors to the hologram data store 114. That other information can include, for example, expiry or other adjectival data for use by the data store in caching, purging, updating and selecting data.
  • the messages can be cached by the connectors 108, though, they are preferably immediately routed to the store 114.
  • the hologram data store 114 stores data from the legacy databases 140 (and from the framework server 116, as discussed below) as RDF triples.
  • the data store 114 can be embodied on any digital data processing system or systems that are in communications coupling (e.g., as defined above) with the connectors 108 and the framework server 116.
  • the data store 114 is embodied in a workstation or other high-end computing device with high capacity storage devices or arrays, though this may not be required for any given implementation.
  • Though the hologram data store 114 may be contained on an optical storage device, this is not the sense in which the term "hologram" is used. Rather, it refers to its storage of data from multiple sources (e.g., the legacy databases 140) in a form which permits that data to be queried and coalesced from a variety of perspectives, depending on the needs of the user and the capabilities of the framework server 116.
  • a preferred data store 114 stores the data from the legacy databases 140 in subject-predicate-object form, e.g., RDF triples, though those of ordinary skill in the art will appreciate that other forms may be used as well, or instead.
  • RDF is a way of expressing the properties of items of data. Those items are referred to as subjects. Their properties are referred to as predicates. And the values of those properties are referred to as objects.
  • an expression of a property of an item is referred to as a triple, a convenience reflecting that the expression contains three parts: subject, predicate and object.
  • Subjects, also referred to as resources, can be anything that is described by an RDF expression.
  • a subject can be a person, place or thing, though typically only an identifier of the subject is used in an actual RDF expression, not the person, place or thing itself. Examples of subjects might be "car," "Joe," "http://www.metatomix.com."
  • a predicate identifies a property of a subject. According to the RDF specification, this may be any "specific aspect, characteristic, attribute, or relation used to describe a resource." For the three exemplary subjects above, examples of predicates might be "make," "citizenship," "owner."
  • Objects can be literals, i.e., strings that identify or name the corresponding property (predicate). They can also be resources.
  • Where an object is itself a resource (e.g., "Metatomix, Inc."), further triples may be specified, presumably ones identifying that company in the subject and giving details in predicates and objects.
  • a given subject may have multiple predicates, each predicate indexing an object.
  • a subject postal zip code might have an index to an object town and an index to an object state, either (or both) index being a predicate URI.
  • RDF triples here, expressed in extensible markup language (XML) syntax.
  • Subjects are indicated within the listing using a "rdf:about" statement.
  • the second line of the listing defines a subject as a resource named "postal://zip#02886." That subject has predicates and objects that follow the subject declaration.
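  • A minimal sketch of reading triples from RDF/XML of this general shape follows. The sample fragment is an assumption written for the example: the `zip` namespace prefix, the `town` and `state` properties and their values are illustrative and are not the original listing. The sketch handles only literal-valued properties of `rdf:Description` elements that carry an `rdf:about` attribute.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative fragment only; property names and values are assumptions.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:zip="postal://zip#">
  <rdf:Description rdf:about="postal://zip#02886">
    <zip:town>Warwick</zip:town>
    <zip:state>RI</zip:state>
  </rdf:Description>
</rdf:RDF>
"""


def read_triples(xml_text):
    """Return (subject, predicate, object) tuples for literal-valued properties."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells qualified names as "{namespace}local".
            namespace, _, local = prop.tag[1:].partition("}")
            triples.append((subject, namespace + local, (prop.text or "").strip()))
    return triples


triples = read_triples(SAMPLE)
```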
  • subjects and predicates are expressed as uniform resource indicators (URIs), e.g., of the type defined in Uniform Resource Identifiers (URI): Generic Syntax (RFC 2396).
  • the predicates are expressed in the form <scheme>://<path>#<fragment>, as is evident to those of ordinary skill in the art.
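  • By way of illustration, the three parts of such a URI can be separated with the Python standard library, as sketched below. The helper name is hypothetical; the example URI is the subject named above.

```python
from urllib.parse import urlsplit


def split_uri(uri):
    """Return (scheme, path, fragment) for a scheme://path#fragment URI."""
    parts = urlsplit(uri)
    # urlsplit reports the text after "//" as the network location, so it is
    # joined with any remaining path to recover the <path> portion.
    return parts.scheme, parts.netloc + parts.path, parts.fragment


scheme, path, fragment = split_uri("postal://zip#02886")
```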
  • an object may itself be another subject, with its own objects and predicates.
  • a resource can be both a subject and an object, e.g., an object to all "upstream” resources and a subject to all "downstream” resources and properties.
  • Such "branching” allows for complex relationships to be modeled within the RDF triple framework.
  • Figure 2 depicts a directed graph composed of RDF triples of the type stored by the illustrated data store 114, here, by way of non-limiting example, triples representing relationships among four companies (id#1, id#2, id#3 and id#4) and between two of those companies (id#1 and id#2) and their employees.
  • Per convention, subjects and resource-type objects are depicted as oval-shaped nodes; literal-type objects are depicted as rectangular nodes; and predicates are depicted as arcs connecting those nodes.
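  • The directed-graph view of triples can be sketched in a few lines of Python. This is a minimal illustration only: the company identifiers, predicate URIs and employee names below are hypothetical, and the adjacency-list representation is an assumption, not the structure used by the data store 114.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    obj_is_resource: bool  # True: resource (oval node); False: literal (rectangle)


# Hypothetical triples; identifiers and predicates are illustrative only.
TRIPLES = [
    Triple("company://id#1", "rel://customerOf", "company://id#2", True),
    Triple("company://id#2", "rel://partnerOf", "company://id#3", True),
    Triple("company://id#4", "rel://supplierOf", "company://id#3", True),
    Triple("company://id#1", "rel://employee", "Employee A", False),
    Triple("company://id#2", "rel://employee", "Employee B", False),
]


def build_graph(triples):
    """Index outgoing arcs (predicate, object, is_resource) by subject node."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj, t.obj_is_resource))
    return graph


def walk(graph, start):
    """Depth-first walk returning every arc reachable from `start`."""
    seen, stack, arcs = set(), [start], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for predicate, obj, is_resource in graph.get(node, []):
            arcs.append((node, predicate, obj))
            if is_resource:  # only resource-type objects can act as subjects
                stack.append(obj)
    return arcs


arcs = walk(build_graph(TRIPLES), "company://id#1")
```

  • Literal-type objects terminate a path, whereas resource-type objects are followed as subjects of further triples, which mirrors the "upstream"/"downstream" branching described above.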
  • Figure 1A depicts an architecture for a preferred hologram data store 114 according to the invention.
  • the illustrated store 114 includes a model document store 114A and a model document manager 114B. It also includes a relational triples store 114C, a relational triples store manager 114D, and a parser 114E interconnected as shown in the drawing.
  • RDF triples maintained by the store 114 are received — from the legacy databases 140 (via connectors 108) and/or from time-based data reduction module 150 (described below) -- in the form of document objects, e.g., of the type generated from a Document Object Model (DOM) in a JAVA, C++ or other application.
  • these are stored in the model document store 114A as such (i.e., document objects), particularly, using the tables and inter-table relationships shown in Figure 1B (see dashed box labelled 114B).
  • the model document manager 114B manages storage/retrieval of the document object to/from the model document store 114A.
  • the manager 114B comprises the Slide content management and integration framework, publicly available through the Apache Software Foundation. It stores (and retrieves) document objects to (and from) the store 114A in accord with the WebDAV protocol.
  • Those skilled in the art will, of course, appreciate that other applications can be used in place of Slide and that document objects can be stored/retrieved from the store 114A in accord with other protocols, industry-standard, proprietary or otherwise.
  • WebDAV protocol allows for adding, updating and deleting RDF document objects using a variety of WebDAV client tools (e.g., Microsoft Windows Explorer, Microsoft Office, XML Spy or other such tools available from a variety of vendors), in addition to adding, updating and deleting document objects via connectors 108 and/or time-based data reduction module 150.
  • This allows for presenting the user with a view of a traversable file system, with RDF documents that can be opened directly in XML editing tools or from Java programs supporting WebDAV protocols, or from processes on remote machines via any HTTP protocol on which WebDAV is based.
  • RDF triples received by the store 114 are also stored to a relational database, here, store 114C
  • RDBMS relational database management system
  • a parser 114E extracts its triples and conveys them to the RDBMS 114D with a corresponding indicator that they are to be added, updated or deleted from the relational database.
  • Such a parser 114E operates in the conventional manner known in the art for extracting triples from RDF documents.
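  • As a rough illustration of what such a parser does, the following Python sketch extracts triples from a small RDF/XML document using the standard library's ElementTree. It handles only flat rdf:Description elements, not the full RDF/XML syntax; the sample document, the "state" property and the action tag are assumptions made for the example, not the output format of parser 114E.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical document modeled on the postal-code example in the text.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:pc="http://www.metatomix.com/postalCode/1.0#">
  <rdf:Description rdf:about="postal://zip#02886">
    <pc:town>Warwick</pc:town>
    <pc:state>RI</pc:state>
  </rdf:Description>
</rdf:RDF>"""


def tag_to_uri(tag):
    """Turn ElementTree's '{namespace}local' tag form into a plain URI."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace + local
    return tag


def extract_triples(rdf_xml, action="add"):
    """Return (action, subject, predicate, object, object_is_resource) tuples."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            predicate = tag_to_uri(prop.tag)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            if resource is not None:
                triples.append((action, subject, predicate, resource, True))
            else:
                literal = (prop.text or "").strip()
                triples.append((action, subject, predicate, literal, False))
    return triples


triples = extract_triples(SAMPLE)
```

  • The action tag stands in for the add/update/delete indicator that accompanies the triples on their way to the relational store.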
  • the illustrated database store 114C has five tables interrelated as particularly shown in Figure 1B (see dashed box labelled 114C).
  • these tables rely on indexes generated by hashing the triples' respective subjects, predicates and objects using a 64-bit hashing algorithm based on cyclical redundancy codes (CRCs), though it will be appreciated that the indexes can be generated by other techniques as well, industry-standard, proprietary or otherwise.
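  • The text does not say which CRC variant is used. As a sketch under that caveat, the Python below computes a 64-bit CRC bit by bit using the ECMA-182 polynomial (an assumption chosen only because it is a well-known 64-bit CRC) and applies it to the terms of a triple.

```python
CRC64_POLY = 0x42F0E1EBA9EA3693  # ECMA-182 polynomial; an assumption, not from the text
MASK64 = 0xFFFFFFFFFFFFFFFF


def crc64(data: bytes) -> int:
    """Bitwise, most-significant-bit-first CRC-64 with zero initial value."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ CRC64_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc


def hash_term(term: str) -> int:
    """Hash a subject, predicate or object string to an unsigned 64-bit index."""
    return crc64(term.encode("utf-8"))


subject_hash = hash_term("postal://zip#02886")
predicate_hash = hash_term("http://www.metatomix.com/postalCode/1.0#town")
object_hash = hash_term("Warwick")
```

  • A table-driven implementation would be faster; the bitwise form is used here only because it is short and easy to check.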
  • the "triples" table 534 maintains one record for each stored triple.
  • Each record contains an aforementioned hash code for each of the subject, predicate and object that make up the respective triple, along with a resource flag (“resource_flg”) indicating whether that object is of the resource or literal type.
  • Each record also includes an aforementioned hash code (“m_hash”) identifying the document object (stored in model document store 114A) from which the triple was parsed, e.g., by parser 114E.
  • the values of the subjects, predicates and objects are not stored in the triples table. Rather, those values are stored in the resources table 530, namespaces table 532 and literals table 536.
  • the resources table 530 in conjunction with the namespaces table 532, stores the subjects, predicates and resource-type objects; whereas, the literals table 536 stores the literal-type objects.
  • the resources table 530 maintains one record for each unique subject, predicate or resource-type object. Each record contains the value of the resource, along with its aforementioned 64-bit hash. It is the latter on which the table is indexed.
  • portions of those values common to multiple resources (e.g., common <scheme>://<path> identifiers) are stored in the namespaces table 532. Accordingly, the field "r_value" contained in each record of the resources table 530 reflects only the unique portion (e.g., <fragment> identifier) of each resource.
  • the namespaces table 532 maintains one record for each unique common portion referred to in the prior paragraph (hereinafter, "namespace"). Each record contains the value of that namespace, along with its aforementioned 64-bit hash. As above, it is the latter on which this table is indexed.
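  • The split between namespace and unique portion can be illustrated as follows. This is a sketch only: treating everything up to and including the last "#" as the namespace is an assumption, and the "state" URI is hypothetical.

```python
def split_resource(uri):
    """Split a resource URI into (namespace, unique fragment).

    Assumes the namespace is everything up to and including the last '#';
    a URI without '#' gets an empty namespace.
    """
    namespace, separator, fragment = uri.rpartition("#")
    return namespace + separator, fragment


resources = [
    "http://www.metatomix.com/postalCode/1.0#town",
    "http://www.metatomix.com/postalCode/1.0#state",  # hypothetical sibling property
    "postal://zip#02886",
]

# One namespace entry per distinct common portion, shared by many resources.
namespaces = sorted({split_resource(uri)[0] for uri in resources})
r_values = [split_resource(uri)[1] for uri in resources]
```

  • Two of the three resources share a namespace, so only two namespace values need to be stored alongside three short r_value entries.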
  • the literals table 536 maintains one record for each unique literal-type object. Each record contains the value of the object, along with its aforementioned 64-bit hash. Each record also includes an indicator of the type of that literal (e.g., integer, string, and so forth). Again, it is the hash on which this table is indexed.
  • the models table 538 maintains one record for each RDF document object contained in the model document store 114A.
  • Each record contains the URI of the corresponding document object ("uri_string"), along with its aforementioned 64-bit hash ("m_hash"). It is the latter on which this table is indexed.
  • each record of the models table 538 also contains the ID of the corresponding document object in the store 114A. That ID can be assigned by the model document manager 114B, or otherwise.
  • relational triples store 114C is a schema-less structure for storing RDF triples.
  • triples maintained in that store can be reconstituted via an SQL query. For example, to reconstitute the RDF triple having a subject equal to "postal://zip#02886", a predicate equal to "http://www.metatomix.com/postalCode/1.0#town", and an object equal to "Warwick", the following SQL statement is applied:
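  • The statement itself is not reproduced in this text. Purely as an illustration of the idea, the Python sketch below builds a five-table store in SQLite and reconstitutes the example triple by joining on hashes. Only resource_flg, m_hash, r_value and uri_string are names taken from the text; every other table and column name, the model URI, and the use of BLAKE2b as a stand-in for the CRC-based hash are assumptions.

```python
import hashlib
import sqlite3


def h64(text: str) -> int:
    """Stand-in 64-bit hash (BLAKE2b, not a CRC), signed to fit SQLite INTEGER."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


SCHEMA = """
CREATE TABLE namespaces (ns_hash INTEGER PRIMARY KEY, ns_value TEXT);
CREATE TABLE resources  (r_hash INTEGER PRIMARY KEY, ns_hash INTEGER, r_value TEXT);
CREATE TABLE literals   (l_hash INTEGER PRIMARY KEY, l_value TEXT, l_type TEXT);
CREATE TABLE models     (m_hash INTEGER PRIMARY KEY, uri_string TEXT);
CREATE TABLE triples    (s_hash INTEGER, p_hash INTEGER, o_hash INTEGER,
                         resource_flg INTEGER, m_hash INTEGER);
"""


def add_resource(db, uri):
    """Store the namespace and the unique fragment separately; return the URI hash."""
    namespace, separator, fragment = uri.rpartition("#")
    namespace += separator
    db.execute("INSERT OR IGNORE INTO namespaces VALUES (?, ?)", (h64(namespace), namespace))
    db.execute("INSERT OR IGNORE INTO resources VALUES (?, ?, ?)",
               (h64(uri), h64(namespace), fragment))
    return h64(uri)


def add_literal_triple(db, subject, predicate, literal, model_uri):
    db.execute("INSERT OR IGNORE INTO models VALUES (?, ?)", (h64(model_uri), model_uri))
    db.execute("INSERT OR IGNORE INTO literals VALUES (?, ?, ?)",
               (h64(literal), literal, "string"))
    db.execute("INSERT INTO triples VALUES (?, ?, ?, 0, ?)",
               (add_resource(db, subject), add_resource(db, predicate),
                h64(literal), h64(model_uri)))


RECONSTITUTE = """
SELECT sn.ns_value || s.r_value, pn.ns_value || p.r_value, l.l_value
FROM triples t
JOIN resources s   ON s.r_hash = t.s_hash
JOIN namespaces sn ON sn.ns_hash = s.ns_hash
JOIN resources p   ON p.r_hash = t.p_hash
JOIN namespaces pn ON pn.ns_hash = p.ns_hash
JOIN literals l    ON l.l_hash = t.o_hash
WHERE t.s_hash = ? AND t.p_hash = ? AND t.o_hash = ? AND t.resource_flg = 0
"""

SUBJECT = "postal://zip#02886"
PREDICATE = "http://www.metatomix.com/postalCode/1.0#town"

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
add_literal_triple(db, SUBJECT, PREDICATE, "Warwick", "model://example")  # hypothetical model URI
rows = db.execute(RECONSTITUTE, (h64(SUBJECT), h64(PREDICATE), h64("Warwick"))).fetchall()
```

  • The query never compares strings: it filters the triples table on three hashes and only then joins out to the value tables, which is why a caller must know the storage schema to write it.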
  • RDF documents and, more generally, objects maintained in the store 114 can be contained in other stores (structured relationally, hierarchically or otherwise) as well, in addition to or instead of stores 114A and 114C.
  • the relational triples store manager 114D supports SQL queries such as the one exemplified above (for extracting a triple with the subject "postal://zip#02886", the predicate "http://www.metatomix.com/postalCode/1.0#town", and the object "Warwick").
  • SQL queries must take into account the underlying storage schema of the relational database (here, hashed by origin).
  • a query translator 190 translates schema-less queries 612 into schema-based SQL queries 642 for application to the relational store 114C.
  • the schema-less queries are expressed in an SQL-like language (here, identified as "HxQL") or in an XML-like language (here, identified as "HxML"); however, it will be appreciated that any language or means for expressing a query, schema-less or otherwise, may be used instead or in addition.
  • the illustrated query translator 190 has a language-parsing component 602, an event-processing component 604, and an SQL statement management/generation component 606.
  • the language-parsing component 602 examines the input query 612 for tokens that represent data to be used in generating the SQL statement 642 and dispatches context-specific grammar events containing that data to the event processor.
  • the event processor receives these and retrieves the data stored within them for use by statement management/generation component 606 to generate the SQL SELECT statement 642.
  • the language-parsing component 602 has two parsing elements, each directed to one of two languages in which schema-less queries 612 can be expressed.
  • the HxQL parser 608 parses queries expressed in the HxQL language, while the HxML parser 610 parses queries expressed in the HxML.
  • HxQL grammar is based on R.V. Guha's RDFDB query language, Libby Miller's SquishQL and Andy Seaborne's RDQL.
  • the HxQL parser 608 is implemented using JavaCC, a commercially available parser developed jointly by Sun Microsystems and Metamata.
  • HxML comprises a grammar based on XML.
  • the HxML parser 610 is implemented using an XML parser, such as Xerces available from Apache. It will be appreciated that in other embodiments, the language-parsing component 602 can have more, or fewer, parsing elements, and that those elements can be used to parse other languages in which the input query may be expressed.
  • the illustrated language-parsing component 602 can dispatch eight events. For example, a global document declaration event is dispatched indicating that a RDF document specified by a URI is included in the optional set of default document models to query. A logical condition event is dispatched when a constraint is parsed limiting triple data that is to be considered for retrieval. A namespace declaration event is dispatched when a mapping has been declared between an alias id and a URI fragment. An order by declaration event is dispatched when a record sorting order is specified with regard to columns of data representing terms selected for retrieval. A selection term declaration event is dispatched when a term is selected for retrieval.
  • a triple declaration event is dispatched when a criterion for triple consideration is declared.
  • a triple document declaration event is dispatched when at least one URI for an RDF document is declared to replace the set of default document models to query against but for a single particular triple criterion.
  • a triple model-mapping event is dispatched when the set of default document models to query against for an individual triple criterion will be shared with a different individual triple criterion. It will be appreciated that these events are only examples of ones that can be dispatched and that, in one embodiment, more (or fewer) events are appropriate depending on the schema of the database to be searched.
  • the event-processing component 604 listens for context-specific grammar events and extracts the data stored within them to populate the statement management/generation component 606 with the data it needs for generating the SQL SELECT statement 642. For example, a Boolean constraint represented in a logical condition event is extracted and dispatched to the statement management/generation component 606 for inclusion in a SELECT WHERE clause of a SQL SELECT statement.
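  • The hand-off from parser events to statement data might look roughly like the Python sketch below. The event classes, field names and the sample events are hypothetical; the text names the kinds of events but not their structure, and no HxQL or HxML syntax is assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class SelectionTermEvent:
    term: str


@dataclass
class TripleDeclarationEvent:
    subject: str
    predicate: str
    obj: str


@dataclass
class LogicalConditionEvent:
    condition: str


@dataclass
class StatementData:
    """Data the statement generator needs, filled in as events arrive."""
    select_terms: list = field(default_factory=list)
    triple_patterns: list = field(default_factory=list)
    conditions: list = field(default_factory=list)


class EventProcessor:
    """Listens for grammar events and extracts the data they carry."""

    def __init__(self):
        self.statement = StatementData()

    def on_event(self, event):
        if isinstance(event, SelectionTermEvent):
            self.statement.select_terms.append(event.term)
        elif isinstance(event, TripleDeclarationEvent):
            self.statement.triple_patterns.append(
                (event.subject, event.predicate, event.obj))
        elif isinstance(event, LogicalConditionEvent):
            self.statement.conditions.append(event.condition)
        else:
            raise TypeError(f"unsupported event: {type(event).__name__}")


# Events a parser might dispatch for a query asking for patients' blood types.
processor = EventProcessor()
for event in [
    SelectionTermEvent("?blood"),
    TripleDeclarationEvent("?patient", "ex://bloodType", "?blood"),
    LogicalConditionEvent("?blood != 'unknown'"),
]:
    processor.on_event(event)
```

  • Keeping the parser and the statement generator apart in this way lets either parser (SQL-like or XML-like) feed the same downstream generator.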
  • the statement management/generation component 606 stores and manages statement data and maps it directly to the relational triples store 114C schema. It uses that mapped data to generate an output query 642 corresponding to the input query 612.
  • the statement manager 606 delegates the generation of the SQL SELECT statement to agent objects 634-640. Each agent generates a particular clause of the SELECT statement, e.g., the SELECT, FROM, WHERE and ORDER-BY clauses.
  • the statement manager can generate queries according to a different database storage schema and can output queries conforming to other languages.
  • a select clause agent 634 generates the SELECT clause by mapping each term to the appropriate table and/or field name corresponding to tables/field names in triples data store 114C.
  • a from clause agent 636 generates the FROM clause and ensures that table instances and their alias abbreviations are declared for use in other clauses.
  • a where clause agent 638 generates the WHERE clause and ensures that all necessary table JOINS and filtering constraints are specified.
  • an order-by clause agent 640 generates an optional ORDER-BY clause thus specifying an order of the output results.
  • the agent objects distribute SQL generation among custom fragment managers and use differing agents in accord with the database to be searched.
  • agents are exemplary of a query translator 600 directed to generating queries for a relational triple store 114C, and, in other embodiments, agents will be in accord with the data store of that embodiment.
  • Each agent can also gather data from other agents as necessary; for example, alias information stored in a SELECT clause can be used to formulate constraints in the WHERE clause.
  • the agents work in tandem until all statement data is properly "mapped" according to the schema of the triples store 114C.
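  • A much-simplified sketch of clause agents working from shared alias information is given below in Python. It targets only a hypothetical hashed "triples" table with s_hash, p_hash and o_hash columns, uses CRC-32 from the standard library as a stand-in hash, and marks variables with a leading "?"; all of these are assumptions for illustration rather than the translator's actual behavior.

```python
import zlib

COLUMNS = ("s_hash", "p_hash", "o_hash")


def h(term):
    """Stand-in hash (CRC-32); the store described in the text uses 64-bit hashes."""
    return zlib.crc32(term.encode("utf-8"))


def bind_variables(patterns):
    """Map each variable to the first aliased column in which it appears."""
    bindings = {}
    for i, pattern in enumerate(patterns):
        for column, term in zip(COLUMNS, pattern):
            if term.startswith("?"):
                bindings.setdefault(term, f"t{i}.{column}")
    return bindings


def select_agent(select_terms, bindings):
    return "SELECT " + ", ".join(f"{bindings[t]} AS {t[1:]}" for t in select_terms)


def from_agent(patterns):
    # One aliased instance of the triples table per triple pattern.
    return "FROM " + ", ".join(f"triples t{i}" for i in range(len(patterns)))


def where_agent(patterns, bindings):
    constraints = []
    for i, pattern in enumerate(patterns):
        for column, term in zip(COLUMNS, pattern):
            ref = f"t{i}.{column}"
            if not term.startswith("?"):
                constraints.append(f"{ref} = {h(term)}")  # filter on a constant
            elif bindings[term] != ref:
                constraints.append(f"{ref} = {bindings[term]}")  # join on a shared variable
    return "WHERE " + " AND ".join(constraints) if constraints else ""


def order_by_agent(order_terms):
    return "ORDER BY " + ", ".join(t[1:] for t in order_terms) if order_terms else ""


def generate(patterns, select_terms, order_terms=()):
    bindings = bind_variables(patterns)
    clauses = [
        select_agent(select_terms, bindings),
        from_agent(patterns),
        where_agent(patterns, bindings),
        order_by_agent(order_terms),
    ]
    return " ".join(clause for clause in clauses if clause)


sql = generate(
    patterns=[("?patient", "ex://bloodType", "?blood"), ("?patient", "ex://ward", "?ward")],
    select_terms=["?blood", "?ward"],
    order_terms=["?blood"],
)
```

  • The shared bindings dictionary plays the role of the alias information that one agent makes available to another: the WHERE agent uses it to turn a variable that appears in two patterns into a join constraint.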
  • the query translator 600 can be encapsulated and composited into other software components. It will also be appreciated that although the query translator 160 is directed toward an RDF triples store utilizing the hash with origin schema, it can generate output for use with triples (or other) stores utilizing other database vendors. For example, the query translator 160 can be implemented to output various SQL dialects, e.g., Microsoft SQL, which uses 0 and 1 for Boolean values versus the conventional TRUE/FALSE keywords. Further, configurable options such as generating SQL with or without computed hash codes in join criteria can be accommodated, as well.
  • a schema-less query 612, here expressed in the HxQL language, for returning all blood types stored in the triples store 114C is as follows:
  • the data store 114 includes a graph generator (not shown) that uses RDF triples to generate directed graphs in response to queries (e.g., in HxQL or HxML form) from the framework server 116. These may be queries for information reflected by triples originating from data in one or more of the legacy databases 140 (one example might be a request for the residence cities of hotel guests who booked reservations on account over Independence Day weekend, as reflected by data from an e-Commerce database and an Accounts Receivable database). Such generation of directed graphs from triples can be accomplished in any conventional manner known in the art (e.g., as appropriate to RDF triples or other manner in which the information is stored) or, preferably, in the manner described in co-pending, commonly assigned United States Patent Application Serial No.
  • the data store 114 utilizes genetic, self-adapting, algorithms to traverse the RDF triples in response to queries from the framework server 116.
  • genetic, self-adapting, algorithms can be beneficially applied to the RDF database which, due to its inherently flexible (i.e., schema-less) structure, is not readily searched using traditional search techniques.
  • the data store utilizes a genetic algorithm that performs several searches, each utilizing a different methodology but all based on the underlying query from the framework server, against the RDF triples. It compares the results of the searches quantitatively to discern which produce(s) the best results and reapplies that search with additional terms or further granularity.
  • the framework server 116 generates requests to the data store 114 (and/or indirectly to the legacy databases via connectors 108, as discussed above) and presents information therefrom to the user via browser 118.
  • the requests can be based on HxQL or HxML requests entered directly by the user though, preferably, they are generated by the server 116 based on user selections/responses to questions, dialog boxes or other user-input controls.
  • the framework server includes one or more user interface modules, plug-ins, or the like, each for generating queries of a particular nature.
  • One such module, for example, generates queries pertaining to marketing information, another such module generates queries pertaining to financial information, and so forth.
  • In addition to generating queries, the framework server (and/or the aforementioned modules) "walks" directed graphs generated by the data store 114 to present to the user (via browser 118) any specific items of requested information. Such walking of the directed graphs can be accomplished via any conventional technique known in the art. Presentation of questions, dialog boxes or other user-input controls to the user and, likewise, presentation of responses thereto based on the directed graph can be accomplished via conventional server/browser or other user interface technology.
  • the framework server 116 permits a user to update data stored in the data store 114 and, thereby, that stored in the legacy databases 140.
  • any triples implicated by the change are updated in store 114C, as are the corresponding RDF document objects in store 114A.
  • An indication of these changes can be forwarded to the respective legacy databases 140, which utilize the corresponding API (or other interface mechanisms) to update their respective stores.
  • changes made directly to the store 114C as discussed above, e.g., using a WebDAV client, can be forwarded to the respective legacy database.
  • the server 116 can present to the user not only data from the data store 114, but also data gleaned by the server directly from other sources.
  • the server 116 can directly query an enterprise web site for statistics regarding web page usage, or otherwise.
  • A further understanding of the operation of the framework server 116 may be attained by reference to the appendix filed with United States Patent Application Serial No. 09/917,264, filed July 27, 2001, and entitled "Methods and Apparatus for Enterprise Application Integration," which appendix is incorporated herein by reference.
  • a method for searching an RDF triples data store having a first storage schema comprising:
  • each of the events represents any of a declaration and a constraint specified in the first query.
  • the associating step includes associating statement data with one or more of a SELECT clause, a FROM clause, a WHERE clause and a ORDER-BY clause of an SQL statement.
  • a method for translating a schema-less input query in a first language to an output query in a second language comprising:
  • the output query represents a schema of a relational database storing RDF triples.
  • dispatching events further comprises generating any of a logical condition event, a selection term declaration event, and a triple declaration event.
  • generating a logical condition event comprises generating an event containing data which, when applied to the relational database via the output query, identifies RDF triples according to a Boolean condition.
  • generating a selection term declaration event comprises generating an event containing data which, when applied to the relational database via the output query, identifies RDF triples including a specified term.
  • generating a triple declaration event comprises generating an event containing data which, when applied to the relational database via the output query, identifies RDF triples according to a specified subject, predicate and object.
  • the method of claim 10 wherein the first language is any of SQL-like and XML-like.
  • a digital system for searching an RDF triples data store having a storage schema comprising:
  • parser component that examines a schema-less, first query specifying one or more RDF triples to be identified, the parser component examines the first query for one or more tokens that represent data to be used in generating a second query and that dispatches context-specific grammar events containing that data;
  • event-processing component coupled to the parser component, the event-processing component extracts statement data from one or more events
  • the statement management generation component generates the second query so as to identify the same RDF triples identified in the schema-less, first query and so as to reflect the storage schema of the RDF triples data store.
  • a sorting order event specifies an order in which identified RDF triples are to be sorted for presentation to a user.
  • Applicant: METATOMIX INC. [US/US]; 275 Wyman Street, Suite 130, Waltham, MA 02451 (US).
  • Designated states (national, as far as recoverable): AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, ... MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW.
  • Designated states (regional): European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, SK, TR), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).
  • For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.
  • the invention pertains to digital data processing and, more particularly, to methods and apparatus for enterprise business visibility and insight using real-time reporting tools.
  • a major impediment to enterprise business visibility is the consolidation of data from these disparate legacy databases with one another and with that from newer e-commerce databases.
  • inventory on-hand data gleaned from a legacy ERP system may be difficult to combine with customer order data gleaned from web servers that support e-commerce (and other web-based) transactions. This is not to mention difficulties, for example, in consolidating resource scheduling data from the ERP system with the forecasting data from the marketing database system.
  • An object of this invention is to provide improved methods and apparatus for digital data processing and, more particularly, for enterprise business visibility and insight (hereinafter, "enterprise business visibility").
  • a further object is to provide such methods and apparatus as can rapidly and accurately retrieve information responsive to user inquiries.
  • a further object of the invention is to provide such methods and apparatus as can be readily and inexpensively integrated with legacy, current and future database management systems.
  • a still further object of the invention is to provide such methods and apparatus as can be implemented incrementally or otherwise without interruption of enterprise operation.
  • Yet a still further object of the invention is to provide such methods and apparatus as to facilitate ready access to up-to-date enterprise data, regardless of its underlying source.
  • Yet still a further object of the invention is to provide such methods and apparatus as permit flexible presentation of enterprise data in an easily understood manner.
  • the aforementioned are among the objects attained by the invention, one aspect of which provides a method of time-wise data reduction that includes the steps of inputting data from a source; summarizing that data according to one or more selected epochs in which it belongs; and generating for each such selected epoch one or more RDF triples characterizing the summarized data.
  • the data source may be, for example, a database, a data stream or otherwise.
  • the selected epoch may be a second, minute, hour, week, month, year, or so forth.
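  • The three steps of time-wise reduction (input, summarize per epoch, emit triples) can be sketched in Python as follows. The sample records, the choice of count and sum as summary statistics, and the "summary://" subject and predicate URIs are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# strftime patterns that truncate a timestamp to the start of its epoch.
EPOCH_FORMATS = {
    "hour": "%Y-%m-%dT%H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
    "year": "%Y",
}


def summarize(records, epoch):
    """Group (timestamp, value) records by epoch; return {label: (count, total)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for timestamp, value in records:
        bucket = totals[timestamp.strftime(EPOCH_FORMATS[epoch])]
        bucket[0] += 1
        bucket[1] += value
    return {label: (count, total) for label, (count, total) in totals.items()}


def to_triples(summary, epoch, field_name):
    """Emit (subject, predicate, object) triples characterizing each epoch."""
    triples = []
    for label, (count, total) in sorted(summary.items()):
        subject = f"summary://{field_name}/{epoch}#{label}"  # hypothetical URI scheme
        triples.append((subject, "summary://terms#count", str(count)))
        triples.append((subject, "summary://terms#sum", str(total)))
    return triples


# Hypothetical order amounts read from a source database or data stream.
records = [
    (datetime(2004, 7, 4, 9, 0), 10.0),
    (datetime(2004, 7, 4, 15, 30), 5.0),
    (datetime(2004, 7, 5, 8, 0), 2.5),
]
daily_summary = summarize(records, "day")
daily_triples = to_triples(daily_summary, "day", "orders")
```

  • Three input records reduce to two daily subjects with two triples each; over a long-running source the reduction is correspondingly larger.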
  • Still further related aspects of the invention provide for parsing triples from the RDF document objects and storing them in a relational data store.
  • a further related aspect of the invention provides for storing the triples in a relational store that is organized according to a hashed with origin approach.
  • Still yet other aspects of the invention provide for retrieving information represented by the triples in the hierarchical and/or relational data stores, e.g., for presentation to a user.
  • Related aspects of the invention provide for retrieving triples containing time-wise reduced data, e.g., for presentation to a user.
  • Related aspects of the invention provide methods as described above including summarizing the input data according to one or more epochs of differing length. Further aspects of the invention provide methods as described above including querying the source, e.g., a legacy database, in order to obtain the input data. Related aspects of the invention provide for generating such queries in SQL format.
  • Still other aspects of the invention provide methods as described above including the step of inputting an XML file that identifies one or more sources of input data, one or more fields thereof to be summarized in the time-wise reduction, and/or one or more epochs for which those fields are to be summarized.
  • Further aspects of the invention provide methods as described above including responding to an input datum by updating summary data for an epoch of the shortest duration, e.g., a store of per day data. Related aspects of the invention provide for updating a store of summary data for epochs of greater duration, e.g., stores of per week or per month data, from summary data maintained in a store for an epoch of lesser duration, e.g., a store of per day data.
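  • A minimal Python sketch of this cascade follows: each input datum updates only the per-day store, and the per-week and per-month stores are derived from the per-day totals. The in-memory dictionaries, the use of sums, and ISO week numbering are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date


def record(daily, day, value):
    """Respond to an input datum by updating the shortest-epoch (per-day) store."""
    daily[day] += value


def roll_up(daily, epoch_key):
    """Derive a longer-epoch store from the per-day store."""
    totals = defaultdict(float)
    for day, total in daily.items():
        totals[epoch_key(day)] += total
    return dict(totals)


def week_key(day):
    iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
    return (iso[0], iso[1])


def month_key(day):
    return (day.year, day.month)


daily = defaultdict(float)
record(daily, date(2004, 7, 4), 10.0)
record(daily, date(2004, 7, 5), 5.0)
record(daily, date(2004, 8, 1), 1.0)

weekly = roll_up(daily, week_key)
monthly = roll_up(daily, month_key)
```

  • Deriving the longer epochs from the per-day store means the raw input is touched only once, at the cost of the coarser stores lagging until the next roll-up.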
  • Figure 1 depicts an improved enterprise business visibility and insight system according to the invention;
  • Figure 1A depicts an architecture for a hologram data store according to the invention, e.g., in the system of claim 1;
  • Figure 1B depicts the tables in a model store and a triples store of the hologram data store of Figure 1A;
  • Figure 2 depicts a directed graph representing data triples of the type maintained in a data store according to the invention.
  • Figure 3 is a functional block diagram of a time-wise data reduction module in a system according to the module.
  • FIG. 1 depicts a real-time enterprise business visibility and insight system according to the invention.
  • the illustrated system 100 includes connectors 108 that provide software interfaces to legacy, e-commerce and other databases 140 (hereinafter, collectively, “legacy databases”).
  • a “hologram” database 114 (hereinafter, “data store” or “hologram data store”), which is coupled to the legacy databases 140 via the connectors 108, stores data from those databases 140.
  • a framework server 116 accesses the data store 114, presenting selected data to (and permitting queries from) a user browser 118.
  • the server 116 can also permit updates to data in the data store 114 and, thereby, in the legacy databases 140.
  • Legacy databases 140 represent existing (and future) databases and other sources of information (including data streams) in a company, organization or other entity (hereinafter "enterprise").
  • these include a retail e-commerce database (e.g., as indicated by the cloud and server icons adjacent database 140c) maintained with a Sybase® database management system, an inventory database maintained with an Oracle® database management system and an ERP database maintained with a SAP® Enterprise Resource Planning system.
  • Common features of illustrated databases 140 are that they maintain information of interest to an enterprise and that they can be accessed via respective software application program interfaces (API) or other mechanisms known in the art.
  • Connectors 108 serve as an interface to legacy database systems 140. Each connector applies requests to, and receives information from, a respective legacy database, using that database's API or other interface mechanism. Thus, for example, connector 108a applies requests to legacy database 140a using the corresponding SAP API; connector 108b, to legacy database 140b using Oracle API; and connector 108c, to legacy database 140c using the corresponding Sybase API.
  • these requests are for purposes of accessing data stored in the respective databases 140.
  • the requests can be simple queries, such as SQL queries and the like (e.g., depending on the type of the underlying database and its API) or more complex sets of queries, such as those commonly used in data mining.
  • one or more of the connectors can use decision trees, statistical techniques or other query and analysis mechanisms known in the art of data mining to extract information from the databases.
  • Specific queries and analysis methodologies can be specified by the hologram data store 114 or the framework server 116 for application by the connectors.
  • the connectors themselves can construct specific queries and methodologies from more general queries received from the data store 114 or server 116. For example, request-specific items can be "plugged" into query templates thereby effecting greater speed and efficiency.
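  • The idea of plugging request-specific items into a stored query template can be sketched with parameterized SQL, shown here in Python against an in-memory SQLite database standing in for a legacy database. The Connector class, the template name, and the orders table with its columns are hypothetical.

```python
import sqlite3

# Stored templates; '?' marks where request-specific items are plugged in.
QUERY_TEMPLATES = {
    "orders_since": (
        "SELECT order_id, total FROM orders "
        "WHERE customer_id = ? AND order_date >= ? ORDER BY order_id"
    ),
}


class Connector:
    """Applies templated requests to one legacy database connection."""

    def __init__(self, connection):
        self.connection = connection

    def request(self, template_name, *items):
        sql = QUERY_TEMPLATES[template_name]
        return self.connection.execute(sql, items).fetchall()


# In-memory stand-in for a legacy database.
legacy = sqlite3.connect(":memory:")
legacy.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id TEXT, order_date TEXT, total REAL)")
legacy.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "c-17", "2004-07-02", 120.0),
        (2, "c-17", "2004-07-04", 75.5),
        (3, "c-42", "2004-07-04", 19.0),
    ],
)

rows = Connector(legacy).request("orders_since", "c-17", "2004-07-03")
```

  • Because the template text never changes, the same request can be reapplied later with different items, which suits the one-time or periodic updates the connectors provide.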
  • the requests can be stored in the connectors 108 for application and/or reapplication to the respective legacy databases 140 to provide one-time or periodic data store updates.
  • Connectors can use expiration date information to determine which of a plurality of similar data to return to the data store, or if dates are absent, the connectors can mark returned data as being of lower confidence levels.
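  • One possible selection policy is sketched below in Python. The text does not specify the rule, so the policy shown (prefer the unexpired datum with the latest expiration date, and mark undated data as low confidence) and the dictionary field names are assumptions.

```python
from datetime import date


def choose_datum(candidates, today):
    """Pick one of several similar data items using expiration dates.

    Assumed policy: prefer unexpired items, then the latest expiration date;
    if no candidate carries a date, return the first one marked low confidence.
    """
    dated = [c for c in candidates if c.get("expires") is not None]
    if not dated:
        return {**candidates[0], "confidence": "low"}
    unexpired = [c for c in dated if c["expires"] >= today]
    best = max(unexpired or dated, key=lambda c: c["expires"])
    return {**best, "confidence": "normal"}


today = date(2004, 7, 4)
dated_choice = choose_datum(
    [
        {"value": "old address", "expires": date(2004, 6, 30)},
        {"value": "new address", "expires": date(2004, 12, 31)},
    ],
    today,
)
undated_choice = choose_datum([{"value": "unverified address"}], today)
```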
  • Data and other information generated by the databases 140 in response to the requests are routed by connectors to the hologram data store 114. That other information can include, for example, expiry or other adjectival data for use by the data store in caching, purging, updating and selecting data.
  • the messages can be cached by the connectors 108, though, they are preferably immediately routed to the store 114.
  • the hologram data store 114 stores data from the legacy databases 140 (and from the framework server 116, as discussed below) as RDF triples.
  • the data store 114 can be embodied on any digital data processing system or systems that are in communications coupling (e.g., as defined above) with the connectors 108 and the framework server 116.
  • the data store 114 is embodied in a workstation or other high-end computing device with high capacity storage devices or arrays, though, this may not be required for any given implementation.
  • Though the hologram data store 114 may be contained on an optical storage device, this is not the sense in which the term "hologram" is used. Rather, it refers to its storage of data from multiple sources (e.g., the legacy databases 140) in a form which permits that data to be queried and coalesced from a variety of perspectives, depending on the needs of the user and the capabilities of the framework server 116.
  • a preferred data store 114 stores the data from the legacy databases 140 in subject-predicate-object form, e.g., RDF triples, though those of ordinary skill in the art will appreciate that other forms may be used as well, or instead.
  • RDF is a way of expressing the properties of items of data. Those items are referred to as subjects. Their properties are referred to as predicates. And, the values of those properties are referred to as objects.
  • an expression of a property of an item is referred to as a triple, a convenience reflecting that the expression contains three parts: subject, predicate and object.
  • Subjects can be anything that is described by an RDF expression.
  • a subject can be a person, place or thing, though typically only an identifier of the subject is used in an actual RDF expression, not the person, place or thing itself. Examples of subjects might be "car," "Joe," "http://www.metatomix.com."
  • A predicate identifies a property of a subject. According to the RDF specification, this may be any "specific aspect, characteristic, attribute, or relation used to describe a resource." For the three exemplary subjects above, examples of predicates might be "make," "citizenship," "owner."
  • Objects can be literals, i.e., strings that identify or name the corresponding property (predicate). They can also be resources.
  • a given subject may have multiple predicates, each predicate indexing an object.
  • a subject postal zip code might have an index to an object town and an index to an object state, either (or both) index being a predicate URI.
  • RDF triples here, expressed in extensible markup language (XML) syntax.
  • the listing shows only a sampling of the triples in a database 114, which typically would contain tens of thousands or more of such triples.
  • Subjects are indicated within the listing using a "rdf:about” statement.
  • the second line of the listing defines a subject as a resource named "postal://zip#02886.” That subject has predicates and objects that follow the subject declaration.
  • the predicates are expressed in the form <scheme>://<path>#<fragment>, as is evident to those of ordinary skill in the art.
  • the <scheme> for the predicates is "http" and <path> is "www.metatomix.com/postalCode/1.0."
  • the <fragment> portions are <town>, <state>, <country> and <zip>, respectively. It is important to note that the listing is in some ways simplistic in that each of its objects is a literal value. Commonly, an object may itself be another subject, with its own objects and predicates. In such cases, a resource can be both a subject and an object, e.g., an object to all "upstream" resources and a subject to all "downstream" resources and properties. Such "branching" allows for complex relationships to be modeled within the RDF triple framework.
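  • By way of non-limiting illustration, the subject and predicates described above can be sketched as plain (subject, predicate, object) tuples. The subject URI, the predicate namespace and the town value "Warwick" come from the text; the state and country values, and the helper names split_uri and properties_of, are assumptions made only for this sketch.

```python
# Predicate namespace and subject as given in the text.
NS = "http://www.metatomix.com/postalCode/1.0"
SUBJECT = "postal://zip#02886"

# One triple per property of the subject. Each object here is a literal.
triples = [
    (SUBJECT, f"{NS}#town", "Warwick"),
    (SUBJECT, f"{NS}#state", "RI"),      # assumed value, for illustration only
    (SUBJECT, f"{NS}#country", "US"),    # assumed value, for illustration only
    (SUBJECT, f"{NS}#zip", "02886"),
]


def split_uri(uri):
    """Split <scheme>://<path>#<fragment> into (scheme, path, fragment)."""
    scheme, rest = uri.split("://", 1)
    path, _, fragment = rest.partition("#")
    return scheme, path, fragment


def properties_of(subject, all_triples):
    """Map each predicate fragment of a subject to its object value."""
    return {split_uri(p)[2]: o for s, p, o in all_triples if s == subject}


props = properties_of(SUBJECT, triples)
print(props["town"])
```

  • Keying the result by the <fragment> portion alone is a convenience of the sketch; it would collide if two namespaces used the same fragment.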
  • Figure 2 depicts a directed graph composed of RDF triples of the type stored by the illustrated data store 114, here, by way of non-limiting example, triples representing relationships among four companies (id#1, id#2, id#3 and id#4) and between two of those companies (id#1 and id#2) and their employees.
  • subjects and resource-type objects are depicted as oval-shaped nodes; literal-type objects are depicted as rectangular nodes; and predicates are depicted as arcs connecting those nodes.
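  • A minimal sketch of how such a directed graph might be assembled from triples follows. The URIs, predicate names and literal values are hypothetical, as is the fourth tuple element used here to distinguish resource-type objects from literal-type objects.

```python
from collections import defaultdict

# (subject, predicate, object, object_is_resource); all values are hypothetical.
triples = [
    ("company://id#1", "subsidiary", "company://id#2", True),
    ("company://id#1", "name", "Example Holdings", False),
    ("company://id#2", "employee", "person://id#10", True),
    ("person://id#10", "name", "Dave", False),
]


def build_graph(all_triples):
    """Return (arcs, resource_nodes, literal_nodes) for a list of triples."""
    arcs = defaultdict(list)           # subject -> [(predicate, object), ...]
    resource_nodes, literal_nodes = set(), set()
    for s, p, o, is_resource in all_triples:
        resource_nodes.add(s)          # subjects are always resource nodes
        (resource_nodes if is_resource else literal_nodes).add(o)
        arcs[s].append((p, o))         # the predicate labels the arc
    return arcs, resource_nodes, literal_nodes


arcs, resource_nodes, literal_nodes = build_graph(triples)
```

  • In the figure's terms, members of resource_nodes would be drawn as ovals, members of literal_nodes as rectangles, and each (predicate, object) pair as an arc leaving its subject.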
  • Figure 1A depicts an architecture for a preferred hologram data store 114 according to the invention.
  • the illustrated store 114 includes a model document store 114A and a model document manager 114B. It also includes a relational triples store 114C, a relational triples store manager 114D, and a parser 114E interconnected as shown in the drawing.
  • RDF triples maintained by the store 114 are received, from the legacy databases 140 (via connectors 108) and/or from time-based data reduction module 150 (described below), in the form of document objects, e.g., of the type generated from a Document Object Model (DOM) in a JAVA, C++ or other application.
  • these are stored in the model document store 114A as such (i.e., as document objects), particularly using the tables and inter-table relationships shown in Figure 1B (see dashed box labelled 114B).
  • the model document manager 114B manages storage/retrieval of the document object to/from the model document store 114A.
  • the manager 114B comprises the Slide content management and integration framework, publicly available through the Apache Software Foundation. It stores (and retrieves) document objects to (and from) the store 114A in accord with the WebDAV protocol.
  • Those skilled in the art will, of course, appreciate that other applications can be used in place of Slide and that document objects can be stored/retrieved from the store 114A in accord with other protocols, industry-standard, proprietary or otherwise.
  • WebDAV protocol allows for adding, updating and deleting RDF document objects using a variety of WebDAV client tools (e.g., Microsoft Windows Explorer, Microsoft Office, XML Spy or other such tools available from a variety of vendors), in addition to adding, updating and deleting document objects via connectors 108 and/or time-based data reduction module 150.
  • This also allows for presenting the user with a view of a traversable file system, with RDF documents that can be opened directly in XML editing tools or from Java programs supporting WebDAV protocols, or from processes on remote machines via any HTTP protocol on which WebDAV is based.
  • RDF triples received by the store 114 are also stored to a relational database, here, store
  • the triples are divided into their constituent components (subject, predicate, and object), which are indexed and stored to respective tables in the manner of a "hashed with origin" approach.
  • a parser 114E extracts its triples and conveys them to the RDBMS 114D with a corresponding indicator that they are to be added, updated or deleted from the relational database.
  • Such a parser 114E operates in the conventional manner known in the art for extracting triples from RDF documents.
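  • By way of illustration only, a parser of this general kind can be sketched with the Python standard library. The sketch handles just the simplest RDF/XML form (rdf:Description elements whose children carry either literal text or an rdf:resource attribute); the sample document and the function name extract_triples are assumptions, not the listing referred to in the text.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample document using the postal-code namespace from the text.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:pc="http://www.metatomix.com/postalCode/1.0#">
  <rdf:Description rdf:about="postal://zip#02886">
    <pc:town>Warwick</pc:town>
    <pc:zip>02886</pc:zip>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object, object_is_resource) tuples."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join the two parts.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            if resource is not None:
                triples.append((subject, predicate, resource, True))
            else:
                triples.append((subject, predicate, (prop.text or "").strip(), False))
    return triples


parsed = extract_triples(SAMPLE)
```

  • Each extracted tuple could then be handed to the relational store together with an add, update or delete indicator, as described above.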
  • the illustrated database store 114C has five tables interrelated as particularly shown in Figure 1B (see dashed box labelled 114C).
  • these tables rely on indexes generated by hashing the triples' respective subjects, predicates and objects using a 64-bit hashing algorithm based on cyclical redundancy codes (CRCs), though it will be appreciated that the indexes can be generated by other techniques as well, industry-standard, proprietary or otherwise.
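  • The text does not say which 64-bit CRC is used. As one possibility, the following sketch computes a bitwise CRC-64 with the ECMA-182 polynomial (initial value zero, no reflection, no final XOR); the choice of polynomial and the function names are assumptions.

```python
POLY = 0x42F0E1EBA9EA3693          # ECMA-182 generator polynomial (assumed choice)
MASK = 0xFFFFFFFFFFFFFFFF          # keep results within 64 bits


def crc64(data: bytes) -> int:
    """Bitwise, most-significant-bit-first CRC-64 of a byte string."""
    crc = 0
    for byte in data:
        crc ^= byte << 56          # fold the next byte into the top of the register
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc


def index_key(value: str) -> int:
    """Hash a subject, predicate or object string to a 64-bit index key."""
    return crc64(value.encode("utf-8"))


key = index_key("postal://zip#02886")
```

  • A table-driven variant would be faster; the bitwise loop is used here only because it is short and easy to check.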
  • the "triples" table 534 maintains one record for each stored triple.
  • Each record contains an aforementioned hash code for each of the subject, predicate and object that make up the respective triple, along with a resource flag ("resource_flg") indicating whether that object is of the resource or literal type.
  • Each record also includes an aforementioned hash code (“m_hash”) identifying the document object (stored in model document store 114A) from which the triple was parsed, e.g., by parser 114E.
  • the values of the subjects, predicates and objects are not stored in the triples table. Rather, those values are stored in the resources table 530, namespaces table 532 and literals table 536. Particularly, the resources table 530, in conjunction with the namespaces table 532, stores the subjects, predicates and resource-type objects; whereas, the literals table 536 stores the literal-type objects.
  • the resources table 530 maintains one record for each unique subject, predicate or resource-type object. Each record contains the value of the resource, along with its aforementioned 64-bit hash. It is the latter on which the table is indexed. To conserve space, portions of those values common to multiple resources (e.g., common <scheme>://<path> identifiers) are stored in the namespaces table 532. Accordingly the field, "r_value," contained in each record of the resources table 530 reflects only the unique portion (e.g., <fragment> identifier) of each resource.
  • the namespaces table 532 maintains one record for each unique common portion referred to in the prior paragraph (hereinafter, "namespace"). Each record contains the value of that namespace, along with its aforementioned 64-bit hash. As above, it is the latter on which this table is indexed.
  • the literals table 536 maintains one record for each unique literal-type object. Each record contains the value of the object, along with its aforementioned 64-bit hash. Each record also includes an indicator of the type of that literal (e.g., integer, string, and so forth). Again, it is the hash on which this table is indexed.
  • the models table 538 maintains one record for each RDF document object contained in the model document store 114A.
  • Each record contains the URI of the corresponding document object ("uri_string"), along with its aforementioned 64-bit hash ("m_hash"). It is the latter on which this table is indexed.
  • each record of the models table 538 also contains the ID of the corresponding document object in the store 114A. That ID can be assigned by the model document manager 114B, or otherwise.
  • relational triples store 114C is a schema-less structure for storing RDF triples.
  • triples maintained in that store can be reconstituted via an SQL query. For example, to reconstitute the RDF triple having a subject equal to "postal://zip#02886", a predicate equal to "http://www.metatomix.com/postalCode/1.0#town", and an object equal to "Warwick", the following SQL statement is applied:
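  • The original SQL statement is not reproduced in this text. The following is only a sketch of what such a reconstitution could look like, using an in-memory SQLite database. Apart from resource_flg, m_hash, r_value and uri_string, which the description names, every table layout, column name and helper below is an assumption, as are the model document URI and the use of a 32-bit CRC as a stand-in for the 64-bit hash.

```python
import sqlite3
import zlib


def h(value):
    """Stand-in hash: 32-bit CRC (the text describes a 64-bit CRC-based hash)."""
    return zlib.crc32(value.encode("utf-8"))


def split_ns(uri):
    """Split a URI into (namespace, unique part) at the '#' separator."""
    ns, sep, fragment = uri.partition("#")
    return ns + sep, fragment


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE namespaces (ns_hash INTEGER PRIMARY KEY, ns_value TEXT);
CREATE TABLE resources  (r_hash INTEGER PRIMARY KEY, ns_hash INTEGER, r_value TEXT);
CREATE TABLE literals   (l_hash INTEGER PRIMARY KEY, l_value TEXT, l_type TEXT);
CREATE TABLE models     (m_hash INTEGER PRIMARY KEY, uri_string TEXT);
CREATE TABLE triples    (subject INTEGER, predicate INTEGER, object INTEGER,
                         resource_flg INTEGER, m_hash INTEGER);
""")


def add_resource(uri):
    """Store a resource once, with its shared namespace held separately."""
    ns, fragment = split_ns(uri)
    db.execute("INSERT OR IGNORE INTO namespaces VALUES (?, ?)", (h(ns), ns))
    db.execute("INSERT OR IGNORE INTO resources VALUES (?, ?, ?)", (h(uri), h(ns), fragment))
    return h(uri)


def add_literal_triple(subject, predicate, literal, model_uri):
    """Store a triple whose object is a literal (resource_flg = 0)."""
    db.execute("INSERT OR IGNORE INTO models VALUES (?, ?)", (h(model_uri), model_uri))
    db.execute("INSERT OR IGNORE INTO literals VALUES (?, ?, ?)", (h(literal), literal, "string"))
    db.execute("INSERT INTO triples VALUES (?, ?, ?, 0, ?)",
               (add_resource(subject), add_resource(predicate), h(literal), h(model_uri)))


# Join the hash-only triples table back to the value tables.
RECONSTITUTE = """
SELECT sn.ns_value || s.r_value, pn.ns_value || p.r_value, l.l_value
FROM triples t
JOIN resources  s  ON s.r_hash   = t.subject
JOIN namespaces sn ON sn.ns_hash = s.ns_hash
JOIN resources  p  ON p.r_hash   = t.predicate
JOIN namespaces pn ON pn.ns_hash = p.ns_hash
JOIN literals   l  ON l.l_hash   = t.object
WHERE t.subject = ? AND t.predicate = ? AND t.object = ? AND t.resource_flg = 0
"""

SUBJECT = "postal://zip#02886"
PREDICATE = "http://www.metatomix.com/postalCode/1.0#town"
add_literal_triple(SUBJECT, PREDICATE, "Warwick", "models/zip02886.rdf")
row = db.execute(RECONSTITUTE, (h(SUBJECT), h(PREDICATE), h("Warwick"))).fetchone()
```

  • The query looks up the triple by its three hash codes and then concatenates each namespace with its unique portion to rebuild the full subject and predicate values.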
  • RDF documents and, more generally, objects maintained in the store 114 can be contained in other stores (structured relationally, hierarchically or otherwise) as well, in addition to or instead of stores 114A and 114C.
  • time-wise data reduction component 150 comprises an XML parser 504, a query module 506, an analysis module 507 and an output module 508.
  • the component 150 performs a time-wise reduction on data from the legacy databases 140. In some embodiments, that data is supplied to the component 150 by the connectors 108 in the form of RDF documents. In the illustrated embodiment, the component 150 functions, in part, like a connector itself, obtaining data directly from the legacy databases 140 before time-wise reducing it.
  • illustrated component 150 outputs the reduced data in the form of RDF triples contained in RDF documents.
  • these are stored in the model store 114A (and the underlying triples, in relational store 114C), alongside the RDF documents (and their respective underlying triples) from which the reduced data was generated.
  • Module 504 parses an XML file 502 which specifies one or more sources of data to be time-wise reduced. That file may be supplied by the framework server 116, or otherwise.
  • the specified sources may be legacy databases 140, data streams, or otherwise. They may also be connectors 108, e.g., identified by symbolic name, virtual port number, or otherwise.
  • the XML specification file 502 specifies the data items which are to be time-wise reduced. These can be field names, identifiers or otherwise.
  • the XML file 502 further specifies the time periods or epochs over which data is to be time-wise reduced. These can be seconds, minutes, hours, days, months, weeks, years, and so forth, depending on the type of data to be reduced. For example, if the data source contains hospital patient data, the specified epochs may be weeks and months; whereas, if the data source contains web site access data, the specified epochs may be hours and days.
  • the parser component 504 parses the XML file 502 to discern the aforementioned data source identifiers, field identifiers, and epochs. To this end, the parser 504 may be constructed and operated in the conventional manner known in the art.
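  • The layout of the XML file 502 is not given in this text. A minimal sketch, assuming a hypothetical layout in which each source element lists its fields and each field lists its epochs, might read as follows; all element names, attribute names and identifiers are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification file: sources, fields to reduce, and epochs.
SPEC = """<timewise-reduction>
  <source id="web-log">
    <field name="hits">
      <epoch>hour</epoch>
      <epoch>day</epoch>
    </field>
  </source>
  <source id="hospital-db">
    <field name="admissions">
      <epoch>week</epoch>
      <epoch>month</epoch>
    </field>
  </source>
</timewise-reduction>"""


def parse_spec(xml_text):
    """Return a list of (source id, field name, [epochs]) tuples."""
    root = ET.fromstring(xml_text)
    spec = []
    for source in root.findall("source"):
        for field in source.findall("field"):
            epochs = [e.text.strip() for e in field.findall("epoch")]
            spec.append((source.get("id"), field.get("name"), epochs))
    return spec


spec = parse_spec(SPEC)
```

  • The resulting tuples give the query and analysis modules what they need: where to read, which field to read, and over which epochs to summarize it.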
  • the query module 506 generates queries in order to obtain the field specified in the XML specification file 502. It queries the identified data source(s) in the manner appropriate to those sources. For example, the processing module 510 queries SQL-compatible databases using an SQL query. Other data sources are queried via their respective applications program interfaces (APIs), or otherwise. In embodiments where source data is supplied to the component 150 by the connectors 108, querying may be performed explicitly or implicitly by those connectors 108. Moreover, querying might not need to be performed on some data sources, e.g., data streams, from which data is broadcast or otherwise available without the need for request. In such instances, filtering may be substituted for querying in order that the specific fields or other items of data specified in the XML file are obtained.
  • the analysis module 507 compiles time-wise statistics or summaries for each epoch specified in the XML file 502. To this end, it maintains for each such epoch one or more running statistics (e.g., sums or averages) for each data field specified by the file 502 and received from the sources. As data for each field are input, the running statistics for that field are updated. Such updating can include incrementing a count maintained for the field, recomput-
  • the analysis module 507 would maintain a store reflecting the number of hits thus far counted on a given day for that web site (e.g., based on data received from a source identifying each hit as it occurs, or otherwise).
  • When no further data is received from the source for that day, the module generates RDF output (via the output module 508) reflecting that number of counts (or other specified summary information) for output to the hologram store 114.
  • the analysis module 507 would maintain a separate store of counts for the month for which data is currently being received from the source. As above, when no further data is received from the source for that month, the module generates RDF output reflecting the total number of counts (or other specified summary information) for output to the hologram store 114.
  • An analysis module 507 maintains stores for each epoch for which running statistics (i.e., time-wise summaries) are to be maintained.
  • the stores 514 can be allocated from an array, a pointer table or other data structure, with specific allocations made depending on the specific number of running statistics being tracked.
  • an XML file 502 specifies that access statistics are to be maintained for a web site on daily and monthly bases using data from a first data source, and that running statistics for the numbers of visitors to a retail store are to be maintained on monthly and yearly bases from data from a second data source
  • the analysis module 507 can maintain four stores: store 514A maintaining a daily count for the web site; store 514B maintaining a monthly count for the web site; store 514C maintaining a monthly count for the retail store; and store 514D maintaining a yearly count for the retail store.
  • Each of the stores 514 is updated as corresponding data is received from the respective data sources.
  • a count maintained in the first store 514A is incremented.
  • the output module 508 can generate one or more RDF triples reflecting a count for the (then-complete) prior day for storage in the hologram store 114.
  • the store 514A can be reset to zero and the process restarted for tracking accesses on that succeeding day.
  • the second store 514B, i.e., that tracking the longer epoch for data from the first source, can be incremented in parallel with the first store 514A as web access data is received from the source or, alternatively, can be updated when the first store 514A is rolled over, i.e., reset for tracking statistics for each successive day.
  • RDF triples can be generated to reflect web access statistics for the then-completed prior month, concurrently with zeroing the second store 514B for tracking of statistics for the succeeding month.
  • the analysis module 507 maintains running statistics for the epochs specified in the XML file 502, outputting RDF triples reflecting those statistics as data for each successive epoch is received.
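  • The rollover behavior described above can be sketched as follows. The sketch assumes that data arrive in date order and treats the arrival of a datum from a later epoch as the signal that the prior epoch is complete; the class name EpochCounter, the example.org URIs and the hitCount predicate are hypothetical.

```python
from datetime import date


class EpochCounter:
    """Running count for one epoch granularity ('day' or 'month')."""

    def __init__(self, subject, epoch):
        self.subject, self.epoch = subject, epoch
        self.current_key = None      # identifies the epoch now being counted
        self.count = 0

    def _key(self, day):
        return day.isoformat() if self.epoch == "day" else day.strftime("%Y-%m")

    def add(self, day):
        """Count one event; return a triple for the completed epoch on rollover."""
        key = self._key(day)
        emitted = None
        if self.current_key is not None and key != self.current_key:
            emitted = self.flush()
        self.current_key = key
        self.count += 1
        return emitted

    def flush(self):
        """Emit a triple for the current epoch and reset the store to zero."""
        triple = (f"{self.subject}/{self.epoch}/{self.current_key}",
                  "http://example.org/stats#hitCount", str(self.count))
        self.count = 0
        return triple


daily = EpochCounter("http://example.org/site", "day")
monthly = EpochCounter("http://example.org/site", "month")
emitted = []
for hit in [date(2004, 1, 30), date(2004, 1, 30), date(2004, 1, 31), date(2004, 2, 1)]:
    for counter in (daily, monthly):     # both stores are incremented in parallel
        triple = counter.add(hit)
        if triple:
            emitted.append(triple)
```

  • Here the monthly store is incremented in parallel with the daily store, which is one of the two alternatives the text describes; the other would add the daily total to the monthly store each time the daily store rolls over.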
  • running statistics may be maintained in other ways, as well. For example, continuing the above example, in instances where data received from the first source is not received ordered by day (but, rather, is intermingled with respect to many days), multiple stores can be maintained — one for each day (or other epoch).
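  • Where data arrive intermingled across days, one store per day can be approximated with a keyed counter, as in the short sketch below. The sample dates are invented, and deciding when a given day's store is complete is left outside the sketch.

```python
from collections import Counter
from datetime import date

# Hits arriving out of date order (hypothetical sample data).
hits = [date(2004, 1, 31), date(2004, 1, 30), date(2004, 1, 31), date(2004, 2, 1)]

per_day = Counter()                      # one running count per day
for hit in hits:
    per_day[hit] += 1

# Longer epochs can be derived from the per-day stores.
per_month = Counter()
for day, count in per_day.items():
    per_month[(day.year, day.month)] += count
```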
  • the output module 508 generates RDF documents reflecting the summarized data stored in stores 514 for output to the hologram data store 114.
  • This can be performed by generating an RDF stream ad hoc or, preferably, by utilizing native commands, e.g., of the Java programming language, to gather the epoch data into a document object model (DOM).
  • the DOM can be output in RDF format to the hologram store 114 directly.
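  • The text contemplates Java DOM commands; the following sketch shows the same idea with the DOM implementation in the Python standard library. The predicate namespace, the dailyHits property, the subject URI and the function name summaries_to_rdf are assumptions.

```python
from xml.dom.minidom import getDOMImplementation

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
STATS_NS = "http://example.org/stats#"    # hypothetical predicate namespace


def summaries_to_rdf(subject, summaries):
    """Gather epoch summaries into a DOM and serialize it as RDF/XML text."""
    doc = getDOMImplementation().createDocument(RDF_NS, "rdf:RDF", None)
    root = doc.documentElement
    # minidom does not emit namespace declarations itself, so add them by hand.
    root.setAttribute("xmlns:rdf", RDF_NS)
    root.setAttribute("xmlns:st", STATS_NS)
    desc = doc.createElementNS(RDF_NS, "rdf:Description")
    desc.setAttribute("rdf:about", subject)
    root.appendChild(desc)
    for name, value in summaries.items():
        prop = doc.createElementNS(STATS_NS, f"st:{name}")
        prop.appendChild(doc.createTextNode(str(value)))
        desc.appendChild(prop)
    return doc.toxml()


rdf_text = summaries_to_rdf("http://example.org/site/day/2004-01-30", {"dailyHits": 2})
```

  • Each entry in the summaries mapping becomes one property element, i.e., one triple whose subject is the epoch being summarized.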
  • the data store 114 supports SQL-like query languages called HxQL and HxML. These allow retrieval of RDF triples matching defined criteria.
  • the data store 114 includes a graph generator (not shown) that uses RDF triples to generate directed graphs in response to queries (e.g., in HxQL or HxML form) from the framework server 116. These may be queries for information reflected by triples originating from data in one or more of the legacy databases 140 (one example might be a request for the residence cities of hotel guests who booked reservations on account over Independence Day weekend, as reflected by data from an e-Commerce database and an Accounts Receivable database).
  • the data store 114 utilizes genetic, self-adapting algorithms to traverse the RDF triples in response to queries from the framework server 116. Though not previously known in the art for this purpose, such techniques can be beneficially applied to the RDF database which, due to its inherently flexible (i.e., schema-less) structure, is not readily searched using traditional search techniques.
  • the data store utilizes a genetic algorithm that performs several searches, each utilizing a different methodology but all based on the underlying query from the framework server, against the RDF triples. It compares the results of the searches quantitatively to discern which produce(s) the best results and reapplies that search with additional terms or further granularity.
  • the framework server 116 generates requests to the data store 114 (and/or indirectly to the legacy databases via connectors 108, as discussed above) and presents information therefrom to the user via browser 118.
  • the requests can be based on
  • the framework server includes one or more user interface modules, plug-ins, or the like, each for generating queries of a particular nature.
  • One such module for example, generates queries pertaining to marketing information, another such module generates queries pertaining to financial information, and so forth.
  • queries to the data store are structured on a SQL-based RDF query language, in the general manner of SquishQL, as known in the art.
  • In addition to generating queries, the framework server (and/or the aforementioned modules) "walks" directed graphs generated by the data store 114 to present to the user (via browser 118) any specific items of requested information. Such walking of the directed graphs can be accomplished via any conventional technique known in the art. Presentation of questions, dialog boxes or other user-input controls to the user and, likewise, presentation of responses thereto based on the directed graph can be accomplished via conventional server/browser or other user interface technology.
  • the framework server 116 permits a user to update data stored in the data store 114 and, thereby, that stored in the legacy databases 140.
  • changes made to data displayed by the browser 118 are transmitted by server 116 to data store 114.
  • any triples implicated by the change are updated in store 114C, as are the corresponding RDF document objects in store 114A.
  • An indication of these changes can be forwarded to the respective legacy databases 140, which utilize the corresponding API (or other interface mechanisms) to update their respective stores.
  • changes made directly to the store 114C as discussed above, e.g., using a WebDAV client, can be forwarded to the respective legacy database.
  • the server 116 can present to the user not only data from the data store 114, but also data gleaned by the server directly from other sources.
  • the server 116 can directly query an enterprise web site for statistics regarding web page usage, or otherwise.
  • a method of time-wise data reduction and storage comprising
  • a method of time-wise data reduction and storage comprising
  • RDF triples to one or more data stores, along with further RDF triples characterizing the data from which the summaries were generated, where the one or more data stores include any of a hierarchical data store and a relational data store.
  • the method of claim 7, comprising parsing an XML file that identifies one or more of the data sources, one or more fields thereof to be summarized, and/or one or more epochs for which those fields are to be summarized.
  • a method of time-wise data reduction and storage comprising
  • the method of claim 13, comprising parsing an XML file that identifies one or more of the data sources, one or more fields thereof to be summarized, and/or one or more epochs for which those fields are to be summarized.
  • the invention provides methods of time-wise data reduction that include the steps of inputting data from a source; summarizing that data according to one or more selected epochs in which it belongs; and generating for each such selected epoch one or more RDF triples characterizing the summarized data.
  • the data source may be, for example, a database, a data stream or otherwise.
  • the selected epoch may be a second, minute, hour, week, month, year, or so forth.
  • the triples may be output in the form of RDF document objects. These can be stored, for example, in a hierarchical data store such as, for example, a WebDAV server. Triples parsed from the document objects may be maintained in a relational store that is organized, for example, according to a hashed with origin approach.
  • the invention pertains to digital data processing and, more particularly, to methods and apparatus for identifying subsets of related data in a data set.
  • the invention has application, for example, in enterprise business visibility and insight using real-time reporting tools.
  • (ERP) system tracking inventory might be two or three years old. Integration between these systems is difficult at best, consuming specialized programming skill and constant maintenance expenses.
  • a major impediment to enterprise business visibility is the consolidation of these disparate legacy databases with one another and with newer databases.
  • inventory on-hand data gleaned from a legacy ERP system may be difficult to combine with customer order data gleaned from web servers that support e-commerce (and other web-based) transactions. This is not to mention difficulties, for example, in consolidating resource scheduling data from the ERP system with the forecasting data from the marketing database system.
  • An object of this invention is to provide improved methods and apparatus for digital data processing and, more particularly, for identifying subsets of related data in a data set.
  • a related object is to provide such methods and apparatus as facilitate enterprise business visibility and insight.
  • a further object is to provide such methods and apparatus as can rapidly identify subsets of related data in a data set, e.g., in response to user directives or otherwise.
  • a further object of the invention is to provide such methods and apparatus as can be readily and inexpensively implemented.
  • a "first" step (though the steps are not necessarily executed in sequential order) includes identifying (or marking) as related any data expressly satisfying a criteria (e.g., specified by a user).
  • a “second” step includes identifying as related ancestors of any data identified as related, e.g., in the first step, unless that ancestor conflicts with the criteria.
  • a "third" step of the method is identifying descendents of any data identified, e.g., in the prior steps, unless that descendent conflicts with the criteria or has a certain relationship with the ancestor from which it descends.
  • the method generates, e.g., as output, an indication of each of the nodes identified as related in these steps.
  • criteria are specific to the types of data in the data set and can be more complex, including for example, Boolean expressions and operators, wildcards, and so forth.
  • the method "walks" up the directed graph from each node identified as related in the first step (or any of the steps) to find ancestor nodes. Each of these is identified as related unless it conflicts with the criteria.
  • the second step marks as related a second, parent triple whose object is the subject of the first triple, unless that second (or parent) triple otherwise conflicts with the criteria, e.g., has another object specifying that Dave is the CTO.
  • the method walks down the directed graph from each node identified in the previously described steps (or any of the steps) to find descendent nodes.
  • Each of these is identified as related unless (i) it conflicts with the criteria or (ii) its relationship with the ancestor from which walking occurs is of the same type as the relationship that ancestor has with a child, if any, from which the ancestor was identified by operation of the second step.
  • the third step marks as related a third, descendent triple whose subject is the object of the second, parent triple, unless that descendent triple conflicts with the criteria (e.g., has a predicate-object pair specifying that Dave is the CTO) or unless its relationship with the parent triple is also defined by a predicate relationship of type "Subsidiary.”
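  • The three steps can be sketched as follows, under stated assumptions. The sketch runs the steps serially rather than in parallel, represents arcs as (source, predicate, target) tuples, and treats a node as conflicting when it carries a criteria attribute with a different value. The node names, the employee, headquarters and customer predicates, and the helper names are hypothetical; only the "Subsidiary" relationship and the name "Dave" are taken from the text.

```python
from collections import defaultdict, deque


def status(node, attrs, criteria):
    """Classify a node as 'match', 'conflict' or 'neutral' against the criteria."""
    props = attrs.get(node, {})
    if any(k in props and props[k] != v for k, v in criteria.items()):
        return "conflict"
    if all(props.get(k) == v for k, v in criteria.items()):
        return "match"
    return "neutral"


def find_related(edges, attrs, criteria):
    parents, children, nodes = defaultdict(list), defaultdict(list), set()
    for src, pred, dst in edges:
        parents[dst].append((src, pred))
        children[src].append((dst, pred))
        nodes.update((src, dst))

    # First step: nodes expressly satisfying the criteria.
    related = {n for n in nodes if status(n, attrs, criteria) == "match"}

    # Second step: walk up to ancestors, skipping those that conflict.
    up_types = defaultdict(set)   # ancestor -> arc types by which it was reached
    queue = deque(related)
    while queue:
        node = queue.popleft()
        for parent, pred in parents[node]:
            if status(parent, attrs, criteria) == "conflict":
                continue
            up_types[parent].add(pred)
            if parent not in related:
                related.add(parent)
                queue.append(parent)

    # Third step: walk down, skipping conflicts and arcs of the same type
    # as the one by which the ancestor was itself identified.
    queue = deque(related)
    while queue:
        node = queue.popleft()
        for child, pred in children[node]:
            if child in related or pred in up_types[node]:
                continue
            if status(child, attrs, criteria) == "conflict":
                continue
            related.add(child)
            queue.append(child)
    return related


edges = [
    ("company:1", "Subsidiary", "company:2"),
    ("company:1", "Subsidiary", "company:3"),
    ("company:1", "headquarters", "city:boston"),
    ("company:2", "employee", "person:dave"),
    ("company:2", "employee", "person:ann"),
    ("company:2", "customer", "company:4"),
]
attrs = {"person:dave": {"name": "Dave"}, "person:ann": {"name": "Ann"}}
related = find_related(edges, attrs, {"name": "Dave"})
```

  • In this example the sibling subsidiary company:3 is excluded because its arc from company:1 is of the same "Subsidiary" type by which company:1 was reached, and person:ann is excluded because her arc from company:2 is of the same type by which company:2 was reached (she also conflicts with the criteria).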
  • the data are defined by RDF triples and the nodes by subjects (or resource-type objects) of those triples.
  • the data and nodes are of other data types, including, for example, meta directed graph data (of the type defined in one of the aforementioned incorporated-by-reference applications) where a node represents a plurality of subjects each sharing a named relationship with a plurality of objects represented by a node.
  • Still further aspects of the invention provide methods as described above in which the so-called first, second and third steps are executed in parallel, e.g., as by an expert system rule-engine. In other aspects, the steps are executed in series and/or iteratively.
  • the invention provides methods for identifying related data in a directed graph by exercising only the first and second aforementioned steps. Other aspects provide such methods in which only the first and third such steps are exercised.
  • Still other aspects of the invention provide methods as described above in which the directed graph is made up of, at least in part, a data flow, e.g. of the type containing transactional or enterprise data.
  • Related aspects provide such methods in which the steps are executed on a first portion of a directed graph and, then, separately on a second portion of the directed graph, e.g., as where the second portion reflects updates to a data set represented by the first portion.
  • Figure 1 is a block diagram of a system according to the invention for identifying related data in a data set
  • Figure 2 depicts a data set suitable for processing by methods and apparatus according to the invention
  • Figures 3-5 depict operation of the system of Figure 1 on the data set of Figure 2 with different criteria.
  • Figure 1 depicts a system 8 according to the invention for identifying and/or generating (collectively, "identifying") a subset of a directed graph, namely, that subset matching or related to a criteria.
  • the embodiment (and, more generally, the invention) is suited for use inter alia in generating subsets of RDF data sets consolidated from one or more data sources, e.g., in the manner described in the following copending, commonly assigned application, the teachings of which are incorporated herein by reference
  • the embodiment (and, again, more generally, the invention) is also suited inter alia for generating subsets of "meta" directed graphs of the type described in copending, commonly assigned United States Patent Application Serial No. 10/138,725, filed May 3, 2002, entitled "Methods And Apparatus for Visualizing Relationships Among Triples of Resource Description Framework (RDF) Data Sets," the teachings of which are incorporated herein by reference.
  • the illustrated system 8 includes a module 12 that executes a set of rules 18 with respect to a set of facts 16 representing criteria in order to generate a subset 20 of a set of facts 10 representing an input data set, where that subset 20 represents those input data facts that match the criteria or are related thereto.
  • the set of facts 16 representing criteria are referred to as "criteria” or "criteria 16”
  • the set of facts 10 representing data are referred to as "data” or “data 10.”
  • the illustrated system 8 is implemented on a general- or special-purpose digital data processing system, e.g., a workstation, server, mainframe or other digital data processing system of the type conventionally available in the marketplace, configured and operated in accord with the teachings herein.
  • the digital data processing system can be coupled for communication with other such devices, e.g., via a network or otherwise, and can include input/output devices, such as a keyboard, pointing device, display, printer and the like.
  • Illustrated module 12 is an executable program (compiled, interpreted or otherwise) embodying the rules 18 and operating in the manner described herein for identifying subsets of directed graphs.
  • module 12 is implemented in Jess (Java Expert System Shell), a rule-based expert system shell, commercially available from Sandia National Laboratories. However it can be implemented using any other "expert system" engine, if-then-else network, or other software, firmware and/or hardware environment (whether or not expert system-based) suitable for adaptation in accord with the teachings hereof.
  • the module 12 embodies the rules 18 in a network representation 14, e.g., an if-then-else network, or the like, native to the Jess environment.
  • the network nodes are preferably executed so as to effect substantially parallel operation of the rules 18, though they can be executed so as to effect serial and/or iterative operation as well or in addition.
  • the rules are represented in accord with the specifics of the corresponding engine, if-then-else network, or other software, firmware and/or hardware environment on which the embodiment is implemented. These likewise preferably effect parallel execution of the rules 18, though they may effect serial or iterative execution instead or in addition.
  • the data set 10 is a directed graph, e.g., a collection of nodes representing data .and directed arcs connecting nodes to one another.
  • a node at the source of an arc is referred to as an "ancestor” (or “direct ancestor"), while the node at the target of. he arc is referred to herein as a "descendent" (or “direct descendent”).
  • each arc has an associated type or name, e.g., in the manner of predicates of RDF triples — which, themselves, constitute and/or form directed graphs.
  • the data set 10 can comprise data structures representing a meta directed graph of the type disclosed in copending, commonly assigned United States Patent Application Serial No. 10/138,725, filed May 3, 2002, entitled "Methods And Apparatus for Visualizing Relationships Among Triples of Resource Description Framework (RDF) Data Sets," e.g., at Figures 4A - 6B and accompanying text, all of which are incorporated herein by reference.
  • the data set 10 can comprise RDF triples of the type conventionally known in the art and described, for example, in Resource Description Framework (RDF) Model and Syntax Specification (February 22, 1999).
  • Those items are referred to as subjects or resources. Their properties are referred to as predicates. And, the values of those properties are referred to as objects.
  • an expression of a property of an item is referred to as a triple, a convenience reflecting that the expression contains three parts: subject, predicate and object.
  • Subjects can be anything that is described by an RDF expression.
  • a predicate identifies a property of a subject.
  • An object gives a "value" of a property.
  • Objects can be literals, i.e., strings that identify or name the corresponding property (predicate). They can also be resources.
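The subject, predicate and object structure described above can be sketched in a few lines of Python. This is only an illustration: the `Triple` class, its `obj_is_resource` flag and the helper functions are hypothetical names, not part of the RDF specification or of the described system. The sample values are the company triples used in the worked example later in the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str
    # Hypothetical flag: True when the object is itself a resource,
    # False when it is a literal string.
    obj_is_resource: bool = False


# Sample values taken from the company data set discussed in the text.
TRIPLES = [
    Triple("company://id#1", "employee", "Howard"),
    Triple("company://id#1", "employee", "Alan"),
    Triple("company://id#2", "CTO", "Colin"),
    Triple("company://id#3", "customer", "company://id#1", obj_is_resource=True),
]


def properties_of(subject, triples):
    """Return (predicate, object) pairs that describe the given subject."""
    return [(t.predicate, t.obj) for t in triples if t.subject == subject]


def literal_objects(triples):
    """Return the objects that are literals rather than resources."""
    return [t.obj for t in triples if not t.obj_is_resource]
```

Calling `properties_of("company://id#1", TRIPLES)` collects every property expressed for that subject, which mirrors how a set of triples sharing a subject describes one item.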
  • the data set 10 may be stored on disk for input to module 12.
  • the data set may be a data flow, e.g., a stream of data (real-time or otherwise) originating from e-commerce, point-of-sale or other transactions or sources (whether or not business- or enterprise-oriented).
  • the data set may comprise multiple parts, each operated on by module 12 at different times — for example, a first part representing a database and a second part representing updates to that database.
  • Criteria 16 contains expressions including, for example, literals, wildcards, Boolean operators and so forth, against which nodes in the data set are tested.
  • the criteria can specify subject, predicate and/or object values or other attributes.
  • other appropriate values and attributes may be specified.
  • Criteria can be input by a user, e.g., from a user interface, e.g., on an ad hoc basis. Alternatively or in addition, they can be stored and re-used, such as where numerous data sets exist to which the same criteria are applied. Further, the criteria 16 can be generated dynamically, e.g., via other software (or hardware) applications.
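As a rough sketch of how criteria with literals, wildcards and Boolean operators might be tested against triples, the Python below uses shell-style patterns from the standard `fnmatch` module. The dictionary layout of the criteria and the names `matches` and `any_of` are assumptions made for illustration; the text does not prescribe a criteria syntax.

```python
from fnmatch import fnmatchcase

# Order of the fields in a (subject, predicate, object) tuple.
FIELDS = ("subject", "predicate", "object")


def matches(triple, criteria):
    """True if every field named in criteria matches the triple.

    criteria maps a field name to a shell-style pattern ('*' is a wildcard).
    A field that is absent from criteria matches anything.
    """
    return all(
        fnmatchcase(value, criteria.get(name, "*"))
        for name, value in zip(FIELDS, triple)
    )


def any_of(*criteria_list):
    """Boolean OR: build a test that passes if any one criteria matches."""
    return lambda triple: any(matches(triple, c) for c in criteria_list)
```

For example, `matches(("company://id#1", "employee", "Alan"), {"predicate": "employee", "object": "Al*"})` tests a literal predicate together with a wildcard object. Stored criteria could be kept as plain dictionaries and re-applied to several data sets.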
  • Rules 18 define the tests for identifying data in the data set 20 that match the criteria or that are related thereto. These are expressed in terms of the types and values of the data items as well as their interrelationships or connectedness.
  • triple's object is a resource, identify triple as related if triple's predicate matches that specified in criteria, if any, and if triple's object matches that specified in criteria.
  • a triple whose object is the subject of another triple is deemed a direct ancestor of that other triple; a triple whose subject is the object of another triple is deemed a direct descendent of that other triple.
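The direct ancestor and direct descendent relationships between triples, as defined above, can be expressed compactly. The sketch below assumes triples are plain `(subject, predicate, object)` tuples; the function names are illustrative only.

```python
def direct_ancestors(triple, triples):
    """Triples whose object is the subject of the given triple."""
    subject = triple[0]
    return [t for t in triples if t[2] == subject]


def direct_descendents(triple, triples):
    """Triples whose subject is the object of the given triple."""
    obj = triple[2]
    return [t for t in triples if t[0] == obj]


# Sample values taken from the company data set discussed in the text.
TRIPLES = [
    ("company://id#1", "employee", "Howard"),
    ("company://id#1", "employee", "Alan"),
    ("company://id#2", "CTO", "Colin"),
    ("company://id#3", "customer", "company://id#1"),
]
```

With this data, the `customer` triple of company://id#3 is a direct ancestor of both `employee` triples of company://id#1, because its object is their subject.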
  • material while for other uses (and/or embodiments) differences with respect to suffix, case and/or tense are immaterial. Those skilled in the art will appreciate that for other uses and/or embodiments, factors other than suffix, case and/or tense may be used in determining materiality or lack thereof.
  • the related data 20 output or otherwise generated by module 12 represents those nodes or triples identified as "related" during exercise of the rules.
  • the data 20 can be output in the same form as the input data or some alternate form, e.g., pointers or other references to identified data within the data set 10. In some embodiments, it can be displayed via a user interface or printed, or digitally communicated to further applications for additional processing, e.g., via a network or the Internet. In one non-limiting example, the related data 20 can be used to generate mailings or to trigger message events.
  • the module 12 is loaded with rules 18.
  • this is accomplished via compilation of source code embodying those rules (expressed above in pseudo code) in the native or appropriate language of the expert system engine or other environment in which the module is implemented. See, step A.
  • rules in source code format can be retrieved at run time and interpreted instead of compiled.
  • the criteria 16 is then supplied to the module 12. See, step B. These can be entered by an operator, e.g., via a keyboard or other input device. Alternatively, or in addition, they can be retrieved from disk or input from another application (e.g., a messaging system) or device, e.g., via network, interprocess communication or otherwise.
  • the data set 10 is applied to the module 12 in step C.
  • the data set 10 can be as described above, to wit, a RDF data set or other directed graph stored in a data base or contained in a data stream, or otherwise.
  • the data set can be applied to the module 12 via conventional techniques known in the art, e.g., retrieval from disk, communication via network, or via any other technique capable of communicating a data set to a digital application.
  • step D the module 12 uses the rules 18 to apply the criteria 16 to the data set 10.
  • this step is executed via the network 14 configured (via the rules engine) in accord with the rules.
  • this step is executed via the corresponding internal representation of those rules.
  • Triples (in the case of RDF data sets) or data (in the case of data sets comprising other types of directed graphs) identified by the module as "related" (meaning, in the context hereof, that those triples match the criteria or are related thereto) are output as "identified data" in Step D.
  • the output can be a list or other tabulation of identified data 20, or it can be a pointer or reference to that data, for example, a reference to a location within the data set 10.
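The sequence of steps (load rules, supply criteria, apply the data set, output identified data) can be sketched as a loop that reapplies the rules until no new triples are identified. This is a deliberately simplified illustration, not the Jess/Rete implementation the text describes: it runs the rules serially, it includes only a bare criteria rule and ancestor rule, and it omits the sibling and descendent rules and their exclusions. All function names are hypothetical.

```python
def criteria_rule(triples, criteria, related):
    """Identify triples whose predicate and object match the criteria.

    A field missing from criteria is treated as matching anything.
    """
    return {
        t for t in triples
        if t[1] == criteria.get("predicate", t[1])
        and t[2] == criteria.get("object", t[2])
    }


def ancestor_rule(triples, criteria, related):
    """Identify triples whose object is the subject of a related triple."""
    return {a for a in triples for t in related if a[2] == t[0]}


def run_module(rules, criteria, triples):
    """Apply the rules repeatedly until no further triples are identified."""
    related = set()
    while True:
        found = set()
        for rule in rules:
            # Each rule sees what earlier rules found in this pass.
            found |= rule(triples, criteria, related | found)
        if found <= related:
            return related  # fixed point: nothing new was identified
        related |= found


# Sample values taken from the company data set discussed in the text.
TRIPLES = [
    ("company://id#1", "employee", "Howard"),
    ("company://id#1", "employee", "Alan"),
    ("company://id#2", "CTO", "Colin"),
    ("company://id#3", "customer", "company://id#1"),
]

IDENTIFIED = run_module(
    [criteria_rule, ancestor_rule],
    {"predicate": "employee", "object": "Alan"},
    TRIPLES,
)
```

Here `IDENTIFIED` holds the triples themselves; an implementation could equally return indexes or other references into the data set, as the text notes.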
  • the output of identified data 20 can be stored for future use, e.g., for use with a mail-merge or other applications. In other embodiments, it can be digitally communicated to other data base systems or information repositories. Still further, in some embodiments, it can be added to a data base containing other related data, or even replace portions of that data base.
  • a conventional data format e.g., XML
  • Figure 2 is a graphical depiction of this directed graph, i.e., RDF data set.
  • subjects and resource-type objects are depicted as oval-shaped nodes; literal-type objects are depicted as rectangular nodes; and predicates are depicted as arcs connecting those nodes.
  • Figure 3 depicts application by module 12 of criteria on the data set shown in Figure 2 using the above-detailed rules, specifically, those of the RDF type.
  • the depiction is simplified insofar as it shows execution of the rules serially: in practice, a preferred module 12 implemented in a rules engine (such as Jess) executes the rules in accord with the engine's underlying algorithm (e.g., a Rete algorithm as disclosed by Forgy, "Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem," Artificial Intelligence, 19 (1982) 17-37, by http://herzberg.ca.sandia.gov/jess/docs/52/rete.html; or other underlying algorithm).
  • the depiction shows successive identification of triples as "related" (i.e., matching the criteria or related thereto) as each rule is applied or re-applied.
  • the illustrated sequence proceeds from left-to-right then top-to-bottom, as indicated by the dashed-line arrows.
  • the data set is depicted in abstract in each frame, i.e., by a small directed graph of identical shape as that of Figure 2, but without the labels. Triples identified as related are indicated in black.
  • the module applies the Sibling Rule to find triples at the same level as the one(s) previously identified by the Criteria Rule.
  • the company://id#1 — employee — Howard and company://id#1 — employee — Alan triples are identified and marked accordingly.
  • the module applies the Ancestor Rule to walk up the directed graph to find ancestors of the triples previously identified as related.
  • the company://id#3 — customer — company://id#1 triple is identified and marked accordingly.
  • the module applies the Descendent Rule to walk down the directed graph to find descendents of the triples previously identified as related. No triples are selected since both company://id#3 — customer — company://id#2 and company://id#3 — customer — company://id#4 share the same predicate as company://id#3 — customer — company://id#1.
  • company://id#2 is a direct descendent that has a predicate (to wit, customer) connecting it with its identified direct ancestor (to wit, company://id#3) which matches a predicate that ancestor (to wit, company://id#3) has with a direct descendent (to wit, company://id#1) via which that direct ancestor (to wit, company://id#3) was identified during the execution of the Ancestor Rule.
  • the module 12 reapplies the rules, this time beginning with a Criteria Rule match of company://id#2 — CTO — Colin. In frames 9-12, the module 12 finds no further matches upon reapplication of the rules.
  • Figure 4 is frame two.
  • application of the Sibling Rule by module 12 does not result in identification of all of the siblings of company://id#1 — employee — Alan (which had been identified as relevant in the prior execution of the Criteria Rule). This is because one of the siblings, company://id#1 — employee — Howard, has the same predicate as that specified in the criteria. Accordingly, that triple is not identified or marked as related.
  • the identifications effected by specification of a resource as a criteria.
  • module 12 can likewise apply the rules to data sets representing the meta directed graphs disclosed in copending, commonly assigned United States Patent Application Serial No. 10/138,725, filed May 3, 2002, entitled "Methods And Apparatus for Visualizing Relationships Among Triples of Resource Description Framework (RDF) Data Sets," the teachings of which are incorporated herein by reference.
  • a method for identifying related data in a directed graph comprising:
  • identified descendent identifying as related data (hereinafter “identified descendent”) that is a direct descendent of data (hereinafter “identified ancestor”) identified as related in any of sub-steps (i), (ii) and (iii), and which identified descendent
  • step (A) B. generating an indication of data identified as related in step (A).
  • sub-step (ii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of
  • sub-step (iii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of
  • step (A) further comprising executing any of the sub-steps of step (A) using a rule-based engine.
  • step (A) executing step (A) with respect to a first data set representing a first portion of the directed graph
  • step (A) separately with respect to a second data set representing a second portion of the directed graph.
  • a method for identifying related data in a directed graph comprising:
  • step (A) B. generating an indication of data identified as related in step (A).
  • sub-step (ii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of
  • a method for identifying related data in a directed graph comprising:
  • identified descendent identifying as related data (hereinafter "identified descendent") that is a direct descendent of data (hereinafter "identified ancestor") identified in any of sub-steps (i) and (ii), and which identified descendent (a) does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria if any, and
  • (c) does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified as related.
  • sub-step (ii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of
  • step (A) executing step (A) with respect to a first data set representing a first portion of the directed graph
  • step (A) separately with respect to a second data set representing a second portion of the directed graph.
  • identified descendent identifying as related a triple (hereinafter "identified descendent") that is a direct descendent of triple (hereinafter "identified ancestor") identified as related in any of sub-steps (i), (ii) and (iii), and which identified descendent
  • step (A) B. generating an indication of triples identified as related in step (A).
  • sub-step (iii) includes comparing at least one of the predicate and object specified in the criteria with the identified descendent in order to determine whether the identified descendent ancestor is in substantial conflict with the criteria.
  • step (A) any of serially, in parallel, or recursively.
  • step (A) The method of claim 21, further comprising executing any of the sub-steps of step (A) using a rule-based engine.
  • step (A) executing step (A) with respect to a first data set of RDF triples
  • step (A) separately with respect to a second, related data set of RDF triples.
  • a method for identifying related triples in a resource description framework (RDF) data set comprising
  • step (A) B. generating an indication of data identified as related in step (A).
  • sub-step (ii) includes comparing at least one of the predicate and object specified in the criteria with the direct ancestor in order to determine whether the direct ancestor is in substantial conflict with the criteria.
  • a method for identifying related triples in a resource description framework (RDF) data set comprising
  • identified descendent identifying as related data (hereinafter “identified descendent”) that is a direct descendent of data (hereinafter “identified ancestor”) identified as related in any of sub-steps (i) and (ii), and which identified descendent
  • (b) is not in substantial conflict with the criteria; (c) is not associated with the identified ancestor via a predicate matching a predicate by which the identified ancestor is associated with a triple, if any, as a result of which the identified ancestor was identified as related,
  • step (A) B. generating an indication of data identified as related in step (A).
  • sub-step (iii) includes comparing at least one of the predicate and object specified in the criteria with the identified descendent in order to determine whether the identified descendent ancestor is in substantial conflict with the criteria.
  • step (A) executing step (A) with respect to a first data set of RDF triples
  • step (A) separately with respect to a second, related data set of RDF triples.
  • a method for identifying related data in a directed graph comprising:
  • identified descendent identifying as related data (hereinafter “identified descendent”) that is a direct descendent of data (hereinafter “identified ancestor”) identified as related in any of sub-steps (i) and (ii) and which identified descendent (a) does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria, if any, and
  • step (c) does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified during execution of sub-step (ii), generating an indication of data identified as related in step (A).
  • a "first" (or marking) as related data expressly identified as related, e.g., in the first descendents of any data identified, relationship with the ancestor from which it identified as related in the three steps.

Abstract

Systems and methods according to the invention provide a surveillance, monitoring and real-time events platform to (i) enable the integration and communication of information between government agencies and organizations specifically tasked with ensuring the security and safety of our nation and its communities, (ii) to integrate information systems from federal, state and/or local agencies (from disparate data sources if necessary) in order to obtain a single, real-time view of the entire organization, and (iii) to extract more complete, actionable information from their existing systems, thereby dramatically improving decision making speed and accuracy.

Description

SURVEILLANCE, MONITORING AND REAL-TIME EVENTS PLATFORM
Background of the Invention
This application claims the benefit of priority of United States Provisional Patent Application Serial No. 60/485,200, filed July 7, 2003, entitled "Surveillance, Monitoring and Real-Time Events Platform," the teachings of which are incorporated herein by reference. This application is a continuation in part of and claims the benefit of priority of the following copending, commonly-assigned patent applications, the teachings of all of which are incorporated herein by reference: United States Patent Application Serial No. 10/680,049, filed October 7, 2003, entitled "Methods and Apparatus for Identifying Related Nodes in a Directed Graph Having Named Arcs"; United States Provisional Patent Application Serial No. 60/416,616, filed October 7, 2002, entitled "Methods and Apparatus for Identifying Related Nodes in a Directed Graph Having Named Arcs"; United States Patent Application Serial No. 09/917,264, filed July 27, 2001, entitled "Methods and Apparatus for Enterprise Application Integration"; United States Provisional Patent Application Serial No. 60/291,185, filed May 15, 2001, entitled "Methods and Apparatus for Enterprise Application Integration"; United States Patent Application Serial No. 10/051,619, filed October 29, 2001, entitled "Methods and Apparatus for Real-Time Business Visibility Using Persistent Schema-Less Data Storage"; United States Provisional Patent Application Serial No. 60/324,037, filed September 21, 2001, entitled "Methods and Apparatus for Real-Time Business Visibility Using Persistent Schema-Less Data Storage"; United States Patent Application Serial No. 10/302,764, filed November 21, 2002, entitled "Methods and Apparatus for Querying a Relational Data Store Using Schema-Less Queries"; United States Provisional Patent Application Serial No. 60/332,053, filed November 21, 2001, entitled "Methods and Apparatus for Querying a Relational Database of RDF Triples in a System for Real-Time Business Visibility"; United States Provisional Patent Application Serial No. 60/332,219, filed November 21, 2001, entitled "Methods and Apparatus For Calculation and Reduction of Time-Series Metrics from Event Streams or Legacy Databases in a System for Real-Time Business Visibility"; United States Patent Application Serial No. 10/302,727, filed November 21, 2002, entitled "Methods and Apparatus for Statistical Data Analysis and Reduction for an Enterprise Application"; United States Patent Application Serial No. 10/138,725, filed May 3, 2002, entitled "Methods and Apparatus for Visualizing Relationships Among Triples of Resource Description Framework (RDF) Data Sets."
The invention pertains to surveillance, monitoring and real-time event processing. It has application in public health & bioterrorism, border and port security, public and community safety, and government data integration, to name a few. Today, national, state, and local governments are challenged to achieve unprecedented levels of cooperation in and among agencies and organizations charged with protecting the safety of communities. Many of these organizations use either proprietary or incompatible technology infrastructures that need to be integrated in order to provide real-time, critical information for effective event monitoring and coordinated emergency response. Information must be shared instantaneously and among numerous entities to effectively identify and respond to a potential threat or emergency-related event.
Significant efforts are underway along these lines, for example, in the public health and bioterrorism arena. The Centers for Disease Control and Prevention (CDC) of the U.S. Department of Health and Human Services has launched several initiatives toward forming nationwide networks of shared health-related information that, when fully implemented, will facilitate the rapid identification of, and response to, health and bioterrorism threats. The CDC plans the Health Alert Network (HAN), for example, to provide infrastructure supporting distribution of health alerts, disease surveillance, and laboratory reporting. The Public Health Information Network (PHIN) is another CDC initiative that will provide detailed specifications for the acquisition, management, analysis and dissemination of health-related information, building upon the HAN and other CDC initiatives, such as the National Electronic Disease Surveillance System (NEDSS).
While these initiatives, and others like them in both health and non-health-related fields, define functional requirements and set standards for interoperability of the IT systems that hospitals, laboratories, government agencies and others will use in forming the nationwide networks, they do not solve the problem of finding data processing equipment capable of meeting those requirements and standards.
It is not uncommon for a single enterprise, such as a hospital, for example, to have several separate database systems to track medical records, patient biographical data, hospital bed utilization, vendors, and so forth. The same is true of the government agencies charged with monitoring local, state and national health. In each enterprise, different data processing systems might have been added at different times throughout the history of the enterprise and, therefore, represent differing generations of computer technology. Integration of these systems at the enterprise level is difficult enough; it would be impossible on any grander scale. This is a major impediment to surveillance, monitoring and real-time events processing in public health and bioterrorism. Similar issues result in parallel problems in border and port security, public and community safety, and government data integration, where the challenge is likewise the consolidation of data from disparate databases and other sources.
An object of this invention is to provide improved methods and apparatus for surveillance, monitoring and real-time events processing.
A related object is to provide such methods and apparatus as can be applied in public health and bioterrorism, e.g., to facilitate CDC initiatives in this area.
A further object of the invention is to provide such methods and apparatus as can be applied in border and port security, public and community safety, and government data integration.
A still further object of the invention is to provide such methods and apparatus as can be implemented inexpensively, incrementally or otherwise without interruption of IT functions that they bring together.
Summary of the Invention
To meet these challenges, systems and methods described herein provide a surveillance, monitoring and real-time events platform to (i) enable the integration and communication of information between government agencies and organizations specifically tasked with ensuring the security and safety of our nation and its communities, (ii) to integrate information systems from federal, state and/or local agencies (from disparate data sources if necessary) in order to obtain a single, real-time view of the entire organization, and (iii) to extract more complete, actionable information from their existing systems, thereby dramatically improving decision making speed and accuracy.
The platform has application in a variety of areas, including, public health & bioterrorism, border and port security, public and community safety, and government data integration, to name a few.
Public Health & BioTerrorism
Effective and timely surveillance and monitoring of health-related events is essential for early detection and management of public health threats, whether a naturally occurring disease, such as West Nile Virus, or a biological or chemical attack. State and local public health officials must have the ability to identify the specific nature and scope of an event and launch a tightly coordinated response, all in real-time.
In one aspect of the invention, the surveillance, monitoring and real-time events platform is adapted for use, e.g., as a local, state or federal node, in a network conforming to the Public Health Information Network (PHIN) initiative of the Centers for Disease Control and Prevention (CDC) of the U.S. Department of Health and Human Services, or as an infrastructure element of that network. This provides a real-time solution that:
• Delivers a dual-purpose real-time syndromic surveillance system covering both bioterrorism and targeted communicable diseases
• Transforms data from a variety of protocols (CSV, EDI, Excel, XML) into industry standard formats HL7 and HIPAA
• Integrates disparate data systems (hospitals, labs, clinics, pharmacies) from any format or location quickly and without custom coding
• Enables synchronous and asynchronous collaboration between participating departments and personnel
• Provides real-time customizable reporting and GIS mapping via web-based graphical interface
• Initiates and manages real-time notifications to first responders and public health officials via web, email, phone, wireless PDA and mobile phone
• Complies with the CDC's NEDSS, HAN and PHIN architectures
Systems and methods according to this aspect of the invention are designed for multiple purposes. They function as a real-time surveillance system, a bioterrorism detection and response system and a collaborative network for distance learning and communication.
As the CDC develops standards and mandated reporting protocols such as the National Electronic Disease Surveillance System (NEDSS), Health Alert Network (HAN) and Public Health Information Network (PHIN), it is up to state and local health officials to understand these new requirements and develop a system to comply. Systems and methods according to this aspect of the invention are designed to satisfy all NEDSS, HAN and PHIN requirements and more. They provide a platform technology that is highly flexible and scalable, meaning that it can adapt and stay current with new requirements and specifications with minimal effort. This feature allows health agencies to add data systems and functionality to the platform easily without impacting the current architecture.
Border & Port Security
Border and port security represent complex security challenges. These sites represent vulnerable points of entry and require monitoring of ocean vessel arrivals and departures, assessing potentially hazardous cargo, responding to immigration challenges, terrorist threats and managing the proximity risk to civilians and land-based targets such as nuclear facilities, dams, power plants, gas lines, and other biological and chemical facilities. Due to the complex and porous nature of borders and ports, many distinct organizations are required to work in close cooperation and effectively share critical information.
In one aspect of the invention, the surveillance, monitoring and real-time events platform is adapted for border and port security applications, providing:
• Real-time information in a secure web-based user interface
• A consolidated view of port security status, obtained by integrating multiple agencies' and organizations' existing information systems to appear as one, in real-time
• Integration of meteorological or other environmental information
• GIS (geo-spatial mapping) for rapid local assessment and visibility
• Time-critical risk assessment based on local, state and federal data sources
• Scenario-based event management for medical, emergency and public safety responders with immediate notifications to key safety personnel
Public & Community Safety
Local law enforcement agencies are increasingly involved in complex public safety issues. Today, airports, construction sites, concerts, and other large, high-profile community events require greater levels of security, including biometric identification and other methods of individual scanning and surveillance. The surveillance, monitoring and real-time events platform can be deployed in applications designed to identify community threats or security breaches in a wide range of settings including inter-agency solutions for superior security surveillance and response. This platform provides:
• Real-time reporting with secure web-based user interface enabling a single view of a multi-agency operation
• Integration of critical data from existing data sources (any data in any format) to create better public safety information
• GIS (geo-spatial mapping) for rapid local assessment and visibility
• Real-time risk assessment based on local, state and federal data sources
• Coordinated communication and immediate notifications to key safety personnel and responders
Government Solution for Data Visibility
Government agencies are challenged with the daunting task of improving agency-wide and inter-agency information flow and visibility, especially in today's volatile environment. True agency-wide information access for real-time analysis is only achieved by being able to tie together all existing disparate data sources, from any location, and offer a consolidated view of critical information.
In one aspect of the invention, the surveillance, monitoring and real-time events platform provides a single point of access to all state security-related IT systems (Justice Dept, Law Enforcement, Dept of Health) to expedite identifying potential threats. The platform can also provide information visibility across an organization's systems. The platform:
• Leverages investments in existing IT infrastructure
• Provides a single, comprehensive view of critical information from all data sources
• Provides a solution that is operational in a fraction of the time a "traditional" data integration project would take
• Benefits from a flexible, scalable, interoperable platform capable of integrating any agency's data sources for optimal visibility and operational readiness
The aforementioned and other aspects of the invention are evident in the drawings and in the description that follows.
Brief Description of the Drawings
The foregoing features of this invention, as well as the invention itself, may be more fully understood from the following detailed description of the drawings in which:
Figure 1 depicts a surveillance, monitoring and real-time events system 100 according to the invention suitable for the adaptation to a public health & bioterrorism application, e.g., as part of PHIN, HAN or NEDSS-compatible networks;
Figure 2A depicts an architecture for a hologram data store used in the system of Figure 1;
Figure 2B depicts the tables in a model store and a triples store of the hologram data store of Figure 2A;
Figure 3 depicts an expert engine to identify information in the data store or from the other information in the system of Figure 1; and
Figures 4-16 depict a visual display used in the system of Figure 1 to call alerts and other information to the attention of the user.
(Brief Descr) Detailed Description of the Illustrated Embodiment
Figure 1 depicts a surveillance, monitoring and real-time events system 100 according to the invention suitable for the adaptation to a public health & bioterrorism application, e.g., as part of PHIN, HAN or NEDSS networks. Illustrated system 100 represents a data processing station (or stations) resident at a node in such a network, such as, for example, a clinical care provider, a laboratory, a local or state health department, the CDC headquarters, a local or national law enforcement office, or otherwise. Though the illustrated system is used in a public health & bioterrorism application, it will be appreciated that a similar such system can be applied in border & port security, public & community safety, and government data integration applications, described above, among others.
Illustrated system 100, which can be embodied in conventional digital data processing apparatus (including attendant processor(s), display units, storage units, and communications devices) of the type conventional in the art, comprises connectors 108 that provide software interfaces to legacy and other databases, data streams, and sources of information (collectively, databases 140) in clinical care facilities or other entities (such as agency field offices or laboratories), organizations (such as governmental agencies) or enterprises, such as the PHIN network, the HAN network or otherwise. A "hologram" data store 114 (hereinafter, "data store" or "hologram data store"), which is coupled to the databases 140 via the connectors 108, stores data from those databases 140. A framework server 116 accesses the data store 114, presenting selected data to (and permitting queries from) a user browser 118. The server 116 can also permit updates to data in the data store 114 and, thereby, in the databases 140. These updates can include both the addition of new data and the modification of old data.
In the illustration, databases 140 include a database 140a maintained with a Sybase® database management system and a database 140b maintained with an Oracle® database management system. The "databases" 140 also include a data stream 140c providing information from other nodes 100b, 100c, 100d, 100e, of the PHIN, HAN, NEDSS or other network 120. Those other nodes can be constructed and operated in the manner of system 100 (as suggested in the illustration by their depiction using like silhouettes) or in any other manner consistent with PHIN, HAN, NEDSS or other network operations. The network 120 represents the Internet, wide area network or other medium or collection of media that permit the transfer of information (continuous, periodic or otherwise) between the nodes in a manner consistent with requirements of PHIN, HAN, NEDSS or other applicable network standards.
Of course, these are merely examples of the variety of databases or other sources of information with which methods and apparatus as described herein can be used. Common features of illustrated databases 140 are that they provide access to information of actual or potential interest to the node in which system 100 resides and that they can be accessed via application program interfaces (API) or other mechanisms dictated by the PHIN, HAN, NEDSS or other applicable network.
Connectors 108 serve as interfaces to databases, streams and other information sources 140. Each connector applies requests to, and receives information from, a respective database, using that database's API or other interface mechanism, e.g., as dictated by the PHIN, HAN or otherwise. Thus, for example, connector 108a applies requests to database 140a using the corresponding SAP API; connector 108b applies requests to database 140b using the Oracle API; and connector 108c applies requests to and/or receives information from the stream or information source 140c using PHIN-appropriate, HAN-appropriate, NEDSS-appropriate or other stream or network-appropriate requests. Thus, by way of non-limiting example, the connector 108c can generate requests to the network 120 to obtain data from health care institutions and other nodes on the network.
The requests can be simple queries, such as SQL queries and the like (e.g., depending on the type of the underlying database and its API) or more complex sets of queries, such as those commonly used in data mining. For example, one or more of the connectors can use decision trees, statistical techniques or other query and analysis mechanisms known in the art of data mining to extract information from the databases. Specific queries and analysis methodologies can be specified by the hologram data store 114 or the framework server 116 for application by the connectors. Alternatively, the connectors themselves can construct specific queries and methodologies from more general queries received from the data store 114 or server 116. For example, request-specific items can be "plugged" into query templates thereby effecting greater speed and efficiency.
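By way of non-limiting illustration, the query-template approach described above might be sketched as follows in Python. The template text, the table and column names (`lab_results`, `test_code`, `reported_on`) and the helper `build_query` are hypothetical and do not appear in the specification; the sketch assumes an SQL-capable source and binds request-specific items through named parameters rather than string concatenation.

```python
import re
import sqlite3

# Hypothetical stored template; request-specific items fill the :named slots.
QUERY_TEMPLATES = {
    "results_since": (
        "SELECT patient_id, test_code, reported_on FROM lab_results "
        "WHERE test_code = :test_code AND reported_on >= :since "
        "ORDER BY reported_on"
    ),
}


def build_query(name, **items):
    """Return (sql, params) for a stored template, checking every slot is filled."""
    sql = QUERY_TEMPLATES[name]
    slots = set(re.findall(r":(\w+)", sql))
    missing = slots - set(items)
    if missing:
        raise ValueError(f"missing template items: {sorted(missing)}")
    return sql, {slot: items[slot] for slot in slots}


def run_demo():
    """Apply the template to a small in-memory stand-in for a source database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE lab_results (patient_id TEXT, test_code TEXT, reported_on TEXT)"
    )
    db.executemany(
        "INSERT INTO lab_results VALUES (?, ?, ?)",
        [
            ("p1", "ANTHRAX", "2004-01-05"),
            ("p2", "ANTHRAX", "2004-02-10"),
            ("p3", "FLU", "2004-02-11"),
        ],
    )
    sql, params = build_query("results_since", test_code="ANTHRAX", since="2004-02-01")
    return db.execute(sql, params).fetchall()
```

Because the template text is fixed, a connector built along these lines could reapply the same request periodically with only the bound values changing.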
Regardless of their origin, the requests can be stored in the connectors 108 for application and/or reapplication to the respective databases 140 to provide one-time or periodic data store updates. Connectors can use expiration date information to determine which of a plurality of similar data to return to the data store, or if dates are absent, the connectors can mark returned data as being of lower confidence levels.
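The use of expiration dates just described can be illustrated, purely as a sketch, in Python. The `Record` type, the `select_record` helper and the particular policy (prefer the unexpired record that expires latest, otherwise return an undated record marked as low confidence) are assumptions made for illustration; the specification does not fix a selection policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    value: str
    expires: Optional[date] = None   # expiry supplied by the source, if any
    confidence: str = "normal"


def select_record(candidates: List[Record], today: date) -> Optional[Record]:
    """Choose one of several similar records to return to the data store."""
    # Assumed policy: among unexpired dated records, keep the one valid longest.
    dated = [r for r in candidates if r.expires is not None and r.expires >= today]
    if dated:
        return max(dated, key=lambda r: r.expires)
    # No usable dates: return an undated record, marked as lower confidence.
    undated = [r for r in candidates if r.expires is None]
    if undated:
        return Record(undated[0].value, None, "low")
    return None
```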
In a system 100 according to the invention used as part of the PHIN network, the connector 108c (and/or other functionality not shown) provides for the automated exchange of data between public health partners, as required of nodes in the PHIN network. Thus the connector 108c (and/or other functionality) comprises an ebXML compliant SOAP web service that can be reached via an HTTPS connection after appropriate authentication and comprises, or is coupled to, an HTTPS port. It also supports messaging in the industry standard requisite formats and message content specified by the PHIN standard. The connector 108c also provides for translation of messages received from the network 120 into a format compatible with the NEDSS and/or other requisite data models specified by the PHIN standards for storage in the data store 114 as detailed further below. And, the connector 108c (or other functionality) facilitates the exchange and management of specimen and lab result information, as required under the PHIN standard. Systems 100 according to the invention used as part of HAN or NEDSS-compatible networks provide similar functionality, as particularly required under those initiatives.
Data and other information (collectively, "messages") generated by the databases, streams and other information sources 140 in response to the requests are routed by connectors to the hologram data store 114. That other information can include, for example, expiry or other adjectival data for use by the data store in caching, purging, updating and selecting data. The messages can be cached by the connectors 108, though, they are preferably immediately routed to the store 114.
Information updates entered, for example, by a user who is accessing the store 114 via a server 116 and browser 118, are transmitted by server 116 to data store 114. There, any triples implicated by the change are created or changed in store 114C, as are the corresponding RDF document objects in store 114A. An indication of these changes can be forwarded to the respective databases, streams or other information sources 140 via the connectors 108, which utilize the corresponding API (or other interface mechanisms) to alert those sources 140 of updates. Likewise, changes made directly to the store 114C, e.g., using a WebDAV client or otherwise, can be forwarded by the connector 108 to the respective sources 140.
The hologram data store 114 stores data from the databases 140 (and from the framework server 116, as discussed below) as RDF triples. The data store 114 can be embodied on any digital data processing system or systems that are in communications coupling (e.g., as defined above) with the connectors 108 and the framework server 116. Typically, the data store 114 is embodied in a workstation or other high-end computing device with high capacity storage devices or arrays, though this may not be required for any given implementation.
Though the hologram data store 114 may be contained on an optical storage device, this is not the sense in which the term "hologram" is used. Rather, it refers to its storage of data from multiple sources (e.g., the databases 140) in a form which permits that data to be queried and coalesced from a variety of perspectives, depending on the needs of the user and the capabilities of the framework server 116.
To this end, a preferred data store 114 stores the data from the databases 140 in subject-predicate-object form, e.g., RDF triples, though those of ordinary skill in the art will appreciate that other forms may be used as well, or instead. By way of background, RDF is a way of expressing the properties of items of data. Those items are referred to as subjects. Their properties are referred to as predicates. And, the values of those properties are referred to as objects. In RDF, an expression of a property of an item is referred to as a triple, a convenience reflecting that the expression contains three parts: subject, predicate and object.
Listed below is a portion of a data set of the type with which the invention can be practiced. The listing contains RDF triples, here, expressed in extensible markup language (XML) syntax. Those skilled in the art will, of course, appreciate that RDF triples can be expressed in other syntaxes and that the teachings hereof are equally applicable to those syntaxes. Further, the listing shows only a sampling of the triples in a data store 114, which typically would contain tens of thousands or more of such triples.
<rdf:RDF...xmlns="http://www.metatomix.com/postalCode/1.0#">
<rdf:Description rdf:about="postal://zip#02886">
<town>Warwick</town>
<state>RI</state>
<country>USA</country>
<zip>02886</zip>
</rdf:Description>
<rdf:Description rdf:about="postal://zip#02901">
<town>Providence</town>
<state>RI</state>
<country>USA</country>
<zip>02901</zip>
</rdf:Description>
Subjects are indicated within the listing using an "rdf:about" statement. For example, the second line of the listing defines a subject as a resource named "postal://zip#02886." That subject has predicates and objects that follow the subject declaration. One predicate, <town>, is associated with a value "Warwick". Another predicate, <state>, is associated with a value "RI". The same follows for the predicates <country> and <zip>, which are associated with values "USA" and "02886," respectively. Similarly, the listing shows properties for the subject "postal://zip#02901," namely, <town> "Providence," <state> "RI," <country> "USA" and <zip> "02901."
In the listing, the subjects and predicates are expressed as uniform resource identifiers (URIs), e.g., of the type defined in Berners-Lee et al., Uniform Resource Identifiers (URI): Generic Syntax (RFC 2396) (August 1998), and can be said to be expressed in a form <scheme>://<path>#<fragment>. For the subjects given in the example, <scheme> is "postal," <path> is "zip," and <fragment> is, for example, "02886" and "02901."
The predicates, too, are expressed in the form <scheme>://<path>#<fragment>, as is evident to those of ordinary skill in the art. In accord with XML syntax, the predicates in lines two, et seq., of the listing must be interpreted as suffixes to the string provided in the namespace directive "xmlns=http://www.metatomix.com/postalCode/1.0#" in line one of the listing. This results in predicates that are formally expressed as: "http://www.metatomix.com/postalCode/1.0#town," "http://www.metatomix.com/postalCode/1.0#state," "http://www.metatomix.com/postalCode/1.0#country" and "http://www.metatomix.com/postalCode/1.0#zip."
Hence, the <scheme> for the predicates is "http" and <path> is "www.metatomix.com/postalCode/1.0." The <fragment> portions are <town>, <state>, <country> and <zip>, respectively. It is important to note that the listing is in some ways simplistic in that each of its objects is a literal value. Commonly, an object may itself be another subject, with its own objects and predicates. In such cases, a resource can be both a subject and an object, e.g., an object to all "upstream" resources and a subject to all "downstream" resources and properties. Such "branching" allows for complex relationships to be modeled within the RDF triple framework.
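The expansion of the listing into formally expressed triples can be sketched with Python's standard XML parser. This is an illustrative sketch only: the listing elides part of its opening tag, so the complete `rdf:RDF` element below, including the standard W3C RDF namespace declaration, is an assumption, as are the helper names `extract_triples` and `split_uri`. Only literal-valued objects, as in the listing, are handled.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace (assumed)

LISTING = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.metatomix.com/postalCode/1.0#">
  <rdf:Description rdf:about="postal://zip#02886">
    <town>Warwick</town>
    <state>RI</state>
    <country>USA</country>
    <zip>02886</zip>
  </rdf:Description>
  <rdf:Description rdf:about="postal://zip#02901">
    <town>Providence</town>
    <state>RI</state>
    <country>USA</country>
    <zip>02901</zip>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples with fully expanded predicate URIs."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree reports tags as "{namespace}local"; the predicate URI
            # is the namespace string with the local name appended as a suffix.
            namespace, local = prop.tag[1:].split("}", 1)
            triples.append((subject, namespace + local, prop.text))
    return triples


def split_uri(uri):
    """Split a URI into its common '<scheme>://<path>#' part and its fragment."""
    head, sep, fragment = uri.rpartition("#")
    return head + sep, fragment
```

Applied to the listing, `extract_triples(LISTING)` yields eight triples, the first being the subject "postal://zip#02886" with the predicate "http://www.metatomix.com/postalCode/1.0#town" and the object "Warwick".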
Figure 2A depicts an architecture for a preferred hologram data store 114 according to the invention. The illustrated store 114 includes a model document store 114A and a model document manager 114B. It also includes a relational triples store 114C, a relational triples store manager 114D, and a parser 114E interconnected as shown in the drawing.
As indicated in the drawing, RDF triples maintained by the store 114 are received, from the databases 140 (via connectors 108) and/or from time-based data reduction module 150 (described below), in the form of document objects, e.g., of the type generated from a Document Object Model (DOM) in a JAVA, C++ or other application. In the illustrated embodiment, these are stored in the model document store 114A as such (i.e., document objects) particularly, using the tables and inter-table relationships shown in Figure 2B (see dashed box labelled 114B).
The model document manager 114B manages storage/retrieval of the document object to/from the model document store 114A. In the illustrated embodiment, the manager 114B comprises the Slide content management and integration framework, publicly available through the Apache Software Foundation. It stores (and retrieves) document objects to (and from) the store 114A in accord with the WebDAV protocol. Those skilled in the art will, of course, appreciate that other applications can be used in place of Slide and that document objects can be stored/retrieved from the store 114A in accord with other protocols, industry-standard, proprietary or otherwise.
However, use of the WebDAV protocol allows for adding, updating and deleting RDF document objects using a variety of WebDAV client tools (e.g., Microsoft Windows Explorer, Microsoft Office, XML Spy or other such tools available from a variety of vendors), in addition to adding, updating and deleting document objects via connectors 108 and/or time-based data reduction module 150. This also allows for presenting the user with a view of a traversable file system, with RDF documents that can be opened directly in XML editing tools or from Java programs supporting WebDAV protocols, or from processes on remote machines via any HTTP protocol on which WebDAV is based.
RDF triples received by the store 114 are also stored to a relational database, here, store 114C, that is managed and accessed by a conventional relational database management system (RDBMS) 114D, operating in accord with the teachings hereof. In that database, the triples are divided into their constituent components (subject, predicate, and object), which are indexed and stored to respective tables in the manner of a "hashed with origin" approach. Whenever an RDF document is added, updated or deleted, a parser 114E extracts its triples and conveys them to the RDBMS 114D with a corresponding indicator that they are to be added, updated or deleted from the relational database. Such a parser 114E operates in the conventional manner known in the art for extracting triples from RDF documents.
The illustrated database store 114C has five tables interrelated as particularly shown in Figure 2B (see dashed box labelled 114C). In general, these tables rely on indexes generated by hashing the triples' respective subjects, predicates and objects using a 64-bit hashing algorithm based on cyclical redundancy codes (CRCs), though it will be appreciated that the indexes can be generated by other techniques as well, industry-standard, proprietary or otherwise.
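A 64-bit CRC-based hash of the general kind referred to above can be sketched in Python as follows. The specification does not identify a particular CRC variant; the generator polynomial used here (that of ECMA-182), the zero initial value and the function name `crc64` are assumptions chosen for illustration.

```python
POLY = 0x42F0E1EBA9EA3693      # ECMA-182 generator polynomial (assumed variant)
MASK = (1 << 64) - 1           # keep the register at 64 bits
TOP_BIT = 1 << 63


def crc64(data: bytes) -> int:
    """Bitwise, most-significant-bit-first CRC over the input bytes."""
    crc = 0
    for byte in data:
        crc ^= byte << 56                 # feed the next byte into the top of the register
        for _ in range(8):
            if crc & TOP_BIT:
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc


def hash_term(term: str) -> int:
    """Hash a subject, predicate or object string to a 64-bit index value."""
    return crc64(term.encode("utf-8"))
```

A table-driven implementation would be faster; the bitwise form is used here only because it is short enough to check by inspection.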
Referring to Figure 2B, the "triples" table 534 maintains one record for each stored triple. Each record contains an aforementioned hash code for each of the subject, predicate and object that make up the respective triple, along with a resource flag ("resource_flg") indicating whether that object is of the resource or literal type. Each record also includes an aforementioned hash code ("m_hash") identifying the document object (stored in model document store 114A) from which the triple was parsed, e.g., by parser 114E.
In the illustrated embodiment, the values of the subjects, predicates and objects are not stored in the triples table. Rather, those values are stored in the resources table 530, namespaces table 532 and literals table 536. Particularly, the resources table 530, in conjunction with the namespaces table 532, stores the subjects, predicates and resource-type objects; whereas, the literals table 536 stores the literal-type objects.
The resources table 530 maintains one record for each unique subject, predicate or resource-type object. Each record contains the value of the resource, along with its aforementioned 64-bit hash. It is the latter on which the table is indexed. To conserve space, portions of those values common to multiple resources (e.g., common <scheme>://<path> identifiers) are stored in the namespaces table 532. Accordingly the field, "r_value," contained in each record of the resources table 530 reflects only the unique portion (e.g., <fragment> identifier) of each resource.
The namespaces table 532 maintains one record for each unique common portion referred to in the prior paragraph (hereinafter, "namespace"). Each record contains the value of that namespace, along with its aforementioned 64-bit hash. As above, it is the latter on which this table is indexed.
The literals table 536 maintains one record for each unique literal-type object. Each record contains the value of the object, along with its aforementioned 64-bit hash. Each record also includes an indicator of the type of that literal (e.g., integer, string, and so forth). Again, it is the hash on which this table is indexed.
The models table 538 maintains one record for each RDF document object contained in the model document store 114A. Each record contains the URI of the corresponding document object ("uri_string"), along with its aforementioned 64-bit hash ("m_hash"). It is the latter on which this table is indexed. To facilitate associating document objects identified in the models table 538 with document objects maintained by the model document store 114A, each record of the models table 538 also contains the ID of the corresponding document object in the store 114A. That ID can be assigned by the model document manager 114B, or otherwise.
From the above, it can be appreciated that the relational triples store 114C is a schema-less structure for storing RDF triples. As suggested by Melnik, an author well known to those skilled in the art of RDF and SQL, triples maintained in that store can be reconstituted via an SQL query. For example, to reconstitute the RDF triple having a subject equal to "postal://zip#02886", a predicate equal to "http://www.metatomix.com/postalCode/1.0#town", and an object equal to "Warwick", the following SQL statement is applied:
SELECT m.uri_string, t.resource_flg,
  concat(n1.n_value, r1.r_value) as subj,
  concat(n2.n_value, r2.r_value) as pred,
  concat(n3.n_value, r3.r_value),
  l.l_value
FROM triples t, models m, resources r1, resources r2, namespaces n1, namespaces n2
  LEFT JOIN literals l on t.object=l.l_hash
  LEFT JOIN resources r3 on t.object=r3.r_hash
  LEFT JOIN namespaces n3 on r3.n_hash=n3.n_hash
WHERE t.subject=r1.r_hash AND r1.n_hash=n1.n_hash
  AND t.predicate=r2.r_hash AND r2.n_hash=n2.n_hash
  AND m.uri_id=t.m_hash
  AND t.subject=hash('postal://zip#02886')
  AND t.predicate=hash('http://www.metatomix.com/postalCode/1.0#town')
  AND t.object=hash('Warwick')
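The five-table arrangement and the reconstitution of a triple can be exercised end to end with the following Python sketch, which uses an in-memory SQLite database. It is a simplified illustration under stated assumptions: the column layout is inferred from the description rather than taken from Figure 2B, the models table is keyed on `m_hash` alone, only literal-type objects are handled, and `h64` is a stand-in hash (BLAKE2b truncated to a signed 64-bit integer so that it fits an SQLite INTEGER) rather than the CRC-based hash of the illustrated embodiment.

```python
import hashlib
import sqlite3


def h64(text):
    """Stand-in 64-bit hash (not a CRC), signed to fit an SQLite INTEGER."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


SCHEMA = """
CREATE TABLE namespaces (n_hash INTEGER PRIMARY KEY, n_value TEXT);
CREATE TABLE resources  (r_hash INTEGER PRIMARY KEY, r_value TEXT, n_hash INTEGER);
CREATE TABLE literals   (l_hash INTEGER PRIMARY KEY, l_value TEXT, l_type TEXT);
CREATE TABLE models     (m_hash INTEGER PRIMARY KEY, uri_string TEXT);
CREATE TABLE triples    (subject INTEGER, predicate INTEGER, object INTEGER,
                         resource_flg INTEGER, m_hash INTEGER);
"""

RECONSTITUTE = """
SELECT m.uri_string, n1.n_value || r1.r_value, n2.n_value || r2.r_value, l.l_value
FROM triples t
JOIN models m      ON m.m_hash = t.m_hash
JOIN resources r1  ON r1.r_hash = t.subject
JOIN namespaces n1 ON n1.n_hash = r1.n_hash
JOIN resources r2  ON r2.r_hash = t.predicate
JOIN namespaces n2 ON n2.n_hash = r2.n_hash
LEFT JOIN literals l ON l.l_hash = t.object
WHERE t.subject = ? AND t.predicate = ? AND t.object = ?
"""


def add_resource(db, uri):
    """Store the common namespace once and only the fragment per resource."""
    head, sep, fragment = uri.rpartition("#")
    namespace = head + sep
    db.execute("INSERT OR IGNORE INTO namespaces VALUES (?, ?)", (h64(namespace), namespace))
    db.execute("INSERT OR IGNORE INTO resources VALUES (?, ?, ?)",
               (h64(uri), fragment, h64(namespace)))
    return h64(uri)


def add_literal_triple(db, model_uri, subject, predicate, literal):
    """Record one triple whose object is a string literal."""
    db.execute("INSERT OR IGNORE INTO models VALUES (?, ?)", (h64(model_uri), model_uri))
    db.execute("INSERT OR IGNORE INTO literals VALUES (?, ?, 'string')",
               (h64(literal), literal))
    db.execute("INSERT INTO triples VALUES (?, ?, ?, 0, ?)",
               (add_resource(db, subject), add_resource(db, predicate),
                h64(literal), h64(model_uri)))


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    ns = "http://www.metatomix.com/postalCode/1.0#"
    add_literal_triple(db, "models/postal.rdf", "postal://zip#02886", ns + "town", "Warwick")
    add_literal_triple(db, "models/postal.rdf", "postal://zip#02886", ns + "state", "RI")
    key = (h64("postal://zip#02886"), h64(ns + "town"), h64("Warwick"))
    return db.execute(RECONSTITUTE, key).fetchall()
```

The document URI "models/postal.rdf" is a hypothetical name. Explicit JOIN syntax is used throughout the sketch so that each ON clause can refer to the triples table regardless of join precedence.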
Those skilled in the art will, of course, appreciate that RDF documents and, more generally, objects maintained in the store 114 can be contained in other stores (structured relationally, hierarchically or otherwise) as well, in addition to or instead of stores 114A and 114C.
In a system 100 according to the invention used as part of the PHIN network, the maintenance of data in the store 114 is accomplished in a manner compatible with the applicable PHIN standards, e.g., for the use of electronic clinical data for event detection. Thus, for example, data storage is compatible with the applicable logical data model(s), can associate incoming data with appropriate existing data (e.g., a report of a disease in a person who had another condition previously reported), permits potential cases to be "linked" and traceable from detection via electronic sources of clinical data or manual entry of potential case data through confirmation via laboratory result reporting, and permits data to be accessed for reporting, statistical analysis, geographic mapping and automated outbreak detection algorithms, and so forth, all as required under the PHIN standards and further discussed below. Whether maintained in the data store 114, or otherwise, a system 100 according to the invention used as part of the PHIN network provides directories of public health and clinical personnel accessible as required under the PHIN standards. Systems 100 according to the invention used as part of HAN or NEDSS-compatible networks provide similar functionality, as particularly required under those initiatives.
Referring to Figure 2A, the relational triples store manager 114D supports SQL queries such as the one exemplified above (for extracting a triple with the subject "postal://zip#02886", the predicate "http://www.metatomix.com/postalCode/1.0#town", and the object "Warwick"), in the manner described in commonly assigned United States Patent Application Serial No. 10/302,764, filed November 21, 2002, entitled METHODS AND APPARATUS FOR QUERYING A RELATIONAL DATA STORE USING SCHEMA-LESS QUERIES, now published as PCT WO 03044634 (Application WO2002US0037729), the teachings of which are incorporated herein by reference (see, specifically, for example, Figure 3 thereof and the accompanying text), and a copy of which may be attached as an appendix hereto (and, if so, as Appendix A).
The data store 114 can likewise include a time-wise data reduction component of the type described in commonly assigned United States Patent Application Serial No. 10/302,727, filed November 21, 2002, entitled METHODS AND APPARATUS FOR STATISTICAL DATA ANALYSIS AND REDUCTION FOR AN ENTERPRISE APPLICATION, now published as PCT WO 03046769 (Application WO2002US0037727), the teachings of which are incorporated herein by reference (see, specifically, for example, Figure 3 thereof and the accompanying text), a copy of which may be attached as an appendix hereto (and, if so, as Appendix B), to perform a time-wise reduction on data from the database, streams or other sources 140.
According to one practice of the invention, data store 114 includes a graph generator that uses RDF triples to generate directed graphs in response to queries made (e.g., by a user accessing the store via the browser 118 and server 116, by a surveillance, monitoring and real-time events application executing on the server 116 or in connection with the browser 118, by another node on the network 120 and received electronically or otherwise, or made otherwise) for information reflected by triples originating from data in one or more of the databases, streams or other sources 140. Such generation of directed graphs from triples can be accomplished in any conventional manner known in the art (e.g., as appropriate to RDF triples or other manner in which the information is stored) or, preferably, in the manner described in co-pending, commonly assigned United States Patent Application Serial No. 10/138,725, filed May 3, 2002, entitled METHODS AND APPARATUS FOR VISUALIZING RELATIONSHIPS AMONG TRIPLES OF RESOURCE DESCRIPTION FRAMEWORK (RDF) DATA SETS, now published as PCT WO 03094142A1 (Application WO2003US0012479), and United States Patent Application Serial No. 60/416,616, filed October 7, 2002, entitled METHODS AND APPARATUS FOR IDENTIFYING RELATED NODES IN A DIRECTED GRAPH HAVING NAMED ARCS, now published as PCT WO 04034625 (Application WO2003US0031636), a copy of which may be attached as an appendix hereto (and, if so, as Appendix C), the teachings of both of which are incorporated herein by reference. Directed graphs so generated can be passed back to the server 116 for presentation to the user via browser 118, they can be "walked" by the server 116 to identify specific information responsive to queries, or otherwise.
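As one conventional way of generating and walking a directed graph from triples, the following Python sketch builds an adjacency list keyed by subject and traverses it breadth-first. It is a generic illustration and not the preferred method of the incorporated applications; the function names and the sample resource names ("case:17", "person:4") are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Map each subject to its outgoing (predicate, object) arcs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def walk(graph, start):
    """Breadth-first walk returning every triple reachable from the start node."""
    seen = {start}
    queue = deque([start])
    reached = []
    while queue:
        node = queue.popleft()
        for predicate, obj in graph.get(node, []):
            reached.append((node, predicate, obj))
            if obj not in seen:        # an object may itself be a subject
                seen.add(obj)
                queue.append(obj)
    return reached


SAMPLE = [
    ("case:17", "patient", "person:4"),
    ("person:4", "zip", "postal://zip#02886"),
    ("postal://zip#02886", "town", "Warwick"),
    ("case:99", "patient", "person:8"),
]
```

Walking from "case:17" in the sample follows the "branching" from case to person to postal code and omits the unrelated "case:99" triple.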
Alternatively, or in addition to the graph generator, the data store 114 can utilize genetic, self-adapting algorithms to traverse the RDF triples in response to such queries. To this end, the data store utilizes a genetic algorithm that performs several searches, each utilizing a different methodology but all based on the underlying query from the framework server, against the RDF triples. It compares the results of the searches quantitatively to discern which produce(s) the best results and reapplies that search with additional terms or further granularity.
In some practices of the invention, surveillance, monitoring and real-time events applications executing on the connectors 108, the server 116, the browser and/or the data store 114 utilize an expert engine-based system 8 of the type shown in Figure 3 to identify information in the data store 114 and/or from sources 140 responsive to queries and/or otherwise for presentation via browser 118, e.g., in the form of alerts, reports, or otherwise. The information so identified can, instead or in addition, form the basis of further processing, e.g., by such surveillance, monitoring and real-time events applications, in the form of broadcasts or messages to other nodes in the network 120, or otherwise, consistent with requirements of PHIN, HAN or other applicable standards.
Thus, for example, in a system 100 adapted for use in a node on the PHIN, the system 8 can be used to process data incoming from the sources 140 to determine whether it should be ignored, stored, logged for alert or classified otherwise. Data reaching a certain classification limit, moreover, can be displayed via the browser 118 and, more particularly, the dashboard discussed below, e.g., along with a map of the state, country or other relevant geographic region and/or along with other similar data.
Alternatively, in a system 100 adapted for use in a NEDSS compliant node, the expert engine-based system 8 can be used to detect the numbers of instances occurring over time and, if the number exceeds a threshold, to generate a report, e.g., for display via a dashboard window, or generate alert messages for transfer over the network 120 to targeted personnel (e.g., as identified by action of further rules or otherwise). In such a system 100, the expert engine can also be used to subset data used for display or reporting in connection with the collaborative function, e.g., specified under the CDC's HAN guidelines.
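Threshold detection over time of the kind described can be illustrated with a sliding-window counter. The Python sketch below is a plain stand-in for a rule in the expert engine, not an implementation of it; the class name `ThresholdRule`, the window length and the assumption that instances arrive in time order are all introduced here for illustration.

```python
from collections import deque
from datetime import datetime, timedelta


class ThresholdRule:
    """Count instances inside a sliding time window and flag excess counts."""

    def __init__(self, threshold, window):
        self.threshold = threshold      # alert when the count exceeds this number
        self.window = window            # timedelta covered by the count
        self.events = deque()

    def observe(self, when):
        """Record one instance; return the windowed count if it exceeds the threshold."""
        self.events.append(when)
        # Discard instances that have fallen out of the window (input assumed ordered).
        while when - self.events[0] > self.window:
            self.events.popleft()
        count = len(self.events)
        return count if count > self.threshold else None
```

A returned count could then be turned into a dashboard report or into alert messages addressed to targeted personnel.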
Referring to Figure 3, the system 8 includes a module 12 that executes a set of rules 18 with respect to a set of facts 16 representing criteria in order to (i) generate a subset 20 of a set of facts 10 representing an input data set, (ii) trigger a further rule, and/or (iii) generate an alert, broadcast, message, or otherwise. For simplicity, in the discussion that follows the set of facts 16 representing criteria are referred to as "criteria" or "criteria 16," while the set of facts 10 representing data are referred to as "data" or "data 10."
Illustrated module 12 is an executable program (compiled, interpreted or otherwise) embodying the rules 18 and operating in the manner described herein for identifying subsets of directed graphs. In the illustrated embodiment, module 12 is implemented in Jess (Java Expert System Shell), a rule-based expert system shell, commercially available from Sandia National Laboratories. However, it can be implemented using any other "expert system" engine, if-then-else network, or other software, firmware and/or hardware environment (whether or not expert system-based) suitable for adaptation in accord with the teachings hereof.
The module 12 embodies the rules 18 in a network representation 14, e.g., an if-then-else network, or the like, native to the Jess environment. The network nodes are preferably executed so as to effect substantially parallel operation of the rules 18, though they can be executed so as to effect serial and/or iterative operation as well or in addition. In other embodiments, the rules are represented in accord with the specifics of the corresponding engine, if-then-else network, or other software, firmware and/or hardware environment on which the embodiment is implemented. These likewise preferably effect parallel execution of the rules 18, though they may effect serial or iterative execution instead or in addition.
The data set 10 can comprise any directed graph, e.g., a collection of nodes representing data and directed arcs connecting nodes to one another, though in the illustrated embodiment it comprises RDF triples contained in the data store 114 and/or generated from information received from the sources 140 via connectors 108. Alternatively, or in addition, the data set can comprise data structures representing a meta directed graph of the type disclosed in co-pending, commonly assigned United States Patent Application Serial No. 10/138,725, filed May 3, 2002, entitled METHODS AND APPARATUS FOR VISUALIZING RELATIONSHIPS AMONG TRIPLES OF RESOURCE DESCRIPTION FRAMEWORK (RDF) DATA SETS, e.g., at Figures 4A - 6B and accompanying text, all of which is incorporated herein by reference.
Criteria 16 contains expressions including, for example, literals, wildcards, Boolean operators and so forth, against which nodes in the data set are tested. In embodiments that operate on RDF data sets, the criteria can specify subject, predicate and/or object values or other attributes. In embodiments that operate on directed graphs of other types, other appropriate values and attributes may be specified. The criteria can be input by a user, e.g., via browser 118, e.g., on an ad hoc basis. Alternatively or in addition, they can be generated by surveillance, monitoring and real-time events applications executing on the connectors 108, the server 116, the browser and/or the data store 114.
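Testing triples against criteria made up of literals, wildcards and Boolean combinations can be sketched in Python with shell-style patterns. This is a plain stand-in and does not use the Jess rule syntax of the illustrated embodiment; the dictionary representation of a criterion and the helper names `matches` and `select` are assumptions. Within one criterion the subject, predicate and object patterns are combined with AND; a list of criteria is combined with OR.

```python
from fnmatch import fnmatchcase


def matches(triple, criterion):
    """True if every pattern given in the criterion matches its triple component."""
    parts = zip(("subject", "predicate", "object"), triple)
    # A component with no pattern in the criterion is treated as the wildcard "*".
    return all(fnmatchcase(str(value), criterion.get(key, "*")) for key, value in parts)


def select(data, criteria):
    """Return the subset of triples satisfying at least one criterion."""
    return [t for t in data if any(matches(t, c) for c in criteria)]
```

For example, the criterion `{"subject": "postal://zip#029*", "predicate": "*#town"}` would select only the town triple for a subject whose fragment begins with "029".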
Rules 18 define the tests for identifying data in the data set 20 that match the criteria or, where applicable, are related thereto. These are expressed in terms of the types and values of the data items as well as their interrelationships or connectedness. By way of example, a set of rules applicable to a data set comprised of RDF triples for identifying triples that match or are related to the criteria are disclosed in aforementioned incorporated by reference United States Patent Application Serial No. 60/416,616, filed October 7, 2002, entitled METHODS AND APPARATUS FOR IDENTIFYING RELATED NODES IN A DIRECTED GRAPH HAVING NAMED ARCS (see, Appendix C hereof). Those skilled in the art will, of course, appreciate that different rules may be applicable depending on the nature and focus of the information sought by any given surveillance, monitoring and real-time events application and that construction of such rules is within the ken of those skilled in the art based on the teachings hereof.
Referring back to Figure 3, the data 20 output or otherwise generated by module 12 represents those triples matching (or, where applicable, related to) the criteria as determined by exercise of the rules. The data 20 can be output as triples or in some alternate form, e.g., pointers or other references to identified data within the data set 10, depending on the needs of the surveillance, monitoring and real-time events application that invoked the system 8. As noted above, instead of or in addition to outputting data 20, the module 12 can trigger execution of further rules, generate alerts, broadcasts, messages, or otherwise, consistent with requirements of PHIN, HAN or other applicable standards.
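The testing of triples against criteria described above can be sketched as follows. This is a minimal illustration only, not the rule set of the incorporated application: the function names, the tuple layout and the sample values are assumptions made for the example, and shell-style wildcards stand in for whatever expression syntax a given embodiment supports.

```python
from fnmatch import fnmatchcase

# Hypothetical representation: each triple is a (subject, predicate, object)
# tuple of strings, and the criteria is a tuple of three wildcard patterns.
def matches(triple, criteria):
    """Return True if every part of the triple satisfies its pattern."""
    return all(fnmatchcase(value, pattern)
               for value, pattern in zip(triple, criteria))

def select(data_set, criteria):
    """Return the triples in the data set that match the criteria."""
    return [triple for triple in data_set if matches(triple, criteria)]

# Illustrative data only; these values do not appear in the specification.
data_set = [
    ("vessel:502", "cargo", "LNG"),
    ("vessel:502", "status", "under way"),
    ("vessel:617", "cargo", "grain"),
]

# "*" acts as a wildcard for the subject and object positions.
matched = select(data_set, ("vessel:*", "cargo", "*"))
```

Here `matched` holds the two "cargo" triples. A fuller implementation would also follow arcs to related triples, as the rules 18 contemplate.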
The framework server 116 presents information from the data store 114 and/or sources 140 via browser 118. This can be based on requests entered directly by the user, e.g., in response to selections/responses to questions, dialog boxes or other user-input controls generated by a surveillance, monitoring and real-time events application executing on the server 116 or in connection with the browser 118. It can also be based, for example, on information obtained from the database 114 and/or sources 140 by the expert engine-based system 8 described above.
A further understanding of the operation of the framework server 116 may be attained by reference to the appendix filed with United States Patent Application Serial No. 09/917,264, filed July 27, 2001, now published as PCT WO02093319A2 and EP 1405219A2 (Application EP2002000741711), and entitled METHODS AND APPARATUS FOR ENTERPRISE APPLICATION INTEGRATION, which appendix is incorporated herein by reference.
According to one practice of the invention, a surveillance, monitoring and real-time events application includes a "dashboard" with display windows or panels that provide comprehensive real-time displays of information gathered from the data store 114 or other sources 140, as well as "alerts" resulting from anomalous situations detected by the surveillance, monitoring and real-time events application. The dashboard and alerts can be generated by an application executing on the server 116 and/or the browser 118 or otherwise.
Surveillance, monitoring and real-time events dashboards can display information and alerts that are specific to predefined categories, such as border and port security, health and bioterrorism, or public and community safety. These can be configured by users to display information from ad hoc combinations of data sources and user-defined alerts. For the purpose of describing the structure and operation of the surveillance, monitoring and real-time events dashboards, reference will be made to two representative examples (border/port security and health/bioterrorism), although these descriptions apply to other predefined and user-defined categories of information.
Figure 4 illustrates a border/port security dashboard 400. The dashboard displays several panels 402, 404, 406, 408, 410, 412 and 414. Panel 402 can be used to display information relating to an alert, if one has been issued by the surveillance, monitoring and real-time events application or by an external system. Panel 402 is described in more detail below. Each panel 404-414 displays information from a particular data source or an aggregation of data from several data sources. For example, panel 404 can contain real-time radar data from the US Coast Guard superimposed on a satellite image of Boston's inner harbor. The panel 404 display can be augmented with other Coast Guard data. For example, global positioning system (GPS) data from US Coast Guard vessels and vehicles (collectively "units") can be used to identify and then look up information related to these units. The unit identities can be superimposed on the image displayed in panel 404, as shown at 416, 418 and 420. Double-clicking on one of these units can cause the surveillance, monitoring and real-time events application to display information about the unit. This information can include, for example, contact information (e.g. frequency, call sign, name of person in charge, etc.), capabilities (e.g. maximum speed, crew size, weaponry, fire-fighting equipment, etc.) and status (e.g. docked, patrolling, busy intercepting a vessel, etc.).
Panel 406 can contain real-time data from a port authority superimposed on a map of the inner harbor. Note that port authority data can include information related to the inner harbor that is different than information provided by the US Coast Guard. For example, the port authority data can include information on vessels traveling or docked within the inner harbor. Furthermore, the port authority data can relate to more than just the inner harbor. For example, the port authority data can include information related to an airport and a rail yard.
Other panels 410 and 412 can display information from other data sources, such as US Customs and local or state police. Panel 408 displays a current Homeland Security Advisory System threat level. Panel 414 displays contact information for agencies, such as the US Coast Guard, US Customs, port authority and state police, that might be invoked in case of an alert.
A user can double-click on any panel to display a separate window containing the panel. By this mechanism, the user can enlarge any panel. In addition, through appropriate mouse or keyboard commands, the user can zoom in on a portion of the image displayed by a panel. For example, the user can select a point on the panel display to re-center the display to the selected point and zoom in on that point. Alternatively, the user can select a rectangular portion of the panel display using a "rubber band" cursor and instruct the system to fill the entire panel with the selected portion. Figure 5 illustrates an example of such a window 500 displaying the port authority panel 406 of Figure 4. A user can, for example, double-click on a vessel 502 to display information about the vessel. Figure 6 illustrates an example of a pop-up window 600 that displays information about the selected vessel.
Although panels 402-414 contain graphical displays, other panels (not shown) can contain textual or numeric data. For example, panels containing shipping schedules, airline schedules, port volume statistics, recent headlines, weather forecasts, etc. can be available for display. Of course, other graphical panels, such as current meteorological data for various portions of the world, can also be available. The surveillance, monitoring and real-time events application can make available more panels than can be displayed at one time on the dashboard 400 (Figure 4). The dashboard 400 can display a default set of panels, such as panels 404-414. Optionally, the user can select which panels to display in the dashboard 400, as well as arrange the panels within the dashboard and control the size of each panel. If it is deemed desirable to display more panels than can be displayed at one time, some or all of the desired panels can be displayed on a round-robin basis.
In addition to allowing users to select items on panels to obtain further information about these items, the surveillance, monitoring and real-time events application can include rules and/or heuristics to automatically detect anomalies and alert users to these anomalies (hereinafter referred to as "alerts"). As a result of one of these alerts, the surveillance, monitoring and real-time events application preferably can select one or more panels containing particularly relevant information and display or enlarge those panels. The selected panels need not be ones that the user could select. For example, the surveillance, monitoring and real-time events application can create a new panel that includes a combination of data from several sources, the sources being selected by rule(s) that caused the alert to be issued.
The following example illustrates how an alert can be issued. As shown in Figure 7, the inner harbor can be partitioned into shipping lanes 700 and 702. The surveillance, monitoring and real-time events application can include rules describing permitted, required and/or prohibited behavior of vessels in these shipping lanes 700 and 702. Some rules can apply to all vessels. Other rules can apply to only certain vessels, for example according to the vessels' types, cargos, speeds, country of registry, as well as according to data unrelated to the vessels, such as time of day, day of week, season, Homeland Security Advisory System threat level, amount of other harbor traffic or amount or schedule of non-harbor traffic, such as aircraft at an adjacent airport. Other rules can apply to docked vessels, vessels under tow, etc. Similarly, rules can apply to aircraft, vehicles, or any measurable quantity, such as air quality in a subway station, seismic data, voltage in a portion of a power grid or vibration in a building, bridge or other structure. Rules can also apply to data entered by humans, such as the number of reported cases of food poisoning or quantities of antibiotics prescribed, ordered or on hand during a selected period of time.
Under normal circumstances, i.e. when no alerts are pending, the dashboard 400 (Figure 4) displays a default set of panels or a set of panels selected by the user, as previously described. If, for example, the previously mentioned tanker vessel 502 (Figure 7) carrying a hazardous cargo, such as liquefied natural gas (LNG), deviates 704 from a prescribed course, the surveillance, monitoring and real-time events application can issue an alert. Note that rules for vessels carrying hazardous cargos can be different than for vessels carrying non-hazardous cargos. In addition, other vessels can trigger the alert. For example, if the LNG tanker 502 is traveling within its prescribed course, but a high-speed vessel (not shown) or an aircraft is on a collision course with the LNG tanker, the surveillance, monitoring and real-time events application can issue an alert.
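A course-deviation rule of the kind just described might be sketched as follows. The planar coordinates, the tolerance values and all names are hypothetical and chosen only for illustration; the specification does not prescribe how deviation is measured or what limits apply.

```python
import math

# Hypothetical tolerances in meters; hazardous cargo gets a tighter limit,
# reflecting that rules can differ by cargo type.
HAZARDOUS_LIMIT_M = 50.0
DEFAULT_LIMIT_M = 200.0

def distance_to_course(point, start, end):
    """Shortest planar distance from a point to the segment start-end."""
    px, py = point
    sx, sy = start
    ex, ey = end
    dx, dy = ex - sx, ey - sy
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - sx, py - sy)
    # Project the point onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - sx) * dx + (py - sy) * dy) / length_sq))
    return math.hypot(px - (sx + t * dx), py - (sy + t * dy))

def deviates(vessel, course_start, course_end):
    """Return True if the vessel is farther from its course than allowed."""
    limit = HAZARDOUS_LIMIT_M if vessel["hazardous"] else DEFAULT_LIMIT_M
    return distance_to_course(vessel["position"], course_start, course_end) > limit

# Illustrative vessels 120 m off a straight course along the x axis.
tanker = {"id": "502", "hazardous": True, "position": (500.0, 120.0)}
freighter = {"id": "617", "hazardous": False, "position": (500.0, 120.0)}
course = ((0.0, 0.0), (1000.0, 0.0))
```

With these assumed limits, `deviates(tanker, *course)` is true while `deviates(freighter, *course)` is false, so only the vessel carrying hazardous cargo would raise an alert.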
As a result of the alert, the surveillance, monitoring and real-time events application displays the alert panel 402 (Figure 4) and an alert message 422. In this case, the alert panel 402 displays a zoomed-in portion of the port authority panel 406. In addition, the surveillance, monitoring and real-time events application can automatically notify a predetermined list of people or agencies. The particular people or agencies can depend on factors, such as the time of day or the day of the week of the alert. Optionally, the surveillance, monitoring and real-time events application can notify other users at other nodes, such as nodes 100b, 100c, 100d and/or 100e (Figure 1). Information displayed on dashboards (not shown) at these other nodes 100b-e need not be the same as information displayed on the dashboard 400. In particular, the information displayed on these other nodes 100b-e can be more or less detailed than the information displayed on the dashboard 400. For example, summary information, such as an icon displayed on a map of the United States, can be displayed at a command/control node to indicate an alert in Boston, without necessarily displaying all details related to the alert. A user at the command/control node can double-click on the icon to obtain more detailed information.
Figures 8-16 illustrate an exemplary dashboard that can be used in a health and bioterrorism context. Figure 8 illustrates a dashboard 800 that contains several panels 802, 804, 806, 808 and 810. Panel 802 contains a map of the United States with icons 812, 814, 816 indicating locations of three alerts. Panel 804 contains emergency contact information that is relevant to the alerts. Panel 806 contains hyperlinks to discussion forums, in which agency representatives and other authorized groups and people can post messages and replies, as is well known in the art. Panel 808 contains hyperlinks to information that is relevant to the alerts. Panel 810 displays the current Homeland Security Advisory System threat level. These panels will be described in more detail below.
In this example, the icons 812, 814 and 816 represent medical care providers that have experienced noteworthy events or levels of activity. As previously described, an alert can be issued if, for example, the number of cases of disease, such as influenza, exceeds a predetermined threshold. In this example, Provider 3 has encountered patients with pneumonia that does not respond to antibiotics. The other alerts could relate to other anomalous events or levels of activity. Clicking the icon 816 causes the system to display information 818 related to the selected alert. Clicking on a link 820 causes the system to display more detailed information about the alert. For example, Figure 9 illustrates two panels 902 and 904, as well as a user selection area 906, that can be displayed. Panel 902 contains a more detailed map of the area in which the event occurred. Panel 904 lists the number of cases by zip code of the patients. User selection area 906 enables the user to select one or more of the alerts, thereby selecting or aggregating data from the selected provider(s) for display in panels 902 and 904.
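The threshold test and the per-zip-code tally described above can be sketched as follows. The record layout, field names and threshold are assumptions for illustration; the specification leaves the form of the provider data open.

```python
from collections import Counter

def cases_by_zip(reports):
    """Count reported cases per patient zip code (as panel 904 might list)."""
    return Counter(report["zip"] for report in reports)

def zips_over_threshold(reports, threshold):
    """Return the zip codes whose case count exceeds the threshold."""
    counts = cases_by_zip(reports)
    return sorted(code for code, count in counts.items() if count > threshold)

# Hypothetical case reports from a single provider.
reports = [
    {"provider": "Provider 3", "zip": "02886", "condition": "pneumonia"},
    {"provider": "Provider 3", "zip": "02886", "condition": "pneumonia"},
    {"provider": "Provider 3", "zip": "02886", "condition": "pneumonia"},
    {"provider": "Provider 3", "zip": "02901", "condition": "pneumonia"},
]

flagged = zips_over_threshold(reports, threshold=2)
```

In this sketch only zip code 02886, with three cases against an assumed threshold of two, would be flagged for an alert.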
Returning for a moment to Figure 8, panel 804 contains icons for government agencies and other individuals or organizations (collectively "responders") that might be called upon to respond to or manage biological, nuclear, foodborne or other situations identified by the expert engine-based system 8 (e.g., as where the number of instances matching a specified criterion exceeds a threshold). Clicking link 822 displays a window containing emergency contact information for these responders, as shown in Figure 10 at 1000. Panel 1002 contains several emergency callout options, by which the user can manage the alerts. For example, clicking "Message Board" link 1004 displays a window containing messages posted in relation to this alert, as shown in Figure 11 at 1100. This message board enables users and responders to communicate with each other in relation to the alert. An "Initiate a new Callout" link 1102 enables the user to initiate a new situation, as shown in Figure 12.
In response to an alert, the surveillance, monitoring and real-time events application automatically performs searches of the Internet and responder intranets for information relevant to the alert. As previously mentioned, panel 808 (Figure 8) contains hyperlinks to information that is relevant to the alerts, including results from these searches and predefined information sources that have been identified as relevant. The surveillance, monitoring and real-time events application can, for example, have a database of information sources catalogued according to alert type. As shown in Figure 13, clicking on one of the hyperlinks in the panel 808 opens a new window 1300 displaying contents identified by the hyperlink.
Returning again to the dashboard 800 shown in Figure 8, the user can select a module via a pull-down list 824. For example, the user can select "Reports", in which case the system displays a window similar to that shown in Figure 14. After the user selects one or more providers 1402 and 1404, the system displays a report in a report panel 1406.
Figure 15 illustrates another graphical display 1500, by which the system can display an alert. In the example of Figure 15, two potential outbreaks of anthrax are shown. For each potential outbreak, the system displays information, such as proximity of the outbreak to the nearest residential area, as well as the population of the residential area, proximity to the nearest emergency medical center and the number of free beds in the medical center. Being tied into existing hospital systems, the surveillance, monitoring and real-time events application can query those hospital systems and display relevant information, as shown in Figure 16.
Described herein are methods and apparatus meeting the above-mentioned objects. It will be appreciated that the illustrated embodiment is merely an example of the invention and that other embodiments, incorporating changes to those described herein, fall within the scope of the invention. Thus, for example, as noted earlier, although the illustrated embodiment is adapted for use in a public health & bioterrorism application (with additional examples provided with respect to border and port security), it will be appreciated that similar such systems can be applied in public & community safety, and government data integration applications, described above, among others.
Appendix A
Copy of United States Patent Application Serial No. 10/302,764, filed November 21, 2002, entitled METHODS AND APPARATUS FOR QUERYING A RELATIONAL DATA STORE USING SCHEMA-LESS QUERIES, now published as PCT WO 03044634 (Application WO2002US0037729).
(31 pages, including this cover sheet)
METHODS AND APPARATUS FOR QUERYING A RELATIONAL DATA STORE USING SCHEMA-LESS QUERIES
Background
This application claims the benefit of priority of United States Provisional Patent Application Serial No. 60/332,053, filed November 21, 2001, entitled "Methods And Apparatus For Querying A Relational Database In A System For Real-Time Business Visibility" and U.S. Provisional Patent Application Serial No. 60/332,219, filed on November 21, 2001, entitled "Methods And Apparatus For Calculation and Reduction of Time-Series Metrics From Event Streams Or Legacy Databases In A System For Real-Time Business Visibility." This application is also a continuation-in-part of United States Patent Application Serial No. 09/917,264, filed July 27, 2001, entitled "Methods and Apparatus for Enterprise Application Integration" and United States Patent Application Serial No. 10/051,619, filed October 29, 2001, entitled "Methods And Apparatus For Real-Time Business Visibility Using Persistent Schema-Less Data Storage." The teachings of all of the foregoing applications are incorporated herein by reference.
The invention pertains to digital data processing and, more particularly, to methods and apparatus for enterprise business visibility and insight using real-time reporting tools.
It is not uncommon for a single enterprise to have several separate database systems to track internal and external planning and transactional data. Such systems might have been developed at different times throughout the history of the enterprise and, therefore, represent differing generations of computer technology. For example, a marketing database system tracking customers may be ten years old, while an enterprise resource planning (ERP) system tracking inventory might be two or three years old. Integration between these systems is difficult at best, consuming specialized programming skill and constant maintenance expenses.
A major impediment to enterprise business visibility is the consolidation of data from these disparate legacy databases with one another and with that from newer e-commerce databases. For instance, inventory on-hand data gleaned from a legacy ERP system may be difficult to combine with customer order data gleaned from web servers that support e-commerce (and other web-based) transactions. This is not to mention difficulties, for example, in consolidating resource scheduling data from the ERP system with the forecasting data from the marketing database system.
An object of this invention is to provide improved methods and apparatus for digital data processing and, more particularly, for enterprise business visibility and insight (hereinafter, "enterprise business visibility"). A further object is to provide such methods and apparatus as can rapidly and accurately retrieve information responsive to user inquiries.
A further object of the invention is to provide such methods and apparatus as can be readily and inexpensively integrated with legacy, current and future database management systems.
A still further object of the invention is to provide such methods and apparatus as can be implemented incrementally or otherwise without interruption of enterprise operation.
Yet a still further object of the invention is to provide such methods and apparatus as to facilitate ready access to up-to-date enterprise data, regardless of its underlying source.
Yet still a further object of the invention is to provide such methods and apparatus as permit flexible presentation of enterprise data in an easily understood manner.
Summary of the Invention
These and other objects are attained by the invention which provides, in one aspect, a method of searching an RDF triples data store of the type in which the triples are maintained in accord with a first storage schema. The method includes inputting a first query based, for example, on a user request, specifying RDF triples that are to be identified in the data store. That first query assumes either (i) that the triples are stored in a schema-less manner (i.e., with no storage schema) or (ii) that the triples are maintained in accord with a second storage schema that differs from the first. The method further includes generating, from the first query, a second query that specifies those same RDF triples, yet, that reflects the first storage schema. That second query can be applied to the RDF triples data store in order to identify and/or retrieve the desired data.
The invention provides, in further aspects, a method as described above including the steps of examining the first query for one or more tokens that represent data to be used in generating the second query. It also includes dispatching context-specific grammar events containing that data. A related aspect of the invention provides for dispatching events that represent any of declarations and constraints specified in the first query. A still further related aspect provides for dispatching declaration events specifying RDF documents from which triples are to be identified and constraint events specifying the triples themselves.
Further aspects of the invention provide methods as described above that include the steps of extracting statement data from the first query and associating that statement data with at least a portion of the second query. That second query can be generated, according to related aspects of the invention, in the form of an SQL SELECT statement. The associating step can include associating statement data from the first query with one or more clauses of the SELECT statement, to wit, the SELECT clause, the FROM clause, the WHERE clause and the ORDER BY clause.
Still further aspects of the invention provide a method of translating a schema-less input query in a first language to an output query in a second language. As above, the method includes examining the schema-less input query for one or more tokens that represent data to be used in generating the output query; dispatching context-specific grammar events containing that data; and populating portions of the output query according to the events and data. The method further includes generating the output query in the second language comprising those populated portions, where the output query embodies a schema of a relational database storing RDF triples. A related aspect of the invention provides methods as described above in which the dispatching step includes generating any of a logical condition event, a selection term declaration event, and a triple declarations event. A further related aspect of the invention includes generating a logical condition event containing data which, when applied to the relational database via the output query, identifies RDF triples according to a specific Boolean condition. A further related aspect of the invention includes generating an event containing data which, when applied to the relational database via the output query, identifies RDF triples including a specified term. A still further related aspect of the invention includes generating an event containing data which, when applied to the relational database via the output query, identifies RDF triples having a specified subject, predicate and/or object.
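The idea of populating the clauses of an SQL SELECT statement from a schema-less triple constraint can be sketched as follows. This is a simplified illustration under stated assumptions: the single `triples` table with `subject`, `predicate` and `object` columns is hypothetical and is not the storage schema of the hologram data store, and the tokenizing and event-dispatching stages are omitted. The standard-library `sqlite3` module is used only to show the generated query running.

```python
import sqlite3

COLUMNS = ("subject", "predicate", "object")

def translate(pattern, order_by=None):
    """Build a SELECT statement from a (subject, predicate, object) pattern.

    A None entry means "any value". Returns the SQL text and its parameters.
    """
    where, params = [], []
    for column, value in zip(COLUMNS, pattern):
        if value is not None:
            where.append(f"{column} = ?")  # populate the WHERE clause
            params.append(value)
    sql = "SELECT subject, predicate, object FROM triples"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if order_by in COLUMNS:
        sql += f" ORDER BY {order_by}"  # populate the ORDER BY clause
    return sql, params

# Hypothetical one-table triple store, populated with sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("postal://zip#02886", "town", "Warwick"),
    ("postal://zip#02886", "state", "RI"),
    ("postal://zip#02901", "town", "Providence"),
])

sql, params = translate((None, "town", None), order_by="object")
rows = conn.execute(sql, params).fetchall()
```

The caller states only which triples are wanted; the table and column names appear solely in the generated statement, which is the separation between schema-less input and schema-aware output that the method describes.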
Related aspects of the invention provide methods as described above in which the first language is any of SQL-like and XML-like.
These and other aspects of the invention are evident in the drawings and in the description that follows.
Brief Description of the Drawings
The foregoing features of this invention, as well as the invention itself, may be more fully understood from the following detailed description of the drawings in which:
Figure 1 depicts an improved enterprise business visibility and insight system according to the invention;
Figure 1A depicts an architecture for a hologram data store according to the invention, e.g., in the system of claim 1;
Figure 1B depicts the tables in a model store and a triples store of the hologram data store of Figure 1A;
Figure 2 depicts a directed graph representing data triples of the type maintained in a data store according to the invention.
Figure 3 is a functional block diagram of a query translator module in a system according to the invention.
Detailed Description of the Illustrated Embodiment
Figure 1 depicts a real-time enterprise business visibility and insight system according to the invention. The illustrated system 100 includes connectors 108 that provide software interfaces to legacy, e-commerce and other databases 140 (hereinafter, collectively, "legacy databases"). A "hologram" database 114 (hereinafter, "data store" or "hologram data store"), which is coupled to the legacy databases 140 via the connectors 108, stores data from those databases 140. A framework server 116 accesses the data store 114, presenting selected data to (and permitting queries from) a user browser 118. The server 116 can also permit updates to data in the data store 114 and, thereby, in the legacy databases 140.
Legacy databases 140 represent existing (and future) databases and other sources of information (including data streams) in a company, organization or other entity (hereinafter "enterprise"). In the illustration, these include a retail e-commerce database (e.g., as indicated by the cloud and server icons adjacent database 140c) maintained with a Sybase® database management system, an inventory database maintained with an Oracle® database management system and an ERP database maintained with a SAP® Enterprise Resource Planning system. Of course, these are merely examples of the variety of databases or other sources of information with which methods and apparatus as described herein can be used. Common features of illustrated databases 140 are that they maintain information of interest to an enterprise and that they can be accessed via respective software application program interfaces (API) or other mechanisms known in the art.
Connectors 108 serve as an interface to legacy database systems 140. Each connector applies requests to, and receives information from, a respective legacy database, using that database's API or other interface mechanism. Thus, for example, connector 108a applies requests to legacy database 140a using the corresponding SAP API; connector 108b, to legacy database 140b using Oracle API; and connector 108c, to legacy database 140c using the corresponding Sybase API.
In the illustrated embodiment, these requests are for purposes of accessing data stored in the respective databases 140. The requests can be simple queries, such as SQL queries and the like (e.g., depending on the type of the underlying database and its API) or more complex sets of queries, such as those commonly used in data mining. For example, one or more of the connectors can use decision trees, statistical techniques or other query and analysis mechanisms known in the art of data mining to extract information from the databases. Specific queries and analysis methodologies can be specified by the hologram data store 114 or the framework server 116 for application by the connectors. Alternatively, the connectors themselves can construct specific queries and methodologies from more general queries received from the data store 114 or server 116. For example, request-specific items can be "plugged" into query templates thereby effecting greater speed and efficiency.
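Plugging request-specific items into a stored query template might look like the following sketch. The template text, the `inventory` table and the helper name are hypothetical, chosen only to illustrate the idea; named parameter binding from the standard-library `sqlite3` module stands in for whichever legacy database API a connector actually uses.

```python
import re
import sqlite3

# Hypothetical stored templates; ":limit" marks a request-specific item.
QUERY_TEMPLATES = {
    "low_stock": "SELECT item, quantity FROM inventory "
                 "WHERE quantity < :limit ORDER BY item",
}

def build_request(template_name, **items):
    """Pair a stored template with the request-specific items it needs."""
    sql = QUERY_TEMPLATES[template_name]
    required = set(re.findall(r":(\w+)", sql))
    missing = required - set(items)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    return sql, {name: items[name] for name in required}

# Stand-in for a legacy database, populated with sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO inventory VALUES (?, ?)",
                 [("bolt", 5), ("nut", 40), ("washer", 2)])

sql, params = build_request("low_stock", limit=10)
rows = conn.execute(sql, params).fetchall()
```

Because the template text is fixed and only the bound values change, the same request can be stored and reapplied later with different items.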
Regardless of their origin, the requests can be stored in the connectors 108 for application and/or reapplication to the respective legacy databases 140 to provide one-time or periodic data store updates. Connectors can use expiration date information to determine which of a plurality of similar data to return to the data store, or if dates are absent, the connectors can mark returned data as being of lower confidence levels.
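One way a connector might apply expiration dates is sketched below. The policy shown (prefer the record with the latest expiration date, otherwise return a record marked as low confidence) is an assumption made for illustration, as are the field names; the specification does not spell out the selection rule.

```python
from datetime import date

def choose_record(records):
    """Pick one of several similar records to return to the data store.

    Assumed policy: prefer the record that expires last; if no record
    carries an expiration date, return the first one marked low confidence.
    """
    dated = [r for r in records if r.get("expires") is not None]
    if dated:
        best = max(dated, key=lambda r: r["expires"])
        return {**best, "confidence": "normal"}
    return {**records[0], "confidence": "low"}

# Hypothetical similar records for the same inventory item.
with_dates = [
    {"item": "bolt", "quantity": 5, "expires": date(2004, 1, 31)},
    {"item": "bolt", "quantity": 7, "expires": date(2004, 6, 30)},
]
without_dates = [{"item": "bolt", "quantity": 9, "expires": None}]

chosen = choose_record(with_dates)
fallback = choose_record(without_dates)
```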
Data and other information (collectively, "messages") generated by the databases 140 in response to the requests are routed by connectors to the hologram data store 114. That other information can include, for example, expiry or other adjectival data for use by the data store in caching, purging, updating and selecting data. The messages can be cached by the connectors 108, though they are preferably immediately routed to the store 114.
The hologram data store 114 stores data from the legacy databases 140 (and from the framework server 116, as discussed below) as RDF triples. The data store 114 can be embodied on any digital data processing system or systems that are in communications coupling (e.g., as defined above) with the connectors 108 and the framework server 116. Typically, the data store 114 is embodied in a workstation or other high-end computing device with high capacity storage devices or arrays, though this may not be required for any given implementation.
Though the hologram data store 114 may be contained on an optical storage device, this is not the sense in which the term "hologram" is used. Rather, it refers to its storage of data from multiple sources (e.g., the legacy databases 140) in a form which permits that data to be queried and coalesced from a variety of perspectives, depending on the needs of the user and the capabilities of the framework server 116.
To this end, a preferred data store 114 stores the data from the legacy databases 140 in subject-predicate-object form, e.g., RDF triples, though those of ordinary skill in the art will appreciate that other forms may be used as well, or instead. By way of background, RDF is a way of expressing the properties of items of data. Those items are referred to as subjects. Their properties are referred to as predicates. And, the values of those properties are referred to as objects. In RDF, an expression of a property of an item is referred to as a triple, a convenience reflecting that the expression contains three parts: subject, predicate and object. Subjects, also referred to as resources, can be anything that is described by an RDF expression. A subject can be a person, place or thing, though, typically, only an identifier of the subject is used in an actual RDF expression, not the person, place or thing itself. Examples of subjects might be "car," "Joe," "http://www.metatomix.com."
A predicate identifies a property of a subject. According to the RDF specification, this may be any "specific aspect, characteristic, attribute, or relation used to describe a resource." For the three exemplary subjects above, examples of predicates might be "make," "citizenship," "owner."
An object gives a "value" of a property. These might be "Ford," "United Kingdom," "Metatomix, Inc." for the subjects and predicates given in the prior paragraphs, forming the following RDF triples:
Subject                  Predicate       Object
"car"                    "make"          "Ford"
"Joe"                    "citizenship"   "United Kingdom"
"http://metatomix.com"   "owner"         "Metatomix, Inc."
Objects can be literals, i.e., strings that identify or name the corresponding property (predicate). They can also be resources. In the example above, rather than merely the string "Metatomix, Inc.," further triples may be specified, presumably ones identifying that company in the subject and giving details in predicates and objects.
A given subject may have multiple predicates, each predicate indexing an object. For example, a subject postal zip code might have an index to an object town and an index to an object state, either (or both) index being a predicate URI.
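The notion of predicates indexing objects for a given subject can be sketched with plain tuples and dictionaries. The representation below is only an illustrative assumption and is not the storage format of the hologram data store.

```python
from collections import defaultdict

# The three example triples from the text, as (subject, predicate, object).
triples = [
    ("car", "make", "Ford"),
    ("Joe", "citizenship", "United Kingdom"),
    ("http://metatomix.com", "owner", "Metatomix, Inc."),
    # A subject with more than one predicate, per the zip code example.
    ("postal://zip#02886", "town", "Warwick"),
    ("postal://zip#02886", "state", "RI"),
]

def index_by_subject(triples):
    """Map each subject to a dictionary of its predicates and objects."""
    index = defaultdict(dict)
    for subject, predicate, obj in triples:
        index[subject][predicate] = obj
    return dict(index)

index = index_by_subject(triples)
```

Looking up `index["postal://zip#02886"]["town"]` then follows the "town" predicate from the zip code subject to its object. This simple mapping keeps one object per predicate; a subject with several objects for the same predicate would need a list-valued index instead.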
Listed below is a portion of a data set of the type with which the invention can be practiced. The listing contains RDF triples, here, expressed in extensible markup language (XML) syntax. Those skilled in the art will, of course, appreciate that RDF triples can be expressed in other syntaxes and that the teachings hereof are equally applicable to those syntaxes. Further, the listing shows only a sampling of the triples in a database 114, which typically would contain tens of thousands or more of such triples.
    <rdf:RDF...xmlns="http://www.metatomix.com/postalCode/1.0#">
      <rdf:Description rdf:about="postal://zip#02886">
        <town>Warwick</town>
        <state>RI</state>
        <country>USA</country>
        <zip>02886</zip>
      </rdf:Description>
      <rdf:Description rdf:about="postal://zip#02901">
        <town>Providence</town>
        <state>RI</state>
        <country>USA</country>
        <zip>02901</zip>
      </rdf:Description>
Subjects are indicated within the listing using an "rdf:about" statement. For example, the second line of the listing defines a subject as a resource named "postal://zip#02886." That subject has predicates and objects that follow the subject declaration.
One predicate, <town>, is associated with a value "Warwick". Another predicate, <state>, is associated with a value "RI". The same follows for the predicates <country> and <zip>, which are associated with values "USA" and "02886," respectively. Similarly, the listing shows properties for the subject "postal://zip#02901," namely, <town> "Providence," <state> "RI," <country> "USA" and <zip> "02901."
In the listing, the subjects and predicates are expressed as uniform resource identifiers (URIs), e.g., of the type defined in Berners-Lee et al., Uniform Resource Identifiers (URI): Generic Syntax (RFC 2396) (August 1998), and can be said to be expressed in a form <scheme>://<path>#<fragment>. For the subjects given in the example, <scheme> is "postal," <path> is "zip," and <fragment> is, for example, "02886" and "02901."
The predicates, too, are expressed in the form <scheme>://<path>#<fragment>, as is evident to those of ordinary skill in the art. In accord with XML syntax, the predicates in lines two, et seq., of the listing must be interpreted as suffixes to the string provided in the namespace directive "xmlns=http://www.metatomix.com/postalCode/1.0#" in line one of the listing. This results in predicates that are formally expressed as: "http://www.metatomix.com/postalCode/1.0#town," "http://www.metatomix.com/postalCode/1.0#state," "http://www.metatomix.com/postalCode/1.0#country" and "http://www.metatomix.com/postalCode/1.0#zip." Hence, the <scheme> for the predicates is "http" and <path> is "www.metatomix.com/postalCode/1.0." The <fragment> portions are <town>, <state>, <country> and <zip>, respectively.
It is important to note that the listing is in some ways simplistic in that each of its objects is a literal value. Commonly, an object may itself be another subject, with its own objects and predicates. In such cases, a resource can be both a subject and an object, e.g., an object to all "upstream" resources and a subject to all "downstream" resources and properties. Such "branching" allows for complex relationships to be modeled within the RDF triple framework.
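By way of illustration only, the following Python sketch shows how triples with fully expanded predicate URIs might be extracted from a listing of the foregoing type. It is not the parser of the illustrated embodiment. The function name extract_triples is hypothetical, and the rdf namespace declaration, which is elided in the listing, is assumed here to be the standard RDF syntax namespace so that the document parses.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumption: the elided part of the <rdf:RDF> tag declares the standard
# rdf prefix; a closing </rdf:RDF> is added so the sample is well-formed.
LISTING = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.metatomix.com/postalCode/1.0#">
  <rdf:Description rdf:about="postal://zip#02886">
    <town>Warwick</town>
    <state>RI</state>
    <country>USA</country>
    <zip>02886</zip>
  </rdf:Description>
  <rdf:Description rdf:about="postal://zip#02901">
    <town>Providence</town>
    <state>RI</state>
    <country>USA</country>
    <zip>02901</zip>
  </rdf:Description>
</rdf:RDF>
"""

def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for literal-valued properties."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree reports tags as "{namespace}local"; joining the two
            # parts yields the formally expressed predicate URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, prop.text))
    return triples

triples = extract_triples(LISTING)
for triple in triples:
    print(triple)
```

Each <town>, <state>, <country> and <zip> element is thereby reported with the namespace string prepended, matching the formally expressed predicates discussed above.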
Figure 2 depicts a directed graph composed of RDF triples of the type stored by the illustrated data store 114, here, by way of non-limiting example, triples representing relationships among four companies (id#1, id#2, id#3 and id#4) and between two of those companies (id#1 and id#2) and their employees. Per convention, subjects and resource-type objects are depicted as oval-shaped nodes; literal-type objects are depicted as rectangular nodes; and predicates are depicted as arcs connecting those nodes.
Figure 1A depicts an architecture for a preferred hologram data store 114 according to the invention. The illustrated store 114 includes a model document store 114A and a model document manager 114B. It also includes a relational triples store 114C, a relational triples store manager 114D, and a parser 114E interconnected as shown in the drawing.
As indicated in the drawing, RDF triples maintained by the store 114 are received (from the legacy databases 140, via connectors 108, and/or from time-based data reduction module 150, described below) in the form of document objects, e.g., of the type generated from a Document Object Model (DOM) in a JAVA, C++ or other application. In the illustrated embodiment, these are stored in the model document store 114A as such (i.e., document objects), particularly, using the tables and inter-table relationships shown in Figure 1B (see dashed box labelled 114B).
The model document manager 114B manages storage/retrieval of the document object to/from the model document store 114A. In the illustrated embodiment, the manager 114B comprises the Slide content management and integration framework, publicly available through the Apache Software Foundation. It stores (and retrieves) document objects to (and from) the store 114A in accord with the WebDAV protocol. Those skilled in the art will, of course, appreciate that other applications can be used in place of Slide and that document objects can be stored/retrieved from the store 114A in accord with other protocols, industry-standard, proprietary or otherwise.
However, use of the WebDAV protocol allows for adding, updating and deleting RDF document objects using a variety of WebDAV client tools (e.g., Microsoft Windows Explorer, Microsoft Office, XML Spy or other such tools available from a variety of vendors), in addition to adding, updating and deleting document objects via connectors 108 and/or time-based data reduction module 150. This also allows for presenting the user with a view of a traversable file system, with RDF documents that can be opened directly in XML editing tools or from Java programs supporting WebDAV protocols, or from processes on remote machines via any HTTP protocol on which WebDAV is based.
RDF triples received by the store 114 are also stored to a relational database, here, store 114C, that is managed and accessed by a conventional relational database management system (RDBMS) 114D operating in accord with the teachings hereof. In that database, the triples are divided into their constituent components (subject, predicate, and object), which are indexed and stored to respective tables in the manner of a "hashed with origin" approach. Whenever an RDF document is added, updated or deleted, a parser 114E extracts its triples and conveys them to the RDBMS 114D with a corresponding indicator that they are to be added, updated or deleted from the relational database. Such a parser 114E operates in the conventional manner known in the art for extracting triples from RDF documents.
The illustrated database store 114C has five tables interrelated as particularly shown in Figure 1B (see dashed box labelled 114C). In general, these tables rely on indexes generated by hashing the triples' respective subjects, predicates and objects using a 64-bit hashing algorithm based on cyclical redundancy codes (CRCs), though it will be appreciated that the indexes can be generated by other techniques as well, industry-standard, proprietary or otherwise.
Referring to Figure 1B, the "triples" table 534 maintains one record for each stored triple. Each record contains an aforementioned hash code for each of the subject, predicate and object that make up the respective triple, along with a resource flag ("resource_flg") indicating whether that object is of the resource or literal type. Each record also includes an aforementioned hash code ("m_hash") identifying the document object (stored in model document store 114A) from which the triple was parsed, e.g., by parser 114E.
In the illustrated embodiment, the values of the subjects, predicates and objects are not stored in the triples table. Rather, those values are stored in the resources table 530, namespaces table 532 and literals table 536. Particularly, the resources table 530, in conjunction with the namespaces table 532, stores the subjects, predicates and resource-type objects; whereas, the literals table 536 stores the literal-type objects.
The resources table 530 maintains one record for each unique subject, predicate or resource-type object. Each record contains the value of the resource, along with its aforementioned 64-bit hash. It is the latter on which the table is indexed. To conserve space, portions of those values common to multiple resources (e.g., common <scheme>://<path> identifiers) are stored in the namespaces table 532. Accordingly, the field "r_value" contained in each record of the resources table 530 reflects only the unique portion (e.g., <fragment> identifier) of each resource.
The namespaces table 532 maintains one record for each unique common portion referred to in the prior paragraph (hereinafter, "namespace"). Each record contains the value of that namespace, along with its aforementioned 64-bit hash. As above, it is the latter on which this table is indexed.
The literals table 536 maintains one record for each unique literal-type object. Each record contains the value of the object, along with its aforementioned 64-bit hash. Each record also includes an indicator of the type of that literal (e.g., integer, string, and so forth). Again, it is the hash on which this table is indexed.
The models table 538 maintains one record for each RDF document object contained in the model document store 114A. Each record contains the URI of the corresponding document object ("uri_string"), along with its aforementioned 64-bit hash ("m_hash"). It is the latter on which this table is indexed. To facilitate associating document objects identified in the models table 538 with document objects maintained by the model document store 114A, each record of the models table 538 also contains the ID of the corresponding document object in the store 114A. That ID can be assigned by the model document manager 114B, or otherwise.
From the above, it can be appreciated that the relational triples store 114C is a schema-less structure for storing RDF triples. As suggested by Melnik, supra, triples maintained in that store can be reconstituted via an SQL query. For example, to reconstitute the RDF triple having a subject equal to "postal://zip#02886", a predicate equal to "http://www.metatomix.com/postalCode/1.0#town", and an object equal to "Warwick", the following SQL statement is applied:
    SELECT m.uri_string, t.resource_flg,
      concat (n1.n_value, r1.r_value) as subj,
      concat (n2.n_value, r2.r_value) as pred,
      concat (n3.n_value, r3.r_value),
      l.l_value
    FROM triples t, models m, resources r1, resources r2, namespaces n1, namespaces n2
      LEFT JOIN literals l on t.object=l.l_hash
      LEFT JOIN resources r3 on t.object=r3.r_hash
      LEFT JOIN namespaces n3 on r3.r_value=n3.n_value
    WHERE t.subject=r1.r_hash AND r1.n_hash=n1.n_hash AND
      t.predicate=r2.r_hash AND r2.n_hash=n2.n_hash AND
      t.subject=hash('postal://zip#02886') AND
      t.predicate=hash('http://www.metatomix.com/postalcode/1.0#town') AND
      t.object=hash('warwick')
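The storage layout and the reconstitution of a triple can be sketched, under stated assumptions, in Python with an in-memory SQLite database. This is a simplified illustration and not the illustrated RDBMS 114D: the helper names h64, split_uri, add_resource and add_literal_triple are hypothetical, the model URI "file:///zip.rdf" is invented for the example, a truncated BLAKE2 digest stands in for the CRC-based 64-bit hash, and the literal-type indicator and document ID fields are omitted.

```python
import hashlib
import sqlite3

def h64(value):
    """Signed 64-bit hash (stand-in for the CRC-based hash described above)."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)

def split_uri(uri):
    """Split a URI into its shared namespace portion and unique fragment."""
    head, sep, fragment = uri.rpartition("#")
    return head + sep, fragment

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE models     (m_hash INTEGER PRIMARY KEY, uri_string TEXT);
CREATE TABLE namespaces (n_hash INTEGER PRIMARY KEY, n_value TEXT);
CREATE TABLE resources  (r_hash INTEGER PRIMARY KEY, r_value TEXT, n_hash INTEGER);
CREATE TABLE literals   (l_hash INTEGER PRIMARY KEY, l_value TEXT);
CREATE TABLE triples    (subject INTEGER, predicate INTEGER, object INTEGER,
                         resource_flg INTEGER, m_hash INTEGER);
""")

def add_resource(uri):
    """Store a resource as namespace + fragment; return the hash of the full URI."""
    namespace, fragment = split_uri(uri)
    db.execute("INSERT OR IGNORE INTO namespaces VALUES (?, ?)",
               (h64(namespace), namespace))
    db.execute("INSERT OR IGNORE INTO resources VALUES (?, ?, ?)",
               (h64(uri), fragment, h64(namespace)))
    return h64(uri)

def add_literal_triple(model_uri, subject, predicate, literal):
    """Store one triple whose object is a literal (resource_flg = 0)."""
    db.execute("INSERT OR IGNORE INTO models VALUES (?, ?)", (h64(model_uri), model_uri))
    db.execute("INSERT OR IGNORE INTO literals VALUES (?, ?)", (h64(literal), literal))
    db.execute("INSERT INTO triples VALUES (?, ?, ?, 0, ?)",
               (add_resource(subject), add_resource(predicate),
                h64(literal), h64(model_uri)))

add_literal_triple("file:///zip.rdf", "postal://zip#02886",
                   "http://www.metatomix.com/postalCode/1.0#town", "Warwick")

# Reconstitute the triple by joining the hash codes back to their values.
row = db.execute("""
SELECT m.uri_string, n1.n_value || r1.r_value, n2.n_value || r2.r_value, l.l_value
FROM triples t
JOIN models m      ON t.m_hash = m.m_hash
JOIN resources r1  ON t.subject = r1.r_hash
JOIN namespaces n1 ON r1.n_hash = n1.n_hash
JOIN resources r2  ON t.predicate = r2.r_hash
JOIN namespaces n2 ON r2.n_hash = n2.n_hash
JOIN literals l    ON t.object = l.l_hash
WHERE t.subject = ? AND t.predicate = ? AND t.object = ?
""", (h64("postal://zip#02886"),
      h64("http://www.metatomix.com/postalCode/1.0#town"),
      h64("Warwick"))).fetchone()
print(row)
```

Because the triples table holds only hash codes, the lookup compares integers, and the string values are touched only when the matching record is joined back to the resources, namespaces and literals tables.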
Those skilled in the art will, of course, appreciate that RDF documents and, more generally, objects maintained in the store 114 can be contained in other stores (structured relationally, hierarchically or otherwise) as well, in addition to or instead of stores 114A and 114C.
Referring to Figures 1A and 3, the relational triples store manager 114D supports SQL queries such as the one exemplified above (for extracting a triple with the subject "postal://zip#02886", the predicate "http://www.metatomix.com/postalCode/1.0#town", and the object "Warwick"). As evident in the example, such SQL queries must take into account the underlying storage schema of the relational database (here, hashed by origin). To remove that as a constraint on queries, e.g., made by a user and applied via the framework 116, or otherwise, a query translator 190 translates schema-less queries 612 into schema-based SQL queries 642 for application to the relational store 114C. In the illustrated embodiment, the schema-less queries are expressed in an SQL-like language (here, identified as "HxQL") or in an XML-like language (here, identified as "HxML"); however, it will be appreciated that any language or means for expressing a query, schema-less or otherwise, may be used instead or in addition.
The illustrated query translator 190 has a language-parsing component 602, an event-processing component 604, and an SQL statement management/generation component 606. The language-parsing component 602 examines the input query 612 for tokens that represent data to be used in generating the SQL statement 642 and dispatches context-specific grammar events containing that data to the event processor. The event processor receives these and retrieves the data stored within them for use by statement management/generation component 606 to generate the SQL SELECT statement 642.
In the illustrated embodiment, the language-parsing component 602 has two parsing elements, each directed to one of two languages in which schema-less queries 612 can be expressed. The HxQL parser 608 parses queries expressed in the HxQL language, while the HxML parser 610 parses queries expressed in the HxML language. HxQL grammar is based on R.V. Guha's RDFDB query language, Libby Miller's SquishQL and Andy Seaborne's RDQL. The HxQL parser 608 is implemented using JavaCC, a commercially available parser developed jointly by Sun Microsystems and Metamata. HxML comprises a grammar based on XML. The HxML parser 610 is implemented using an XML parser, such as Xerces available from Apache. It will be appreciated that in other embodiments, the language-parsing component 602 can have more, or fewer, parsing elements, and that those elements can be used to parse other languages in which the input query may be expressed.
The illustrated language-parsing component 602 can dispatch eight events. For example, a global document declaration event is dispatched indicating that an RDF document specified by a URI is included in the optional set of default document models to query. A logical condition event is dispatched when a constraint is parsed limiting triple data that is to be considered for retrieval. A namespace declaration event is dispatched when a mapping has been declared between an alias id and a URI fragment. An order by declaration event is dispatched when a record sorting order is specified with regard to columns of data representing terms selected for retrieval. A selection term declaration event is dispatched when a term is selected for retrieval. A triple declaration event is dispatched when a criterion for triple consideration is declared. A triple document declaration event is dispatched when at least one URI for an RDF document is declared to replace the set of default document models to query against but for a single particular triple criterion. And finally, a triple model-mapping event is dispatched when the set of default document models to query against for an individual triple criterion will be shared with a different individual triple criterion. It will be appreciated that these events are only examples of ones that can be dispatched and that, in one embodiment, more (or fewer) events are appropriate depending on the schema of the database to be searched.
The event-processing component 604 listens for context-specific grammar events and extracts the data stored within them to populate the statement management/generation component 606 with the data it needs for generating the SQL SELECT statement 642. For example, a Boolean constraint represented in a logical condition event is extracted and dispatched to the statement management/generation component 606 for inclusion in a SELECT WHERE clause of a SQL SELECT statement.
The statement management/generation component 606 stores and manages statement data and maps it directly to the relational triples store 114C schema. It uses that mapped data to generate an output query 642 corresponding to the input query 612. The statement manager 606 delegates the generation of the SQL SELECT statement to agent objects 634-640. Each agent generates a particular clause of the SELECT statement, e.g., the SELECT, FROM, WHERE and ORDER-BY clauses. In other embodiments, the statement manager can generate queries according to a different database storage schema and can output queries conforming to other languages.
In the illustrated embodiment, a select clause agent 634 generates the SELECT clause by mapping each term to the appropriate table and/or field name corresponding to tables/field names in triples data store 114C. A from clause agent 636 generates the FROM clause and ensures that table instances and their alias abbreviations are declared for use in other clauses. A where clause agent 638 generates the WHERE clause and ensures that all necessary table JOINS and filtering constraints are specified. Lastly, an order-by clause agent 640 generates an optional ORDER-BY clause, thus specifying an order of the output results. In one embodiment, the agent objects distribute SQL generation between custom fragment managers and use differing agents in accord with the database to be searched. Hence, it can be appreciated that the above agents are exemplary of a query translator 600 directed to generating queries for a relational triple store 114C, and, in other embodiments, agents will be in accord with the data store of that embodiment. Each agent can also gather data from other agents as necessary; for example, alias information stored in a SELECT clause can be used to formulate constraints in the WHERE clause. Hence, the agents work in tandem until all statement data is properly "mapped" according to the schema of the triples store 114C.
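The division of labor among the event processor and the clause agents can be sketched as follows. This Python fragment is a greatly simplified illustration and not the implementation of components 602-606: the event names, the StatementData structure and the function names are hypothetical, only three event kinds are handled, and namespace, model and resource joins are omitted. The two predicate hash constants are copied from the example SELECT statement given later in this description.

```python
from dataclasses import dataclass, field

@dataclass
class StatementData:
    """Data gathered from grammar events (hypothetical structure)."""
    selected: list = field(default_factory=list)  # variables to return
    triples: list = field(default_factory=list)   # (predicate_hash, subject_var, object_var)
    order_by: list = field(default_factory=list)  # variables to sort on

def process_events(events):
    """Event processor: copy the payload of each event into StatementData."""
    data = StatementData()
    for kind, payload in events:
        if kind == "selection_term":
            data.selected.append(payload)
        elif kind == "triple":
            data.triples.append(payload)
        elif kind == "order_by":
            data.order_by.append(payload)
        else:
            raise ValueError(f"unhandled event: {kind}")
    return data

def object_aliases(data):
    """Map each object variable to the literals-table alias that holds it."""
    return {obj: f"l{i}" for i, (_, _, obj) in enumerate(data.triples)}

def select_clause(data):
    aliases = object_aliases(data)
    return "SELECT " + ", ".join(
        f'{aliases[v]}.l_value AS "{v.upper()}"' for v in data.selected)

def from_clause(data):
    count = len(data.triples)
    tables = [f"triples t{i}" for i in range(count)]
    tables += [f"literals l{i}" for i in range(count)]
    return "FROM " + ", ".join(tables)

def where_clause(data):
    conditions, first_use = [], {}
    for i, (pred_hash, subject_var, _) in enumerate(data.triples):
        conditions.append(f"(t{i}.predicate = {pred_hash})")
        conditions.append(f"(t{i}.object = l{i}.l_hash AND t{i}.resource_flg = 0)")
        if subject_var in first_use:
            # A shared subject variable becomes a join between triple instances.
            conditions.append(f"(t{i}.subject = t{first_use[subject_var]}.subject)")
        else:
            first_use[subject_var] = i
    return "WHERE " + " AND ".join(conditions)

def order_by_clause(data):
    if not data.order_by:
        return ""
    aliases = object_aliases(data)
    return "ORDER BY " + ", ".join(f"{aliases[v]}.l_value" for v in data.order_by)

def translate(events):
    """Statement manager: delegate each clause to its agent and join the results."""
    data = process_events(events)
    clauses = [select_clause(data), from_clause(data),
               where_clause(data), order_by_clause(data)]
    return "\n".join(clause for clause in clauses if clause)

events = [
    ("selection_term", "blood_group"),
    ("selection_term", "blood_rh"),
    ("triple", (8588294745283711900, "blood_type", "blood_group")),
    ("triple", (-8645869300922183732, "blood_type", "blood_rh")),
]
sql = translate(events)
print(sql)
```

As in the description above, the SELECT agent relies on alias information that the FROM and WHERE agents also use, which is why the alias mapping is computed in one shared helper.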
It will be appreciated by those skilled in the art that the query translator 600 can be encapsulated and composited into other software components. It will also be appreciated that although the query translator 160 is directed toward an RDF triples store utilizing the hash with origin schema, it can generate output for use with triples (or other) stores utilizing other database vendors. For example, the query translator 160 can be implemented to output various SQL dialects, e.g., Microsoft SQL, which uses 0 and 1 for Boolean values versus the conventional TRUE/FALSE keywords. Further, configurable options such as generating SQL with or without computed hash codes in join criteria can be accommodated, as well.
Illustrated below is an example of use of the query translator 160 to generate an output SQL query 642 for application against a relational store 114C containing triples (organized in the aforementioned hashed with origin schema) from the RDF document:
    <?xml version="1.0" encoding="UTF-8" ?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:mtx="http://metatomix.com/blood/1.0#">
      <rdf:Description rdf:about="bloodtype://id#001">
        <mtx:group>O</mtx:group>
        <mtx:rh_status>+</mtx:rh_status>
      </rdf:Description>
      <rdf:Description rdf:about="bloodtype://id#002">
        <mtx:group>O</mtx:group>
        <mtx:rh_status>-</mtx:rh_status>
      </rdf:Description>
      <rdf:Description rdf:about="bloodtype://id#003">
        <mtx:group>A</mtx:group>
        <mtx:rh_status>+</mtx:rh_status>
      </rdf:Description>
      <rdf:Description rdf:about="bloodtype://id#004">
        <mtx:group>A</mtx:group>
        <mtx:rh_status>-</mtx:rh_status>
      </rdf:Description>
      <rdf:Description rdf:about="bloodtype://id#005">
        <mtx:group>B</mtx:group>
        <mtx:rh_status>+</mtx:rh_status>
      </rdf:Description>
      <rdf:Description rdf:about="bloodtype://id#006">
        <mtx:group>B</mtx:group>
        <mtx:rh_status>-</mtx:rh_status>
      </rdf:Description>
      <rdf:Description rdf:about="bloodtype://id#007">
        <mtx:group>AB</mtx:group>
        <mtx:rh_status>+</mtx:rh_status>
      </rdf:Description>
      <rdf:Description rdf:about="bloodtype://id#008">
        <mtx:group>AB</mtx:group>
        <mtx:rh_status>-</mtx:rh_status>
      </rdf:Description>
    </rdf:RDF>
A schema-less query 612, here expressed in the HxQL language, for returning all blood types stored in the triples store 114C is as follows:
    /*
     * Display all the different blood types (e.g. AB-)
     */
    USING mtx FOR <http://metatomix.com/blood/1.0#>
    SELECT ?blood_group, ?blood_rh
    FROM <*/blood_*.rdf>
    WHERE (<mtx:group>, ?blood_type, ?blood_group),
          (<mtx:rh_status>, ?blood_type, ?blood_rh)
    AND ?blood_type = <bloodtype://id#*>
An equivalent query expressed in the HxML language is as follows:
    <?xml version="1.0" ?>
    <a:hml xmlns:a="http://www.metatomix.com/hml#">
      <!-- Display all the different blood types (e.g. AB-) -->
      <a:NamespaceAliasSet>
        <a:namespace a:uri="http://metatomix.com/blood/1.0#" a:alias="mtx" />
      </a:NamespaceAliasSet>
      <a:DefaultDocumentSet>
        <a:document
          [Figure imgf000045_0001]
        />
      </a:DefaultDocumentSet>
      <a:SelectionSet>
        <a:variable a:id="?blood_group" />
        <a:variable a:id="?blood_rh" />
      </a:SelectionSet>
      <a:TriplesSet>
        <a:triple a:id="1" a:predicate="mtx:group" a:subject="?blood_type">
          <a:object a:type="literal">?blood_group</a:object>
        </a:triple>
        <a:triple a:id="2" a:predicate="mtx:rh_status" a:subject="?blood_type">
          <a:object a:type="literal">?blood_rh</a:object>
        </a:triple>
      </a:TriplesSet>
      <a:ConstraintSet>
        <a:constraint
          [Figure imgf000046_0001]
          <a:operand a:type="resource">bloodtype://id#*</a:operand>
        </a:constraint>
      </a:ConstraintSet>
    </a:hml>
Operation of the query translator 160 results in generation of the following SQL SELECT statement for application against the relational data store 114C:
    SELECT l0.l_value AS "BLOOD_GROUP", l1.l_value AS "BLOOD_RH"
    FROM models m0, models m1, triples t0, triples t1, literals l0, literals l1, resources r2, resources r3, resources r4, namespaces n2, namespaces n3, namespaces n4
    WHERE (t0.m_hash = m0.m_hash AND m0.uri_string LIKE '%/blood~_%.rdf' ESCAPE '~')
    AND (t0.predicate = r2.r_hash AND r2.n_hash = n2.n_hash AND t0.predicate = 8588294745283711900)
    AND (t0.subject = r3.r_hash AND r3.n_hash = n3.n_hash)
    AND (t0.object = l0.l_hash AND t0.resource_flg = 0)
    AND (t1.m_hash = m1.m_hash AND m1.uri_string LIKE '%/blood~_%.rdf' ESCAPE '~')
    AND (t1.predicate = r4.r_hash AND r4.n_hash = n4.n_hash AND t1.predicate = -8645869300922183732)
    AND (t1.object = l1.l_hash AND t1.resource_flg = 0)
    AND (t1.subject = t0.subject)
    AND (n3.n_value + r3.r_value) LIKE 'bloodtype://id#%' ESCAPE '~'
Application of this SELECT statement to the relational store 114C yields the following result set:
Figure imgf000046_0002
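One detail of the translation above is the conversion of wildcards in the schema-less query into SQL LIKE patterns with an escape character, so that an underscore in a document name is matched literally rather than as a single-character wildcard. A minimal Python sketch of such a conversion follows. The function name wildcard_to_like and the helper like are hypothetical, and the sketch assumes "*" is the only wildcard of the input language; SQLite is used merely to check the resulting patterns.

```python
import sqlite3

def wildcard_to_like(pattern, escape="~"):
    """Convert a '*' wildcard pattern into an SQL LIKE pattern."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")               # '*' matches any run of characters
        elif ch in ("%", "_", escape):
            out.append(escape + ch)       # protect characters LIKE treats specially
        else:
            out.append(ch)
    return "".join(out)

db = sqlite3.connect(":memory:")

def like(value, pattern):
    """Evaluate value LIKE pattern ESCAPE '~' in SQLite."""
    query = "SELECT ? LIKE ? ESCAPE '~'"
    return db.execute(query, (value, pattern)).fetchone()[0] == 1

document_pattern = wildcard_to_like("*/blood_*.rdf")
subject_pattern = wildcard_to_like("bloodtype://id#*")
print(document_pattern)   # %/blood~_%.rdf
print(subject_pattern)    # bloodtype://id#%
print(like("docs/blood_types.rdf", document_pattern))   # True
print(like("docs/bloodXtypes.rdf", document_pattern))   # False
```

Without the escape, the second document name would also match, because an unescaped underscore in a LIKE pattern stands for any single character.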
The data store 114 includes a graph generator (not shown) that uses RDF triples to generate directed graphs in response to queries (e.g., in HxQL or HxML form) from the framework server 116. These may be queries for information reflected by triples originating from data in one or more of the legacy databases 140 (one example might be a request for the residence cities of hotel guests who booked reservations on account over Independence Day weekend, as reflected by data from an e-Commerce database and an Accounts Receivable database). Such generation of directed graphs from triples can be accomplished in any conventional manner known in the art (e.g., as appropriate to RDF triples or other manner in which the information is stored) or, preferably, in the manner described in co-pending, commonly assigned United States Patent Application Serial No. 10/138,725, filed May 3, 2002, entitled METHODS AND APPARATUS FOR VISUALIZING RELATIONSHIPS AMONG TRIPLES OF RESOURCE DESCRIPTION FRAMEWORK (RDF) DATA SETS and Serial No. 60/416,616, filed October 7, 2002, entitled METHODS AND APPARATUS FOR IDENTIFYING RELATED NODES IN A DIRECTED GRAPH HAVING NAMED ARCS, the teachings of both of which are incorporated herein by reference. Directed graphs so generated are passed back to the server 116 for presentation to the user.
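For orientation, one conventional way of forming and walking a directed graph with named arcs from triples is sketched below in Python. It is not the method of the incorporated applications. The function names and the sample predicates "customerOf" and "employee" are hypothetical, and the sketch assumes that any object which also appears as a subject is a resource-type node.

```python
from collections import defaultdict

def build_graph(triples):
    """Adjacency map: subject -> list of (predicate, object) named arcs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

def walk(graph, start):
    """Depth-first walk from start, yielding each reachable arc once."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for predicate, obj in graph.get(node, []):
            yield node, predicate, obj
            # Literal objects never appear as subjects, so they end a path.
            if obj in graph and obj not in seen:
                seen.add(obj)
                stack.append(obj)

triples = [
    ("company://id#1", "customerOf", "company://id#2"),
    ("company://id#1", "employee", "Joe"),
    ("company://id#2", "employee", "Ann"),
]
graph = build_graph(triples)
arcs = list(walk(graph, "company://id#1"))
for arc in arcs:
    print(arc)
```

The walk follows the arc from the first company to the second because that object is itself a subject, which is the "branching" described earlier for resource-type objects.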
According to one practice of the invention, the data store 114 utilizes genetic, self-adapting algorithms to traverse the RDF triples in response to queries from the framework server 116. Though not previously known in the art for this purpose, such techniques can be beneficially applied to the RDF database which, due to its inherently flexible (i.e., schema-less) structure, is not readily searched using traditional search techniques. To this end, the data store utilizes a genetic algorithm that performs several searches, each utilizing a different methodology but all based on the underlying query from the framework server, against the RDF triples. It compares the results of the searches quantitatively to discern which produce(s) the best results and reapplies that search with additional terms or further granularity.
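The compare-and-refine loop just described can be illustrated, very loosely, with the Python sketch below. It is not a genetic algorithm and makes no claim to reflect the algorithm of the illustrated embodiment: it shows only the steps of running several search methodologies, scoring them, and reapplying the best one with an additional term. The three strategies, the hit-count score and all names are assumptions made for the example.

```python
def match_subject(triples, term):
    return [t for t in triples if term in t[0]]

def match_predicate(triples, term):
    return [t for t in triples if term in t[1]]

def match_object(triples, term):
    return [t for t in triples if term in t[2]]

STRATEGIES = {
    "subject": match_subject,
    "predicate": match_predicate,
    "object": match_object,
}

def best_strategy(triples, term):
    """Run every strategy and keep the one with the most hits (assumed score)."""
    results = {name: search(triples, term) for name, search in STRATEGIES.items()}
    name = max(results, key=lambda n: len(results[n]))
    return name, results[name]

def refine(triples, term, extra_term):
    """Reapply the winning strategy to its own hits with an additional term."""
    name, hits = best_strategy(triples, term)
    return name, STRATEGIES[name](hits, extra_term)

triples = [
    ("postal://zip#02886", "town", "Warwick"),
    ("postal://zip#02886", "state", "RI"),
    ("postal://zip#02901", "town", "Providence"),
    ("postal://zip#02901", "state", "RI"),
]
name, refined = refine(triples, "zip", "02901")
print(name, refined)
```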
Referring back to Figure 1, the framework server 116 generates requests to the data store 114 (and/or indirectly to the legacy databases via connectors 108, as discussed above) and presents information therefrom to the user via browser 118. The requests can be based on HxQL or HxML requests entered directly by the user though, preferably, they are generated by the server 116 based on user selections/responses to questions, dialog boxes or other user-input controls. In a preferred embodiment, the framework server includes one or more user interface modules, plug-ins, or the like, each for generating queries of a particular nature. One such module, for example, generates queries pertaining to marketing information, another such module generates queries pertaining to financial information, and so forth.
In addition to generating queries, the framework server (and/or the aforementioned modules) "walks" directed graphs generated by the data store 114 to present to the user (via browser 118) any specific items of requested information. Such walking of the directed graphs can be accomplished via any conventional technique known in the art. Presentation of questions, dialog boxes or other user-input controls to the user and, likewise, presentation of responses thereto based on the directed graph can be accomplished via conventional server/browser or other user interface technology.
In some embodiments, the framework server 116 permits a user to update data stored in the data store 114 and, thereby, that stored in the legacy databases 140. To this end, changes made to data displayed by the browser 118 are transmitted by server 116 to data store 114. There, any triples implicated by the change are updated in store 114C, as are the corresponding RDF document objects in store 114A. An indication of these changes can be forwarded to the respective legacy databases 140, which utilize the corresponding API (or other interface mechanisms) to update their respective stores. (Likewise, changes made directly to the store 114C as discussed above, e.g., using a WebDAV client, can be forwarded to the respective legacy database.)
In some embodiments, the server 116 can present to the user not only data from the data store 114, but also data gleaned by the server directly from other sources. Thus, for example, the server 116 can directly query an enterprise web site for statistics regarding web page usage, or otherwise.
A further understanding of the operation of the framework server 116 may be attained by reference to the appendix filed with United States Patent Application Serial No.09/917,264, filed July 27, 2001, and entitled "Methods and Apparatus for Enterprise Application Integration," which appendix is incorporated herein by reference.
Described herein are methods and apparatus meeting the above-mentioned objects. It will be appreciated that the illustrated embodiment is merely an example of the invention and that other embodiments, incorporating changes to those described herein, fall within the scope of the invention, of which we claim:
1. A method for searching an RDF triples data store having a first storage schema, comprising:
inputting a first query specifying RDF triples to be identified in the data store, where the first query reflects any of a second storage schema or no storage schema;
generating from the first query a second query that specifies RDF triples to be identified in the data store and that reflects the first storage schema;
applying the second query to the data store for identification of the specified RDF triples.
2. The method of claim 1, further comprising:
examining the first query for one or more tokens that represent data to be used in generating the second query; and
dispatching context-specific grammar events containing that data.
3. The method of claim 2, wherein each of the events represents any of a declaration and a constraint specified in the first query.
4. The method of claim 3, wherein the declaration specifies one or more RDF documents from which triples to be identified are contained.
5. The method of claim 3, wherein the constraint specifies RDF triples to be identified.
6. The method of claim 1, further comprising:
extracting statement data from the first query; and
associating that statement data with at least a portion of the second query.
7. The method of claim 6, further comprising:
generating the second query in the form of an SQL SELECT statement.
8. The method of claim 7, wherein the associating step includes associating statement data with one or more of a SELECT clause, a FROM clause, a WHERE clause and an ORDER-BY clause of an SQL statement.
9. The method of claim 1, wherein the RDF triples store uses a hashed by origin schema.
10. A method for translating a schema-less input query in a first language to an output query in a second language comprising:
examining the schema-less input query for one or more tokens that represent data to be used in generating the output query;
dispatching context-specific grammar events containing that data; and
populating portions of the output query according to the events and data;
generating the output query in the second language comprising those populated portions, wherein the output query represents a schema of a relational database storing RDF triples.
11. The method of claim 10, wherein dispatching events further comprises generating any of a logical condition event, a selection term declaration event, and a triple declaration event.
12. The method of claim 11, where generating a logical condition event comprises generating an event containing data which, when applied to the relational database via the output query, identifies RDF triples according to a Boolean condition.
13. The method of claim 11, where generating a selection term declaration event comprises generating an event containing data which, when applied to the relational database via the output query, identifies RDF triples including a specified term.
14. The method of claim 11, further where generating a triple declaration event comprises generating an event containing data which, when applied to the relational database via the output query, identifies RDF triples according to a specified subject, predicate and object.
15. The method of claim 10, wherein the first language is any of SQL-like and XML-like.
16. The method of claim 10, further comprising generating the output query as an SQL Select statement.
17. The method of claim 10, wherein the RDF triples are stored in a hashed with origin schema.
18. A digital system for searching an RDF triples data store having a storage schema, comprising:
a parser component that examines a schema-less, first query specifying one or more RDF triples to be identified, the parser component examines the first query for one or more tokens that represent data to be used in generating a second query and that dispatches context-specific grammar events containing that data;
an event-processing component coupled to the parser component, the event-processing component extracts statement data from one or more events;
a statement management/generation component coupled to the event-processing component, the statement management/generation component generates the second query so as to identify the same RDF triples identified in the schema-less, first query and so as to reflect the storage schema of the RDF triples data store.
19. The digital system of claim 18, wherein events represent any of a declaration, constraint and sorting order.
20. The digital system of claim 19, wherein a declaration event specifies RDF documents to be searched for those RDF triples to be identified.
21. The digital system of claim 19, wherein the constraint event specifies RDF triples to be identified that match an associated constraint.
22. The digital system of claim 19, wherein the associated constraint is any of a Boolean expression and a literal.
23. The digital system of claim 19, wherein a sorting order event specifies an order in which identified RDF triples are to be sorted for presentation to a user.
[Drawing sheets of PCT/US02/37729 (sheets 2/5 through 5/5 legible): figure images not reproduced. Legible labels: FIG. 1B; FIG. 3, QUERY TRANSLATOR.]
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization, International Bureau
(43) International Publication Date: 30 May 2003 (30.05.2003)
(10) International Publication Number: WO 03/044634 A2
(51) International Patent Classification7: G06F
(21) International Application Number: PCT/US02/37729
(22) International Filing Date: 21 November 2002 (21.11.2002)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data:
60/332,053 21 November 2001 (21.11.2001) US
60/332,219 21 November 2001 (21.11.2001) US
(71) Applicant: METATOMIX, INC. [US/US]; 275 Wyman Street, Suite 130, Waltham, MA 02451 (US).
Inventors: [...] David; 324 Concord Avenue, Lexington, MA 02421 (US). DEFUSCO, Anthony, J.; 1140-C Diamond Hill Road, Woonsocket, RI 02895 (US). GREENBLATT, Howard; 22 Coolidge Street, Wayland, MA 01778 (US).
(74) Agents: POWSNER, David, J. et al.; Nutter, McClennen & Fish LLP, World Trade Center West, 155 Seaport Blvd., Boston, MA 02110-2604 (US).
(81) Designated States (national): AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, CZ, DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW.
European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, SK, TR), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).
Published: without international search report and to be republished upon receipt of that report.
For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.
Appendix B
Copy of United States Patent Application Serial No. 10/302,727, filed November 21, 2002, entitled METHODS AND APPARATUS FOR STATISTICAL DATA ANALYSIS AND REDUCTION FOR AN ENTERPRISE APPLICATION, now published as PCT WO 03046769 (Application WO2002US0037727).
(29 pages, including this cover sheet)
METHODS AND APPARATUS FOR STATISTICAL DATA ANALYSIS AND REDUCTION FOR AN ENTERPRISE APPLICATION
Background
This application claims the benefit of priority of United States Provisional Patent Application Serial No. 60/332,053, filed November 21, 2001, entitled "Methods And Apparatus For Querying A Relational Database In A System For Real-Time Business Visibility" and U.S. Provisional Patent Application Serial No. 60/332,219, filed on November 21, 2001, entitled "Methods And Apparatus For Calculation and Reduction of Time-Series Metrics From Event Streams Or Legacy Databases In A System For Real-Time Business Visibility." This application is also a continuation-in-part of United States Patent Application Serial No. 09/917,264, filed July 27, 2001, entitled "Methods and Apparatus for Enterprise Application Integration" and United States Patent Application Serial No. 10/051,619, filed October 29, 2001, entitled "Methods And Apparatus For Real-Time Business Visibility Using Persistent Schema-Less Data Storage." The teachings of all of the foregoing applications are incorporated herein by reference.
The invention pertains to digital data processing and, more particularly, to methods and apparatus for enterprise business visibility and insight using real-time reporting tools.
It is not uncommon for a single enterprise to have several separate database systems to track internal and external planning and transactional data. Such systems might have been developed at different times throughout the history of the enterprise and, therefore, represent differing generations of computer technology. For example, a marketing database system tracking customers may be ten years old, while an enterprise resource planning (ERP) system tracking inventory might be two or three years old. Integration between these systems is difficult at best, consuming specialized programming skill and constant maintenance expenses.
A major impediment to enterprise business visibility is the consolidation of data from these disparate legacy databases with one another and with that from newer e-commerce databases. For instance, inventory on-hand data gleaned from a legacy ERP system may be difficult to combine with customer order data gleaned from web servers that support e-commerce (and other web-based) transactions. This is not to mention difficulties, for example, in consolidating resource scheduling data from the ERP system with the forecasting data from the marketing database system.
An object of this invention is to provide improved methods and apparatus for digital data processing and, more particularly, for enterprise business visibility and insight (hereinafter, "enterprise business visibility").
A further object is to provide such methods and apparatus as can rapidly and accurately retrieve information responsive to user inquiries.
A further object of the invention is to provide such methods and apparatus as can be readily and inexpensively integrated with legacy, current and future database management systems.
A still further object of the invention is to provide such methods and apparatus as can be implemented incrementally or otherwise without interruption of enterprise operation.
Yet a still further object of the invention is to provide such methods and apparatus as to facilitate ready access to up-to-date enterprise data, regardless of its underlying source.
Yet still a further object of the invention is to provide such methods and apparatus as permit flexible presentation of enterprise data in an easily understood manner.
Summary of the Invention
The aforementioned are among the objects attained by the invention, one aspect of which provides a method of time-wise data reduction that includes the steps of inputting data from a source; summarizing that data according to one or more selected epochs in which it belongs; and generating for each such selected epoch one or more RDF triples characterizing the summarized data. The data source may be, for example, a database, a data stream or otherwise. The selected epoch may be a second, minute, hour, week, month, year, or so forth.
Further aspects of the invention provide a method as described above including the step of outputting the RDF triples in the form of RDF document objects. These can be stored, for example, in a hierarchical data store such as, for example, a WebDAV server.
Still further related aspects of the invention provide for parsing triples from the RDF document objects and storing them in a relational data store. A further related aspect of the invention provides for storing the triples in a relational store that is organized according to a hashed with origin approach.
Still yet other aspects of the invention provide for retrieving information represented by the triples in the hierarchical and/or relational data stores, e.g., for presentation to a user. Related aspects of the invention provide for retrieving triples containing time-wise reduced data, e.g., for presentation to a user.
Related aspects of the invention provide methods as described above including summarizing the input data according to one or more epochs of differing length. Further aspects of the invention provide methods as described above including querying the source, e.g., a legacy database, in order to obtain the input data. Related aspects of the invention provide for generating such queries in SQL format.
Still other aspects of the invention provide methods as described above including the step of inputting an XML file that identifies one or more sources of input data, one or more fields thereof to be summarized in the time-wise reduction, and/or one or more epochs for which those fields are to be summarized.
Further aspects of the invention provide methods as described above including responding to an input datum by updating summary data for an epoch of the shortest duration, e.g., a store of per day data. Related aspects of the invention provide for updating a store of summary data for epochs of greater duration, e.g., stores of per week or per month data, from summary data maintained in a store for an epoch of lesser duration, e.g., a store of per day data.
These and other aspects of the invention are evident in the drawings and in the description that follows.
Brief Description of the Drawings
The foregoing features of this invention, as well as the invention itself, may be more fully understood from the following detailed description of the drawings in which:
Figure 1 depicts an improved enterprise business visibility and insight system according to the invention;
Figure 1A depicts an architecture for a hologram data store according to the invention, e.g., in the system of Figure 1;
Figure 1B depicts the tables in a model store and a triples store of the hologram data store of Figure 1A;
Figure 2 depicts a directed graph representing data triples of the type maintained in a data store according to the invention; and
Figure 3 is a functional block diagram of a time-wise data reduction module in a system according to the invention.
Detailed Description of the Illustrated Embodiment
Figure 1 depicts a real-time enterprise business visibility and insight system according to the invention. The illustrated system 100 includes connectors 108 that provide software interfaces to legacy, e-commerce and other databases 140 (hereinafter, collectively, "legacy databases"). A "hologram" database 114 (hereinafter, "data store" or "hologram data store"), which is coupled to the legacy databases 140 via the connectors 108, stores data from those databases 140. A framework server 116 accesses the data store 114, presenting selected data to (and permitting queries from) a user browser 118. The server 116 can also permit updates to data in the data store 114 and, thereby, in the legacy databases 140.
Legacy databases 140 represent existing (and future) databases and other sources of information (including data streams) in a company, organization or other entity (hereinafter "enterprise"). In the illustration, these include a retail e-commerce database (e.g., as indicated by the cloud and server icons adjacent database 140c) maintained with a Sybase® database management system, an inventory database maintained with an Oracle® database management system and an ERP database maintained with a SAP® Enterprise Resource Planning system. Of course, these are merely examples of the variety of databases or other sources of information with which methods and apparatus as described herein can be used. Common features of illustrated databases 140 are that they maintain information of interest to an enterprise and that they can be accessed via respective software application program interfaces (API) or other mechanisms known in the art.
Connectors 108 serve as an interface to legacy database systems 140. Each connector applies requests to, and receives information from, a respective legacy database, using that database's API or other interface mechanism. Thus, for example, connector 108a applies requests to legacy database 140a using the corresponding SAP API; connector 108b, to legacy database 140b using Oracle API; and connector 108c, to legacy database 140c using the corresponding Sybase API.
In the illustrated embodiment, these requests are for purposes of accessing data stored in the respective databases 140. The requests can be simple queries, such as SQL queries and the like (e.g., depending on the type of the underlying database and its API) or more complex sets of queries, such as those commonly used in data mining. For example, one or more of the connectors can use decision trees, statistical techniques or other query and analysis mechanisms known in the art of data mining to extract information from the databases.
Specific queries and analysis methodologies can be specified by the hologram data store 114 or the framework server 116 for application by the connectors. Alternatively, the connectors themselves can construct specific queries and methodologies from more general queries received from the data store 114 or server 116. For example, request-specific items can be "plugged" into query templates thereby effecting greater speed and efficiency.
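The idea of plugging request-specific items into a query template can be sketched as follows. This is a minimal illustration only, assuming an SQL-compatible source reached through Python's standard sqlite3 module; the `orders` table, its columns and the `run_template` helper are hypothetical and do not come from the specification.

```python
import sqlite3

# Hypothetical template: table and column names are illustrative only.
ORDER_TEMPLATE = (
    "SELECT order_id, quantity FROM orders "
    "WHERE customer = ? AND placed_on >= ?"
)

def run_template(conn, template, items):
    """Bind request-specific items to the placeholders of a query template."""
    return conn.execute(template, items).fetchall()

# Stand-in for a legacy database, populated with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, "
    "quantity INTEGER, placed_on TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Joe", 3, "2002-11-01"),
        (2, "Joe", 5, "2002-11-20"),
        (3, "Ann", 2, "2002-11-21"),
    ],
)

# The same template can be reapplied with different request-specific items.
rows = run_template(conn, ORDER_TEMPLATE, ("Joe", "2002-11-15"))
```

Binding values through placeholders, rather than assembling the query text each time, is one plausible way a connector could reuse a stored request for periodic updates.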
Regardless of their origin, the requests can be stored in the connectors 108 for application and/or reapplication to the respective legacy databases 140 to provide one-time or periodic data store updates. Connectors can use expiration date information to determine which of a plurality of similar data to return to the data store, or if dates are absent, the connectors can mark returned data as being of lower confidence levels.
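One way a connector might use expiration dates when choosing among similar data is sketched below. The `Record` type, the `select_record` function and the selection policy (prefer the unexpired record that remains valid longest) are assumptions made for illustration; the text does not prescribe a particular policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Record:
    value: str
    expires: Optional[date] = None  # expiry information, if the source supplied it
    confidence: str = "normal"

def select_record(candidates: List[Record], today: date) -> Record:
    """Pick one of several similar records for return to the data store."""
    dated = [r for r in candidates if r.expires is not None and r.expires >= today]
    if dated:
        # Assumed policy: prefer the unexpired record that stays valid longest.
        return max(dated, key=lambda r: r.expires)
    # No usable dates: return the first candidate, marked as lower confidence.
    chosen = candidates[0]
    return Record(chosen.value, chosen.expires, confidence="low")

a = Record("a", date(2003, 1, 1))
b = Record("b", date(2003, 6, 1))
picked = select_record([a, b], today=date(2002, 12, 1))
undated = select_record([Record("c")], today=date(2002, 12, 1))
```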
Data and other information (collectively, "messages") generated by the databases 140 in response to the requests are routed by connectors to the hologram data store 114. That other information can include, for example, expiry or other adjectival data for use by the data store in caching, purging, updating and selecting data. The messages can be cached by the connectors 108, though, they are preferably immediately routed to the store 114.
The hologram data store 114 stores data from the legacy databases 140 (and from the framework server 116, as discussed below) as RDF triples. The data store 114 can be embodied on any digital data processing system or systems that are in communications coupling (e.g., as defined above) with the connectors 108 and the framework server 116. Typically, the data store 114 is embodied in a workstation or other high-end computing device with high capacity storage devices or arrays, though, this may not be required for any given implementation.
Though the hologram data store 114 may be contained on an optical storage device, this is not the sense in which the term "hologram" is used. Rather, it refers to its storage of data from multiple sources (e.g., the legacy databases 140) in a form which permits that data to be queried and coalesced from a variety of perspectives, depending on the needs of the user and the capabilities of the framework server 116.
To this end, a preferred data store 114 stores the data from the legacy databases 140 in subject-predicate-object form, e.g., RDF triples, though those of ordinary skill in the art will appreciate that other forms may be used as well, or instead. By way of background, RDF is a way of expressing the properties of items of data. Those items are referred to as subjects. Their properties are referred to as predicates. And, the values of those properties are referred to as objects. In RDF, an expression of a property of an item is referred to as a triple, a convenience reflecting that the expression contains three parts: subject, predicate and object.
Subjects, also referred to as resources, can be anything that is described by an RDF expression. A subject can be a person, place or thing, though typically only an identifier of the subject is used in an actual RDF expression, not the person, place or thing itself. Examples of subjects might be "car," "Joe," "http://www.metatomix.com."
A predicate identifies a property of a subject. According to the RDF specification, this may be any "specific aspect, characteristic, attribute, or relation used to describe a resource." For the three exemplary subjects above, examples of predicates might be "make," "citizenship," "owner."
An object gives a "value" of a property. These might be "Ford," "United Kingdom," "Metatomix, Inc." for the subjects and predicates given in the prior paragraphs, forming the following RDF triples:
Subject                   Predicate        Object
"car"                     "make"           "Ford"
"Joe"                     "citizenship"    "United Kingdom"
"http://metatomix.com"    "owner"          "Metatomix, Inc."
Objects can be literals, i.e., strings that identify or name the corresponding property (predicate). They can also be resources. In the example above, rather than merely the string "Metatomix, Inc." further triples may be specified, presumably ones identifying that company in the subject and giving details in predicates and objects.
A given subject may have multiple predicates, each predicate indexing an object. For example, a subject postal zip code might have an index to an object town and an index to an object state, either (or both) index being a predicate URI.
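The subject-predicate-object structure described above can be sketched with plain tuples. The `Triple` alias and the `properties_by_subject` helper are illustrative names only; the postal values are taken from the zip code example used elsewhere in this description, and the sketch assumes at most one object per subject and predicate pair.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

triples: List[Triple] = [
    ("car", "make", "Ford"),
    ("Joe", "citizenship", "United Kingdom"),
    ("http://metatomix.com", "owner", "Metatomix, Inc."),
    # One subject with two predicates, each indexing its own object.
    ("postal://zip#02886", "town", "Warwick"),
    ("postal://zip#02886", "state", "RI"),
]

def properties_by_subject(items: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Group triples so that each subject maps its predicates to objects."""
    index: Dict[str, Dict[str, str]] = defaultdict(dict)
    for subject, predicate, obj in items:
        index[subject][predicate] = obj
    return dict(index)

by_subject = properties_by_subject(triples)
```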
Listed below is a portion of a data set of the type with which the invention can be practiced. The listing contains RDF triples, here, expressed in extensible markup language (XML) syntax. Those skilled in the art will, of course, appreciate that RDF triples can be expressed in other syntaxes and that the teachings hereof are equally applicable to those syntaxes. Further, the listing shows only a sampling of the triples in a database 114, which typically would contain tens of thousands or more of such triples.
<rdf:RDF...xmlns="http://www.metatomix.com/postalCode/1.0#">
<rdf:Description rdf:about="postal://zip#02886">
<town>Warwick</town>
<state>RI</state>
<country>USA</country>
<zip>02886</zip>
</rdf:Description>
<rdf:Description rdf:about="postal://zip#02901">
<town>Providence</town>
<state>RI</state>
<country>USA</country>
<zip>02901</zip>
</rdf:Description>
Subjects are indicated within the listing using a "rdf:about" statement. For example, the second line of the listing defines a subject as a resource named "postal://zip#02886." That subject has predicates and objects that follow the subject declaration.
One predicate, <town>, is associated with a value "Warwick". Another predicate, <state>, is associated with a value "RI". The same follows for the predicates <country> and <zip>, which are associated with values "USA" and "02886," respectively. Similarly, the listing shows properties for the subject "postal://zip#02901," namely, <town> "Providence," <state> "RI," <country> "USA" and <zip> "02901."
In the listing, the subjects and predicates are expressed as uniform resource identifiers (URIs), e.g., of the type defined in Berners-Lee et al., Uniform Resource Identifiers (URI): Generic Syntax (RFC 2396) (August 1998), and can be said to be expressed in a form <scheme>://<path>#<fragment>. For the subjects given in the example, <scheme> is "postal," <path> is "zip," and <fragment> is, for example, "02886" and "02901."
The predicates, too, are expressed in the form <scheme>://<path>#<fragment>, as is evident to those of ordinary skill in the art. In accord with XML syntax, the predicates in lines two, et seq., of the listing must be interpreted as suffixes to the string provided in the namespace directive xmlns="http://www.metatomix.com/postalCode/1.0#" in line one of the listing. This results in predicates that are formally expressed as: "http://www.metatomix.com/postalCode/1.0#town," "http://www.metatomix.com/postalCode/1.0#state," "http://www.metatomix.com/postalCode/1.0#country" and "http://www.metatomix.com/postalCode/1.0#zip."
Hence, the <scheme> for the predicates is "http" and <path> is "www.metatomix.com/postalCode/1.0." The <fragment> portions are <town>, <state>, <country> and <zip>, respectively. It is important to note that the listing is in some ways simplistic in that each of its objects is a literal value. Commonly, an object may itself be another subject, with its own objects and predicates. In such cases, a resource can be both a subject and an object, e.g., an object to all "upstream" resources and a subject to all "downstream" resources and properties. Such "branching" allows for complex relationships to be modeled within the RDF triple framework.
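The <scheme>://<path>#<fragment> decomposition and the namespace expansion of predicates described above can be illustrated with a short sketch. The `UriParts`, `split_uri` and `expand` names are hypothetical, and simple string partitioning is used here in place of a full RFC 2396 parser.

```python
from typing import NamedTuple

class UriParts(NamedTuple):
    scheme: str
    path: str
    fragment: str

def split_uri(uri: str) -> UriParts:
    """Split a URI of the form <scheme>://<path>#<fragment> into its parts."""
    scheme, _, rest = uri.partition("://")
    path, _, fragment = rest.partition("#")
    return UriParts(scheme, path, fragment)

# Namespace taken from line one of the listing.
NAMESPACE = "http://www.metatomix.com/postalCode/1.0#"

def expand(local_name: str, namespace: str = NAMESPACE) -> str:
    """Treat a predicate's element name as a suffix to the namespace string."""
    return namespace + local_name

subject_parts = split_uri("postal://zip#02886")
predicate_parts = split_uri(expand("town"))
```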
Figure 2 depicts a directed graph composed of RDF triples of the type stored by the illustrated data store 114, here, by way of non-limiting example, triples representing relationships among four companies (id#1, id#2, id#3 and id#4) and between two of those companies (id#1 and id#2) and their employees. Per convention, subjects and resource-type objects are depicted as oval-shaped nodes; literal-type objects are depicted as rectangular nodes; and predicates are depicted as arcs connecting those nodes.
Figure 1A depicts an architecture for a preferred hologram data store 114 according to the invention. The illustrated store 114 includes a model document store 114A and a model document manager 114B. It also includes a relational triples store 114C, a relational triples store manager 114D, and a parser 114E interconnected as shown in the drawing.
As indicated in the drawing, RDF triples maintained by the store 114 are received (from the legacy databases 140, via connectors 108, and/or from time-based data reduction module 150, described below) in the form of document objects, e.g., of the type generated from a Document Object Model (DOM) in a JAVA, C++ or other application. In the illustrated embodiment, these are stored in the model document store 114A as such (i.e., document objects) particularly, using the tables and inter-table relationships shown in Figure 1B (see dashed box labelled 114B).
The model document manager 114B manages storage/retrieval of the document object to/from the model document store 114A. In the illustrated embodiment, the manager 114B comprises the Slide content management and integration framework, publicly available through the Apache Software Foundation. It stores (and retrieves) document objects to (and from) the store 114A in accord with the WebDAV protocol. Those skilled in the art will, of course, appreciate that other applications can be used in place of Slide and that document objects can be stored/retrieved from the store 114A in accord with other protocols, industry-standard, proprietary or otherwise.
However, use of the WebDAV protocol allows for adding, updating and deleting RDF document objects using a variety of WebDAV client tools (e.g., Microsoft Windows Explorer, Microsoft Office, XML Spy or other such tools available from a variety of vendors), in addition to adding, updating and deleting document objects via connectors 108 and/or time-based data reduction module 150. This also allows for presenting the user with a view of a traversable file system, with RDF documents that can be opened directly in XML editing tools or from Java programs supporting WebDAV protocols, or from processes on remote machines via any HTTP protocol on which WebDAV is based.
RDF triples received by the store 114 are also stored to a relational database, here, store 114C, that is managed and accessed by a conventional relational database management system (RDBMS) 114D operating in accord with the teachings hereof. In that database, the triples are divided into their constituent components (subject, predicate, and object), which are indexed and stored to respective tables in the manner of a "hashed with origin" approach. Whenever an RDF document is added, updated or deleted, a parser 114E extracts its triples and conveys them to the RDBMS 114D with a corresponding indicator that they are to be added, updated or deleted from the relational database. Such a parser 114E operates in the conventional manner known in the art for extracting triples from RDF documents.
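Extraction of triples from an RDF document can be sketched with Python's standard ElementTree parser. This is a minimal illustration rather than the parser 114E itself: the sample document is an abbreviated, complete version of the postal listing (the rdf namespace declaration is added so that it parses), and only literal-valued properties of rdf:Description elements are handled.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Abbreviated, well-formed version of the postal listing (assumed for this sketch).
DOCUMENT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.metatomix.com/postalCode/1.0#">
  <rdf:Description rdf:about="postal://zip#02886">
    <town>Warwick</town>
    <state>RI</state>
  </rdf:Description>
</rdf:RDF>"""

def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for literal-valued properties."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree reports tags as "{namespace}local"; joining the two
            # parts yields the full predicate URI.
            namespace, _, local = prop.tag[1:].partition("}")
            triples.append((subject, namespace + local, prop.text))
    return triples

parsed = extract_triples(DOCUMENT)
```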
The illustrated database store 114C has five tables interrelated as particularly shown in Figure 1B (see dashed box labelled 114C). In general, these tables rely on indexes generated by hashing the triples' respective subjects, predicates and objects using a 64-bit hashing algorithm based on cyclical redundancy codes (CRCs), though it will be appreciated that the indexes can be generated by other techniques as well, industry-standard, proprietary or otherwise.
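A 64-bit CRC-based hash of the kind mentioned above can be sketched as follows. The specification does not name a polynomial or other CRC parameters; the ECMA-182 polynomial, zero initial value and UTF-8 encoding used here are assumptions, and `crc64` and `term_hash` are illustrative names.

```python
CRC64_POLY = 0x42F0E1EBA9EA3693  # ECMA-182 polynomial, chosen here as an assumption
MASK64 = 0xFFFFFFFFFFFFFFFF

def crc64(data: bytes) -> int:
    """Bitwise, most-significant-bit-first CRC over 64 bits (initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ CRC64_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc

def term_hash(term: str) -> int:
    """Hash a subject, predicate or object string to a 64-bit index value."""
    return crc64(term.encode("utf-8"))

warwick_index = term_hash("Warwick")
```

A table-driven variant would be faster; the bitwise form is shown only because it is short enough to check by inspection.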
Referring to Figure 1B, the "triples" table 534 maintains one record for each stored triple. Each record contains an aforementioned hash code for each of the subject, predicate and object that make up the respective triple, along with a resource flag ("resource_flg") indicating whether that object is of the resource or literal type. Each record also includes an aforementioned hash code ("m_hash") identifying the document object (stored in model document store 114A) from which the triple was parsed, e.g., by parser 114E.
In the illustrated embodiment, the values of the subjects, predicates and objects are not stored in the triples table. Rather, those values are stored in the resources table 530, namespaces table 532 and literals table 536. Particularly, the resources table 530, in conjunction with the namespaces table 532, stores the subjects, predicates and resource-type objects; whereas, the literals table 536 stores the literal-type objects.
The resources table 530 maintains one record for each unique subject, predicate or resource-type object. Each record contains the value of the resource, along with its aforementioned 64-bit hash. It is the latter on which the table is indexed. To conserve space, portions of those values common to multiple resources (e.g., common <scheme>://<path> identifiers) are stored in the namespaces table 532. Accordingly the field, "r_value," contained in each record of the resources table 530 reflects only the unique portion (e.g., <fragment> identifier) of each resource.
The namespaces table 532 maintains one record for each unique common portion referred to in the prior paragraph (hereinafter, "namespace"). Each record contains the value of that namespace, along with its aforementioned 64-bit hash. As above, it is the latter on which this table is indexed.
The literals table 536 maintains one record for each unique literal-type object. Each record contains the value of the object, along with its aforementioned 64-bit hash; it is the hash on which this table is indexed. Each record also includes an indicator of the type of that literal (e.g., integer, string, and so forth).
The models table 538 maintains one record for each RDF document object contained in the model document store 114A. Each record contains the URI of the corresponding document object ("uri_string"), along with its aforementioned 64-bit hash ("m_hash"). It is the latter on which this table is indexed. To facilitate associating document objects identified in the models table 538 with document objects maintained by the model document store 114A, each record of the models table 538 also contains the ID of the corresponding document object in the store 114A. That ID can be assigned by the model document manager 114B, or otherwise.
From the above, it can be appreciated that the relational triples store 114C is a schema-less structure for storing RDF triples. As suggested by Melnik, supra, triples maintained in that store can be reconstituted via an SQL query. For example, to reconstitute the RDF triple having a subject equal to "postal://zip#02886", a predicate equal to "http://www.metatomix.com/postalCode/1.0#town", and an object equal to "Warwick", the following SQL statement is applied:
SELECT m.uri_string, t.resource_flg,
concat (n1.n_value, r1.r_value) as subj,
concat (n2.n_value, r2.r_value) as pred,
concat (n3.n_value, r3.r_value),
l.l_value
FROM triples t, models m, resources r1, resources r2, namespaces n1, namespaces n2
LEFT JOIN literals l on t.object=l.l_hash
LEFT JOIN resources r3 on t.object=r3.r_hash
LEFT JOIN namespaces n3 on r3.n_hash=n3.n_hash
WHERE t.subject=r1.r_hash AND r1.n_hash=n1.n_hash AND
t.predicate=r2.r_hash AND r2.n_hash=n2.n_hash AND
Figure imgf000072_0001
t.predicate=hash('http://www.metatomix.com/postalCode/1.0#town') AND
t.object=hash('Warwick')
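The storage and reconstitution of a triple in a hashed with origin layout can be sketched end to end with SQLite. This is an illustration under several assumptions: the table layouts are simplified versions of those described for Figure 1B, `zlib.crc32` (32 bits) stands in for the 64-bit CRC-based hash, the model URI "models/postal.rdf" is made up, and only literal-type objects are handled.

```python
import sqlite3
import zlib

def h(text: str) -> int:
    # Stand-in for the 64-bit CRC-based hash; zlib.crc32 is only 32 bits wide.
    return zlib.crc32(text.encode("utf-8"))

SCHEMA = """
CREATE TABLE models     (m_hash INTEGER PRIMARY KEY, uri_string TEXT);
CREATE TABLE namespaces (n_hash INTEGER PRIMARY KEY, n_value TEXT);
CREATE TABLE resources  (r_hash INTEGER PRIMARY KEY, r_value TEXT, n_hash INTEGER);
CREATE TABLE literals   (l_hash INTEGER PRIMARY KEY, l_value TEXT, l_type TEXT);
CREATE TABLE triples    (subject INTEGER, predicate INTEGER, object INTEGER,
                         resource_flg INTEGER, m_hash INTEGER);
"""

def add_resource(conn, uri):
    """Store the common prefix in namespaces and the fragment in resources."""
    namespace, sep, fragment = uri.rpartition("#")
    namespace += sep
    conn.execute("INSERT OR IGNORE INTO namespaces VALUES (?, ?)",
                 (h(namespace), namespace))
    conn.execute("INSERT OR IGNORE INTO resources VALUES (?, ?, ?)",
                 (h(uri), fragment, h(namespace)))
    return h(uri)

def add_triple(conn, model_uri, subject, predicate, literal):
    """Record one triple with a literal object, tagged with its origin document."""
    conn.execute("INSERT OR IGNORE INTO models VALUES (?, ?)",
                 (h(model_uri), model_uri))
    conn.execute("INSERT OR IGNORE INTO literals VALUES (?, ?, ?)",
                 (h(literal), literal, "string"))
    conn.execute("INSERT INTO triples VALUES (?, ?, ?, 0, ?)",
                 (add_resource(conn, subject), add_resource(conn, predicate),
                  h(literal), h(model_uri)))

RECONSTITUTE = """
SELECT m.uri_string, n1.n_value || r1.r_value, n2.n_value || r2.r_value, l.l_value
FROM triples t
JOIN models m      ON t.m_hash = m.m_hash
JOIN resources r1  ON t.subject = r1.r_hash
JOIN namespaces n1 ON r1.n_hash = n1.n_hash
JOIN resources r2  ON t.predicate = r2.r_hash
JOIN namespaces n2 ON r2.n_hash = n2.n_hash
LEFT JOIN literals l ON t.object = l.l_hash
WHERE t.subject = ? AND t.predicate = ? AND t.object = ?
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_triple(conn, "models/postal.rdf", "postal://zip#02886",
           "http://www.metatomix.com/postalCode/1.0#town", "Warwick")
row = conn.execute(RECONSTITUTE, (
    h("postal://zip#02886"),
    h("http://www.metatomix.com/postalCode/1.0#town"),
    h("Warwick"),
)).fetchone()
```

Because the lookup values are hashed in the application before the query runs, the sketch does not depend on the database providing a hash() function of its own.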
Those skilled in the art will, of course, appreciate that RDF documents and, more generally, objects maintained in the store 114 can be contained in other stores (structured relationally, hierarchically or otherwise) as well, in addition to or instead of stores 114A and 114C.
Referring to Figure 3, time-wise data reduction component 150 comprises an XML parser 504, a query module 506, an analysis module 507 and an output module 508. The component 150 performs a time-wise reduction on data from the legacy databases 140. In some embodiments, that data is supplied to the component 150 by the connectors 108 in the form of RDF documents. In the illustrated embodiment, the component 150 functions, in part, like a connector itself, obtaining data directly from the legacy databases 140 before time-wise reducing it.
Regardless, illustrated component 150 outputs the reduced data in the form of RDF triples contained in RDF documents. In the illustrated embodiment, these are stored in the model store 114A (and the underlying triples, in relational store 114C), alongside the RDF documents (and their respective underlying triples) from which the reduced data was generated. This facilitates, for example, reporting of the time-wise reduced data, e.g., by the framework server 116, since that data is readily available for display to the user and does not require ad hoc generation of data summaries in response to user requests.
Module 504 parses an XML file 502 which specifies one or more sources of data to be time-wise reduced. That file may be supplied by the framework server 116, or otherwise. The specified sources may be legacy databases 140, data streams, or otherwise. They may also be connectors 108, e.g., identified by symbolic name, virtual port number, or otherwise. Along with the data source identifier(s), the XML specification file 502 specifies the data items which are to be time-wise reduced. These can be field names, identifiers or otherwise.
The XML file 502 further specifies the time periods or epochs over which data is to be time-wise reduced. These can be seconds, minutes, hours, days, months, weeks, years, and so forth, depending on the type of data to be reduced. For example, if the data source contains hospital patient data, the specified epochs may be weeks and months; whereas, if the data source contains web site access data, the specified epochs may be hours and days.
The parser component 504 parses the XML file 502 to discern the aforementioned data source identifiers, field identifiers, and epochs. To this end, the parser 504 may be constructed and operated in the conventional manner known in the art.
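Parsing of a specification file into source identifiers, field identifiers and epochs might look like the sketch below. The format of the XML file 502 is not given in the text, so the element and attribute names (`reduction`, `source`, `name`, `field`, `epoch`) and the `parse_spec` function are entirely hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification format; the actual layout of file 502 is not given.
SPEC = """<reduction>
  <source name="weblog-connector">
    <field>hits</field>
    <epoch>day</epoch>
    <epoch>month</epoch>
  </source>
</reduction>"""

def parse_spec(xml_text):
    """Return one entry per data source, with its fields and epochs."""
    root = ET.fromstring(xml_text)
    return [
        {
            "source": source.get("name"),
            "fields": [field.text for field in source.findall("field")],
            "epochs": [epoch.text for epoch in source.findall("epoch")],
        }
        for source in root.findall("source")
    ]

spec = parse_spec(SPEC)
```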
The query module 506 generates queries in order to obtain the fields specified in the XML specification file 502. It queries the identified data source(s) in the manner appropriate to those sources. For example, the query module 506 queries SQL-compatible databases using an SQL query. Other data sources are queried via their respective applications program interfaces (APIs), or otherwise. In embodiments where source data is supplied to the component 150 by the connectors 108, querying may be performed explicitly or implicitly by those connectors 108. Moreover, querying might not need to be performed on some data sources, e.g., data streams, from which data is broadcast or otherwise available without the need for request. In such instances, filtering may be substituted for querying in order that the specific fields or other items of data specified in the XML file are obtained.
The analysis module 507 compiles time-wise statistics or summaries for each epoch specified in the XML file 502. To this end, it maintains for each such epoch one or more running statistics (e.g., sums or averages) for each data field specified by the file 502 and received from the sources. As data for each field are input, the running statistics for that field are updated. Such updating can include incrementing a count maintained for the field, recomputing a numerical total, modifying a concatenated string, and so forth, as appropriate to the type of the underlying field data.
By way of example, if the XML specification file 502 specifies that a summary of the number of "hits" of a web site is to be maintained on a per day basis, the analysis module 507 would maintain a store reflecting the number of hits thus far counted on a given day for that web site (e.g., based on data received from a source identifying each hit as it occurs, or otherwise). When no further data is received from the source for that day, the module generates RDF output (via the output module 508) reflecting that number of counts (or other specified summary information) for output to the hologram store 114.
If the XML file 502 additionally specifies that summary data of web site accesses is to be maintained on a per month basis, the analysis module 507 would maintain a separate store of counts for the month for which data is currently being received from the source. As above, when no further data is received from the source for that month, the module generates RDF output reflecting the total number of counts (or other specified summary information) for output to the hologram store 114.
As an alternative to simultaneously updating stores for each of multiple epochs as new data is received, other embodiments of the invention increment (or otherwise update) the store for the epoch of shortest relevant duration (e.g., the per day store) as each such data item is received. Additional stores reflecting epochs of longer duration (e.g., the per month store) are only updated as those for the shorter duration epochs are completed.
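The alternative just described might be sketched as below. The class `CascadingCounter` and its attribute names are hypothetical. Only the per day count is touched as each datum arrives; the per month count is updated when a day completes. Month rollover and RDF output are omitted to keep the sketch short.

```python
class CascadingCounter:
    """Per day count that is folded into a per month count when a day completes."""

    def __init__(self):
        self.day = None        # label of the day currently being counted
        self.day_count = 0     # store for the shorter epoch
        self.month_count = 0   # store for the longer epoch (completed days only)

    def add(self, day):
        """Record one datum stamped with a date label such as '2004-07-04'."""
        if self.day is not None and day != self.day:
            # The prior day is complete: fold it into the longer epoch.
            self.month_count += self.day_count
            self.day_count = 0
        self.day = day
        self.day_count += 1

counter = CascadingCounter()
for stamp in ["2004-07-01", "2004-07-01", "2004-07-02"]:
    counter.add(stamp)
```

After these three inputs the per month store reflects the two accesses of the completed day, while the per day store holds the single access counted so far on the current day.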
An analysis module 507 according to a preferred practice of the invention maintains stores for each epoch for which running statistics (i.e., time-wise summaries) are to be maintained. In order to accommodate the maintenance of running statistics for epochs from a plurality of sources, the stores 514 can be allocated from an array, a pointer table or other data structure, with specific allocations made depending on the specific number of running statistics being tracked.
For example, if an XML file 502 specifies that access statistics are to be maintained for a web site on daily and monthly bases using data from a first data source, and that running statistics for the numbers of visitors to a retail store are to be maintained on monthly and yearly bases from data from a second data source, the analysis module 507 can maintain four stores: store 514A maintaining a daily count for the web site; store 514B maintaining a monthly count for the web site; store 514C maintaining a monthly count for the retail store; and store 514D maintaining a yearly count for the retail store. Each of the stores 514 is updated as corresponding data is received from the respective data sources.
Thus, continuing the above example, as data (in the form of records, packets, or so forth) are received from the first data source reflecting web site accesses on a given day, a count maintained in the first store 514A is incremented. When the received data begins to reflect accesses on the succeeding day, the output module 508 can generate one or more RDF triples reflecting a count for the (then-complete) prior day for storage in the hologram store 114. Concurrently, the store 514A can be reset to zero and the process restarted for tracking accesses on that succeeding day.
The second store 514B, i.e., that tracking the longer epoch for data from the first source, can be incremented in parallel with the first store 514A as web access data is received from the source or, alternatively, can be updated when the first store 514A is rolled over, i.e., reset for tracking statistics for each successive day. As above, when data received from the first source begins to reflect web accesses for a succeeding month (i.e., the period associated with the second store 514B), RDF triples can be generated to reflect web access statistics for the then-completed prior month, concurrently with zeroing the second store 514B for tracking of statistics for the succeeding month.
In this way, the analysis module 507 maintains running statistics for the epochs specified in the XML file 502, outputting RDF triples reflecting those statistics as data for each successive epoch is received. Those skilled in the art will appreciate that running statistics may be maintained in other ways, as well. For example, continuing the above example, in instances where data received from the first source is not received ordered by day (but, rather, is intermingled with respect to many days), multiple stores can be maintained, one for each day (or other epoch).
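The rollover behavior described above, with stores updated in parallel, can be sketched as follows. The class `EpochStore`, the subject label `website`, and the predicate names `countPerDay` and `countPerMonth` are assumptions made for illustration; triples are represented as plain (subject, predicate, object) tuples, and input is assumed to arrive in time order.

```python
from datetime import datetime

class EpochStore:
    """Running count for one epoch (per day, per month or per year)."""

    FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}

    def __init__(self, subject, unit):
        self.subject = subject
        self.unit = unit
        self.key = None    # label of the epoch currently being counted
        self.count = 0

    def update(self, ts):
        """Count one datum; return a triple if the prior epoch just completed."""
        key = ts.strftime(self.FORMATS[self.unit])
        finished = None
        if self.key is not None and key != self.key:
            # Data now reflects a succeeding epoch: emit the prior total, reset.
            finished = (f"{self.subject}#{self.key}",
                        f"countPer{self.unit.capitalize()}",
                        self.count)
            self.count = 0
        self.key = key
        self.count += 1
        return finished

stores = [EpochStore("website", "day"), EpochStore("website", "month")]
triples = []
for stamp in ["2004-06-30T23:50:00", "2004-06-30T23:59:00",
              "2004-07-01T00:05:00", "2004-07-02T08:00:00"]:
    ts = datetime.fromisoformat(stamp)
    for store in stores:
        done = store.update(ts)
        if done:
            triples.append(done)
```

Here both stores see every datum. The triple for a day or month is emitted only once data for the succeeding day or month begins to arrive, which mirrors the reset-and-restart behavior described for stores 514A and 514B.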
Referring again to FIG. 1A, the output module 508 generates RDF documents reflecting the summarized data stored in stores 514 for output to the hologram data store 114. This can be performed by generating an RDF stream ad hoc or, preferably, by utilizing native commands, e.g., of the Java programming language, to gather the epoch data into a document object model (DOM). In such a language, the DOM can be output in RDF format to the hologram store 114 directly.
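As a rough sketch of the DOM-based approach, the following gathers summary triples into a DOM and serializes it as RDF/XML. It uses Python's standard `xml.dom.minidom` rather than Java, purely for illustration. The vocabulary namespace `http://example.org/stats#`, the function `triples_to_rdf` and the sample triple are assumptions.

```python
from xml.dom.minidom import getDOMImplementation

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/stats#"  # hypothetical vocabulary namespace

def triples_to_rdf(triples):
    """Serialize (subject, predicate, literal) tuples as an RDF/XML string."""
    doc = getDOMImplementation().createDocument(RDF_NS, "rdf:RDF", None)
    root = doc.documentElement
    # minidom does not emit namespace declarations itself, so add them here.
    root.setAttribute("xmlns:rdf", RDF_NS)
    root.setAttribute("xmlns:ex", EX_NS)
    for subject, predicate, value in triples:
        description = doc.createElementNS(RDF_NS, "rdf:Description")
        description.setAttribute("rdf:about", subject)
        prop = doc.createElementNS(EX_NS, "ex:" + predicate)
        prop.appendChild(doc.createTextNode(str(value)))
        description.appendChild(prop)
        root.appendChild(description)
    return doc.toxml()

rdf_xml = triples_to_rdf(
    [("http://example.org/website/2004-06-30", "hitsPerDay", 2)]
)
```

Each summary becomes one `rdf:Description` element whose `rdf:about` attribute names the subject and whose child element carries the predicate and literal value.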
A more complete understanding of the store 114 may be attained by reference to the aforementioned incorporated-by-reference applications.
Referring to copending, commonly assigned United States Patent Application Serial No. , filed this day herewith, entitled "Methods and Apparatus for Querying a Relational Data Store Using Schema-Less Queries," the teachings of which are incorporated herein by reference, the data store 114 supports SQL-like query languages called HxQL and HxML. These allow retrieval of RDF triples matching defined criteria.
The data store 114 includes a graph generator (not shown) that uses RDF triples to generate directed graphs in response to queries (e.g., in HxQL or HxML form) from the framework server 116. These may be queries for information reflected by triples originating from data in one or more of the legacy databases 140 (one example might be a request for the residence cities of hotel guests who booked reservations on account over Independence Day weekend, as reflected by data from an e-Commerce database and an Accounts Receivable database). Such generation of directed graphs from triples can be accomplished in any conventional manner known in the art (e.g., as appropriate to RDF triples or other manner in which the information is stored) or, preferably, in the manner described in co-pending, commonly assigned United States Patent Application Serial No. 10/138,725, filed May 3, 2002, entitled METHODS AND APPARATUS FOR VISUALIZING RELATIONSHIPS AMONG TRIPLES OF RESOURCE DESCRIPTION FRAMEWORK (RDF) DATA SETS and Serial No. 60/416,616, filed October 7, 2002, entitled METHODS AND APPARATUS FOR IDENTIFYING RELATED NODES IN A DIRECTED GRAPH HAVING NAMED ARCS, the teachings of both of which are incorporated herein by reference. Directed graphs so generated are passed back to the server 116 for presentation to the user.
According to one practice of the invention, the data store 114 utilizes genetic, self-adapting algorithms to traverse the RDF triples in response to queries from the framework server 116. Though not previously known in the art for this purpose, such techniques can be beneficially applied to the RDF database which, due to its inherently flexible (i.e., schema-less) structure, is not readily searched using traditional search techniques. To this end, the data store utilizes a genetic algorithm that performs several searches, each utilizing a different methodology but all based on the underlying query from the framework server, against the RDF triples. It compares the results of the searches quantitatively to discern which produce(s) the best results and reapplies that search with additional terms or further granularity.
Referring back to Figure 1, the framework server 116 generates requests to the data store 114 (and/or indirectly to the legacy databases via connectors 108, as discussed above) and presents information therefrom to the user via browser 118. The requests can be based on HxQL or HxML requests entered directly by the user though, preferably, they are generated by the server 116 based on user selections/responses to questions, dialog boxes or other user-input controls. In a preferred embodiment, the framework server includes one or more user interface modules, plug-ins, or the like, each for generating queries of a particular nature. One such module, for example, generates queries pertaining to marketing information, another such module generates queries pertaining to financial information, and so forth.
In some embodiments, queries to the data store are structured on a SQL based RDF query language, in the general manner of SquishQL, as known in the art.
In addition to generating queries, the framework server (and/or the aforementioned modules) "walks" directed graphs generated by the data store 114 to present to the user (via browser 118) any specific items of requested information. Such walking of the directed graphs can be accomplished via any conventional technique known in the art. Presentation of questions, dialog boxes or other user-input controls to the user and, likewise, presentation of responses thereto based on the directed graph can be accomplished via conventional server/browser or other user interface technology.
In some embodiments, the framework server 116 permits a user to update data stored in the data store 114 and, thereby, that stored in the legacy databases 140. To this end, changes made to data displayed by the browser 118 are transmitted by server 116 to data store 114. There, any triples implicated by the change are updated in store 114C, as are the corresponding RDF document objects in store 114A. An indication of these changes can be forwarded to the respective legacy databases 140, which utilize the corresponding API (or other interface mechanisms) to update their respective stores. (Likewise, changes made directly to the store 114C as discussed above, e.g., using a WebDAV client, can be forwarded to the respective legacy database.)
In some embodiments, the server 116 can present to the user not only data from the data store 114, but also data gleaned by the server directly from other sources. Thus, for example, the server 116 can directly query an enterprise web site for statistics regarding web page usage, or otherwise.
A further understanding of the operation of the framework server 116 may be attained by reference to the appendix filed with United States Patent Application Serial No. 09/917,264, filed July 27, 2001, and entitled "Methods and Apparatus for Enterprise Application Integration," which appendix is incorporated herein by reference.
Described herein are methods and apparatus meeting the above-mentioned objects. It will be appreciated that the illustrated embodiment is merely an example of the invention and that other embodiments, incorporating changes to those described herein, fall within the scope of the invention, of which we claim:
1. A method of time-wise data reduction and storage, comprising
A. inputting data from at least one source,
B. summarizing that data according to a specified epoch in which it belongs,
C. generating for each such epoch, one or more RDF triples characterizing the summarized data.
2. The method of claim 1, comprising outputting the RDF triples in one or more RDF document objects.
3. The method of claim 2, comprising storing the RDF document objects in a hierarchical data store.
4. The method of claim 3, comprising storing the RDF document objects in accord with a WebDAV protocol.
5. The method of claim 1, comprising storing the RDF triples in a relational data store.
6. The method of claim 5, comprising storing the RDF triples in a relational data store organized according to a hashed with origin approach.
7. A method of time-wise data reduction and storage, comprising
A. querying one or more data sources,
B. summarizing data received from the data sources in response to querying, where the data is summarized by selected epoch,
C. generating for each such epoch, one or more RDF triples characterizing the summarized data,
D. storing the RDF triples to one or more data stores, along with further RDF triples characterizing the data from which the summaries were generated, where the one or more data stores include any of a hierarchical data store and a relational data store.
8. The method of claim 7, comprising summarizing data received from the data sources with respect to multiple epochs of differing length.
9. The method of claim 7, comprising querying one or more data sources in an SQL format.
10. The method of claim 7, comprising parsing an XML file that identifies one or more of the data sources, one or more fields thereof to be summarized, and/or one or more epochs for which those fields are to be summarized.
11. The method of claim 7, comprising responding to data received from a data source by updating a store associated with an epoch of shorter duration.
12. The method of claim 11, comprising updating a store associated with an epoch of longer duration based on information maintained in an epoch of shorter duration.
13. A method of time-wise data reduction and storage, comprising
A. at least one of querying and filtering data from one or more data sources,
B. summarizing the data received in one or more selected epochs of differing length,
C. generating RDF document objects comprising one or more RDF triples characterizing the summarized data,
D. storing the RDF documents to a first, hierarchical data store,
E. storing the triples therein to a second, relational data store.
14. The method of claim 13, comprising querying one or more data sources in an SQL format.
15. The method of claim 13, comprising parsing an XML file that identifies one or more of the data sources, one or more fields thereof to be summarized, and/or one or more epochs for which those fields are to be summarized.
16. The method of claim 13, comprising responding to data received from a data source by updating a store associated with an epoch of shorter duration.
17. The method of claim 16, comprising updating a store associated with an epoch of longer duration based on information maintained in an epoch of shorter duration.
18. The method of claim 13, comprising generating a display or other presentation based on the RDF triples characterizing the summarized data.
Abstract of the Invention
The invention provides methods of time-wise data reduction that include the steps of inputting data from a source; summarizing that data according to one or more selected epochs in which it belongs; and generating for each such selected epoch one or more RDF triples characterizing the summarized data. The data source may be, for example, a database, a data stream or otherwise. The selected epoch may be a second, minute, hour, week, month, year, or so forth. The triples may be output in the form of RDF document objects. These can be stored, for example, in a hierarchical data store such as, for example, a WebDAV server. Triples parsed from the document objects may be maintained in a relational store that is organized, for example, according to a hashed with origin approach.
[Drawing sheets not reproduced; surviving sheet labels: FIG. 1B, FIG. 3]
Appendix C
Copy of United States Patent Application Serial No. 60/416,616, filed October 7, 2002, entitled METHODS AND APPARATUS FOR IDENTIFYING RELATED NODES IN A DIRECTED GRAPH HAVING NAMED ARCS, now published as PCT WO 04034625 (Application WO2003US0031636).
(32 pages, including this cover sheet)
METHODS AND APPARATUS FOR IDENTIFYING RELATED NODES IN A DIRECTED GRAPH HAVING NAMED ARCS
Background of the Invention
The invention pertains to digital data processing and, more particularly, to methods and apparatus for identifying subsets of related data in a data set. The invention has application, for example, in enterprise business visibility and insight using real-time reporting tools.
It is not uncommon for a single company to have several database systems (separate systems not interfaced) to track internal and external planning and transaction data. Such systems might have been developed at different times throughout the history of the company and are therefore of differing generations of computer technology. For example, a marketing database system tracking customers may be ten years old, while an enterprise resource planning (ERP) system tracking inventory might be two or three years old. Integration between these systems is difficult at best, consuming specialized programming skill and constant maintenance expenses.
A major impediment to enterprise business visibility is the consolidation of these disparate legacy databases with one another and with newer databases. For instance, inventory on-hand data gleaned from a legacy ERP system may be difficult to combine with customer order data gleaned from web servers that support e-commerce (and other web-based) transactions. This is not to mention difficulties, for example, in consolidating resource scheduling data from the ERP system with the forecasting data from the marketing database system.
Even where data from disparate databases can be consolidated, e.g., through data mining, directed queries, brute-force conversion and combination, or otherwise, it may be difficult (if not impossible) to use. For example, the manager of a corporate marketing campaign may be wholly unable to identify relevant customers from a listing of tens, hundreds or even thousands of pages of consolidated corporate ERP, e-commerce, marketing and other data.
An object of this invention is to provide improved methods and apparatus for digital data processing and, more particularly, for identifying subsets of related data in a data set.
A related object is to provide such methods and apparatus as facilitate enterprise business visibility and insight.
A further object is to provide such methods and apparatus as can rapidly identify subsets of related data in a data set, e.g., in response to user directives or otherwise. A further object of the invention is to provide such methods and apparatus as can be readily and inexpensively implemented.
Summary of the Invention
The foregoing are among the objects attained by the invention which provides, in one aspect, a method for identifying related data in a directed graph, such as an RDF data set. A "first" step (though the steps are not necessarily executed in sequential order) includes identifying (or marking) as related data expressly satisfying a criteria (e.g., specified by a user). A "second" step includes identifying as related ancestors of any data identified as related, e.g., in the first step, unless that ancestor conflicts with the criteria. A "third" step of the method is identifying descendents of any data identified, e.g., in the prior steps, unless that descendent conflicts with the criteria or has a certain relationship with the ancestor from which it descends. The method generates, e.g., as output, an indication of each of the nodes identified as related in these steps.
By way of example, in the first step, a method according to this aspect of the invention can identify nodes in the directed graph that explicitly match a criteria in the form field1 = value1, where field1 is a characteristic (or attribute) of one or more of the nodes and value1 is a value of the specific characteristic (or attribute). Of course, criteria are specific to the types of data in the data set and can be more complex, including for example, Boolean expressions and operators, wildcards, and so forth. Thus, for example, a criteria of a data set composed of RDF triples might be of the form predicate=CTO and object=Colin, which identifies, as related, triples having a predicate "CTO" and an object "Colin."
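A criteria test of this kind might be sketched as follows, under the assumption that triples are plain (subject, predicate, object) tuples and that a criteria is a mapping from field name to a pattern. Shell-style wildcards from Python's standard `fnmatch` module stand in for whatever wildcard syntax an implementation would actually support; Boolean operators other than an implicit AND are not covered.

```python
from fnmatch import fnmatchcase

def matches(triple, criteria):
    """True when every field named in the criteria matches its pattern."""
    subject, predicate, obj = triple
    values = {"subject": subject, "predicate": predicate, "object": obj}
    return all(
        fnmatchcase(values[field], pattern)
        for field, pattern in criteria.items()
    )

# Hypothetical sample data.
triples = [
    ("company://acme", "CTO", "Colin"),
    ("company://other", "CTO", "Dave"),
    ("company://acme", "City", "Boston"),
]
criteria = {"predicate": "CTO", "object": "Colin"}
first_step = [t for t in triples if matches(t, criteria)]
```

A pattern such as `{"object": "Col*"}` would select the same triple by wildcard.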
By way of further example, in the second step, the method "walks" up the directed graph from each node identified as related in the first step (or any of the steps) to find ancestor nodes. Each of these is identified as related unless it conflicts with the criteria. To continue the example, if the first step marks as related a first RDF triple that matches the criteria predicate=CTO and object=Colin, the second step marks as related a second, parent triple whose object is the subject of the first triple, unless that second (or parent) triple otherwise conflicts with the criteria, e.g., has another object specifying that Dave is the CTO.
By way of further example, in the third step, the method walks down the directed graph from each node identified in the previously described steps (or any of the steps) to find descendent nodes. Each of these is identified as related unless (i) it conflicts with the criteria or (ii) its relationship with the ancestor from which walking occurs is of the same type as the relationship that ancestor has with a child, if any, from which the ancestor was identified by operation of the second step. To continue the example, if the first step marks as related a first RDF triple that matches the criteria predicate=CTO and object=Colin and the second step marks as related a second, parent triple whose object is the subject of the first triple via a predicate relationship "Subsidiary," the third step marks as related a third, descendent triple whose subject is the object of the second, parent triple, unless that descendent triple conflicts with the criteria (e.g., has a predicate-object pair specifying that Dave is the CTO) or unless its relationship with the parent triple is also defined by a predicate relationship of type "Subsidiary."
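The three steps can be sketched on a small graph of nodes and named arcs, executed here in series. This is one possible reading, not the rule-engine implementation. The function `find_related`, the node names, and the node-level notion of "conflict" (a node carrying the criteria attribute with a different value) are assumptions made for illustration.

```python
from collections import defaultdict

def find_related(arcs, attrs, crit_attr, crit_value):
    """arcs: (source, arc name, target) tuples; attrs: node -> {attribute: value}."""
    children = defaultdict(list)   # node -> [(arc name, child)]
    parents = defaultdict(list)    # node -> [(arc name, parent)]
    for source, name, target in arcs:
        children[source].append((name, target))
        parents[target].append((name, source))

    def conflicts(node):
        value = attrs.get(node, {}).get(crit_attr)
        return value is not None and value != crit_value

    # First step: nodes expressly satisfying the criteria.
    related = {n for n, a in attrs.items() if a.get(crit_attr) == crit_value}

    # Second step: walk up, remembering the arc type used to reach each ancestor.
    reached_via = defaultdict(set)
    stack = list(related)
    while stack:
        node = stack.pop()
        for name, parent in parents[node]:
            if conflicts(parent):
                continue
            reached_via[parent].add(name)
            if parent not in related:
                related.add(parent)
                stack.append(parent)

    # Third step: walk down, skipping conflicting nodes and arcs of the same
    # type as the one by which the ancestor was reached from below.
    stack = list(related)
    while stack:
        node = stack.pop()
        for name, child in children[node]:
            if child in related or conflicts(child) or name in reached_via[node]:
                continue
            related.add(child)
            stack.append(child)
    return related

arcs = [
    ("ParentCo", "Subsidiary", "AcmeSub"),
    ("ParentCo", "Subsidiary", "OtherSub"),
    ("ParentCo", "Subsidiary", "ThirdSub"),
    ("ParentCo", "Office", "BostonOffice"),
    ("ParentCo", "Division", "Labs"),
    ("Labs", "Office", "LabsOffice"),
]
attrs = {"AcmeSub": {"CTO": "Colin"}, "OtherSub": {"CTO": "Dave"}}
related = find_related(arcs, attrs, "CTO", "Colin")
```

In this example AcmeSub satisfies the criteria, ParentCo is reached by walking up a "Subsidiary" arc, and the downward walk from ParentCo therefore skips its other "Subsidiary" children (OtherSub, which also conflicts, and ThirdSub) while picking up the office and the division.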
As evident in the discussion above, according to some aspects of the invention, the data are defined by RDF triples and the nodes by subjects (or resource-type objects) of those triples. In other aspects, the data and nodes are of other data types, including, for example, meta directed graph data (of the type defined in one of the aforementioned incorporated-by-reference applications) where a node represents a plurality of subjects each sharing a named relationship with a plurality of objects represented by a node.
Still further aspects of the invention provide methods as described above in which the so-called first, second and third steps are executed in parallel, e.g., as by an expert system rule engine. In other aspects, the steps are executed in series and/or iteratively.
In still further aspects of the invention, the invention provides methods for identifying related data in a directed graph by exercising only the first and second aforementioned steps. Other aspects provide such methods in which only the first and third such steps are exercised.
Still other aspects of the invention provide methods as described above in which the directed graph is made up of, at least in part, a data flow, e.g., of the type containing transactional or enterprise data. Related aspects provide such methods in which the steps are executed on a first portion of a directed graph and, then, separately on a second portion of the directed graph, e.g., as where the second portion reflects updates to a data set represented by the first portion.
These and other aspects are evident in the drawings and in the description that follows.
Brief Description of the Drawings
A more complete understanding of the invention may be attained by reference to the drawings, in which:
Figure 1 is a block diagram of a system according to the invention for identifying related data in a data set;
Figure 2 depicts a data set suitable for processing by methods and apparatus according to the invention;
Figures 3-5 depict operation of the system of Figure 1 on the data set of Figure 2 with different criteria.
Detailed Description of the Illustrated Embodiment
Figure 1 depicts a system 8 according to the invention for identifying and/or generating (collectively, "identifying") a subset of a directed graph, namely, that subset matching or related to a criteria. The embodiment (and, more generally, the invention) is suited for use inter alia in generating subsets of RDF data sets consolidated from one or more data sources, e.g., in the manner described in the following copending, commonly assigned applications, the teachings of which are incorporated herein by reference:
United States Patent Application Number Serial No. 09/917,264, filed July 27, 2001, entitled "Methods and Apparatus for Enterprise Application Integration,"
United States Patent Application Number Serial No. 10/051,619, filed October 29, 2001, entitled "Methods And Apparatus For Real-time Business Visibility Using Persistent Schema-less Data Storage,"
United States Patent Application Number Serial No. 60/332,219, filed November 21, 2001, entitled "Methods And Apparatus For Calculation And Reduction Of Time-series Metrics From Event Streams Or Legacy Databases In A System For Real-time Business Visibility," and
United States Patent Application Number Serial No. 60/332,053, filed November 21, 2001, entitled "Methods And Apparatus For Querying A Relational Database Of RDF Triples In A System For Real-time Business Visibility."
The embodiment (and, again, more generally, the invention) is also suited inter alia for generating subsets of "meta" directed graphs of the type described in copending, commonly assigned application United States Patent Application Number Serial No. 10/138,725, filed May 3, 2002, entitled "Methods And Apparatus for Visualizing Relationships Among Triples of Resource Description Framework (RDF) Data Sets," the teachings of which are incorporated herein by reference.
The illustrated system 8 includes a module 12 that executes a set of rules 18 with respect to a set of facts 16 representing criteria in order to generate a subset 20 of a set of facts 10 representing an input data set, where that subset 20 represents those input data facts that match the criteria or are related thereto. For simplicity, in the discussion that follows the set of facts 16 representing criteria are referred to as "criteria" or "criteria 16," while the set of facts 10 representing data are referred to as "data" or "data 10." The illustrated system 8 is implemented on a general- or special-purpose digital data processing system, e.g., a workstation, server, mainframe or other digital data processing system of the type conventionally available in the marketplace, configured and operated in accord with the teachings herein. Though not shown in the drawing, the digital data processing system can be coupled for communication with other such devices, e.g., via a network or otherwise, and can include input/output devices, such as a keyboard, pointing device, display, printer and the like.
Illustrated module 12 is an executable program (compiled, interpreted or otherwise) embodying the rules 18 and operating in the manner described herein for identifying subsets of directed graphs. In the illustrated embodiment, module 12 is implemented in Jess (Java Expert System Shell), a rule-based expert system shell, commercially available from Sandia National Laboratories. However, it can be implemented using any other "expert system" engine, if-then-else network, or other software, firmware and/or hardware environment (whether or not expert system-based) suitable for adaptation in accord with the teachings hereof.
The module 12 embodies the rules 18 in a network representation 14, e.g., an if-then-else network, or the like, native to the Jess environment. The network nodes are preferably executed so as to effect substantially parallel operation of the rules 18, though they can be executed so as to effect serial and/or iterative operation as well or in addition. In other embodiments, the rules are represented in accord with the specifics of the corresponding engine, if-then-else network, or other software, firmware and/or hardware environment on which the embodiment is implemented. These likewise preferably effect parallel execution of the rules 18, though they may effect serial or iterative execution instead or in addition.
The data set 10 is a directed graph, e.g., a collection of nodes representing data and directed arcs connecting nodes to one another. As used herein, a node at the source of an arc is referred to as an "ancestor" (or "direct ancestor"), while the node at the target of the arc is referred to herein as a "descendent" (or "direct descendent"). In the illustrated embodiment, each arc has an associated type or name, e.g., in the manner of predicates of RDF triples, which, themselves, constitute and/or form directed graphs.
By way of example, in addition to RDF triples, the data set 10 can comprise data structures representing a meta directed graph of the type disclosed in copending, commonly assigned United States Patent Application Serial No. 10/138,725, filed May 3, 2002, entitled "Methods And Apparatus for Visualizing Relationships Among Triples of Resource Description Framework (RDF) Data Sets," e.g., at Figures 4A - 6B and accompanying text, all of which are incorporated herein by reference.
Alternatively or in addition, the data set 10 can comprise RDF triples of the type conventionally known in the art and described, for example, in Resource Description Framework (RDF) Model and Syntax Specification (February 22, 1999). Briefly, RDF is a way of expressing the properties of items of data. Those items are referred to as subjects or resources. Their properties are referred to as predicates. And, the values of those properties are referred to as objects. In RDF, an expression of a property of an item is referred to as a triple, a convenience reflecting that the expression contains three parts: subject, predicate and object. Subjects can be anything that is described by an RDF expression. A predicate identifies a property of a subject. An object gives a "value" of a property. Objects can be literals, i.e., strings that identify or name the corresponding property (predicate). They can also be resources.
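For concreteness, a triple with either a literal or a resource object might be represented as below. The class `Triple`, the `urn:example:` identifiers and the helper `links_to` are hypothetical names used only for this sketch.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str                   # the item being described
    predicate: str                 # the property of that item
    obj: str                       # the value of the property
    obj_is_resource: bool = False  # False means the object is a literal

# A literal-valued triple and a resource-valued triple (hypothetical data).
acme_cto = Triple("urn:example:acme", "CTO", "Colin")
parent_link = Triple("urn:example:parentco", "Subsidiary", "urn:example:acme", True)

def links_to(parent, child):
    """True when the parent's resource-valued object is the child's subject."""
    return parent.obj_is_resource and parent.obj == child.subject
```

Because a resource-valued object can itself be the subject of other triples, a set of such triples forms a directed graph whose arcs are named by the predicates.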
The data set 10 may be stored on disk for input to module 12. Alternatively, or in addition, the data set may be a data flow, e.g., a stream of data (real-time or otherwise) originating from e-commerce, point-of-sale or other transactions or sources (whether or not business- or enterprise-oriented). Moreover, the data set may comprise multiple parts, each operated on by module 12 at different times, for example, a first part representing a database and a second part representing updates to that database.
Criteria 16 contains expressions including, for example, literals, wildcards, Boolean operators and so forth, against which nodes in the data set are tested. In embodiments that operate on RDF data sets, the criteria can specify subject, predicate and/or object values or other attributes. In embodiments that operate on directed graphs of other types, other appropriate values and attributes may be specified. Criteria can be input by a user, e.g., from a user interface, e.g., on an ad hoc basis. Alternatively or in addition, they can be stored and re-used, such as where numerous data sets exist to which the same criteria is applied. Further, the criteria 16 can be generated dynamically, e.g., via other software (or hardware) applications.
Rules 18 define the tests for identifying data in the data set 20 that match the criteria or that are related thereto. These are expressed in terms of the types and values of the data items as well as their interrelationships or connectedness.
Rules applicable to a data set comprised of RDF triples can be expressed as follows:
Rule 0 ("Criteria Rule")
Purpose: Match criteria to triples in data set.
Rule: If triple's object is a literal, identify triple as related if both triple's predicate and the object match those specified in the criteria. If triple's object is a resource, identify triple as related if triple's predicate matches that specified in criteria, if any, and if triple's object matches that specified in criteria.
Rule 1 ("Sibling Rule")
Purpose: Find as related other triples at the same level.
Rule: Identify as related a triple that shares the same subject (i.e., siblings), except those siblings that have the same predicate as that specified in the criteria.
Rule 2 ("Ancestor Rule")
Purpose: Walk up the directed graph to find valid triples.
Rule: Identify as related a triple that is a direct ancestor of a triple identified by any of the other rules and that is not in substantial conflict with the criteria. For purposes hereof, a triple whose object is the subject of another triple is deemed a direct ancestor of that other triple; a triple whose subject is the object of another triple is deemed a direct descendent of that other triple.
Rule 3 ("Descendent Rule")
Purpose: Walk down the directed graph to find valid triples.
Rule: Identify as related a triple (hereinafter "identified descendent") that is a direct descendent of a triple (hereinafter "identified ancestor") identified as related by any of the other rules and which identified descendent
(a) is not associated with the identified ancestor via a predicate substantially matching a predicate named in the criteria, if any, and
(b) is not in substantial conflict with the criteria; and
(c) is not associated with the identified ancestor via a predicate matching a predicate by which the identified ancestor is associated with a triple, if any, as a result of which the identified ancestor was identified during execution of the Ancestor Rule.
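The definitions of direct ancestor, direct descendent and sibling used in these rules can be sketched as simple predicates over (subject, predicate, object) tuples. The function names below are hypothetical, chosen for illustration; only the relationships themselves come from the rules above.

```python
def is_direct_ancestor(a, b):
    """True if triple `a` is a direct ancestor of triple `b`:
    the object of `a` is the subject of `b`."""
    return a[2] == b[0]


def is_direct_descendent(a, b):
    """True if triple `a` is a direct descendent of triple `b`:
    the subject of `a` is the object of `b`."""
    return a[0] == b[2]


def are_siblings(a, b):
    """True if two distinct triples share the same subject."""
    return a != b and a[0] == b[0]


parent = ("company://id#3", "customer", "company://id#1")
child = ("company://id#1", "CTO", "Colin")
other = ("company://id#1", "employee", "Alan")

up = is_direct_ancestor(parent, child)
down = is_direct_descendent(child, parent)
same_level = are_siblings(child, other)
```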
As used above and throughout, "substantial conflict" means conflict that is direct or otherwise material in regard to determining related data vis-a-vis the use for which the invention is employed (e.g., as determined by default in an embodiment and/or by selection made by a user thereof). By way of non-limiting example, for some uses (and/or embodiments) differences of any sort between the object of an RDF triple and that specified in a criteria are material, while for other uses (and/or embodiments) differences with respect to suffix, case and/or tense are immaterial. Those skilled in the art will appreciate that for other uses and/or embodiments, factors other than suffix, case and/or tense may be used in determining materiality or lack thereof.
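One way to picture a configurable materiality test is sketched below. This is an assumption-laden illustration, not the embodiment's method: the helper names `normalize` and `differs_materially`, the `ignore_case` flag and the simple suffix stripping are all hypothetical, and tense is not handled.

```python
def normalize(value, ignore_case=True, suffixes=()):
    """Reduce a value to the form used for comparison.

    ignore_case: treat differences in letter case as immaterial.
    suffixes:    trailing strings (e.g. "s") whose presence is immaterial.
    """
    if ignore_case:
        value = value.casefold()
    for suffix in suffixes:
        # Strip at most one listed suffix, never the whole value.
        if value.endswith(suffix) and len(value) > len(suffix):
            value = value[: -len(suffix)]
            break
    return value


def differs_materially(a, b, **options):
    """True if two values still differ after immaterial differences are removed."""
    return normalize(a, **options) != normalize(b, **options)


case_only = differs_materially("Colin", "colin")
strict = differs_materially("Colin", "colin", ignore_case=False)
plural = differs_materially("employees", "employee", suffixes=("s",))
```

With the default options, a difference in case alone is treated as immaterial; passing `ignore_case=False` corresponds to a use in which differences of any sort are material.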
Rules applicable to other directed graphs (e.g., not comprised of RDF triples) can be expressed as shown below. As noted above, these other directed graphs can include the aforementioned meta directed graphs, by way of non-limiting example. It will be appreciated that the rules which follow are functionally equivalent to those expressed above. However, they take into account that the data nodes in those other directed graphs may have attributes in addition to those represented in their connectedness to other data nodes. To this end, the aforementioned Sibling Rule is subsumed in those aspects of the rules that follow which call for testing each data node to determine whether it conflicts with the criteria.
Rule 0 ("Criteria Rule")
Purpose: Match criteria to data in data set.
Rule: Identify as related data substantially matching a criteria.
Rule 1 ("Ancestor Rule")
Purpose: Walk up the directed graph to find valid data.
Rule: Identify as related data that is a direct ancestor of data identified in any of these rules, and that is not in substantial conflict with the criteria.
Rule 2 ("Descendent Rule")
Purpose: Walk down the directed graph to find valid data.
Rule: Identify as related data (hereinafter "identified descendent") that is a direct descendent of data (hereinafter "identified ancestor") identified as related in any of these rules, and which identified descendent:
(a) Does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria, if any, and
(b) Is not in substantial conflict with the criteria; and
(c) Does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified during execution of Rule 1.
Referring back to Figure 1, the related data 20 output or otherwise generated by module 12 represents those nodes or triples identified as "related" during exercise of the rules. The data 20 can be output in the same form as the input data or some alternate form, e.g., pointers or other references to identified data within the data set 10. In some embodiments, it can be displayed via a user interface or printed, or digitally communicated to further applications for additional processing, e.g., via a network or the Internet. In one non-limiting example, the related data 20 can be used to generate mailings or to trigger message events.
In operation, the module 12 is loaded with rules 18. In the illustrated embodiment, this is accomplished via compilation of source code embodying those rules (expressed above in pseudo code) in the native or appropriate language of the expert system engine or other environment in which the module is implemented. See, step A. Of course, those skilled in the art will appreciate that, alternatively, rules in source code format can be retrieved at run time and interpreted instead of compiled.
The criteria 16 is then supplied to the module 12. See, step B. These can be entered by an operator, e.g., via a keyboard or other input device. Alternatively, or in addition, they can be retrieved from disk or input from another application (e.g., a messaging system) or device, e.g., via network, interprocess communication or otherwise.
The data set 10 is applied to the module 12 in step C. The data set 10 can be as described above, to wit, an RDF data set or other directed graph stored in a data base or contained in a data stream, or otherwise. The data set can be applied to the module 12 via conventional techniques known in the art, e.g., retrieval from disk, communication via network, or via any other technique capable of communicating a data set to a digital application.
In step D, the module 12 uses the rules 18 to apply the criteria 16 to the data set 10. In the illustrated embodiment, by way of non-limiting example, this step is executed via the network 14 configured (via the rules engine) in accord with the rules. In other embodiments, this step is executed via the corresponding internal representation of those rules.
Triples (in the case of RDF data sets) or data (in the case of data sets comprising other types of directed graphs) identified by the module as "related" (meaning, in the context hereof, that those triples match the criteria or are related thereto) are output as "identified data" in Step D. As described above, the output can be a list or other tabulation of identified data 20, or it can be a pointer or reference to that data, for example, a reference to a location within the data set 10.
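The sequence of steps A through D can be pictured with the following sketch, which is not the rules-engine implementation of the illustrated embodiment. The class name `Module`, its methods and the single plain-Python rule are hypothetical; the output is a list of indices, standing in for the references to locations within the data set mentioned above.

```python
def criteria_rule(data_set, criteria, identified):
    """Identify triples whose predicate and object equal those in the criteria."""
    return {i for i, (_, p, o) in enumerate(data_set)
            if p == criteria["predicate"] and o == criteria["object"]}


class Module:
    def __init__(self, rules):
        # Step A: load the rules (here, plain Python callables).
        self.rules = list(rules)
        self.criteria = None

    def supply_criteria(self, criteria):
        # Step B: supply the criteria.
        self.criteria = criteria

    def apply(self, data_set):
        # Steps C and D: apply the data set and use the rules to test it.
        identified = set()
        for rule in self.rules:
            identified |= rule(data_set, self.criteria, identified)
        # Output references (indices) to the identified data.
        return sorted(identified)


module = Module([criteria_rule])
module.supply_criteria({"predicate": "CTO", "object": "Colin"})
data_set = [
    ("company://id#1", "employee", "Alan"),
    ("company://id#1", "CTO", "Colin"),
]
related = module.apply(data_set)
```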
In some embodiments, the output of identified data 20 can be stored for future use, e.g., for use with a mail-merge or other applications. In other embodiments, it can be digitally communicated to other data base systems or information repositories. Still further, in some embodiments, it can be added to a data base containing other related data, or even replace portions of that data base.
The table below lists a directed graph (here, the triples of an RDF data set) of the type suitable for processing by module 12 to identify data matching a criteria and related thereto. It will be appreciated that in practice, directed graphs processed by module 12 may contain hundreds, thousands or more nodes, e.g., as would be typical for an RDF set representing transactional and enterprise-related data. Moreover, it will be appreciated that the directed graphs and/or triples are typically expressed in a conventional data format (e.g., XML), or otherwise, for transfer to and from the module 12.
Subject          Predicate   Object
company://id#3   customer    company://id#1
company://id#3   customer    company://id#4
company://id#3   customer    company://id#2
company://id#1   employee    Howard
company://id#1   employee    Alan
company://id#1   CTO         Colin
company://id#2   employee    David
company://id#2   CTO         Colin
Figure 2 is a graphical depiction of this directed graph, i.e., RDF data set. Per convention, subjects and resource-type objects are depicted as oval-shaped nodes; literal-type objects are depicted as rectangular nodes; and predicates are depicted as arcs connecting those nodes.
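The node and arc view of the data set can be sketched as follows. The test used to tell resources from literals (the presence of "://" in the value) is an assumption made for this example; the names `TRIPLES`, `is_resource`, `resource_nodes`, `literal_nodes` and `arcs` are likewise illustrative.

```python
# The example data set from the table above, as (subject, predicate, object).
TRIPLES = [
    ("company://id#3", "customer", "company://id#1"),
    ("company://id#3", "customer", "company://id#4"),
    ("company://id#3", "customer", "company://id#2"),
    ("company://id#1", "employee", "Howard"),
    ("company://id#1", "employee", "Alan"),
    ("company://id#1", "CTO", "Colin"),
    ("company://id#2", "employee", "David"),
    ("company://id#2", "CTO", "Colin"),
]


def is_resource(value):
    # Assumption of this sketch: resources are written as URIs containing "://".
    return "://" in value


subjects = {s for s, _, _ in TRIPLES}
objects = {o for _, _, o in TRIPLES}

# Subjects and resource-type objects (the oval nodes of the depiction).
resource_nodes = subjects | {o for o in objects if is_resource(o)}
# Literal-type objects (the rectangular nodes of the depiction).
literal_nodes = {o for o in objects if not is_resource(o)}
# One labelled arc per triple, running from subject to object.
arcs = [(s, o, p) for s, p, o in TRIPLES]
```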
Figure 3 depicts application by module 12 of criteria on the data set shown in Figure 2 using the above-detailed rules, specifically, those of the RDF type. The criteria is predicate = CTO and object = Colin. The depiction is simplified insofar as it shows execution of the rules serially: in practice, a preferred module 12 implemented in a rules engine (such as Jess) executes the rules in accord with the engine's underlying algorithm (e.g., a Rete algorithm as disclosed by Forgy, "Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem," Artificial Intelligence, 19 (1982) 17-37, and by http://herzberg.ca.sandia.gov/jess/docs/52/rete.html; or other underlying algorithm).
In a sequence of twelve frames, the depiction shows successive identification of triples as "related" (i.e., matching the criteria or related thereto) as each rule is applied or re-applied. The illustrated sequence proceeds from left-to-right then top-to-bottom, as indicated by the dashed-line arrows. For sake of simplicity, the data set is depicted in abstract in each frame, i.e., by a small directed graph of identical shape as that of Figure 2, but without the labels. Triples identified as related are indicated in black.
Referring to the first frame of Figure 3, the module 12 applies the Criteria Rule to the data set. Because the company://id#1 — CTO — Colin triple matches the criteria (to repeat, predicate = CTO and object = Colin), it is identified as "related" and marked accordingly.
In the second frame, the module applies the Sibling Rule to find triples at the same level as the one(s) previously identified by the Criteria Rule. In this instance, the company://id#1 — employee — Howard and company://id#1 — employee — Alan triples are identified and marked accordingly.
In the third frame, the module applies the Ancestor Rule to walk up the directed graph to find ancestors of the triples previously identified as related. In this instance, the company://id#3 — customer — company://id#1 triple is identified and marked accordingly.
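The first three frames can be traced with the short sketch below, which applies the rules serially in plain Python rather than through a rules engine. It starts, as the depiction does, from the company://id#1 match, and it omits the "substantial conflict" test of the Ancestor Rule; both simplifications, and all variable names, are assumptions of this example.

```python
TRIPLES = [
    ("company://id#3", "customer", "company://id#1"),
    ("company://id#3", "customer", "company://id#4"),
    ("company://id#3", "customer", "company://id#2"),
    ("company://id#1", "employee", "Howard"),
    ("company://id#1", "employee", "Alan"),
    ("company://id#1", "CTO", "Colin"),
    ("company://id#2", "employee", "David"),
    ("company://id#2", "CTO", "Colin"),
]
criteria = {"predicate": "CTO", "object": "Colin"}

# Frame 1 (Criteria Rule): the depiction begins with the company://id#1 match.
seed = ("company://id#1", "CTO", "Colin")

# Frame 2 (Sibling Rule): same subject as the seed, but not the criteria predicate.
siblings = [t for t in TRIPLES
            if t[0] == seed[0] and t != seed and t[1] != criteria["predicate"]]

# Frame 3 (Ancestor Rule): triples whose object is the subject of an identified triple.
identified = [seed] + siblings
ancestors = [t for t in TRIPLES
             if any(t[2] == r[0] for r in identified)]
```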
In the fourth frame, the module applies the Descendent Rule to walk down the directed graph to find descendents of the triples previously identified as related. No triples are selected since both company://id#3 — customer — company://id#2 and company://id#3 — customer — company://id#4 share the same predicate as company://id#3 — customer — company://id#1. Referring back to the detailed rules, company://id#2, by way of example, is a direct descendent that has a predicate (to wit, customer) connecting it with its identified direct ancestor (to wit, company://id#3) which matches a predicate that ancestor (to wit, company://id#3) has with a direct descendent (to wit, company://id#1) via which that direct ancestor (to wit, company://id#3) was identified during the execution of the Ancestor Rule.
In frames 5-8, the module 12 reapplies the rules, this time beginning with a Criteria Rule match of company://id#2 — CTO — Colin. In frames 9-12, the module 12 finds no further matches upon reapplication of the rules.
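Reapplication of rules until no further matches are found amounts to iterating to a fixed point. The sketch below illustrates that idea under loudly stated simplifications: it implements only the Criteria, Sibling and Ancestor Rules, it omits the Descendent Rule and the "substantial conflict" test entirely, and the function name `related_triples` is hypothetical. It is not the Rete-based execution of the illustrated embodiment.

```python
TRIPLES = [
    ("company://id#3", "customer", "company://id#1"),
    ("company://id#3", "customer", "company://id#4"),
    ("company://id#3", "customer", "company://id#2"),
    ("company://id#1", "employee", "Howard"),
    ("company://id#1", "employee", "Alan"),
    ("company://id#1", "CTO", "Colin"),
    ("company://id#2", "employee", "David"),
    ("company://id#2", "CTO", "Colin"),
]


def related_triples(triples, predicate, obj):
    # Criteria Rule: predicate and object both match the criteria.
    matched = {t for t in triples if t[1] == predicate and t[2] == obj}
    identified = set(matched)

    # Sibling Rule: same subject as a criteria match, different predicate.
    identified |= {t for t in triples for m in matched
                   if t[0] == m[0] and t[1] != predicate}

    # Ancestor Rule, reapplied until a pass identifies nothing new.
    while True:
        subjects = {t[0] for t in identified}
        new = {t for t in triples if t[2] in subjects} - identified
        if not new:
            return identified
        identified |= new


result = related_triples(TRIPLES, "CTO", "Colin")
```

Under these simplifications, both CTO triples, their employee siblings and the two customer triples leading to company://id#1 and company://id#2 are identified, while the company://id#3 — customer — company://id#4 triple is never reached.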
Figure 4 parallels Figure 3, showing however application by module 12 of the criteria predicate = employee and object = Alan to the data set of Figure 2. Only eight frames are shown since module 12 finds no further matches during execution of the rules represented in the final four frames.
Of note in Figure 4 is frame two. Here, application of the Sibling Rule by module 12 does not result in identification of all of the siblings of company://id#1 — employee — Alan (which had been identified as relevant in the prior execution of the Criteria Rule). This is because one of the siblings, company://id#1 — employee — Howard, has the same predicate as that specified in the criteria. Accordingly, that triple is not identified or marked as related.
Figure 5 also parallels Figure 3, showing however application by module 12 of the criteria resource = company: f/idM to the data set of Figure 2. Again, only eight frames are shown since module 12 finds no further matches during execution of the rules represented in the final four frames. Of note in Figure 5 are the identifications effected by specification of a resource as a criteria.
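The Sibling Rule exclusion noted for frame two of Figure 4 can be reproduced with a few lines of Python. The variable names are illustrative, and only the triples sharing the subject company://id#1 are listed, since the Sibling Rule considers no others here.

```python
# Triples of the example data set that share the subject company://id#1.
TRIPLES = [
    ("company://id#1", "employee", "Howard"),
    ("company://id#1", "employee", "Alan"),
    ("company://id#1", "CTO", "Colin"),
]
criteria = {"predicate": "employee", "object": "Alan"}

# Criteria Rule: predicate and object both match.
matched = [t for t in TRIPLES
           if t[1] == criteria["predicate"] and t[2] == criteria["object"]]

# Sibling Rule: same subject as a match, except siblings whose predicate
# is the one specified in the criteria.
siblings = [t for t in TRIPLES for m in matched
            if t[0] == m[0] and t != m and t[1] != criteria["predicate"]]
```

Here the Howard triple is left out because its predicate, employee, is the predicate named in the criteria, whereas the CTO triple is identified.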
A further understanding of these examples may be attained by reference to Appendices A and B, filed herewith, which provide XML/RDF listings of the data sets and criteria, and which also show rule-by-rule identification (or "validation") of the triples.
Though the examples show application of the rules by module 12 to an RDF data set, it will be appreciated that alternate embodiments of the module can likewise apply the rules to data sets representing the meta directed graphs disclosed in copending, commonly assigned United States Patent Application Serial No. 10/138,725, filed May 3, 2002, entitled "Methods And Apparatus for Visualizing Relationships Among Triples of Resource Description Framework (RDF) Data Sets," the teachings of which are incorporated herein by reference.
Described above are methods and apparatus meeting the desired objects. Those skilled in the art will, of course, appreciate that these are merely examples and that other embodiments, incorporating modifications to those described herein, fall within the scope of the invention, of which we claim:
1. A method for identifying related data in a directed graph, comprising:
A. executing the sub-steps of
(i) identifying as related data substantially matching a criteria;
(ii) identifying as related data that is a direct ancestor of data identified in any of sub-steps (i), (ii) and (iii), and that is not in substantial conflict with the criteria;
(iii) identifying as related data (hereinafter "identified descendent") that is a direct descendent of data (hereinafter "identified ancestor") identified as related in any of sub-steps (i), (ii) and (iii), and which identified descendent
(a) does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria, if any, and
(b) is not in substantial conflict with the criteria;
(c) does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified during execution of sub-step (ii),
B. generating an indication of data identified as related in step (A).
2. The method of claim 1, wherein the criteria specifies a named relationship and a characteristic of that named relationship, and wherein
sub-step (ii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of
attributes of the direct ancestor, and
a relationship between the direct ancestor and any data that descends therefrom,
in order to determine whether the direct ancestor is in substantial conflict with the criteria.
3. The method of claim 1, wherein the criteria specifies a named relationship and a characteristic of that named relationship, and wherein
sub-step (iii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of
attributes of the identified descendent, and
a relationship between the identified descendent and any data that descends therefrom,
in order to determine whether the identified descendent is in substantial conflict with the criteria.
4. The method of claim 1, comprising executing any of the sub-steps of step (A) any of serially, in parallel, or recursively.
5. The method of claim 1, further comprising executing any of the sub-steps of step (A) using a rule-based engine.
6. The method of claim 5, wherein the rule-based engine uses a Rete algorithm to effect execution of one or more of the sub-steps of step (A).
7. The method of claim 1, wherein the directed graph comprises a data flow.
8. The method of claim 7, wherein the data flow comprises any of transactional information and enterprise-related information.
9. The method of claim 1, comprising
executing step (A) with respect to a first data set representing a first portion of the directed graph, and
executing step (A) separately with respect to a second data set representing a second portion of the directed graph.
10. A method of claim 9, wherein the second data set comprises an update to the first data set.
11. A method for identifying related data in a directed graph, comprising:
A. executing the sub-steps of
(i) identifying as related data substantially matching a criteria;
(ii) identifying as related data that is a direct ancestor of data identified as related in any of sub-steps (i) and (ii), and that is not in substantial conflict with the criteria;
B. generating an indication of data identified as related in step (A).
12. The method of claim 11, wherein the criteria specifies a named relationship and a characteristic of that named relationship, and wherein
sub-step (ii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of
attributes of the direct ancestor, and
a relationship between the direct ancestor and any data that descends therefrom,
in order to determine whether the direct ancestor is in substantial conflict with the criteria.
13. The method of claim 11, wherein the directed graph comprises a data flow.
14. The method of claim 13, wherein the data flow comprises any of transactional information and enterprise-related information.
15. A method for identifying related data in a directed graph, comprising:
A. executing the sub-steps of
(i) identifying as related data substantially matching a criteria;
(ii) identifying as related data (hereinafter "identified descendent") that is a direct descendent of data (hereinafter "identified ancestor") identified in any of sub-steps (i) and (ii), and which identified descendent
(a) does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria, if any, and
(b) is not in substantial conflict with the criteria;
(c) does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified as related.
16. The method of claim 15, wherein the criteria specifies a named relationship and a characteristic of that named relationship, and wherein
sub-step (ii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of
attributes of the identified descendent, and
a relationship between the identified descendent and any data that descends therefrom,
in order to determine whether the identified descendent is in substantial conflict with the criteria.
17. The method of claim 15, wherein the directed graph comprises a data flow.
18. The method of claim 17, wherein the data flow comprises any of transactional information and enterprise-related information.
19. The method of claim 15, comprising
executing step (A) with respect to a first data set representing a first portion of the directed graph, and
executing step (A) separately with respect to a second data set representing a second portion of the directed graph.
20. A method of claim 19, wherein the second data set comprises an update to the first data set.
21. A method for identifying related triples in a resource description framework (RDF) data set, comprising
A. executing with respect to the data set the sub-steps of
(i) identifying as related a triple substantially matching a criteria;
(ii) identifying as related a triple that is a direct ancestor of a triple identified as related in any of sub-steps (i), (ii) and (iii), and that is not in substantial conflict with the criteria,
where, for purposes hereof, a triple whose object is the subject of another triple is deemed a direct ancestor of that other triple, and, conversely, where a triple whose subject is the object of another triple is deemed a direct descendent of that other triple;
(iii) identifying as related a triple (hereinafter "identified descendent") that is a direct descendent of triple (hereinafter "identified ancestor") identified as related in any of sub-steps (i), (ii) and (iii), and which identified descendent
(a) is not associated with the identified ancestor via a predicate substantially matching a predicate named in the criteria, if any, and
(b) is not in substantial conflict with the criteria;
(c) is not associated with the identified ancestor via a predicate matching a predicate by which the identified ancestor is associated with a triple, if any, as a result of which the identified ancestor was identified during execution of sub-step (ii),
B. generating an indication of triples identified as related in step (A).
22. The method of claim 21, comprising identifying as related a triple that is a sibling of a triple identified as related in sub-step (i) and that is not in substantial conflict with the criteria, where, for purposes hereof, triples that share a common subject are deemed siblings.
23. The method of claim 21, wherein the criteria specifies a predicate and an object associated with that predicate, and wherein sub-step (ii) includes comparing at least one of the predicate and object specified in the criteria with the direct ancestor in order to determine whether the direct ancestor is in substantial conflict with the criteria.
24. The method of claim 21, wherein the criteria specifies a predicate and an object associated with that predicate, and wherein sub-step (iii) includes comparing at least one of the predicate and object specified in the criteria with the identified descendent in order to determine whether the identified descendent is in substantial conflict with the criteria.
25. The method of claim 21, comprising executing any of the sub-steps of step (A) any of serially, in parallel, or recursively.
26. The method of claim 21, further comprising executing any of the sub-steps of step (A) using a rule-based engine.
27. The method of claim 26, wherein the rule-based engine uses a Rete algorithm to effect execution of one or more of the sub-steps of step (A).
28. The method of claim 21, wherein the data set comprises a data flow.
29. The method of claim 28, wherein the data flow comprises any of transactional information and enterprise-related information.
30. The method of claim 21, comprising
executing step (A) with respect to a first data set of RDF triples,
executing step (A) separately with respect to a second, related data set of RDF triples.
31. A method of claim 30, wherein the second data set comprises an update to the first data set.
32. A method for identifying related triples in a resource description framework (RDF) data set, comprising
A. executing with respect to the data set the sub-steps of
(i) identifying as related data substantially matching a criteria;
(ii) identifying as related a triple that is a direct ancestor of a triple identified in any of sub-steps (i) and (ii), and that is not in substantial conflict with the criteria,
where, for purposes hereof, a triple whose object is the subject of another triple is deemed a direct ancestor of that other triple; a triple whose subject is the object of another triple is deemed a direct descendent of that other triple;
B. generating an indication of data identified as related in step (A).
33. The method of claim 32, wherein the criteria specifies a predicate and an object associated with that predicate, and wherein sub-step (ii) includes comparing at least one of the predicate and object specified in the criteria with the direct ancestor in order to determine whether the direct ancestor is in substantial conflict with the criteria.
34. The method of claim 33, wherein the data set comprises a data flow.
35. The method of claim 34, wherein the data flow comprises any of transactional information and enterprise-related information.
36. A method for identifying related triples in a resource description framework (RDF) data set, comprising
A. executing with respect to the data set the sub-steps of
(i) identifying as related data substantially matching a criteria;
(ii) identifying as related data (hereinafter "identified descendent") that is a direct descendent of data (hereinafter "identified ancestor") identified as related in any of sub-steps (i) and (ii), and which identified descendent
(a) is not associated with the identified ancestor via a predicate substantially matching a predicate named in the criteria, if any, and
(b) is not in substantial conflict with the criteria;
(c) is not associated with the identified ancestor via a predicate matching a predicate by which the identified ancestor is associated with a triple, if any, as a result of which the identified ancestor was identified as related,
B. generating an indication of data identified as related in step (A).
37. The method of claim 36, wherein the criteria specifies a predicate and an object associated with that predicate, and wherein sub-step (iii) includes comparing at least one of the predicate and object specified in the criteria with the identified descendent in order to determine whether the identified descendent is in substantial conflict with the criteria.
38. The method of claim 36, wherein the data set comprises a data flow.
39. The method of claim 38, wherein the data flow comprises any of transactional information and enterprise-related information.
40. The method of claim 21, comprising
executing step (A) with respect to a first data set of RDF triples,
executing step (A) separately with respect to a second, related data set of RDF triples.
41. A method of claim 40, wherein the second data set comprises an update to the first data set.
42. A method for identifying related data in a directed graph, comprising:
A. executing the sub-steps of
(i) identifying as related data that is a direct ancestor of data identified in any of sub-steps (i) and (ii), and that is not in substantial conflict with the criteria;
(ii) identifying as related data (hereinafter "identified descendent") that is a direct descendent of data (hereinafter "identified ancestor") identified as related in any of sub-steps (i) and (ii) and which identified descendent
(a) does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria, if any, and
(b) is not in substantial conflict with the criteria;
(c) does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified during execution of sub-step (ii),
B. generating an indication of data identified as related in step (A).
[Drawing sheets 1/5 through 5/5]
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization, International Bureau
(43) International Publication Date: 22 April 2004 (22.04.2004)
(10) International Publication Number: WO 2004/034625 A2
(51) International Patent Classification7: H04L
(21) International Application Number: PCT/US2003/031636
(22) International Filing Date: 7 October 2003 (07.10.2003)
(25) Filing Language: English
(26) Publication Language: English
(74) Agents: POWSNER, David, J. et al.; Nutter, McClennen & Fish LLP, World Trade Center West, 155 Seaport Blvd., Boston, MA 02210-2604 (US).
(81) Designated States (national): AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, NI, NO, NZ, OM, PG, PH, PL, PT, RO, RU, SC, SD, SE, SG, SK, SL, SY, TJ, TM, TN, TR, TT, TZ, UA,

Claims

1. A digital data processing system for surveillance, monitoring and real-time events handling, the system comprising
a plurality of data sources,
query functionality, coupled to the data sources, to identify information from the data sources responsive to one or more queries,
a framework server, coupled to the query functionality, that presents selected information identified by the expert engine to a web browser.
2. The digital data processing system of claim 1, wherein the data sources comprise network nodes at one or more clinical care providers, laboratories, governmental health departments, centers for disease control, and law enforcement offices.
3. The digital data processing system of claim 1, wherein the query functionality comprises any of a graph generator, algorithmic search and an expert engine.
4. The digital data processing system of claim 3, wherein
the query functionality comprises an expert engine, and wherein
the expert engine classifies the information from the data sources to be ignored, stored and/or logged for alert.
5. The digital data processing system of claim 1, comprising a data store that maintains resource description framework (RDF) triples representing at least selected data from the data sources.
6. The digital data processing system of claim 5, wherein the data sources are compliant with a public health information network (PHIN) protocol, a health area network (HAN) protocol, a National Electronic Disease Surveillance System (NEDSS) protocol, or other protocol for communication of health and/or bioterrorism data, and wherein the system comprises interconnect functionality that applies requests to, and receives information from, one or more of the data sources utilizing an API or interface mechanism dictated by the PHIN, HAN, NEDSS or other protocol used in communication of health and bioterrorism data.
7. The digital data processing system of claim 6, wherein the interconnect applies SQL queries to selected ones of the data sources.
8. The digital data processing system of claim 7, wherein the interconnect provides for the automated exchange of information with one or more selected data sources.
9. The digital data processing system of claim 8, wherein the interconnect provides for the automated exchange of information between public health partners on a PHIN network.
10. A digital data processing system for surveillance, monitoring and real-time events handling, the system comprising
a plurality of data sources,
an expert engine, coupled to the data sources, to identify related information received therefrom, the expert engine executing the steps of
(i) identifying as related data substantially matching a criteria;
(ii) identifying as related data that is a direct ancestor of data identified in any of steps (i), (ii) and (iii), and that is not in substantial conflict with the criteria;
(iii) identifying as related data (hereinafter "identified descendent") that is a direct descendent of data (hereinafter "identified ancestor") identified as related in any of steps (i), (ii) and (iii), and which identified descendent
(a) does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria, if any,
(b) is not in substantial conflict with the criteria, and
(c) does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified during execution of step (ii), and
a framework server, coupled to the expert engine, that presents selected information identified as related by the expert engine to a web browser.
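Steps (i) to (iii) of claim 10 describe a traversal of a directed graph with named arcs. The following is a minimal sketch of one way to read those steps, under stated assumptions: "substantially matching" is simplified to exact attribute equality, "substantial conflict" to a differing value for an attribute named in the criteria, and the `Graph`, `Criteria` and `find_related` names are hypothetical. The result can depend on the order in which nodes are visited, which a fuller implementation would need to address.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Graph:
    attrs: dict = field(default_factory=dict)  # node -> {attribute: value}
    arcs: list = field(default_factory=list)   # (ancestor, relation, descendent)

    def parents(self, node):
        return [(a, rel) for a, rel, d in self.arcs if d == node]

    def children(self, node):
        return [(rel, d) for a, rel, d in self.arcs if a == node]


@dataclass
class Criteria:
    attrs: dict                      # attribute values to match
    relation: Optional[str] = None   # optional named relationship


def matches(node_attrs, criteria):
    return all(node_attrs.get(k) == v for k, v in criteria.attrs.items())


def conflicts(node_attrs, criteria):
    return any(k in node_attrs and node_attrs[k] != v
               for k, v in criteria.attrs.items())


def find_related(graph, criteria):
    related = set()
    via = {}  # node -> relations through which it was identified in step (ii)
    # Step (i): data matching the criteria.
    queue = deque(n for n, a in graph.attrs.items() if matches(a, criteria))
    related.update(queue)
    while queue:
        node = queue.popleft()
        # Step (ii): direct ancestors not in conflict with the criteria.
        for parent, rel in graph.parents(node):
            if conflicts(graph.attrs.get(parent, {}), criteria):
                continue
            via.setdefault(parent, set()).add(rel)
            if parent not in related:
                related.add(parent)
                queue.append(parent)
        # Step (iii): direct descendents passing tests (a), (b) and (c).
        for rel, child in graph.children(node):
            if child in related:
                continue
            if criteria.relation is not None and rel == criteria.relation:
                continue  # (a) relationship is named in the criteria
            if conflicts(graph.attrs.get(child, {}), criteria):
                continue  # (b) conflicts with the criteria
            if rel in via.get(node, ()):
                continue  # (c) same relationship that identified the ancestor
            related.add(child)
            queue.append(child)
    return related
```

Test (c) is what keeps the traversal from sweeping in every sibling: if a clinic was identified because it "reported" a matching case, its other "reported" cases are not pulled in merely by sharing that clinic.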
11. The digital data processing system of claim 10, wherein the data sources comprise network nodes at one or more clinical care providers, laboratories, governmental health departments, centers for disease control, and law enforcement offices, and wherein the data sources are compliant with a public health information network (PHIN) protocol, a health area network (HAN) protocol, National Electronic Disease Surveillance System (NEDSS) protocol, or other protocol for communication of health and/or bioterrorism data.
12. The digital data processing system of claim 10, comprising query functionality, comprising any of a graph generator and algorithmic search, to identify information from the data sources responsive to one or more queries.
13. The digital data processing system of claim 12, wherein the expert engine is coupled to the query functionality to classify information received therefrom to be ignored, stored and/or logged for alert.
14. The digital data processing system of claim 10, wherein the criteria specifies a named relationship and a characteristic of that named relationship, and wherein
step (ii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of attributes of the direct ancestor, and
a relationship between the direct ancestor and any data that descends therefrom, in order to determine whether the direct ancestor is in substantial conflict with the criteria.
15. The digital data processing system of claim 10, wherein
the criteria specifies a named relationship and a characteristic of that named relationship, and wherein
step (iii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of
attributes of the identified descendent, and
a relationship between the identified descendent and any data that descends therefrom,
in order to determine whether the identified descendent is in substantial conflict with the criteria.
16. A digital data processing system for surveillance, monitoring and real-time events handling, the system comprising
a plurality of data sources,
a data store that maintains resource description framework (RDF) triples representing at least selected data from the data sources,
data reduction functionality, coupled to the data sources and to the data store, to identify information from the data sources responsive to one or more queries and to summarize that information in one or more selected epochs of differing length, the data reduction functionality generating RDF triples characterizing the summarized information and storing those triples to the data store,
a framework server, coupled to the query functionality, that presents selected information identified by the expert engine to a web browser.
17. The digital data processing system of claim 16, wherein the data reduction functionality queries one or more data sources in an SQL format.
18. The digital data processing system of claim 16, wherein the data reduction functionality parses an XML file that identifies one or more of the data sources, one or more fields thereof to be summarized, and/or one or more epochs for which those fields are to be summarized.
19. The digital data processing system of claim 16, wherein the data reduction functionality responds to data received from a data source by updating a store associated with an epoch of shorter duration.
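The data reduction recited in claims 16 and 19 can be sketched as follows. This is an illustration under assumptions, not the specification's implementation: the epoch lengths, the `DataReducer` class, the `ex:count` and `ex:sum` predicates and the `http://example.org/summary/` base URI are all hypothetical. Incoming data updates only the store for the shortest epoch; summaries for longer epochs are derived from it.

```python
from collections import defaultdict

# Hypothetical epochs of differing length, in seconds.
EPOCH_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}
SHORTEST = "minute"


class DataReducer:
    def __init__(self, field_name, base_uri="http://example.org/summary/"):
        self.field_name = field_name
        self.base_uri = base_uri
        # Store for the shortest epoch: bucket start -> [count, sum].
        self.shortest = defaultdict(lambda: [0, 0.0])

    def record(self, timestamp, value):
        """Respond to received data by updating the shortest-epoch store."""
        length = EPOCH_SECONDS[SHORTEST]
        entry = self.shortest[timestamp - timestamp % length]
        entry[0] += 1
        entry[1] += value

    def summarize(self, epoch):
        """Roll the shortest-epoch buckets up into buckets of the given epoch."""
        length = EPOCH_SECONDS[epoch]
        summary = defaultdict(lambda: [0, 0.0])
        for start, (count, total) in self.shortest.items():
            entry = summary[start - start % length]
            entry[0] += count
            entry[1] += total
        return dict(summary)

    def triples(self, epoch):
        """Express the summary for one epoch as (subject, predicate, object)."""
        out = []
        for start, (count, total) in sorted(self.summarize(epoch).items()):
            subject = f"{self.base_uri}{self.field_name}/{epoch}/{start}"
            out.append((subject, "ex:count", str(count)))
            out.append((subject, "ex:sum", str(total)))
        return out
```

The roll-up works because each longer epoch here is a whole multiple of the shortest one, so every short bucket falls entirely inside one long bucket.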
20. The digital data processing system of claim 16, wherein the framework server generates a display or other presentation based on the RDF triples characterizing the summarized data.
21. The digital data processing system of claim 20, comprising query functionality, coupled to the data sources and to the data reduction functionality, to identify information from the data sources responsive to one or more queries, and
a framework server, coupled to the query functionality, that presents selected information identified by the expert engine to a web browser.
22. The digital data processing system of claim 20, wherein the data sources comprise network nodes at one or more clinical care providers, laboratories, governmental health departments, centers for disease control, and law enforcement offices.
23. The digital data processing system of claim 20, wherein the query functionality comprises any of a graph generator, algorithmic search and an expert engine.
24. The digital data processing system of claim 23, wherein
the query functionality comprises an expert engine, and wherein
the expert engine classifies the information from the data sources to be ignored, stored and/or logged for alert.
25. A digital data processing system for surveillance, monitoring and real-time events handling, the system comprising
A. a plurality of data sources,
B. query functionality, coupled to the data sources, to identify information from the data sources responsive to one or more queries, the query functionality translating a schema-less input query in a first language to an output query in a second language by the steps of:
i) examining the schema-less input query for one or more tokens that represent data to be used in generating the output query;
ii) dispatching context-specific grammar events containing that data;
iii) populating portions of the output query according to the events and data; and
iv) generating the output query in the second language comprising those populated portions, wherein the output query represents a schema of a relational database storing RDF triples,
C. a framework server, coupled to the query functionality, that presents selected information identified by the expert engine to a web browser.
26. The digital data processing system of claim 1, wherein the data sources comprise network nodes at one or more clinical care providers, laboratories, governmental health departments, centers for disease control, and law enforcement offices and wherein the data sources are compliant with a public health information network (PHIN) protocol, a health area network (HAN) protocol, National Electronic Disease Surveillance System (NEDSS) protocol, or other protocol for communication of health and/or bioterrorism data.
26. The digital data processing system of claim 25, wherein in connection with dispatching events the query functionality generates any of a logical condition event, a selection term declaration event, and a triple declaration event.
27. The digital data processing system of claim 25, wherein in connection with generating a logical condition event the query functionality generates an event containing data which, when applied to the relational database via the output query, identifies RDF triples according to a Boolean condition.
28. The digital data processing system of claim 25, wherein in connection with generating a selection term declaration event the query functionality generates an event containing data which, when applied to the relational database via the output query, identifies RDF triples including a specified term.
29. The digital data processing system of claim 25, wherein in connection with generating a triple declaration event the query functionality generates an event containing data which, when applied to the relational database via the output query, identifies RDF triples according to a specified subject, predicate and object.
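The translation recited in claim 25, with the three event types named in the dependent claims, can be sketched as an event-driven builder. The input syntax, the `triples(subject, predicate, object)` table, and the `events`, `SqlBuilder`, `translate` and `demo` names are all assumptions made for this illustration. Only the `and` connective is handled, and SQLite from the Python standard library stands in for the relational database.

```python
import re
import sqlite3

TOKEN = re.compile(r'\?\w+|"[^"]*"|[\w:]+|[(),]')


def events(query):
    """Examine the input query for tokens and yield grammar events."""
    tokens = TOKEN.findall(query)
    if not tokens or tokens[0].lower() != "select":
        raise ValueError("query must start with 'select'")
    i = 1
    while tokens[i].lower() != "where":
        if tokens[i] != ",":
            yield ("select_term", tokens[i])    # selection term declaration
        i += 1
    i += 1
    while i < len(tokens):
        if tokens[i] == "(":
            s, _, p, _, o, _ = tokens[i + 1:i + 7]
            yield ("triple", (s, p, o))         # triple declaration
            i += 7
        elif tokens[i].lower() == "and":
            yield ("condition", "AND")          # logical condition
            i += 1
        else:
            raise ValueError(f"unexpected token {tokens[i]!r}")


class SqlBuilder:
    """Populates portions of the output query as events arrive."""

    def __init__(self, table="triples"):
        self.table = table
        self.select_vars, self.aliases = [], []
        self.conditions, self.params = [], []
        self.bindings = {}  # variable -> first column it was bound to

    def on_select_term(self, var):
        self.select_vars.append(var)

    def on_triple(self, terms):
        alias = f"t{len(self.aliases)}"
        self.aliases.append(alias)
        for column, term in zip(("subject", "predicate", "object"), terms):
            col = f"{alias}.{column}"
            if not term.startswith("?"):        # constant: compare by value
                self.conditions.append(f"{col} = ?")
                self.params.append(term.strip('"'))
            elif term in self.bindings:         # repeated variable: join
                self.conditions.append(f"{col} = {self.bindings[term]}")
            else:
                self.bindings[term] = col

    def on_condition(self, op):
        pass  # every condition is joined with AND in this sketch

    def sql(self):
        cols = ", ".join(f'{self.bindings[v]} AS "{v[1:]}"' for v in self.select_vars)
        tables = ", ".join(f"{self.table} {a}" for a in self.aliases)
        where = " AND ".join(self.conditions) or "1 = 1"
        return f"SELECT {cols} FROM {tables} WHERE {where}", self.params


def translate(query):
    builder = SqlBuilder()
    for kind, data in events(query):
        getattr(builder, f"on_{kind}")(data)
    return builder.sql()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("case:1", "ex:disease", "anthrax"), ("case:1", "ex:county", "Suffolk"),
        ("case:2", "ex:disease", "flu"), ("case:2", "ex:county", "Essex"),
    ])
    sql, params = translate(
        'select ?case, ?county where (?case, ex:disease, "anthrax") '
        'and (?case, ex:county, ?county)')
    return conn.execute(sql, params).fetchall()
```

Each triple pattern becomes one alias of the triples table, and a variable that appears in two patterns becomes a join condition between their columns, which is how a query written without knowledge of the schema ends up expressed against it.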
30. The digital data processing system of claim 25, wherein the query functionality comprises any of a graph generator, algorithmic search and an expert engine.
31. The digital data processing system of claim 30, wherein
the query functionality comprises an expert engine, and wherein
the expert engine classifies the information from the data sources to be ignored, stored and/or logged for alert.
32. The digital data processing system of claim 25, further comprising data reduction functionality, coupled to the data sources and to the data store, to identify information from the data sources responsive to one or more queries and to summarize that information in one or more selected epochs of differing length, the data reduction functionality generating RDF triples characterizing the summarized information and storing those triples to the data store.
33. The digital data processing system of claim 30, wherein
the query functionality comprises an expert engine, and wherein
the expert engine classifies the information from the data sources to be ignored, stored and/or logged for alert.
34. The digital data processing system of any of claims 1, 10, 16 and 25 wherein the framework server generates a border/port security display comprising a plurality of panels, a first of which displays information relating to an alert, a second of which displays information from a particular data source or an aggregation of data from several data sources, a third of which displays real-time data from a data source superimposed on a map of a locale.
35. The digital data processing system of claim 34, wherein the framework server responds to a selected user input with respect to an item displayed in one of the panels by displaying additional information about that item.
36. The digital data processing system of claim 34, wherein the framework server responds to an alert by displaying a zoomed-in portion of a locale shown in one of the panels and wherein the system includes additional functionality for alerting people, agencies or other entities of such alert.
37. The digital data processing system of any of claims 1, 10, 16 and 25 wherein the system is configured for any of border & port security, public & community safety, and government data integration applications.
38. The method of operating a digital data processing system having the functionality recited in claims 1 - 34, above.
PCT/US2004/021671 2001-05-15 2004-07-07 Surveillance, monitoring and real-time events platform WO2005029365A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP04809476A EP1690210A2 (en) 2003-07-07 2004-07-07 Surveillance, monitoring and real-time events platform
US11/064,438 US7890517B2 (en) 2001-05-15 2005-02-23 Appliance for enterprise information integration and enterprise resource interoperability platform and methods

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US48520003P 2003-07-07 2003-07-07
US60/485,200 2003-07-07

Publications (2)

Publication Number Publication Date
WO2005029365A2 (en) 2005-03-31
WO2005029365A3 (en) 2005-05-19

Family

ID=34375218

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/021671 WO2005029365A2 (en) 2001-05-15 2004-07-07 Surveillance, monitoring and real-time events platform

Country Status (3)

Country Link
US (2) US8572059B2 (en)
EP (1) EP1690210A2 (en)
WO (1) WO2005029365A2 (en)

Families Citing this family (118)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7770102B1 (en) 2000-06-06 2010-08-03 Microsoft Corporation Method and system for semantically labeling strings and providing actions based on semantically labeled strings
US7788602B2 (en) * 2000-06-06 2010-08-31 Microsoft Corporation Method and system for providing restricted actions for recognized semantic categories
US7716163B2 (en) 2000-06-06 2010-05-11 Microsoft Corporation Method and system for defining semantic categories and actions
US7712024B2 (en) 2000-06-06 2010-05-04 Microsoft Corporation Application program interfaces for semantically labeling strings and providing actions based on semantically labeled strings
US20060122474A1 (en) 2000-06-16 2006-06-08 Bodymedia, Inc. Apparatus for monitoring health, wellness and fitness
US7778816B2 (en) * 2001-04-24 2010-08-17 Microsoft Corporation Method and system for applying input mode bias
US7890517B2 (en) * 2001-05-15 2011-02-15 Metatomix, Inc. Appliance for enterprise information integration and enterprise resource interoperability platform and methods
US6856992B2 (en) * 2001-05-15 2005-02-15 Metatomix, Inc. Methods and apparatus for real-time business visibility using persistent schema-less data storage
US6925457B2 (en) * 2001-07-27 2005-08-02 Metatomix, Inc. Methods and apparatus for querying a relational data store using schema-less queries
US7058637B2 (en) * 2001-05-15 2006-06-06 Metatomix, Inc. Methods and apparatus for enterprise application integration
US20030208499A1 (en) * 2002-05-03 2003-11-06 David Bigwood Methods and apparatus for visualizing relationships among triples of resource description framework (RDF) data sets
US7707496B1 (en) 2002-05-09 2010-04-27 Microsoft Corporation Method, system, and apparatus for converting dates between calendars and languages based upon semantically labeled strings
US7707024B2 (en) * 2002-05-23 2010-04-27 Microsoft Corporation Method, system, and apparatus for converting currency values based upon semantically labeled strings
US7742048B1 (en) 2002-05-23 2010-06-22 Microsoft Corporation Method, system, and apparatus for converting numbers based upon semantically labeled strings
US7827546B1 (en) 2002-06-05 2010-11-02 Microsoft Corporation Mechanism for downloading software components from a remote source for use by a local software application
US7281245B2 (en) * 2002-06-05 2007-10-09 Microsoft Corporation Mechanism for downloading software components from a remote source for use by a local software application
US7356537B2 (en) * 2002-06-06 2008-04-08 Microsoft Corporation Providing contextually sensitive tools and help content in computer-generated documents
US7716676B2 (en) 2002-06-25 2010-05-11 Microsoft Corporation System and method for issuing a message to a program
US7392479B2 (en) * 2002-06-27 2008-06-24 Microsoft Corporation System and method for providing namespace related information
US7209915B1 (en) 2002-06-28 2007-04-24 Microsoft Corporation Method, system and apparatus for routing a query to one or more providers
CA2501847A1 (en) * 2002-10-07 2004-04-22 Metatomix, Inc Methods and apparatus for identifying related nodes in a directed graph having named arcs
KR20050055072A (en) * 2002-10-09 2005-06-10 보디미디어 인코퍼레이티드 Apparatus for detecting, receiving, deriving and displaying human physiological and contextual information
US7418666B2 (en) 2002-10-21 2008-08-26 Bentley Systems, Incorporated System, method and computer program product for managing CAD data
US9412141B2 (en) * 2003-02-04 2016-08-09 Lexisnexis Risk Solutions Fl Inc Systems and methods for identifying entities using geographical and social mapping
US7783614B2 (en) 2003-02-13 2010-08-24 Microsoft Corporation Linking elements of a document to corresponding fields, queries and/or procedures in a database
US20040172584A1 (en) * 2003-02-28 2004-09-02 Microsoft Corporation Method and system for enhancing paste functionality of a computer software application
US7711550B1 (en) 2003-04-29 2010-05-04 Microsoft Corporation Methods and system for recognizing names in a computer-generated document and for providing helpful actions associated with recognized names
US7739588B2 (en) 2003-06-27 2010-06-15 Microsoft Corporation Leveraging markup language data for semantically labeling text strings and data and for providing actions based on semantically labeled text strings and data
EP1690210A2 (en) 2003-07-07 2006-08-16 Metatomix, Inc. Surveillance, monitoring and real-time events platform
US7405739B2 (en) * 2003-08-22 2008-07-29 Honeywell International Inc. System and method for changing the relative size of a displayed image
US20050182617A1 (en) * 2004-02-17 2005-08-18 Microsoft Corporation Methods and systems for providing automated actions on recognized text strings in a computer-generated document
WO2005081963A2 (en) * 2004-02-23 2005-09-09 Metatomix, Inc. Appliance for enterprise information integration and enterprise resource interoperability platform and methods
WO2005094226A2 (en) * 2004-03-04 2005-10-13 United States Postal Service System and method for providing centralized management and distribution of information to remote users
US7665063B1 (en) 2004-05-26 2010-02-16 Pegasystems, Inc. Integration of declarative rule-based processing with procedural programming
US20060187082A1 (en) * 2004-08-31 2006-08-24 Santiago Estefania Processing observed data received over a network
US20060069593A1 (en) * 2004-08-31 2006-03-30 Estefania Santiago S Notification transmission over a network based on observed data
US7703671B2 (en) * 2005-01-28 2010-04-27 Arrowhead Center, Inc. Monitoring device and security system
US8335704B2 (en) 2005-01-28 2012-12-18 Pegasystems Inc. Methods and apparatus for work management and routing
US20060248586A1 (en) * 2005-04-27 2006-11-02 Tekelec Methods, systems, and computer program products for surveillance monitoring in a communication network based on a national surveillance database
EP1932350B1 (en) * 2005-09-06 2016-10-26 Infraegis, Inc. Threat detection and monitoring apparatus with integrated display system
US20080046285A1 (en) * 2006-08-18 2008-02-21 Greischar Patrick J Method and system for real-time emergency resource management
US8428961B2 (en) * 2005-09-14 2013-04-23 Emsystem, Llc Method and system for data aggregation for real-time emergency resource management
US20070174093A1 (en) * 2005-09-14 2007-07-26 Dave Colwell Method and system for secure and protected electronic patient tracking
US7992085B2 (en) 2005-09-26 2011-08-02 Microsoft Corporation Lightweight reference user interface
US7788590B2 (en) * 2005-09-26 2010-08-31 Microsoft Corporation Lightweight reference user interface
US10354760B1 (en) * 2005-10-18 2019-07-16 At&T Intellectual Property Ii, L.P. Tool for visual exploration of medical data
US8706514B1 (en) 2005-10-18 2014-04-22 At&T Intellectual Property Ii, L.P. Case management system and method for mediating anomaly notifications in health data to health alerts
US7725325B2 (en) * 2006-01-18 2010-05-25 International Business Machines Corporation System, computer program product and method of correlating safety solutions with business climate
US7515974B2 (en) * 2006-02-21 2009-04-07 Honeywell International Inc. Control system and method for compliant control of mission functions
US20070220006A1 (en) * 2006-03-07 2007-09-20 Cardiac Pacemakers, Inc. Method and apparatus for automated generation and transmission of data in a standardized machine-readable format
US20070226013A1 (en) * 2006-03-07 2007-09-27 Cardiac Pacemakers, Inc. Method and apparatus for automated generation and transmission of data in a standardized machine-readable format
US20090132232A1 (en) * 2006-03-30 2009-05-21 Pegasystems Inc. Methods and apparatus for implementing multilingual software applications
US8924335B1 (en) 2006-03-30 2014-12-30 Pegasystems Inc. Rule-based user interface conformance methods
US8131696B2 (en) * 2006-05-19 2012-03-06 Oracle International Corporation Sequence event processing using append-only tables
US8762395B2 (en) 2006-05-19 2014-06-24 Oracle International Corporation Evaluating event-generated data using append-only tables
EP2052318A4 (en) 2006-07-25 2014-04-30 Northrop Grumman Systems Corp Global disease surveillance platform, and corresponding system and method
WO2008022051A2 (en) * 2006-08-10 2008-02-21 Loma Linda University Medical Center Advanced emergency geographical information system
US7962955B2 (en) * 2006-08-15 2011-06-14 International Business Machines Corporation Protecting users from malicious pop-up advertisements
US8069202B1 (en) 2007-02-02 2011-11-29 Resource Consortium Limited Creating a projection of a situational network
US9152706B1 (en) * 2006-12-30 2015-10-06 Emc Corporation Anonymous identification tokens
US20080319796A1 (en) * 2007-02-16 2008-12-25 Stivoric John M Medical applications of lifeotypes
US8250525B2 (en) 2007-03-02 2012-08-21 Pegasystems Inc. Proactive performance management for multi-user enterprise software systems
US8370812B2 (en) 2007-04-02 2013-02-05 International Business Machines Corporation Method and system for automatically assembling processing graphs in information processing systems
US8166465B2 (en) * 2007-04-02 2012-04-24 International Business Machines Corporation Method and system for composing stream processing applications according to a semantic description of a processing goal
US8117233B2 (en) * 2007-05-14 2012-02-14 International Business Machines Corporation Method and system for message-oriented semantic web service composition based on artificial intelligence planning
US20090030758A1 (en) * 2007-07-26 2009-01-29 Gennaro Castelli Methods for assessing potentially compromising situations of a utility company
WO2009081393A2 (en) * 2007-12-21 2009-07-02 Semantinet Ltd. System and method for invoking functionalities using contextual relations
US20090177626A1 (en) * 2008-01-05 2009-07-09 Robert Lottero Apparatus and method for investigative analysis of law enforcement cases
US10481878B2 (en) * 2008-10-09 2019-11-19 Objectstore, Inc. User interface apparatus and methods
US8843435B1 (en) 2009-03-12 2014-09-23 Pegasystems Inc. Techniques for dynamic data processing
US8468492B1 (en) 2009-03-30 2013-06-18 Pegasystems, Inc. System and method for creation and modification of software applications
US20110202326A1 (en) * 2010-02-17 2011-08-18 Lockheed Martin Corporation Modeling social and cultural conditions in a voxel database
WO2011137935A1 (en) * 2010-05-07 2011-11-10 Ulysses Systems (Uk) Limited System and method for identifying relevant information for an enterprise
US8266551B2 (en) * 2010-06-10 2012-09-11 Nokia Corporation Method and apparatus for binding user interface elements and granular reflective processing
US8972070B2 (en) 2010-07-02 2015-03-03 Alstom Grid Inc. Multi-interval dispatch system tools for enabling dispatchers in power grid control centers to manage changes
US9251479B2 (en) * 2010-07-02 2016-02-02 General Electric Technology Gmbh Multi-interval dispatch method for enabling dispatchers in power grid control centers to manage changes
US9093840B2 (en) * 2010-07-02 2015-07-28 Alstom Technology Ltd. System tools for integrating individual load forecasts into a composite load forecast to present a comprehensive synchronized and harmonized load forecast
US20110071690A1 (en) * 2010-07-02 2011-03-24 David Sun Methods that provide dispatchers in power grid control centers with a capability to manage changes
US20110029142A1 (en) * 2010-07-02 2011-02-03 David Sun System tools that provides dispatchers in power grid control centers with a capability to make changes
US8538593B2 (en) 2010-07-02 2013-09-17 Alstom Grid Inc. Method for integrating individual load forecasts into a composite load forecast to present a comprehensive synchronized and harmonized load forecast
US9558250B2 (en) * 2010-07-02 2017-01-31 Alstom Technology Ltd. System tools for evaluating operational and financial performance from dispatchers using after the fact analysis
US9727828B2 (en) * 2010-07-02 2017-08-08 Alstom Technology Ltd. Method for evaluating operational and financial performance for dispatchers using after the fact analysis
MY180571A (en) * 2010-12-10 2020-12-02 Mimos Berhad A system and method for providing interface for real-time surveillance
US8880487B1 (en) 2011-02-18 2014-11-04 Pegasystems Inc. Systems and methods for distributed rules processing
US20130094403A1 (en) * 2011-10-18 2013-04-18 Electronics And Telecommunications Research Institute Method and apparatus for providing sensor network information
KR101720316B1 (en) * 2011-10-18 2017-04-05 한국전자통신연구원 Method and apparatus for providing information for sensor network
US9129039B2 (en) * 2011-10-18 2015-09-08 Ut-Battelle, Llc Scenario driven data modelling: a method for integrating diverse sources of data and data streams
EP2798483A1 (en) 2011-12-28 2014-11-05 Nokia Corporation Application switcher
US8996729B2 (en) 2012-04-12 2015-03-31 Nokia Corporation Method and apparatus for synchronizing tasks performed by multiple devices
US9195936B1 (en) 2011-12-30 2015-11-24 Pegasystems Inc. System and method for updating or modifying an application without manual coding
US9395880B2 (en) * 2012-06-12 2016-07-19 Qvera Llc Health information mapping system with graphical editor
US10910095B1 (en) * 2012-06-12 2021-02-02 Qvera Llc Mapping systems
US20140047129A1 (en) * 2012-08-09 2014-02-13 Mckesson Financial Holdings Method, apparatus, and computer program product for interfacing with an unidentified health information technology system
EP3063670A2 (en) 2013-10-31 2016-09-07 Isis Innovation Limited Parallel materialisation of a set of logical rules on a logical database
US9262740B1 (en) * 2014-01-21 2016-02-16 Utec Survey, Inc. Method for monitoring a plurality of tagged assets on an offshore asset
US11120274B2 (en) 2014-04-10 2021-09-14 Sensormatic Electronics, LLC Systems and methods for automated analytics for security surveillance in operation areas
US10057546B2 (en) 2014-04-10 2018-08-21 Sensormatic Electronics, LLC Systems and methods for automated cloud-based analytics for security and/or surveillance
US10084995B2 (en) 2014-04-10 2018-09-25 Sensormatic Electronics, LLC Systems and methods for an automated cloud-based video surveillance system
US11093545B2 (en) 2014-04-10 2021-08-17 Sensormatic Electronics, LLC Systems and methods for an automated cloud-based video surveillance system
US10217003B2 (en) 2014-04-10 2019-02-26 Sensormatic Electronics, LLC Systems and methods for automated analytics for security surveillance in operation areas
US10469396B2 (en) 2014-10-10 2019-11-05 Pegasystems, Inc. Event processing with enhanced throughput
US10185807B2 (en) * 2014-11-18 2019-01-22 Mastercard International Incorporated System and method for conducting real time active surveillance of disease outbreak
US10698599B2 (en) 2016-06-03 2020-06-30 Pegasystems, Inc. Connecting graphical shapes using gestures
US10698647B2 (en) 2016-07-11 2020-06-30 Pegasystems Inc. Selective sharing for collaborative application usage
JP6310532B1 (en) * 2016-11-24 2018-04-11 ヤフー株式会社 Generating device, generating method, and generating program
US11294641B2 (en) 2017-05-30 2022-04-05 Dimitris Lyras Microprocessor including a model of an enterprise
CN108228691B (en) * 2017-06-30 2023-06-23 勤智数码科技股份有限公司 Processing method of data elements in government information management
US10848512B2 (en) 2018-06-06 2020-11-24 Reliaquest Holdings, Llc Threat mitigation system and method
US11709946B2 (en) 2018-06-06 2023-07-25 Reliaquest Holdings, Llc Threat mitigation system and method
US11048488B2 (en) 2018-08-14 2021-06-29 Pegasystems, Inc. Software code optimizer and method
USD926809S1 (en) 2019-06-05 2021-08-03 Reliaquest Holdings, Llc Display screen or portion thereof with a graphical user interface
USD926810S1 (en) 2019-06-05 2021-08-03 Reliaquest Holdings, Llc Display screen or portion thereof with a graphical user interface
USD926782S1 (en) 2019-06-06 2021-08-03 Reliaquest Holdings, Llc Display screen or portion thereof with a graphical user interface
USD926200S1 (en) 2019-06-06 2021-07-27 Reliaquest Holdings, Llc Display screen or portion thereof with a graphical user interface
USD926811S1 (en) 2019-06-06 2021-08-03 Reliaquest Holdings, Llc Display screen or portion thereof with a graphical user interface
US20210398236A1 (en) * 2020-06-19 2021-12-23 Abhijit R. Nesarikar Remote Monitoring with Artificial Intelligence and Awareness Machines
US11567945B1 (en) 2020-08-27 2023-01-31 Pegasystems Inc. Customized digital content generation systems and methods
US11132552B1 (en) * 2021-02-12 2021-09-28 ShipIn Systems Inc. System and method for bandwidth reduction and communication of visual events

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030009239A1 (en) * 2000-03-23 2003-01-09 Lombardo Joseph S Method and system for bio-surveillance detection and alerting

Family Cites Families (170)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US582252A (en) * 1897-05-11 Valve
US4701130A (en) 1985-01-11 1987-10-20 Access Learning Technology Corporation Software training system
US4895518A (en) 1987-11-02 1990-01-23 The University Of Michigan Computerized diagnostic reasoning evaluation system
US4953106A (en) 1989-05-23 1990-08-28 At&T Bell Laboratories Technique for drawing directed graphs
US5119465A (en) 1989-06-19 1992-06-02 Digital Equipment Corporation System for selectively converting plurality of source data structures through corresponding source intermediate structures, and target intermediate structures into selected target structure
US5129043A (en) 1989-08-14 1992-07-07 International Business Machines Corporation Performance improvement tool for rule based expert systems
US5301270A (en) 1989-12-18 1994-04-05 Anderson Consulting Computer-assisted software engineering system for cooperative processing environments
JP3245655B2 (en) 1990-03-05 2002-01-15 インキサイト ソフトウェア インコーポレイテッド Workspace display processing method
US6185516B1 (en) 1990-03-06 2001-02-06 Lucent Technologies Inc. Automata-theoretic verification of systems
US5761493A (en) 1990-04-30 1998-06-02 Texas Instruments Incorporated Apparatus and method for adding an associative query capability to a programming language
US5311422A (en) 1990-06-28 1994-05-10 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration General purpose architecture for intelligent computer-aided training
SE9002558D0 (en) 1990-08-02 1990-08-02 Carlstedt Elektronik Ab PROCESSOR
US5199068A (en) 1991-01-22 1993-03-30 Professional Achievement Systems, Inc. Computer-based training system with student verification
US5270920A (en) 1991-05-13 1993-12-14 Hughes Training, Inc. Expert system scheduler and scheduling method
US5326270A (en) 1991-08-29 1994-07-05 Introspect Technologies, Inc. System and method for assessing an individual's task-processing style
US5395243A (en) 1991-09-25 1995-03-07 National Education Training Group Interactive learning system
US5333254A (en) 1991-10-02 1994-07-26 Xerox Corporation Methods of centering nodes in a hierarchical display
US5421730A (en) 1991-11-27 1995-06-06 National Education Training Group, Inc. Interactive learning system providing user feedback
US5381332A (en) 1991-12-09 1995-01-10 Motorola, Inc. Project management system with automated schedule and cost integration
US5259766A (en) 1991-12-13 1993-11-09 Educational Testing Service Method and system for interactive computer science testing, anaylsis and feedback
US5267865A (en) 1992-02-11 1993-12-07 John R. Lee Interactive computer aided natural learning method and apparatus
US5310349A (en) 1992-04-30 1994-05-10 Jostens Learning Corporation Instructional management system
US5450480A (en) 1992-08-25 1995-09-12 Bell Communications Research, Inc. Method of creating a telecommunication service specification
US5463682A (en) 1992-08-25 1995-10-31 Bell Communications Research, Inc. Method of creating user-defined call processing procedures
US5579486A (en) 1993-01-14 1996-11-26 Apple Computer, Inc. Communication node with a first bus configuration for arbitration and a second bus configuration for data transfer
WO1994020918A1 (en) 1993-03-11 1994-09-15 Fibercraft/Descon Engineering, Inc. Design and engineering project management system
US5809212A (en) 1993-07-12 1998-09-15 New York University Conditional transition networks and computational processes for use interactive computer-based systems
US5519618A (en) 1993-08-02 1996-05-21 Massachusetts Institute Of Technology Airport surface safety logic
US5374932A (en) 1993-08-02 1994-12-20 Massachusetts Institute Of Technology Airport surface surveillance system
US6115509A (en) * 1994-03-10 2000-09-05 International Business Machines Corp High volume document image archive system and method
US5548506A (en) 1994-03-17 1996-08-20 Srinivasan; Seshan R. Automated, electronic network based, project management server system, for managing multiple work-groups
US5655118A (en) 1994-03-21 1997-08-05 Bell Communications Research, Inc. Methods and apparatus for managing information on activities of an enterprise
US5597312A (en) 1994-05-04 1997-01-28 U S West Technologies, Inc. Intelligent tutoring method and system
US5862321A (en) 1994-06-27 1999-01-19 Xerox Corporation System and method for accessing and distributing electronic documents
JPH0876680A (en) 1994-09-02 1996-03-22 Fujitsu Ltd Management education system
US5732192A (en) 1994-11-30 1998-03-24 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Global qualitative flow-path modeling for local state determination in simulation and analysis
US5499293A (en) 1995-01-24 1996-03-12 University Of Maryland Privacy protected information medium using a data compression method
US5745753A (en) 1995-01-24 1998-04-28 Tandem Computers, Inc. Remote duplicate database facility with database replication support for online DDL operations
JP2765506B2 (en) 1995-01-30 1998-06-18 日本電気株式会社 Logic circuit delay information retention method
US5701451A (en) 1995-06-07 1997-12-23 International Business Machines Corporation Method for fulfilling requests of a web browser
US5907837A (en) 1995-07-17 1999-05-25 Microsoft Corporation Information retrieval system in an on-line network including separate content and layout of published titles
US5634053A (en) 1995-08-29 1997-05-27 Hughes Aircraft Company Federated information management (FIM) system and method for providing data site filtering and translation for heterogeneous databases
US5873076A (en) 1995-09-15 1999-02-16 Infonautics Corporation Architecture for processing search queries, retrieving documents identified thereby, and method for using same
US5788504A (en) 1995-10-16 1998-08-04 Brookhaven Science Associates Llc Computerized training management system
US6546406B1 (en) 1995-11-03 2003-04-08 Enigma Information Systems Ltd. Client-server computer system for large document retrieval on networked computer system
US5765140A (en) 1995-11-17 1998-06-09 Mci Corporation Dynamic project management system
US5832483A (en) 1995-12-15 1998-11-03 Novell, Inc. Distributed control interface for managing the interoperability and concurrency of agents and resources in a real-time environment
US5852715A (en) 1996-03-19 1998-12-22 Emc Corporation System for currently updating database by one host and reading the database by different host for the purpose of implementing decision support functions
US5795155A (en) 1996-04-01 1998-08-18 Electronic Data Systems Corporation Leadership assessment tool and method
JPH09297768A (en) 1996-05-07 1997-11-18 Fuji Xerox Co Ltd Management device and retrieval method for document data base
US5826252A (en) 1996-06-28 1998-10-20 General Electric Company System for managing multiple projects of similar type using dynamically updated global database
US5881269A (en) 1996-09-30 1999-03-09 International Business Machines Corporation Simulation of multiple local area network clients on a single workstation
US6137797A (en) 1996-11-27 2000-10-24 International Business Machines Corporation Process definition for source route switching
US5822780A (en) 1996-12-31 1998-10-13 Emc Corporation Method and apparatus for hierarchical storage management for data base management systems
US5818463A (en) 1997-02-13 1998-10-06 Rockwell Science Center, Inc. Data compression for animated three dimensional objects
US5935249A (en) 1997-02-26 1999-08-10 Sun Microsystems, Inc. Mechanism for embedding network based control systems in a local network interface device
US5995958A (en) 1997-03-04 1999-11-30 Xu; Kevin Houzhi System and method for storing and managing functions
US6122632A (en) 1997-07-21 2000-09-19 Convergys Customer Management Group Inc. Electronic message management system
US6052685A (en) 1997-08-13 2000-04-18 Mosaix, Inc. Integration of legacy database management systems with ODBC-compliant application programs
US5983267A (en) 1997-09-23 1999-11-09 Information Architects Corporation System for indexing and displaying requested data having heterogeneous content and representation
US5974443A (en) 1997-09-26 1999-10-26 Intervoice Limited Partnership Combined internet and data access system
US6044373A (en) 1997-09-29 2000-03-28 International Business Machines Corporation Object-oriented access control method and system for military and commercial file systems
US6044466A (en) 1997-11-25 2000-03-28 International Business Machines Corp. Flexible and dynamic derivation of permissions
US6769019B2 (en) 1997-12-10 2004-07-27 Xavier Ferguson Method of background downloading of information from a computer network
US6151624A (en) 1998-02-03 2000-11-21 Realnames Corporation Navigating network resources based on metadata
US6012098A (en) 1998-02-23 2000-01-04 International Business Machines Corp. Servlet pairing for isolation of the retrieval and rendering of data
US6185534B1 (en) 1998-03-23 2001-02-06 Microsoft Corporation Modeling emotion and personality in a computer user interface
US6078982A (en) 1998-03-24 2000-06-20 Hewlett-Packard Company Pre-locking scheme for allowing consistent and concurrent workflow process execution in a workflow management system
US6154738A (en) 1998-03-27 2000-11-28 Call; Charles Gainor Methods and apparatus for disseminating product information via the internet using universal product codes
US6125363A (en) 1998-03-30 2000-09-26 Buzzeo; Eugene Distributed, multi-user, multi-threaded application development method
US6085188A (en) 1998-03-30 2000-07-04 International Business Machines Corporation Method of hierarchical LDAP searching with relational tables
US6360330B1 (en) 1998-03-31 2002-03-19 Emc Corporation System and method for backing up data stored in multiple mirrors on a mass storage subsystem under control of a backup server
US6369819B1 (en) 1998-04-17 2002-04-09 Xerox Corporation Methods for visualizing transformations among related series of graphs
US6509898B2 (en) 1998-04-17 2003-01-21 Xerox Corporation Usage based methods of traversing and displaying generalized graph structures
US6151595A (en) 1998-04-17 2000-11-21 Xerox Corporation Methods for interactive visualization of spreading activation using time tubes and disk trees
US6389460B1 (en) 1998-05-13 2002-05-14 Compaq Computer Corporation Method and apparatus for efficient storage and retrieval of objects in and from an object storage device
US6182085B1 (en) 1998-05-28 2001-01-30 International Business Machines Corporation Collaborative team crawling:Large scale information gathering over the internet
US6094652A (en) 1998-06-10 2000-07-25 Oracle Corporation Hierarchical query feedback in an information retrieval system
US6594662B1 (en) 1998-07-01 2003-07-15 Netshadow, Inc. Method and system for gathering information resident on global computer networks
US6583800B1 (en) 1998-07-14 2003-06-24 Brad Ridgley Method and device for finding, collecting and acting upon units of information
US6266668B1 (en) 1998-08-04 2001-07-24 Dryken Technologies, Inc. System and method for dynamic data-mining and on-line communication of customized information
US6177932B1 (en) 1998-08-21 2001-01-23 Kana Communications, Inc. Method and apparatus for network based customer service
US6243713B1 (en) 1998-08-24 2001-06-05 Excalibur Technologies Corp. Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types
GB2343763B (en) 1998-09-04 2003-05-21 Shell Services Internat Ltd Data processing system
US6725227B1 (en) 1998-10-02 2004-04-20 Nec Corporation Advanced web bookmark database system
US6415283B1 (en) 1998-10-13 2002-07-02 Oracle Corporation Methods and apparatus for determining focal points of clusters in a tree structure
US8006177B1 (en) 1998-10-16 2011-08-23 Open Invention Network, Llc Documents for commerce in trading partner networks and interface definitions based on the documents
US6341277B1 (en) 1998-11-17 2002-01-22 International Business Machines Corporation System and method for performance complex heterogeneous database queries using a single SQL expression
JP3760057B2 (en) 1998-11-19 2006-03-29 株式会社日立製作所 Document search method and document search service for multiple document databases
US6941321B2 (en) 1999-01-26 2005-09-06 Xerox Corporation System and method for identifying similarities among objects in a collection
US6418413B2 (en) 1999-02-04 2002-07-09 Ita Software, Inc. Method and apparatus for providing availability of airline seats
JP2000235493A (en) 1999-02-12 2000-08-29 Fujitsu Ltd Trading device
US6246320B1 (en) * 1999-02-25 2001-06-12 David A. Monroe Ground link with on-board security surveillance system for aircraft and other commercial vehicles
JP3484096B2 (en) 1999-03-03 2004-01-06 インターナショナル・ビジネス・マシーンズ・コーポレーション Logical zoom method in logical zoom device for directed graph
US6308163B1 (en) 1999-03-16 2001-10-23 Hewlett-Packard Company System and method for enterprise workflow resource management
US6751663B1 (en) 1999-03-25 2004-06-15 Nortel Networks Limited System wide flow aggregation process for aggregating network activity records
US6625657B1 (en) 1999-03-25 2003-09-23 Nortel Networks Limited System for requesting missing network accounting records if there is a break in sequence numbers while the records are transmitting from a source device
US6405251B1 (en) 1999-03-25 2002-06-11 Nortel Networks Limited Enhancement of network accounting records
US6446200B1 (en) 1999-03-25 2002-09-03 Nortel Networks Limited Service management
US6393423B1 (en) 1999-04-08 2002-05-21 James Francis Goedken Apparatus and methods for electronic information exchange
US6463440B1 (en) 1999-04-08 2002-10-08 International Business Machines Corporation Retrieval of style sheets from directories based upon partial characteristic matching
US6530079B1 (en) 1999-06-02 2003-03-04 International Business Machines Corporation Method for optimizing locks in computer programs
US6539374B2 (en) 1999-06-03 2003-03-25 Microsoft Corporation Methods, apparatus and data structures for providing a uniform representation of various types of information
US6778971B1 (en) 1999-06-03 2004-08-17 Microsoft Corporation Methods and apparatus for analyzing computer-based tasks to build task models
US6330554B1 (en) 1999-06-03 2001-12-11 Microsoft Corporation Methods and apparatus using task models for targeting marketing information to computer users based on a task being performed
US6606613B1 (en) 1999-06-03 2003-08-12 Microsoft Corporation Methods and apparatus for using task models to help computer users complete tasks
US6427151B1 (en) 1999-06-29 2002-07-30 International Business Machines Corporation Method, computer program product, system and data structure for formatting transaction results data
US6446256B1 (en) 1999-06-30 2002-09-03 Microsoft Corporation Extension of parsable structures
US6405211B1 (en) 1999-07-08 2002-06-11 Cohesia Corporation Object-oriented representation of technical content and management, filtering, and synthesis of technical content using object-oriented representations
US6381738B1 (en) 1999-07-16 2002-04-30 International Business Machines Corporation Method for optimizing creation and destruction of objects in computer programs
US6389429B1 (en) 1999-07-30 2002-05-14 Aprimo, Inc. System and method for generating a target database from one or more source databases
US6577769B1 (en) 1999-09-18 2003-06-10 Wildtangent, Inc. Data compression through adaptive data size reduction
US6598043B1 (en) 1999-10-04 2003-07-22 Jarg Corporation Classification of information sources using graph structures
US20030050927A1 (en) 2001-09-07 2003-03-13 Araha, Inc. System and method for location, understanding and assimilation of digital documents through abstract indicia
US6496833B1 (en) 1999-11-01 2002-12-17 Sun Microsystems, Inc. System and method for generating code for query object interfacing
US20020069134A1 (en) 1999-11-01 2002-06-06 Neal Solomon System, method and apparatus for aggregation of cooperative intelligent agents for procurement in a distributed network
US6714952B2 (en) 1999-11-10 2004-03-30 Emc Corporation Method for backup and restore of a multi-lingual network file server
US6901438B1 (en) 1999-11-12 2005-05-31 Bmc Software System selects a best-fit form or URL in an originating web page as a target URL for replaying a predefined path through the internet
US6418448B1 (en) 1999-12-06 2002-07-09 Shyam Sundar Sarkar Method and apparatus for processing markup language specifications for data and metadata used inside multiple related internet documents to navigate, query and manipulate information from a plurality of object relational databases over the web
US7047411B1 (en) 1999-12-17 2006-05-16 Microsoft Corporation Server for an electronic distribution system and method of operating same
US7064241B2 (en) 2000-01-05 2006-06-20 The United States Of America As Represented By The Secretary Of The Navy Chemical and biological warfare decontaminating solution using peracids and germinants in microemulsions, process and product thereof
US6529899B1 (en) 2000-01-12 2003-03-04 International Business Machines Corporation System and method for registering and providing a tool service
US6556983B1 (en) 2000-01-12 2003-04-29 Microsoft Corporation Methods and apparatus for finding semantic information, such as usage logs, similar to a query using a pattern lattice data space
AU2001229371A1 (en) 2000-01-14 2001-07-24 Saba Software, Inc. Information server
US20020049788A1 (en) 2000-01-14 2002-04-25 Lipkin Daniel S. Method and apparatus for a web content platform
US6643652B2 (en) * 2000-01-14 2003-11-04 Saba Software, Inc. Method and apparatus for managing data exchange among systems in a network
AU2001226401A1 (en) 2000-01-14 2001-07-24 Saba Software, Inc. Method and apparatus for a business applications server
US6701314B1 (en) * 2000-01-21 2004-03-02 Science Applications International Corporation System and method for cataloguing digital information for searching and retrieval
US7117260B2 (en) 2000-01-27 2006-10-03 American Express Travel Related Services Company, Inc. Content management application for an interactive environment
EP1264263A2 (en) 2000-02-25 2002-12-11 Saba Software, Inc. Method for enterprise workforce planning
US6757708B1 (en) 2000-03-03 2004-06-29 International Business Machines Corporation Caching dynamic content
US6865509B1 (en) 2000-03-10 2005-03-08 Smiths Detection - Pasadena, Inc. System for providing control to an industrial process using one or more multidimensional variables
WO2001069466A1 (en) 2000-03-15 2001-09-20 British Telecommunications Public Limited Company Apparatus and method of allocating communications resources
CA2407974A1 (en) * 2000-03-16 2001-09-20 Poly Vista, Inc. A system and method for analyzing a query and generating results and related questions
US6643638B1 (en) 2000-03-25 2003-11-04 Kevin Houzhi Xu System and method for storing and computing data and functions
US20020024424A1 (en) * 2000-04-10 2002-02-28 Burns T. D. Civil defense alert system and method using power line communication
JP3562572B2 (en) 2000-05-02 2004-09-08 インターナショナル・ビジネス・マシーンズ・コーポレーション Detect and track new items and new classes in database documents
US6640284B1 (en) 2000-05-12 2003-10-28 Nortel Networks Limited System and method of dynamic online session caching
US6636848B1 (en) 2000-05-31 2003-10-21 International Business Machines Corporation Information search using knowledge agents
US7313588B1 (en) 2000-07-13 2007-12-25 Biap Systems, Inc. Locally executing software agent for retrieving remote content and method for creation and use of the agent
US20020059566A1 (en) 2000-08-29 2002-05-16 Delcambre Lois M. Uni-level description of computer information and transformation of computer information between representation schemes
WO2002021259A1 (en) 2000-09-08 2002-03-14 The Regents Of The University Of California Data source integration system and method
US6678679B1 (en) 2000-10-10 2004-01-13 Science Applications International Corporation Method and system for facilitating the refinement of data queries
US20020118688A1 (en) 2000-10-25 2002-08-29 Ravi Jagannathan Generation of fast busy signals in data networks
US7290061B2 (en) 2000-12-05 2007-10-30 Citrix Systems, Inc. System and method for internet content collaboration
US20020091678A1 (en) 2001-01-05 2002-07-11 Miller Nancy E. Multi-query data visualization processes, data visualization apparatus, computer-readable media and computer data signals embodied in a transmission medium
US20020133502A1 (en) * 2001-01-05 2002-09-19 Rosenthal Richard Nelson Method and system for interactive collection of information
US6804677B2 (en) 2001-02-26 2004-10-12 Ori Software Development Ltd. Encoding semi-structured data for efficient search and browsing
US20020143759A1 (en) 2001-03-27 2002-10-03 Yu Allen Kai-Lang Computer searches with results prioritized using histories restricted by query context and user community
US20030088639A1 (en) 2001-04-10 2003-05-08 Lentini Russell P. Method and an apparatus for transforming content from one markup to another markup language non-intrusively using a server load balancer and a reverse proxy transcoding engine
US6934702B2 (en) 2001-05-04 2005-08-23 Sun Microsystems, Inc. Method and system of routing messages in a distributed search network
US7171415B2 (en) 2001-05-04 2007-01-30 Sun Microsystems, Inc. Distributed information discovery through searching selected registered information providers
US6925457B2 (en) 2001-07-27 2005-08-02 Metatomix, Inc. Methods and apparatus for querying a relational data store using schema-less queries
US6856992B2 (en) * 2001-05-15 2005-02-15 Metatomix, Inc. Methods and apparatus for real-time business visibility using persistent schema-less data storage
US7890517B2 (en) 2001-05-15 2011-02-15 Metatomix, Inc. Appliance for enterprise information integration and enterprise resource interoperability platform and methods
MXPA03011976A (en) * 2001-06-22 2005-07-01 Nervana Inc System and method for knowledge retrieval, management, delivery and presentation.
US20030004934A1 (en) 2001-06-29 2003-01-02 Richard Qian Creating and managing portable user preferences for personalization of media consumption from device to device
US6792420B2 (en) 2001-06-29 2004-09-14 International Business Machines Corporation Method, system, and program for optimizing the processing of queries involving set operators
US7130861B2 (en) 2001-08-16 2006-10-31 Sentius International Corporation Automated creation and delivery of database content
US20030050834A1 (en) 2001-09-07 2003-03-13 Sergio Caplan System and method for dynamic customizable interactive portal active during select computer time
AUPR796801A0 (en) 2001-09-27 2001-10-25 Plugged In Communications Pty Ltd Computer user interface tool for navigation of data stored in directed graphs
AUPR796701A0 (en) 2001-09-27 2001-10-25 Plugged In Communications Pty Ltd Database query system and method
US6965816B2 (en) * 2001-10-01 2005-11-15 Kline & Walker, Llc PFN/TRAC system FAA upgrades for accountable remote and robotics control to stop the unauthorized use of aircraft and to improve equipment management and public safety in transportation
US7289793B2 (en) 2001-12-03 2007-10-30 Scott Gilbert Method and apparatus for displaying real-time information objects between a wireless mobile user station and multiple information sources based upon event driven parameters and user modifiable object manifest
US20040054690A1 (en) * 2002-03-08 2004-03-18 Hillerbrand Eric T. Modeling and using computer resources over a heterogeneous distributed network using semantic ontologies
US7286997B2 (en) * 2002-05-07 2007-10-23 Cembex Care Solutions, Llc Internet-based, customizable clinical information system
US7519541B2 (en) * 2003-01-29 2009-04-14 Cerner Innovation, Inc. System and method in a computer system for managing a number of attachments associated with a patient
EP1690210A2 (en) 2003-07-07 2006-08-16 Metatomix, Inc. Surveillance, monitoring and real-time events platform
US20050049924A1 (en) 2003-08-27 2005-03-03 Debettencourt Jason Techniques for use with application monitoring to obtain transaction data
JP2005149126A (en) 2003-11-14 2005-06-09 Sony Corp Information acquiring system and method, and information processing program

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030009239A1 (en) * 2000-03-23 2003-01-09 Lombardo Joseph S Method and system for bio-surveillance detection and alerting

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"NATIONAL ELECTRONIC DISEASE SURVEILLANCE SYSTEM (NEDSS): A STANDARDS-BASED APPROACH TO CONNECT PUBLIC HEALTH AND CLINICAL MEDICINE THE NATIONAL ELECTRONIC DISEASE SURVEILLANCE SYSTEM WORKING GROUP" JOURNAL OF PUBLIC HEALTH MANAGEMENT AND PRACTICE, ASPEN PUBLISHERS, FREDERICK, MD, US, vol. 7, no. 6, November 2001 (2001-11), pages 43-50, XP008035113 ISSN: 1078-4659 *
DELCAMBRE L ET AL: "Bundles in captivity: an application of superimposed information" PROCEEDINGS 17TH. INTERNATIONAL CONFERENCE ON DATA ENGINEERING. (ICDE'2001). HEIDELBERG, GERMANY, APRIL 2 - 6, 2001, INTERNATIONAL CONFERENCE ON DATA ENGINEERING. (ICDE), LOS ALAMITOS, CA : IEEE COMP. SOC, US, vol. CONF. 17, 2 April 2001 (2001-04-02), pages 111-120, XP010538052 ISBN: 0-7695-1001-9 *
MANOLA F: "Towards a richer Web object model" SIGMOD RECORD ACM USA, vol. 27, no. 1, 1998, pages 76-80, XP008044102 ISSN: 0163-5808 *
PHIN: "Public Health Information Network Functions and Specifications Version 1." [Online] 18 December 2002 (2002-12-18), pages 1-56, XP002320364 Retrieved from the Internet: <URL:http://www.cdc.gov/phin/Public%20Health%20Information%20Network%20Functions%20and%20Specificat%85.pdf> [retrieved on 2005-03-04] *
YASNOFF W A ET AL: "Public health informatics: Improving and transforming public health in the information age" TOPICS IN HEALTH INFORMATION MANAGEMENT, ASPEN PUBLISHERS, FREDERICK, MD, US, vol. 21, no. 3, February 2001 (2001-02), page 44, XP002969652 ISSN: 1065-0989 *

Also Published As

Publication number Publication date
US20140052779A1 (en) 2014-02-20
EP1690210A2 (en) 2006-08-16
US8572059B2 (en) 2013-10-29
US20050055330A1 (en) 2005-03-10
WO2005029365A3 (en) 2005-05-19

Similar Documents

Publication Publication Date Title
US8572059B2 (en) Surveillance, monitoring and real-time events platform
US7890517B2 (en) Appliance for enterprise information integration and enterprise resource interoperability platform and methods
US10592310B2 (en) System and method for detecting, collecting, analyzing, and communicating event-related information
US10872388B2 (en) Global disease surveillance platform, and corresponding system and method
US7958155B2 (en) Systems and methods for the management of information to enable the rapid dissemination of actionable information
US8112453B2 (en) Systems and methods for retrieving data
Bellini et al. Smart city architecture for data ingestion and analytics: Processes and solutions
Welbourne et al. Cascadia: a system for specifying, detecting, and managing RFID events
US20040103147A1 (en) System for enabling collaboration and protecting sensitive data
CN101436274A (en) Method for across-platform monitoring enterprise application system performance
Ray et al. Information Technology: principles and applications
Ding et al. Massive heterogeneous sensor data management in the Internet of Things
WO2005081963A2 (en) Appliance for enterprise information integration and enterprise resource interoperability platform and methods
Perry et al. Geospatial and temporal semantic analytics
Zeng et al. West nile virus and botulism portal: a case study in infectious disease informatics
Zoppi et al. Labelling relevant events to support the crisis management operator
US10354760B1 (en) Tool for visual exploration of medical data
Feyer et al. Many-dimensional schema modeling
CA2471468A1 (en) Methods and apparatus for statistical data analysis
EP1444612A1 (en) Information aggregation, processing and distribution system
Tsoi Development of a Cross-Domain Web-Based GIS Platform to Support Surveillance and Control of Communicable Diseases
Poulsen Designing a database for law enforcement agencies
Rice Improving emergency responder situational awareness for incident command systems (ICS) using critical information management, simulation, and analysis
Turner et al. Conceptual ecological models in benthic habitats monitoring
TC et al. Health Informatics–HEALTH INFORMATION ARCHITECTURE FRAMEWORK

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BW BY BZ CA CH CN CO CR CU CZ DK DM DZ EC EE EG ES FI GB GD GE GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MK MN MW MX MZ NA NI NO NZ PG PH PL PT RO RU SC SD SE SG SK SY TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SZ TZ UG ZM ZW AM AZ BY KG MD RU TJ TM AT BE BG CH CY DE DK EE ES FI FR GB GR HU IE IT MC NL PL PT RO SE SI SK TR BF CF CG CI CM GA GN GQ GW ML MR SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase

Ref document number: 2004809476

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 2004809476

Country of ref document: EP