CA2546896A1 - Extraction of facts from text

Publication number
CA2546896A1
CA2546896A1 CA002546896A CA2546896A CA2546896A1 CA 2546896 A1 CA2546896 A1 CA 2546896A1 CA 002546896 A CA002546896 A CA 002546896A CA 2546896 A CA2546896 A CA 2546896A CA 2546896 A1 CA2546896 A1 CA 2546896A1
Authority
CA
Canada
Prior art keywords
text
attributes
base
patterns
tokens
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002546896A
Other languages
French (fr)
Other versions
CA2546896C (en)
Inventor
Mark Wasson
James Wiltshire, Jr.
Donald Loritz
Steve Xu
Shian-Jung Chen
Valentina Templar
Eleni Koutsomitopoulou
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LexisNexis Inc
Original Assignee
Lexisnexis, A Division Of Reed Elsevier Inc.
Mark Wasson
James Wiltshire, Jr.
Donald Loritz
Steve Xu
Shian-Jung Chen
Valentina Templar
Eleni Koutsomitopoulou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2003-11-19
Filing date
2004-10-26
Publication date
2005-06-09
Application filed by Lexisnexis, A Division Of Reed Elsevier Inc., Mark Wasson, James Wiltshire, Jr., Donald Loritz, Steve Xu, Shian-Jung Chen, Valentina Templar, Eleni Koutsomitopoulou
Publication of CA2546896A1
Application granted
Publication of CA2546896C
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes

Abstract

A fact extraction tool set ("FEX") finds and extracts targeted pieces of information from text using linguistic and pattern matching technologies, and in particular, text annotation and fact extraction. Text annotation tools break a text, such as a document, into its base tokens and annotate those tokens or patterns of tokens with orthographic, syntactic, semantic, pragmatic and other attributes. A user-defined "Annotation Configuration" controls which annotation tools are used in a given application. XML is used as the basis for representing the annotated text. A tag uncrossing tool resolves conflicting (crossed) annotation boundaries in an annotated text to produce well-formed XML from the results of the individual annotators. The fact extraction tool is a pattern matching language which is used to write scripts that find and match patterns of attributes that correspond to targeted pieces of information in the text, and extract that information.
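
To illustrate the tag uncrossing step mentioned in the abstract, the following minimal Python sketch shows one way crossed annotation spans from independent annotators could be merged into well-formed XML. It is not the tool set's actual algorithm, which the abstract does not specify. The function name `uncross`, the `<doc>` and `<t>` element names, and the `(start, end, tag)` span format over token indices are all assumptions made for this example. The sketch closes any open span that would otherwise cross a closing boundary and reopens it immediately afterwards.

```python
from xml.dom.minidom import parseString
from xml.sax.saxutils import escape


def uncross(tokens, spans):
    """Merge possibly crossing spans into one well-formed XML string.

    tokens: list of base-token strings.
    spans:  list of (start, end, tag) tuples over token indices, end exclusive.
            Tags are assumed to be valid XML element names (hypothetical format).
    """
    for start, end, _tag in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"bad span: {(start, end, _tag)}")

    out = []
    stack = []  # currently open spans, outermost first
    for i in range(len(tokens) + 1):
        ending = [s for s in stack if s[1] == i]
        if ending:
            # Close everything above the outermost ending span, then
            # reopen the spans that have not actually finished yet.
            first = min(stack.index(s) for s in ending)
            reopen = []
            while len(stack) > first:
                s = stack.pop()
                out.append(f"</{s[2]}>")
                if s[1] != i:
                    reopen.append(s)
            for s in reversed(reopen):
                stack.append(s)
                out.append(f"<{s[2]}>")
        if i == len(tokens):
            break
        # Open spans that start here, longest first, so they nest properly.
        for s in sorted((s for s in spans if s[0] == i), key=lambda s: -s[1]):
            stack.append(s)
            out.append(f"<{s[2]}>")
        out.append(f"<t>{escape(tokens[i])}</t>")
    return "<doc>" + "".join(out) + "</doc>"


# Two annotators disagree on boundaries: "np" covers tokens 0-2, "vp" covers 1-3.
tokens = ["Mark", "Wasson", "wrote", "rules"]
xml = uncross(tokens, [(0, 3, "np"), (1, 4, "vp")])
parseString(xml)  # raises an exception if the result is not well-formed
print(xml)
# <doc><np><t>Mark</t><vp><t>Wasson</t><t>wrote</t></vp></np><vp><t>rules</t></vp></doc>
```

Splitting the `vp` span into two fragments keeps every base token inside all of its original annotations, at the cost of one annotation appearing as two elements. A real implementation would likely record a shared identifier on the fragments so they can be rejoined.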

Claims (55)

1. A fact extraction tool set for extracting information from a document, comprising:
means for annotating a text; and means for extracting facts from the annotated text.
2. The fact extraction tool set of claim 1, wherein the means for annotating a text comprises means for assigning syntactic and semantic attributes to a text passage by at least one of parsing the text passage and applying text annotation processes other than parsing the text passage.
3. The fact extraction tool set of claim 2, wherein the means for assigning syntactic and semantic attributes to a text passage comprises means for breaking the text passage into its base tokens and annotating the base tokens and patterns of base tokens with a number of orthographic, syntactic, semantic, pragmatic and dictionary-based attributes.
4. The fact extraction tool set of claim 3, wherein the attributes include tokenization, text normalization, part of speech tags, sentence boundaries, parse trees, semantic attribute tagging and other interesting attributes of the text.
5. The fact extraction tool set of claim 2, wherein the means for assigning syntactic and semantic attributes to a text passage comprises independent annotators.
6. The fact extraction tool set of claim 5, wherein the independent annotators use XML as a basis for representing annotated text.
7. The fact extraction tool set of claim 6, further comprising means for resolving conflicting annotation boundaries in the annotated text to produce well-formed XML from the results of independent annotators.
8. The fact extraction tool set of claim 3, wherein the means for breaking the text passage into its base tokens and annotating the base tokens and patterns of base tokens comprises independent annotators, wherein the annotators are of three types comprising:
token attributes, which have a one-per-base-token alignment, where for the attribute type represented, there is an attempt to assign an attribute to each base token;
constituent attributes assigned yes-no values to patterns of base tokens, where the entire pattern is considered to be a single constituent with respect to some annotation value; and
links, which assign common identifiers to coreferring and other related patterns of base tokens.
9. The fact extraction tool set of claim 3, wherein the means for annotating a text further comprises means for associating all annotations assigned to a particular piece of text with the base tokens for that text to generate aligned annotations.
10. The fact extraction tool set of claim 9, wherein the means for extracting facts comprises means for identifying and extracting potentially interesting pieces of information in the aligned annotations by finding patterns in the attributes stored by the annotators.
11. The fact extraction tool set of claim 10, wherein the means for identifying and extracting potentially interesting pieces of information comprises means for recognizing both true left and right constituent attributes and non-contiguous constituent attributes.
12. The fact extraction tool set of claim 10, wherein the means for identifying and extracting potentially interesting pieces of information comprises at least one text pattern recognition rule written in a rule-based information extraction language, wherein the at least one text pattern recognition rule queries for at least one of literal text, attributes, and relationships found in the aligned annotations to define the facts to be extracted.
13. The fact extraction tool set of claim 12, wherein the at least one text pattern recognition rule can use regular expression functionality, XPath-based functionality, and auxiliary definitions in any combination.
14. The fact extraction tool set of claim 12, wherein the at least one text pattern recognition rule comprises a pattern that describes the text of interest, a label that names the pattern for testing and debugging purposes, and an action that indicates what should be done in response to a successful match.
15. The fact extraction tool set of claim 12, wherein the means for identifying and extracting potentially interesting pieces of information further comprises at least one auxiliary definition statement used to name and define a fragment of a pattern.
16. A rule-based information extraction language for use in identifying and extracting potentially interesting pieces of information in aligned annotations in a text, comprising at least one text pattern recognition rule that queries for at least one of literal text, attributes, and relationships found in the aligned annotations to define the facts to be extracted.
17. The language of claim 16, wherein the at least one text pattern recognition rule can use regular expression functionality, XPath-based functionality, and auxiliary definitions in any combination.
18. The language of claim 16, wherein the at least one text pattern recognition rule comprises a pattern that describes the text of interest, a label that names the pattern for testing and debugging purposes, and an action that indicates what should be done in response to a successful match.
19. The language of claim 16, further comprising at least one auxiliary definition statement used to name and define a fragment of a pattern.
20. A text annotation tool comprising:
means for assigning syntactic and semantic attributes to a text passage by at least one of parsing the text passage and applying text annotation processes other than parsing the text passage, including means for breaking the text passage into its base tokens and annotating the base tokens and patterns of base tokens with a number of orthographic, syntactic, semantic, pragmatic and dictionary-based attributes; and means for associating all annotations assigned to a particular piece of text with the base tokens for that text to generate aligned annotations.
21. The text annotation tool of claim 20, wherein the attributes include tokenization, text normalization, part of speech tags, sentence boundaries, parse trees, semantic attribute tagging and other interesting attributes of the text.
22. The text annotation tool of claim 20, wherein the means for assigning syntactic and semantic attributes to a text passage comprises independent annotators.
23. The text annotation tool of claim 22, wherein the independent annotators use XML as a basis for representing annotated text.
24. The text annotation tool of claim 23, further comprising means for resolving conflicting annotation boundaries in the annotated text to produce well-formed XML from the results of independent annotators.
25. The text annotation tool of claim 20, wherein the means for breaking the text passage into its base tokens and annotating the base tokens and patterns of base tokens comprises independent annotators, wherein the annotators are of three types comprising:
token attributes, which have a one-per-base-token alignment, where for the attribute type represented, there is an attempt to assign an attribute to each base token;
constituent attributes assigned yes-no values to patterns of base tokens, where the entire pattern is considered to be a single constituent with respect to some annotation value; and
links, which assign common identifiers to coreferring and other related patterns of base tokens.
26. A computer program product for extracting information from a document, the computer program product comprising a computer usable storage medium having computer readable program code means embodied in the medium, the computer readable program code means comprising:
computer readable program code means for annotating a text; and computer readable program code means for extracting facts from the annotated text.
27. The computer program product of claim 26, wherein the computer readable program code means for annotating a text comprises computer readable program code means for assigning syntactic and semantic attributes to a text passage by at least one of parsing the text passage and applying text annotation processes other than parsing the text passage.
28. The computer program product of claim 27, wherein the computer readable program code means for assigning syntactic and semantic attributes to a text passage comprises computer readable program code means for breaking the text passage into its base tokens and annotating the base tokens and patterns of base tokens with a number of orthographic, syntactic, semantic, pragmatic and dictionary-based attributes.
29. The computer program product of claim 28, wherein the attributes include tokenization, text normalization, part of speech tags, sentence boundaries, parse trees, semantic attribute tagging and other interesting attributes of the text.
30. The computer program product of claim 27, wherein the computer readable program code means for assigning syntactic and semantic attributes to a text passage comprises independent annotators.
31. The computer program product of claim 30, wherein the independent annotators use XML as a basis for representing annotated text.
32. The computer program product of claim 31, further comprising computer readable program code means for resolving conflicting annotation boundaries in the annotated text to produce well-formed XML from the results of independent annotators.
33. The computer program product of claim 28, wherein the computer readable program code means for breaking the text passage into its base tokens and annotating the base tokens and patterns of base tokens comprises individual annotators, wherein the annotators are of three types comprising:
token attributes, which have a one-per-base-token alignment, where for the attribute type represented, there is an attempt to assign an attribute to each base token;
constituent attributes assigned yes-no values to patterns of base tokens, where the entire pattern is considered to be a single constituent with respect to some annotation value; and
links, which assign common identifiers to coreferring and other related patterns of base tokens.
34. The computer program product of claim 28, wherein the computer readable program code means for annotating a text further comprises computer readable program code means for associating all annotations assigned to a particular piece of text with the base tokens for that text to generate aligned annotations.
35. The computer program product of claim 34, wherein the computer readable program code means for extracting facts comprises computer readable program code means for identifying and extracting potentially interesting pieces of information in the aligned annotations by finding patterns in the attributes stored by the annotators.
36. The computer program product of claim 35, wherein the computer readable program code means for identifying and extracting potentially interesting pieces of information further comprises computer readable program code means for recognizing both true left and right constituent attributes and non-contiguous constituent attributes.
37. The computer program product of claim 35, wherein the computer readable program code means for identifying and extracting potentially interesting pieces of information comprises at least one text pattern recognition rule written in a rule-based information extraction language, wherein the at least one text pattern recognition rule queries for at least one of literal text, attributes, and relationships found in the aligned annotations to define the facts to be extracted.
38. The computer program product of claim 37, wherein the at least one text pattern recognition rule can use regular expression functionality, XPath-based functionality, and auxiliary definitions in any combination.
39. The computer program product of claim 37, wherein the at least one text pattern recognition rule comprises a pattern that describes the text of interest, a label that names the pattern for testing and debugging purposes, and an action that indicates what should be done in response to a successful match.
40. The computer program product of claim 37, wherein the computer readable program code means for identifying and extracting potentially interesting pieces of information further comprises at least one auxiliary definition statement used to name and define a fragment of a pattern.
41. A method of extracting information from a document, comprising the steps of:
annotating a text; and extracting facts from the annotated text.
42. The method of claim 41, wherein the step of annotating a text comprises assigning syntactic and semantic attributes to a text passage by at least one of parsing the text passage and applying text annotation processes other than parsing the text passage.
43. The method of claim 42, wherein the parsing of the text passage comprises breaking it into its base tokens and annotating the base tokens and patterns of base tokens with a number of orthographic, syntactic, semantic, pragmatic and dictionary-based attributes.
44. The method of claim 43, wherein the attributes include tokenization, text normalization, part of speech tags, sentence boundaries, parse trees, semantic attribute tagging and other interesting attributes of the text.
45. The method of claim 42, wherein the parsing of the text passage is carried out by independent annotators.
46. The method of claim 45, wherein the individual annotators use XML as a basis for representing annotated text.
47. The method of claim 46, further comprising the step of resolving conflicting annotation boundaries in the annotated text to produce well-formed XML from the results of independent annotators.
48. The method of claim 43, wherein the step of breaking the text passage into its base tokens and annotating the base tokens and patterns of base tokens is carried out using independent annotators, wherein the annotators are of three types comprising:
token attributes, which have a one-per-base-token alignment, where for the attribute type represented, there is an attempt to assign an attribute to each base token;
constituent attributes assigned yes-no values to patterns of base tokens, where the entire pattern is considered to be a single constituent with respect to some annotation value; and
links, which assign common identifiers to coreferring and other related patterns of base tokens.
49. The method of claim 43, wherein the step of annotating a text further comprises the step of associating all annotations assigned to a particular piece of text with the base tokens for that text to generate aligned annotations.
50. The method of claim 49, wherein the step of extracting facts comprises identifying and extracting potentially interesting pieces of information in the aligned annotations by finding patterns in the attributes stored by the annotators.
51. The method of claim 50, wherein the step of identifying and extracting potentially interesting pieces of information comprises recognizing both true left and right constituent attributes and non-contiguous constituent attributes.
52. The method of claim 50, wherein the patterns are found using at least one text pattern recognition rule written in a rule-based information extraction language, wherein the at least one text pattern recognition rule queries for at least one of literal text, attributes, and relationships found in the aligned annotations to define the facts to be extracted.
53. The method of claim 52, wherein the at least one text pattern recognition rule can use regular expression functionality, XPath-based functionality, and auxiliary definitions in any combination.
54. The method of claim 52, wherein the at least one text pattern recognition rule describes the text of interest, names the pattern for testing and debugging purposes, and indicates what should be done in response to a successful match.
55. The method of claim 52, wherein the patterns are found further using at least one auxiliary definition statement used to name and define a fragment of a pattern.
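
The claims above describe a text pattern recognition rule as a pattern, a label, and an action applied to aligned annotations, optionally built from named auxiliary definitions. The following Python sketch is one hedged reading of that structure and is not the rule language the claims refer to. The `Rule` class, the `attr` helper, the `apply_rule` function, and the attribute names `text`, `pos`, and `entity` are all hypothetical. Each base token is modelled as a dictionary holding every annotation aligned to it, and a pattern is a sequence of per-token tests matched against contiguous tokens.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Token = Dict[str, Any]  # one base token with its aligned annotations


@dataclass
class Rule:
    label: str                               # names the pattern for debugging
    pattern: List[Callable[[Token], bool]]   # one test per consecutive token
    action: Callable[[List[Token]], Any]     # what to do on a successful match


def attr(name: str, value: Any) -> Callable[[Token], bool]:
    """Build a test that checks a single attribute of a token."""
    return lambda tok: tok.get(name) == value


def apply_rule(rule: Rule, tokens: List[Token]) -> list:
    """Scan left to right and run the action on each non-overlapping match."""
    n = len(rule.pattern)
    if n == 0:
        raise ValueError("pattern must contain at least one test")
    results = []
    i = 0
    while i + n <= len(tokens):
        window = tokens[i:i + n]
        if all(test(tok) for test, tok in zip(rule.pattern, window)):
            results.append((rule.label, rule.action(window)))
            i += n  # skip past the matched tokens
        else:
            i += 1
    return results


# Auxiliary definitions: named fragments reused inside patterns.
COMPANY = attr("entity", "company")
PERSON = attr("entity", "person")

hire_rule = Rule(
    label="hire",
    pattern=[COMPANY, attr("text", "hired"), PERSON],
    action=lambda m: {"employer": m[0]["text"], "employee": m[2]["text"]},
)

sentence = [
    {"text": "Acme", "pos": "NNP", "entity": "company"},
    {"text": "hired", "pos": "VBD"},
    {"text": "Smith", "pos": "NNP", "entity": "person"},
    {"text": "yesterday", "pos": "NN"},
]

facts = apply_rule(hire_rule, sentence)
print(facts)
# [('hire', {'employer': 'Acme', 'employee': 'Smith'})]
```

Because the pattern mixes a literal-text test with attribute tests, it shows in miniature how a rule can query literal text and attributes together. Regular expression operators, XPath-based queries, and link relationships are left out of this sketch.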
CA2546896A 2003-11-19 2004-10-26 Extraction of facts from text Active CA2546896C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US10/716,202 US20050108630A1 (en) 2003-11-19 2003-11-19 Extraction of facts from text
US10/716,202 2003-11-19
PCT/US2004/035359 WO2005052727A2 (en) 2003-11-19 2004-10-26 Extraction of facts from text

Publications (2)

Publication Number Publication Date
CA2546896A1 (en) 2005-06-09
CA2546896C (en) 2012-08-07

Family

ID=34574367

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2546896A Active CA2546896C (en) 2003-11-19 2004-10-26 Extraction of facts from text

Country Status (6)

Country Link
US (2) US20050108630A1 (en)
EP (1) EP1695170A4 (en)
AU (1) AU2004294094B2 (en)
CA (1) CA2546896C (en)
NZ (1) NZ547871A (en)
WO (1) WO2005052727A2 (en)

Families Citing this family (315)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9219755B2 (en) 1996-11-08 2015-12-22 Finjan, Inc. Malicious mobile code runtime monitoring system and methods
US8079086B1 (en) 1997-11-06 2011-12-13 Finjan, Inc. Malicious mobile code runtime monitoring system and methods
US7058822B2 (en) 2000-03-30 2006-06-06 Finjan Software, Ltd. Malicious mobile code runtime monitoring system and methods
US7975305B2 (en) * 1997-11-06 2011-07-05 Finjan, Inc. Method and system for adaptive rule-based content scanners for desktop computers
US8225408B2 (en) * 1997-11-06 2012-07-17 Finjan, Inc. Method and system for adaptive rule-based content scanners
EP1686499B1 (en) * 2002-06-28 2010-06-30 Nippon Telegraph and Telephone Corporation Selection and extraction of information from structured documents
JP4024137B2 (en) * 2002-11-28 2007-12-19 沖電気工業株式会社 Quantity expression search device
AU2003901428A0 (en) * 2003-03-24 2003-04-10 Objective Systems Pty Ltd A system and method for formatting and distributing reading material
US8694510B2 (en) 2003-09-04 2014-04-08 Oracle International Corporation Indexing XML documents efficiently
US8229932B2 (en) * 2003-09-04 2012-07-24 Oracle International Corporation Storing XML documents efficiently in an RDBMS
US8548170B2 (en) 2003-12-10 2013-10-01 Mcafee, Inc. Document de-registration
US7984175B2 (en) 2003-12-10 2011-07-19 Mcafee, Inc. Method and apparatus for data capture and analysis system
US8656039B2 (en) 2003-12-10 2014-02-18 Mcafee, Inc. Rule parser
US7493305B2 (en) * 2004-04-09 2009-02-17 Oracle International Corporation Efficient queribility and manageability of an XML index with path subsetting
US7499915B2 (en) * 2004-04-09 2009-03-03 Oracle International Corporation Index for accessing XML data
US7603347B2 (en) * 2004-04-09 2009-10-13 Oracle International Corporation Mechanism for efficiently evaluating operator trees
US7398274B2 (en) * 2004-04-27 2008-07-08 International Business Machines Corporation Mention-synchronous entity tracking system and method for chaining mentions
JP4254623B2 (en) * 2004-06-09 2009-04-15 日本電気株式会社 Topic analysis method, apparatus thereof, and program
US7885980B2 (en) * 2004-07-02 2011-02-08 Oracle International Corporation Mechanism for improving performance on XML over XML data using path subsetting
US8560534B2 (en) 2004-08-23 2013-10-15 Mcafee, Inc. Database for a capture system
US7949849B2 (en) 2004-08-24 2011-05-24 Mcafee, Inc. File system for a capture system
US9171100B2 (en) 2004-09-22 2015-10-27 Primo M. Pettovello MTree an XPath multi-axis structure threaded index
WO2006051718A1 (en) * 2004-11-12 2006-05-18 Justsystems Corporation Document processing device, and document processing method
EP1837776A1 (en) * 2004-11-12 2007-09-26 JustSystems Corporation Document processing device and document processing method
US9195766B2 (en) * 2004-12-14 2015-11-24 Google Inc. Providing useful information associated with an item in a document
US7921076B2 (en) * 2004-12-15 2011-04-05 Oracle International Corporation Performing an action in response to a file system event
US7698270B2 (en) * 2004-12-29 2010-04-13 Baynote, Inc. Method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge
US7769579B2 (en) 2005-05-31 2010-08-03 Google Inc. Learning facts from semi-structured text
US9208229B2 (en) * 2005-03-31 2015-12-08 Google Inc. Anchor text summarization for corroboration
US7587387B2 (en) 2005-03-31 2009-09-08 Google Inc. User interface for facts query engine with snippets from information sources that include query terms and answer terms
US8682913B1 (en) 2005-03-31 2014-03-25 Google Inc. Corroborating facts extracted from multiple sources
US7831545B1 (en) * 2005-05-31 2010-11-09 Google Inc. Identifying the unifying subject of a set of facts
US8996470B1 (en) 2005-05-31 2015-03-31 Google Inc. System for ensuring the internal consistency of a fact repository
US8762410B2 (en) * 2005-07-18 2014-06-24 Oracle International Corporation Document level indexes for efficient processing in multiple tiers of a computer system
US20070016605A1 (en) * 2005-07-18 2007-01-18 Ravi Murthy Mechanism for computing structural summaries of XML document collections in a database system
US7937344B2 (en) 2005-07-25 2011-05-03 Splunk Inc. Machine data web
US7907608B2 (en) 2005-08-12 2011-03-15 Mcafee, Inc. High speed packet capture
US7818326B2 (en) 2005-08-31 2010-10-19 Mcafee, Inc. System and method for word indexing in a capture system and querying thereof
US20070067320A1 (en) * 2005-09-20 2007-03-22 International Business Machines Corporation Detecting relationships in unstructured text
US7548933B2 (en) * 2005-10-14 2009-06-16 International Business Machines Corporation System and method for exploiting semantic annotations in executing keyword queries over a collection of text documents
US7730011B1 (en) 2005-10-19 2010-06-01 Mcafee, Inc. Attributes of captured objects in a capture system
US7664742B2 (en) * 2005-11-14 2010-02-16 Pettovello Primo M Index data structure for a peer-to-peer network
US7693836B2 (en) 2005-12-27 2010-04-06 Baynote, Inc. Method and apparatus for determining peer groups based upon observed usage patterns
US7487174B2 (en) * 2006-01-17 2009-02-03 International Business Machines Corporation Method for storing text annotations with associated type information in a structured data store
US20070174309A1 (en) * 2006-01-18 2007-07-26 Pettovello Primo M Mtreeini: intermediate nodes and indexes
US8260785B2 (en) 2006-02-17 2012-09-04 Google Inc. Automatic object reference identification and linking in a browseable fact repository
US20070185870A1 (en) 2006-01-27 2007-08-09 Hogue Andrew W Data object visualization using graphs
US8055674B2 (en) * 2006-02-17 2011-11-08 Google Inc. Annotation framework
US8954426B2 (en) * 2006-02-17 2015-02-10 Google Inc. Query language
US7836399B2 (en) * 2006-02-09 2010-11-16 Microsoft Corporation Detection of lists in vector graphics documents
US7958164B2 (en) 2006-02-16 2011-06-07 Microsoft Corporation Visual design of annotated regular expression
US7844603B2 (en) * 2006-02-17 2010-11-30 Google Inc. Sharing user distributed search results
US8122019B2 (en) * 2006-02-17 2012-02-21 Google Inc. Sharing user distributed search results
US8862572B2 (en) * 2006-02-17 2014-10-14 Google Inc. Sharing user distributed search results
US7860881B2 (en) * 2006-03-09 2010-12-28 Microsoft Corporation Data parsing with annotated patterns
US7949538B2 (en) * 2006-03-14 2011-05-24 A-Life Medical, Inc. Automated interpretation of clinical encounters with cultural cues
US8504537B2 (en) 2006-03-24 2013-08-06 Mcafee, Inc. Signature distribution in a document registration system
US8731954B2 (en) * 2006-03-27 2014-05-20 A-Life Medical, Llc Auditing the coding and abstracting of documents
US7958227B2 (en) 2006-05-22 2011-06-07 Mcafee, Inc. Attributes of captured objects in a capture system
US8510292B2 (en) * 2006-05-25 2013-08-13 Oracle International Coporation Isolation for applications working on shared XML data
US8996979B2 (en) 2006-06-08 2015-03-31 West Services, Inc. Document automation systems
US7668791B2 (en) * 2006-07-31 2010-02-23 Microsoft Corporation Distinguishing facts from opinions using a multi-stage approach
US8234706B2 (en) * 2006-09-08 2012-07-31 Microsoft Corporation Enabling access to aggregated software security information
US9147271B2 (en) 2006-09-08 2015-09-29 Microsoft Technology Licensing, Llc Graphical representation of aggregated data
US20080126385A1 (en) * 2006-09-19 2008-05-29 Microsoft Corporation Intelligent batching of electronic data interchange messages
US20080126386A1 (en) * 2006-09-20 2008-05-29 Microsoft Corporation Translation of electronic data interchange messages to extensible markup language representation(s)
US8108767B2 (en) * 2006-09-20 2012-01-31 Microsoft Corporation Electronic data interchange transaction set definition based instance editing
US8161078B2 (en) * 2006-09-20 2012-04-17 Microsoft Corporation Electronic data interchange (EDI) data dictionary management and versioning system
US20080071806A1 (en) * 2006-09-20 2008-03-20 Microsoft Corporation Difference analysis for electronic data interchange (edi) data dictionary
US8954412B1 (en) 2006-09-28 2015-02-10 Google Inc. Corroborating facts in electronic documents
US9892111B2 (en) 2006-10-10 2018-02-13 Abbyy Production Llc Method and device to estimate similarity between documents having multiple segments
US9075864B2 (en) 2006-10-10 2015-07-07 Abbyy Infopoisk Llc Method and system for semantic searching using syntactic and semantic analysis
US9495358B2 (en) 2006-10-10 2016-11-15 Abbyy Infopoisk Llc Cross-language text clustering
US9098489B2 (en) 2006-10-10 2015-08-04 Abbyy Infopoisk Llc Method and system for semantic searching
US9189482B2 (en) 2012-10-10 2015-11-17 Abbyy Infopoisk Llc Similar document search
US9069750B2 (en) 2006-10-10 2015-06-30 Abbyy Infopoisk Llc Method and system for semantic searching of natural language texts
US8122026B1 (en) * 2006-10-20 2012-02-21 Google Inc. Finding and disambiguating references to entities on web pages
US8099429B2 (en) * 2006-12-11 2012-01-17 Microsoft Corporation Relational linking among resoures
US20080162449A1 (en) * 2006-12-28 2008-07-03 Chen Chao-Yu Dynamic page similarity measurement
US20080168109A1 (en) * 2007-01-09 2008-07-10 Microsoft Corporation Automatic map updating based on schema changes
US20080168081A1 (en) * 2007-01-09 2008-07-10 Microsoft Corporation Extensible schemas and party configurations for edi document generation or validation
US8347202B1 (en) 2007-03-14 2013-01-01 Google Inc. Determining geographic locations for place names in a fact repository
US20100131534A1 (en) * 2007-04-10 2010-05-27 Toshio Takeda Information providing system
US7908552B2 (en) 2007-04-13 2011-03-15 A-Life Medical Inc. Mere-parsing with boundary and semantic driven scoping
US8682823B2 (en) 2007-04-13 2014-03-25 A-Life Medical, Llc Multi-magnitudinal vectors with resolution based on source vector features
JP4536747B2 (en) * 2007-04-19 2010-09-01 インターナショナル・ビジネス・マシーンズ・コーポレーション Advertisement selection system, method and program
US9779079B2 (en) * 2007-06-01 2017-10-03 Xerox Corporation Authoring system
US9251137B2 (en) * 2007-06-21 2016-02-02 International Business Machines Corporation Method of text type-ahead
US8812296B2 (en) * 2007-06-27 2014-08-19 Abbyy Infopoisk Llc Method and system for natural language dictionary generation
US8250651B2 (en) * 2007-06-28 2012-08-21 Microsoft Corporation Identifying attributes of aggregated data
US8302197B2 (en) * 2007-06-28 2012-10-30 Microsoft Corporation Identifying data associated with security issue attributes
JP2010532897A (en) * 2007-07-10 2010-10-14 インターナショナル・ビジネス・マシーンズ・コーポレーション Intelligent text annotation method, system and computer program
US7970766B1 (en) 2007-07-23 2011-06-28 Google Inc. Entity type assignment
US9946846B2 (en) * 2007-08-03 2018-04-17 A-Life Medical, Llc Visualizing the documentation and coding of surgical procedures
EP2176730A4 (en) * 2007-08-08 2011-04-20 Baynote Inc Method and apparatus for context-based content recommendation
US8463593B2 (en) * 2007-08-31 2013-06-11 Microsoft Corporation Natural language hypernym weighting for word sense disambiguation
US8346756B2 (en) * 2007-08-31 2013-01-01 Microsoft Corporation Calculating valence of expressions within documents for searching a document index
US8229730B2 (en) * 2007-08-31 2012-07-24 Microsoft Corporation Indexing role hierarchies for words in a search index
US8316036B2 (en) 2007-08-31 2012-11-20 Microsoft Corporation Checkpointing iterators during search
US8280721B2 (en) * 2007-08-31 2012-10-02 Microsoft Corporation Efficiently representing word sense probabilities
US8041697B2 (en) * 2007-08-31 2011-10-18 Microsoft Corporation Semi-automatic example-based induction of semantic translation rules to support natural language search
US8868562B2 (en) * 2007-08-31 2014-10-21 Microsoft Corporation Identification of semantic relationships within reported speech
US8712758B2 (en) * 2007-08-31 2014-04-29 Microsoft Corporation Coreference resolution in an ambiguity-sensitive natural language processing system
US20090070322A1 (en) * 2007-08-31 2009-03-12 Powerset, Inc. Browsing knowledge on the basis of semantic relations
KR101522049B1 (en) * 2007-08-31 2015-05-20 마이크로소프트 코포레이션 Coreference resolution in an ambiguity-sensitive natural language processing system
US8639708B2 (en) * 2007-08-31 2014-01-28 Microsoft Corporation Fact-based indexing for natural language search
US8229970B2 (en) * 2007-08-31 2012-07-24 Microsoft Corporation Efficient storage and retrieval of posting lists
US9058608B2 (en) * 2007-09-12 2015-06-16 Google Inc. Placement attribute targeting
US8326833B2 (en) * 2007-10-04 2012-12-04 International Business Machines Corporation Implementing metadata extraction of artifacts from associated collaborative discussions
US7991768B2 (en) 2007-11-08 2011-08-02 Oracle International Corporation Global query normalization to improve XML index based rewrites for path subsetted index
US7987416B2 (en) * 2007-11-14 2011-07-26 Sap Ag Systems and methods for modular information extraction
US8812435B1 (en) * 2007-11-16 2014-08-19 Google Inc. Learning objects and facts from documents
US20090157385A1 (en) * 2007-12-14 2009-06-18 Nokia Corporation Inverse Text Normalization
US20090172517A1 (en) * 2007-12-27 2009-07-02 Kalicharan Bhagavathi P Document parsing method and system using web-based GUI software
US8316035B2 (en) 2008-01-16 2012-11-20 International Business Machines Corporation Systems and arrangements of text type-ahead
US8019769B2 (en) * 2008-01-18 2011-09-13 Litera Corp. System and method for determining valid citation patterns in electronic documents
US20090198646A1 (en) * 2008-01-31 2009-08-06 International Business Machines Corporation Systems, methods and computer program products for an algebraic approach to rule-based information extraction
JP4613214B2 (en) * 2008-02-26 2011-01-12 日立オートモティブシステムズ株式会社 Software automatic configuration device
US8249856B2 (en) * 2008-03-20 2012-08-21 Raytheon Bbn Technologies Corp. Machine translation
KR101475339B1 (en) * 2008-04-14 2014-12-23 삼성전자주식회사 Communication terminal and method for unified natural language interface thereof
US8359532B2 (en) * 2008-04-28 2013-01-22 International Business Machines Corporation Text type-ahead
US8869140B2 (en) * 2008-05-09 2014-10-21 Sap Se Deploying software modules in computer system
US8738360B2 (en) 2008-06-06 2014-05-27 Apple Inc. Data detection of a character sequence having multiple possible data types
US8205242B2 (en) 2008-07-10 2012-06-19 Mcafee, Inc. System and method for data mining and security policy management
US9253154B2 (en) 2008-08-12 2016-02-02 Mcafee, Inc. Configuration management for a capture/registration system
CA2639438A1 (en) * 2008-09-08 2010-03-08 Semanti Inc. Semantically associated computer search index, and uses therefore
US20100083095A1 (en) * 2008-09-29 2010-04-01 Nikovski Daniel N Method for Extracting Data from Web Pages
TW201027375A (en) * 2008-10-20 2010-07-16 Ibm Search system, search method and program
US8489388B2 (en) * 2008-11-10 2013-07-16 Apple Inc. Data detection
US8306806B2 (en) * 2008-12-02 2012-11-06 Microsoft Corporation Adaptive web mining of bilingual lexicon
US8850591B2 (en) 2009-01-13 2014-09-30 Mcafee, Inc. System and method for concept building
US8706709B2 (en) 2009-01-15 2014-04-22 Mcafee, Inc. System and method for intelligent term grouping
US8190538B2 (en) * 2009-01-30 2012-05-29 Lexisnexis Group Methods and systems for matching records and normalizing names
US8473442B1 (en) 2009-02-25 2013-06-25 Mcafee, Inc. System and method for intelligent state management
US8433559B2 (en) * 2009-03-24 2013-04-30 Microsoft Corporation Text analysis using phrase definitions and containers
US8667121B2 (en) 2009-03-25 2014-03-04 Mcafee, Inc. System and method for managing data and policies
US8447722B1 (en) 2009-03-25 2013-05-21 Mcafee, Inc. System and method for data mining and security policy management
US8325974B1 (en) 2009-03-31 2012-12-04 Amazon Technologies Inc. Recognition of characters and their significance within written works
US8073718B2 (en) * 2009-05-29 2011-12-06 Hyperquest, Inc. Automation of auditing claims
US8346577B2 (en) 2009-05-29 2013-01-01 Hyperquest, Inc. Automation of auditing claims
US8447632B2 (en) * 2009-05-29 2013-05-21 Hyperquest, Inc. Automation of auditing claims
US8255205B2 (en) 2009-05-29 2012-08-28 Hyperquest, Inc. Automation of auditing claims
US8150695B1 (en) 2009-06-18 2012-04-03 Amazon Technologies, Inc. Presentation of written works based on character identities and attributes
US9189475B2 (en) * 2009-06-22 2015-11-17 Ca, Inc. Indexing mechanism (nth phrasal index) for advanced leveraging for translation
US8386498B2 (en) * 2009-08-05 2013-02-26 Loglogic, Inc. Message descriptions
US20110035210A1 (en) * 2009-08-10 2011-02-10 Benjamin Rosenfeld Conditional random fields (crf)-based relation extraction system
US8631028B1 (en) 2009-10-29 2014-01-14 Primo M. Pettovello XPath query processing improvements
US8458186B2 (en) 2009-11-06 2013-06-04 Symantec Corporation Systems and methods for processing and managing object-related data for use by a plurality of applications
US20110145240A1 (en) * 2009-12-15 2011-06-16 International Business Machines Corporation Organizing Annotations
US9047283B1 (en) * 2010-01-29 2015-06-02 Guangsheng Zhang Automated topic discovery in documents and content categorization
JP5656585B2 (en) * 2010-02-17 2015-01-21 キヤノン株式会社 Document creation support apparatus, document creation support method, and program
US8339094B2 (en) * 2010-03-11 2012-12-25 GM Global Technology Operations LLC Methods, systems and apparatus for overmodulation of a five-phase machine
US9760634B1 (en) 2010-03-23 2017-09-12 Firstrain, Inc. Models for classifying documents
US8463789B1 (en) 2010-03-23 2013-06-11 Firstrain, Inc. Event detection
US9460232B2 (en) * 2010-04-07 2016-10-04 Oracle International Corporation Searching document object model elements by attribute order priority
US8538916B1 (en) 2010-04-09 2013-09-17 Google Inc. Extracting instance attributes from text
JP2011232871A (en) * 2010-04-26 2011-11-17 Sony Corp Information processor, text selection method and program
US9858338B2 (en) * 2010-04-30 2018-01-02 International Business Machines Corporation Managed document research domains
US8457948B2 (en) * 2010-05-13 2013-06-04 Expedia, Inc. Systems and methods for automated content generation
US9418069B2 (en) 2010-05-26 2016-08-16 International Business Machines Corporation Extensible system and method for information extraction in a data processing system
US20110295864A1 (en) * 2010-05-29 2011-12-01 Martin Betz Iterative fact-extraction
GB201010545D0 (en) * 2010-06-23 2010-08-11 Rolls Royce Plc Entity recognition
US8527488B1 (en) * 2010-07-08 2013-09-03 Netlogic Microsystems, Inc. Negative regular expression search operations
US8468021B2 (en) * 2010-07-15 2013-06-18 King Abdulaziz City For Science And Technology System and method for writing digits in words and pronunciation of numbers, fractions, and units
TWI403304B (en) * 2010-08-27 2013-08-01 Ind Tech Res Inst Method and mobile device for awareness of linguistic ability
US8977538B2 (en) * 2010-09-13 2015-03-10 Richard Salisbury Constructing and analyzing a word graph
US8239349B2 (en) 2010-10-07 2012-08-07 Hewlett-Packard Development Company, L.P. Extracting data
US8312018B2 (en) * 2010-10-20 2012-11-13 Business Objects Software Limited Entity expansion and grouping
US9015033B2 (en) * 2010-10-26 2015-04-21 At&T Intellectual Property I, L.P. Method and apparatus for detecting a sentiment of short messages
US20120101980A1 (en) * 2010-10-26 2012-04-26 Microsoft Corporation Synchronizing online document edits
US8806615B2 (en) * 2010-11-04 2014-08-12 Mcafee, Inc. System and method for protecting specified data combinations
JP5197774B2 (en) * 2011-01-18 2013-05-15 株式会社東芝 Learning device, determination device, learning method, determination method, learning program, and determination program
EP2678822A4 (en) 2011-02-23 2014-09-10 Bottlenose Inc System and method for analyzing messages in a network or across networks
US8719692B2 (en) * 2011-03-11 2014-05-06 Microsoft Corporation Validation, rejection, and modification of automatically generated document annotations
JP2012212422A (en) * 2011-03-24 2012-11-01 Sony Corp Information processor, information processing method, and program
US9110883B2 (en) * 2011-04-01 2015-08-18 Rima Ghannam System for natural language understanding
US10048992B2 (en) * 2011-04-13 2018-08-14 Microsoft Technology Licensing, Llc Extension of schematized XML protocols
US20120265784A1 (en) * 2011-04-15 2012-10-18 Microsoft Corporation Ordering semantic query formulation suggestions
US8838992B1 (en) * 2011-04-28 2014-09-16 Trend Micro Incorporated Identification of normal scripts in computer systems
US20120303570A1 (en) * 2011-05-27 2012-11-29 Verizon Patent And Licensing, Inc. System for and method of parsing an electronic mail
US8630989B2 (en) 2011-05-27 2014-01-14 International Business Machines Corporation Systems and methods for information extraction using contextual pattern discovery
US9164983B2 (en) * 2011-05-27 2015-10-20 Robert Bosch Gmbh Broad-coverage normalization system for social media language
WO2013054348A2 (en) * 2011-07-20 2013-04-18 Tata Consultancy Services Limited A method and system for differentiating textual information embedded in streaming news video
US8745521B2 (en) * 2011-08-08 2014-06-03 The Original Software Group Limited System and method for annotating graphical user interface
WO2013026146A1 (en) * 2011-08-24 2013-02-28 Alexei Kamychev Method and apparatus for emulating short text fast-reading processes
US9785638B1 (en) 2011-08-25 2017-10-10 Infotech International Llc Document display system and method
US9633012B1 (en) 2011-08-25 2017-04-25 Infotech International Llc Construction permit processing system and method
US9116895B1 (en) 2011-08-25 2015-08-25 Infotech International Llc Document processing system and method
US8812301B2 (en) * 2011-09-26 2014-08-19 Xerox Corporation Linguistically-adapted structural query annotation
KR101510647B1 (en) * 2011-10-07 2015-04-10 한국전자통신연구원 Method and apparatus for providing web trend analysis based on issue template extraction
US8782042B1 (en) 2011-10-14 2014-07-15 Firstrain, Inc. Method and system for identifying entities
US20130110818A1 (en) * 2011-10-28 2013-05-02 Eamonn O'Brien-Strain Profile driven extraction
US9201868B1 (en) * 2011-12-09 2015-12-01 Guangsheng Zhang System, methods and user interface for identifying and presenting sentiment information
US8700561B2 (en) 2011-12-27 2014-04-15 Mcafee, Inc. System and method for providing data protection workflows in a network environment
US8832092B2 (en) 2012-02-17 2014-09-09 Bottlenose, Inc. Natural language processing optimized for micro content
CN102646128A (en) * 2012-03-06 2012-08-22 北京航空航天大学 Method for labeling word properties of emotional words based on extensible markup language (XML)
US9158754B2 (en) * 2012-03-29 2015-10-13 The Echo Nest Corporation Named entity extraction from a block of text
US20150082142A1 (en) * 2012-04-27 2015-03-19 Citadel Corporation Pty Ltd Method for storing and applying related sets of pattern/message rules
US20130298003A1 (en) * 2012-05-04 2013-11-07 Rawllin International Inc. Automatic annotation of content
US9569413B2 (en) * 2012-05-07 2017-02-14 Sap Se Document text processing using edge detection
CA2865186C (en) * 2012-05-15 2015-10-20 Whyz Technologies Limited Method and system relating to sentiment analysis of electronic content
US9684648B2 (en) * 2012-05-31 2017-06-20 International Business Machines Corporation Disambiguating words within a text segment
US9292505B1 (en) 2012-06-12 2016-03-22 Firstrain, Inc. Graphical user interface for recurring searches
US9009126B2 (en) 2012-07-31 2015-04-14 Bottlenose, Inc. Discovering and ranking trending links about topics
WO2014049186A1 (en) * 2012-09-26 2014-04-03 Universidad Carlos Iii De Madrid Method for generating semantic patterns
US9436660B2 (en) * 2012-11-16 2016-09-06 International Business Machines Corporation Building and maintaining information extraction rules
JP2014115894A (en) * 2012-12-11 2014-06-26 Canon Inc Display device
US10592480B1 (en) 2012-12-30 2020-03-17 Aurea Software, Inc. Affinity scoring
US20140236570A1 (en) * 2013-02-18 2014-08-21 Microsoft Corporation Exploiting the semantic web for unsupervised spoken language understanding
US10235358B2 (en) 2013-02-21 2019-03-19 Microsoft Technology Licensing, Llc Exploiting structured content for unsupervised natural language semantic parsing
US8762302B1 (en) 2013-02-22 2014-06-24 Bottlenose, Inc. System and method for revealing correlations between data streams
US9201860B1 (en) * 2013-03-12 2015-12-01 Guangsheng Zhang System and methods for determining sentiment based on context
US9262550B2 (en) 2013-03-15 2016-02-16 Business Objects Software Ltd. Processing semi-structured data
US9299041B2 (en) 2013-03-15 2016-03-29 Business Objects Software Ltd. Obtaining data from unstructured data for a structured data collection
US9218568B2 (en) 2013-03-15 2015-12-22 Business Objects Software Ltd. Disambiguating data using contextual and historical information
US9898523B2 (en) 2013-04-22 2018-02-20 Abb Research Ltd. Tabular data parsing in document(s)
US9460199B2 (en) 2013-05-01 2016-10-04 International Business Machines Corporation Application of text analytics to determine provenance of an object
WO2014194321A2 (en) * 2013-05-31 2014-12-04 Joshi Vikas Balwant Method and apparatus for browsing information
US10037317B1 (en) 2013-07-17 2018-07-31 Yseop Sa Techniques for automatic generation of natural language text
US9411804B1 (en) * 2013-07-17 2016-08-09 Yseop Sa Techniques for automatic generation of natural language text
US9747280B1 (en) * 2013-08-21 2017-08-29 Intelligent Language, LLC Date and time processing
US9639818B2 (en) 2013-08-30 2017-05-02 Sap Se Creation of event types for news mining for enterprise resource planning
US10541053B2 (en) 2013-09-05 2020-01-21 Optum360, LLC Automated clinical indicator recognition with natural language processing
US9916289B2 (en) * 2013-09-10 2018-03-13 Embarcadero Technologies, Inc. Syndication of associations relating data and metadata
US9898467B1 (en) * 2013-09-24 2018-02-20 Amazon Technologies, Inc. System for data normalization
US10133727B2 (en) 2013-10-01 2018-11-20 A-Life Medical, Llc Ontologically driven procedure coding
US10002117B1 (en) * 2013-10-24 2018-06-19 Google Llc Translating annotation tags into suggested markup
US8781815B1 (en) * 2013-12-05 2014-07-15 Seal Software Ltd. Non-standard and standard clause detection
US10073840B2 (en) 2013-12-20 2018-09-11 Microsoft Technology Licensing, Llc Unsupervised relation detection model training
RU2586577C2 (en) 2014-01-15 2016-06-10 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Filtering arcs parser graph
US9870356B2 (en) 2014-02-13 2018-01-16 Microsoft Technology Licensing, Llc Techniques for inferring the unknown intents of linguistic items
US10075484B1 (en) 2014-03-13 2018-09-11 Issuu, Inc. Sharable clips for digital publications
US9665617B1 (en) * 2014-04-16 2017-05-30 Google Inc. Methods and systems for generating a stable identifier for nodes likely including primary content within an information resource
US9665454B2 (en) 2014-05-14 2017-05-30 International Business Machines Corporation Extracting test model from textual test suite
US9659005B2 (en) 2014-05-16 2017-05-23 Semantix Technologies Corporation System for semantic interpretation
US9836765B2 (en) 2014-05-19 2017-12-05 Kibo Software, Inc. System and method for context-aware recommendation through user activity change detection
US9761222B1 (en) * 2014-06-11 2017-09-12 Albert Scarasso Intelligent conversational messaging
RU2674331C2 (en) * 2014-09-03 2018-12-06 Дзе Дан Энд Брэдстрит Корпорейшн System and process for analysis, qualification and acquisition of sources of unstructured data by means of empirical attribution
US9348806B2 (en) * 2014-09-30 2016-05-24 International Business Machines Corporation High speed dictionary expansion
US9454695B2 (en) * 2014-10-22 2016-09-27 Xerox Corporation System and method for multi-view pattern matching
US10148547B2 (en) * 2014-10-24 2018-12-04 Tektronix, Inc. Hardware trigger generation from a declarative protocol description
US9626358B2 (en) 2014-11-26 2017-04-18 Abbyy Infopoisk Llc Creating ontologies by analyzing natural language texts
EP3029607A1 (en) * 2014-12-05 2016-06-08 PLANET AI GmbH Method for text recognition and computer program product
US11140115B1 (en) * 2014-12-09 2021-10-05 Google Llc Systems and methods of applying semantic features for machine learning of message categories
US10176163B2 (en) * 2014-12-19 2019-01-08 International Business Machines Corporation Diagnosing autism spectrum disorder using natural language processing
US10706124B2 (en) * 2015-01-12 2020-07-07 Microsoft Technology Licensing, Llc Storage and retrieval of structured content in unstructured user-editable content stores
US10019437B2 (en) * 2015-02-23 2018-07-10 International Business Machines Corporation Facilitating information extraction via semantic abstraction
US20160299928A1 (en) * 2015-04-10 2016-10-13 Infotrax Systems Variable record size within a hierarchically organized data structure
US11010768B2 (en) * 2015-04-30 2021-05-18 Oracle International Corporation Character-based attribute value extraction system
US10102275B2 (en) 2015-05-27 2018-10-16 International Business Machines Corporation User interface for a query answering system
US11842802B2 (en) * 2015-06-19 2023-12-12 Koninklijke Philips N.V. Efficient clinical trial matching
US9824083B2 (en) * 2015-07-07 2017-11-21 Rima Ghannam System for natural language understanding
US9805025B2 (en) * 2015-07-13 2017-10-31 Seal Software Limited Standard exact clause detection
US9363149B1 (en) 2015-08-01 2016-06-07 Splunk Inc. Management console for network security investigations
US9516052B1 (en) 2015-08-01 2016-12-06 Splunk Inc. Timeline displays of network security investigation events
US10254934B2 (en) 2015-08-01 2019-04-09 Splunk Inc. Network security investigation workflow logging
US11157532B2 (en) * 2015-10-05 2021-10-26 International Business Machines Corporation Hierarchical target centric pattern generation
US9633048B1 (en) 2015-11-16 2017-04-25 Adobe Systems Incorporated Converting a text sentence to a series of images
US10146858B2 (en) 2015-12-11 2018-12-04 International Business Machines Corporation Discrepancy handler for document ingestion into a corpus for a cognitive computing system
US9842161B2 (en) * 2016-01-12 2017-12-12 International Business Machines Corporation Discrepancy curator for documents in a corpus of a cognitive computing system
US10176250B2 (en) 2016-01-12 2019-01-08 International Business Machines Corporation Automated curation of documents in a corpus for a cognitive computing system
US10268750B2 (en) * 2016-01-29 2019-04-23 Cisco Technology, Inc. Log event summarization for distributed server system
US9836451B2 (en) * 2016-02-18 2017-12-05 Sap Se Dynamic tokens for an expression parser
US10726054B2 (en) 2016-02-23 2020-07-28 Carrier Corporation Extraction of policies from natural language documents for physical access control
JP2017167433A (en) * 2016-03-17 2017-09-21 株式会社東芝 Summary generation device, summary generation method, and summary generation program
CN107342881B (en) * 2016-05-03 2021-03-19 中国移动通信集团四川有限公司 Northbound interface data processing method and device for operation and maintenance center
US11163806B2 (en) * 2016-05-27 2021-11-02 International Business Machines Corporation Obtaining candidates for a relationship type and its label
US11049190B2 (en) 2016-07-15 2021-06-29 Intuit Inc. System and method for automatically generating calculations for fields in compliance forms
US11222266B2 (en) 2016-07-15 2022-01-11 Intuit Inc. System and method for automatic learning of functions
US20180018322A1 (en) * 2016-07-15 2018-01-18 Intuit Inc. System and method for automatically understanding lines of compliance forms through natural language patterns
US10579721B2 (en) 2016-07-15 2020-03-03 Intuit Inc. Lean parsing: a natural language processing system and method for parsing domain-specific languages
US10725896B2 (en) 2016-07-15 2020-07-28 Intuit Inc. System and method for identifying a subset of total historical users of a document preparation system to represent a full set of test scenarios based on code coverage
US10169581B2 (en) 2016-08-29 2019-01-01 Trend Micro Incorporated Detecting malicious code in sections of computer files
US10769213B2 (en) * 2016-10-24 2020-09-08 International Business Machines Corporation Detection of document similarity
RU2636098C1 (en) * 2016-10-26 2017-11-20 Общество с ограниченной ответственностью "Аби Продакшн" Use of depth semantic analysis of texts on natural language for creation of training samples in methods of machine training
US10832000B2 (en) * 2016-11-14 2020-11-10 International Business Machines Corporation Identification of textual similarity with references
US10402499B2 (en) 2016-11-17 2019-09-03 Goldman Sachs & Co. LLC System and method for coupled detection of syntax and semantics for natural language understanding and generation
MX2019008257A (en) * 2017-01-11 2019-10-07 Koninklijke Philips Nv Method and system for automated inclusion or exclusion criteria detection.
CA2977847A1 (en) * 2017-01-27 2018-07-27 Hootsuite Media Inc. Automated extraction tools and their use in social content tagging systems
US10565498B1 (en) 2017-02-28 2020-02-18 Amazon Technologies, Inc. Deep neural network-based relationship analysis with multi-feature token model
US10579719B2 (en) * 2017-06-15 2020-03-03 Turbopatent Inc. System and method for editor emulation
US10713519B2 (en) * 2017-06-22 2020-07-14 Adobe Inc. Automated workflows for identification of reading order from text segments using probabilistic language models
US10740560B2 (en) 2017-06-30 2020-08-11 Elsevier, Inc. Systems and methods for extracting funder information from text
RU2665261C1 (en) * 2017-08-25 2018-08-28 Общество с ограниченной ответственностью "Аби Продакшн" Recovery of text annotations related to information objects
US11475209B2 (en) * 2017-10-17 2022-10-18 Handycontract Llc Device, system, and method for extracting named entities from sectioned documents
US10191975B1 (en) * 2017-11-16 2019-01-29 The Florida International University Board Of Trustees Features for automatic classification of narrative point of view and diegesis
US11055209B2 (en) * 2017-12-21 2021-07-06 Google Llc Application analysis with flexible post-processing
US10872122B2 (en) * 2018-01-30 2020-12-22 Government Of The United States Of America, As Represented By The Secretary Of Commerce Knowledge management system and process for managing knowledge
US11586955B2 (en) 2018-02-02 2023-02-21 Accenture Global Solutions Limited Ontology and rule based adjudication
JP7247460B2 (en) * 2018-03-13 2023-03-29 富士通株式会社 Correspondence Generating Program, Correspondence Generating Device, Correspondence Generating Method, and Translation Program
US10733389B2 (en) * 2018-09-05 2020-08-04 International Business Machines Corporation Computer aided input segmentation for machine translation
US10936809B2 (en) * 2018-09-11 2021-03-02 Dell Products L.P. Method of optimized parsing unstructured and garbled texts lacking whitespaces
US11295083B1 (en) * 2018-09-26 2022-04-05 Amazon Technologies, Inc. Neural models for named-entity recognition
US11238215B2 (en) 2018-12-04 2022-02-01 Issuu, Inc. Systems and methods for generating social assets from electronic publications
US10977289B2 (en) * 2019-02-11 2021-04-13 Verizon Media Inc. Automatic electronic message content extraction method and apparatus
US11048864B2 (en) * 2019-04-01 2021-06-29 Adobe Inc. Digital annotation and digital content linking techniques
US11030402B2 (en) 2019-05-03 2021-06-08 International Business Machines Corporation Dictionary expansion using neural language models
US11163956B1 (en) 2019-05-23 2021-11-02 Intuit Inc. System and method for recognizing domain specific named entities using domain specific word embeddings
WO2020240870A1 (en) * 2019-05-31 2020-12-03 日本電気株式会社 Parameter learning device, parameter learning method, and computer-readable recording medium
CN114616572A (en) * 2019-09-16 2022-06-10 多库加米公司 Cross-document intelligent writing and processing assistant
US11163954B2 (en) * 2019-09-18 2021-11-02 International Business Machines Corporation Propagation of annotation metadata to overlapping annotations of synonymous type
US11783128B2 (en) 2020-02-19 2023-10-10 Intuit Inc. Financial document text conversion to computer readable operations
US11625555B1 (en) 2020-03-12 2023-04-11 Amazon Technologies, Inc. Artificial intelligence system with unsupervised model training for entity-pair relationship analysis
US11074402B1 (en) * 2020-04-07 2021-07-27 International Business Machines Corporation Linguistically consistent document annotation
US11514321B1 (en) 2020-06-12 2022-11-29 Amazon Technologies, Inc. Artificial intelligence system using unsupervised transfer learning for intra-cluster analysis
US20210403036A1 (en) * 2020-06-30 2021-12-30 Lyft, Inc. Systems and methods for encoding and searching scenario information
US11423072B1 (en) 2020-07-31 2022-08-23 Amazon Technologies, Inc. Artificial intelligence system employing multimodal learning for analyzing entity record relationships
US11620558B1 (en) 2020-08-25 2023-04-04 Amazon Technologies, Inc. Iterative machine learning based techniques for value-based defect analysis in large data sets
CN112035408B (en) * 2020-09-01 2023-10-31 文思海辉智科科技有限公司 Text processing method, device, electronic equipment and storage medium
RU2751993C1 (en) * 2020-09-09 2021-07-21 Глеб Валерьевич Данилов Method for extracting information from unstructured texts written in natural language
US20220101873A1 (en) * 2020-09-30 2022-03-31 Harman International Industries, Incorporated Techniques for providing feedback on the veracity of spoken statements
CN112417161B (en) * 2020-11-12 2022-06-24 福建亿榕信息技术有限公司 Method and storage device for recognizing upper and lower relationships of knowledge graph based on mode expansion and BERT classification
CN112819622B (en) * 2021-01-26 2023-10-17 深圳价值在线信息科技股份有限公司 Information entity relationship joint extraction method and device and terminal equipment
EP4075320A1 (en) 2021-04-15 2022-10-19 Wonop Holding ApS A method and device for improving the efficiency of pattern recognition in natural language
CN113420149A (en) * 2021-06-30 2021-09-21 北京百度网讯科技有限公司 Data labeling method and device

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5794177A (en) * 1995-07-19 1998-08-11 Inso Corporation Method and apparatus for morphological analysis and generation of natural language text
US6279017B1 (en) * 1996-08-07 2001-08-21 Randall C. Walker Method and apparatus for displaying text based upon attributes found within the text
US6108698A (en) 1998-07-29 2000-08-22 Xerox Corporation Node-link data defining a graph and a tree within the graph
US6442545B1 (en) * 1999-06-01 2002-08-27 Clearforest Ltd. Term-level text with mining with taxonomies
US6601026B2 (en) * 1999-09-17 2003-07-29 Discern Communications, Inc. Information retrieval by natural language querying
JP3879350B2 (en) * 2000-01-25 2007-02-14 富士ゼロックス株式会社 Structured document processing system and structured document processing method
US7010479B2 (en) * 2000-07-26 2006-03-07 Oki Electric Industry Co., Ltd. Apparatus and method for natural language processing
SE524595C2 (en) 2000-09-26 2004-08-31 Hapax Information Systems Ab Procedure and computer program for normalization of style throws
US7330811B2 (en) * 2000-09-29 2008-02-12 Axonwave Software, Inc. Method and system for adapting synonym resources to specific domains
WO2002033584A1 (en) * 2000-10-19 2002-04-25 Copernic.Com Text extraction method for html pages
US6714939B2 (en) * 2001-01-08 2004-03-30 Softface, Inc. Creation of structured data from plain text
US6892189B2 (en) 2001-01-26 2005-05-10 Inxight Software, Inc. Method for learning and combining global and local regularities for information extraction and classification
US6813616B2 (en) * 2001-03-07 2004-11-02 International Business Machines Corporation System and method for building a semantic network capable of identifying word patterns in text
SE0101127D0 (en) 2001-03-30 2001-03-30 Hapax Information Systems Ab Method of finding answers to questions
US20020165717A1 (en) * 2001-04-06 2002-11-07 Solmer Robert P. Efficient method for information extraction
JP4843867B2 (en) * 2001-05-10 2011-12-21 ソニー株式会社 Document processing apparatus, document processing method, document processing program, and recording medium
US7013262B2 (en) * 2002-02-12 2006-03-14 Sunflare Co., Ltd System and method for accurate grammar analysis using a learners' model and part-of-speech tagged (POST) parser
JP2003242136A (en) * 2002-02-20 2003-08-29 Fuji Xerox Co Ltd Syntax information tag imparting support system and method therefor
EP1686499B1 (en) * 2002-06-28 2010-06-30 Nippon Telegraph and Telephone Corporation Selection and extraction of information from structured documents
US7139752B2 (en) * 2003-05-30 2006-11-21 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations
US20040243556A1 (en) * 2003-05-30 2004-12-02 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, and including a document common analysis system (CAS)
US7165216B2 (en) * 2004-01-14 2007-01-16 Xerox Corporation Systems and methods for converting legacy and proprietary documents into extended mark-up language format

Also Published As

Publication number Publication date
NZ547871A (en) 2010-03-26
EP1695170A2 (en) 2006-08-30
US20100195909A1 (en) 2010-08-05
AU2004294094B2 (en) 2010-05-13
EP1695170A4 (en) 2010-06-02
US7912705B2 (en) 2011-03-22
US20050108630A1 (en) 2005-05-19
CA2546896C (en) 2012-08-07
WO2005052727A2 (en) 2005-06-09
AU2004294094A1 (en) 2005-06-09
WO2005052727A3 (en) 2007-12-21

Similar Documents

Publication Publication Date Title
CA2546896A1 (en) Extraction of facts from text
Wang et al. Bootstrapping both product features and opinion words from chinese customer reviews with cross-inducing
Dragoni et al. Combining NLP approaches for rule extraction from legal documents
Councill et al. ParsCit: an Open-source CRF Reference String Parsing Package.
Yeniterzi Exploiting morphology in Turkish named entity recognition system
US11314807B2 (en) Methods and systems for comparison of structured documents
Freeman et al. Cross linguistic name matching in English and Arabic
CN101702944A (en) Semantic processor for recognizing whole-part relations in natural language documents
Sawalha et al. SALMA: standard Arabic language morphological analysis
Malmasi et al. Arabic native language identification
Dragoni et al. Combining natural language processing approaches for rule extraction from legal documents
Kocoń et al. Evaluating KGR10 Polish word embeddings in the recognition of temporal expressions using BiLSTM-CRF
Kumar et al. Sanskrit compound processor
Yusof et al. Qur'anic words stemming
Singh et al. Identification of languages and encodings in a multilingual document
Kaur et al. Evaluation of named entity features for Punjabi language
JP3744676B2 (en) Information extraction apparatus and method
Khalil et al. Extracting Arabic composite names using genitive principles of Arabic grammar
Bosch et al. Memory-based morphological analysis and part-of-speech tagging of Arabic
Suriyachay et al. Thai named entity tagged corpus annotation scheme and self verification
Raza et al. N-gram based authorship attribution in Urdu poetry
Broda et al. Towards a set of general purpose morphosyntactic tools for Polish
Kim et al. Annotated Bibliographical Reference Corpora in Digital Humanities.
Karkaletsis et al. Populating ontologies in biomedicine and presenting their content using multilingual generation
Hufflen Names in BibTeX and mlBibTeX

Legal Events

Date Code Title Description
EEER Examination request