METHODS AND SYSTEMS FOR GENERATING NATURAL LANGUAGE DESCRIPTIONS FROM DATA
REFERENCE TO RELATED APPLICATION
This application claims priority from Provisional Application Ser. No. 60/644,438, filed on Jan. 14, 2005, which is expressly incorporated by reference herein.
DESCRIPTION OF THE RELATED ART
Data-to-Speech (DTS) generally refers to any system that converts structured data into human language, regardless of whether its output is written, spoken, or otherwise delivered. Existing DTS software systems generally have limitations that may affect their applicability in differing environments or contexts to generate human language text in a format desired by the users. Some DTS systems, such as GoalGetter and D2S, are highly specific to a single domain or subject matter, and may not be easily adapted for use with other subject matters. Such domain-specificity is present in the GoalGetter system that generates descriptions of soccer games in Dutch. This is also an example of a system that is limited to generating text and/or speech in one language. Further, in most template-based systems, the DTS software is only capable of generating a finite number of distinct sentences dependent on the number of templates provided in the DTS software. Moreover, many existing DTS systems typically generate text and/or speech with a fixed structure, such as detailing events in chronological order. Therefore, a need exists for a DTS system that overcomes the limitations of domain-specificity, language-specificity, finiteness and fixed structure.
In addition, existing DTS systems typically limit user intervention to the selection of a single item or event to be described, and provide an objective factual description of the selected item or event (e.g., a summary of the events in a soccer game, such as goals, fouls, players' actions, and the like). Consequently, a need also exists for a DTS system that may allow users to select any number of items to be described and/or compared, and to specify the user's own preferences as to how the items' information should be evaluated in relation to one another, to provide highly varied descriptions and/or comparisons based on the user's choices and preferences.
In retail commerce, many vendors, whether they are brick-and-mortar retailers, catalog distributors, E-commerce providers with retail websites, and the like, offer a wide variety of goods for sale to their customers. In many instances, the vendors offer the same product or type of product in many different models from many different manufacturers, thereby providing the customer with numerous options such that the customer may select the particular product that most closely satisfies their needs and preferences. Such products include consumer electronics devices, computers and peripheral devices, large and small appliances, automobiles, motorcycles, furniture and the like. Each product or type of product has associated specifications that serve to define the product and provide a basis for evaluating the product against other similar products that may be purchased by the consumer. Despite the availability of the product specifications for use in the education of the consumer about the product and comparison to similar products, consumers may not be able to efficiently process the information in a manner that allows the consumer to be fully informed about the product and to meaningfully compare similar products and draw meaningful distinctions, especially in areas where the consumer is relatively unsophisticated.
Various sources exist that provide descriptions and comparisons for products to assist consumers in their purchasing decisions. Of course, the product manufacturers provide descriptions of their products that may include comparisons to comparable products offered by themselves and other manufacturers. As these are sales aids, the descriptions will be written in a manner to persuade the consumer to purchase the product, and generally will only include comparisons tending to show the superiority of the product over other products. Consumer Reports is a source for product reviews and ratings of consumer products based on testing performed on the products, and may include comparisons, both favorable and unfavorable, to comparable products based on the results of the testing. While this information is less biased than that provided by the manufacturer, it may be limited to the extent that only a limited number of comparable products may be included in the testing, and that the particular test performed on the products and reported when the reviews are drafted may not include all the information about the product that may be relevant to the consumer in making their purchasing decisions. Still further, in the area of automobiles, services such as Edmunds provide reviews of automobiles that may be viewed on-line or purchased and downloaded. These may include professionally written vehicle reviews, but the reviews are limited to only those vehicles that actually had an editor sit in, drive and live with the reviewed vehicles, and typically do not include comparisons to other vehicles.
With so many different options available for these types of consumer products, it is not feasible for a product reviewing source, let alone an individual consumer, to compare the specifications of each product against the specifications of each comparable product in a manner that is meaningful to consumers, and to generate a textual description of each comparison. Therefore, a need also exists for a system for generating natural language descriptions and comparisons of items within a domain, such as similar products, services, fields of endeavor, and the like, based on specification data for the items such that the descriptions and comparisons may be generated for any desired combination of the items within the domain.
The invention is directed to a natural language generation (NLG) software system that generates rich, content-sensitive human language descriptions based on unparsed raw domain-specific data. In one embodiment, the NLG software system may include a data parser/normalizer, a comparator, a language engine, and a document generator. The data parser/normalizer may be configured to retrieve specification information for items to be described by the NLG software system, to extract pertinent information from the raw specification information, and to convert and normalize the extracted information so that the items may be compared specification by specification. The comparator may be configured to use the normalized data from the data parser/normalizer to compare the specifications of the items using comparison functions and interpretation rules to determine outcomes of the comparisons. The language engine may be configured to cycle through all or a subset of the normalized specification information, to retrieve all sentence templates associated with each of the item specifications, to call the comparator to compute or retrieve the results of the comparisons between the item specifications, and to recursively generate every possible syntactically legal sentence associated with the specifications based on the retrieved sentence templates. The document generator may be configured to select one or more