|Publication number||US20090177471 A1|
|Application number||US 11/971,897|
|Publication date||Jul 9, 2009|
|Filing date||Jan 9, 2008|
|Priority date||Jan 9, 2008|
|Also published as||US8086455|
|Inventors||Yifan Gong, Ye Tian|
|Original Assignee||Microsoft Corporation|
Speech recognition model development processes involve performing a large number (e.g., 100) of processing steps. Each step is implemented by a processor that creates one or more outputs based on the consumption of inputs, which are the outputs of other processor steps. A problem is to decide which step is to be performed next. The process can be divided into several phases: planning, in which human experts define high-level tasks of the model development; modeling, in which software programs perform data processing and build the models; and model tuning, in which a number of modeling parameters are tested in order to optimize speech recognition performance.
A current planning phase requires the manual specification of the “predecessors” for each task. The author has to manually order the tasks to ensure that all inputs are prepared before executing a task. When there are a large number of tasks, creation and management of the plan become unreliable.
The modeling phase currently is implemented as rigid (e.g., hard-coded) procedural software. The work-flow is controlled by predesigned switches (e.g., in configuration files) that determine the sequence of model building actions.
The tuning phase currently is also implemented as rigid procedural software. In this phase, based on the intermediate data files or parameters that are tuned, some processes (and only those processes) must be activated. Since the work-flow is controlled by predesigned switches, satisfying this requirement means changing the code, which is difficult.
This approach has several limitations. By design the developer has control over which step to perform or not to perform by turning on/off switches in a control file. Consequently, the subsequent steps are not codified in the process and there is no enforcement to complete all the steps. Moreover, to add new functionality or an update to the tool, the whole training process has to be regression tested. This is expensive and a principal reason why deploying new technology can be costly. Additionally, if the modeling process is interrupted, there is no mechanism to guarantee that the restart will perform only the needed steps without duplicated or missing steps. The tuning stage is more complex because once an input file or a configuration is changed, it is difficult to determine which components are affected and should be rebuilt. Finally, there is no automatic way to know how many subcomponents will be affected if one input is missing.
The following presents a simplified summary in order to provide a basic understanding of some novel embodiments described herein. This summary is not an extensive overview, and it is not intended to identify key/critical elements or to delineate the scope thereof. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
The disclosed architecture includes a recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Rather than hard-coded predefined planning, training and tuning sequences, process steps are defined as individual processors having input/output data relationships and data dependencies of predecessors and subsequent process steps. A declarative language is provided that allows a user to declare each processing step, in terms of “action” (processor), “input data”, “output data”, “duration”, and “resource”. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationship (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definition and data being produced by more than one action. The user can add, change and/or modify a process by editing a declaration file, and rerunning the compiler, thereby a new process is automatically generated.
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles disclosed herein can be employed and are intended to include all such aspects and equivalents. Other advantages and novel features will become apparent from the following detailed description when considered in conjunction with the drawings.
The disclosed architecture is a computational model that is declarative and utilizes data-dependency driven programming (referred to herein as 4D). The architecture finds particular application to complex data conversion problems involving intermediate data with complex dependencies and many processing steps. For example, the development of recognition models (e.g., speech, handwriting, character, language, etc.) for authoring, generation, and execution can obtain the benefit of the disclosed architecture. A process is defined by, for each processing step, the data dependencies and the input-to-output mapping. The architecture is declarative in a fundamental and basic way such that the end-user can do the programming, rather than requiring specialized employees or vendor involvement to do the programming. For each step, all that is needed is the input data, the desired output data and a processor to process the input data to get the output data. The user does not need to describe the order in which the processes are executed.
The infrastructure includes a set of terminal processors that perform the step-wise conversion, a language that captures the dependencies, and a compiler (or other ordering component) that determines the process order for arriving at the desired output. The task description is defined using the 4D language to describe the processor, data, and control to generate the output data or results. The user explicitly declares the steps in terms of data input and output. The declaration includes the information leading to the determination of an order of steps, based on the dependencies of the data. For example, if processor P takes X as input and produces Y as output, then Y depends on X and P, and the processor that generates X will be executed before processor P. The solution derives the sequence of execution automatically based on the dependencies among the pieces of data. The processors perform data processing independently of other processors. The process generator takes the 4D language and the processors to generate a process graph. A graph interpreter executes the process based on the process graph and recursively applies the processor(s) to instantiate the data.
Reference is now made to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding thereof. It may be evident, however, that the novel embodiments can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate a description thereof.
The language utilized in the programming component 102 allows an author to declare the input data and output data of each elementary processing step (a processor) in a recognition (e.g., speech) model development process of multiple processing steps. The declaration does not need to provide a predetermined sequence of steps as in conventional procedural languages. Overall, the system 100 (and other systems described herein) facilitates the piece-wise specification of the model development process in which the elementary chunk is a rule defined by a processor, and the processor inputs and outputs. The framework automatically generates the sequence of recognition model development actions based on the declarations. By following the sequence, it is guaranteed that all the steps will have inputs available before being executed.
The system 100 also generates the predecessor steps (that are completed) of each step based on the declarations. As part of the system processes, a model development process graph is automatically generated showing the inputs and outputs of each of the steps. The steps are ordered as defined by data consumption relationships (data dependencies) described by the declarations. No manual programming work is involved for the ordering. All the model development data inputs of a process are verified to be available before executing the process. The system 100 also validates the consistency of the process based on the model development process declarations.
A generated sequence of actions is interpreted and any model development process described by the declaration can be implemented. Any arbitrary development process can be implemented without procedural changes via execution of a fixed set of steps and based on the declarations. Therefore no coding and testing is needed when the model development process declaration is changed. Moreover, recognition modeling processes do not need recoding and retesting when new processors are added, when the process is changed or extended, or when new events have occurred.
As previously indicated, the programming component 102 facilitates the declaration of a processor and processor input/output data as a task description. The task description includes variables, constants, macro declarations, processors and data-dependencies (e.g., hierarchical with local variables) and deliverables (e.g., terminal recognition components). This allows the author of the model development process to describe, in a declarative language, each piece of data and the processing step to obtain the data. The final model development outcome (deliverables) is a set of such data. The declaration contains information on the model development process, which allows the determination of the order of execution.
The system 200 further includes a compiler 204 for the 4D language. Based on the model development task description, the compiler 204 generates an initial model development process graph. In this graph, any data or processor (of the processors 202) is represented as a node, and the edges represent the dependencies.
A validation component 206 is provided for validating that a graph qualifies as a data-dependence graph. Not all graphs qualify as a data-dependence graph. Qualifying graphs satisfy certain requirements, such as that data should not depend on more than one processor. A detection component 208 detects cyclic graphs, which indicate that the declaration contains an item that is defined in terms of itself.
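The two checks described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the declaration format (a dictionary mapping a processor name to its input and output data names) and the function names `find_multiply_produced` and `has_cycle` are assumptions introduced here for the example.

```python
from collections import defaultdict

# Assumed (hypothetical) declaration format:
#   processor name -> (list of input data names, list of output data names)


def find_multiply_produced(declarations):
    """Return the data items that are produced by more than one processor."""
    producers = defaultdict(list)
    for processor, (_inputs, outputs) in declarations.items():
        for item in outputs:
            producers[item].append(processor)
    return {item: sorted(procs) for item, procs in producers.items() if len(procs) > 1}


def has_cycle(declarations):
    """Return True if some data item depends, directly or indirectly, on itself."""
    # depends_on[y] holds the data items that y is computed from.
    depends_on = defaultdict(set)
    for _processor, (inputs, outputs) in declarations.items():
        for out in outputs:
            depends_on[out].update(inputs)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = defaultdict(int)

    def visit(node):
        state[node] = GREY
        for dep in depends_on.get(node, ()):
            if state[dep] == GREY:  # reached a node already on the current path
                return True
            if state[dep] == WHITE and visit(dep):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in list(depends_on))
```

A declaration set would be accepted only if `find_multiply_produced` returns an empty dictionary and `has_cycle` returns `False`.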
The execution component 108 can include a generator component 210 for automatically generating the ordered list of processors so that for any given processor, when the processor is executed, all of the processor input data is available. This generation process can be as follows: find the nodes with no exiting edges, record these nodes in the ordered action list, delete the edges going from any node to these nodes, and repeat until all nodes are recorded.
An interpreter component 212 is provided that has access to the ordered list of processors. The interpreter component 212 examines the dates of data and determines the next processor(s) to be executed based on the date(s). The execution stops when the dates of all the data are consistent across the system.
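The generation procedure can be sketched in a few lines. The sketch assumes that an edge leaving a node points at something the node depends on, so that a node with no exiting edges has nothing left to wait for. The function name `order_nodes` and the dictionary representation of the graph are hypothetical, and the error raised when no progress is possible is an addition of this example.

```python
def order_nodes(edges):
    """Order graph nodes so that every node comes after everything it depends on.

    edges: dict mapping a node to the set of nodes it depends on
           (its exiting edges, under the assumption stated above).
    """
    remaining = {node: set(deps) for node, deps in edges.items()}
    # Nodes that appear only as dependencies have no exiting edges of their own.
    for deps in edges.values():
        for dep in deps:
            remaining.setdefault(dep, set())

    ordered = []
    while remaining:
        # Find the nodes with no exiting edges (sorted only for a stable result).
        ready = sorted(node for node, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cyclic dependency among: " + ", ".join(sorted(remaining)))
        # Record these nodes in the ordered action list.
        ordered.extend(ready)
        for node in ready:
            del remaining[node]
        # Delete the edges going from any node to the recorded nodes.
        for deps in remaining.values():
            deps.difference_update(ready)
    return ordered
```

With data and processors both represented as nodes, for example `{"PA": {"A", "B"}, "D": {"PA"}}`, the processors can be read off the result in an order in which each one finds its inputs available.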
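One possible reading of this date-based interpretation, similar in spirit to timestamp-driven build tools, is sketched below. The rule used here (a processor is re-run when an output is missing or older than its newest input) is an assumption, as are the function name `run_until_consistent`, its parameters, and the in-memory `dates` dictionary that stands in for file dates.

```python
def run_until_consistent(ordered_processors, dates, run, clock):
    """Run only the processors whose outputs are missing or out of date.

    ordered_processors: list of (name, inputs, outputs), already in dependency order.
    dates: dict mapping a data name to its timestamp; a missing key means the
           data does not exist yet.
    run: callable invoked with the processor name to do the actual work.
    clock: callable returning the current timestamp, used to stamp new outputs.
    Returns the names of the processors that were executed.
    """
    executed = []
    for name, inputs, outputs in ordered_processors:
        missing = [item for item in inputs if item not in dates]
        if missing:
            raise ValueError(f"{name}: missing input(s) {missing}")
        newest_input = max((dates[item] for item in inputs), default=float("-inf"))
        stale = any(out not in dates or dates[out] < newest_input for out in outputs)
        if stale:
            run(name)
            now = clock()
            for out in outputs:
                dates[out] = now  # outputs are now newer than all inputs
            executed.append(name)
    return executed
```

Because the list is already in dependency order, a single pass leaves every output at least as new as its inputs, and a second call executes nothing.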
The system 200 includes a processor registration process which ensures that a processor declared in the task description file is known to the system. The processors of the recognition model development process can be existing tools, executables, and scripts written in various languages, one or more of which can be wrapped into a consistent interface for activation by the runtime process interpreter component 212.
Conventional procedural code for the above processes could be the following:
If (A is missing, or B is missing)
If (A is changed, or B is changed, or D does not exist)
call process PA
Save the date and time of A and B
If (B is missing, or C is missing)
If (B is changed, or C is changed, or E does not exist)
call process PB
Save the date and time of B and C
If (D is missing, or E is missing, or C is missing)
If (D is changed, or E is changed, or C is changed, or F does not exist)
call process PC
Save the date and time of D and E
In contrast, following is a sample of declarations that can be employed to perform the same processes and obtain the same results.
LTS process PA:
(DATA A, DATA B, DATA C)
[call process PA]
LTS process PB:
(DATA C, DATA B, DATA A)
[call process PB]
LTS process PC:
(DATA D, DATA E, DATA C)
[call process PC]
For example, Data D depends on DATA A and DATA B, and is generated by process PA. No manual programming is involved. The compiler will order the tasks based on the dependencies among the data.
Clearly, a considerable amount of additional code is required procedurally to describe which processes need to be re-executed when/if some input data is changed. Code is also required to see if the input files are ready. A tool is required to parse this code file and build a model graph. According to the graph, once a file is updated, the necessary process is automatically activated until the output is generated.
Comparing the procedural code with the declarative code above, it is easy to see that the declarative code is much clearer and is easier for the end-user to maintain.
<identifier> '[' <attributes> ']' ':' '(' <a b c ...> ')' '(' <i j k ...> ')' '[algorithm]'
where <a b c . . . > is input data, including data to be converted and control parameters, <i j k . . . > is output data, and ‘algorithm’ defines the operations for generating the output data from the input data.
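To make the declaration form concrete, the following sketch parses a single-line declaration of that shape with a regular expression. It is an illustration only: the assumptions that a declaration fits on one line, that items are separated by whitespace, and that attributes are kept as opaque tokens are not taken from the source, and `parse_declaration` is a hypothetical name.

```python
import re

# identifier [attributes] : (inputs) (outputs) [algorithm]
DECLARATION = re.compile(
    r"""^\s*
        (?P<identifier>\w+)\s*
        \[(?P<attributes>[^\]]*)\]\s*
        :\s*
        \((?P<inputs>[^)]*)\)\s*
        \((?P<outputs>[^)]*)\)\s*
        \[(?P<algorithm>[^\]]*)\]\s*$""",
    re.VERBOSE,
)


def parse_declaration(text):
    """Split one declaration into its five parts (assumed single-line form)."""
    match = DECLARATION.match(text)
    if match is None:
        raise ValueError(f"not a valid declaration: {text!r}")
    return {
        "identifier": match["identifier"],
        "attributes": match["attributes"].split(),
        "inputs": match["inputs"].split(),
        "outputs": match["outputs"].split(),
        "algorithm": match["algorithm"].strip(),
    }
```

For instance, under these assumptions the made-up line `PA [duration=2h] : (A B) (D) [call process PA]` yields the inputs `A` and `B`, the output `D`, and the algorithm text `call process PA`.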
The model task description 512 is input to a 4D language compiler process 514 for compiling into an initial model development process graph 516. The initial graph is then validated 518 for all processors to ensure that each processor only has a single parent process. The validation process 518 creates a first verified graph 520. Next, a detection process analyzes the first verified graph 520 for cyclic processes (a process that refers back to itself). The output is a second verified graph 524 free of cyclic processes.
The build 500 also includes a task wrapper 526 for wrapping tools, a tool library and other functions 528 for interfacing to processors 530. The processors of the speech model development process build 500 are existing tools, executables, and scripts written in various languages, which are wrapped into a consistent interface for activation by the run-time process interpreter. Processor registration 532 ensures that a processor declared in the task description 512 is known to the system. Terminal processors 534 are then created and input to a process generator 536, along with the second verified graph 524. The process generator determines logical dependencies between processors by exploring the data dependencies, applies formal methods, creates project files (in a project management application), and inserts a post-processor validation/check point. The methods include verifying type and data consistency (e.g., unspecified/over-specified data), ordering and parallelizing processors, and optimizing sequences. The project files can include schedules, linguist work and development and test work.
An output of the process generator 536 is the desired source data 538, which is passed to a runtime process interpreter 540. The interpreter 540 also receives model building dated data 542, access to a runtime DLL library 544, the second verified graph 524, and the terminal processors 534. Outputs of the interpreter 540 include the speech model components 546 and a log per processor instance 548.
The corresponding directed graph 604 for the existing code excludes the circled portion 606. Adding the processor above in bolded code and the input/output data mappings causes the generation of the new process in the circled portion 606 of the directed graph 604. Thus, the simplicity for applying new processes is now available to the naive end-user. Moreover, rather than adding new processors, a process can be changed or extended, and/or adjustments made when new events occur.
Following is a series of flow charts representative of exemplary methodologies for performing novel aspects of the disclosed architecture. While, for purposes of simplicity of explanation, the one or more methodologies shown herein, for example, in the form of a flow chart or flow diagram, are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance therewith, occur in a different order and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all acts illustrated in a methodology may be required for a novel implementation.
The application can, based on the task declaration and given a set of output data, deduce all inputs that are needed by the processors to produce the output data (without actually running the processors).
The application can, based on the task declaration and given a set of input data, deduce all possible output data that can be produced by the processors (without actually running the processors).
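Both deductions can be made from the declarations alone, as the following sketch suggests. It assumes a hypothetical declaration format (processor name mapped to input and output data names) and, for the first deduction, reports only the primary inputs, meaning data that no declared processor produces. The function names are illustrative.

```python
def required_inputs(declarations, targets):
    """Primary inputs needed to obtain the target data, found by walking backwards."""
    # Map each produced data item to the inputs of the processor that produces it.
    made_from = {
        out: inputs
        for inputs, outputs in declarations.values()
        for out in outputs
    }
    needed, seen, stack = set(), set(), list(targets)
    while stack:
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        if item in made_from:
            stack.extend(made_from[item])  # keep tracing through the producer
        else:
            needed.add(item)  # nothing produces it, so it must be supplied
    return needed


def producible_outputs(declarations, available):
    """All data that the processors could produce starting from the available data."""
    known = set(available)
    changed = True
    while changed:
        changed = False
        for inputs, outputs in declarations.values():
            # A processor can fire once all of its inputs are known.
            if set(inputs) <= known and not set(outputs) <= known:
                known.update(outputs)
                changed = True
    return known - set(available)
```

Neither function runs a processor; each only inspects the declared input/output relationships.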
As used in this application, the terms “component” and “system” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component can be, but is not limited to being, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers.
Referring now to
Generally, program modules include routines, programs, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the inventive methods can be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, mainframe computers, as well as personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, and the like, each of which can be operatively coupled to one or more associated devices.
The illustrated aspects can also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules can be located in both local and remote memory storage devices.
A computer typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer and includes volatile and non-volatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media can comprise computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital video disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
With reference again to
The system bus 1008 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. The system memory 1006 can include non-volatile memory (NON-VOL) 1010 and/or volatile memory 1012 (e.g., random access memory (RAM)). A basic input/output system (BIOS) can be stored in the non-volatile memory 1010 (e.g., ROM, EPROM, EEPROM, etc.), which BIOS stores the basic routines that help to transfer information between elements within the computer 1002, such as during start-up. The volatile memory 1012 can also include a high-speed RAM such as static RAM for caching data.
The computer 1002 further includes an internal hard disk drive (HDD) 1014 (e.g., EIDE, SATA), which internal HDD 1014 may also be configured for external use in a suitable chassis, a magnetic floppy disk drive (FDD) 1016 (e.g., to read from or write to a removable diskette 1018) and an optical disk drive 1020 (e.g., to read a CD-ROM disk 1022 or to read from or write to other high capacity optical media such as a DVD). The HDD 1014, FDD 1016 and optical disk drive 1020 can be connected to the system bus 1008 by a HDD interface 1024, an FDD interface 1026 and an optical drive interface 1028, respectively. The HDD interface 1024 for external drive implementations can include one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.
The drives and associated computer-readable media provide nonvolatile storage of data, data structures, computer-executable instructions, and so forth. For the computer 1002, the drives and media accommodate the storage of any data in a suitable digital format. Although the description of computer-readable media above refers to a HDD, a removable magnetic diskette (e.g., FDD), and a removable optical media such as a CD or DVD, it should be appreciated by those skilled in the art that other types of media which are readable by a computer, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the exemplary operating environment, and further, that any such media may contain computer-executable instructions for performing novel methods of the disclosed architecture.
A number of program modules can be stored in the drives and volatile memory 1012, including an operating system 1030, one or more application programs 1032, other program modules 1034, and program data 1036. The one or more application programs 1032, other program modules 1034, and program data 1036 can include the programming component 102, the recognition model build steps 104, the processors 202, the execution component 108, the recognition model actions 110, the compiler 204, the validation component 206, the detection component 208, the generator component 210, the interpreter component 212, and the build 500, for example.
All or portions of the operating system, applications, modules, and/or data can also be cached in the volatile memory 1012. It is to be appreciated that the disclosed architecture can be implemented with various commercially available operating systems or combinations of operating systems.
A user can enter commands and information into the computer 1002 through one or more wire/wireless input devices, for example, a keyboard 1038 and a pointing device, such as a mouse 1040. Other input devices (not shown) may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, touch screen, or the like. These and other input devices are often connected to the processing unit 1004 through an input device interface 1042 that is coupled to the system bus 1008, but can be connected by other interfaces such as a parallel port, IEEE 1394 serial port, a game port, a USB port, an IR interface, etc.
A monitor 1044 or other type of display device is also connected to the system bus 1008 via an interface, such as a video adaptor 1046. In addition to the monitor 1044, a computer typically includes other peripheral output devices (not shown), such as speakers, printers, etc.
The computer 1002 may operate in a networked environment using logical connections via wire and/or wireless communications to one or more remote computers, such as a remote computer(s) 1048. The remote computer(s) 1048 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 1002, although, for purposes of brevity, only a memory/storage device 1050 is illustrated. The logical connections depicted include wire/wireless connectivity to a local area network (LAN) 1052 and/or larger networks, for example, a wide area network (WAN) 1054. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, for example, the Internet.
When used in a LAN networking environment, the computer 1002 is connected to the LAN 1052 through a wire and/or wireless communication network interface or adaptor 1056. The adaptor 1056 can facilitate wire and/or wireless communications to the LAN 1052, which may also include a wireless access point disposed thereon for communicating with the wireless functionality of the adaptor 1056.
When used in a WAN networking environment, the computer 1002 can include a modem 1058, or is connected to a communications server on the WAN 1054, or has other means for establishing communications over the WAN 1054, such as by way of the Internet. The modem 1058, which can be internal or external and a wire and/or wireless device, is connected to the system bus 1008 via the input device interface 1042. In a networked environment, program modules depicted relative to the computer 1002, or portions thereof, can be stored in the remote memory/storage device 1050. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
The computer 1002 is operable to communicate with wire and wireless devices or entities using the IEEE 802 family of standards, such as wireless devices operatively disposed in wireless communication (e.g., IEEE 802.11 over-the-air modulation techniques) with, for example, a printer, scanner, desktop and/or portable computer, personal digital assistant (PDA), communications satellite, any piece of equipment or location associated with a wirelessly detectable tag (e.g., a kiosk, news stand, restroom), and telephone. This includes at least Wi-Fi (or Wireless Fidelity), WiMax, and Bluetooth™ wireless technologies. Thus, the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices. Wi-Fi networks use radio technologies called IEEE 802.11x (a, b, g, etc.) to provide secure, reliable, fast wireless connectivity. A Wi-Fi network can be used to connect computers to each other, to the Internet, and to wire networks (which use IEEE 802.3-related media and functions).
What has been described above includes examples of the disclosed architecture. It is, of course, not possible to describe every conceivable combination of components and/or methodologies, but one of ordinary skill in the art may recognize that many further combinations and permutations are possible. Accordingly, the novel architecture is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.
|U.S. Classification||704/251, 704/E15.001|
|Cooperative Classification||G06K9/6253, G06F2209/485, G10L15/28, G06Q10/0637|
|European Classification||G06K9/62B5, G06Q10/0637, G10L15/28|
|Jan 9, 2008||AS||Assignment|
Owner name: MICROSOFT CORPORATION, WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GONG, YIFAN;TIAN, YE;REEL/FRAME:020344/0616
Effective date: 20080104
|Dec 9, 2014||AS||Assignment|
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034542/0001
Effective date: 20141014
|May 26, 2015||FPAY||Fee payment|
Year of fee payment: 4