US 20070015121 A1
Language learning systems and methods may be provided. An interactive lesson module may be provided. The interactive lesson module may be configured to provide an interactive language lesson that prompts a user to repeat, translate, or define words or phrases, or to provide words corresponding to images, at a controllable difficulty level. An interactive social simulation module may be provided. The interactive social simulation module may be configured to provide an interactive environment that requires the user to use the language to communicate with a virtual character to achieve a goal at a controllable difficulty level. A learner model module may be provided. The learner model may be configured to control the difficulty level of the interactive language lesson and the interactive social simulation based on a lesson progress report and a simulation progress report.
1. A language learning system, comprising:
an interactive lesson module configured to:
provide an interactive language lesson that prompts a user to repeat, translate or define words or phrases, or to provide words corresponding to images, at a controllable difficulty level; and
generate a lesson progress report indicative of the success of the user's language learning based on the user's interaction during the interactive language lesson;
an interactive social simulation module configured to:
provide an interactive environment that requires the user to use the language to communicate with a virtual character to achieve a goal at a controllable difficulty level; and
generate a simulation progress report indicative of the success of the user's language learning based on the user's interaction with the interactive environment; and
a learner model module configured to:
receive the lesson progress report and the simulation progress report; and
control the difficulty level of the interactive language lesson and the interactive social simulation based on the lesson progress report and the simulation progress report.
2. A language learning system, comprising:
an interactive lesson module configured to:
provide an interactive language lesson that prompts a user to repeat, translate or define words or phrases, or to provide words corresponding to images; and
generate a lesson progress report indicative of the success of the user's language learning based on the user's interaction during the interactive language lesson; and
an interactive social simulation module configured to:
provide an interactive environment that requires the user to use the language to communicate with a virtual character to achieve a goal; and
generate a simulation progress report indicative of the success of the user's language learning based on the user's interaction with the interactive environment.
3. The language learning system of
4. The language learning system of
5. The language learning system of
allow the user to control the distance between the virtual character and another character in the simulation that is representative of the user; and
evaluate the distance between the virtual character and the other character.
6. The language learning system of
7. The language learning system of
provide an interactive environment that simulates an environment of a specific culture; and
evaluate whether the gesture is consistent with social norms for that culture.
8. The language learning system of
9. The language learning system of
10. The language learning system of
11. The language learning system of
12. The language learning system of
13. The language learning system of
14. The language learning system of
15. The language learning system of
16. The language learning system of
17. The language learning system of
18. The language learning system of
19. The language learning system of
20. The language learning system of
21. The language learning system of
22. The learning system of
23. The language learning system of
24. The language learning system of
25. The language learning system of
26. The language learning system of
This application is based upon and claims priority to U.S. Provisional Patent Application Ser. No. 60/686,900, entitled “Tactical Language Training System,” filed Jun. 2, 2005, attorney docket number 28080-168, the entire content of which is incorporated herein by reference.
The application of communication skills, such as learning foreign languages and cultures, learning other skills where face to face communication plays a key role (including law enforcement and clinical practice), conducting plant safety inspections, and providing customer service.
2. Description of Related Art
Methods and products for teaching foreign languages are known. One such product is called Rosetta Stone. It presents images, spoken utterances, and written phrases, and has the user indicate which image matches which spoken utterance or phrase. It has some ability to generate feedback on the learner's speech, by presenting spectrograms of the learner's speech which the learner must then analyze and compare with spectrograms of native speakers.
Another product that is used to teach foreign languages is the TeLL me More product series. It includes lesson pages that present language material. It includes some structured dialog practice, where the learner hears an utterance and sees it in printed form, sees a set of possible responses (typically two to four), and selects one of the presented responses. The choices may not vary according to the learner's level of proficiency. This may differ from real conversation since, in real conversation, speakers are not given preset choices of things to say at each turn in the conversation, but instead may decide for themselves what to say and how to say it.
Virtual Conversations provides a form of conversational interaction. The product plays a video clip of a person speaking, and then presents a small set of written responses. The user can read one of the presented responses into the microphone, and if the system recognizes the user's speech, the system will play another video clip based upon that response.
The MILT prototype language learning system also supports a form of conversational interaction. MILT displays an on-screen character in a room or other environment. The user can speak a series of commands for the system to carry out, such as commands to walk forward, pick up an object, etc. In response the character can either carry out the command or reply indicating that it did not understand the command.
Interactive games like Herr Kommissar 1.5 emulate dialog with a computer character via text. The game includes some language instruction, but presumes that the learner already has some ability in the language. The language instruction that is included interrupts the flow of the game, unlike in natural conversational interaction. It also may not effectively train learners at different levels of proficiency, nor provide a means to measure the success of the learning effort.
Other systems such as MRE, SASO, and VECTOR emulate conversations. MRE and SASO support unstructured conversational interaction within a specific task domain. VECTOR may not support conversational interaction, but may instead have the user select from a set of presented responses at each stage in the dialog.
Cocinella simulates conversation in a foreign language, where at each stage the learner can read from a presented set of possible responses or else recall the expected responses from memory. Interactive lessons may be limited to opportunities to practice the specific phrases used in the game dialog.
These systems may not adequately train the user in the foreign language. They may not keep the attention of the user, result in the user being able to readily transfer his or her training to a real-life environment, be well suited to learners at different proficiency levels, aid the learner in improving his or her pronunciation, and/or induce the learner to fully participate in the learning process.
Language learning systems and methods may be provided.
An interactive lesson module may be provided. The interactive lesson module may be configured to provide an interactive language lesson that prompts a user to repeat, translate or define words or phrases, or to provide words corresponding to images, at a controllable difficulty level. It may also be configured to generate a lesson progress report indicative of the success of the user's language learning based on the user's interaction during the interactive language lesson.
An interactive social simulation module may be provided. The interactive social simulation module may be configured to provide an interactive environment that requires the user to use the language to communicate with a virtual character to achieve a goal at a controllable difficulty level. It may also be configured to generate a simulation progress report indicative of the success of the user's language learning based on the user's interaction with the interactive environment.
A learner model module may be provided. The learner model may be configured to receive the lesson progress report and the simulation progress report. It may also be configured to control the difficulty level of the interactive language lesson and the interactive social simulation based on the lesson progress report and the simulation progress report.
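The control loop summarized above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes that each progress report reduces to a success rate between 0 and 1 and that difficulty is an integer level. The names `ProgressReport` and `LearnerModel`, the equal weighting of the two reports, and the thresholds are hypothetical and are not taken from the described embodiments.

```python
from dataclasses import dataclass


@dataclass
class ProgressReport:
    """Hypothetical report: fraction of successful interactions (0.0 to 1.0)."""
    success_rate: float


class LearnerModel:
    """Illustrative learner model that adjusts one shared difficulty level."""

    def __init__(self, level: int = 1, min_level: int = 1, max_level: int = 5):
        self.level = level
        self.min_level = min_level
        self.max_level = max_level

    def update(self, lesson: ProgressReport, simulation: ProgressReport) -> int:
        # Assumption: the lesson and simulation reports are weighted equally.
        combined = (lesson.success_rate + simulation.success_rate) / 2
        if combined >= 0.8:      # assumed threshold for raising difficulty
            self.level = min(self.level + 1, self.max_level)
        elif combined < 0.5:     # assumed threshold for lowering difficulty
            self.level = max(self.level - 1, self.min_level)
        return self.level
```

In such a sketch, the returned level would be passed back to both the interactive lesson module and the interactive social simulation module.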
These, as well as other components, steps, features, objects, benefits, and advantages, will now become clear from a review of the following detailed description of illustrative embodiments, the accompanying drawings, and the claims.
As will be described in further detail below, using embodiments, users gradually learn communicative skills for interacting with people who speak foreign languages or belong to foreign cultures. Communicative skills may include spoken language skills in foreign languages. They may also include knowledge of nonverbal communication modalities such as hand gestures and nonverbal vocalizations, as well as social norms and rules of politeness and etiquette governing conversational interaction in various settings.
A foreign language teaching device and method may be provided. Any foreign language may be taught, such as Spanish, French, Arabic, Chinese, English, and Pashto.
A foreign language that a user wants to learn is called herein a “target language.” A language in which the user has mastery is called herein a “native language.” A “user” may be a person learning a target language, or an instructor or trainer who is guiding, assisting, or facilitating a learning process. A “learner” is used herein to refer to users who are language learners, and an “instructor” is used herein to refer to users who are guiding or facilitating a learning process. A learner may be a child or an adult.
Learners may be beginner language learners, and may not have any prior language experience. Alternatively, a training device may be employed by learners with previous language training, including learners who wish to conduct quick refresher training to maintain and improve their communicative skills.
Learners may learn through a combination of interactive lessons, social simulations, and/or other learning modalities. Interactive lessons may include structured presentations of vocabulary, phrases, and other specific communicative skills, as well as quizzes and exercises focusing on those skills. Social simulations may involve simulated conversations with interactive characters in a game or simulation context. Learners may receive continual feedback from a training system as they work with it. A teaching device may continually track a learner's mastery of each of a range of communicative skills, and may use this information to customize a learning experience.
Skills needed for particular tasks and situations may be taught. Vocabulary may be limited to what is required for specific situations, and may be gradually expanded through a series of increasingly challenging situations. Emphasis may be placed on oral proficiency.
Learners may practice their communication skills in a simulated village, where they may be required to develop rapport with local people, who in turn may help them accomplish missions, such as post-war reconstruction. Other situations and environments may be modeled, such as restaurants, hotel reception desks, or medical offices.
Each learner may be accompanied by a virtual aide who can provide assistance and guidance if needed, tailored to each learner's individual skills. The aide may act as a virtual tutor as part of an intelligent tutoring system, giving the learner feedback on his performance. Learners may communicate via a multimodal interface, which may permit them to speak and choose gestures on behalf of their character in the simulation. The system may be configured to allow learners to communicate or say any of a range of things appropriate to that situation, rather than select from a fixed set of choices.
Grammar may be introduced as needed to enable learners to generate and understand a sufficient variety of utterances to cope with novel situations. Nonverbal gestures (both “dos” and “don'ts”) may be introduced, as well as cultural norms of etiquette and politeness, to help learners accomplish social interaction tasks successfully.
A collection of authoring tools may be included which support the rapid creation of new task-oriented language learning environments, thus making it easier to support less commonly taught languages.
Instructional content may be organized using a skills model 3. The skills model 3 may be a hierarchical taxonomy of skills to be learned. Language skills, cultural skills, and task skills may be subsumed in the skills model 3. Both interactive lesson content and interactive game content may be annotated according to the skills that they train. This may help to maintain the coordination between the interactive lessons 1 and interactive social simulations 2, to ensure that skills employed in the interactive social simulations 2 are taught in the interactive lessons 1.
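The coordination check described above, in which skills employed in the social simulations should also be taught in the lessons, might be sketched as follows. The function name, the slash-separated skill identifiers, and the sample annotations are hypothetical; the source does not specify how annotations are stored.

```python
from typing import Dict, List, Set


def untaught_skills(
    lesson_skills: Dict[str, Set[str]],
    simulation_skills: Dict[str, Set[str]],
) -> List[str]:
    """Return skills used by some simulation but taught by no lesson."""
    taught = set().union(*lesson_skills.values())
    used = set().union(*simulation_skills.values())
    return sorted(used - taught)


# Hypothetical annotations: content item -> skills from the skills model.
LESSONS = {
    "lesson_greetings": {"language/greet", "culture/hand_over_heart"},
}
SIMULATIONS = {
    "scene_cafe": {"language/greet", "language/ask_name"},
}
```

Here `untaught_skills(LESSONS, SIMULATIONS)` would flag `language/ask_name` as a skill that a simulation exercises but no lesson covers.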
Instructional content 4 may be authored based on the skills to be covered. Interactive lessons 1 and interactive social simulations 2 may be configured to cover the target skill set. As the instructional content 4 is authored, it may be annotated to indicate what skills it covers.
The system may be configured to continually process a learner's input as a learner interacts with computer-based software, so that it can provide continual feedback 5. The feedback 5 may be appropriate to the learning context, e.g., feedback 5 during the interactive social simulations 2 may be different from that in the interactive lessons 1. But in any case, the feedback 5 may give learners immediate indications of how well they are employing their communicative skills.
All materials in the learning system 14 may be generated from a set of content specifications 20. The content specifications 20 may specify the structure, properties, and behavior of the interactive simulations in a user-friendly form so that these simulations can be authored, edited, and analyzed without knowledge of programming languages or program codes. The content specifications 20 may also be used in authoring, editing, and analyzing other aspects of the system, such as the interactive lesson 1 materials and the supplementary learning materials 19, to promote consistency between them.
The content specifications 20 may make reference to a skills model 3, as discussed above in connection with
This technology may be employed on any single or combination of computers and networked configurations. It may also be employed on other types of computing devices, such as game consoles.
A learner 41 may provide inputs to an interactive social simulation 2 by verbal behavior 43, nonverbal behavior 44, and/or other control actions 45 (e.g., to direct the learner's character to move in a particular direction in the game world). Not all types of input need be provided at all times. For example, nonverbal behavior may be omitted. Spoken inputs may also be used in place of control actions using a keyboard or mouse.
The verbal behavior 43 may take the form of speech. The learner 41 may speak into a microphone in the target foreign language. A speech recognizer 46 may then translate the input speech signal into an utterance hypothesis 49 in textual form. Alternatively, the verbal input may be entered by typing in text or selecting from a range of options via menus.
At the same or at a different time, the learner 41 may select nonverbal behavior 44 for his character, such as a hand gesture. A video camera and image processing capability may be provided to allow the learner 41 to act out the desired gesture. Alternatively, the learner 41 may select an appropriate gesture from a menu. The computer mouse 31 (shown in
The social simulation may include a game engine 47. This may include a 3D simulation of a milieu in which the user's character interacts with other characters. This may be implemented using a game engine (e.g., Unreal Engine, or Torque engine). For example, one version of Tactical Iraqi may utilize the Unreal Tournament 2003 game, and another version may utilize the Unreal Engine 2.5. 2D simulations are also permitted, or a sequence of still images. They may provide contexts in which to apply the communicative skills. Other devices such as telephones may provide sound-only interaction.
The game engine may provide control actions such as moving, turning, etc. It may input control actions 45 from the learner 41. For example, the current implementation of Tactical Iraqi inputs arrow keys into the game engine, and uses these to move and turn the player character.
A mission engine module 48 may control the characters in the game world, and determine their responses to the actions of the learner 41 and to other characters. An input manager 50 may interpret an utterance hypothesis 49 and nonverbal behavior 44 of the learner 41, and produce a learner communicative act description 51 that may describe the content of the utterance hypothesis 49 and the meaning of the nonverbal behaviors 44. Communicative acts may be similar to speech acts as commonly defined in linguistics and philosophy of language, but may allow for communication to occur through nonverbal means, as well as through speech. A social simulation engine 52 may then determine how each character in the game should respond to the learner's action.
The social simulation engine 52 may provide high-level control of characters and overall action in the game. For example, it may be used to control or manage the interaction as an interactive pedagogical drama. See Marsella, S., Johnson, W. L., & LaBore, C. (2003). An interactive pedagogical drama for health interventions. In U. Hoppe and F. Verdejo (Eds.), Artificial Intelligence in Education: Shaping the Future of Learning through Intelligent Technologies, pp. 341-348. Amsterdam: IOS Press. The contents of all of these publications are incorporated herein by reference. It may for example be used to create interactive social simulations to teach clinical skills to health professionals, such as clinical psychologists. A character could play the role of a patient or caregiver, and the user could then play the role of a clinical psychologist, selecting things to say to the virtual patient or caregiver that would help her to overcome her problems. Carmen's Bright IDEAS, an interactive health intervention described in Marsella et al. (2003), provides a model, in which a virtual caregiver, Carmen, converses with a virtual counselor, Gina. The social simulation engine could allow a psychologist trainee to play the role of Gina, trying to get Carmen to reflect on her problems and develop options for solving them. Projects like Carmen's Bright IDEAS have identified and catalogued a number of common phrases that psychologists use in such consultations, which could be incorporated into the dialog of the social simulation. Social skills such as developing and maintaining rapport and interpreting nonverbal cues and body language may be relevant for such applications, and may be incorporated into the skills model 3 and learner model 18, just as they may be incorporated into language training applications (e.g., see
The social simulation engine 52 may have scenario logic 53 and agents 54. The scenario logic 53 may define what events occur in the simulated world, in response to other events or world states. The agents 54 may determine what actions non-player characters in the game perform.
Multiple non-player characters may be supported. This may allow the learner 41 to practice participating in complex multi-way conversations. Having additional characters may allow the learner 41 to see how other characters in the environment are reacting to current conversation; those characters may even jump into the conversation if they object to what the learner 41 or other characters are saying. This can result in a social simulation with a high degree of realism.
In order to make these decisions, the social simulation engine 52 may receive notifications of the current state of the simulated world as well as the status of previous actions (whether they have been completed or not) 55. Based upon this information, it may select behavior instructions 56 for each character to perform. An action scheduler 57 may implement these actions as a sequence of animations for the game characters to carry out. The game engine 47 may utilize video clips, in which case the action scheduler 57 may select video clips to play that closely match the behavior instructions. The game medium may only use audio, in which case the action scheduler 57 may select or compose a sound sequence that satisfies the behavior instructions 56. The action scheduler 57 may also monitor the state of the game world and of actions in progress, and pass this information to the social simulation engine 52.
As the learner 41 interacts with the social simulation engine 52, he may save data to event logs 59. The event logs 59 may record actions on the part of the learner 41, as well as responses by characters and/or game world objects. The system also may save recordings 60 of the learner's speech or language as he/she interacts with the game. The recordings 60 may be used to evaluate the learner's performance, as well as train the speech recognizer 46 to improve recognition accuracy.
When learners communicate with on-screen characters, they may provide audio input, but they also may provide nonverbal information through a choice of gesture or the state of their own on-screen character (e.g., wearing sunglasses). The audio input may be passed through a speech recognizer 46 that may output an utterance hypothesis 49 in textual form. An utterance mapping function 65 may map the utterance hypothesis 49 into a parameterized communicative act 66. The parameterized communicative act 66 may identify the semantic category of the communication, e.g., whether it is a greeting, a response to a greeting, an enquiry, an offer of information, etc. At this stage in the process the communicative act description may not capture all the differences between variants of the same speech act, such as differences in degree of informality (e.g., “How do you do” vs. “Hey there”), or differences in context (e.g., “Good morning” vs. “Good evening”). It may disregard variants in language that do not significantly change the communicative intent of the utterance, e.g., “What is your name?” vs. “Tell me your name.” It also may fail to capture the meaning of associated nonverbal information such as wearing sunglasses (which break eye contact, and therefore are considered rude in some cultures) and nonverbal gestures (bowing, placing your hand over your heart, and other emblematic gestures). Further processing may therefore be performed on the parameterized communicative act 66 to add parameters which may capture some of these other aspects of the meaning of the utterance.
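One simple way to picture an utterance mapping function of this kind is a lookup table from normalized recognizer text to a semantic category. The sketch below is an assumption for illustration: the `CommunicativeAct` type, the table contents, and the punctuation-stripping normalization are hypothetical, and a real system might instead use a grammar or a statistical classifier.

```python
import string
from typing import NamedTuple, Optional


class CommunicativeAct(NamedTuple):
    """Hypothetical parameterized communicative act: a category plus an object."""
    category: str
    obj: Optional[str] = None


# Assumed table: several surface forms collapse onto one act.
UTTERANCE_MAP = {
    "how do you do": CommunicativeAct("greet"),
    "hey there": CommunicativeAct("greet"),
    "good morning": CommunicativeAct("greet"),
    "what is your name": CommunicativeAct("enquire", "name"),
    "tell me your name": CommunicativeAct("enquire", "name"),
}

_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def map_utterance(hypothesis: str) -> Optional[CommunicativeAct]:
    """Map recognizer text to an act, or None if the text is not in the table."""
    normalized = hypothesis.lower().translate(_STRIP_PUNCTUATION).strip()
    return UTTERANCE_MAP.get(normalized)
```

As in the text, this mapping deliberately discards informality and time-of-day differences; those would be recovered later as context parameters.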
The utterance hypothesis 49 and nonverbal behavior 44 may therefore be passed through an aggregation module 67, which may return context parameters 68 based on an interpretation of the utterance surface form in the given nonverbal and social context—this is where differences between alternative surface forms of a speech act may be captured. These parameters may be added to the learner communicative act description 51.
Utterances may contain references that are meaningless without proper context (e.g., when using pronouns) and these references may need to be resolved. Before being combined with the context parameters, the parameterized communicative act 66 may be passed into a discourse model 70, which may maintain a focus stack 71 and a dialogue history 72. The focus stack 71 may maintain a list of objects and topics referred to during the course of the conversation. These references may have been made verbally or nonverbally. For example, if the learner 63 points at an object in the simulated world, the target object may get added to the focus stack 71. The dialogue history 72 may contain a list of all earlier speech acts in the current conversation. The discourse model 70 may use these data structures as context for resolving any references in the current communicative act and update them in preparation for dealing with future communicative acts. For example, if the learner says “Where is he?” the discourse model 70 may refer to the focus stack 71 to determine which male person was most recently discussed. The communicative act with resolved references 73 and context parameters 68 may then be finally combined to yield the complete learner communicative act description 51, which may represent the unambiguous communicative intent that is sent to the social simulation engine 52.
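The reference resolution step can be illustrated with a small discourse model that searches the focus stack from the most recent entry backward. This is a sketch under stated assumptions: the `Entity` type, the gender tags, the pronoun table, and the sample names are hypothetical and are not part of the described discourse model 70.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed pronoun table for an English-language example.
PRONOUN_GENDER = {"he": "male", "him": "male", "she": "female", "her": "female"}


@dataclass
class Entity:
    name: str
    gender: str  # assumed tags: "male", "female", or "neuter"


@dataclass
class DiscourseModel:
    focus_stack: List[Entity] = field(default_factory=list)
    dialogue_history: List[str] = field(default_factory=list)

    def mention(self, entity: Entity) -> None:
        """Record an entity referred to verbally or by pointing."""
        self.focus_stack.append(entity)

    def resolve(self, pronoun: str) -> Optional[Entity]:
        """Return the most recently mentioned entity matching the pronoun."""
        wanted = PRONOUN_GENDER.get(pronoun.lower())
        for entity in reversed(self.focus_stack):
            if entity.gender == wanted:
                return entity
        return None
```

For example, after mentioning a male character and then a female character, resolving “he” would skip the more recent female entry and return the male one.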
The Input Manager may be used in a variety of interactive games and simulations that may benefit from multimodal input. For example, role playing games such as Everquest allow users to control an animated character and communicate with other characters. The Input Manager may permit such applications to input a combination of speech and gesture, and interpret them in a consistent way. It may allow the application developer to increase the repertoire of nonverbal communicative behaviors that the user may enter (e.g., hand waving, bowing, handshakes, etc.) and interpret them as instances of more general categories of communicative acts (greetings, acknowledgments, etc.). It may also allow the application to recognize and interpret in a consistent way those aspects of the user's utterances that pertain to social interaction and rapport, such as expressions of politeness and mitigation of face threat (see P. Brown & S. C. Levinson (1987). Politeness: Some Universals in Language Usage. New York: Cambridge University Press. The content of this publication is incorporated herein by reference). This in turn may enhance the ability of the social simulation to model social interaction between the users and the computer characters in a variety of application areas.
The learner ability 76 and skills/missions 77 parameters may be processed by scenario logic 53 which may serve the role of a director that sets up and manages the scene. This scenario logic 53 may initialize the state of each character (also known as an agent) in the scene. This may include initializing the mental state of each character, e.g., the character's initial level of trust toward the learner. The scenario logic 53 may also select a personality profile for each character, which may determine how the character will react to actions by the learner and other characters. These parameters may depend upon the learner's level of ability. In particular, characters may be directed to be relatively tolerant of mistakes made by beginner learners, but relatively intolerant of mistakes by advanced players. Likewise, the characters may be directed to allow the learner an indefinite amount of response time, or to react if the learner fails to respond within an amount of time typical for spoken conversation.
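The scene setup described above might be sketched as a function that initializes each character from the learner's ability level. All field names, ability labels, and numeric values below are hypothetical placeholders chosen for illustration; the source does not give concrete values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CharacterState:
    name: str
    trust: float                         # initial trust toward the learner
    mistake_tolerance: float             # 1.0 = very tolerant, 0.0 = intolerant
    response_timeout_s: Optional[float]  # None = wait indefinitely


def init_character(name: str, learner_ability: str) -> CharacterState:
    """Assumed policy: beginners get tolerant, patient characters."""
    if learner_ability == "beginner":
        return CharacterState(name, trust=0.5, mistake_tolerance=0.9,
                              response_timeout_s=None)
    # Advanced learners: less tolerance and a conversational time limit.
    return CharacterState(name, trust=0.5, mistake_tolerance=0.3,
                          response_timeout_s=5.0)
```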
During execution of the social simulation, learner communicative act descriptions 51 representing learner speech and gesture may be processed by a dialogue manager 78. The dialogue manager 78 may send these acts to an agent decision module 79 that may decide how nearby agents respond. A single decision module may make the decisions for all the nearby agents, or alternatively there may be a separate decision module instance for each agent.
To determine which agents can respond, the scenario logic 53 may place agents into conversation groups at creation time. The learner may then select an agent to speak to, e.g., by walking up to and facing a particular agent. The game engine may use a special indicator such as an arrow or highlighting to indicate which agent has been selected. As an example, in
When the learner 41 selects an agent to speak to, all agents belonging to the same conversation group may get a chance to respond. When the dialogue manager 78 receives the responses back from the agent decision module 79, it may order them according to their relevance to the learner's original input (e.g., a direct answer to the learner's question is ranked higher than the start of a new topic) and in that sequence may pass the communicative acts from the agents 80 to a social puppet manager 81.
The dialogue manager 78 may also pass information about the updated agent states to the game engine 47 where it can be displayed in an interface element such as the graphical trust bars under the agents' corresponding portraits 12. Although the PsychSim multi-agent system (see S. Marsella, D. V. Pynadath, & S. Read (2004). Agent-based modeling of social interactions and influence. In Proceedings of the International Conference on Cognitive Modeling, pp. 243-249. The content of this publication is incorporated herein by reference.) has been used as the decision module 79 in one embodiment, other implementations can be plugged in depending on the depth of reasoning required. For example, a customized finite state machine may be used in another embodiment.
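The ordering step can be sketched as a stable sort over the agents' responses. The response kinds, their ranks, and the dictionary layout are assumptions made for this example; the source states only that a direct answer outranks the start of a new topic.

```python
from typing import Dict, List

# Assumed relevance ranks: lower numbers are passed on first.
RELEVANCE = {"direct_answer": 0, "follow_up": 1, "new_topic": 2}


def order_responses(responses: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order agent responses by relevance to the learner's input.

    sorted() is stable, so responses of equal rank keep their arrival order.
    Unknown kinds are placed last.
    """
    return sorted(responses, key=lambda r: RELEVANCE.get(r["kind"], len(RELEVANCE)))
```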
The social simulation may be organized into a set of scenes or situations. For example, in one scene a group of agents might be sitting at a table in a café; in another situation an agent playing the role of policeman might be standing in a traffic police kiosk; in yet another scene an agent playing the role of sheikh might be sitting in his living room with his family. In each scene or situation each agent may have a repertoire of communicative acts available to it, appropriate to that scene. Some communicative acts are generic and applicable to a wide range of agents and situations. This might include greetings such as “Hello,” or “How are you?” or “My name is <agent's name>” (if English is the target language). Other communicative acts may be appropriate only to a specific situation, such as “I understand you are a member of a big tribe,” or “Is this Jassim il-Wardi's house?” These may be supplemented by generic phrases to employ when the agent didn't understand another agent's or user's communicative act, such as “Okay” or “What did you say?” or “Sorry, I don't speak English.” Each agent also may have a repertoire of communicative acts that it is prepared to respond to, including generic ones such as “What is your name?” or “Who is the leader in this district?”
The designer of the scene may provide each agent with a repertoire of communicative acts that it can perform and a repertoire of communicative acts that it can respond to, appropriate to that scene or situation. Generally the number of types of parameterized communicative acts may be much less than the number of concrete utterances. For example, “Hello!” and “Hey there!” may both be considered instances of greet speech acts. “I'm Mike” and “My name is Mike” are both instances of inform speech acts, where the object of the inform act is the agent's name. Agents may respond to similar speech acts in similar ways, reducing the complexity of dialog management for each agent.
These similarities may also be exploited to reduce the range of utterances which the speech recognizer 46 (
Other characteristics of the scene and of the learning content may be exploited to reduce the complexity of the agents while retaining the impression of robust dialog. If it is expected that the user is a beginner language learner, one can limit the range of communicative acts that the agents are prepared to respond to, under the assumption that the learners will know how to say only a limited range of utterances. For some minor characters the repertoire may be quite small, for example an agent playing the role of waiter may have little to say other than “Please take a seat, I will be with you shortly.” Limiting the range of communicative acts makes it easy to populate a game world with large numbers of simple agents.
For agents with more significant roles, the decision module 79 may choose appropriate communicative act responses to a wide range of input utterances. The dialog may be organized as a set of utterance-response pairs, or “beats.” The decision module may then manage the dialog by determining which beats are appropriate at a given point in the dialog. Some utterance-response pairs may be generically appropriate at any time during the conversation. For example, if the input utterance is “What's your name?” then the agent's response might be “My name is Mike” regardless of when the user asks the question. Some utterance-response pairs may be appropriate only after certain events have occurred, or when certain states hold. For example, if the user asks “Where is the leader of this district?” the agent might respond with the name only if the agent's level of trust of the user is sufficiently high. The decision module 79 may therefore keep track of states or context changes 86 in order to determine which responses are appropriate in the current situation. The selection of appropriate responses may then be performed via finite-state machines whose transitions may be conditioned on state or context. They may also be chosen using production rules that are conditioned on the current state. Other dialogue modeling methods such as partially observable Markov decision processes may be used.
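One way to picture state-conditioned beats is as an ordered list of production rules, each guarded by a condition on the current state. The sketch below is illustrative only; the `Beat` class, the act labels, the trust threshold of 0.7, and the fallback reply are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, float]


@dataclass
class Beat:
    """Hypothetical utterance-response pair with a guard on dialog state."""
    input_act: str
    response: str
    condition: Callable[[State], bool] = lambda state: True


# Rules are tried in order; the first beat whose guard holds is used.
BEATS: List[Beat] = [
    Beat("ask_name", "My name is Mike."),
    # Assumed threshold: the leader is named only when trust is high enough.
    Beat("ask_leader", "The leader is <name>.", lambda s: s["trust"] >= 0.7),
    # Assumed fallback reply when trust is too low.
    Beat("ask_leader", "Why do you want to know?"),
]


def respond(act: str, state: State) -> Optional[str]:
    """Return the response of the first applicable beat, or None."""
    for beat in BEATS:
        if beat.input_act == act and beat.condition(state):
            return beat.response
    return None
```

With this arrangement the name question is answered in any state, while the answer to the leader question depends on the tracked trust value.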
The behavior of the virtual aide 91 may be driven from two agent models, one representing the aide's own role in the game and one representing the learner's role in the game. Based on the model of the aide's own role in the game, the decision module 79 (
As shown in
Many communicative behaviors can be performed by characters in a range of different situations, but it is the dialog context that may give them communicative function. For example, a character may stand up for various reasons, and may face and gaze at a variety of objects. The social puppets 82 may utilize the character bodies' repertoire of behaviors to perform actions which the user will interpret as communicative in nature.
Social puppets 82 may also generate a nonverbal reaction to events other than speech. This may be possible if information about the status of various actions and the state of the game world 85 is being routed directly to the social puppet manager 81. The social puppet manager 81 may check whether those events have any communicative function and, if so, may ask the social puppets 82 to react according to their social rules. For instance, if the learner approaches a group of puppets, they may need to demonstrate a reaction that reveals something about their willingness to interact. The approach event may trigger a reaction rule that generates visible behavior, taking into account the context that the scenario logic 53 has supplied.
At any stage in the agent behavior generation, the scenario logic 53 may intervene and implement puppet behavior or changes in the game world that are tailored to the specific scene. The scenario logic 53 may affect the game world directly, or it may influence agents and puppets by changing their contextual parameters at run-time (such as affecting agent trust).
During the course of the game an objectives tracker 87 (
As the learner engages in the social simulation, the objectives tracker 87 may note when the learner employs particular skills, and use this information to update 88 the learner model 18, updating its estimates that those skills have been mastered. The social simulation may then make available to the learner a skill map 74 which summarizes the skills required to play the game scene successfully, and the learner's current degree of mastery of those skills. This may employ learner model update mechanisms similar to those used in the interactive lessons, as well as the skills model, both of which are described in further detail below.
The scenario logic 53 may terminate the mission with a success debriefing if all game objectives have been met and with a failure debriefing if it detects a failure condition. Further summaries of learner performance during the scene may be provided at that time.
The communicative function 83 shown in
As shown in
A social puppet 82 may generate a communicative behavior description 102 that realizes the communicative function 101. The communicative behavior description 102 may specify a set of individual movements and actions, which may include: (1) head movements, (2) movement of the torso, (3) facial expressions or other movements of facial muscles, (4) gaze actions which may involve coordinated movement of the eyes, neck, and head direction, indicating where the character is looking, (5) movements of the legs and feet, (6) gestures, involving coordinated movement of arms and hands, (7) speech, which may include verbal and paraverbal behavior, and/or (8) lip movements. These communicative behavior descriptions 102 may be specified in Behavior Markup Language (BML), or they may be realized in some other embodied conversational agent behavior description language such as MURML or ASL. See S. Kopp, B. Krenn, S. Marsella, A. Marshall, C. Pelachaud, H. Pirker, K. Thórisson, H. Vilhjalmsson (2006). Towards a common framework for multimodal generation in ECAs: The Behavior Markup Language. In 2006 Conference on Intelligent Virtual Agents, in press. The entire content of these references is incorporated herein by reference.
The translation from communicative functions 101 to communicative behavior descriptions 102 may depend upon the agent's context. A puppet context 103 may record the particular set of features in the world and agent state which are relevant for selecting appropriate behaviors. The puppet context 103 may include information about the agent's attitude (e.g., content, neutral, annoyed), the agent's body configuration (e.g., sitting, standing, crouching), and/or the current activity (e.g., conversation, eating, reading, changing tires, etc.). These context features may be easily extended to capture other relevant aspects of context. The puppet also may receive notifications of state or context changes 86 that occur in the surrounding environment and that may influence the choice of communicative behaviors.
Given the desired communicative function, the social puppet 82 may select or construct a behavior description that is appropriate for the current context. This may be achieved using FML to BML mapping rules 104, or some other set of rules or procedures. For example, if the agent's attitude is respectful, an FML to BML mapping rule may select a respectful gesture such as placing the hand over the heart to accompany a response to a greeting. If however the agent's attitude is suspicious, an FML to BML mapping rule may select a standoffish gesture such as folding the arms instead.
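A context-dependent mapping of this kind may be sketched as a lookup keyed on the communicative function and the agent's attitude. The sketch below does not use actual FML or BML syntax; the function name `respond-to-greeting`, the behavior names, and the default behavior list are hypothetical labels chosen for illustration.

```python
from typing import Dict, List, Tuple

# (communicative function, attitude) -> behaviors realizing that function.
# All labels here are hypothetical stand-ins for FML/BML elements.
MAPPING_RULES: Dict[Tuple[str, str], List[str]] = {
    ("respond-to-greeting", "respectful"): ["speech", "hand_over_heart"],
    ("respond-to-greeting", "suspicious"): ["speech", "fold_arms"],
}

# Assumed fallback when no rule matches the current context.
DEFAULT_BEHAVIORS: List[str] = ["speech"]


def select_behaviors(function: str, context: Dict[str, str]) -> List[str]:
    """Pick behaviors for a communicative function given the puppet context."""
    key = (function, context.get("attitude", "neutral"))
    return list(MAPPING_RULES.get(key, DEFAULT_BEHAVIORS))
```

Because the rules are plain data, a content expert could in principle add or change entries without modifying the selection procedure.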
The following are some examples of rules that may be used to select communicative behaviors in different situations. The player character may walk toward a non-player character. When the player reaches a certain distance from the non-player character, this may signal a state or context change 86, indicating that the player is close enough to start a conversation. The scenario logic 53 shown in
Once the social puppet 82 is done generating behaviors and aligning them with their semantic units, it may combine them into a schedule of actions to be performed. These may then be passed to the action scheduler 57. The action scheduler 57 may start execution of each element, behavior by behavior.
If the action schedule is specified in BML or some other structured representation, the action scheduler 57 may compile the specification into a directed acyclic graph whose nodes are the primitive behavior elements and the arcs are the temporal dependencies between elements. The action scheduler 57 then may execute the specification by progressively dequeueing elements from the directed acyclic graph and sending them to the game engine for execution. If the element fails to execute successfully, a failure action may be activated or the overall behavior may be aborted; otherwise if the element completes, the pending actions are checked, and if another action depends upon the completed action and is not waiting for other elements to complete, it may be activated. The process may continue until all component elements are complete or otherwise have been disposed of, at which point the scenario logic 53 shown in
The separation between communicative function and communicative behavior, and the use of mapping rules to define the realization of communicative functions, may enable multidisciplinary teams to author content. An animator may create a repertoire of basic animation elements, and then a cultural expert or other content expert may use an authoring tool to select behaviors to realize a particular communicative function in a particular context, e.g., to choose gaze aversion behaviors for Afghan women characters as realizations of show-recognition communicative intents. Programmer effort may be unnecessary in order to create animated characters with believable interactive communicative behaviors.
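The graph-based execution performed by the action scheduler 57, in which primitive behavior elements are activated once the elements they depend upon have completed, may be sketched as follows. This is a simplified, sequential illustration; the `run_schedule` function, the `execute` callback standing in for the game engine, and the choice to abort on the first failure are assumptions of the example.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def run_schedule(deps: Dict[str, Set[str]],
                 execute: Callable[[str], bool]) -> List[str]:
    """Execute elements of a dependency graph in a valid order.

    deps maps each element to the set of elements it must wait for.
    execute returns True on success; on failure the behavior is aborted.
    Returns the list of elements that completed.
    """
    pending = {name: set(waits) for name, waits in deps.items()}
    ready = deque(sorted(n for n, waits in pending.items() if not waits))
    completed: List[str] = []
    while ready:
        element = ready.popleft()
        if not execute(element):
            break  # assumed policy: abort the overall behavior
        completed.append(element)
        del pending[element]
        # Activate any element that was waiting only on this one.
        for name in sorted(pending):
            waiting = pending[name]
            if element in waiting:
                waiting.discard(element)
                if not waiting:
                    ready.append(name)
    return completed
```

For example, with a gaze element that must finish before speech and a gesture, and a nod that waits for both, the elements would be dispatched as gaze, gesture, speech, nod.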
The skill builder manager 106 may select lesson page descriptions from a skill builder file 107, which may encode the content of each lesson and lesson page. The skill builder file 107 may be the lesson content specification file created during the authoring process. Alternatively, the lesson content may be compiled into binary form and loaded into a teaching device, either as part of the same program or as a separate database. Alternatively, the lesson content may reside on a separate server, and be downloaded over a network on demand.
Lesson content may consist of a set of lesson pages, each of which may be an instance of a lesson page template. The set of lesson page templates may be extensible. Page templates may include:
A Lesson Display module 108 may display the lesson pages. It also may display the learner's progress in mastering the skills covered in the lesson material.
Additional modules may be employed to implement particular types of lesson pages. The example dialog pages and active dialog pages may require a video player 109 if the dialogs are presented using recorded video, and an animation player if the dialogs are presented using animations. The skill builder 1 may make use of the same action scheduler 57 and game engine 47 used in the social simulation.
A pedagogical agent 110 may be employed to evaluate learner performance in the lesson pages, particularly the vocabulary page, and generate feedback. When enabled, it may be invoked on each learner's speech input to a vocabulary page. It may evaluate the quality of the learner's speech, may identify the most significant error, and may generate feedback that informs the learner of the nature of the error and aims to encourage and motivate as appropriate. Alternatively, the skill builder manager 106 may process some user responses and generate feedback itself.
The skill builder 1 may access and update a learner model 18, based upon the learner's performance in the lessons. A learner model update module 111 may continually update the current estimates of learner mastery of each skill, based on learner performance on each page. It then may periodically save the updates to the learner model 18.
The learner model update module 111 may utilize a Bayesian knowledge tracing algorithm that computes estimates of mastery statistically, similar to the knowledge tracing method of Beck and Sison. See Beck, J. and Sison, J. (2004). Using knowledge tracing to measure student reading proficiencies. In Proceedings of the 2004 Conference on Intelligent Tutoring Systems, 624-634 (Berlin: Springer-Verlag). The entire content of this publication is incorporated herein by reference. Each correct learner speech input may be regarded as uncertain evidence that the learner has mastered the skills associated with that item, and incorrect learner speech input may be regarded as uncertain evidence that the learner has failed to master those skills. The Beck and Sison method may not apply precisely, since the Beck and Sison method applies to reading skills, in particular grapheme to phoneme translations, whereas the learner model update module 111 may apply to communicative skills generally, including foreign language skills. Moreover, it may use a wide range of learner inputs and not just speech input. Once properly calibrated with appropriate prior probabilities, the learner model update module 111 may provide accurate and up-to-date assessments of learner proficiency that work well with beginner language learners.
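A single step of a standard Bayesian knowledge tracing update may be sketched as follows. This shows the textbook form of the technique and is not asserted to be the exact computation of the learner model update module 111; the function name and the slip, guess, and learning probabilities are illustrative assumptions that would need calibration.

```python
def bkt_update(p_mastery: float, correct: bool,
               p_slip: float = 0.1, p_guess: float = 0.2,
               p_learn: float = 0.15) -> float:
    """Return the updated mastery estimate after one observed response.

    p_slip:  assumed chance a learner who has the skill answers wrongly.
    p_guess: assumed chance a learner without the skill answers correctly.
    p_learn: assumed chance the skill is acquired during this attempt.
    """
    if correct:
        numerator = p_mastery * (1 - p_slip)
        denominator = numerator + (1 - p_mastery) * p_guess
    else:
        numerator = p_mastery * p_slip
        denominator = numerator + (1 - p_mastery) * (1 - p_guess)
    posterior = numerator / denominator
    # Account for learning that may occur on this practice opportunity.
    return posterior + (1 - posterior) * p_learn
```

A correct response raises the estimate and an incorrect one lowers it, but neither is treated as conclusive, which matches the treatment of each input as uncertain evidence.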
As the learner 41 interacts with the skill builder 1, learner actions may be recorded in an event log 59 and learner speech samples may be saved in a database of recordings 60. These may be used to evaluate system performance and learner outcomes. In fact, in one possible method of employing the skill builder 1, the speech recognizer 46 may be disabled and the skill builder 1 may be used to record samples of learner speech, which can then be employed to train the speech recognizer. This may be appropriate at early stages of development of language training systems, when a trained speech recognizer for the target language has not yet been developed.
The skill builder 1 may be implemented using the same game engine as is used for the social simulations. This makes it possible for learners to switch quickly and easily between the interactive lessons and the social simulations. This in turn may encourage learners to apply the skills that they acquire in the skill builder 1 in the social simulations, and refer to the relevant skill builder lessons to help them make progress in the social simulation games.
A speech recognizer 46 may take as input a start/stop signal 112, which signals when to start recognition and when to stop recognition. The start/stop signal 112 may be generated by clicking a button on the graphical user interface, or may be produced by some other signaling device. Between the start and stop signal, the speech recognizer 46 processes the speech signal 42 from the user's microphone. It may process the speech signal as the user speaks it, or it may first record the user's speech as a sound file and then process the sound file. Either way, a recording 113 may be created, which may be stored in a file of recordings 60 on the user's computer or on a remote server.
The speech recognizer 46 may operate using a non-native acoustic model 114, i.e., an acoustic model of the target language that is customized to recognize the speech of non-native speakers of the target language. This customization may be performed by training the acoustic model on a combination of native and non-native speech. Alternatively, the properties of non-native speech may be used to bias or adjust an acoustic model that has been trained on native speech. Different acoustic models may be used in the interactive lessons and social simulations, and even in different parts of each, in order to maximize robustness of recognition. For example, the acoustic model used in the social simulation may be trained on poorly pronounced non-native speech, to ensure that learners with poor pronunciation are able to play the game. In contrast, the acoustic model used in advanced lessons may be trained on well-pronounced non-native speech and native speech, and may therefore be less tolerant of learner error and better able to discriminate learner errors. A recognition mode indicator 116 may be used to indicate which acoustic model to use.
The speech recognizer may use a language model 115 to determine which phrases to recognize. Context-free recognition grammars may be used; alternatively, n-gram language models may be used. The language models may be tailored to the particular context in which recognition will be used. For example, in the social simulations a set of language models may be built, each tailored to recognize the particular repertoire of communicative acts that are expected to arise in each scene. In the interactive lessons recognition grammars may be created from the sets of words and phrases that occur in groups of lesson pages. The size of the group of words and phrases may depend upon the desired tolerance of learner error, since increasing the grammar size generally reduces the tolerance of pronunciation errors. Grammars containing specific classes of language errors may also be used, in order to help detect those classes of errors. This technique may be used both to detect pronunciation errors and other types of errors such as grammatical errors. For example, common mispronunciations of the Arabic pharyngeal fricative consonant /H/ can be detected by taking words that incorporate that consonant, e.g., /marHaba/ (an informal way of saying “hello”), and creating a recognition grammar that includes the correctly pronounced word as well as common mispronunciations such as /marhaba/ and /markhaba/. Then if a learner mispronounces the word in one of these ways, the speech recognizer may be able to detect it.
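The idea of placing known mispronunciations in the recognition grammar alongside the correct form may be sketched as a table from recognized strings to the intended word and an error label. The sketch assumes the recognizer returns its hypothesis as a transcription string; `build_grammar`, `interpret`, and the error labels are hypothetical names, and matching is case-sensitive because the transcription distinguishes /H/ from /h/.

```python
from typing import Dict, Optional, Tuple

GrammarEntry = Tuple[str, Optional[str]]  # (intended word, error label or None)


def build_grammar(word: str, variants: Dict[str, str]) -> Dict[str, GrammarEntry]:
    """Build a grammar holding the correct word plus known mispronunciations."""
    grammar: Dict[str, GrammarEntry] = {word: (word, None)}
    for variant, error_label in variants.items():
        grammar[variant] = (word, error_label)
    return grammar


# Variants taken from the example above; the labels are illustrative.
GRAMMAR = build_grammar("marHaba", {"marhaba": "H->h", "markhaba": "H->kh"})


def interpret(hypothesis: str) -> Optional[GrammarEntry]:
    """Return (intended word, error label) for a hypothesis, or None."""
    return GRAMMAR.get(hypothesis)
```

A hypothesis of "marhaba" would then be reported as an attempt at /marHaba/ with an /H/ to /h/ substitution, rather than as an unrecognized utterance.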
For each speech input, the speech recognizer may output the most likely utterance hypothesis 49, in textual form. The speech recognizer may also output the level of confidence 117 of the recognition. The skill builder manager 106 shown in
For some lesson items, such as vocabulary page items and memory page items, there may be just one expected correct answer; if, for example, an item is a vocabulary item introducing the Arabic word /marHaba/, there is only one expected correct response. For some items such as utterance formation page items, there may be multiple possible correct responses. For example, consider an utterance formation page in Tactical Iraqi, where the prompt is as follows: “Hamid just introduced himself to you. Respond to him by saying that you are honored to meet him.” Multiple Iraqi Arabic responses may be permissible, including “tsherrafna,” “tsherrafna seyyid Hamiid,” or “tsherrafna ya seyyid.” In such cases a set of possible correct responses may be included among the expected inputs 119. For some lesson items a wide range of correct responses may be possible, in which case a pattern or description characterizing the set of possible correct responses may be provided, or even a procedure for generating possible correct responses or for testing individual responses to determine whether or not they are correct. Also, a language model 120, with knowledge of the structure of the target language and/or common errors that language learners make, may be used at authoring time to generate possible correct alternative responses 121.
Likewise, the expected inputs 119 may include possible incorrect responses, patterns or descriptions of expected incorrect responses, or procedures for generating incorrect responses. The language model 120 may be used to generate possible incorrect responses as well. The pedagogical agent 110 may further assume that any input that is not explicitly designated as correct or incorrect may be presumed incorrect.
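The evaluation of a learner response against the expected inputs 119, with unlisted responses presumed incorrect, may be sketched as follows. The `ExpectedInputs` class, the `judge` function, and the returned labels are assumptions of this example; the listed incorrect response is a hypothetical entry added for illustration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ExpectedInputs:
    """Hypothetical container for an item's anticipated responses."""
    correct: Set[str] = field(default_factory=set)
    incorrect: Set[str] = field(default_factory=set)


def judge(response: str, expected: ExpectedInputs) -> str:
    """Classify a response; anything not listed is presumed incorrect."""
    if response in expected.correct:
        return "correct"
    if response in expected.incorrect:
        return "expected-incorrect"
    return "presumed-incorrect"


# Correct responses from the utterance formation example above.
ITEM = ExpectedInputs(
    correct={"tsherrafna", "tsherrafna seyyid Hamiid", "tsherrafna ya seyyid"},
    incorrect={"marHaba"},  # hypothetical: a greeting instead of the requested reply
)
```

Distinguishing anticipated errors from merely unlisted responses allows more specific feedback to be attached to the former.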
The learner input 118 and expected inputs 119 may be passed to an error analyzer module 122. The error analyzer module 122 may evaluate the learner's input to identify specific errors committed by the learner, and may select one or more errors to focus on in producing feedback. This evaluation may involve classifying the learner's error, and matching it against known classes of learner error. As an example, suppose that the learner was prompted to say /marHaba/ (with the voiceless pharyngeal fricative /H/), and instead says /marhaba/ (with the voiceless glottal transition /h/ instead). This is an instance of a common class of pronunciation errors committed by English-speaking learners of Arabic: to substitute /h/ for /H/. Classifying the error in this case thus might analyze this error as an instance of /H/->/h/ phoneme substitution. This classification process may be assisted by an error database 123, listing severe language errors commonly made by language learners, with their frequency. This database in turn may be produced through an analysis of samples of learner speech.
If this process yields a set of error classes, the error analyzer may then select the error class or classes that should serve as the focus of instruction. This may take into account the confidence rating provided by the speech recognizer; specific feedback on particular learner errors may be inadvisable if the confidence that the error has in fact been detected is low. Confidence may be boosted if the learner model 18 shows that the learner has a history of making this particular error. If an utterance exhibits multiple errors, then the error analyzer module 122 may select an error to focus on based upon its degree of severity. Native listeners judge some language errors to be more severe than others; for example errors that can lead to confusions between words tend to be regarded as highly severe. If the error database 123 includes information about the relative severity of errors, this can then be used to prioritize among errors.
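The selection of a focus error, taking into account recognizer confidence, learner history, and severity, may be sketched as follows. The numeric threshold, the size of the history boost, and the severity scale are assumptions made for the example rather than values from any embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class DetectedError:
    error_class: str
    severity: float  # assumed 0..1 rating, e.g. from an error database


def select_focus(errors: List[DetectedError], confidence: float,
                 history: Set[str], threshold: float = 0.6,
                 boost: float = 0.2) -> Optional[DetectedError]:
    """Pick the most severe error that is detected with enough confidence.

    Confidence is boosted for error classes the learner has made before.
    Returns None when no error is confident enough to give feedback on.
    """
    candidates = []
    for error in errors:
        effective = confidence + (boost if error.error_class in history else 0.0)
        if effective >= threshold:
            candidates.append(error)
    return max(candidates, key=lambda e: e.severity, default=None)
```

Returning no focus error when confidence is low corresponds to withholding specific feedback that might be unwarranted.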
As errors are detected, or as the learner demonstrates the ability to generate responses without errors, this information may be used to update the learner model. Error instances may be added to the history of learner performance. Moreover, each instance of correct or incorrect performance may serve as probabilistic evidence for the mastery of particular language skills, or the lack thereof. The confidence level provided by the speech recognizer may further be used to adjust the probability that an instance of correct or incorrect language performance was in fact observed. This evidence and confidence may be used in a Bayesian network or other probabilistic model of skill, where the probabilities that the individual responses were or were not correct propagate back through the network to produce probabilities that the underlying skills were or were not mastered.
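One possible way to let recognizer confidence temper the evidence is to mix the posterior obtained if the recognized outcome is right with the posterior obtained if it is wrong, weighted by the confidence (a form of Jeffrey conditioning). The sketch below covers a single skill rather than a full Bayesian network; the function names and the slip and guess probabilities are illustrative assumptions.

```python
def posterior_mastery(prior: float, correct: bool,
                      p_slip: float = 0.1, p_guess: float = 0.2) -> float:
    """Bayes posterior of mastery given a response observed with certainty."""
    if correct:
        like_mastered, like_unmastered = 1 - p_slip, p_guess
    else:
        like_mastered, like_unmastered = p_slip, 1 - p_guess
    numerator = prior * like_mastered
    return numerator / (numerator + (1 - prior) * like_unmastered)


def soft_update(prior: float, recognized_correct: bool,
                confidence: float) -> float:
    """Weight the two possible outcomes by the recognizer's confidence."""
    as_recognized = posterior_mastery(prior, recognized_correct)
    if_misrecognized = posterior_mastery(prior, not recognized_correct)
    return confidence * as_recognized + (1 - confidence) * if_misrecognized
```

At full confidence this reduces to the ordinary Bayesian update, and as confidence falls the estimate moves less far in the direction of the recognized outcome.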
Once an error has been detected and chosen, or no error has been found, an immediate feedback model 124 may determine what response to give to the learner. It may select a feedback message from a feedback database 125. The feedback database 125 may include a collection of tutoring tactics commonly employed by language tutors, and/or specific feedback tactics recommended by a lesson author to use in response to particular errors. The immediate feedback model 124 may also take into account the learner's history of making particular errors, noting for example when the learner pronounces a word correctly after multiple failed attempts. The immediate feedback model 124 may also take into account the learner's profile, in particular the learner's general skill at language learning and self-confidence. The feedback messages may be chosen and phrased in order to mitigate direct criticism. See W. L. Johnson, S. Wu, & Y. Nouhi (2004). Socially intelligent pronunciation feedback for second language learning. In Proceedings of the Workshop on Social and Emotional Intelligence in Learning Environments at the 2004 International Conference on Intelligent Tutoring Systems. Available at http://www.cogsci.ed.ac.uk/˜kaska/WorkshopSi. The entire content of this publication is incorporated herein by reference.
Once the immediate feedback model 124 has chosen a feedback message to give to the learner, the message may be output. This output may be realized in any of a variety of modalities, including text, a voice recording, a synthesized voice, a video recording, an animation coupled with text, or animation coupled with voice.
Skills can be used to annotate the content in all parts of the system (interactive lessons 1, interactive social simulations 2, other interactive games 17, etc.). This way the different content elements are explicitly linked to all the skills they either teach or practice. Specifically:
If the social interaction specifications or interactive lesson specifications are formatted in XML, then the link or annotation may be made by adding attributes or tags to XML files that specify that content. For example, the following is an excerpt of a skills builder XML specification that shows how a skill may be annotated to be exercised in a specific page:
There are many elements in how to make effective links from content to the skills model. Some strategies that may be employed include:
The skills model 3 may be used to customize lessons based on learner skills. For example, a remedial lesson may be dynamically put together at run time to address skills that the learner has been shown to have problems with in the interactive social simulations 2. This may be done by using a simple algorithm that walks through the interactive lesson specification 127 and extracts the pages that have the specific skill annotated. This may also be done by using a more complex algorithm that takes into consideration the performance on pre-requisite skills and also assembles the necessary material for those pre-requisite skills where the learner is not performing well enough.
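Both the simple and the pre-requisite-aware variants of remedial lesson assembly may be sketched with one function. The page representation (dictionaries with `id` and `skills` keys), the mastery threshold, and the function name are assumptions of this example and do not reflect the actual format of the interactive lesson specification 127.

```python
from typing import Dict, List, Optional


def remedial_pages(lesson_pages: List[dict], weak_skills: List[str],
                   prerequisites: Optional[Dict[str, List[str]]] = None,
                   mastery: Optional[Dict[str, float]] = None,
                   threshold: float = 0.5) -> List[str]:
    """Return ids of pages annotated with any targeted skill, in lesson order.

    When prerequisites and mastery estimates are supplied, pre-requisite
    skills whose estimated mastery is below the threshold are targeted too.
    """
    targets = set(weak_skills)
    if prerequisites and mastery is not None:
        stack = list(weak_skills)
        while stack:
            skill = stack.pop()
            for pre in prerequisites.get(skill, []):
                if pre not in targets and mastery.get(pre, 0.0) < threshold:
                    targets.add(pre)
                    stack.append(pre)
    return [page["id"] for page in lesson_pages
            if targets & set(page["skills"])]
```

Calling the function with only the weak skills gives the simple walk-and-extract behavior; supplying the pre-requisite map and mastery estimates gives the extended behavior.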
Skills may be used to customize lessons based on learner objectives. For example, a given embodiment of the invention may have content about many different professions. The system may ask a learner what professions he/she is interested in learning, and tailor the lesson accordingly by selecting the material with the relevant skills for those professions. This allows skills to work as a content modularization mechanism.
A skill 130 may have an ID 131 and/or a name 132. Specific usages may choose to use the same string as name and ID, or to use the name as a unique identification.
A skill 130 may have zero or more parent skills 133, specified by their IDs and optionally their names. A skill may have multiple parent skills. In order to facilitate display in a strict taxonomy or tree format, the link to a parent skill may also specify that that specific parent skill is to be considered to be the (only) primary parent 134 (as opposed to all other skills, which will be secondary parents).
A skill may have types 135. A skill may be restricted to have only one type. The following are types used so far in the system; other types are possible. Types are displayed in a hierarchy but referenced in the skill specification as a single name.
Other linkages between skills may include one or more optional pre-requisite 136 skills, that is, a skill that is recommended to be learned before the skill being specified.
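A skill record with multiple parents, a single primary parent for tree display, and optional pre-requisites may be sketched as follows. The field names and the `tree_path` helper are hypothetical and are not drawn from the actual skill specification format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Skill:
    skill_id: str
    name: Optional[str] = None
    parents: List[str] = field(default_factory=list)   # ids of parent skills
    primary_parent: Optional[str] = None               # the one used for tree display
    skill_type: Optional[str] = None
    prerequisites: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The primary parent, if given, must be one of the listed parents.
        if self.primary_parent is not None and self.primary_parent not in self.parents:
            raise ValueError("primary parent must be among the parents")


def tree_path(skills: Dict[str, Skill], skill_id: str) -> List[str]:
    """Follow primary-parent links to produce a root-to-skill path."""
    path = [skill_id]
    seen = {skill_id}
    parent = skills[skill_id].primary_parent
    while parent is not None and parent not in seen:
        path.append(parent)
        seen.add(parent)
        parent = skills[parent].primary_parent
    return list(reversed(path))
```

Following only the primary parent yields a strict tree for display even though the underlying skills model permits several parents per skill.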
The details of each skill may be specified by parameters such as:
Standard and condition are elements borrowed from the structure of Enabling Learning Objectives, as used by instructional designers in the US military and elsewhere. See R. F. Mager (1984). Preparing Instructional Objectives. Belmont, Calif.: Pitman Management and Training. The entire content of this publication is incorporated herein by reference.
Difficulty and importance are relative concepts, because they depend on the specific learner (what is easy for some is hard for others, and a skill may be important for a medic but not a builder). These attributes may be used as a “default” or “average” for an implied or explicitly defined audience. The values of these attributes may be adjusted based on learner models that make clear the starting point of different groups of learners (e.g., someone who speaks Dutch would probably find it easier to pronounce German than someone who only speaks English).
Other reference materials may be created, including printed materials, and subsets of the content for other platforms (e.g., a subset of the skill builder 1 on a web application).
The components, steps, features, objects, benefits and advantages that have been discussed are merely illustrative. None of them, nor the discussions relating to them, are intended to limit the scope of protection in any way. Numerous other embodiments are also contemplated, including embodiments that have fewer, additional, and/or different components, steps, features, objects, benefits and advantages. The components and steps may also be arranged and ordered differently.
In short, the scope of protection is limited solely by the claims that now follow. That scope is intended to be as broad as is reasonably consistent with the language that is used in the claims and to encompass all structural and functional equivalents. Nothing that has been stated or illustrated is intended to cause a dedication of any component, step, feature, object, benefit, advantage, or equivalent to the public, regardless of whether it is recited in the claims.
The phrase “means for” when used in a claim embraces the corresponding structure and materials that have been described and their equivalents. Similarly, the phrase “step for” when used in a claim embraces the corresponding acts that have been described and their equivalents. The absence of these phrases means that the claim is not limited to any corresponding structures, materials, or acts.