US20050049860A1 - Method and apparatus for improved speech recognition with supplementary information - Google Patents
- Publication number
- US20050049860A1
- Authority
- US
- United States
- Prior art keywords
- supplementary data
- user
- speech
- input
- input speech
- Prior art date
- 2003-08-29
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/26—Devices for calling a subscriber
- H04M1/27—Devices whereby a plurality of signals may be stored simultaneously
- H04M1/271—Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
- User Interface Of Digital Computer (AREA)
- Telephone Function (AREA)
Description
- The present invention relates to voice dialing systems, and more particularly to improving the performance of voice dialing systems.
- Voice dialing systems require speech recognition capabilities to process voice commands. Speech recognition capabilities may be implemented in a mobile phone to allow a user to easily dial a phone number. For example, the user may initiate a call to a contact in an address book on the mobile phone by saying the name of the contact. Speech recognition allows the voice dialing system to process the name and automatically dial the correct number.
- In order for the voice dialing system to identify the correct number to dial, the user must say the name of the contact clearly. For example, distortion, mispronunciation, and background noise may cause the voice dialing system to misunderstand the intended contact. Therefore, voice dialing systems may implement a system wherein a confidence value is assigned to the voice input of the user. In other words, the confidence value indicates the presumed accuracy of the intended contact as determined by the voice dialing system. A low confidence value may indicate that further measures are necessary in order to dial the correct number. For example, the voice dialing system may require the user to restate the name of the intended contact. Of course, in a particular implementation, the role of the confidence value might be played by a measure whose value would be low when speech is well-recognized, and high when misunderstanding is probable. This could be called an “uncertainty measure.” A high value of the uncertainty measure might indicate that further measures are necessary to dial the correct number. Despite the superficial differences between the two types of measures, they play the same kind of role in the system.
- A method for improving recognition results of a speech recognizer at a remote location comprises receiving input speech from a user at the remote location. One or more candidate matches are determined based on the input speech. In another embodiment, the one or more candidate matches represent the whole list of possible candidates, and such list is sorted according to the input speech. The user is prompted for supplementary data associated with the input speech. The supplementary data is received from the user. One of the one or more candidate matches is selected based on the input speech and the supplementary data.
- In another aspect of the invention, a method for improving recognition results of a speech recognizer in an electronic device comprises receiving input speech from a user at the device. The input speech is interpreted at a speech recognizer. One or more candidate entries from a plurality of candidate entries are determined based on the input speech. In another embodiment, the one or more candidate matches represent the whole list of possible candidates, and such list is sorted according to the input speech. A confidence measure is generated for the one or more candidate entries based on the input speech. The user is prompted for supplementary data associated with the input speech if the confidence measure is below a threshold. The supplementary data is received from the user at the device. One of the one or more candidate entries is selected based on the input speech and the supplementary data.
- In another aspect of the invention, a system for directing telephone calls based on input speech comprises a speech recognizer that receives input speech from a remote user. A database includes a plurality of entries. A controller communicates with the speech recognizer and the database to select one or more candidate entries from the plurality of entries based on the input speech. The controller determines supplementary data based on ambiguities between the input speech and the one or more candidate entries and prompts the user for the supplementary data.
- Further areas of applicability of the present invention will become apparent from the detailed description provided hereinafter. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
- The present invention will become more fully understood from the detailed description and the accompanying drawings, wherein:
- FIG. 1 is a flow diagram of a voice dialing system according to the present invention;
- FIG. 2 is a functional block diagram of a mobile device according to the present invention;
- FIG. 3 is a flow diagram of a voice dialing system incorporating a history-based confidence measure according to the present invention; and
- FIG. 4 is a functional block diagram of an automated switchboard according to the present invention.
- The following description of the preferred embodiment(s) is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
- As shown in FIG. 1, a user initiates a voice dialing algorithm 10 by speaking a contact name at step 12. The voice dialing system 10 interprets the input contact name and associates the input contact name with a contact name in an address book. The voice dialing system 10 determines a confidence measure of the input contact name at step 14. The voice dialing system determines if the confidence measure is above a threshold at step 16. If the confidence measure is high, the voice dialing system 10 acts on the contact name and dials the phone number of the contact at step 18. For example, the voice dialing system 10 may require that the confidence measure is above a specific threshold, such as 60%. The threshold may be predetermined or be modifiable by the user.
- If the confidence measure is below the threshold but not lower than a minimum threshold, the voice dialing system 10 asks the user to confirm that the spoken contact name was interpreted correctly at step 20. For example, the voice dialing system 10 may repeat the contact name and ask the user to reply “yes” or “no.” If the user says “yes,” then the voice dialing system 10 may proceed with the phone call and dial the number of the confirmed contact name at step 18. If the user says “no,” the voice dialing system 10 requires the user to enter supplementary information at step 22. Alternatively, if the confidence measure is below the minimum threshold, the voice dialing system 10 may omit step 20 and move directly to step 22. The system 10 may ask the user to enter the supplementary information using a keypad and/or voice commands. For example, the voice dialing system 10 may require the user to speak the initials of the intended contact, or to enter the initials using the keypad. The user may enter supplementary information using other suitable methods, such as a mouse, touchpad, touchscreen, or stylus pen. In another embodiment, the voice dialing system 10 may require that the user enter keypad information prior to asking for input speech at step 12. In this embodiment, the voice dialing system 10 may interpret the input speech according to constraints defined by the keypad input.
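- By way of a non-limiting illustration, the dispatch logic of steps 14 through 22 described above might be sketched as follows in Python; the function names, callback structure, and the minimum-threshold value are assumptions made for illustration rather than details taken from the patent.

```python
# Illustrative sketch of the FIG. 1 dispatch (steps 14-22).
# Threshold values and callable names are assumptions, not patent text.

CONFIDENCE_THRESHOLD = 0.60  # e.g. the "60%" threshold mentioned for step 16
MINIMUM_THRESHOLD = 0.30     # assumed floor below which step 20 is skipped

def handle_recognition(confidence, best_match, dial, confirm, ask_supplementary):
    """Route one recognition result according to its confidence measure."""
    if confidence >= CONFIDENCE_THRESHOLD:
        dial(best_match)                   # step 18: dial the matched contact
    elif confidence >= MINIMUM_THRESHOLD:
        if confirm(best_match):            # step 20: "yes"/"no" verification
            dial(best_match)               # step 18 after confirmation
        else:
            ask_supplementary(best_match)  # step 22: request supplementary data
    else:
        ask_supplementary(best_match)      # omit step 20, go directly to step 22
```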
- At step 24, the voice dialing system 10 processes the supplementary information to determine the correct contact name, and thereafter proceeds to step 18. If the supplementary information is not sufficient to determine the correct contact name, further actions may be necessary. For example, the voice dialing system 10 may return to step 22 to request additional supplementary information. In another embodiment, the voice dialing system 10 may return to step 12 and require the user to restate the intended contact name. In another embodiment, the voice dialing system 10 may not be able to correctly determine the input speech. In this case, the voice dialing system 10 may direct the user to an operator for further assistance. For example, the voice dialing system 10 may increment and check a counter at step 26 if the supplementary information is not sufficient to determine a contact. If the counter has not reached a predetermined setpoint, the voice dialing system 10 may continue to request supplementary information at step 22. If the counter has reached the setpoint, the voice dialing system 10 may direct the user to an operator at step 28.
- The voice dialing system 10 determines what supplementary information to request according to ambiguities of the initial spoken contact name that is input at step 12. For example, if the intended contact name is “John Smith,” and there are numerous entries in the address book with the initials “J” and “S,” then requesting the user to input initials would be ineffective. In this instance, the voice dialing system 10 may require the user to enter other supplementary information such as the first three letters of the last name of the intended contact. Alternatively, the voice dialing system 10 may require the user to enter an area code of the intended contact.
- The supplementary information that the voice dialing system 10 requires at step 22 is minimized. In other words, the voice dialing system 10 will request as little information as possible in order to confirm the intended contact name and proceed with the call. The voice dialing system 10 will not request the supplementary information if the initial confidence measure determined at step 14 was sufficient. If the voice dialing system 10 requires keypad inputs, the voice dialing system 10 will request the minimum number of key presses necessary to distinguish the intended contact name from an N-best list of contact names. For example, if the first three letters of both the intended contact and a similar entry are “smi,” the voice dialing system 10 may ask the user to input the first four letters of the intended contact. Alternatively, the voice dialing system 10 may simply ask the user to input the contact name using the keypad, and then automatically select the correct contact as the keys are entered. In other words, if the user begins to input supplementary information, the voice dialing system 10 may automatically select the correct contact name as soon as sufficient information has been entered. The voice dialing system 10 may also dynamically request a different type of supplementary information as the user inputs previously requested information.
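- The “minimum number of key presses” behavior lends itself to a short illustration: compute the shortest prefix of the intended name that differs from every other entry on the N-best list. The sketch below is one assumed realization, not the algorithm specified in the patent.

```python
def min_disambiguating_prefix(intended, n_best):
    """Return the smallest k such that the first k letters of `intended`
    differ from the first k letters of every other N-best entry."""
    target = intended.lower()
    others = [name.lower() for name in n_best if name.lower() != target]
    for k in range(1, len(target) + 1):
        if all(other[:k] != target[:k] for other in others):
            return k
    return len(target)  # a prefix alone cannot disambiguate; ask for more data

# Mirroring the "smi" example: against ["smith", "smirnov"], the intended
# name "smith" needs four letters ("smit") to stand alone.
```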
- In another embodiment, the voice dialing system 10 may compensate for garbled or distorted input speech. A speech recognizer may initially interpret the input speech incorrectly. After the user inputs supplementary information, the input speech may be reinterpreted by the speech recognizer. The speech recognizer interprets the input speech within the constraints defined by the supplementary information. In this manner, the voice dialing system 10 may also compensate for mispronunciations.
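- One assumed way to reinterpret input speech “within the constraints defined by the supplementary information” is to prune the N-best hypotheses to those consistent with the supplementary data before the recognizer rescores the audio. The initials-based constraint and the (name, score) layout below are illustrative assumptions:

```python
def filter_by_initials(n_best, initials):
    """Keep only (name, score) hypotheses whose initials match the user's
    supplementary input, e.g. initials "JS" for "John Smith", preserving
    the recognizer's original ranking."""
    wanted = list(initials.upper())
    def consistent(name):
        return [part[0].upper() for part in name.split()] == wanted
    return [(name, score) for (name, score) in n_best if consistent(name)]

# e.g. filter_by_initials([("john smith", 0.4), ("don smyth", 0.3)], "JS")
# -> [("john smith", 0.4)]; the recognizer would then rescore the original
# audio against only the surviving entries.
```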
- In another embodiment, the voice dialing system 10 may compensate for misspellings or typographical errors in the user's manual input. For example, the voice dialing system 10 may determine that the supplementary information is not consistent with a contact list or database. The system 10 may thus include an algorithm that determines an approximate match between the supplementary information and the database or contact list, possibly also taking the input speech into account.
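- As a hedged example of such approximate matching, Python's standard-library difflib offers a similarity search that could stand in for the patent's unspecified algorithm; the cutoff value and lowercasing policy are assumptions.

```python
import difflib

def approximate_match(typed, contact_names, cutoff=0.6):
    """Return contact names whose spelling is close to the possibly
    mistyped supplementary input, best matches first."""
    return difflib.get_close_matches(typed.lower(),
                                     [name.lower() for name in contact_names],
                                     n=3, cutoff=cutoff)

# e.g. approximate_match("smiht", ["smith", "jones"]) -> ["smith"]
```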
- A mobile device 30 incorporating the voice dialing system 10 is shown in FIG. 2. The mobile device may be a mobile phone, PDA, or other suitable device. A user speaks a contact name, or other audio input, into an audio input mechanism 32 of the mobile device 30. A speech recognizer 34 interprets the audio input. A controller 36 compares the audio input as interpreted by the recognizer 34 to a database of contact names 38. The controller 36 generates a confidence measure based on the comparison.
- If the confidence measure is greater than a threshold, the controller 36 dials the number of the contact. If the confidence measure is not greater than the threshold, the controller 36 requests a “yes” or “no” verification from the user. If the user does not answer “yes,” the controller 36 determines what supplementary information to request from the user. In other words, the controller 36 determines desired supplementary information according to potential ambiguities between the intended contact name and the interpreted contact name. Supplementary information includes, but is not limited to, spelling, initials, and area code. The desired supplementary information may be conveyed to the user visually at a display 40 or at an audio output mechanism or speaker 42. The user may enter the supplementary information verbally into the audio input mechanism 32, or manually at a keypad 44. The controller 36 determines the correct contact name based on the supplementary information. Alternatively, the controller 36 may forego the step of requesting a “yes” or “no” verification. For example, if the confidence measure is greater than a second threshold, the controller 36 may determine that a particular contact name is correct and automatically dial the corresponding number.
- In addition to contact names, the user may apply the present invention to other applications. In one embodiment, the user may request navigation information. The user inputs a location or place name into the audio input mechanism 32. The speech recognizer 34 interprets the audio input. The controller 36 compares the audio input as interpreted by the recognizer 34 to a list of locations in the database 38. The controller 36 may request supplementary information relevant to navigation. For example, if the user input a city name, the controller 36 may request a state abbreviation if the city name was found in more than one state. In another embodiment, the user may request information about a company by speaking a company name into the audio input mechanism 32. The controller 36 may ask for supplementary information such as a stock abbreviation of the company. In still another embodiment, the user may request email or voicemail from a specific source.
- A history-based voice dialing system 50 may incorporate a history-based confidence measure as shown in FIG. 3. Certain names and other spoken inputs may be misrecognized more often than others. Additionally, speech from certain callers and/or speakers may be more difficult to recognize. For example, the user speaks a contact name at a mobile phone or other device to be interpreted by a speech recognizer at step 52. At step 54, the history-based voice dialing system 50 determines if any contact names on an N-best list have been misrecognized previously. For example, the history-based voice dialing system 50 may include a history module that keeps track of names that have been misrecognized. Alternatively, the history module may include names that are known to be difficult to recognize, such as non-native names or names with unusual pronunciations. In other words, specific names or words may be hard-coded into the system 50 to indicate that they are easily confusable. For example, the names “Ryan” and “Brian” may be automatically recognized as names that are confusable with one another. If the history module does not indicate that any names on the list have been previously misrecognized, or that any of the names are known to be difficult, the history-based voice dialing system 50 may proceed to a voice dialing system at step 56. Otherwise, the history-based voice dialing system continues to step 58.
- A confidence estimation module determines the history-based confidence measure based in part on the names tracked by the history module at step 58. The history-based confidence measure is based on names that were misrecognized in past recognition sequences. In other words, if the N-best list includes any difficult or previously misrecognized names, the history-based voice dialing system 50 assumes that the input speech may have been misrecognized. Therefore, the typical confidence measure may not be satisfactory, and the confidence threshold may be adjusted accordingly. In this manner, the history-based voice dialing system 50 ensures that potentially misrecognized input speech is confirmed with supplementary information. At step 60, further action is taken to verify the input speech. For example, the presence of a difficult name on the N-best list may automatically require supplementary information.
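- A minimal sketch of this history-based adjustment, assuming a set-based history module and an additive threshold penalty (both illustrative choices rather than details given in the patent):

```python
# Sketch of steps 54-60: raise the confidence bar whenever the N-best
# list contains a historically difficult name. Values are assumptions.

MISRECOGNIZED_BEFORE = {"ryan", "brian"}  # tracked by a history module
BASE_THRESHOLD = 0.60
HISTORY_PENALTY = 0.15                    # demand extra confidence

def effective_threshold(n_best_names):
    """Return the confidence threshold to apply to this utterance."""
    difficult = any(name.lower() in MISRECOGNIZED_BEFORE
                    for name in n_best_names)
    return BASE_THRESHOLD + HISTORY_PENALTY if difficult else BASE_THRESHOLD
```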
- In another embodiment, the history-based voice dialing system 50 may not require that a confidence measure be determined. In other words, the history-based voice dialing system 50 may omit step 58 and compensate for difficult names using other criteria. For example, the history-based voice dialing system 50 may automatically require supplementary information if a difficult name is on the N-best list, regardless of any confidence measure.
- Referring now to FIG. 4, a voice dialing service or directory 70 may incorporate the present invention to facilitate calls made by callers with speech that is difficult to recognize. A user may call an automated switchboard such as a private branch exchange (PBX) switching system 72. A user may access the PBX 72 through a telephone network infrastructure 74. The PBX 72 is connected to the telephone network 74 through one or more outside lines 76. Outside telephone stations 78 are reached through a unique telephone number. Additionally, one or more internal telephone stations 80 may be connected to the PBX 72 through telephone station lines 82. Each individual telephone station 80 may be assigned a unique extension number. A voice dialing server 84 connected to the PBX 72 enables callers to reach an internal station 80 or an outside telephone station 78 by using voice dialing. In other words, callers may call from the outside stations 78 to reach the internal stations 80, callers may call from the internal stations 80 to the outside stations 78, and/or callers at internal stations 80 may call other internal stations 80. In another embodiment, outside callers can contact the PBX 72 or similar switchboard to call other outside callers. An example of a voice dialing server is shown in U.S. Pat. No. 5,930,336, filed Sep. 30, 1996, which is hereby incorporated by reference in its entirety.
- A user from an outside station 78 or from an internal station 80 connects to the voice dialing server 84 through the PBX 72. Voice input from the user is received at the voice dialing server 84. For example, the user may request to be connected to a specific contact. The voice dialing server 84 includes a speech recognizer 86. The speech recognizer 86 interprets the input request from the user. The voice dialing server determines an N-best list of candidate contacts from a contact database 88. Each potential contact in the N-best list has a confidence measure. The voice dialing server may request supplementary information from the user through the PBX 72 in order to determine the correct contact. The voice dialing server 84 may include, in addition to a general speaker-independent model, a plurality of speech models that are specific to some speaker characteristic such as gender or accent. The speech model may be configured based on speech from such callers. The voice dialing server 84 may modify the confidence measure based on information from the speech model 90. Output from the speech model 90 may be dynamically combined with the supplementary information supplied by the user to more efficiently determine the correct contact. Additionally, the voice dialing server 84 may include multiple speech models for different users. The voice dialing server 84 may dynamically select a speech model for a particular user based on the input speech. Alternatively, the voice dialing server 84 may select a speech model for a particular user based on prior calls from the user. For example, the voice dialing server 84 may include speech models for specific accents or dialects.
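- Selecting among the general speaker-independent model and the speaker-specific models could, under one assumed design, amount to scoring the utterance under each model and keeping the best; the labels and scoring interface below are illustrative assumptions, not part of the patent.

```python
def select_speech_model(utterance, models, score):
    """Choose the speech model, e.g. "general", "gender_f" or "accent_a",
    that assigns the utterance the highest likelihood-like score.
    `models` maps a label to a model object; `score(model, utterance)`
    returns a comparable number."""
    best = max(models, key=lambda label: score(models[label], utterance))
    return best, models[best]
```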
- As shown in FIG. 4, the voice dialing service 70 may be used to determine a contact from a large list of potential contacts. While a contact list residing on a mobile phone or other device may be limited, a directory or switching system may be responsible for an indefinitely large list of contacts. It is therefore possible for the voice dialing service 70 to determine an extremely large N-best list based on the input speech from the user. The supplementary information of the present invention can be used to quickly narrow down the list of contacts so that the voice dialing system 70 may determine the correct contact. It should be understood that a similar voice dialing system may be used to connect users of various types of telephony devices to intended contacts. For example, a mobile telephone user may connect to a remotely located voice dialing system or server to contact other users using the present invention.
- The description of the invention is merely exemplary in nature and, thus, variations that do not depart from the gist of the invention are intended to be within the scope of the invention. Such variations are not to be regarded as a departure from the spirit and scope of the invention.
Claims (58)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/652,146 US6983244B2 (en) | 2003-08-29 | 2003-08-29 | Method and apparatus for improved speech recognition with supplementary information |
PCT/US2004/024959 WO2005024779A2 (en) | 2003-08-29 | 2004-07-30 | Method and apparatus for improved speech recognition with supplementary information |
CNA2004800248177A CN1842842A (en) | 2003-08-29 | 2004-07-30 | A method and device for improving speech recognition based on auxiliary information |
EP04779886A EP1661121A4 (en) | 2003-08-29 | 2004-07-30 | Method and apparatus for improved speech recognition with supplementary information |
JP2006524671A JP2007504490A (en) | 2003-08-29 | 2004-07-30 | Method and apparatus for improved speech recognition using supplementary information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/652,146 US6983244B2 (en) | 2003-08-29 | 2003-08-29 | Method and apparatus for improved speech recognition with supplementary information |
Publications (2)
Publication Number | Publication Date |
---|---|
US20050049860A1 (en) | 2005-03-03 |
US6983244B2 (en) | 2006-01-03 |
Family
ID=34217569
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/652,146 Expired - Lifetime US6983244B2 (en) | 2003-08-29 | 2003-08-29 | Method and apparatus for improved speech recognition with supplementary information |
Country Status (5)
Country | Link |
---|---|
US (1) | US6983244B2 (en) |
EP (1) | EP1661121A4 (en) |
JP (1) | JP2007504490A (en) |
CN (1) | CN1842842A (en) |
WO (1) | WO2005024779A2 (en) |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050131686A1 (en) * | 2003-12-16 | 2005-06-16 | Canon Kabushiki Kaisha | Information processing apparatus and data input method |
US20060025996A1 (en) * | 2004-07-27 | 2006-02-02 | Microsoft Corporation | Method and apparatus to improve name confirmation in voice-dialing systems |
US20070005358A1 (en) * | 2005-06-29 | 2007-01-04 | Siemens Aktiengesellschaft | Method for determining a list of hypotheses from a vocabulary of a voice recognition system |
US20070180384A1 (en) * | 2005-02-23 | 2007-08-02 | Demetrio Aiello | Method for selecting a list item and information or entertainment system, especially for motor vehicles |
US20080243501A1 (en) * | 2007-04-02 | 2008-10-02 | Google Inc. | Location-Based Responses to Telephone Requests |
US20090196404A1 (en) * | 2008-02-05 | 2009-08-06 | Htc Corporation | Method for setting voice tag |
US20100125456A1 (en) * | 2008-11-19 | 2010-05-20 | Robert Bosch Gmbh | System and Method for Recognizing Proper Names in Dialog Systems |
US20110022386A1 (en) * | 2009-07-22 | 2011-01-27 | Cisco Technology, Inc. | Speech recognition tuning tool |
US20120179457A1 (en) * | 2011-01-07 | 2012-07-12 | Nuance Communications, Inc. | Configurable speech recognition system using multiple recognizers |
US20120220347A1 (en) * | 2011-02-28 | 2012-08-30 | Nokia Corporation | Handling a voice communication request |
US20120304057A1 (en) * | 2011-05-23 | 2012-11-29 | Nuance Communications, Inc. | Methods and apparatus for correcting recognition errors |
US20120304212A1 (en) * | 2009-10-09 | 2012-11-29 | Morris Lee | Methods and apparatus to adjust signature matching results for audience measurement |
US20140023241A1 (en) * | 2012-07-23 | 2014-01-23 | Toshiba Tec Kabushiki Kaisha | Dictionary registration apparatus and method for adding feature amount data to recognition dictionary |
US20140052449A1 (en) * | 2006-09-12 | 2014-02-20 | Nuance Communications, Inc. | Establishing a multimodal advertising personality for a sponsor of a multimodal application |
US20150031416A1 (en) * | 2013-07-23 | 2015-01-29 | Motorola Mobility Llc | Method and Device For Command Phrase Validation |
US20150127339A1 (en) * | 2013-11-06 | 2015-05-07 | Microsoft Corporation | Cross-language speech recognition |
US9196252B2 (en) | 2001-06-15 | 2015-11-24 | Nuance Communications, Inc. | Selective enablement of speech recognition grammars |
US9761241B2 (en) | 1998-10-02 | 2017-09-12 | Nuance Communications, Inc. | System and method for providing network coordinated conversational services |
US9886944B2 (en) | 2012-10-04 | 2018-02-06 | Nuance Communications, Inc. | Hybrid controller for ASR |
US20190130901A1 (en) * | 2016-06-15 | 2019-05-02 | Sony Corporation | Information processing device and information processing method |
CN109785858A (en) * | 2018-12-14 | 2019-05-21 | 平安普惠企业管理有限公司 | A kind of contact person's adding method, device, readable storage medium storing program for executing and terminal device |
CN110021293A (en) * | 2019-04-08 | 2019-07-16 | 上海汽车集团股份有限公司 | Audio recognition method and device, readable storage medium storing program for executing |
KR20200053341A (en) * | 2018-11-08 | 2020-05-18 | 현대자동차주식회사 | Vehicle and controlling method thereof |
US10971157B2 (en) | 2017-01-11 | 2021-04-06 | Nuance Communications, Inc. | Methods and apparatus for hybrid speech recognition processing |
US11096848B2 (en) * | 2016-09-12 | 2021-08-24 | Fuji Corporation | Assistance device for identifying a user of the assistance device from a spoken name |
US11537881B2 (en) * | 2019-10-21 | 2022-12-27 | The Boeing Company | Machine learning model development |
US20230115271A1 (en) * | 2021-10-13 | 2023-04-13 | Hithink Royalflush Information Network Co., Ltd. | Systems and methods for speech recognition |
US12002451B1 (en) * | 2021-07-01 | 2024-06-04 | Amazon Technologies, Inc. | Automatic speech recognition |
US12033618B1 (en) * | 2021-11-09 | 2024-07-09 | Amazon Technologies, Inc. | Relevant context determination |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE60222413T2 (en) * | 2001-04-19 | 2008-06-12 | British Telecommunications P.L.C. | VOICE RECOGNITION |
KR20050054706A (en) * | 2003-12-05 | 2005-06-10 | 엘지전자 주식회사 | Method for building lexical tree for speech recognition |
US8413069B2 (en) | 2005-06-28 | 2013-04-02 | Avaya Inc. | Method and apparatus for the automatic completion of composite characters |
US7509094B2 (en) * | 2005-06-30 | 2009-03-24 | Modu Ltd. | Wireless telecommunication device and uses thereof |
US7636426B2 (en) * | 2005-08-10 | 2009-12-22 | Siemens Communications, Inc. | Method and apparatus for automated voice dialing setup |
US8249873B2 (en) | 2005-08-12 | 2012-08-21 | Avaya Inc. | Tonal correction of speech |
US20080037720A1 (en) * | 2006-07-27 | 2008-02-14 | Speechphone, Llc | Voice Activated Communication Using Automatically Updated Address Books |
US8386248B2 (en) * | 2006-09-22 | 2013-02-26 | Nuance Communications, Inc. | Tuning reusable software components in a speech application |
EP1933302A1 (en) * | 2006-12-12 | 2008-06-18 | Harman Becker Automotive Systems GmbH | Speech recognition method |
US7970433B2 (en) | 2007-06-08 | 2011-06-28 | Modu Ltd. | SD switch box in a cellular handset |
US10027789B2 (en) | 2007-02-13 | 2018-07-17 | Google Llc | Modular wireless communicator |
US8391921B2 (en) | 2007-02-13 | 2013-03-05 | Google Inc. | Modular wireless communicator |
DE102008028090A1 (en) * | 2008-02-29 | 2009-09-10 | Navigon Ag | Method for operating a navigation system |
US8412226B2 (en) | 2008-06-24 | 2013-04-02 | Google Inc. | Mobile phone locator |
WO2012011636A1 (en) | 2010-07-20 | 2012-01-26 | Lg Electronics Inc. | User profile based configuration of user experience environment |
WO2012011630A1 (en) | 2010-07-20 | 2012-01-26 | Lg Electronics Inc. | Selective interaction between networked smart devices |
CN103098500B (en) * | 2010-07-20 | 2016-10-12 | Lg电子株式会社 | Its method that information is provided of electronic equipment, electronic system and use |
CN102708862B (en) * | 2012-04-27 | 2014-09-24 | 苏州思必驰信息科技有限公司 | Touch-assisted real-time speech recognition system and real-time speech/action synchronous decoding method thereof |
CN103578468B (en) * | 2012-08-01 | 2017-06-27 | 联想(北京)有限公司 | The method of adjustment and electronic equipment of a kind of confidence coefficient threshold of voice recognition |
CN102937834B (en) * | 2012-11-26 | 2016-01-06 | 上海量明科技发展有限公司 | The method that mixed type inputs, client and system |
US10026399B2 (en) * | 2015-09-11 | 2018-07-17 | Amazon Technologies, Inc. | Arbitration between voice-enabled devices |
CN105931642B (en) * | 2016-05-31 | 2020-11-10 | 北京京东尚科信息技术有限公司 | Voice recognition method, device and system |
CN118171655B (en) * | 2024-05-13 | 2024-07-12 | 北京中关村科金技术有限公司 | Name generation method and device, electronic equipment and computer program product |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5566272A (en) * | 1993-10-27 | 1996-10-15 | Lucent Technologies Inc. | Automatic speech recognition (ASR) processing using confidence measures |
US5748841A (en) * | 1994-02-25 | 1998-05-05 | Morin; Philippe | Supervised contextual language acquisition system |
US5912949A (en) * | 1996-11-05 | 1999-06-15 | Northern Telecom Limited | Voice-dialing system using both spoken names and initials in recognition |
US20020178344A1 (en) * | 2001-05-22 | 2002-11-28 | Canon Kabushiki Kaisha | Apparatus for managing a multi-modal user interface |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2219008C (en) | 1997-10-21 | 2002-11-19 | Bell Canada | A method and apparatus for improving the utility of speech recognition |
EP1238250B1 (en) * | 1999-06-10 | 2004-11-17 | Infineon Technologies AG | Voice recognition method and device |
US6421672B1 (en) * | 1999-07-27 | 2002-07-16 | Verizon Services Corp. | Apparatus for and method of disambiguation of directory listing searches utilizing multiple selectable secondary search keys |
US6587818B2 (en) * | 1999-10-28 | 2003-07-01 | International Business Machines Corporation | System and method for resolving decoding ambiguity via dialog |
US6925154B2 (en) * | 2001-05-04 | 2005-08-02 | International Business Machines Corporation | Methods and apparatus for conversational name dialing systems |
US6963834B2 (en) * | 2001-05-29 | 2005-11-08 | International Business Machines Corporation | Method of speech recognition using empirically determined word candidates |
US7124085B2 (en) * | 2001-12-13 | 2006-10-17 | Matsushita Electric Industrial Co., Ltd. | Constraint-based speech recognition system and method |
2003
- 2003-08-29 US US10/652,146 patent/US6983244B2/en not_active Expired - Lifetime
2004
- 2004-07-30 JP JP2006524671A patent/JP2007504490A/en active Pending
- 2004-07-30 EP EP04779886A patent/EP1661121A4/en not_active Withdrawn
- 2004-07-30 CN CNA2004800248177A patent/CN1842842A/en active Pending
- 2004-07-30 WO PCT/US2004/024959 patent/WO2005024779A2/en not_active Application Discontinuation
Cited By (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9761241B2 (en) | 1998-10-02 | 2017-09-12 | Nuance Communications, Inc. | System and method for providing network coordinated conversational services |
US9196252B2 (en) | 2001-06-15 | 2015-11-24 | Nuance Communications, Inc. | Selective enablement of speech recognition grammars |
US20050131686A1 (en) * | 2003-12-16 | 2005-06-16 | Canon Kabushiki Kaisha | Information processing apparatus and data input method |
US20060025996A1 (en) * | 2004-07-27 | 2006-02-02 | Microsoft Corporation | Method and apparatus to improve name confirmation in voice-dialing systems |
US7475017B2 (en) * | 2004-07-27 | 2009-01-06 | Microsoft Corporation | Method and apparatus to improve name confirmation in voice-dialing systems |
US20070180384A1 (en) * | 2005-02-23 | 2007-08-02 | Demetrio Aiello | Method for selecting a list item and information or entertainment system, especially for motor vehicles |
US20070005358A1 (en) * | 2005-06-29 | 2007-01-04 | Siemens Aktiengesellschaft | Method for determining a list of hypotheses from a vocabulary of a voice recognition system |
EP1739655A3 (en) * | 2005-06-29 | 2008-06-18 | Siemens Aktiengesellschaft | Method for determining a list of hypotheses from the vocabulary of a speech recognition system |
US8862471B2 (en) * | 2006-09-12 | 2014-10-14 | Nuance Communications, Inc. | Establishing a multimodal advertising personality for a sponsor of a multimodal application |
US20140052449A1 (en) * | 2006-09-12 | 2014-02-20 | Nuance Communications, Inc. | Establishing a multimodal advertising personality for a sponsor of a multimodal application |
US9600229B2 (en) | 2007-04-02 | 2017-03-21 | Google Inc. | Location based responses to telephone requests |
US10163441B2 (en) * | 2007-04-02 | 2018-12-25 | Google Llc | Location-based responses to telephone requests |
US10431223B2 (en) * | 2007-04-02 | 2019-10-01 | Google Llc | Location-based responses to telephone requests |
US9858928B2 (en) | 2007-04-02 | 2018-01-02 | Google Inc. | Location-based responses to telephone requests |
US11854543B2 (en) | 2007-04-02 | 2023-12-26 | Google Llc | Location-based responses to telephone requests |
US20190019510A1 (en) * | 2007-04-02 | 2019-01-17 | Google Llc | Location-Based Responses to Telephone Requests |
US11056115B2 (en) | 2007-04-02 | 2021-07-06 | Google Llc | Location-based responses to telephone requests |
US8650030B2 (en) * | 2007-04-02 | 2014-02-11 | Google Inc. | Location based responses to telephone requests |
US8856005B2 (en) | 2007-04-02 | 2014-10-07 | Google Inc. | Location based responses to telephone requests |
US20080243501A1 (en) * | 2007-04-02 | 2008-10-02 | Google Inc. | Location-Based Responses to Telephone Requests |
US10665240B2 (en) | 2007-04-02 | 2020-05-26 | Google Llc | Location-based responses to telephone requests |
US8229507B2 (en) * | 2008-02-05 | 2012-07-24 | Htc Corporation | Method for setting voice tag |
US20090196404A1 (en) * | 2008-02-05 | 2009-08-06 | Htc Corporation | Method for setting voice tag |
US20100125456A1 (en) * | 2008-11-19 | 2010-05-20 | Robert Bosch Gmbh | System and Method for Recognizing Proper Names in Dialog Systems |
US8108214B2 (en) | 2008-11-19 | 2012-01-31 | Robert Bosch Gmbh | System and method for recognizing proper names in dialog systems |
WO2010059525A1 (en) * | 2008-11-19 | 2010-05-27 | Robert Bosch Gmbh | System and method for recognizing proper names in dialog systems |
US9183834B2 (en) * | 2009-07-22 | 2015-11-10 | Cisco Technology, Inc. | Speech recognition tuning tool |
US20110022386A1 (en) * | 2009-07-22 | 2011-01-27 | Cisco Technology, Inc. | Speech recognition tuning tool |
US20120304212A1 (en) * | 2009-10-09 | 2012-11-29 | Morris Lee | Methods and apparatus to adjust signature matching results for audience measurement |
US9124379B2 (en) * | 2009-10-09 | 2015-09-01 | The Nielsen Company (Us), Llc | Methods and apparatus to adjust signature matching results for audience measurement |
US20120179457A1 (en) * | 2011-01-07 | 2012-07-12 | Nuance Communications, Inc. | Configurable speech recognition system using multiple recognizers |
US10032455B2 (en) | 2011-01-07 | 2018-07-24 | Nuance Communications, Inc. | Configurable speech recognition system using a pronunciation alignment between multiple recognizers |
US10049669B2 (en) | 2011-01-07 | 2018-08-14 | Nuance Communications, Inc. | Configurable speech recognition system using multiple recognizers |
US9953653B2 (en) * | 2011-01-07 | 2018-04-24 | Nuance Communications, Inc. | Configurable speech recognition system using multiple recognizers |
US20120220347A1 (en) * | 2011-02-28 | 2012-08-30 | Nokia Corporation | Handling a voice communication request |
US8868136B2 (en) * | 2011-02-28 | 2014-10-21 | Nokia Corporation | Handling a voice communication request |
US20120304057A1 (en) * | 2011-05-23 | 2012-11-29 | Nuance Communications, Inc. | Methods and apparatus for correcting recognition errors |
US10522133B2 (en) * | 2011-05-23 | 2019-12-31 | Nuance Communications, Inc. | Methods and apparatus for correcting recognition errors |
US20140023241A1 (en) * | 2012-07-23 | 2014-01-23 | Toshiba Tec Kabushiki Kaisha | Dictionary registration apparatus and method for adding feature amount data to recognition dictionary |
US9886944B2 (en) | 2012-10-04 | 2018-02-06 | Nuance Communications, Inc. | Hybrid controller for ASR |
US20150031416A1 (en) * | 2013-07-23 | 2015-01-29 | Motorola Mobility Llc | Method and Device For Command Phrase Validation |
US11363128B2 (en) | 2013-07-23 | 2022-06-14 | Google Technology Holdings LLC | Method and device for audio input routing |
US11876922B2 (en) | 2013-07-23 | 2024-01-16 | Google Technology Holdings LLC | Method and device for audio input routing |
US20150127339A1 (en) * | 2013-11-06 | 2015-05-07 | Microsoft Corporation | Cross-language speech recognition |
US9472184B2 (en) * | 2013-11-06 | 2016-10-18 | Microsoft Technology Licensing, Llc | Cross-language speech recognition |
US20190130901A1 (en) * | 2016-06-15 | 2019-05-02 | Sony Corporation | Information processing device and information processing method |
US10937415B2 (en) * | 2016-06-15 | 2021-03-02 | Sony Corporation | Information processing device and information processing method for presenting character information obtained by converting a voice |
US11096848B2 (en) * | 2016-09-12 | 2021-08-24 | Fuji Corporation | Assistance device for identifying a user of the assistance device from a spoken name |
US11990135B2 (en) | 2017-01-11 | 2024-05-21 | Microsoft Technology Licensing, Llc | Methods and apparatus for hybrid speech recognition processing |
US10971157B2 (en) | 2017-01-11 | 2021-04-06 | Nuance Communications, Inc. | Methods and apparatus for hybrid speech recognition processing |
US11189276B2 (en) * | 2018-11-08 | 2021-11-30 | Hyundai Motor Company | Vehicle and control method thereof |
KR102613210B1 (en) | 2018-11-08 | 2023-12-14 | 현대자동차주식회사 | Vehicle and controlling method thereof |
KR20200053341A (en) * | 2018-11-08 | 2020-05-18 | 현대자동차주식회사 | Vehicle and controlling method thereof |
CN109785858A (en) * | 2018-12-14 | 2019-05-21 | 平安普惠企业管理有限公司 | Contact adding method and device, computer-readable storage medium, and terminal device |
CN110021293A (en) * | 2019-04-08 | 2019-07-16 | 上海汽车集团股份有限公司 | Speech recognition method and device, and computer-readable storage medium |
US11537881B2 (en) * | 2019-10-21 | 2022-12-27 | The Boeing Company | Machine learning model development |
US12002451B1 (en) * | 2021-07-01 | 2024-06-04 | Amazon Technologies, Inc. | Automatic speech recognition |
US20230115271A1 (en) * | 2021-10-13 | 2023-04-13 | Hithink Royalflush Information Network Co., Ltd. | Systems and methods for speech recognition |
US12223945B2 (en) * | 2021-10-13 | 2025-02-11 | Hithink Royalflush Information Network Co., Ltd. | Systems and methods for multiple speaker speech recognition |
US12033618B1 (en) * | 2021-11-09 | 2024-07-09 | Amazon Technologies, Inc. | Relevant context determination |
Also Published As
Publication number | Publication date |
---|---|
WO2005024779B1 (en) | 2005-07-21 |
WO2005024779A3 (en) | 2005-06-16 |
US6983244B2 (en) | 2006-01-03 |
CN1842842A (en) | 2006-10-04 |
WO2005024779A2 (en) | 2005-03-17 |
EP1661121A2 (en) | 2006-05-31 |
EP1661121A4 (en) | 2007-02-28 |
JP2007504490A (en) | 2007-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6983244B2 (en) | Method and apparatus for improved speech recognition with supplementary information |
US6243684B1 (en) | Directory assistance system and method utilizing a speech recognition system and a live operator | |
US5917890A (en) | Disambiguation of alphabetic characters in an automated call processing environment | |
US6643622B2 (en) | Data retrieval assistance system and method utilizing a speech recognition system and a live operator | |
US5917889A (en) | Capture of alphabetic or alphanumeric character strings in an automated call processing environment | |
US5905773A (en) | Apparatus and method for reducing speech recognition vocabulary perplexity and dynamically selecting acoustic models | |
US7657005B2 (en) | System and method for identifying telephone callers | |
CA2372671C (en) | Voice-operated services | |
US8374862B2 (en) | Method, software and device for uniquely identifying a desired contact in a contacts database based on a single utterance | |
US7450698B2 (en) | System and method of utilizing a hybrid semantic model for speech recognition | |
US20030191639A1 (en) | Dynamic and adaptive selection of vocabulary and acoustic models based on a call context for speech recognition | |
US20010016813A1 (en) | Distributed recognition system having multiple prompt-specific and response-specific speech recognizers |
US20090304161A1 (en) | system and method utilizing voice search to locate a product in stores from a phone | |
US9444934B2 (en) | Speech to text training method and system | |
JPH10215319A (en) | Voice dialing method and device |
US20060287868A1 (en) | Dialog system | |
CA2266112C (en) | Speech recognition of caller identifiers using location information | |
US20050144014A1 (en) | Directory dialer name recognition | |
EP1377000B1 (en) | Method used in a speech-enabled automatic directory system | |
US8213966B1 (en) | Text messages provided as a complement to a voice session | |
US6141661A (en) | Method and apparatus for performing a grammar-pruning operation | |
US20050203741A1 (en) | Caller interface systems and methods | |
EP1524778A1 (en) | Method for communicating information from a server to a user via a mobile communication device running a dialog script | |
JP4067483B2 (en) | Telephone reception translation system | |
EP1895748A1 (en) | Method, software and device for uniquely identifying a desired contact in a contacts database based on a single utterance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD., JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JUNQUA, JEAN-CLAUDE;KUHN, ROLAND;CONTOLINI, MATTEO;AND OTHERS;REEL/FRAME:014456/0193 Effective date: 20030829 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
FPAY | Fee payment |
Year of fee payment: 8 |
|
AS | Assignment |
Owner name: PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PANASONIC CORPORATION;REEL/FRAME:033033/0163 Effective date: 20140527 |
|
FPAY | Fee payment |
Year of fee payment: 12 |