US20130018659A1 - Systems and Methods for Speech Command Processing - Google Patents
- Publication number
- US20130018659A1 (U.S. application Ser. No. 13/291,320)
- Authority
- US
- United States
- Prior art keywords
- speech
- output
- computing device
- wearable computing
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G02—OPTICS
- G02B—OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
- G02B27/00—Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
- G02B27/01—Head-up displays
- G02B27/017—Head mounted
-
- G—PHYSICS
- G02—OPTICS
- G02B—OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
- G02B27/00—Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
- G02B27/01—Head-up displays
- G02B27/017—Head mounted
- G02B27/0172—Head mounted characterised by optical features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/16—Constructional details or arrangements
- G06F1/1613—Constructional details or arrangements for portable computers
- G06F1/163—Wearable computers, e.g. on a belt
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/632—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/638—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G02—OPTICS
- G02B—OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
- G02B27/00—Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
- G02B27/01—Head-up displays
- G02B27/0101—Head-up displays characterised by optical features
- G02B2027/014—Head-up displays characterised by optical features comprising information/image processing systems
-
- G—PHYSICS
- G02—OPTICS
- G02B—OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
- G02B27/00—Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
- G02B27/01—Head-up displays
- G02B27/017—Head mounted
- G02B2027/0178—Eyeglass type
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- Software applications can be used to create, edit, and/or view information containing text. For example, word processing software, such as Microsoft Word, can be used to create, edit, and/or view documents that include text.
- Additional software applications can be used to convert speech to text. These applications can recognize spoken words and generate corresponding text. Some of these applications can provide a voice interface to other applications, such as voice mail systems.
- speech input is received at a wearable computing device.
- Speech-related text corresponding to the speech input is generated at the wearable computing device.
- a context for the speech-related text is determined using the wearable computing device. The context is based at least in part on a history of accessed documents and one or more databases.
- an action is determined. The action includes at least one of a command and a search request.
- an output based on the command is generated using the wearable computing device.
- the search request is communicated to a search engine, (ii) search results are received from the search engine, and an output based on the search results is generated using the wearable computing device.
- the output is provided using one or more output components of the wearable computing device.
- In still another aspect of the disclosure of the application, an apparatus includes: (i) means for receiving speech input, (ii) means for generating speech-related text corresponding to the speech input, (iii) means for determining a context for the speech-related text based at least in part on a history of accessed documents and one or more databases, (iv) means for determining an action based on an evaluation of at least a portion of the speech-related text and the context for the speech-related text, where the action comprises at least one of a command and a search request, (v) means for, in response to the action comprising a command, generating output based on the command, and (vi) means for providing the output.
- an article of manufacture including a tangible non-transitory computer-readable storage medium having computer-readable instructions encoded thereon.
- the computer-readable instructions include: (i) instructions for receiving speech input, (ii) instructions for generating speech-related text corresponding to the speech input, (iii) instructions for determining a context for the speech-related text based at least in part on a history of accessed documents and one or more databases, (iv) instructions for determining an action based on an evaluation of at least a portion of the speech-related text and the context for the speech-related text, wherein the action comprises at least one of a command and a search request, (v) instructions for, in response to the action comprising a command, generating output based on the command, (vi) instructions for, in response to the action comprising a search request: (a) communicating the search request to a search engine, (b) receiving search results from the search engine, and (c) generating output based on the search results, and (vii) instructions for providing the output.
- FIG. 1 is a first view of an example system for receiving, transmitting and displaying data, in accordance with example embodiments.
- FIG. 2 is a second view of an example system of FIG. 1 , in accordance with example embodiments.
- FIG. 3 is an example schematic drawing of computer network infrastructure, in accordance with an example embodiment.
- FIG. 4 is a functional block diagram for a wearable computing system, in accordance with an example embodiment.
- FIG. 5A depicts a first scenario of speech evaluation in accordance with an example embodiment.
- FIGS. 5B and 5C depict processing by a speech evaluation module for the speech uttered in the scenario of FIG. 5A in accordance with an example embodiment.
- FIG. 6 depicts a second scenario of speech evaluation in accordance with an example embodiment.
- FIG. 7 depicts a third scenario of speech evaluation in accordance with an example embodiment.
- FIG. 8 depicts a fourth scenario of speech evaluation in accordance with an example embodiment.
- FIG. 9 depicts a fifth scenario of speech evaluation in accordance with an example embodiment.
- FIG. 10 is a flow chart of a method in accordance with an example embodiment.
- a speaker can say “Contact Jim” to provide speech input to the wearable computing device.
- the speech input can be received via an audio sensor (e.g., a microphone) of the wearable computing device and can be converted to text.
- a contextual analysis can be applied to the speech and/or text.
- the wearable computing device can convert the speech of “Contact Jim” to text.
- the contextual analysis of the “Contact Jim” speech can be determined using one or more queries for the text.
- the word “Contact” can lead to a display of various options for contacting a person; e.g., voice, multimedia, text, e-mail, social networking messages, and other options.
- a query of contacts or similar information can be performed using the text “Jim” to decide who “Jim” might be.
- one or more contacts can be returned with the name “Jim.”
- the speaker can provide additional information to contact a person. For example, if no contacts are returned based on the “Jim” query, the speaker could be prompted for information about the contact; e.g., the speaker could be asked for a full name, an e-mail address, or phone number for a contact.
- the wearable computing device can ask the user to choose between one or more contacts and use the choice to refine the query; e.g., choose between contacts “Jim Alpha” and “Jim Beta” and run a subsequent query based on the chosen contact.
- Communications options for contacting Jim can be based on the specific contact. For example, suppose the contact is “Jim Beta” and the contact database only includes e-mail contact information for Jim Beta. In this example, the displayed options for contacting Jim Beta may list e-mail only and may not include, for example, contacting Jim Beta via phone or via a social network.
- contacts can be differentiated by a context that includes recently accessed information such as documents. For example, suppose the user of the wearable computing device had recently been accessing work-related information via the wearable computing device, including some documents written by co-worker Jim Delta. Then, if the user says “Contact Jim”, the wearable computing device can use historical information about recently accessed information to determine that the “Jim” in this context could be “Jim Delta” and add “Jim Delta” to a list of contacts when asking the user to differentiate between one or more contacts. In such scenarios, if the user does not have “Jim Delta” as a contact, the wearable computing device could query other devices, such as a work-related server, to determine contact information. The devices to be queried could be selected based on the context; e.g., (domains of) servers that provided recently-accessed information.
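- The disambiguation described above can be sketched as follows. This is a minimal, hypothetical illustration (the Contact class, the channel names, and the recent_authors list are assumptions, not part of the disclosure): candidate contacts matching the spoken name are ranked so that people who authored recently accessed documents come first.

```python
# Illustrative sketch (not from the patent text): ranking candidate contacts
# named "Jim" by whether they authored recently accessed documents.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Contact:
    name: str
    channels: List[str] = field(default_factory=list)  # e.g. ["email", "phone"]

def rank_candidates(query_name: str, contacts: List[Contact],
                    recent_authors: List[str]) -> List[Contact]:
    """Return contacts matching query_name, recently seen authors first."""
    matches = [c for c in contacts if query_name.lower() in c.name.lower()]
    # Contacts who wrote recently accessed documents are more likely intended.
    return sorted(matches, key=lambda c: c.name not in recent_authors)

contacts = [Contact("Jim Alpha", ["phone"]),
            Contact("Jim Beta", ["email"]),
            Contact("Jim Delta", ["email", "phone"])]
recent_authors = ["Jim Delta"]          # from the history of accessed documents
for c in rank_candidates("Jim", contacts, recent_authors):
    print(c.name, c.channels)           # Jim Delta listed first
```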
- additional or different context signals can be utilized.
- a user of the wearable computing device might say “Show Map to Last Saturday's Restaurant.”
- the wearable computing device can convert this speech to text.
- the wearable computing device can generate the desired map, perhaps by looking up information about the activities of the user on "Last Saturday" in one or more calendar databases, e-mails, and/or other data sources to find one or more restaurants associated with the user on last Saturday. If multiple restaurants are found, the user can be prompted (visually and/or audibly) to select one of the restaurants.
- a map to the restaurant can be displayed via the wearable computing device.
- Other related information such as pictures of the restaurant, menus, diner reviews, turn-by-turn directions to get to the restaurant, information about friends/contacts at or near the restaurant, related establishments, etc. can be provided to the user of the wearable computing device as well.
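- As a rough sketch of the lookup described above, the snippet below resolves "last Saturday" to a date and filters calendar-style entries for restaurants; the dates, categories, and restaurant names are made up for illustration, and a real system would consult actual calendar databases and e-mails.

```python
# Illustrative sketch (hypothetical data model): resolving "last Saturday's
# restaurant" from calendar entries before requesting a map.
import datetime

def last_saturday(today: datetime.date) -> datetime.date:
    # Saturday is weekday 5; step back 1-7 days to the previous Saturday.
    offset = (today.weekday() - 5) % 7 or 7
    return today - datetime.timedelta(days=offset)

calendar = [  # (date, category, name) -- stand-in for calendar/e-mail sources
    (datetime.date(2011, 7, 9), "restaurant", "Blue Dragon Bistro"),
    (datetime.date(2011, 7, 9), "meeting", "Project review"),
    (datetime.date(2011, 7, 10), "restaurant", "Cafe Kumquat"),
]

def restaurants_on(date: datetime.date):
    return [name for (d, category, name) in calendar
            if d == date and category == "restaurant"]

target = last_saturday(datetime.date(2011, 7, 13))
candidates = restaurants_on(target)
if len(candidates) == 1:
    print("Requesting map to", candidates[0])
else:
    print("Prompt user to choose among:", candidates)
```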
- FIG. 1 illustrates an example system 100 for receiving, transmitting, and displaying data.
- the system 100 is shown in the form of a wearable computing device. While FIG. 1 illustrates eyeglasses 102 as an example of a wearable computing device, other types of wearable computing devices could additionally or alternatively be used.
- the eyeglasses 102 comprise frame elements including lens-frames 104 and 106 and a center frame support 108 , lens elements 110 and 112 , and extending side-arms 114 and 116 .
- the center frame support 108 and the extending side-arms 114 and 116 are configured to secure the eyeglasses 102 to a user's face via a user's nose and ears, respectively.
- Each of the frame elements 104 , 106 , and 108 and the extending side-arms 114 and 116 may be formed of a solid structure of plastic or metal, or may be formed of a hollow structure of similar material so as to allow wiring and component interconnects to be internally routed through the eyeglasses 102 .
- Each of the lens elements 110 and 112 may include a material on which an image or graphic can be displayed. Each of the lens elements 110 and 112 may also be sufficiently transparent to allow a user to see through the lens element. These two features of the lens elements could be combined; for example, to provide an augmented reality or heads-up display where the projected image or graphic can be superimposed over or provided in conjunction with a real-world view as perceived by the user through the lens elements.
- the extending side-arms 114 and 116 are each projections that extend away from the frame elements 104 and 106 , respectively, and are positioned behind a user's ears to secure the eyeglasses 102 to the user.
- the extending side-arms 114 and 116 may further secure the eyeglasses 102 to the user by extending around a rear portion of the user's head.
- the system 100 may be connected to or be integral to a head-mounted helmet structure. Other possibilities exist as well.
- the system 100 may also include an on-board computing system 118 , a video camera 120 , a sensor 122 , and finger-operable touch pads 124 , 126 .
- the on-board computing system 118 is shown to be positioned on the extending side-arm 114 of the eyeglasses 102 ; however, the on-board computing system 118 may be provided on other parts of the eyeglasses 102 .
- the on-board computing system 118 may include a processor and memory, for example.
- the on-board computing system 118 may be configured to receive and analyze data from the video camera 120 and the finger-operable touch pads 124 , 126 (and possibly from other sensory devices, user interfaces, or both) and generate images for output to the lens elements 110 and 112 .
- the video camera 120 is shown to be positioned on the extending side-arm 114 of the eyeglasses 102 ; however, the video camera 120 may be provided on other parts of the eyeglasses 102 .
- the video camera 120 may be configured to capture images at various resolutions or at different frame rates. Many video cameras with a small form-factor, such as those used in cell phones or webcams, for example, may be incorporated into an example of the system 100 .
- Although FIG. 1 illustrates one video camera 120 , more video cameras may be used, and each may be configured to capture the same view, or to capture different views.
- the video camera 120 may be forward facing to capture at least a portion of the real-world view perceived by the user. This forward facing image captured by the video camera 120 may then be used to generate an augmented reality where computer generated images appear to interact with the real-world view perceived by the user.
- the sensor 122 is shown mounted on the extending side-arm 116 of the eyeglasses 102 ; however, the sensor 122 may be provided on other parts of the eyeglasses 102 .
- the sensor 122 may include one or more motion sensors, such as a gyroscope and/or an accelerometer. Other sensing devices may be included within the sensor 122 and other sensing functions may be performed by the sensor 122 .
- the finger-operable touch pads 124 , 126 are shown mounted on the extending side-arms 114 , 116 of the eyeglasses 102 . Each of finger-operable touch pads 124 , 126 may be used by a user to input commands.
- the finger-operable touch pads 124 , 126 may sense at least one of a position and a movement of a finger via capacitive sensing, resistance sensing, or a surface acoustic wave process, among other possibilities.
- the finger-operable touch pads 124 , 126 may be capable of sensing finger movement in a direction parallel to the pad surface, in a direction normal to the pad surface, or both, and may also be capable of sensing a level of pressure applied.
- the finger-operable touch pads 124 , 126 may be formed of one or more translucent or transparent insulating layers and one or more translucent or transparent conducting layers. Edges of the finger-operable touch pads 124 , 126 may be formed to have a raised, indented, or roughened surface, so as to provide tactile feedback to a user when the user's finger reaches the edge of the finger-operable touch pads 124 , 126 . Each of the finger-operable touch pads 124 , 126 may be operated independently, and may provide a different function.
- FIG. 2 illustrates another view of the system 100 of FIG. 1 .
- the lens elements 110 and 112 may act as display elements.
- the eyeglasses 102 may include a first projector 128 coupled to an inside surface of the extending side-arm 116 and configured to project a display 130 onto an inside surface of the lens element 112 .
- a second projector 132 may be coupled to an inside surface of the extending side-arm 114 and configured to project a display 134 onto an inside surface of the lens element 110 .
- the lens elements 110 and 112 may act as a combiner in a light projection system and may include a coating that reflects the light projected onto them from the projectors 128 and 132 . In some embodiments, a special coating may not be used (e.g., when the projectors 128 and 132 are scanning laser devices).
- the lens elements 110 , 112 themselves may include: a transparent or semi-transparent matrix display, such as an electroluminescent display or a liquid crystal display; one or more waveguides for delivering an image to the user's eyes; and/or other optical elements capable of delivering an in focus near-to-eye image to the user.
- a corresponding display driver may be disposed within the frame elements 104 and 106 for driving such a matrix display.
- a scanning laser device, such as a low-power laser or LED source and accompanying scanning system, can draw a raster display directly onto the retina of one or more of the user's eyes. The user can then perceive the raster display based on the light reaching the retina.
- system 100 can be configured for audio output.
- system 100 can be equipped with speaker(s), earphone(s), and/or earphone jack(s).
- audio output can be provided via the speaker(s), earphone(s), and/or earphone jack(s).
- FIG. 3 is a schematic drawing of a system 136 illustrating an example computer network infrastructure.
- a device 138 communicates using a communication link 140 (e.g., a wired or wireless connection) to a remote device 142 .
- the device 138 may be any type of device that can receive data and display information corresponding to or associated with the data.
- the device 138 may be a heads-up display system, such as the eyeglasses 102 described with reference to FIGS. 1 and 2 .
- the device 138 may include a display system 144 comprising a processor 146 and a display 148 .
- the display 148 may be, for example, an optical see-through display, an optical see-around display, or a video see-through display.
- the processor 146 may receive data from the remote device 142 , and configure the data for display on the display 148 .
- the processor 146 may be any type of processor, such as a micro-processor or a digital signal processor, for example.
- the device 138 may further include on-board data storage, such as memory 150 shown coupled to the processor 146 in FIG. 3 .
- the memory 150 may store software and/or data that can be accessed and executed by the processor 146 , for example.
- the remote device 142 may be any type of computing device or transmitter including a laptop computer, a mobile telephone, etc., that is configured to transmit data to the device 138 .
- the remote device 142 and the device 138 may contain hardware to enable the communication link 140 , such as processors, transmitters, receivers, antennas, etc.
- the communication link 140 is illustrated as a wireless connection.
- the wireless connection could use, e.g., Bluetooth® radio technology, communication protocols described in IEEE 802.11 (including any IEEE 802.11 revisions), Cellular technology (such as GSM, CDMA, UMTS, EV-DO, WiMAX, or LTE), or Zigbee® technology, among other possibilities.
- wired connections may be used.
- the communication link 140 may be a wired link via a serial bus such as a universal serial bus or a parallel bus.
- a wired connection may be a proprietary connection as well.
- the communication link 140 may also be a combination of wired and wireless connections.
- the remote device 142 may be accessible via the Internet and may comprise a computing cluster associated with a particular web service (e.g., social-networking, photo sharing, address book, etc.).
- FIG. 4 is a functional block diagram for a wearable computing system 400 in accordance with an example embodiment.
- System 400 is configured to monitor incoming data from a number of input sources 404 .
- system 400 can monitor speech received via microphone 408 and may convert the speech to text using speech-to-text module 426 .
- the input speech can include instructions that specify actions and objects for the actions.
- system 400 can be configured to detect instructions, and to responsively initiate the actions specified in the instructions.
- system 400 includes one or more input-source interfaces 402 for receiving data from input sources 404 .
- the input sources 404 include, for example, an application 406 , a microphone 408 , a keyboard 410 , a camera 412 , and a touchpad 414 .
- a given input-source interface 402 may be configured to interface with and receive data from a single input source, such as microphone 408 .
- a given input-source interface 402 may be configured to simultaneously interface with multiple input sources, such as input sources 406 - 414 .
- System 400 can receive a number of different modalities of input data from input sources 404 .
- system 400 may receive, for example, audio data from microphone 408 , text data from keyboard 410 , video data and/or image data from camera(s) 412 , and/or gesture data from touchpad 414 .
- a system may be configured to receive other modalities of data, in addition or in the alternative to those described, without departing from the scope of the invention.
- system 400 includes an input selection module 416 , which generally functions to evaluate the input data from the various input sources 404 .
- input selection module 416 may be configured to receive input data from the input sources 404 via input source interfaces 402 and detect one or more data patterns in the input data.
- input selection module 416 may detect multiple concurrent data patterns in the input data. For example, input selection module 416 may detect a first data pattern in data from a first source and, simultaneously, detect a second data pattern in data from a second source. As such, selection criteria 418 may provide input-selection rules that prioritize certain data patterns and/or certain input sources.
- selection criteria 418 may prioritize detection of speech in audio data from microphone 408 over other data patterns detected in video data from camera 412 . Accordingly, some embodiments may be configured to display a text conversion of speech whenever speech matching a data pattern is detected in incoming audio data, regardless of whether there is also a matching data pattern in incoming video data. Similarly, if input selection module 416 detects that a user is entering text via a keyboard 410 , this text may be displayed, even when there is a matching data pattern in incoming audio data and/or in incoming video data; for example, where keyboard data is given priority over audio data and video data by selection criteria 418 .
- selection criteria 418 may provide input-selection rules that prioritize certain data patterns when multiple matching data patterns are detected from a common input source. For instance, when explicit commands are received in audio data, the explicit commands may be given priority over implicit information in the audio data from input sources 404 . As one specific example, input-selection criteria 418 may specify that when a user says “show video” (e.g., when “show video” is detected in audio data from microphone 408 ), then this should be interpreted as an explicit command to select camera 412 as the input source and display video from camera 412 .
- selection criteria 418 may specify other hierarchies and/or other prioritizations of input sources and/or data patterns, without departing from the scope of the invention. Thus, selection criteria 418 may be based on one or more objectives in a specific implementation.
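- As a rough illustration of the prioritization just described, the following sketch assumes a fixed priority ordering of input sources and picks the highest-priority source in which a data pattern was detected; the ordering itself is an assumption, not part of the disclosure.

```python
# Illustrative sketch (assumed priorities): choosing among concurrently
# detected data patterns using an ordered selection-criteria list.
from typing import Dict, Optional

SELECTION_CRITERIA = ["keyboard", "microphone", "camera", "touchpad"]  # high to low

def select_input(detected: Dict[str, str]) -> Optional[str]:
    """detected maps an input-source name to the data pattern found in its data."""
    for source in SELECTION_CRITERIA:
        if detected.get(source):
            return source
    return None

# Speech and video patterns detected at the same time: speech wins here
# because the microphone outranks the camera in the assumed criteria.
print(select_input({"microphone": "speech", "camera": "motion"}))  # microphone
print(select_input({"camera": "motion"}))                          # camera
```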
- the selection criteria 418 indicate that multiple input sources 404 should be selected.
- a scenario may exist where text is detected in input data from keyboard 410 and speech is detected in audio data from microphone 408 .
- speech-to-text module 426 may convert the speech from the audio data to text, and this text may be merged with the text from the keyboard for display.
- scenarios may exist where video or an image from camera 412 is displayed, and text is overlaid on top of the video or image. In such a scenario, the text may be obtained from the keyboard 410 and/or obtained via speech-to-text module 426 converting speech in audio data from microphone 408 .
- Many other example combinations of multiple input sources, which combine a variable number of input sources, are also possible.
- the selection criteria 418 can indicate that speech is to be evaluated by speech evaluation module 430 .
- Speech evaluation module 430 can be configured to receive speech and/or text as input, evaluate the input, and responsively generate one or more commands.
- speech input “Display map” can be received at microphone 408 , passed through input source interface 402 , and received at input selection module 416 .
- Selection criteria 418 can direct input selection module 416 to: (1) convert the spoken input to corresponding text via speech-to-text module 426 and (2) provide the corresponding text to speech evaluation module 430 for evaluation.
- part or all of the functionality of one or more of the herein-described modules 416 , 420 , 426 , 430 , selection criteria 418 , and historical context 424 can be combined with one or more other modules.
- part or all of the functionality of speech evaluation module 430 can be combined with input selection module 416 or speech-to-text module 426 .
- Speech evaluation module 430 can evaluate the text of “Display map” to determine that the text includes an action or command of “Display” and an object of “map.” Based on the evaluation, speech evaluation module 430 can send a command to generate a map; e.g., send a query to a server to provide a map. Upon receiving the map, speech evaluation module 430 can then send a command to Head Mounted Display (HMD) 401 to display the received map.
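- For illustration only, the "Display map" routing described above might look roughly like the following; the function boundaries and the command string are hypothetical stand-ins for speech-to-text module 426, speech evaluation module 430, and HMD 401.

```python
# Illustrative sketch (hypothetical module boundaries): routing "Display map"
# from speech input to a display command.
def speech_to_text(audio: bytes) -> str:
    # Stand-in for speech-to-text module 426; a real recognizer goes here.
    return "Display map"

def evaluate(text: str) -> dict:
    # First word treated as the action, remainder as the object.
    action, _, obj = text.partition(" ")
    return {"action": action.lower(), "object": obj.lower()}

def dispatch(parsed: dict) -> str:
    if parsed["action"] == "display":
        # e.g. query a map server, then send the result to the HMD.
        return f"HMD <- render {parsed['object']}"
    return "unrecognized action"

print(dispatch(evaluate(speech_to_text(b"..."))))  # HMD <- render map
```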
- System 400 can select an input based on implicit information extracted from input data from the various possible input sources. This implicit information may correspond to certain data patterns in the input data.
- input selection module 416 may monitor incoming audio data for various data patterns, according to the input-selection criteria.
- the input-selection criteria may specify numerous types of data patterns, which may vary in complexity and/or form.
- input selection module 416 may monitor audio data for: (i) patterns that are indicative of human speech in general, (ii) patterns that are indicative of human speech by a particular person (e.g., the owner of the device, or a friend or spouse of the owner), (iii) patterns that are indicative of a certain type of human speech (e.g., a question or a proposition), (iv) patterns that are indicative of human speech inflected with a certain emotion (e.g., angry speech, happy speech, sad speech, and so on), (v) patterns that are indicative of human speech associated with a certain context (e.g., a pre-recorded announcement on a subway car or a statement typically given by a flight attendant on an airplane), (vi) patterns that are indicative of a certain type of human speech (e.g., speech that is not in a speaker's native language), and/or (vii) patterns indicative of certain types of non-speech audio (e.g., music) and/or of non-speech audio with certain characteristics, among other possibilities.
- a system may be configured to monitor audio data for data patterns that include or are indicative of speech by a particular user, who is associated with the system (e.g., the owner of a wearable computer). Accordingly, the speech-to-text module 426 may convert the speech to corresponding text, which may then be displayed.
- the audio data in which speech is detected may be analyzed in order to verify that the speech is actually that of the user associated with the system. For example, the audio data can be compared to previously-received samples of audio data known to be utterances of the user associated with the system to verify that a speaker is (or is not) the user associated with the system.
- a “voiceprint” or template of the voice of the user associated with the system can be generated, and compared to a voiceprint generated from input audio data. Other techniques for verifying speaker(s) are possible as well.
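- A toy sketch of the voiceprint comparison mentioned above, assuming voiceprints are plain feature vectors compared against a stored template by cosine similarity; a real system would use proper speaker-recognition features and models, and the vectors and threshold below are invented for illustration.

```python
# Illustrative sketch (toy features): verifying a speaker by comparing a
# voiceprint vector against a stored template with a similarity threshold.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

STORED_VOICEPRINT = [0.9, 0.2, 0.4, 0.1]   # template for the device's user
THRESHOLD = 0.95                            # assumed acceptance threshold

def is_device_user(voiceprint) -> bool:
    return cosine(voiceprint, STORED_VOICEPRINT) >= THRESHOLD

print(is_device_user([0.88, 0.22, 0.41, 0.09]))  # True: close to the template
print(is_device_user([0.1, 0.9, 0.1, 0.8]))      # False: different speaker
```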
- speech evaluation module 430 can generate command(s) to search various sources for the named person's contact information or other information related to the named person.
- Speech evaluation module 430 may perform one or more implicit searches, for example, when the person's name is stated in the midst of a conversation, and the user does not explicitly request the information about the person. Implicit searches can be performed for other types of content, such as other proper nouns, repeated words, unusual words, and/or other words.
- speech evaluation module 430 can indicate that the contact information may be displayed.
- the contact information can include phone number(s), email address(es), mailing address(es), images/video related to the contact, and/or social networking information.
- the contact information may be displayed in various forms—the contact information can be displayed visually (e.g., using HMD 401 ) and/or audibly (e.g., using a text-to-speech module, not shown in FIG. 4 , in combination with an audio output, such as a speaker or earphone not shown in FIG. 4 ). Many other types of contact information are possible as well.
- the default action may be not to display anything related to the detected speech.
- Other default actions are also possible.
- input selection module 416 may be configured to select an input source and/or to select input content based on context.
- input selection module 416 may coordinate with context evaluation module 420 , which is configured to evaluate context signals from one or more context information sources 422 .
- context evaluation module 420 may determine a context, and then relay the determined context to input selection module 416 .
- input selection module 416 can provide the context to another module; e.g., speech evaluation module 430 .
- context evaluation module 420 may determine context using various “context signals,” which may be any signals or information pertaining to the state or the environment surrounding the system or a user associated with the system.
- a wearable computer may be configured to receive one or more context signals, such as location signals, time signals, environmental signals, and so on. These context signals may be received from, or derived from information received from, context information sources 422 and/or other sources.
- context signals may include: (a) the current time, (b) the current date, (c) the current day of the week, (d) the current month, (e) the current season, (f) a time of a future event, (g) a date of a future event or future user-context, (h) a day of the week of a future event or future user-context, (i) a month of a future event or future user-context, (j) a season of a future event or future user-context, (k) a time of a past event or past user-context, (l) a date of a past event or past user-context, (m) a day of the week of a past event or past user-context, (n) a month of a past event or past user-context, (o) a season of a past event or past user-context, and (p) ambient temperature, among other possibilities.
- context evaluation module 420 may identify the context as a quantitative or qualitative value of one context signal (e.g., the time of the day, a current location, a user status). The context may also be determined based on a plurality of context signals (e.g., the time of day, the day of the week, and the location of the user). In other embodiments, the context evaluation module 420 may extrapolate from the information provided by context signals. For example, a determined user-context may be determined, in part, based on context signals that are provided by a user (e.g., a label for a location such as “work” or “home”, or user-provided status information such as “on vacation”).
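- A small, hypothetical example of turning a few context signals into a qualitative context, including extrapolation from a user-provided location label; the signal names and rules are assumptions made only to illustrate the idea above.

```python
# Illustrative sketch (assumed signal names): deriving a qualitative context
# from a handful of context signals, including a user-provided location label.
import datetime

def evaluate_context(signals: dict) -> dict:
    now = signals["time"]
    context = {
        "part_of_day": "morning" if now.hour < 12 else "afternoon/evening",
        "weekend": now.weekday() >= 5,
        "location": signals.get("location_label", "unknown"),
    }
    # Extrapolate a user status from the signals, as the module might.
    if context["location"] == "work" and not context["weekend"]:
        context["user_status"] = "at work"
    return context

signals = {"time": datetime.datetime(2011, 7, 13, 11, 55),
           "location_label": "work"}          # label supplied by the user
print(evaluate_context(signals))
```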
- context information sources 422 may include various sensors that provide context information. These sensors may be included as part of or communicatively coupled to system 400 . Examples of such sensors include, but are not limited to, a temperature sensor, an accelerometer, a gyroscope, a compass, a barometer, a moisture sensor, one or more electrodes, a shock sensor, one or more chemical sample and/or analysis systems, one or more biological sensors, an ambient light sensor, a microphone, and/or a digital camera, among others.
- System 400 may also be configured to acquire context signals from various data sources.
- context evaluation module 420 can be configured to derive information from network-based weather-report feeds, news feeds and/or financial-market feeds, a system clock providing a reference for time-based context signals, and/or a location-determining system (e.g., GPS), among others.
- system 400 may also be configured to learn over time about a user's preferences in certain contexts, and to update selection criteria 418 accordingly. For example, whenever an explicit input-content instruction is received, a corresponding entry may be created in historical context database 424 . This entry may include the input source and/or input content indicated by the input-content instruction, as well as context information that is available at or near the receipt of the input-content instruction. Context evaluation module 420 may periodically evaluate historical context database 424 and determine whether a correlation exists between explicit instructions to select a certain input source and/or certain input content, and a certain context. When such a correlation exists, selection criteria 418 may be updated to specify that the input source should be automatically selected, and/or that the input content should be automatically displayed, upon detection of the corresponding context.
- system 400 may be configured for an “on-the-fly” determination of whether a current context has historically been associated with certain input sources and/or certain input content.
- context evaluation module 420 may compare a current context to historical context data in historical context database 424 , and determine whether certain content historically has been correlated with the current context. If a correlation is found, then context evaluation module 420 may automatically display the associated input content.
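- The learning behavior described above might be sketched as follows, assuming a simple count-based correlation test; the thresholds, context keys, and promotion rule are illustrative assumptions rather than the patent's method.

```python
# Illustrative sketch (assumed thresholds): updating selection criteria when
# historical entries show a context strongly correlated with an input source.
from collections import Counter, defaultdict

history = defaultdict(Counter)   # context key -> counts of explicitly chosen sources
auto_rules = {}                  # context key -> source selected automatically

def record_explicit_choice(context_key: str, source: str,
                           min_samples: int = 5, min_share: float = 0.8):
    history[context_key][source] += 1
    counts = history[context_key]
    total = sum(counts.values())
    top_source, top_count = counts.most_common(1)[0]
    # Promote to an automatic rule once the correlation looks strong enough.
    if total >= min_samples and top_count / total >= min_share:
        auto_rules[context_key] = top_source

for _ in range(5):
    record_explicit_choice("work/just-before-noon", "microphone")
print(auto_rules)  # {'work/just-before-noon': 'microphone'}
```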
- context evaluation module 420 can determine that the context includes (a) a location of system 400 that is related to "work", (b) a time just before or at 12:00, (c) a history of ordering lunch from the aforementioned seven restaurants, and (d) that six of the seven restaurants are open at this time, based on online listings. Then, the context evaluation module 420 can generate a command to display a reminder to "Order Lunch" with a list of the six open restaurants for order selection, perhaps including information indicating that the seventh restaurant is closed. In response, the user can select a restaurant from the list using input sources 404 , choose another restaurant, dismiss/postpone the order, or perhaps perform some other action.
- speech evaluation module 430 may select the particular application that is appropriate to open the file as the input source, launch the selected application in the multimode input field, and then open the named file using the application.
- the user may say “search” and then state or type the terms to be searched, or identify other content to be searched, such as an image, for example.
- speech evaluation module 430 may responsively form a query to a search engine, provide the query with subsequently stated terms or identified content, and receive search results in response to the query.
- Implicit searches also can be performed by this technique of forming a query based on identified content; e.g., the word(s) that provoked the implicit search, providing the query with identified content to a search engine, and receiving search results in response to the query.
- speech actions may include objects that directly identify the input source or sources to select (e.g., a “select video” instruction), or may identify an input source by specifying an action that involves the input source (e.g., a “contact information” or “search” action). Many other actions of speech input can identify an input source.
- Historical context database 424 can also, or instead, include information about a document context that can be included in a context.
- a document context may involve context information derived from a given document within a collection of documents, such as, but not limited to, related collections of documents and past documents that have been created by the user and/or by other users. For example, based on the fact that a user has created a number of purchase order documents in the past, a background process may interpret the document in the context of a purchase order agreement, perhaps searching for supplier names and/or supplier part numbers upon which search requests can be based.
- a document can be a bounded physical or digital representation of a body of information, or content.
- Content of the document can include text, images, video, audio, multi-media content, and/or other types of content.
- Document-property information can be associated with a document, such as, but not limited to, document names, sizes, locations, references, partial or complete content of documents, criteria for selecting documents to form a context and/or to locate a document. Other types of content and document-property information are possible as well.
- a document can be accessed via one or more references such as, but not limited to, a Uniform Resource Locator (URL), a Uniform Resource Identifier (URI), a volume name/number, a title, a page number, an address, a storage address, such as a memory address or disk sector, a library index number, an International Standard Book Number (ISBN), a bar code, and/or other identifying information.
- Other document references are possible as well.
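- For illustration, a document record carrying a reference, document-property information, and partial content might look like the following; the field names and example values are hypothetical and are not drawn from the disclosure.

```python
# Illustrative sketch (hypothetical fields): a document record combining a
# reference, document-property information, and optional content for use in
# a document context.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DocumentRecord:
    reference: str                 # e.g. URL, URI, ISBN, or storage address
    name: str
    size_bytes: int
    author: Optional[str] = None
    content_excerpt: Optional[str] = None   # partial content used for context

doc = DocumentRecord(reference="https://example.com/po/1234",
                     name="Purchase Order 1234", size_bytes=20480,
                     author="Jim Delta",
                     content_excerpt="Supplier: Acme, Part No. 77-B")
print(doc.reference, doc.author)
```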
- system 400 may allow a user to provide explicit instructions via other input sources, such as keyboard 410 and/or touchpad 414 .
- explicit instructions received via other input sources 404 may include input-content instructions to select a certain input source and/or to display certain content, as well as explicit instructions to perform other actions.
- FIG. 5A depicts a scenario 500 of speech evaluation in accordance with an example embodiment. Scenarios 500 , 600 , 700 , 800 , and 900 below each involve speaker 502 utilizing wearable computing device 510 .
- An example wearable computing device that could be utilized as device 510 is system 400 , described in detail above with reference to FIG. 4 .
- device 510 can be configured to process an utterance to determine whether or not the utterance is a speech command.
- a speech command can have one or more actions and zero or more objects for each action.
- the speech command “Shutdown” without an object can be interpreted by device 510 to power itself off.
- the speech command “Shutdown earphones and speakers” can be interpreted by device 510 to stop output from and/or power down earphone(s) and speaker(s) associated with device 510 .
- Many other examples of speech commands, actions, and objects beyond those described herein are possible as well.
- the order of actions and objects in a speech command can be reversed or otherwise reordered.
- speech commands in German and other languages typically have object(s) preceding actions.
- the device can understand the utterance “Mom phone” to be a speech command to call Mom, perhaps from a very young English-speaking user.
- Scenario 500 begins at 500 A with speaker 502 instructing device 510 to “Contact Scott at work” via utterance 520 .
- device 510 prompts speaker 502 to disambiguate the action “contact” with prompt 522 .
- prompt 522 includes a question “Contact?” and two options “E-mail” and “Phone.” In other scenarios, prompt 522 can include more than two options to disambiguate an action.
- speaker 502 disambiguates the action “contact” via utterance 530 of “Phone.”
- device 510 prompts speaker 502 at 500 D to disambiguate Scott using prompt 532 .
- FIG. 5 shows that prompt 532 includes a question “Scott?” and two options “Scott C.” and “Scott H.”
- speaker 502 responds to prompt 532 with utterance 540 of “Scott H.”
- device 510 places a phone call to Scott H. at work, and generates prompt 542 informing speaker 502 that device 510 is “Phoning Scott H. at Work . . . ”
- FIGS. 5B and 5C depict processing by speech evaluation module 430 for speech uttered in scenario 500 in accordance with an example embodiment.
- Speech evaluation module 430 is configured to receive speech input in either audible or textual form.
- FIG. 5B shows the speech input of “Contact Scott at Work” in textual form.
- speech input received in audible form is converted to text and then processed as described herein.
- speech evaluation module 430 can provide speech input in audible form to speech-to-text module 426 for conversion to textual form, and then process the converted audible-form speech input.
- speech evaluation module 430 determines an input action for the speech input.
- FIG. 5B shows that speech evaluation module 430 determines the input action by performing action lookup 552 , and also shows that techniques for add action 554 a and search engine search 554 b can be utilized along with, or instead of, performing action lookup 552 .
- Action lookup 552 can divide speech input into words and compare each word with one or more known action words.
- the known action words can be stored, searched, and retrieved using a list, table, tree, trie, dictionary, database, and/or other data structure(s) configured to store at least one action word. Then, action lookup 552 can find word(s) in the speech input that are known action words by looking up each input word in the data structure(s) storing the known action words.
- Example action words include, but are not limited to, words related to control of device 510 (e.g., turn on or off, louder, softer, increase, decrease, mute, output, clear, erase, brighten, darken, etc.), document processing (e.g., open, load, close, edit, save, undo, replace, delete, insert, format, etc.), communications (e.g., e-mail, mail, call, contact, send, receive, get, post, tweet, text, etc.), searches (e.g., find, search, look for, locate, etc.), content delivery (e.g., show, play, display), and other action words. Many other example action words are possible as well.
- action lookup 552 can identify the word “contact” as an action word.
- the word contact can be further identified as a “communication action” or action word related to communications, such as indicated in the paragraph above.
- Block 556 of FIG. 5B shows that speech evaluation module 430 has identified an action of “contact” in the speech input.
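- A minimal sketch of the action lookup just described, using a small dictionary of known action words grouped by category; the word list is only a sample of the examples above, and the category names are illustrative.

```python
# Illustrative sketch (small word list): finding a known action word in the
# speech-related text, together with its category.
KNOWN_ACTIONS = {
    "contact": "communication", "call": "communication", "e-mail": "communication",
    "search": "search", "find": "search",
    "show": "content delivery", "display": "content delivery", "play": "content delivery",
    "open": "document processing", "save": "document processing",
}

def action_lookup(text: str):
    for word in text.lower().split():
        if word in KNOWN_ACTIONS:
            return word, KNOWN_ACTIONS[word]
    return None, None

print(action_lookup("Contact Scott at Work"))  # ('contact', 'communication')
```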
- speech evaluation module 430 can "disambiguate" the word "contact." Disambiguation involves determining a (more) precise meaning for one or more words in speech input. For example, while "contact" is a communication action, multiple techniques can be used to contact a person utilizing device 510 . For example, device 510 can be used to contact a person and/or device via telephone, e-mail, text message, blog entry, tweet, and/or other communications techniques.
- Preference information 560 can include preferences for techniques for use in contacting others (e.g., always call Alice, always tweet Bob, call Carol only between 10 AM and 10 PM, only contact Dan when at work or at home), information about contact lists and other contextual information, calendar information, information about previous speech commands, information about disambiguating action words, and/or other information.
- preference information 560 can indicate that speaker 502 prefers to use phone calls and e-mail to “contact” others. Since preference information 560 indicates that two or more possible actions can be performed, speech evaluation module 430 can determine that user prompt 562 can disambiguate the action of contacting between telephoning and e-mailing.
- FIG. 5B shows that techniques of search engine search 564 a and/or contextual search 564 b can be utilized along with, or instead of, performing user prompt 562 .
- Contact prompt 566 shown in FIG. 5B is the same as prompt 522 of FIG. 5A .
- speech evaluation module 430 can await user input at block 568 .
- the user input is “phone” as shown as utterance 530 of FIG. 5B and in block 570 of FIG. 5C , where the action is determined to be phone.
- an action identifier and/or other information about the phone action can be maintained as well by speech evaluation module 430 .
- speech evaluation module 430 can remove the disambiguated word "contact" from the input, and process the remaining input of "Scott at Work" as an object for the phone action.
- speech evaluation module 430 disambiguates the word “Scott” for the phone action.
- FIG. 5C shows that speech evaluation module 430 can disambiguate the word Scott using contextual search 576 a and user prompt 576 b , and also shows that search engine search 578 can be utilized along with, or instead of, contextual search 576 a and user prompt 576 b.
- Contextual search 576 a involves searching historical context database 424 and perhaps other contextual information.
- the contextual search can be performed by speech evaluation module 430 and/or content evaluation module 420 (shown in FIG. 4 ).
- historical context database 424 can include entries regarding input sources and content, such as documents, web pages, URLs, URIs, computer addresses such as Internet Protocol (IP) addresses, images, video files, audio files, and/or other files accessed by device 510 .
- historical context database 424 can store and/or retrieve context signals as well, such as a current time and/or location when an input source is accessed.
- other contextual information can be searched as well as part of a contextual search.
- the other contextual information can include information about a speaker 502 , such as identification information of speaker 502 , contacts/friends of speaker 502 , a calendar of events for the speaker 502 , organizations related to speaker 502 , and other information related to speaker 502 .
- the other context information can include information about other entities other than speaker 502 such as members of the speaker 502 's family, work colleagues, mailing lists, blogs, feeds, organization(s), persons with shared interests, and/or other related entities.
- speech evaluation module 430 can determine that there are two persons named Scott that speaker 502 may be trying to contact: Scott C. or Scott H. To disambiguate between Scott C. and Scott H., speech evaluation module 430 can use user prompt technique 576 b to provide name prompt 580 .
- FIG. 5C shows that name prompt 580 is the same as prompt 532 of FIG. 5B .
- speech evaluation module 430 can await user input at block 582 .
- the user input is “Scott H.” as shown as utterance 540 of FIG. 5B and in block 584 of FIG. 5C , where Name is determined to be “Scott H.”
- an identifier and/or other information about name and/or Scott H. can be maintained as well by speech evaluation module 430 .
- speech evaluation module 430 can remove the word “Scott” from the input, as already disambiguated, and process the remaining input of “at Work” as part of the object whose name is “Scott H.”
- speech evaluation module 430 can perform a contextual search for a phone number for “Scott H.” that is “at work”, and determine a phone number for Scott H. at work. For example, speech evaluation module 430 can search for “Scott H” in a contact database, list of most recently accessed documents, work-related computer, and/or other resources to find a telephone number for Scott H. at work. In this example, Scott H.'s work number is (555) 555-5555.
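- The contextual search for a work number might be sketched as follows, with stand-in data sources playing the roles of the contact database and recently accessed documents; the data, the source ordering, and the fallback are assumptions for illustration.

```python
# Illustrative sketch (stand-in data sources): a contextual search for the
# "work" phone number of the disambiguated contact "Scott H."
CONTACT_DB = {"Scott H.": {"work": "(555) 555-5555"},
              "Scott C.": {"mobile": "(555) 555-0100"}}
RECENT_DOCUMENT_NUMBERS = {}   # numbers scraped from recently accessed documents

def find_number(name: str, label: str):
    # Consult sources in order of expected reliability for this speaker.
    for source in (CONTACT_DB, RECENT_DOCUMENT_NUMBERS):
        number = source.get(name, {}).get(label)
        if number:
            return number
    return None   # fall back to a search engine query or a user prompt

print(find_number("Scott H.", "work"))  # (555) 555-5555
```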
- FIG. 5C also indicates that speech evaluation module 430 can also or instead perform search engine search 590 a and/or user prompt 590 b to determine the phone number.
- speech evaluation module 430 can output a command to phone the number (555) 555-5555 in response to the speech input of “Contact Scott at Work.”
- device 510 can utilize telephone-related hardware and/or software to place a call to telephone number (555) 555-5555 on behalf of speaker 502 , process the call, and tear down the call when the call ends.
- FIG. 6 depicts a scenario 600 of speech evaluation in accordance with an example embodiment.
- Scenario 600 begins at 600 A with speaker 502 instructing device 510 using utterance 610 of “Search kumquat.”
- speech evaluation module 430 of device 510 can determine that the action is “search” and the object is “kumquat” using the techniques discussed above with reference to FIGS. 5B and 5C . Upon determining that the action is “search”, speech evaluation module 430 can send a command to utilize a search engine to search for the object kumquat, and also display a user prompt that the search is in progress.
- FIG. 6 shows that device 510 shows prompt 620 of “Search in progress . . . ” to show the search is in progress.
- FIG. 6 shows a search result 630 of “Kumquats are small fruit” displayed using device 510 .
- Search result 630 can be part or all of information returned by the search engine responding to the command to utilize the search engine for the object kumquat.
- FIG. 6 shows that scenario 600 continues by speaker 502 providing utterance “Display image” 640 to device 510 .
- speech evaluation module 430 can determine that utterance 640 has an action of “display” and an object of “image” using the techniques discussed above with reference to FIGS. 5B and 5C .
- Device 510 can disambiguate the object “image” using the context of the previous command, where the object was “kumquat”, to determine that speech input is a command to display an image of a kumquat.
- speech evaluation module 430 can perform another search (or perhaps process results of the already-performed search) to find an image related to the object “kumquat.” For example, speech evaluation module 430 can search for images and/or video using the keyword kumquat. In response, a search engine or other entity can provide device 510 an image related to a kumquat.
- FIG. 6 shows a display of kumquat image 650 and text 652 of “kumquat” displayed in response to utterance 640 .
- speaker 502 can request display of a “next” or “previous” image, save the image, and/or communicate the image to another person. Many other scenarios with searches and image displays are possible as well.
- audio and/or video output can be provided with, or instead of, image 650 and/or text 652 .
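- A small sketch of how the previous command's object could be carried forward so that "Display image" inherits "kumquat" as its search context; the class and the command strings are hypothetical stand-ins for the module behavior described above.

```python
# Illustrative sketch (assumed state handling): letting "Display image" inherit
# the object of the previous command ("kumquat") as its search context.
class CommandContext:
    def __init__(self):
        self.last_object = None

    def interpret(self, text: str) -> str:
        action, _, obj = text.lower().partition(" ")
        if action == "search":
            self.last_object = obj
            return f"search engine query: {obj}"
        if action == "display" and obj == "image" and self.last_object:
            return f"image search query: {self.last_object}"
        return f"{action} {obj}"

ctx = CommandContext()
print(ctx.interpret("Search kumquat"))   # search engine query: kumquat
print(ctx.interpret("Display image"))    # image search query: kumquat
```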
- FIG. 7 depicts a scenario 700 of speech evaluation in accordance with an example embodiment.
- Scenario 700 begins at 700 A with speaker 502 instructing device 510 using utterance 710 of “Output to speaker.”
- speech evaluation module 430 of device 510 can determine that the action is "output" and the object is "to speaker" using the techniques discussed above with reference to FIGS. 5B and 5C . Upon determining that the action is "output", speech evaluation module 430 can send a command to direct any future output to the object of the speech input; that is, direct output to an audio-output device configured for producing audio output (e.g., provide output to a speaker or earphone jack).
- FIG. 7 shows that, at 700 B, device 510 confirms that utterance 710 has been processed by outputting output 720 of “Using audio output” via an audio-output device.
- FIG. 7 also shows that, at 700 C, speaker 502 instructs device 510 with utterance 730 of “Output to display and speaker.”
- speech evaluation module 430 of device 510 can determine that the action is “output” and the object is “to display and speaker” using the techniques discussed above with reference to FIGS. 5B and 5C .
- speech evaluation module 430 can send a command to direct any future output to the object of the speech input to both the audio-output device and to a display, such as one or more lens elements 110 , 112 , and/or HMD 401 .
- FIG. 7 shows that, at 700 D, device 510 can confirm that utterance 730 has been processed by outputting output 740 of “Using audio output” via an audio-output device and output 742 of “Using display output” on a lens element.
- output can be directed to a display only.
- output can be stored (e.g., in a file), provided to other output devices of device 510 , communicated using a communication link to another computing device and/or a network, and/or provided to other outputs.
- output can be directed to a file for some period of time and later speech input can close the file, ending storage of the output in the file.
- a first utterance can be speech input to “Copy output to file output1”, then all output can be stored in the file “output1”, and later speech input, such as “Close output1” can terminate storage of the output to the output1 file.
- input devices can be turned on and off via speech input as well; e.g., “Turn on microphone”, “Turn off keyboard”, etc. Many other scenarios are possible as well.
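- A minimal sketch of such speech-directed output routing is shown below, assuming a hypothetical OutputRouter class with display, speaker, and file sinks; none of these names come from the disclosure, and the audio sink is a stand-in for an audio-output device.

```python
# Hypothetical output routing driven by spoken commands such as
# "Output to display and speaker", "Copy output to file output1",
# and "Close output1".
class OutputRouter:
    def __init__(self):
        self.sinks = {"display": print}   # default sink
        self.files = {}                   # open file sinks, keyed by name

    def use(self, *sink_names):
        """E.g., spoken 'Output to display and speaker'."""
        self.sinks = {}
        for name in sink_names:
            if name == "display":
                self.sinks["display"] = print
            elif name == "speaker":
                # Stand-in for an audio-output device.
                self.sinks["speaker"] = lambda text: print(f"[audio] {text}")

    def copy_to_file(self, filename):
        """E.g., spoken 'Copy output to file output1'."""
        self.files[filename] = open(filename, "w")

    def close_file(self, filename):
        """E.g., spoken 'Close output1'."""
        f = self.files.pop(filename, None)
        if f:
            f.close()

    def emit(self, text):
        for sink in self.sinks.values():
            sink(text)
        for f in self.files.values():
            f.write(text + "\n")

if __name__ == "__main__":
    router = OutputRouter()
    router.use("display", "speaker")
    router.copy_to_file("output1")
    router.emit("Using audio output")
    router.close_file("output1")
```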
- FIG. 8 depicts a scenario 800 of speech evaluation in accordance with an example embodiment.
- Scenario 800 begins at 800 A with speaker 502 instructing device 510 using utterance 810 of “Output to display.”
- speech evaluation module 430 of device 510 can determine that the action is “output” and the object is “to display” using the techniques discussed above with reference to FIGS. 5B and 5C .
- FIG. 8 shows that, at 800 B, device 510 confirms the output is provided to the display by outputting prompt 820 of “Using display output” on a display of device 510 .
- FIG. 8 shows two speakers—speaker 502 and speaker 830 —simultaneously providing speech input to device 510 .
- Speaker 502 provides speech input to device 510 via utterance 840 of “Display anniversary” and speaker 830 provides speech input to device 510 via utterance 842 of “Search for cars.”
- device 510 can analyze the audio data in which speech is detected to verify that the speech is associated with an authorized user of the system. For example, as discussed above, device 510 can use voiceprints to determine authorized or unauthorized users.
- priority and/or security information can be associated with a voiceprint and/or other speech characteristics that identify a speaker.
- the priority information can include information that specifies an importance of a speaker; for example, suppose device 510 has two possible speakers: speaker O, who owns device 510, and speaker F, who borrows device 510 on occasion. Then, the priority of speaker O can indicate that speaker O has more importance than speaker F.
- the priority information can be used to determine whose speech input device 510 processes when multiple authorized speakers provide simultaneous, or nearly simultaneous, speech input. In this example, when speakers O and F both speak, device 510 can use the priority information to process speaker O's speech input.
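- The priority-based choice among simultaneous authorized speakers could be sketched as follows; the priority table and the speaker identifiers are assumptions for illustration only.

```python
# Hypothetical priority table keyed by speaker identity; the owner
# outranks an occasional borrower of the device.
PRIORITY = {"speaker_O": 10, "speaker_F": 1}

def select_speech_input(utterances):
    """Given (speaker_id, text) pairs heard at about the same time,
    keep the utterance from the highest-priority speaker."""
    return max(utterances, key=lambda u: PRIORITY.get(u[0], 0))

if __name__ == "__main__":
    heard = [("speaker_F", "Search for cars"),
             ("speaker_O", "Display anniversary")]
    print(select_speech_input(heard))   # ('speaker_O', 'Display anniversary')
```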
- Security information can be used to enable or disable certain functions of device 510 .
- a guest level of security which lets a speaker perform searches, display search results, and turn on/off device 510 via speech commands only
- an owner level of security which lets a speaker perform all actions via speech commands.
- speaker F can be assigned the guest level of security
- speaker O can be assigned the owner level of security.
- Many other techniques for priority and/or security are possible as well.
- device 510 can store and/or access one or more stored voiceprints of authorized users. Then, upon receiving speech input, device 510 can generate a voiceprint of each speaker identified in the audio data and compare the generated voiceprint(s) with the stored voiceprint(s) of authorized user(s). If a match is found between a stored voiceprint and a generated voiceprint, then the user can be classified as authorized, and device 510 can perform the instruction(s) in the speech input from the authorized user.
- one or more device identifiers can be stored with the voiceprint(s) of authorized user(s).
- both voiceprints and device identifiers can be compared before a user can be authorized to use a specific device; e.g., device 510 . That is, the device can compare generated and stored voiceprints and a current device identifier with a device identifier stored with the voiceprint. A speaker can then be authorized to use a device associated with the current device identifier when both the voiceprints and the device identifiers match.
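- A hedged sketch of the combined voiceprint and device-identifier check is shown below; representing a voiceprint as a small feature vector compared by Euclidean distance, and the threshold value, are assumptions made only to keep the example concrete.

```python
# Hypothetical authorization check combining voiceprint and device id.
import math

STORED = [
    # (stored voiceprint vector, device id associated with that voiceprint)
    ([0.9, 0.1, 0.4], "device-510"),
]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_authorized(generated_print, current_device_id, threshold=0.2):
    """Authorized only when both the voiceprint and the device id match."""
    for stored_print, stored_device in STORED:
        if (distance(generated_print, stored_print) <= threshold
                and stored_device == current_device_id):
            return True
    return False

if __name__ == "__main__":
    print(is_authorized([0.88, 0.12, 0.41], "device-510"))  # True
    print(is_authorized([0.88, 0.12, 0.41], "device-999"))  # False
```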
- These embodiments can permit voiceprint storage in location(s) other than on device 510 .
- priority and/or security information can be associated with some or all stored voiceprint(s).
- device 510 does not generate the voiceprint; rather, device 510 can provides voice data and perhaps current device information to another device that generates the voiceprint.
- the generated voiceprint can be communicated to device 510 and/or compared to stored voiceprint(s) to determine if a speaker is authorized. This can simplify device 510 by permitting generation of voiceprints by devices other than device 510 .
- speaker 502 is determined to be an authorized speaker and speaker 830 is determined to be an unauthorized speaker. Accordingly, utterance 840 is treated as speech input by device 510 and utterance 842 is ignored by device 510.
- speech evaluation module 430 of device 510 can determine that the action is “display” and the object is “anniversary” using the techniques discussed above with reference to FIGS. 5B and 5C .
- Device 510 can perform a contextual search (or use other techniques) to determine that the anniversary for speaker 502 is on Jan. 29, 2012.
- FIG. 8 shows that, at 800 D, device 510 can generate prompt 850 indicating that the “Anniversary is 1/29/12.”
- both speakers 502 and 830 can be authorized speakers.
- speech inputs from multiple authorized speakers can be processed on a first-come-first-served (FCFS) basis; based on priority and/or security information associated with a speaker; based on proximity to device 510; based on a number of previous speech inputs made by the speaker (i.e., the more previous speech inputs device 510 has processed for a given authorized speaker, the higher the priority given to that speaker); based on keywords or passwords used by a speaker; and/or by other techniques.
- a number of speakers can be determined.
- device 510 can determine that voiceprints, frequency ranges, and/or other speech-related characteristics differ between utterances 840 and 842.
- a number of speakers of speech input can be determined; e.g., each different set of speech-related characteristics can be assigned to one speaker.
- counting the number of different sets of speech-related characteristics can indicate a number of different speakers.
- outputs can be determined based on the number of speakers. For example, if the number of speakers is one, output can use one format, such as audio output, while another format, such as video, can be used if the number of speakers is greater than one. Such output choices can be stored in preference information 560 . Many other techniques and scenarios involving multiple speakers are possible as well.
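- The speaker-counting and output-format choice could look roughly like the following; the use of a single dominant-frequency value per utterance and the 10 Hz tolerance are simplifying assumptions, not the disclosed technique.

```python
# Hypothetical counting of speakers by distinct sets of speech-related
# characteristics, then choosing an output format from the count.
def count_speakers(characteristic_sets, tolerance=10.0):
    """Each element is, e.g., a dominant-frequency estimate (Hz); values
    differing by more than the tolerance are treated as distinct speakers."""
    clusters = []
    for value in characteristic_sets:
        if not any(abs(value - c) <= tolerance for c in clusters):
            clusters.append(value)
    return len(clusters)

def choose_output_format(num_speakers):
    # One listener: audio is fine; several people present: use the display.
    return "audio" if num_speakers == 1 else "video"

if __name__ == "__main__":
    n = count_speakers([120.0, 122.5, 210.0])   # two distinct speakers
    print(n, choose_output_format(n))            # 2 video
```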
- FIG. 9 depicts a scenario 900 of speech evaluation in accordance with an example embodiment.
- Scenario 900 begins at 900 A with speaker 502 instructing device 510 using utterance 910 of “Load last copy of memo1.”
- speech evaluation module 430 of device 510 can determine that the action is “load” and the object is “last copy of memo1” using the techniques discussed above with reference to FIGS. 5B and 5C . Further, as discussed above with reference to FIGS. 5B and 5C , speech evaluation module 430 can disambiguate the “last copy of memo1” object to refer to a most-recently modified version of a file entitled “memo1.”
- FIG. 9 shows that, at 900 B, device 510 displays a first portion of memo1 as output 920 of “Memo1: In 1Q11, we made” on a display of device 510 .
- FIG. 9 indicates that scenario 900 continues at 900 C with speaker 502 instructing device 510 using utterance 930 of “Open DB Q1db.”
- speech evaluation module 430 of device 510 can determine that the action is “open” and the object is “DB Q1db” using the techniques discussed above with reference to FIGS. 5B and 5C . Further, as discussed above with reference to FIGS. 5B and 5C , speech evaluation module 430 can disambiguate the “DB Q1db” object to be a database (DB) entitled “Q1db” and then open the Q1db database.
- scenario 900 at 900 D shows device 510 providing prompt 940 of “Q1db: open” on a display of device 510 to indicate that the Q1db database has been opened.
- FIG. 9 indicates that scenario 900 continues at 900 E with speaker 502 instructing device 510 using utterance 950 of “Insert 1Q11 profit from Q1db into memo1.”
- speech evaluation module 430 of device 510 can determine that the action is “insert” and the object is “1Q11 profit from Q1db into memo1” using the techniques discussed above with reference to FIGS. 5B and 5C. Further, as discussed above with reference to FIGS. 5B and 5C, speech evaluation module 430 can disambiguate the “1Q11 profit from Q1db into memo1” object to refer to the 1Q11 profit that can be found in the Q1db database and is to be placed in the memo1 file.
- the Q1db database and perhaps other databases are resident; e.g., stored on device 510 . In other embodiments, the Q1db database and perhaps other databases are not resident on device 510 .
- the device 510 can be configured to communicate with the Q1db database, regardless of whether the database is or is not resident on the wearable computing device. For example, device 510 can be configured to access databases using a common set of access functions that permit communication with resident database(s) using local communication functionality, with non-resident database(s) via a communication link or other communication interface, and with both resident and non-resident databases.
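- One possible shape for such a common set of access functions is sketched below; the Database, ResidentDatabase, and RemoteDatabase classes are assumptions for illustration, not the disclosed interface.

```python
# Sketch of a common access interface over resident and non-resident
# databases; the caller does not care where the database lives.
from abc import ABC, abstractmethod

class Database(ABC):
    @abstractmethod
    def query(self, field):
        ...

class ResidentDatabase(Database):
    """Backed by local storage on the device."""
    def __init__(self, rows):
        self.rows = rows
    def query(self, field):
        return self.rows.get(field)

class RemoteDatabase(Database):
    """Stand-in for a database reached over a communication link."""
    def __init__(self, fetch):
        self.fetch = fetch   # callable that performs the remote call
    def query(self, field):
        return self.fetch(field)

def get_value(db: Database, field: str):
    # Same call whether the database is resident or not.
    return db.query(field)

if __name__ == "__main__":
    q1db = ResidentDatabase({"1Q11 profit": "$1M"})
    print(get_value(q1db, "1Q11 profit"))                   # $1M
    remote = RemoteDatabase(lambda f: {"1Q11 profit": "$1M"}.get(f))
    print(get_value(remote, "1Q11 profit"))                 # $1M
```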
- device 510 can generate a command to query Q1db for the 1Q11 profit.
- FIG. 9 shows that, at 900 F of scenario 900 , device 510 has received output from the query command that indicates the 1Q11 profit is $1M, and has provided corresponding prompt 960 on a display of device 510 .
- device 510 can insert the profit value of “$1M” retrieved from the Q1db database into the memo1 file.
- FIG. 9 shows that, at 900 G of scenario 900 , device 510 has generated output 962 of an updated first portion of memo1 that includes the “$1M” from Q1db.
- implicit search requests can be generated for a document.
- An implicit search request is a request for information generated by editing a document. For example, consider that a document is edited by adding the words “sword fighting.”
- an implicit search request for information about sword fighting can be generated and sent to one or more search engines.
- Implicit search requests are search requests generated by device 510 , or perhaps another device, without specific user interaction (e.g., speech input) to control timing of the communication of the search request to a search engine and/or content of the search request.
- Information received from search engines based on implicit search requests can be provided to device 510 , which can display the information without specific user interaction to control timing and/or content of the displayed information.
- in response to the implicit search request for “sword fighting”, information such as historical allusions, literature, music, games, etc. related to sword fighting can be provided using device 510.
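- A minimal sketch of generating an implicit search request from a document edit follows, assuming the request is simply built from the newly added, non-stopword terms; the tokenization and stopword list are assumptions for this example.

```python
# Hypothetical implicit search request built from a document edit,
# with no explicit user request controlling timing or content.
import re

STOPWORDS = {"the", "a", "an", "in", "of", "for", "and", "to"}

def words(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def implicit_search_request(old_text, new_text):
    """Return a query made of terms added by the edit, or None."""
    old = set(words(old_text))
    new_terms = [w for w in words(new_text)
                 if w not in old and w not in STOPWORDS]
    return " ".join(new_terms) if new_terms else None

if __name__ == "__main__":
    before = "The hero trained for years."
    after = "The hero trained for years in sword fighting."
    print(implicit_search_request(before, after))   # sword fighting
```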
- FIG. 10 is a flow chart of an example method 1000 in accordance with an example embodiment.
- speech input can be received at a wearable computing device. Receiving speech input at wearable computing devices is described above with reference to at least FIGS. 4-9 .
- speech-related text corresponding to the speech input can be generated at the wearable computing device. Generating speech-related text corresponding to speech input is discussed above in more detail with reference to at least FIGS. 4-5C .
- a context for the speech-related text can be determined using the wearable computing device.
- the context can be based at least in part on a history of accessed documents and one or more databases. Determining contexts for speech-related text is discussed above in more detail with reference to at least FIGS. 4-5C.
- At least one database of the one or more databases is not resident on the wearable computing device.
- the wearable computing device can be configured to communicate with the at least one database that is not resident on the wearable computing device. Resident and non-resident databases are discussed above in more detail at least with reference to FIG. 9 .
- an action is determined, based on an evaluation of at least a portion of the speech-related text and the context for the speech-related text.
- the action can include at least one of a command and a search request. Determining actions based on evaluating contexts and speech-related text is discussed above in more detail at least with reference to FIGS. 5-9 .
- the wearable computing device can generate output based on the command. Generating output based on commands is discussed above in more detail with reference to at least FIGS. 5-9 .
- the command can be selected from the group of a communication command, a scheduling command, a command to display information, a command to save information, and a command to delete information.
- the command can be an implicit search request, where the implicit search request comprises a request to search within the context. Commands and actions are discussed above in greater detail with reference to at least FIGS. 5-9.
- method 1000 proceeds to block 1030 .
- the search request can be communicated to a search engine. Communicating search requests to search engines is discussed above in more detail at least with reference to FIGS. 6 and 9 .
- search results are received from the search engine. Receiving search results from search engines is discussed above in more detail at least with reference to FIGS. 6 and 9 .
- output is generated based on the search results using the wearable computing device. Generating output based on search results is discussed above in more detail at least with reference to FIGS. 6 and 9 .
- a number of persons providing speech input is determined. Determining the number of persons providing speech input is discussed above in more detail at least with reference to FIG. 8 .
- the output is provided using one or more output components of the wearable computing device based on the number of persons providing speech input.
- the one or more output components can include an audio output and/or a video output. Audio and video outputs are discussed above in more detail at least with reference to FIGS. 4-9 .
- method 1000 includes determining a number of persons providing speech input based on determining a number of different sets of speech-related characteristics. Determining the number of different sets of speech-related characteristics is discussed above in more detail at least with reference to FIG. 8 .
- a user can be associated with the speech input. Then, providing the output comprises providing the output based on the determined user.
- Providing user-controlled output, such as indicated by speech input and perhaps as part of preference information, is discussed above with reference to at least FIGS. 5-9 .
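- The overall flow of method 1000 could be summarized in Python as follows; every helper here is a stub whose behavior is assumed for illustration and merely stands in for functionality described elsewhere in this disclosure.

```python
# End-to-end sketch of the described flow: speech input -> text ->
# context -> action -> command output or search request output.
def speech_to_text(audio):                  # generate speech-related text
    return audio                            # pretend the audio is already text

def determine_context(history, databases):  # determine context
    return {"history": history, "databases": databases}

def determine_action(text, context):        # evaluate text plus context
    if text.lower().startswith("search"):
        return ("search_request", text.split(" ", 1)[1])
    return ("command", text)

def search_engine(query):                    # stand-in for a remote search engine
    return [f"result about {query}"]

def process_speech(audio, history, databases):
    text = speech_to_text(audio)
    context = determine_context(history, databases)
    kind, payload = determine_action(text, context)
    if kind == "command":
        output = f"executed command: {payload}"
    else:                                    # search request branch
        results = search_engine(payload)
        output = f"search results: {results}"
    return output                            # provided via output component(s)

if __name__ == "__main__":
    print(process_speech("search kumquat", history=[], databases=[]))
    print(process_speech("Output to display", history=[], databases=[]))
```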
- each block and/or communication may represent a processing of information and/or a transmission of information in accordance with example embodiments.
- Alternative embodiments are included within the scope of these example embodiments.
- functions described as blocks, transmissions, communications, requests, responses, and/or messages may be executed out of order from that shown or discussed, including substantially concurrent or in reverse order, depending on the functionality involved.
- more or fewer blocks and/or functions may be used with any of the ladder diagrams, scenarios, and flow charts discussed herein, and these ladder diagrams, scenarios, and flow charts may be combined with one another, in part or in whole.
- a block that represents a processing of information may correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique.
- a block that represents a processing of information may correspond to a module, a segment, or a portion of program code (including related data).
- the program code may include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique.
- the program code and/or related data may be stored on any type of computer readable medium such as a storage device including a disk or hard drive or other storage medium.
- the computer readable medium may also include non-transitory computer readable media such as computer-readable media that stores data for short periods of time like register memory, processor cache, and random access memory (RAM).
- the computer readable media may also include non-transitory computer readable media that stores program code and/or data for longer periods of time, such as secondary or persistent long term storage, like read only memory (ROM), optical or magnetic disks, compact-disc read only memory (CD-ROM), for example.
- the computer readable media may also be any other volatile or non-volatile storage systems.
- a computer readable medium may be considered a computer readable storage medium, for example, or a tangible storage device.
- a block that represents one or more information transmissions may correspond to information transmissions between software and/or hardware modules in the same physical device. However, other information transmissions may be between software modules and/or hardware modules in different physical devices.
Abstract
Methods and apparatus related to processing speech input at a wearable computing device are disclosed. Speech input can be received at the wearable computing device. Speech-related text corresponding to the speech input can be generated. A context can be determined based on database(s) and/or a history of accessed documents. An action can be determined based on an evaluation of at least a portion of the speech-related text and the context. The action can be a command or a search request. If the action is a command, then the wearable computing device can generate output for the command. If the action is a search request, then the wearable computing device can: communicate the search request to a search engine, receive search results from the search engine, and generate output based on the search results. The output can be provided using output component(s) of the wearable computing device.
Description
- This application claims priority to U.S. Provisional Pat. App. No. 61/507,009 entitled “Systems and Methods for Speech Command Processing”, filed on Jul. 12, 2011, which is fully incorporated herein for all purposes.
- Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
- Software applications such as word processing applications can be used to create, edit, and/or view information containing text. For example, word processing software, such as Microsoft Word, can be used to create, edit, and/or view documents that include text.
- Additional software applications can be used to convert speech to text. These applications can recognize spoken words and generate corresponding text. Some of these applications can provide a voice interface to other applications, such as voice mail systems.
- In one aspect of the disclosure of the application, speech input is received at a wearable computing device. Speech-related text corresponding to the speech input is generated at the wearable computing device. A context for the speech-related text is determined using the wearable computing device. The context is based at least in part on a history of accessed documents and one or more databases. Based on an evaluation of at least a portion of the speech-related text and the context for the speech-related text, an action is determined. The action includes at least one of a command and a search request. In response to the action including a command, an output based on the command is generated using the wearable computing device. In response to the action including a search request: (i) the search request is communicated to a search engine, (ii) search results are received from the search engine, and (iii) an output based on the search results is generated using the wearable computing device. The output is provided using one or more output components of the wearable computing device.
- In still another aspect of the disclosure of the application, an apparatus is provided. The apparatus includes: (i) means for receiving speech input, (ii) means for generating speech-related text corresponding to the speech input, (iii) means for determining a context for the speech-related text based at least in part on a history of accessed documents and one or more databases, (iv) means for determining an action based on an evaluation of at least a portion of the speech-related text and the context for the speech-related text, where the action comprises at least one of a command and a search request, (v) means for, in response to the action comprising a command, generating output based on the command, and (vi) means for providing the output.
- In yet another aspect of the disclosure of the application, an article of manufacture including a tangible non-transitory computer-readable storage medium having computer-readable instructions encoded thereon is provided. The computer-readable instructions include: (i) instructions for receiving speech input, (ii) instructions for generating speech-related text corresponding to the speech input, (iii) instructions for determining a context for the speech-related text based at least in part on a history of accessed documents and one or more databases, (iv) instructions for determining an action based on an evaluation of at least a portion of the speech-related text and the context for the speech-related text, wherein the action comprises at least one of a command and a search request, (v) instructions for, in response to the action comprising a command, generating output based on the command, (vi) instructions for, in response to the action comprising a search request: (a) communicating the search request to a search engine, (b) receiving search results from the search engine, and (c) generating output based on the search results, and (vii) instructions for providing the output.
- FIG. 1 is a first view of an example system for receiving, transmitting and displaying data, in accordance with example embodiments.
- FIG. 2 is a second view of an example system of FIG. 1, in accordance with example embodiments.
- FIG. 3 is an example schematic drawing of computer network infrastructure, in accordance with an example embodiment.
- FIG. 4 is a functional block diagram for a wearable computing system, in accordance with an example embodiment.
- FIG. 5A depicts a first scenario of speech evaluation in accordance with an example embodiment.
- FIGS. 5B and 5C depict processing by a speech evaluation module for the speech uttered in the scenario of FIG. 5A in accordance with an example embodiment.
- FIG. 6 depicts a second scenario of speech evaluation in accordance with an example embodiment.
- FIG. 7 depicts a third scenario of speech evaluation in accordance with an example embodiment.
- FIG. 8 depicts a fourth scenario of speech evaluation in accordance with an example embodiment.
- FIG. 9 depicts a fifth scenario of speech evaluation in accordance with an example embodiment.
- FIG. 10 is a flow chart of a method in accordance with an example embodiment.
- Techniques are described herein for processing speech input using a wearable computing device. For example, a speaker can say “Contact Jim” to provide speech input to the wearable computing device. The speech input can be received via an audio sensor (e.g., a microphone) of the wearable computing device and can be converted to text.
- A contextual analysis can be applied on the speech and/or text. For this example, the wearable computing device can convert the speech of “Contact Jim” to text. The contextual analysis of the “Contact Jim” speech can be determined using one or more queries for the text. For example, the word “Contact” can lead to a display of various options for contacting a person; e.g., voice, multimedia, text, e-mail, social networking messages, and other options. Also, a query of contacts or similar information can be performed using the text “Jim” to decide who “Jim” might be. In response to the query, one or more contacts can be returned with the name “Jim.”
- In some cases, the speaker can provide additional information to contact a person. For example, if no contacts are returned based on the “Jim” query, the speaker could be prompted for information about the contact; e.g., the speaker could be asked for a full name, an e-mail address, or phone number for a contact.
- In some cases, the wearable computing device can ask the user to choose between one or more contacts and use the choice to refine the query; e.g., choose between contacts “Jim Alpha” and “Jim Beta” and run a subsequent query based on the chosen contact. Communications options for contacting Jim can be based on the specific contact. For example, suppose the contact is “Jim Beta” and the contact database only includes e-mail contact information for Jim Beta. In this example, the displayed options for contacting Jim Beta may list e-mail only and may not include, for example, contacting Jim Beta via phone or via a social network.
- Additionally, contacts can be differentiated by a context that includes recently accessed information such as documents. For example, suppose the user of the wearable computing device had recently been accessing work-related information via the wearable computing device, including some documents written by co-worker Jim Delta. Then, if the user says “Contact Jim”, the wearable computing device can use historical information about recently accessed information to determine that the “Jim” in this context could be “Jim Delta” and add “Jim Delta” to a list of contacts when asking the user to differentiate between one or more contacts. In such scenarios, if the user does not have “Jim Delta” as a contact, the wearable computing device could query other devices, such as a work-related server, to determine contact information. The devices to be queried could be selected based on the context; e.g., (domains of) servers that provided recently-accessed information.
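- As an illustration of ranking candidate contacts by recently accessed documents, a hypothetical sketch follows; the contact records, document records, and candidates_for helper are assumptions, not the disclosed data model.

```python
# Hypothetical ranking of "Jim" contacts using recently accessed
# documents as context.
CONTACTS = [
    {"name": "Jim Alpha", "email": "jim.alpha@example.com"},
    {"name": "Jim Beta", "email": "jim.beta@example.com"},
    {"name": "Jim Delta", "email": "jim.delta@example.com"},
]

RECENT_DOCUMENTS = [
    {"title": "Q3 roadmap", "author": "Jim Delta"},
    {"title": "Budget memo", "author": "Jim Delta"},
]

def candidates_for(first_name):
    matches = [c for c in CONTACTS
               if c["name"].split()[0].lower() == first_name.lower()]
    recent_authors = {d["author"] for d in RECENT_DOCUMENTS}
    # Contacts who authored recently accessed documents are listed first.
    return sorted(matches, key=lambda c: c["name"] not in recent_authors)

if __name__ == "__main__":
    for contact in candidates_for("Jim"):
        print(contact["name"])   # Jim Delta first, then Jim Alpha, Jim Beta
```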
- In some scenarios, additional or different context signals can be utilized. For example, a user of the wearable computing device might say “Show Map to Last Saturday's Restaurant.” The wearable computing device can convert this speech to text. Then, based on the converted text, the wearable computing device can generate the desired map, perhaps by looking up information about the activities of the user on “Last Saturday” in one or more calendar data bases, e-mails, and/or other data sources to find one or more restaurants associated with the user on last Saturday. If multiple restaurants are found, the user can be prompted (visually and/or audibly) to select one of the restaurants. Once a restaurant is determined, a map to the restaurant can be displayed via the wearable computing device. Other related information, such as pictures of the restaurant, menus, diner reviews, turn-by-turn directions to get to the restaurant, information about friends/contacts at or near the restaurant, related establishments, etc. can be provided to the user of the wearable computing device as well.
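- A similar sketch, under the assumption that calendar entries carry a date and a kind, shows how "Last Saturday's Restaurant" might be resolved; the CALENDAR structure and the helper names are hypothetical.

```python
# Hypothetical resolution of "last Saturday's restaurant" from a calendar.
from datetime import date, timedelta

CALENDAR = [
    {"date": date(2011, 7, 2), "kind": "restaurant", "name": "Kumquat Cafe"},
    {"date": date(2011, 7, 3), "kind": "meeting",    "name": "Staff sync"},
]

def last_saturday(today):
    # Saturday is weekday 5; step back 1..7 days to the previous Saturday.
    offset = (today.weekday() - 5) % 7 or 7
    return today - timedelta(days=offset)

def restaurants_on(day):
    return [e["name"] for e in CALENDAR
            if e["date"] == day and e["kind"] == "restaurant"]

if __name__ == "__main__":
    today = date(2011, 7, 8)            # a Friday in this example
    target = last_saturday(today)       # 2011-07-02
    print(restaurants_on(target))       # ['Kumquat Cafe']
```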
- System and Device Architecture
-
FIG. 1 illustrates anexample system 100 for receiving, transmitting, and displaying data. Thesystem 100 is shown in the form of a wearable computing device. WhileFIG. 1 illustrateseyeglasses 102 as an example of a wearable computing device, other types of wearable computing devices could additionally or alternatively be used. - As illustrated in
FIG. 1 , theeyeglasses 102 comprise frame elements including lens-frames center frame support 108,lens elements arms center frame support 108 and the extending side-arms eyeglasses 102 to a user's face via a user's nose and ears, respectively. Each of theframe elements arms eyeglasses 102. Each of thelens elements lens elements - The extending side-
arms frame elements eyeglasses 102 to the user. The extending side-arms eyeglasses 102 to the user by extending around a rear portion of the user's head. Additionally or alternatively, thesystem 100 may be connected to or be integral to a head-mounted helmet structure. Other possibilities exist as well. - The
system 100 may also include an on-board computing system 118, avideo camera 120, asensor 122, and finger-operable touch pads board computing system 118 is shown to be positioned on the extending side-arm 114 of theeyeglasses 102; however, the on-board computing system 118 may be provided on other parts of theeyeglasses 102. The on-board computing system 118 may include a processor and memory, for example. The on-board computing system 118 may be configured to receive and analyze data from thevideo camera 120 and the finger-operable touch pads 124, 126 (and possibly from other sensory devices, user interfaces, or both) and generate images for output to thelens elements - The
video camera 120 is shown to be positioned on the extending side-arm 114 of theeyeglasses 102; however, thevideo camera 120 may be provided on other parts of theeyeglasses 102. Thevideo camera 120 may be configured to capture images at various resolutions or at different frame rates. Many video cameras with a small form-factor, such as those used in cell phones or webcams, for example, may be incorporated into an example of thesystem 100. AlthoughFIG. 1 illustrates onevideo camera 120, more video cameras may be used, and each may be configured to capture the same view, or to capture different views. For example, thevideo camera 120 may be forward facing to capture at least a portion of the real-world view perceived by the user. This forward facing image captured by thevideo camera 120 may then be used to generate an augmented reality where computer generated images appear to interact with the real-world view perceived by the user. - The
sensor 122 is shown mounted on the extending side-arm 116 of theeyeglasses 102; however, thesensor 122 may be provided on other parts of theeyeglasses 102. Thesensor 122 may include one or more motion sensors, such as a gyroscope and/or an accelerometer. Other sensing devices may be included within thesensor 122 and other sensing functions may be performed by thesensor 122. - The finger-
operable touch pads arms eyeglasses 102. Each of finger-operable touch pads operable touch pads operable touch pads operable touch pads operable touch pads operable touch pads operable touch pads -
FIG. 2 illustrates another view of thesystem 100 ofFIG. 1 . As shown inFIG. 2 , thelens elements eyeglasses 102 may include afirst projector 128 coupled to an inside surface of the extending side-arm 116 and configured to project adisplay 130 onto an inside surface of thelens element 112. Additionally or alternatively, asecond projector 132 may be coupled to an inside surface of the extending side-arm 114 and configured to project adisplay 134 onto an inside surface of thelens element 110. - The
lens elements projectors projectors - In alternative embodiments, other types of display elements may also be used. For example, the
lens elements frame elements - In other embodiments (not shown in
FIGS. 1 and 2 ),system 100 can be configured for audio output. For example,system 100 can be equipped with speaker(s), earphone(s), and/or earphone jack(s). In these embodiments, audio output can be provided via the speaker(s), earphone(s), and/or earphone jack(s). Other possibilities exist as well. -
FIG. 3 is a schematic drawing of asystem 136 illustrating an example computer network infrastructure. Insystem 136, adevice 138 communicates using a communication link 140 (e.g., a wired or wireless connection) to aremote device 142. Thedevice 138 may be any type of device that can receive data and display information corresponding to or associated with the data. For example, thedevice 138 may be a heads-up display system, such as theeyeglasses 102 described with reference toFIGS. 1 and 2 . - Thus, the
device 138 may include adisplay system 144 comprising aprocessor 146 and adisplay 148. Thedisplay 148 may be, for example, an optical see-through display, an optical see-around display, or a video see-through display. Theprocessor 146 may receive data from theremote device 142, and configure the data for display on thedisplay 148. Theprocessor 146 may be any type of processor, such as a micro-processor or a digital signal processor, for example. - The
device 138 may further include on-board data storage, such asmemory 150 shown coupled to theprocessor 146 inFIG. 3 . Thememory 150 may store software and/or data that can be accessed and executed by theprocessor 146, for example. - The
remote device 142 may be any type of computing device or transmitter including a laptop computer, a mobile telephone, etc., that is configured to transmit data to thedevice 138. Theremote device 142 and thedevice 138 may contain hardware to enable thecommunication link 140, such as processors, transmitters, receivers, antennas, etc. - In
FIG. 3 , thecommunication link 140 is illustrated as a wireless connection. The wireless connection could use, e.g., Bluetooth® radio technology, communication protocols described in IEEE 802.11 (including any IEEE 802.11 revisions), Cellular technology (such as GSM, CDMA, UMTS, EV-DO, WiMAX, or LTE), or Zigbee® technology, among other possibilities. Alternatively or additionally, wired connections may be used. For example, thecommunication link 140 may be a wired link via a serial bus such as a universal serial bus or a parallel bus. A wired connection may be a proprietary connection as well. Thecommunication link 140 may also be a combination of wired and wireless connections. Theremote device 142 may be accessible via the Internet and may comprise a computing cluster associated with a particular web service (e.g., social-networking, photo sharing, address book, etc.). - Example Wearable Computing System
-
FIG. 4 is a functional block diagram for awearable computing system 400 in accordance with an example embodiment.System 400 is configured to monitor incoming data from a number of input sources 404. For example,system 400 can monitor speech received viamicrophone 408 and, may convert the speech to text using speech-to-text module 426. The input speech can include instructions that specify actions and objects for the actions. - Accordingly,
system 400 can be configured to detect instructions, and to responsively initiate the actions specified in the instructions. - Example Input Sources
- As shown in
FIG. 4 ,system 400 includes one or more input-source interfaces 402 for receiving data frominput sources 404. In the illustrated embodiment, theinput sources 404 include, for example, anapplication 406, amicrophone 408, akeyboard 410, acamera 412, and atouchpad 414. A given input-source interface 402 may be configured to interface with and receive data from a single input source, such asmicrophone 408. Alternatively, a given input-source interface 402 may be configured to simultaneously interface with multiple input sources, such as input sources 406-414. -
System 400 can receive a number of different modalities of input data frominput sources 404. In the illustrated embodiment,system 400 may receive, for example, audio data frommicrophone 408, text data fromkeypad 410, video data and/or image data from camera(s) 412, and/or gesture data fromtouchpad 414. A system may be configured to receive other modalities of data, in addition or in the alternative to those described, without departing from the scope of the invention. - Selection Criteria for Input Content In the illustrated embodiment,
system 400 includes aninput selection module 416, which generally functions to evaluate the input data from thevarious input sources 404. In particular,input selection module 416 may be configured to receive input data from theinput sources 404 via input source interfaces 402 and detect one or more data patterns in the input data. - In some cases,
input selection module 416 may detect multiple concurrent data patterns in the input data. For example,input selection module 416 may detect a first data pattern in data from a first source and, simultaneously, detect a second data pattern in data from a second source. As such,selection criteria 418 may provide input-selection rules that prioritize certain data patterns and/or certain input sources. - For instance,
selection criteria 418 may prioritize detection of speech in audio data frommicrophone 408 over other data patterns detected in video data fromcamera 412. Accordingly, some embodiments may be configured to display a text conversion of speech whenever speech matching a data pattern is detected in incoming audio data, regardless of whether there is also a matching data pattern in incoming video data. Similarly, ifinput selection module 416 detects that a user is entering text via akeyboard 410, this text may be displayed, even when there is a matching data pattern in incoming audio data and/or in incoming video data; for example, where keyboard data is given priority over audio data and video data byselection criteria 418. - In a further aspect,
selection criteria 418 may provide input-selection rules that prioritize certain data patterns when multiple matching data patterns are detected from a common input source. For instance, when explicit commands are received in audio data, the explicit commands may be given priority over implicit information in the audio data frominput sources 404. As one specific example, input-selection criteria 418 may specify that when a user says “show video” (e.g., when “show video” is detected in audio data from microphone 408), then this should be interpreted as an explicit command to selectcamera 412 as the input source and display video fromcamera 412. - It should be understood
selection criteria 418 may specify other hierarchies and/or other prioritizations of input sources and/or data patterns, without departing from the scope of the invention. Thus,selection criteria 418 may be based on one or more objectives in a specific implementation. - In a further aspect, there may be scenarios where the
selection criteria 418 indicate thatmultiple input sources 404 should be selected. For example, a scenario may exist where text is detected in input data fromkeyboard 410 and speech is detected in audio data frommicrophone 408. In this scenario, speech-to-text module 426 may convert the speech from the audio data to text, and this text may be merged with the text from the keyboard for display. As another example, scenarios may exist where video or an image fromcamera 412 is displayed, and text is overlaid on top of the video or image. In such a scenario, the text may be obtained from thekeyboard 410 and/or obtained via speech-to-text module 426 converting speech in audio data frommicrophone 408. Many other examples combinations of multiple input sources, which combine a variable number of input sources, are also possible. - In another aspect, the
selection criteria 418 can indicate that speech is to be evaluated byspeech evaluation module 430.Speech evaluation module 430 can be configured to receive speech and/or text as input, evaluate the input, and responsively generate one or more commands. For example, speech input “Display map” can be received atmicrophone 408, passed throughinput source interface 402, and received atinput selection module 416.Selection criteria 418 can directinput selection module 416 to: (1) convert the spoken input to corresponding text via speech-to-text module 426 and (2) provide the corresponding text tospeech evaluation module 430 for evaluation. - In some embodiments, part or all of the functionality of one or more of the herein-described
modules selection criteria 418, andhistorical context 424 can be combined with one or more other modules. For example, the part or all of the functionality ofspeech evaluation module 430 can be combined withinput selection module 416 or speech-to-text-module 426. -
Speech evaluation module 430 can evaluate the text of “Display map” to determine that the text includes an action or command of “Display” and an object of “map.” Based on the evaluation,speech evaluation module 430 can send a command to generate a map; e.g., send a query to a server to provide a map. Upon receiving the map,speech evaluation module 430 can then send a command to Head Mounted Display (HMD) 401 to display the received map. Many other examples are possible as well. - In embodiments not depicted in
FIG. 4 , output can be provided to other devices thanHMD 401; for example, output can be communicated viacommunication link 140. As another example, ifsystem 400 is equipped with speaker(s), earphone(s), and/or earphone jack(s), audio output can be provided via the speaker(s), earphone(s), and/or earphone jack(s). Other outputs are possible as well. - Selection of Input Content Based on Implicit Information
-
System 400 can select an input based on implicit information extracted from input data from the various possible input sources. This implicit information may correspond to certain data patterns in the input data. - When
system 400 includes a microphone or other audio sensor as an input source,input selection module 416 may monitor incoming audio data for various data patterns, according to the input-selection criteria. The input-selection criteria may specify numerous types of data patterns, which may vary in complexity and/or form. - For example,
input selection module 416 may monitor audio data for: (i) patterns that are indicative of human speech in general, (ii) patterns that are indicative of human speech by a particular person (e.g., the owner of the device, or a friend or spouse of the owner), (iii) patterns that are indicative of a certain type of human speech (e.g., a question or a proposition), (iv) patterns that are indicative of human speech inflected with a certain emotion (e.g., angry speech, happy speech, sad speech, and so on), (v) patterns that are indicative of human speech associated with a certain context (e.g., a pre-recorded announcement on a subway car or a statement typically given by a flight attendant on an airplane), (vi) patterns that are indicative of a certain type of human speech (e.g., speech that is not in a speaker's native language), (vii) patterns indicative of certain types of non-speech audio (e.g., music) and/or of non-speech audio with certain characteristics (e.g., a particular genre of music), and/or (viii) other types of audio-data patterns. - As a specific example, a system may be configured to monitor audio data for data patterns that include or are indicative of speech by a particular user, who is associated with the system (e.g., the owner of a wearable computer). Accordingly, the speech-to-
text module 426 may convert the speech to corresponding text, which may then be displayed. - In some embodiments, the audio data in which speech is detected may be analyzed in order to verify that the speech is actually that of the user associated with the system. For example, the audio data can be compared to previously-received samples of audio data known to be utterances of the user associated with the system to verify that a speaker is (or is not) the user associated with the system. In particular embodiments, a “voiceprint” or template of the voice of the user associated with the system can be generated, and compared to a voiceprint generated from input audio data. Other techniques for verifying speaker(s) are possible as well.
- Further, when speech is detected, and possibly in other scenarios as well, the detected speech may be analyzed for information that may imply certain content might be desirable. For instance, when a speaker says a person's name,
speech evaluation module 430 can generate command(s) to search various sources for the named person's contact information or other information related to the named person.Speech evaluation module 430 may perform one or more implicit searches, for example, when the person's name is stated in the midst of a conversation, and the user does not explicitly request the information about the person. Implicit searches can be performed for other types of content, such as other proper nouns, repeated words, unusual words, and/or other words. - If contact information for the named person is located,
speech evaluation module 430 can indicate that the contact information may be displayed. For example, the contact information can include phone number(s), email address(es), mailing address(es), images/video related to the contact, and/or social networking information. Furthermore, the contact information may be displayed in various forms—the contact information can be displayed visually (e.g., using HMD 401) and/or audibly (e.g., using a text-to-speech module, not shown inFIG. 4 , in combination with an audio output, such as a speaker or earphone not shown inFIG. 4 ). Many other types of contact information are possible as well. - In the event that analysis of the speech does not provide implicit information that can be used to select an input source, text corresponding to the detected speech can be displayed.
- Alternatively, the default action may be not to display anything related to the detected speech. Other default actions are also possible.
- Selection of Content Based on Context Information
- In a further aspect,
input selection module 416 may be configured to select an input source and/or to select input content based on context. In order to use context information in the selection process,input selection module 416 may coordinate withcontext evaluation module 420, which is configured to evaluate context signals from one or more context information sources 422. For example,context evaluation module 420 may determine a context, and then relay the determined context to inputselection module 416. In some cases,input selection module 416 can provide the context to another module; e.g.,speech evaluation module 430. - In an example embodiment,
context evaluation module 420 may determine context using various “context signals,” which may be any signals or information pertaining to the state or the environment surrounding the system or a user associated with the system. As such, a wearable computer may be configured to receive one or more context signals, such as location signals, time signals, environmental signals, and so on. These context signals may be received from, or derived from information received from,context information sources 422 and/or other sources. - Many types of information, from many different sources, may serve as context signals or provide information from which context signals may be derived. For example, context signals may include: (a) the current time, (b) the current date, (c) the current day of the week, (d) the current month, (e) the current season, (f) a time of a future event, (g) a date of a future event or future user-context, (h) a day of the week of a future event or future user-context, (i) a month of a future event or future user-context, (j) a season of a future event or future user-context, (k) a time of a past event or past user-context, (l) a date of a past event or past user-context, (m) a day of the week of a past event or past user-context, (n) a month of a past event or past user-context, (o) a season of a past event or past user-context, ambient temperature near the user (or near a monitoring device associated with a user), (p) a current, future, and/or past weather forecast at or near a user's current location, (q) a current, future, and/or past weather forecast at or near a location of a planned event in which a user and/or a user's friends plan to participate, (r) a current, future, and/or past weather forecast at or near a location of a previous event in which a user and/or a user's friends participated, (s) information on user's calendar, such as information regarding events or statuses of a user or a user's friends, (t) information accessible via a user's social networking account, such as information relating a user's status, statuses of a user's friends in a social network group, and/or communications between the user and the users friends, (u) noise level or any recognizable sounds detected by a monitoring device, (v) items that are currently detected by a monitoring device, (w) items that have been detected in the past by the monitoring device, (x) items that other devices associated with a monitoring device (e.g., a “trusted” monitoring device) are currently monitoring or have monitored in the past, (y) information derived from cross-referencing any two or more of: information on user's calendar, information available via a user's social networking account, and/or other context signals or sources of context information, (z) health statistics or characterizations of a user's current health (e.g., whether a user has a fever or whether a user just woke up from being asleep), (aa) items a user has indicated a need for in the past or has gone back to get in the recent past, (bb) items a user currently has (e.g., having a beach towel makes it more likely that a user should also have sunscreen), and (cc) a user's recent context as determined from sensors on or near the user and/or other sources of context information. 
Those skilled in the art will understand that the above list of possible context signals and sources of context information is not intended to be limiting, and that other context signals and/or sources of context information are possible in addition, or in the alternative, to those listed above.
- In some embodiments,
context evaluation module 420 may identify the context as a quantitative or qualitative value of one context signal (e.g., the time of the day, a current location, a user status). The context may also be determined based on a plurality of context signals (e.g., the time of day, the day of the week, and the location of the user). In other embodiments, thecontext evaluation module 420 may extrapolate from the information provided by context signals. For example, a determined user-context may be determined, in part, based on context signals that are provided by a user (e.g., a label for a location such as “work” or “home”, or user-provided status information such as “on vacation”). - In a further aspect,
context information sources 422 may include various sensors that provide context information. These sensors may be included as part of or communicatively coupled tosystem 400. Examples of such sensors include, but are not limited to, a temperature sensor, an accelerometer, a gyroscope, a compass, a barometer, a moisture sensor, one or more electrodes, a shock sensor, one or more chemical sample and/or analysis systems, one or more biological sensors, an ambient light sensor, a microphone, and/or a digital camera, among others. -
System 400 may also be configured to acquire context signals from various data sources. For example,context evaluation module 420 can be configured to derive information from network-based weather-report feeds, news feeds and/or financial-market feeds, a system clock providing a reference for time-based context signals, and/or a location-determining system (e.g., GPS), among others. - In another aspect,
system 400 may also be configured to learn over time about a user's preferences in certain contexts, and to updateselection criteria 418 accordingly. For example, whenever an explicit input-content instruction is received, a corresponding entry may be created inhistorical context database 424. This entry may include the input source and/or input content indicated by the input-content instruction, as well as context information that is available at or near the receipt of the input-content instruction.Context evaluation module 420 may periodically evaluatehistorical context database 424 and determine a correlation exists between explicit instructions to select a certain input source and/or certain input content, and a certain context. When such a correlation exists,selection criteria 418 may be updated to specify that the input source should be automatically selected, and/or that the input content should be automatically displayed, upon detection of the corresponding context. - Additionally or alternatively,
system 400 may be configured for an “on-the-fly” determination of whether a current context has historically been associated with certain input sources and/or certain input content. In particular,context evaluation module 420 may compare a current context to historical context data inhistorical context database 424, and determine whether certain content historically has been correlated with the current context. If a correlation is found, thencontext evaluation module 420 may automatically display the associated input content. - For example, suppose a user of
system 400 typically orders lunch from one of seven restaurants between 12:00 and 12:30 while at work. Then,context evaluation module 420 can determine that the context include (a) a location ofsystem 400 is related to “work” (b) a time just before or at 12:00, (c) a history of ordering lunch from the aforementioned seven restaurants, and (c) that six of the seven restaurants are open at this time, based on online listings. Then, thecontext evaluation module 420 can generate a command to display a reminder to “Order Lunch” with a list of the six open restaurants for order selection, and perhaps including information indicating that the seventh restaurant is closed. In response, the user can select a restaurant from the list usinginput sources 404, choose another restaurant, dismiss/postpone the order, or perhaps, perform some other action. - As another example, when
speech evaluation module 430 detects an “open” speech action followed by a file name,speech evaluation module 430 may select the particular application that is appropriate to open the file as the input source, launch the selected application in the multimode input field, and then open the named file using the application. As an additional example, the user may say “search” and then state or type the terms to be searched, or identify other content to be searched, such as an image, for example. Whenspeech evaluation module 430 detects such a “search” action, it may responsively form a query to a search engine, provide the query with subsequently stated terms or identified content, and receive search results in response to the query. Implicit searches also can be performed by this technique of forming a query based on identified content; e.g., the word(s) that provoked the implicit search, providing the query with identified content to a search engine, and receiving search results in response to the query. - As the above examples illustrate, speech actions may include objects that directly identify the input source or sources to select (e.g., a “select video” instruction), or may identify an input source by specifying an action that involves the input source (e.g., a “contact information” or “search” action). Many other actions of speech input can identify an input source.
-
Historical context database 424 can also, or instead, include information about a document context that can be included in a context. A document context may involve context information derived from a given document within a collection of documents, such as, but not limited to, related collections of documents and past documents that have been created by the user and/or by other users. For example, based on the fact that a user has created a number of purchase order documents in the past, a background process may interpret the document in the context of a purchase order agreement, perhaps searching for supplier names and/or supplier part numbers upon which a search request can be based. - A document can be a bounded physical or digital representation of a body of information, or content. Content of the document can include text, images, video, audio, multi-media content, and/or other types of content. Document-property information can be associated with a document, such as, but not limited to, document names, sizes, locations, references, partial or complete content of documents, and criteria for selecting documents to form a context and/or to locate a document. Other types of content and document-property information are possible as well.
- In some cases, a document can be accessed via one or more references such as, but not limited to, a Uniform Resource Locator (URL), a Uniform Resource Identifier (URI), a volume name/number, a title, a page number, an address, a storage address, such as a memory address or disk sector, a library index number, an International Standard Book Number (ISBN), a bar code, and/or other identifying information. Other document references are possible as well.
- In addition to speech commands,
system 400 may allow a user to provide explicit instructions via other input sources, such as keyboard 410 and/or touchpad 414. Like explicit speech commands, explicit instructions received via other input sources 404 may include input-content instructions to select a certain input source and/or to display certain content, as well as explicit instructions to perform other actions. - Example Scenarios for Speech Evaluation and Related Actions
-
FIG. 5A depicts a scenario 500 of speech evaluation in accordance with an example embodiment. The scenarios described herein involve speaker 502 utilizing wearable computing device 510. An example wearable computing device that could be utilized as device 510 is system 400, described in detail above with reference to FIG. 4. - As discussed below,
device 510 can be configured to process an utterance to determine whether or not the utterance is a speech command. A speech command can have one or more actions and zero or more objects for each action. For example, the speech command “Shutdown” without an object can be interpreted bydevice 510 to power itself off. As another example, the speech command “Shutdown earphones and speakers” can be interpreted bydevice 510 to stop output from and/or power down earphone(s) and speaker(s) associated withdevice 510. Many other examples of speech commands, actions, and objects beyond those described herein are possible as well. - In some embodiments, the order of actions and objects in a speech command can be reversed or otherwise reordered. For example, speech commands in German and other languages typically have object(s) preceding actions. As another example, the device can understand the utterance “Mom phone” to be a speech command to call Mom, perhaps from a very young English-speaking user.
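One way to represent a parsed speech command with one action and zero or more objects is sketched below; the SpeechCommand class and its field names are illustrative assumptions, not structures defined by this disclosure.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed representation of a parsed speech command: one action, zero or more objects.

@dataclass
class SpeechCommand:
    action: str
    objects: List[str] = field(default_factory=list)

# "Shutdown" carries no object; "Shutdown earphones and speakers" carries two
# objects naming the outputs to be powered down.
print(SpeechCommand("shutdown"))
print(SpeechCommand("shutdown", ["earphones", "speakers"]))
```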
-
Scenario 500 begins at 500A withspeaker 502instructing device 510 to “Contact Scott at work” viautterance 520. At 500B, upon processing part ofutterance 520,device 510prompts speaker 502 to disambiguate the action “contact” withprompt 522. As shown inFIG. 5 , prompt 522 includes a question “Contact?” and two options “E-mail” and “Phone.” In other scenarios, prompt 522 can include more than two options to disambiguate an action. - At 500C,
speaker 502 disambiguates the action “contact” via utterance 530 of “Phone.” Upon further processing of utterances 520 and 530, device 510 prompts speaker 502 at 500D to disambiguate Scott using prompt 532. FIG. 5 shows that prompt 532 includes a question “Scott?” and two options “Scott C.” and “Scott H.” - At 500E,
speaker 502 responds to prompt 532 with utterance 540 of “Scott H.” Upon further processing of utterances 520, 530, and 540, device 510 places a phone call to Scott H. at work, and generates prompt 542 informing speaker 502 that device 510 is “Phoning Scott H. at Work . . . ” -
FIGS. 5B and 5C depict processing byspeech evaluation module 430 for speech uttered inscenario 500 in accordance with an example embodiment.Speech evaluation module 430 is configured to receive speech input in either audible or textual form.FIG. 5B shows the speech input of “Contact Scott at Work” in textual form. In scenarios not shown inFIG. 5B , speech input received in audible form is converted to text and then processed as described herein. For example, speech evaluation module can provide speech input in audible form to speech-to-text module 426 for conversion to textual form, and then process the converted audible-form speech input. - At
block 550,speech evaluation module 430 determines an input action for the speech input.FIG. 5B shows thatspeech evaluation module 430 determines the input action by performingaction lookup 552, and also shows that techniques foradd action 554 a andsearch engine search 554 b can be utilized along with, or instead of, performingaction lookup 552. -
Action lookup 552 can divide speech input into words and compare each word with one or more known action words. For example, the known action words can be stored, searched, and retrieved using a list, table, tree, trie, dictionary, database, and/or other data structure(s) configured to store at least one action word. Then, action lookup 552 can find word(s) in the speech input that are known action words by looking up each input word in the data structure(s) storing the known action words. - Example action words include, but are not limited to, words related to control of device 510 (e.g., turn on or off, louder, softer, increase, decrease, mute, output, clear, erase, brighten, darken, etc.), document processing (e.g., open, load, close, edit, save, undo, replace, delete, insert, format, etc.), communications (e.g., e-mail, mail, call, contact, send, receive, get, post, tweet, text, etc.), searches (e.g., find, search, look for, locate, etc.), content delivery (e.g., show, play, display), and other action words. Many other example action words are possible as well.
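A minimal sketch of this kind of action lookup is shown below. The word set is a small sample drawn from the examples in this description, and the single-word matching is a simplifying assumption (multi-word actions such as “look for” would need additional handling).

```python
# Sketch of an action lookup: split the speech-related text into words and test
# each word against a stored set of known action words.

KNOWN_ACTION_WORDS = {
    "contact", "call", "e-mail", "search", "find", "open", "load", "save",
    "show", "play", "display", "insert", "delete", "output", "mute",
}

def action_lookup(speech_text):
    """Return the first known action word found in the input, or None."""
    for word in speech_text.lower().split():
        if word in KNOWN_ACTION_WORDS:
            return word
    return None

print(action_lookup("Contact Scott at Work"))  # -> "contact"
print(action_lookup("Search kumquat"))         # -> "search"
```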
- In
scenario 500,action lookup 552 can identify the word “contact” as an action word. In some embodiments, the word contact can be further identified as a “communication action” or action word related to communications, such as indicated in the paragraph above.Block 556 ofFIG. 5B shows thatspeech evaluation module 430 has identified an action of “contact” in the speech input. - At
block 558, speech evaluation module 430 can “disambiguate” the word “contact.” Disambiguation involves determining a (more) precise meaning for one or more words in speech input. For example, while “contact” is a communication action, multiple techniques can be used to contact a person utilizing device 510. For instance, device 510 can be used to contact a person and/or device via telephone, e-mail, text message, blog entry, tweet, and/or other communication techniques. - Disambiguation can involve
preference information 560.Preference information 560 can include preferences for techniques for use in contacting others (e.g., always call Alice, always tweet Bob, call Carol only between 10 AM and 10 PM, only contact Dan when at work or at home), information about contact lists and other contextual information, calendar information, information about previous speech commands, information about disambiguating action words, and/or other information. - For example,
preference information 560 can indicate that speaker 502 prefers to use phone calls and e-mail to “contact” others. Since preference information 560 indicates that two or more possible actions can be performed, speech evaluation module 430 can determine that user prompt 562 can disambiguate the action of contacting between telephoning and e-mailing. FIG. 5B shows that techniques of search engine search 564 a and/or contextual search 564 b can be utilized along with, or instead of, performing user prompt 562.
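The preference-driven disambiguation just described might be sketched as follows; the technique list, the preference dictionary, and the prompt_user callable are assumptions standing in for preference information 560 and the device's prompting mechanism.

```python
# Hedged sketch: narrow a generic action (e.g., "contact") to one technique
# using stored preferences, prompting the user only when several remain.

CONTACT_TECHNIQUES = ["phone", "e-mail", "text", "tweet"]

def disambiguate_action(action, preferences, prompt_user):
    """Resolve an action to a specific technique, prompting if it stays ambiguous."""
    candidates = [t for t in CONTACT_TECHNIQUES if t in preferences.get(action, [])]
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        candidates = list(CONTACT_TECHNIQUES)
    return prompt_user(f"{action.capitalize()}?", candidates)

prefs = {"contact": ["phone", "e-mail"]}   # e.g., speaker 502 prefers phone and e-mail
choice = disambiguate_action("contact", prefs, lambda question, options: options[0])
print(choice)  # -> "phone"
```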
- Contact prompt 566 shown in FIG. 5B is the same as prompt 522 of FIG. 5A. After providing contact prompt 566, speech evaluation module 430 can await user input at block 568. In scenario 500, the user input is “phone”, as shown by utterance 530 of FIG. 5B and in block 570 of FIG. 5C, where the action is determined to be phone. In some embodiments, an action identifier and/or other information about the phone action can be maintained as well by speech evaluation module 430.
- Turning now to FIG. 5C, the speech input is “Contact Scott at Work” and the action has been determined to be “phone.” At block 572, speech evaluation module 430 can remove the already-disambiguated word “contact” from the input, and process the remaining input of “Scott at Work” as an object for the phone action. - At
block 574,speech evaluation module 430 disambiguates the word “Scott” for the phone action.FIG. 5C shows thatspeech evaluation module 430 can disambiguate the word Scott usingcontextual search 576 a and user prompt 576 b, and also shows thatsearch engine search 578 can be utilized along with, or instead of,contextual search 576 a and user prompt 576 b. -
Contextual search 576 a involves searchinghistorical context database 424 and perhaps other contextual information. The contextual search can be performed byspeech evaluation module 430 and/or content evaluation module 420 (shown inFIG. 4 ). As discussed above with reference toFIG. 4 ,historical context database 424 can include entries regarding input sources and content, such as documents, web pages, URLs, URIs, computer addresses such as Internet Protocol (IP) addresses, images, video files, audio files, and/or other files accessed bydevice 510. In some embodiments,historical context database 424 can store and/or retrieve context signals as well, such as a current time and/or location when an input source is accessed. - Alternatively or additionally, other contextual information can be searched as well as part of a contextual search. The other contextual information can include information about a
speaker 502, such as identification information of speaker 502, contacts/friends of speaker 502, a calendar of events for speaker 502, organizations related to speaker 502, and other information related to speaker 502. The other context information can include information about entities other than speaker 502, such as members of speaker 502's family, work colleagues, mailing lists, blogs, feeds, organization(s), persons with shared interests, and/or other related entities. - Based on
contextual search 576 a, speech evaluation module 430 can determine that there are two persons named Scott that speaker 502 may be trying to contact: Scott C. or Scott H. To disambiguate between Scott C. and Scott H., speech evaluation module 430 can use user prompt technique 576 b to provide name prompt 580.
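A contextual search of this kind can be illustrated with a small in-memory contact store standing in for historical context database 424; the record layout and the prompt callable are assumptions made for the example.

```python
# Illustrative contextual search: match a partial name against stored contacts
# and prompt the user only when more than one candidate remains.

CONTACTS = [
    {"name": "Scott C.", "work_phone": "(555) 555-1111"},
    {"name": "Scott H.", "work_phone": "(555) 555-5555"},
]

def contextual_search(partial_name, contacts, prompt_user):
    """Return the single contact matching partial_name, prompting to disambiguate."""
    matches = [c for c in contacts if c["name"].lower().startswith(partial_name.lower())]
    if len(matches) == 1:
        return matches[0]
    chosen = prompt_user(f"{partial_name}?", [c["name"] for c in matches])
    return next(c for c in matches if c["name"] == chosen)

contact = contextual_search("Scott", CONTACTS, lambda question, options: "Scott H.")
print(contact["work_phone"])  # -> "(555) 555-5555"
```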
FIG. 5C shows that name prompt 580 is the same asprompt 532 ofFIG. 5B . After providingname prompt 580,speech evaluation module 430 can await user input at block 582. Inscenario 500, the user input is “Scott H.” as shown asutterance 540 ofFIG. 5B and in block 584 ofFIG. 5C , where Name is determined to be “Scott H.” In some embodiments, an identifier and/or other information about name and/or Scott H. can be maintained as well byspeech evaluation module 430. - At
block 586,speech evaluation module 430 can remove the word “Scott” from the input, as already disambiguated, and process the remaining input of “at Work” as part of the object whose name is “Scott H.” Atblock 588,speech evaluation module 430 can perform a contextual search for a phone number for “Scott H.” that is “at work”, and determine a phone number for Scott H. at work. For example,speech evaluation module 430 can search for “Scott H” in a contact database, list of most recently accessed documents, work-related computer, and/or other resources to find a telephone number for Scott H. at work. In this example, Scott H.'s work number is (555) 555-5555.FIG. 5C also indicates thatspeech evaluation module 430 can also or instead performsearch engine search 590 a and/or user prompt 590 b to determine the phone number. - Upon determining that phone number (555) 555-5555 is a number for Scott H. at work,
speech evaluation module 430 can output a command to phone the number (555) 555-5555 in response to the speech input of “Contact Scott at Work.” Upon receiving this command,device 510 can utilize telephone-related hardware and/or software to place a call to telephone number (555) 555-5555 on behalf ofspeaker 502, process the call, and tear down the call when the call ends. -
FIG. 6 depicts ascenario 600 of speech evaluation in accordance with an example embodiment.Scenario 600 begins at 600A withspeaker 502instructing device 510 usingutterance 610 of “Search kumquat.” - Upon receiving
utterance 610,speech evaluation module 430 ofdevice 510 can determine that the action is “search” and the object is “kumquat” using the techniques discussed above with reference toFIGS. 5B and 5C . Upon determining that the action is “search”,speech evaluation module 430 can send a command to utilize a search engine to search for the object kumquat, and also display a user prompt that the search is in progress. - At 600B,
FIG. 6 shows that device 510 displays prompt 620 of “Search in progress . . . ” to indicate that the search is in progress. At 600C, FIG. 6 shows a search result 630 of “Kumquats are small fruit” displayed using device 510. Search result 630 can be part or all of the information returned by the search engine in response to the command to utilize the search engine for the object kumquat. - At 600D,
FIG. 6 shows thatscenario 600 continues byspeaker 502 providing utterance “Display image” 640 todevice 510. Upon receiving the speech input of “Display image”,speech evaluation module 430 can determine thatutterance 640 has an action of “display” and an object of “image” using the techniques discussed above with reference toFIGS. 5B and 5C .Device 510 can disambiguate the object “image” using the context of the previous command, where the object was “kumquat”, to determine that speech input is a command to display an image of a kumquat. - Then,
speech evaluation module 430 can perform another search (or perhaps process results of the already-performed search) to find an image related to the object “kumquat.” For example,speech evaluation module 430 can search for images and/or video using the keyword kumquat. In response, a search engine or other entity can providedevice 510 an image related to a kumquat. - At 600E,
FIG. 6 shows a display ofkumquat image 650 andtext 652 of “kumquat” displayed in response toutterance 640. In other scenarios,speaker 502 can request display of a “next” or “previous” image, save the image, and/or communicate the image to another person. Many other scenarios with searches and image displays are possible as well. In other scenarios not shown inFIG. 6 , audio and/or video output can be provided with, or instead of,image 650 and/ortext 652. -
FIG. 7 depicts ascenario 700 of speech evaluation in accordance with an example embodiment.Scenario 700 begins at 700A withspeaker 502instructing device 510 usingutterance 710 of “Output to speaker.” - Upon receiving
utterance 710, speech evaluation module 430 of device 510 can determine that the action is “output” and the object is “to speaker” using the techniques discussed above with reference to FIGS. 5B and 5C. Upon determining that the action is “output”, speech evaluation module 430 can send a command to direct any future output to the object of the speech input; that is, to direct output to an audio-output device configured for producing audio output (e.g., provide output to a speaker or earphone jack). -
FIG. 7 shows that, at 700B,device 510 confirms thatutterance 710 has been processed by outputtingoutput 720 of “Using audio output” via an audio-output device. -
FIG. 7 also shows that, at 700C, speaker 502 instructs device 510 with utterance 730 of “Output to display and speaker.” Upon receiving utterance 730, speech evaluation module 430 of device 510 can determine that the action is “output” and the object is “to display and speaker” using the techniques discussed above with reference to FIGS. 5B and 5C. Upon determining that the action is “output”, speech evaluation module 430 can send a command to direct any future output to both the audio-output device and a display, such as one or more lens elements of HMD 401. -
FIG. 7 shows that, at 700D,device 510 can confirm thatutterance 730 has been processed by outputtingoutput 740 of “Using audio output” via an audio-output device andoutput 742 of “Using display output” on a lens element. - In scenarios not shown in
FIG. 7, output can be directed to a display only. In still other scenarios not shown in FIG. 7, output can be stored (e.g., in a file), provided to other output devices of device 510, communicated using a communication link to another computing device and/or a network, and/or provided to other outputs. Also, in some of these scenarios, output can be directed to a file for some period of time, and later speech input can close the file, ending storage of the output in the file. For example, a first utterance can be speech input to “Copy output to file output1”, then all output can be stored in the file “output1”, and later speech input, such as “Close output1”, can terminate storage of the output to the output1 file. In other scenarios, input devices can be turned on and off via speech input as well; e.g., “Turn on microphone”, “Turn off keyboard”, etc. Many other scenarios are possible as well.
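The output-routing behavior in scenario 700 can be sketched with a small router that activates the sinks named in the object of an “output” action; the OutputRouter class and the print-based sinks are illustrative assumptions rather than the device's actual output components.

```python
# Sketch of routing future output to the sinks named in an "output to ..." command.

class OutputRouter:
    def __init__(self, sinks):
        self.sinks = sinks            # sink name -> callable that consumes a string
        self.active = set()

    def handle_output_command(self, object_text):
        """Activate every sink whose name appears in the object of the "output" action."""
        self.active = {name for name in self.sinks if name in object_text.lower()}

    def emit(self, text):
        for name in sorted(self.active):
            self.sinks[name](text)

router = OutputRouter({
    "speaker": lambda s: print(f"[audio] {s}"),
    "display": lambda s: print(f"[display] {s}"),
})
router.handle_output_command("to display and speaker")   # e.g., from utterance 730
router.emit("Hello")   # emitted on both the audio output and the display
```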
FIG. 8 depicts ascenario 800 of speech evaluation in accordance with an example embodiment.Scenario 800 begins at 800A withspeaker 502instructing device 510 using utterance 810 of “Output to display.” Upon receiving utterance 810,speech evaluation module 430 ofdevice 510 can determine that the action is “output” and the object is “to display” using the techniques discussed above with reference toFIGS. 5B and 5C .FIG. 8 shows that, at 800B,device 510 confirms the output is provided to the display by outputtingprompt 820 of “Using display output” on a display ofdevice 510. - At 800C,
FIG. 8 shows two speakers—speaker 502 andspeaker 830—simultaneously providing speech input todevice 510.Speaker 502 provides speech input todevice 510 viautterance 840 of “Display anniversary” andspeaker 830 provides speech input todevice 510 viautterance 842 of “Search for cars.” - After receiving the speech inputs at 800C,
device 510 can analyze the audio data in which speech is detected to verify that the speech is associated with an authorized user of the system. For example, as discussed above,device 510 can use voiceprints to determine authorized or unauthorized users. - In some embodiments, priority and/or security information can be associated with a voiceprint and/or other speech characteristics that identify a speaker. The priority information can include information that specifies an importance of a speaker; for example, suppose a
device 510 has two possible speakers: speaker O, who owns device 510, and speaker F, who borrows device 510 on occasion. Then, the priority of speaker O can indicate that speaker O has more importance than speaker F. The priority information can be used to determine whose speech input device 510 processes when multiple authorized speakers provide simultaneous or near-simultaneous speech input. In this example, when speakers O and F both speak, device 510 can use the priority information to process speaker O's speech input. - Security information can be used to enable or disable certain functions of
device 510. For example, suppose two levels of security are provided: a guest level of security, which lets a speaker perform searches, display search results, and turn on/off device 510 via speech commands only, and an owner level of security, which lets a speaker perform all actions via speech commands. Continuing the speakers F and O example, speaker F can be assigned the guest level of security, and speaker O can be assigned the owner level of security. Many other techniques for priority and/or security are possible as well. - As another example of security information,
device 510 can store and/or access one or more stored voiceprints of authorized users. Then, upon receiving speech input, device 510 can generate a voiceprint of each speaker identified in the audio data and compare the generated voiceprint(s) with the stored voiceprint(s) of authorized user(s). If a match is found between a stored voiceprint and a generated voiceprint, then the user can be classified as authorized, and device 510 can perform the instruction(s) in the speech input from the authorized user.
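The voiceprint comparison can be pictured with the toy example below. Real voiceprints are not simple feature vectors, so the vectors, the cosine-similarity test, and the 0.9 threshold are all assumptions introduced purely to show the match-then-authorize flow.

```python
import math

# Hypothetical stored voiceprints and security levels for speakers O and F.
STORED_VOICEPRINTS = {
    "speaker O": [0.9, 0.1, 0.3],
    "speaker F": [0.2, 0.8, 0.4],
}
SECURITY_LEVEL = {"speaker O": "owner", "speaker F": "guest"}

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def authorize(generated, threshold=0.9):
    """Return (speaker, security level) if a stored voiceprint matches, else None."""
    for speaker, stored in STORED_VOICEPRINTS.items():
        cos = sum(a * b for a, b in zip(generated, stored)) / (norm(generated) * norm(stored))
        if cos >= threshold:
            return speaker, SECURITY_LEVEL[speaker]
    return None

print(authorize([0.88, 0.12, 0.31]))  # close to speaker O -> ('speaker O', 'owner')
print(authorize([0.0, 0.0, 1.0]))     # no stored match -> None
```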
- In some embodiments, one or more device identifiers can be stored with the voiceprint(s) of authorized user(s). In these embodiments, both voiceprints and device identifiers can be compared before a user can be authorized to use a specific device; e.g., device 510. That is, the device can compare generated and stored voiceprints, and a current device identifier with a device identifier stored with the voiceprint. A speaker can then be authorized to use a device associated with the current device identifier when both the voiceprints and the device identifiers match. These embodiments can permit voiceprint storage in location(s) other than on device 510. In some of these embodiments, priority and/or security information can be associated with some or all stored voiceprint(s). - In other embodiments,
device 510 does not generate the voiceprint; rather, device 510 can provide voice data and perhaps current device information to another device that generates the voiceprint. The generated voiceprint can be communicated to device 510 and/or compared to stored voiceprint(s) to determine if a speaker is authorized. This can simplify device 510 by permitting generation of voiceprints by devices other than device 510. - In
scenario 800 at 800C, speaker 502 is determined to be an authorized speaker and speaker 830 is determined to be an unauthorized speaker. Accordingly, utterance 840 is treated as speech input by device 510 and utterance 842 is ignored by device 510. - Upon determining
utterance 840 is authorized speech input, speech evaluation module 430 of device 510 can determine that the action is “display” and the object is “anniversary” using the techniques discussed above with reference to FIGS. 5B and 5C. Device 510 can perform a contextual search (or use other techniques) to determine that the anniversary for speaker 502 is on Jan. 29, 2012. FIG. 8 shows that, at 800D, device 510 can generate prompt 850 indicating that the “Anniversary is 1/29/12.” - In other scenarios not shown in
FIG. 8, both speakers 502 and 830 can be determined to be authorized speakers. In such scenarios, priority between authorized speakers can be determined by device 510 based on a number of previous speech inputs made by each speaker; i.e., a larger number of previous speech inputs processed by device 510 for a given authorized speaker indicates that the given authorized speaker is to be given a higher priority. Priority can also be based on keywords or passwords used by a speaker and/or determined by other techniques. - In some embodiments, a number of speakers can be determined. For example, at 800C of
scenario 800,device 510 can determine voice prints, frequency ranges, and/or other speech-related characteristics differ betweenutterances - Then, in some scenarios not shown in
FIG. 8, outputs can be determined based on the number of speakers. For example, if the number of speakers is one, output can use one format, such as audio output, while another format, such as video, can be used if the number of speakers is greater than one. Such output choices can be stored in preference information 560. Many other techniques and scenarios involving multiple speakers are possible as well.
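One way to act on such stored output choices is sketched below; the preference keys and the tuple-based speech characteristics are assumptions used only to illustrate counting speakers and picking an output component.

```python
# Sketch: count distinct sets of speech-related characteristics, then pick an
# output component according to assumed entries in the preference information.

def count_speakers(speech_characteristics):
    """Count distinct characteristic sets (e.g., one voiceprint-like tuple per utterance)."""
    return len({tuple(c) for c in speech_characteristics})

def select_output(num_speakers, preferences):
    if num_speakers == 1:
        return preferences.get("single_speaker_output", "audio")
    return preferences.get("multi_speaker_output", "display")

characteristics = [(180.0, 0.42), (121.5, 0.77)]   # two differing utterances
prefs = {"single_speaker_output": "audio", "multi_speaker_output": "display"}
print(select_output(count_speakers(characteristics), prefs))  # -> "display"
```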
FIG. 9 depicts ascenario 900 of speech evaluation in accordance with an example embodiment.Scenario 900 begins at 900A withspeaker 502instructing device 510 usingutterance 910 of “Load last copy of memo1.” Upon receivingutterance 910,speech evaluation module 430 ofdevice 510 can determine that the action is “load” and the object is “last copy of memo1” using the techniques discussed above with reference toFIGS. 5B and 5C . Further, as discussed above with reference toFIGS. 5B and 5C ,speech evaluation module 430 can disambiguate the “last copy of memo1” object to refer to a most-recently modified version of a file entitled “memo1.”FIG. 9 shows that, at 900B,device 510 displays a first portion of memo1 asoutput 920 of “Memo1: In 1Q11, we made” on a display ofdevice 510. -
FIG. 9 indicates thatscenario 900 continues at 900C withspeaker 502instructing device 510 usingutterance 930 of “Open DB Q1db.” Upon receivingutterance 930,speech evaluation module 430 ofdevice 510 can determine that the action is “open” and the object is “DB Q1db” using the techniques discussed above with reference toFIGS. 5B and 5C . Further, as discussed above with reference toFIGS. 5B and 5C ,speech evaluation module 430 can disambiguate the “DB Q1db” object to be a database (DB) entitled “Q1db” and then open the Q1db database. - Upon opening the Q1db database,
scenario 900 at 900D showsdevice 510 providingprompt 940 of “Q1db: open” on a display ofdevice 510 to indicate that the Q1db database has been opened. -
FIG. 9 indicates that scenario 900 continues at 900E with speaker 502 instructing device 510 using utterance 950 of “Insert 1Q11 profit from Q1db into memo1.” Upon receiving utterance 950, speech evaluation module 430 of device 510 can determine that the action is “insert” and the object is “1Q11 profit from Q1db into memo1” using the techniques discussed above with reference to FIGS. 5B and 5C. Further, as discussed above with reference to FIGS. 5B and 5C, speech evaluation module 430 can disambiguate the “1Q11 profit from Q1db into memo1” object to refer to the 1Q11 profit that can be found in the Q1db database and is to be placed in the memo1 file. - In some embodiments, the Q1db database and perhaps other databases are resident; e.g., stored on
device 510. In other embodiments, the Q1db database and perhaps other databases are not resident on device 510. In such embodiments, device 510 can be configured to communicate with the Q1db database regardless of whether the database is or is not resident on the wearable computing device. For example, device 510 can be configured to access databases using a common set of access functions that permit communication with resident database(s) using local communication functionality, with non-resident database(s) via a communication link or other communication interface, and with both resident and non-resident databases.
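A common access layer of that sort might look like the sketch below; the two toy backends and the lookup function are assumptions meant only to show how resident and non-resident databases can sit behind one set of access functions.

```python
# Illustrative common access functions for resident and non-resident databases.

class ResidentDatabase:
    def __init__(self, tables):
        self.tables = tables                 # data stored locally on the device

    def query(self, table, column):
        return self.tables[table][column]

class RemoteDatabase:
    def __init__(self, fetch):
        self.fetch = fetch                   # callable standing in for a communication link

    def query(self, table, column):
        return self.fetch(table, column)

def lookup(db, table, column):
    """Single access function used regardless of where the database resides."""
    return db.query(table, column)

local = ResidentDatabase({"Q1db": {"1Q11 profit": "$1M"}})
remote = RemoteDatabase(lambda table, column: "$1M")   # pretend network round trip
print(lookup(local, "Q1db", "1Q11 profit"))
print(lookup(remote, "Q1db", "1Q11 profit"))
```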
- In response to utterance 950, device 510 can generate a command to query Q1db for the 1Q11 profit. FIG. 9 shows that, at 900F of scenario 900, device 510 has received output from the query command that indicates the 1Q11 profit is $1M, and has provided corresponding prompt 960 on a display of device 510. - Then,
device 510 can insert the profit value of “$1M” retrieved from the Q1db database into the memo1 file.FIG. 9 shows that, at 900G ofscenario 900,device 510 has generatedoutput 962 of an updated first portion of memo1 that includes the “$1M” from Q1db. - In scenarios not shown in
FIG. 9 , implicit search requests can be generated for a document. An implicit search request is a request for information generated by editing a document. For example, consider that a document is edited by adding the words “sword fighting.” In response, an implicit search request for information about sword fighting can be generated and sent to one or more search engines. Implicit search requests are search requests generated bydevice 510, or perhaps another device, without specific user interaction (e.g., speech input) to control timing of the communication of the search request to a search engine and/or content of the search request. Information received from search engines based on implicit search requests can be provided todevice 510, which can display the information without specific user interaction to control timing and/or content of the displayed information. Continuing the example above, in response to the implicit search request for “sword fighting”, information, historical allusions, literature, music, games, etc. related to sword fighting can be provided usingdevice 510. - Many other scenarios involving reviewing, editing, and deleting documents, databases, and/or other files are possible as well.
- Example Operation
-
FIG. 10 is a flow chart of anexample method 1000 in accordance with an example embodiment. Atblock 1010, speech input can be received at a wearable computing device. Receiving speech input at wearable computing devices is described above with reference to at leastFIGS. 4-9 . - At
block 1012, speech-related text corresponding to the speech input can be generated at the wearable computing device. Generating speech-related text corresponding to speech input is discussed above in more detail with reference to at leastFIGS. 4-5C . - At
block 1014, a context for the speech-related text can be determined using the wearable computing device. The context can be based at least in part on a history of accessed documents and one or more databases. Determining contexts for speech-related text is discussed above in more detail with reference to at least FIGS. 4-5C. - In some embodiments, at least one database of the one or more databases is not resident on the wearable computing device. In these embodiments, the wearable computing device can be configured to communicate with the at least one database that is not resident on the wearable computing device. Resident and non-resident databases are discussed above in more detail at least with reference to
FIG. 9 . - At
block 1016, an action is determined, based on an evaluation of at least a portion of the speech-related text and the context for the speech-related text. The action can include at least one of a command and a search request. Determining actions based on evaluating contexts and speech-related text is discussed above in more detail at least with reference toFIGS. 5-9 . - At
block 1018, a determination is made as to whether the action is a command. If the action is a command,method 1000 proceeds to block 1020. If the action is not a command,method 1000 proceeds to block 1022. - At
block 1020, as the action is a command, the wearable computing device can generate output based on the command. Generating output based on commands is discussed above in more detail with reference to at leastFIGS. 5-9 . - In some embodiments, the command can be selected from the group of a communication command, a scheduling command, a command to display information, a command to save information, and a command to delete information. In other embodiments, the command can be an implicit search request, and wherein the implicit search request comprises a request to search within the context. Commands and actions are discussed above in greater detail with reference to at least
FIGS. 5-9 . - Upon completing
block 1020,method 1000 proceeds to block 1030. - At
block 1022, a determination is made as to whether the action is a search request. If the action is a search request,method 1000 proceeds to block 1024. If the action is not a search request,method 1000 ends. - At
block 1024, as the action includes a search request, the search request can be communicated to a search engine. Communicating search requests to search engines is discussed above in more detail at least with reference toFIGS. 6 and 9 . - At
block 1026, search results are received from the search engine. Receiving search results from search engines is discussed above in more detail at least with reference toFIGS. 6 and 9 . - At
block 1028, output is generated based on the search results using the wearable computing device. Generating output based on search results is discussed above in more detail at least with reference toFIGS. 6 and 9 . - At
block 1030, a number of persons providing speech input is determined. Determining the number of persons providing speech input is discussed above in more detail at least with reference toFIG. 8 . - At
block 1032, the output is provided using one or more output components of the wearable computing device based on the number of persons providing speech input. In some embodiments, the one or more output components can include an audio output and/or a video output. Audio and video outputs are discussed above in more detail at least with reference toFIGS. 4-9 . - In some embodiments,
method 1000 includes determining a number of persons providing speech input based on determining a number of different sets of speech-related characteristics. Determining the number of different sets of speech-related characteristics is discussed above in more detail at least with reference to FIG. 8. - In other embodiments, a user can be associated with the speech input. Then, providing the output comprises providing the output based on the determined user. In some of these other embodiments, an output preference of the determined user can be stored; e.g., output to speakers only; output to both speakers and display; output to speakers when the number of speakers=1, otherwise output to display; speaker volume; display brightness; or display font. Then, providing the output based on the determined user can include providing the output based on the stored output preference for the determined user. Providing user-controlled output, such as indicated by speech input and perhaps as part of preference information, is discussed above with reference to at least
FIGS. 5-9 . - The above detailed description describes various features and functions of the disclosed systems, devices, and methods with reference to the accompanying figures. In the figures, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, figures, and claims are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
- With respect to any or all of the ladder diagrams, scenarios, and flow charts in the figures and as discussed herein, each block and/or communication may represent a processing of information and/or a transmission of information in accordance with example embodiments. Alternative embodiments are included within the scope of these example embodiments. In these alternative embodiments, for example, functions described as blocks, transmissions, communications, requests, responses, and/or messages may be executed out of order from that shown or discussed, including substantially concurrent or in reverse order, depending on the functionality involved. Further, more or fewer blocks and/or functions may be used with any of the ladder diagrams, scenarios, and flow charts discussed herein, and these ladder diagrams, scenarios, and flow charts may be combined with one another, in part or in whole.
- A block that represents a processing of information may correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively or additionally, a block that represents a processing of information may correspond to a module, a segment, or a portion of program code (including related data). The program code may include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique. The program code and/or related data may be stored on any type of computer readable medium such as a storage device including a disk or hard drive or other storage medium.
- The computer readable medium may also include non-transitory computer readable media such as computer-readable media that stores data for short periods of time like register memory, processor cache, and random access memory (RAM). The computer readable media may also include non-transitory computer readable media that stores program code and/or data for longer periods of time, such as secondary or persistent long term storage, like read only memory (ROM), optical or magnetic disks, compact-disc read only memory (CD-ROM), for example. The computer readable media may also be any other volatile or non-volatile storage systems. A computer readable medium may be considered a computer readable storage medium, for example, or a tangible storage device.
- Moreover, a block that represents one or more information transmissions may correspond to information transmissions between software and/or hardware modules in the same physical device. However, other information transmissions may be between software modules and/or hardware modules in different physical devices.
- While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
Claims (20)
1. A method, comprising:
receiving speech input at a wearable computing device;
generating, at the wearable computing device, speech-related text corresponding to the speech input;
determining a context for the speech-related text based at least in part on a history of accessed documents and one or more databases using the wearable computing device;
determining an action based on an evaluation of at least a portion of the speech-related text and the context for the speech-related text, the action comprising at least one of a command or an implicit search request, wherein the implicit search request is generated by the wearable computing device without speech input that controls content of the implicit search request;
wherein in response to the action comprising a command:
generating, using the wearable computing device, output based on the command; and
wherein in response to the action comprising the implicit search request:
communicating the implicit search request to a search engine,
receiving search results from the search engine, and
generating, using the wearable computing device, output based on the search results;
determining a number of persons providing the speech input;
selecting a selected output component from among one or more output components of the wearable computing device based on the number of persons providing the speech input; and providing the output using the selected output component.
2. The method of claim 1 , wherein the command is a command selected from the group of a communication command, a scheduling command, a command to display information, a command to save information, and a command to delete information.
3. The method of claim 1 , wherein the implicit search request comprises a request to search within the context.
4. The method of claim 1 , wherein the one or more output components comprise an audio output and/or a video output.
5. The method of claim 1 , wherein at least one database of the one or more databases is not resident on the wearable computing device, and wherein the wearable computing device is configured to communicate with the at least one database that is not resident on the wearable computing device.
6. The method of claim 1 , wherein determining the number of persons providing speech input comprises determining a number of different sets of speech-related characteristics.
7. The method of claim 1 , further comprising:
determining a user associated with the speech input; and
providing the output based on the determined user.
8. The method of claim 7 , further comprising:
storing an output preference of the determined user; and
wherein providing the output based on the determined user comprises providing the output based on the stored output preference for the determined user.
9. A wearable computing device, comprising:
means for receiving speech input;
means for generating speech-related text corresponding to the speech input;
means for determining a context for the speech-related text based at least in part on a history of accessed documents and one or more databases;
means for determining an action based on an evaluation of at least a portion of the speech-related text and the context for the speech-related text, wherein the action comprises at least one of a command or an implicit search request, wherein the implicit search request is generated without speech input that controls content of the implicit search request;
means for, in response to the action comprising a command, generating output based on the command;
means for, in response to the action comprising the implicit search request: (a) communicating the implicit search request to a search engine, (b) receiving search results from the search engine, and (c) generating output based on the search results;
means for determining a number of persons providing the speech input;
a plurality of means of providing output;
means for selecting a selected means of providing output from among the plurality of means of providing output based on the number of persons providing the speech input; and
means for providing the output using the selected means of providing output.
10. The wearable computing device of claim 9 , wherein the implicit search request comprises a request to search the context.
11. The wearable computing device of claim 9 , wherein at least one database of the one or more databases is not resident on the wearable computing device, and wherein the means for determining the context and the means for determining the action further each comprise means to communicate with the at least one database that is not resident on the wearable computing device.
12. The wearable computing device of claim 9 , wherein the means for determining the number of persons providing speech input comprises means for determining a number of different sets of speech-related characteristics.
13. The wearable computing device of claim 9 , further comprising:
means for determining a user associated with the speech input; and
means for providing the output based on the determined user.
14. The wearable computing device of claim 13 , further comprising:
means for storing an output preference of the determined user; and
wherein the means for providing the output based on the determined user comprises the means for providing the output based on the stored output preference for the determined user.
15. An article of manufacture including a tangible non-transitory computer-readable storage medium having computer-readable instructions encoded thereon, the instructions comprising:
instructions for receiving speech input;
instructions for generating speech-related text corresponding to the speech input;
instructions for determining a context for the speech-related text based at least in part on a history of accessed documents and one or more databases;
instructions for determining an action based on an evaluation of at least a portion of the speech-related text and the context for the speech-related text, wherein the action comprises at least one of a command or an implicit search request, wherein the implicit search request is generated without speech input that controls content of the implicit search request;
instructions for, in response to the action comprising a command, generating output based on the command;
instructions for, in response to the action comprising the implicit search request: (a) communicating the implicit search request to a search engine, (b) receiving search results from the search engine, and (c) generating output based on the search results;
instructions for determining a number of persons providing the speech input;
instructions for selecting an output component based on the number of persons providing the speech input; and
instructions for providing the output using the selected output component.
16. The article of manufacture of claim 15 , wherein the implicit search request comprises a request to search the context.
17. The article of manufacture of claim 15 , wherein at least one database of the one or more databases is not resident on the wearable computing device, and wherein the instructions for determining the context and the instructions for determining an action comprise instructions to communicate with the at least one database that is not resident on the wearable computing device.
18. The article of manufacture of claim 15 , wherein the instructions for determining the number of persons providing speech input comprise instructions for determining a number of different sets of speech-related characteristics.
19. The article of manufacture of claim 15 , further comprising:
instructions for determining a user associated with the speech input; and
instructions for providing the output based on the determined user.
20. The article of manufacture of claim 19 , further comprising:
instructions for storing an output preference of the determined user; and
wherein the instructions for providing the output based on the determined user comprises the instructions for providing the output based on the stored output preference for the determined user.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/291,320 US20130018659A1 (en) | 2011-07-12 | 2011-11-08 | Systems and Methods for Speech Command Processing |
PCT/US2012/045616 WO2013009578A2 (en) | 2011-07-12 | 2012-07-05 | Systems and methods for speech command processing |
US14/444,974 US20140337037A1 (en) | 2011-07-12 | 2014-07-28 | Systems and Methods for Speech Command Processing |
US15/346,589 US9911418B2 (en) | 2011-07-12 | 2016-11-08 | Systems and methods for speech command processing |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161507009P | 2011-07-12 | 2011-07-12 | |
US13/291,320 US20130018659A1 (en) | 2011-07-12 | 2011-11-08 | Systems and Methods for Speech Command Processing |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/444,974 Continuation US20140337037A1 (en) | 2011-07-12 | 2014-07-28 | Systems and Methods for Speech Command Processing |
Publications (1)
Publication Number | Publication Date |
---|---|
US20130018659A1 true US20130018659A1 (en) | 2013-01-17 |
Family
ID=47506813
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/291,320 Abandoned US20130018659A1 (en) | 2011-07-12 | 2011-11-08 | Systems and Methods for Speech Command Processing |
US14/444,974 Abandoned US20140337037A1 (en) | 2011-07-12 | 2014-07-28 | Systems and Methods for Speech Command Processing |
US15/346,589 Active US9911418B2 (en) | 2011-07-12 | 2016-11-08 | Systems and methods for speech command processing |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/444,974 Abandoned US20140337037A1 (en) | 2011-07-12 | 2014-07-28 | Systems and Methods for Speech Command Processing |
US15/346,589 Active US9911418B2 (en) | 2011-07-12 | 2016-11-08 | Systems and methods for speech command processing |
Country Status (2)
Country | Link |
---|---|
US (3) | US20130018659A1 (en) |
WO (1) | WO2013009578A2 (en) |
Cited By (176)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140006026A1 (en) * | 2012-06-29 | 2014-01-02 | Mathew J. Lamb | Contextual audio ducking with situation aware devices |
US20140129207A1 (en) * | 2013-07-19 | 2014-05-08 | Apex Technology Ventures, LLC | Augmented Reality Language Translation |
US20140143785A1 (en) * | 2012-11-20 | 2014-05-22 | Samsung Electronics Companty, Ltd. | Delegating Processing from Wearable Electronic Device |
US20140172432A1 (en) * | 2012-12-18 | 2014-06-19 | Seiko Epson Corporation | Display device, head-mount type display device, method of controlling display device, and method of controlling head-mount type display device |
US20140195249A1 (en) * | 2013-01-07 | 2014-07-10 | Samsung Electronics Co., Ltd. | Interactive server, control method thereof, and interactive system |
US20140240226A1 (en) * | 2013-02-27 | 2014-08-28 | Robert Bosch Gmbh | User Interface Apparatus |
EP2778982A1 (en) * | 2013-03-14 | 2014-09-17 | Wal-Mart Stores, Inc. | Attribute detection |
CN104112248A (en) * | 2014-07-15 | 2014-10-22 | 河海大学常州校区 | Image recognition technology based intelligent life reminding system and method |
WO2015037804A1 (en) * | 2013-09-11 | 2015-03-19 | Lg Electronics Inc. | Wearable computing device and user interface method |
US20150262583A1 (en) * | 2012-09-26 | 2015-09-17 | Kyocera Corporation | Information terminal and voice operation method |
US20150379098A1 (en) * | 2014-06-27 | 2015-12-31 | Samsung Electronics Co., Ltd. | Method and apparatus for managing data |
US20160078864A1 (en) * | 2014-09-15 | 2016-03-17 | Honeywell International Inc. | Identifying un-stored voice commands |
US9323983B2 (en) | 2014-05-29 | 2016-04-26 | Comcast Cable Communications, Llc | Real-time image and audio replacement for visual acquisition devices |
US20160196535A1 (en) * | 2015-01-05 | 2016-07-07 | Electronics And Telecommunications Research Institute | Device and method for smart calendar |
US9477313B2 (en) | 2012-11-20 | 2016-10-25 | Samsung Electronics Co., Ltd. | User gesture input to wearable electronic device involving outward-facing sensor of device |
US20160379293A1 (en) * | 2015-06-29 | 2016-12-29 | International Business Machines Corporation | Application for automatic ordering of food items |
US9601113B2 (en) | 2012-05-16 | 2017-03-21 | Xtreme Interactions Inc. | System, device and method for processing interlaced multimodal user input |
EP3192041A1 (en) * | 2014-09-09 | 2017-07-19 | Microsoft Technology Licensing, LLC | Invocation of a digital personal assistant by means of a device in the vicinity |
US20170364325A1 (en) * | 2013-07-25 | 2017-12-21 | Lg Electronics Inc. | Head mounted display and method of controlling therefor |
DK201770036A1 (en) * | 2016-06-10 | 2018-01-15 | Apple Inc | Intelligent digital assistant in a multi-tasking environment |
US20180061449A1 (en) * | 2016-08-30 | 2018-03-01 | Bragi GmbH | Binaural Audio-Video Recording Using Short Range Wireless Transmission from Head Worn Devices to Receptor Device System and Method |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
CN108604237A (en) * | 2015-12-01 | 2018-09-28 | 英特吉姆公司股份有限公司 | personalized interactive intelligent search method and system |
US10095691B2 (en) | 2016-03-22 | 2018-10-09 | Wolfram Research, Inc. | Method and apparatus for converting natural language to machine actions |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US10185416B2 (en) | 2012-11-20 | 2019-01-22 | Samsung Electronics Co., Ltd. | User gesture input to wearable electronic device involving movement of device |
US10194060B2 (en) | 2012-11-20 | 2019-01-29 | Samsung Electronics Company, Ltd. | Wearable electronic device |
CN109564706A (en) * | 2016-12-01 | 2019-04-02 | 英特吉姆股份有限公司 | User's interaction platform based on intelligent interactive augmented reality |
US10277649B2 (en) | 2014-09-24 | 2019-04-30 | Microsoft Technology Licensing, Llc | Presentation of computing environment on multiple devices |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US10332518B2 (en) | 2017-05-09 | 2019-06-25 | Apple Inc. | User interface for correcting recognition errors |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10354652B2 (en) | 2015-12-02 | 2019-07-16 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US20190220933A1 (en) * | 2012-10-17 | 2019-07-18 | Facebook, Inc. | Presence Granularity with Augmented Reality |
US20190237073A1 (en) * | 2013-04-09 | 2019-08-01 | Google Llc | Multi-Mode Guard for Voice Commands |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US10390213B2 (en) | 2014-09-30 | 2019-08-20 | Apple Inc. | Social reminders |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10403283B1 (en) | 2018-06-01 | 2019-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US10417405B2 (en) | 2011-03-21 | 2019-09-17 | Apple Inc. | Device access using voice authentication |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10417344B2 (en) | 2014-05-30 | 2019-09-17 | Apple Inc. | Exemplar-based natural language processing |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10438591B1 (en) * | 2012-10-30 | 2019-10-08 | Google Llc | Hotword-based speaker recognition |
US10438595B2 (en) | 2014-09-30 | 2019-10-08 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10448111B2 (en) | 2014-09-24 | 2019-10-15 | Microsoft Technology Licensing, Llc | Content projection |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10452783B2 (en) | 2012-04-20 | 2019-10-22 | Maluuba, Inc. | Conversational agent |
US10453443B2 (en) | 2014-09-30 | 2019-10-22 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
CN110473532A (en) * | 2018-05-11 | 2019-11-19 | 和硕联合科技股份有限公司 | Control system and portable electronic devices |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US10529332B2 (en) | 2015-03-08 | 2020-01-07 | Apple Inc. | Virtual assistant activation |
US10551928B2 (en) | 2012-11-20 | 2020-02-04 | Samsung Electronics Company, Ltd. | GUI transitions on wearable electronic device |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10580409B2 (en) | 2016-06-11 | 2020-03-03 | Apple Inc. | Application integration with a digital assistant |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10635296B2 (en) | 2014-09-24 | 2020-04-28 | Microsoft Technology Licensing, Llc | Partitioned application presentation across devices |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10672379B1 (en) * | 2017-09-25 | 2020-06-02 | Amazon Technologies, Inc. | Systems and methods for selecting a recipient device for communications |
US10684703B2 (en) | 2018-06-01 | 2020-06-16 | Apple Inc. | Attention aware virtual assistant dismissal |
US10691332B2 (en) | 2014-02-28 | 2020-06-23 | Samsung Electronics Company, Ltd. | Text input on an interactive display |
US10692504B2 (en) | 2010-02-25 | 2020-06-23 | Apple Inc. | User profiling for voice input processing |
US10699717B2 (en) | 2014-05-30 | 2020-06-30 | Apple Inc. | Intelligent assistant for home automation |
US10714117B2 (en) | 2013-02-07 | 2020-07-14 | Apple Inc. | Voice trigger for a digital assistant |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10741185B2 (en) | 2010-01-18 | 2020-08-11 | Apple Inc. | Intelligent automated assistant |
US10748546B2 (en) | 2017-05-16 | 2020-08-18 | Apple Inc. | Digital assistant services based on device capabilities |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10769385B2 (en) | 2013-06-09 | 2020-09-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10789945B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Low-latency intelligent automated assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10824531B2 (en) | 2014-09-24 | 2020-11-03 | Microsoft Technology Licensing, Llc | Lending target device resources to host device computing environment |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10942702B2 (en) | 2016-06-11 | 2021-03-09 | Apple Inc. | Intelligent device arbitration and control |
US10942703B2 (en) | 2015-12-23 | 2021-03-09 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US20210082405A1 (en) * | 2018-05-30 | 2021-03-18 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method for Location Reminder and Electronic Device |
US10957083B2 (en) * | 2016-08-11 | 2021-03-23 | Integem Inc. | Intelligent interactive and augmented reality based user interface platform |
US10956666B2 (en) | 2015-11-09 | 2021-03-23 | Apple Inc. | Unconventional virtual assistant interactions |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US11010127B2 (en) | 2015-06-29 | 2021-05-18 | Apple Inc. | Virtual assistant for media playback |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US11048473B2 (en) | 2013-06-09 | 2021-06-29 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US11069336B2 (en) | 2012-03-02 | 2021-07-20 | Apple Inc. | Systems and methods for name pronunciation |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11127397B2 (en) | 2015-05-27 | 2021-09-21 | Apple Inc. | Device voice control |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11157436B2 (en) | 2012-11-20 | 2021-10-26 | Samsung Electronics Company, Ltd. | Services associated with wearable electronic device |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US20210365485A1 (en) * | 2020-05-19 | 2021-11-25 | International Business Machines Corporation | Unsupervised text summarization with reinforcement learning |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US11231904B2 (en) | 2015-03-06 | 2022-01-25 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US11237719B2 (en) | 2012-11-20 | 2022-02-01 | Samsung Electronics Company, Ltd. | Controlling remote electronic device with wearable electronic device |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11314370B2 (en) | 2013-12-06 | 2022-04-26 | Apple Inc. | Method for extracting salient dialog usage from live data |
US11327711B2 (en) | 2014-12-05 | 2022-05-10 | Microsoft Technology Licensing, Llc | External visual interactions for speech-based devices |
US11340465B2 (en) | 2016-12-23 | 2022-05-24 | Realwear, Inc. | Head-mounted display with modular components |
US11350253B2 (en) | 2011-06-03 | 2022-05-31 | Apple Inc. | Active transport based notifications |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11372536B2 (en) | 2012-11-20 | 2022-06-28 | Samsung Electronics Company, Ltd. | Transition and interaction model for wearable electronic device |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11409497B2 (en) * | 2016-12-23 | 2022-08-09 | Realwear, Inc. | Hands-free navigation of touch-based operating systems |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11495218B2 (en) | 2018-06-01 | 2022-11-08 | Apple Inc. | Virtual assistant operation in multi-device environments |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11507216B2 (en) * | 2016-12-23 | 2022-11-22 | Realwear, Inc. | Customizing user interfaces of binary applications |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11722571B1 (en) * | 2016-12-20 | 2023-08-08 | Amazon Technologies, Inc. | Recipient device presence activity monitoring for a communications session |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11810578B2 (en) | 2020-05-11 | 2023-11-07 | Apple Inc. | Device arbitration for digital assistant-based intercom systems |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US20240054195A1 (en) * | 2022-08-09 | 2024-02-15 | Soundhound, Inc. | Authorization of Action by Voice Identification |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11928604B2 (en) | 2005-09-08 | 2024-03-12 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US11997108B1 (en) | 2021-03-05 | 2024-05-28 | Professional Credentials Exchange LLC | Systems and methods for providing consensus sourced verification |
US12010262B2 (en) | 2013-08-06 | 2024-06-11 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
US12223282B2 (en) | 2016-06-09 | 2025-02-11 | Apple Inc. | Intelligent automated assistant in a home environment |
US12277954B2 (en) | 2024-04-16 | 2025-04-15 | Apple Inc. | Voice trigger for a digital assistant |
Families Citing this family (81)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9575563B1 (en) * | 2013-12-30 | 2017-02-21 | X Development Llc | Tap to initiate a next action for user requests |
CN103984102A (en) * | 2014-06-05 | 2014-08-13 | Liang Quanfu | Head mounted lens amplifying electronic display device
US11275757B2 (en) | 2015-02-13 | 2022-03-15 | Cerner Innovation, Inc. | Systems and methods for capturing data, creating billable information and outputting billable information |
US10769189B2 (en) | 2015-11-13 | 2020-09-08 | Microsoft Technology Licensing, Llc | Computer speech recognition and semantic understanding from activity patterns |
US11429883B2 (en) | 2015-11-13 | 2022-08-30 | Microsoft Technology Licensing, Llc | Enhanced computer experience from activity prediction |
US10509626B2 (en) | 2016-02-22 | 2019-12-17 | Sonos, Inc. | Handling of loss of pairing between networked devices
US10095470B2 (en) | 2016-02-22 | 2018-10-09 | Sonos, Inc. | Audio response playback |
US10264030B2 (en) | 2016-02-22 | 2019-04-16 | Sonos, Inc. | Networked microphone device control |
US9965247B2 (en) | 2016-02-22 | 2018-05-08 | Sonos, Inc. | Voice controlled media playback system based on user profile |
US10097919B2 (en) | 2016-02-22 | 2018-10-09 | Sonos, Inc. | Music service selection |
US9947316B2 (en) | 2016-02-22 | 2018-04-17 | Sonos, Inc. | Voice control of a media playback system |
US9978390B2 (en) | 2016-06-09 | 2018-05-22 | Sonos, Inc. | Dynamic player selection for audio signal processing |
US10134399B2 (en) | 2016-07-15 | 2018-11-20 | Sonos, Inc. | Contextualization of voice inputs |
US10115400B2 (en) | 2016-08-05 | 2018-10-30 | Sonos, Inc. | Multiple voice services |
WO2018049430A2 (en) * | 2016-08-11 | 2018-03-15 | Integem Inc. | An intelligent interactive and augmented reality based user interface platform |
US9942678B1 (en) | 2016-09-27 | 2018-04-10 | Sonos, Inc. | Audio playback settings for voice interaction |
US10181323B2 (en) | 2016-10-19 | 2019-01-15 | Sonos, Inc. | Arbitration-based voice recognition |
US11183181B2 (en) | 2017-03-27 | 2021-11-23 | Sonos, Inc. | Systems and methods of multiple voice services |
US10475449B2 (en) | 2017-08-07 | 2019-11-12 | Sonos, Inc. | Wake-word detection suppression |
US10048930B1 (en) | 2017-09-08 | 2018-08-14 | Sonos, Inc. | Dynamic computation of system response volume |
US10531157B1 (en) * | 2017-09-21 | 2020-01-07 | Amazon Technologies, Inc. | Presentation and management of audio and visual content across devices |
US10446165B2 (en) | 2017-09-27 | 2019-10-15 | Sonos, Inc. | Robust short-time fourier transform acoustic echo cancellation during audio playback |
US10621981B2 (en) | 2017-09-28 | 2020-04-14 | Sonos, Inc. | Tone interference cancellation |
US10051366B1 (en) | 2017-09-28 | 2018-08-14 | Sonos, Inc. | Three-dimensional beam forming with a microphone array |
US10482868B2 (en) | 2017-09-28 | 2019-11-19 | Sonos, Inc. | Multi-channel acoustic echo cancellation |
US10466962B2 (en) | 2017-09-29 | 2019-11-05 | Sonos, Inc. | Media playback system with voice assistance |
US10880650B2 (en) | 2017-12-10 | 2020-12-29 | Sonos, Inc. | Network microphone devices with automatic do not disturb actuation capabilities |
US10818290B2 (en) | 2017-12-11 | 2020-10-27 | Sonos, Inc. | Home graph |
WO2019152722A1 (en) | 2018-01-31 | 2019-08-08 | Sonos, Inc. | Device designation of playback and network microphone device arrangements |
US10600408B1 (en) * | 2018-03-23 | 2020-03-24 | Amazon Technologies, Inc. | Content output management based on speech quality |
US11545153B2 (en) * | 2018-04-12 | 2023-01-03 | Sony Corporation | Information processing device, information processing system, and information processing method, and program |
US10656806B2 (en) * | 2018-04-21 | 2020-05-19 | Augmentalis Inc. | Display interface systems and methods |
US11175880B2 (en) | 2018-05-10 | 2021-11-16 | Sonos, Inc. | Systems and methods for voice-assisted media content selection |
US10959029B2 (en) | 2018-05-25 | 2021-03-23 | Sonos, Inc. | Determining and adapting to changes in microphone performance of playback devices |
US10681460B2 (en) | 2018-06-28 | 2020-06-09 | Sonos, Inc. | Systems and methods for associating playback devices with voice assistant services |
EP3790006A4 (en) * | 2018-06-29 | 2021-06-09 | Huawei Technologies Co., Ltd. | Voice control method, wearable apparatus, and terminal |
US10461710B1 (en) | 2018-08-28 | 2019-10-29 | Sonos, Inc. | Media playback system with maximum volume setting |
US11076035B2 (en) | 2018-08-28 | 2021-07-27 | Sonos, Inc. | Do not disturb feature for audio notifications |
US10587430B1 (en) | 2018-09-14 | 2020-03-10 | Sonos, Inc. | Networked devices, systems, and methods for associating playback devices based on sound codes |
US11315553B2 (en) * | 2018-09-20 | 2022-04-26 | Samsung Electronics Co., Ltd. | Electronic device and method for providing or obtaining data for training thereof |
US11024331B2 (en) | 2018-09-21 | 2021-06-01 | Sonos, Inc. | Voice detection optimization using sound metadata |
US10811015B2 (en) | 2018-09-25 | 2020-10-20 | Sonos, Inc. | Voice detection optimization based on selected voice assistant service |
US11100923B2 (en) | 2018-09-28 | 2021-08-24 | Sonos, Inc. | Systems and methods for selective wake word detection using neural network models |
US10692518B2 (en) | 2018-09-29 | 2020-06-23 | Sonos, Inc. | Linear filtering for noise-suppressed speech detection via multiple network microphone devices |
US11899519B2 (en) | 2018-10-23 | 2024-02-13 | Sonos, Inc. | Multiple stage network microphone device with reduced power consumption and processing load |
EP3654249A1 (en) | 2018-11-15 | 2020-05-20 | Snips | Dilated convolutions and gating for efficient keyword spotting |
US11183183B2 (en) | 2018-12-07 | 2021-11-23 | Sonos, Inc. | Systems and methods of operating media playback systems having multiple voice assistant services |
US11132989B2 (en) | 2018-12-13 | 2021-09-28 | Sonos, Inc. | Networked microphone devices, systems, and methods of localized arbitration |
US10602268B1 (en) | 2018-12-20 | 2020-03-24 | Sonos, Inc. | Optimization of network microphone devices using noise classification |
US11869509B1 (en) | 2018-12-21 | 2024-01-09 | Cerner Innovation, Inc. | Document generation from conversational sources |
US11062704B1 (en) | 2018-12-21 | 2021-07-13 | Cerner Innovation, Inc. | Processing multi-party conversations |
US11798560B1 (en) | 2018-12-21 | 2023-10-24 | Cerner Innovation, Inc. | Rapid event and trauma documentation using voice capture |
US11875883B1 (en) | 2018-12-21 | 2024-01-16 | Cerner Innovation, Inc. | De-duplication and contextually-intelligent recommendations based on natural language understanding of conversational sources |
US11410650B1 (en) | 2018-12-26 | 2022-08-09 | Cerner Innovation, Inc. | Semantically augmented clinical speech processing |
US11183185B2 (en) | 2019-01-09 | 2021-11-23 | Microsoft Technology Licensing, Llc | Time-based visual targeting for voice commands |
US10867604B2 (en) | 2019-02-08 | 2020-12-15 | Sonos, Inc. | Devices, systems, and methods for distributed voice processing |
US11315556B2 (en) | 2019-02-08 | 2022-04-26 | Sonos, Inc. | Devices, systems, and methods for distributed voice processing by transmitting sound data associated with a wake word to an appropriate device for identification |
EP3709194A1 (en) | 2019-03-15 | 2020-09-16 | Spotify AB | Ensemble-based data comparison |
US11120794B2 (en) | 2019-05-03 | 2021-09-14 | Sonos, Inc. | Voice assistant persistence across multiple network microphone devices |
US11200894B2 (en) | 2019-06-12 | 2021-12-14 | Sonos, Inc. | Network microphone device with command keyword eventing |
US11361756B2 (en) | 2019-06-12 | 2022-06-14 | Sonos, Inc. | Conditional wake word eventing based on environment |
US10586540B1 (en) | 2019-06-12 | 2020-03-10 | Sonos, Inc. | Network microphone device with command keyword conditioning |
US11138975B2 (en) | 2019-07-31 | 2021-10-05 | Sonos, Inc. | Locally distributed keyword detection |
US10871943B1 (en) | 2019-07-31 | 2020-12-22 | Sonos, Inc. | Noise classification for event detection |
US11138969B2 (en) | 2019-07-31 | 2021-10-05 | Sonos, Inc. | Locally distributed keyword detection |
US11094319B2 (en) | 2019-08-30 | 2021-08-17 | Spotify Ab | Systems and methods for generating a cleaned version of ambient sound |
US11189286B2 (en) | 2019-10-22 | 2021-11-30 | Sonos, Inc. | VAS toggle based on device orientation |
US11200900B2 (en) | 2019-12-20 | 2021-12-14 | Sonos, Inc. | Offline voice control |
US11562740B2 (en) | 2020-01-07 | 2023-01-24 | Sonos, Inc. | Voice verification for media playback |
US11556307B2 (en) | 2020-01-31 | 2023-01-17 | Sonos, Inc. | Local voice data processing |
US11308958B2 (en) | 2020-02-07 | 2022-04-19 | Sonos, Inc. | Localized wakeword verification |
US11308959B2 (en) | 2020-02-11 | 2022-04-19 | Spotify Ab | Dynamic adjustment of wake word acceptance tolerance thresholds in voice-controlled devices |
US11328722B2 (en) * | 2020-02-11 | 2022-05-10 | Spotify Ab | Systems and methods for generating a singular voice audio stream |
US11727919B2 (en) | 2020-05-20 | 2023-08-15 | Sonos, Inc. | Memory allocation for keyword spotting engines |
US11308962B2 (en) * | 2020-05-20 | 2022-04-19 | Sonos, Inc. | Input detection windowing |
US11482224B2 (en) | 2020-05-20 | 2022-10-25 | Sonos, Inc. | Command keywords with input detection windowing |
CN111897263A (en) * | 2020-07-30 | 2020-11-06 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Smart glasses control method, device, storage medium and electronic device
US11698771B2 (en) | 2020-08-25 | 2023-07-11 | Sonos, Inc. | Vocal guidance engines for playback devices |
US11984123B2 (en) | 2020-11-12 | 2024-05-14 | Sonos, Inc. | Network device interaction by range |
US11947783B2 (en) * | 2021-01-25 | 2024-04-02 | Google Llc | Undoing application operation(s) via user interaction(s) with an automated assistant |
US12266431B2 (en) | 2021-04-05 | 2025-04-01 | Cerner Innovation, Inc. | Machine learning engine and rule engine for document auto-population using historical and contextual data |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6842877B2 (en) | 1998-12-18 | 2005-01-11 | Tangis Corporation | Contextual responses based on automated learning techniques |
US6862713B1 (en) | 1999-08-31 | 2005-03-01 | International Business Machines Corporation | Interactive process for recognition and evaluation of a partial search query and display of interactive results |
WO2001075676A2 (en) * | 2000-04-02 | 2001-10-11 | Tangis Corporation | Soliciting information based on a computer user's context |
US7200555B1 (en) * | 2000-07-05 | 2007-04-03 | International Business Machines Corporation | Speech recognition correction for devices having limited or no display |
US6721706B1 (en) | 2000-10-30 | 2004-04-13 | Koninklijke Philips Electronics N.V. | Environment-responsive user interface/entertainment device that simulates personal interaction |
JP4838499B2 (en) | 2004-05-21 | 2011-12-14 | Olympus Corporation | User support device
US8051425B2 (en) * | 2004-10-29 | 2011-11-01 | Emc Corporation | Distributed system with asynchronous execution systems and methods |
US7620549B2 (en) * | 2005-08-10 | 2009-11-17 | Voicebox Technologies, Inc. | System and method of supporting adaptive misrecognition in conversational speech |
US7477909B2 (en) | 2005-10-31 | 2009-01-13 | Nuance Communications, Inc. | System and method for conducting a search using a wireless mobile device |
US20070178950A1 (en) * | 2006-01-19 | 2007-08-02 | International Business Machines Corporation | Wearable multimodal computing device with hands-free push to talk |
US7499858B2 (en) | 2006-08-18 | 2009-03-03 | Talkhouse Llc | Methods of information retrieval |
EP2082395A2 (en) | 2006-09-14 | 2009-07-29 | Google, Inc. | Integrating voice-enabled local search and contact lists |
US20090018830A1 (en) * | 2007-07-11 | 2009-01-15 | Vandinburg Gmbh | Speech control of computing devices |
US8037070B2 (en) | 2008-06-25 | 2011-10-11 | Yahoo! Inc. | Background contextual conversational search |
CN102439972B (en) | 2009-02-27 | 2016-02-10 | 基础制造有限公司 | Headset-based telecommunications platform
US8990235B2 (en) | 2009-03-12 | 2015-03-24 | Google Inc. | Automatically providing content associated with captured information, such as information captured in real-time |
US8121618B2 (en) | 2009-10-28 | 2012-02-21 | Digimarc Corporation | Intuitive computing methods and systems |
- 2011-11-08 US US13/291,320 patent/US20130018659A1/en not_active Abandoned
- 2012-07-05 WO PCT/US2012/045616 patent/WO2013009578A2/en active Application Filing
- 2014-07-28 US US14/444,974 patent/US20140337037A1/en not_active Abandoned
- 2016-11-08 US US15/346,589 patent/US9911418B2/en active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5873107A (en) * | 1996-03-29 | 1999-02-16 | Apple Computer, Inc. | System for automatically retrieving information relevant to text being authored |
US6466232B1 (en) * | 1998-12-18 | 2002-10-15 | Tangis Corporation | Method and system for controlling presentation of information to a user based on the user's condition |
US7398209B2 (en) * | 2002-06-03 | 2008-07-08 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance |
US8112275B2 (en) * | 2002-06-03 | 2012-02-07 | Voicebox Technologies, Inc. | System and method for user-specific speech recognition |
US20040199393A1 (en) * | 2003-04-03 | 2004-10-07 | Iker Arizmendi | System and method for speech recognition services |
US20050283532A1 (en) * | 2003-11-14 | 2005-12-22 | Kim Doo H | System and method for multi-modal context-sensitive applications in home network environment |
US20070033005A1 (en) * | 2005-08-05 | 2007-02-08 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance |
US7949529B2 (en) * | 2005-08-29 | 2011-05-24 | Voicebox Technologies, Inc. | Mobile systems and methods of supporting natural language human-machine interactions |
US20090204410A1 (en) * | 2008-02-13 | 2009-08-13 | Sensory, Incorporated | Voice interface and search for electronic devices including bluetooth headsets and remote systems |
US20100088254A1 (en) * | 2008-10-07 | 2010-04-08 | Yin-Pin Yang | Self-learning method for keyword based human machine interaction and portable navigation device |
Cited By (295)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11928604B2 (en) | 2005-09-08 | 2024-03-12 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US11979836B2 (en) | 2007-04-03 | 2024-05-07 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US12087308B2 (en) | 2010-01-18 | 2024-09-10 | Apple Inc. | Intelligent automated assistant |
US12165635B2 (en) | 2010-01-18 | 2024-12-10 | Apple Inc. | Intelligent automated assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US10741185B2 (en) | 2010-01-18 | 2020-08-11 | Apple Inc. | Intelligent automated assistant |
US10692504B2 (en) | 2010-02-25 | 2020-06-23 | Apple Inc. | User profiling for voice input processing |
US10417405B2 (en) | 2011-03-21 | 2019-09-17 | Apple Inc. | Device access using voice authentication |
US11350253B2 (en) | 2011-06-03 | 2022-05-31 | Apple Inc. | Active transport based notifications |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11069336B2 (en) | 2012-03-02 | 2021-07-20 | Apple Inc. | Systems and methods for name pronunciation |
US10452783B2 (en) | 2012-04-20 | 2019-10-22 | Maluuba, Inc. | Conversational agent |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US9601113B2 (en) | 2012-05-16 | 2017-03-21 | Xtreme Interactions Inc. | System, device and method for processing interlaced multimodal user input |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US20140006026A1 (en) * | 2012-06-29 | 2014-01-02 | Mathew J. Lamb | Contextual audio ducking with situation aware devices |
US9384737B2 (en) * | 2012-06-29 | 2016-07-05 | Microsoft Technology Licensing, Llc | Method and device for adjusting sound levels of sources based on sound source priority |
US20150262583A1 (en) * | 2012-09-26 | 2015-09-17 | Kyocera Corporation | Information terminal and voice operation method |
US20190220933A1 (en) * | 2012-10-17 | 2019-07-18 | Facebook, Inc. | Presence Granularity with Augmented Reality |
US11557301B2 (en) | 2012-10-30 | 2023-01-17 | Google Llc | Hotword-based speaker recognition |
US10438591B1 (en) * | 2012-10-30 | 2019-10-08 | Google Llc | Hotword-based speaker recognition |
US10551928B2 (en) | 2012-11-20 | 2020-02-04 | Samsung Electronics Company, Ltd. | GUI transitions on wearable electronic device |
US11157436B2 (en) | 2012-11-20 | 2021-10-26 | Samsung Electronics Company, Ltd. | Services associated with wearable electronic device |
US9477313B2 (en) | 2012-11-20 | 2016-10-25 | Samsung Electronics Co., Ltd. | User gesture input to wearable electronic device involving outward-facing sensor of device |
US10423214B2 (en) * | 2012-11-20 | 2019-09-24 | Samsung Electronics Company, Ltd. | Delegating processing from wearable electronic device
US10185416B2 (en) | 2012-11-20 | 2019-01-22 | Samsung Electronics Co., Ltd. | User gesture input to wearable electronic device involving movement of device |
US10194060B2 (en) | 2012-11-20 | 2019-01-29 | Samsung Electronics Company, Ltd. | Wearable electronic device |
US11237719B2 (en) | 2012-11-20 | 2022-02-01 | Samsung Electronics Company, Ltd. | Controlling remote electronic device with wearable electronic device |
US20140143785A1 (en) * | 2012-11-20 | 2014-05-22 | Samsung Electronics Company, Ltd. | Delegating Processing from Wearable Electronic Device
US11372536B2 (en) | 2012-11-20 | 2022-06-28 | Samsung Electronics Company, Ltd. | Transition and interaction model for wearable electronic device |
US20140172432A1 (en) * | 2012-12-18 | 2014-06-19 | Seiko Epson Corporation | Display device, head-mount type display device, method of controlling display device, and method of controlling head-mount type display device |
US9542958B2 (en) * | 2012-12-18 | 2017-01-10 | Seiko Epson Corporation | Display device, head-mount type display device, method of controlling display device, and method of controlling head-mount type display device |
US11854570B2 (en) | 2013-01-07 | 2023-12-26 | Samsung Electronics Co., Ltd. | Electronic device providing response to voice input, and method and computer readable medium thereof |
US10891968B2 (en) * | 2013-01-07 | 2021-01-12 | Samsung Electronics Co., Ltd. | Interactive server, control method thereof, and interactive system |
US20140195249A1 (en) * | 2013-01-07 | 2014-07-10 | Samsung Electronics Co., Ltd. | Interactive server, control method thereof, and interactive system |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US12009007B2 (en) | 2013-02-07 | 2024-06-11 | Apple Inc. | Voice trigger for a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US10714117B2 (en) | 2013-02-07 | 2020-07-14 | Apple Inc. | Voice trigger for a digital assistant |
US20140240226A1 (en) * | 2013-02-27 | 2014-08-28 | Robert Bosch Gmbh | User Interface Apparatus |
US9098543B2 (en) | 2013-03-14 | 2015-08-04 | Wal-Mart Stores, Inc. | Attribute detection |
EP2778982A1 (en) * | 2013-03-14 | 2014-09-17 | Wal-Mart Stores, Inc. | Attribute detection |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US10891953B2 (en) * | 2013-04-09 | 2021-01-12 | Google Llc | Multi-mode guard for voice commands |
US20190237073A1 (en) * | 2013-04-09 | 2019-08-01 | Google Llc | Multi-Mode Guard for Voice Commands |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US12073147B2 (en) | 2013-06-09 | 2024-08-27 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11048473B2 (en) | 2013-06-09 | 2021-06-29 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10769385B2 (en) | 2013-06-09 | 2020-09-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US20140129207A1 (en) * | 2013-07-19 | 2014-05-08 | Apex Technology Ventures, LLC | Augmented Reality Language Translation |
US10664230B2 (en) * | 2013-07-25 | 2020-05-26 | Lg Electronics Inc. | Head mounted display and method of controlling therefor |
US20170364325A1 (en) * | 2013-07-25 | 2017-12-21 | Lg Electronics Inc. | Head mounted display and method of controlling therefor |
US12010262B2 (en) | 2013-08-06 | 2024-06-11 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
WO2015037804A1 (en) * | 2013-09-11 | 2015-03-19 | Lg Electronics Inc. | Wearable computing device and user interface method |
US9471101B2 (en) | 2013-09-11 | 2016-10-18 | Lg Electronics Inc. | Wearable computing device and user interface method |
US11314370B2 (en) | 2013-12-06 | 2022-04-26 | Apple Inc. | Method for extracting salient dialog usage from live data |
US10691332B2 (en) | 2014-02-28 | 2020-06-23 | Samsung Electronics Company, Ltd. | Text input on an interactive display |
US9323983B2 (en) | 2014-05-29 | 2016-04-26 | Comcast Cable Communications, Llc | Real-time image and audio replacement for visual acquisition devices |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10657966B2 (en) | 2014-05-30 | 2020-05-19 | Apple Inc. | Better resolution when referencing to concepts |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US12067990B2 (en) | 2014-05-30 | 2024-08-20 | Apple Inc. | Intelligent assistant for home automation |
US10699717B2 (en) | 2014-05-30 | 2020-06-30 | Apple Inc. | Intelligent assistant for home automation |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US10714095B2 (en) | 2014-05-30 | 2020-07-14 | Apple Inc. | Intelligent assistant for home automation |
US12118999B2 (en) | 2014-05-30 | 2024-10-15 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10417344B2 (en) | 2014-05-30 | 2019-09-17 | Apple Inc. | Exemplar-based natural language processing |
US10691717B2 (en) * | 2014-06-27 | 2020-06-23 | Samsung Electronics Co., Ltd. | Method and apparatus for managing data |
US20150379098A1 (en) * | 2014-06-27 | 2015-12-31 | Samsung Electronics Co., Ltd. | Method and apparatus for managing data |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US12200297B2 (en) | 2014-06-30 | 2025-01-14 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
CN104112248A (en) * | 2014-07-15 | 2014-10-22 | Changzhou Campus of Hohai University | Intelligent life reminding system and method based on image recognition technology
EP3192041A1 (en) * | 2014-09-09 | 2017-07-19 | Microsoft Technology Licensing, LLC | Invocation of a digital personal assistant by means of a device in the vicinity |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US20160078864A1 (en) * | 2014-09-15 | 2016-03-17 | Honeywell International Inc. | Identifying un-stored voice commands |
US10277649B2 (en) | 2014-09-24 | 2019-04-30 | Microsoft Technology Licensing, Llc | Presentation of computing environment on multiple devices |
US10635296B2 (en) | 2014-09-24 | 2020-04-28 | Microsoft Technology Licensing, Llc | Partitioned application presentation across devices |
US10824531B2 (en) | 2014-09-24 | 2020-11-03 | Microsoft Technology Licensing, Llc | Lending target device resources to host device computing environment |
US10448111B2 (en) | 2014-09-24 | 2019-10-15 | Microsoft Technology Licensing, Llc | Content projection |
US10390213B2 (en) | 2014-09-30 | 2019-08-20 | Apple Inc. | Social reminders |
US10438595B2 (en) | 2014-09-30 | 2019-10-08 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10453443B2 (en) | 2014-09-30 | 2019-10-22 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US11327711B2 (en) | 2014-12-05 | 2022-05-10 | Microsoft Technology Licensing, Llc | External visual interactions for speech-based devices |
US20160196535A1 (en) * | 2015-01-05 | 2016-07-07 | Electronics And Telecommunications Research Institute | Device and method for smart calendar |
US11231904B2 (en) | 2015-03-06 | 2022-01-25 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US10529332B2 (en) | 2015-03-08 | 2020-01-07 | Apple Inc. | Virtual assistant activation |
US10930282B2 (en) | 2015-03-08 | 2021-02-23 | Apple Inc. | Competing devices responding to voice triggers |
US12236952B2 (en) | 2015-03-08 | 2025-02-25 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US12001933B2 (en) | 2015-05-15 | 2024-06-04 | Apple Inc. | Virtual assistant in a communication session |
US12154016B2 (en) | 2015-05-15 | 2024-11-26 | Apple Inc. | Virtual assistant in a communication session |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11127397B2 (en) | 2015-05-27 | 2021-09-21 | Apple Inc. | Device voice control |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10681212B2 (en) | 2015-06-05 | 2020-06-09 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US20160379293A1 (en) * | 2015-06-29 | 2016-12-29 | International Business Machines Corporation | Application for automatic ordering of food items |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US11010127B2 (en) | 2015-06-29 | 2021-05-18 | Apple Inc. | Virtual assistant for media playback |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US12204932B2 (en) | 2015-09-08 | 2025-01-21 | Apple Inc. | Distributed personal assistant |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10956666B2 (en) | 2015-11-09 | 2021-03-23 | Apple Inc. | Unconventional virtual assistant interactions |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
CN108604237A (en) * | 2015-12-01 | 2018-09-28 | Integem Inc. | Personalized interactive intelligent search method and system
CN108604237B (en) * | 2015-12-01 | 2022-10-14 | Integem Inc. | Personalized interactive intelligent search method and system
US10354652B2 (en) | 2015-12-02 | 2019-07-16 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10942703B2 (en) | 2015-12-23 | 2021-03-09 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10095691B2 (en) | 2016-03-22 | 2018-10-09 | Wolfram Research, Inc. | Method and apparatus for converting natural language to machine actions |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US12223282B2 (en) | 2016-06-09 | 2025-02-11 | Apple Inc. | Intelligent automated assistant in a home environment |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
CN109783046A (en) * | 2016-06-10 | 2019-05-21 | Apple Inc. | Intelligent digital assistant in a multitasking environment
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US12175977B2 (en) | 2016-06-10 | 2024-12-24 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
DK201770036A1 (en) * | 2016-06-10 | 2018-01-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment
JP2019204517A (en) * | 2016-06-10 | 2019-11-28 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment
AU2019213416B2 (en) * | 2016-06-10 | 2019-11-21 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
AU2016409890B2 (en) * | 2016-06-10 | 2018-07-19 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
CN108701013A (en) * | 2016-06-10 | 2018-10-23 | Apple Inc. | Intelligent digital assistants in a multitasking environment
JP2019522250A (en) * | 2016-06-10 | 2019-08-08 | Apple Inc. | Intelligent digital assistant in a multitasking environment
EP3495943A1 (en) * | 2016-06-10 | 2019-06-12 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
AU2019213416A1 (en) * | 2016-06-10 | 2019-08-29 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US10580409B2 (en) | 2016-06-11 | 2020-03-03 | Apple Inc. | Application integration with a digital assistant |
US10942702B2 (en) | 2016-06-11 | 2021-03-09 | Apple Inc. | Intelligent device arbitration and control |
US11587272B2 (en) * | 2016-08-11 | 2023-02-21 | Eliza Y Du | Intelligent interactive and augmented reality cloud platform |
US20210192819A1 (en) * | 2016-08-11 | 2021-06-24 | Eliza Du | Intelligent interactive and augmented reality cloud platform |
US10957083B2 (en) * | 2016-08-11 | 2021-03-23 | Integem Inc. | Intelligent interactive and augmented reality based user interface platform |
US20180061449A1 (en) * | 2016-08-30 | 2018-03-01 | Bragi GmbH | Binaural Audio-Video Recording Using Short Range Wireless Transmission from Head Worn Devices to Receptor Device System and Method |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
CN109564706A (en) * | 2016-12-01 | 2019-04-02 | Integem Inc. | User interaction platform based on intelligent interactive augmented reality
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US11722571B1 (en) * | 2016-12-20 | 2023-08-08 | Amazon Technologies, Inc. | Recipient device presence activity monitoring for a communications session |
US11409497B2 (en) * | 2016-12-23 | 2022-08-09 | Realwear, Inc. | Hands-free navigation of touch-based operating systems |
US11507216B2 (en) * | 2016-12-23 | 2022-11-22 | Realwear, Inc. | Customizing user interfaces of binary applications |
US11947752B2 (en) | 2016-12-23 | 2024-04-02 | Realwear, Inc. | Customizing user interfaces of binary applications |
US11340465B2 (en) | 2016-12-23 | 2022-05-24 | Realwear, Inc. | Head-mounted display with modular components |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US12260234B2 (en) | 2017-01-09 | 2025-03-25 | Apple Inc. | Application integration with a digital assistant |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10741181B2 (en) | 2017-05-09 | 2020-08-11 | Apple Inc. | User interface for correcting recognition errors |
US10332518B2 (en) | 2017-05-09 | 2019-06-25 | Apple Inc. | User interface for correcting recognition errors |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US10847142B2 (en) | 2017-05-11 | 2020-11-24 | Apple Inc. | Maintaining privacy of personal information |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10789945B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Low-latency intelligent automated assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US10909171B2 (en) | 2017-05-16 | 2021-02-02 | Apple Inc. | Intelligent automated assistant for media exploration |
US12254887B2 (en) | 2017-05-16 | 2025-03-18 | Apple Inc. | Far-field extension of digital assistant services for providing a notification of an event to a user |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US12026197B2 (en) | 2017-05-16 | 2024-07-02 | Apple Inc. | Intelligent automated assistant for media exploration |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US10748546B2 (en) | 2017-05-16 | 2020-08-18 | Apple Inc. | Digital assistant services based on device capabilities |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10672379B1 (en) * | 2017-09-25 | 2020-06-02 | Amazon Technologies, Inc. | Systems and methods for selecting a recipient device for communications |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US12211502B2 (en) | 2018-03-26 | 2025-01-28 | Apple Inc. | Natural assistant interaction |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
CN110473532A (en) * | 2018-05-11 | 2019-11-19 | Pegatron Corporation | Control system and portable electronic devices
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US20210082405A1 (en) * | 2018-05-30 | 2021-03-18 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method for Location Reminder and Electronic Device |
US10403283B1 (en) | 2018-06-01 | 2019-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US10684703B2 (en) | 2018-06-01 | 2020-06-16 | Apple Inc. | Attention aware virtual assistant dismissal |
US12080287B2 (en) | 2018-06-01 | 2024-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US12067985B2 (en) | 2018-06-01 | 2024-08-20 | Apple Inc. | Virtual assistant operations in multi-device environments |
US11495218B2 (en) | 2018-06-01 | 2022-11-08 | Apple Inc. | Virtual assistant operation in multi-device environments |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US12061752B2 (en) | 2018-06-01 | 2024-08-13 | Apple Inc. | Attention aware virtual assistant dismissal |
US10944859B2 (en) | 2018-06-03 | 2021-03-09 | Apple Inc. | Accelerated task performance |
US10504518B1 (en) | 2018-06-03 | 2019-12-10 | Apple Inc. | Accelerated task performance |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US12136419B2 (en) | 2019-03-18 | 2024-11-05 | Apple Inc. | Multimodality in digital assistant systems |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US12154571B2 (en) | 2019-05-06 | 2024-11-26 | Apple Inc. | Spoken notifications |
US12216894B2 (en) | 2019-05-06 | 2025-02-04 | Apple Inc. | User configurable task triggers |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11360739B2 (en) | 2019-05-31 | 2022-06-14 | Apple Inc. | User activity shortcut suggestions |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11810578B2 (en) | 2020-05-11 | 2023-11-07 | Apple Inc. | Device arbitration for digital assistant-based intercom systems |
US12197712B2 (en) | 2020-05-11 | 2025-01-14 | Apple Inc. | Providing relevant data items based on context |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11294945B2 (en) * | 2020-05-19 | 2022-04-05 | International Business Machines Corporation | Unsupervised text summarization with reinforcement learning |
US20210365485A1 (en) * | 2020-05-19 | 2021-11-25 | International Business Machines Corporation | Unsupervised text summarization with reinforcement learning |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US12219314B2 (en) | 2020-07-21 | 2025-02-04 | Apple Inc. | User identification using headphones |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US12135767B1 (en) | 2021-03-05 | 2024-11-05 | Professional Credentials Exchange, LLC | Systems and methods for ingesting credential information |
US11997108B1 (en) | 2021-03-05 | 2024-05-28 | Professional Credentials Exchange LLC | Systems and methods for providing consensus sourced verification |
US20240054195A1 (en) * | 2022-08-09 | 2024-02-15 | Soundhound, Inc. | Authorization of Action by Voice Identification |
US12277954B2 (en) | 2024-04-16 | 2025-04-15 | Apple Inc. | Voice trigger for a digital assistant |
Also Published As
Publication number | Publication date |
---|---|
US20140337037A1 (en) | 2014-11-13 |
US9911418B2 (en) | 2018-03-06 |
WO2013009578A2 (en) | 2013-01-17 |
WO2013009578A3 (en) | 2013-04-25 |
US20170053648A1 (en) | 2017-02-23 |
Similar Documents
Publication | Title |
---|---|
US9911418B2 (en) | Systems and methods for speech command processing |
US11595517B2 (en) | Digital assistant integration with telephony | |
US11966494B2 (en) | Threshold-based assembly of remote automated assistant responses | |
KR102197869B1 (en) | Natural assistant interaction | |
US8223088B1 (en) | Multimode input field for a head-mounted display | |
CN111901481B (en) | Computer-implemented method, electronic device, and storage medium | |
CN110797019B (en) | Multi-command single speech input method | |
KR102036786B1 (en) | Providing suggested voice-based action queries | |
US20190095050A1 (en) | Application Gateway for Providing Different User Interfaces for Limited Distraction and Non-Limited Distraction Contexts | |
US9378730B1 (en) | Evaluating pronouns in context | |
KR102002979B1 (en) | Leveraging head mounted displays to enable person-to-person interactions | |
CN115221295A (en) | Personal requested digital assistant processing | |
US20180349447A1 (en) | Methods and systems for customizing suggestions using user-specific information | |
CN112567332A (en) | Multimodal input of voice commands | |
CN118056172A (en) | Digital assistant for providing hands-free notification management | |
US20110276327A1 (en) | Voice-to-expressive text | |
CN117033578A (en) | Active assistance based on inter-device conversational communication | |
KR20190007042A (en) | Integrate selectable application links into conversations with personal assistant modules | |
CN111309136A (en) | Accelerated task execution | |
Neustein | Advances in speech recognition: mobile environments, call centers and clinics | |
WO2022266209A2 (en) | Conversational and environmental transcriptions | |
CN110574023A (en) | offline personal assistant | |
CN115083414A (en) | Multi-state digital assistant for continuous conversation | |
US12236938B2 (en) | Digital assistant for providing and modifying an output of an electronic document | |
CN111899739B (en) | Voice notification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: GOOGLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CHI, LIANG-YU (TOM);REEL/FRAME:027190/0913 Effective date: 20111025 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
| AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044142/0357 Effective date: 20170929 |