US20140067402A1 - Displaying additional data about outputted media data by a display device for a speech search command - Google Patents
Displaying additional data about outputted media data by a display device for a speech search command
- Publication number
- US20140067402A1 (application Ser. No. 13/953,313)
- Authority: United States (US)
- Prior art keywords
- query term
- query
- display device
- search
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F17/30755
- G06F16/632—Information retrieval of audio data; querying; query formulation
- G06F16/7844—Information retrieval of video data; retrieval characterised by metadata automatically derived from the content, using original textual content or text extracted from visual content or a transcript of the audio data
- G06F16/9032—Details of database functions independent of the retrieved data types; querying; query formulation
- G06F3/16—Input/output arrangements for transferring data to or from a processing unit; sound input; sound output
- G10L15/00—Speech recognition
- G10L15/22—Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26—Speech recognition; speech to text systems
- G10L2015/226—Procedures used during a speech recognition process using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process using non-speech characteristics of application context
- H04N21/42203—Selective content distribution; client devices; input-only peripherals, e.g. a sound input device such as a microphone
- H04N21/472—Selective content distribution; client devices; end-user interface for requesting content, additional data or services, or for interacting with content, e.g. content reservation, reminders, event notification, or manipulating displayed content
Definitions
- the present invention relates to a display device and more particularly, to a speech search method for a display device.
- users can easily search for a variety of information. Especially, users can search for digital contents at the same time they view the digital contents using a display device. They can search not only for information about the contents themselves but also for detailed information about a part of the contents that they are viewing or an object of the contents.
- Searching for information about contents can be performed in various ways.
- previously, the users inputted their search words by using additional input devices such as a keyboard.
- the users can input various voice commands to a device in order to control the device. Therefore, the users can search for information about contents using their voice commands at the same time they are viewing the contents.
- the present invention is directed to a display device and a speech search method that substantially obviates one or more problems due to limitations and disadvantages of the related art.
- An object of the present invention is to provide a method of searching desired information about contents using a speech search in a more efficient and accurate manner.
- Another object of the present invention is to provide a speech search method that generates the search results that a user intended, using speech search commands and context information associated with media data being viewed by the user, even when the user does not exactly know what he or she is trying to search for.
- a speech search method for a display device includes the steps of outputting media data, receiving a speech search command from a user, and determining whether the speech search command includes a user query term which is full and searchable. If the speech search command does not include a user query term which is full and searchable, the method further comprises the step of extracting a media query term which is full and searchable from audio data of the media data which is outputted immediately prior to the speech search command. Finally, the method includes the step of performing a speech search using the extracted media query term.
- in another aspect of the present invention, a display device includes a media data processing module processing media data, a media data output unit outputting the processed media data, and an audio input unit receiving a speech search command from a user.
- the display device further includes a speech search module determining a query term from the speech search command and performing a speech search using the determined query term.
- the display device determines whether the speech search command includes a user query term which is full and searchable, extracts a media query term from audio data of the media data which is outputted immediately prior to the speech search command if the speech search command does not include a user query term, and performs a speech search using the extracted media query term.
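- The claimed flow can be summarized in code as follows. The sketch below is illustrative only and not part of the claims; the stop-word filtering and prefix matching are crude stand-ins for the voice recognition and natural language processing modules described later, and all names are hypothetical.

```python
# Minimal, self-contained sketch of the claimed flow. The helper logic is
# deliberately naive; a real device would use an ASR engine, an NLP pipeline,
# and a search engine (local database or networked service) instead.

STOP_WORDS = {"what", "is", "a", "the", "who", "just", "now"}

def extract_user_term(command_text):
    """Return a full, searchable query term from the command, or None."""
    words = [w.strip("?,").lower() for w in command_text.split()]
    content = [w for w in words if w and w not in STOP_WORDS]
    # A trailing-off fragment such as "fl..." is not full and searchable.
    if len(content) == 1 and not content[0].endswith("..."):
        return content[0]
    return None

def extract_media_term(fragment, recent_transcript):
    """Mine a query term from audio outputted just before the command."""
    prefix = (fragment or "").rstrip(".").lower()
    if not prefix:
        return None
    for word in recent_transcript.lower().split():
        word = word.strip("?,.")
        if word.startswith(prefix):
            return word
    return None

def speech_search(command_text, recent_transcript):
    term = extract_user_term(command_text)
    if term is None:
        # Incomplete command: fall back to the buffered media audio.
        fragment = next((w for w in command_text.split() if "..." in w), None)
        term = extract_media_term(fragment, recent_transcript)
    return f"searching for: {term}"

transcript = "a mid-fielder ends the inning with a fly-out"
print(speech_search("What is a fly-out?", transcript))  # searching for: fly-out
print(speech_search("fl... what?", transcript))         # searching for: fly-out
```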
- FIG. 1 illustrates a conceptual diagram of a network according to an embodiment of the present invention
- FIG. 2 illustrates a block diagram of a display device according to an embodiment of the present invention
- FIG. 3 illustrates a speech search method according to an embodiment of the present invention
- FIG. 4 illustrates a flowchart of a speech search method according to an embodiment of the present invention
- FIG. 5 illustrates a speech search method according to another embodiment of the present invention
- FIG. 6 illustrates a flowchart of a speech search method according to another embodiment of the present invention.
- FIG. 7 illustrates a logical block diagram of a display device according to an embodiment of the present invention.
- FIG. 8 illustrates a flowchart of a speech search method according to another embodiment of the present invention.
- FIG. 9 illustrates a speech search method according to another embodiment of the present invention.
- the present invention relates to a display device performing a speech search and providing a user with a result of the speech search.
- a speech search is a technology that recognizes a user's voice command and performs a search with the user's voice command.
- a speech search utilizes a voice or speech recognition technology.
- the voice recognition technology in the present invention includes natural language processing.
- the natural language processing is a process that analyzes the form, meaning, and conversational context of normal everyday language and converts the result of the analysis so that a device can process it. In other words, the device recognizes not a predetermined keyword but spontaneous conversation, and performs the processing according to a user's intention.
- the display device can be any one of a variety of devices that can process and output digital media data or digital contents.
- the digital contents include at least one of text, audio, and video data.
- the display device can be a TV, a set-top box, an internet processing device, a recorded media player, a media recording device, a wireless communication device, a cell phone, a Personal Digital Assistant (“PDA”), a computer, a laptop, and a tablet PC.
- the display device can be any one of a variety of devices providing a user with processed digital contents, and the display device may be referred to as “device” hereinafter.
- FIG. 1 shows a conceptual diagram of a network according to an embodiment of the present invention.
- display devices 1040 are connected to a network 1030 .
- the network 1030 is a network that transmits and receives data by using various communication protocols such as cable, wireless communication, optical communication, or IP network.
- the display devices 1040 receive contents from a contents server 1010 through the network 1030 .
- the contents server 1010 is a contents provider providing digital contents, and the display device 1040 can be used as a contents server based on the network architecture.
- the display device 1040 provides a user with the contents received from the contents server 1010 .
- the display device 1040 provides contents by processing received contents data and displays the processed data.
- the display device 1040 receives a search command from a user, transmits a search term to a search engine 1020 , and provides a result of the search back to the user after receiving the result from the search engine 1020 .
- At least one searchable word which is a target of searching can be called “query term.”
- Query term is an object to be searched by the search engine and includes at least one word.
- the display device 1040 may perform searching from a database included in the display device 1040 by using a query term, or transmit the query term to the search engine 1020 and receive a result of the search. And at least one word included in a query term is called a "query word."
- when the query term includes a plurality of words, each word can be called a query word. If the query term only has one word, the query word is the query term. But, in the following, the query word is a word that a user indicates as the user speaks a speech search command. In other words, the user can speak a partial or unclear word, and such a word can be recognized by the display device as the query word.
- FIG. 2 shows a block diagram of a display device according to an embodiment of the present invention.
- FIG. 2 indicates an example of the display device 1040 shown in FIG. 1 having a storage unit 2010 , a communication unit 2020 , a sensor unit 2030 , an audio input/output unit 2040 , a camera unit 2050 , a display unit 2060 , a power unit 2070 , a processor 2080 , and a controller 2090 .
- the display device in FIG. 2 is shown as an example only and it is not required for all the units to be equipped as shown in FIG. 2 .
- a structure block necessary for the display device according to an embodiment of the present invention will be described as follows.
- the storage unit 2010 stores various digital data such as video, audio, picture, movie clips, and applications.
- the storage unit 2010 indicates a digital data storage space such as flash memory, a Hard Disk Drive ("HDD"), or a Solid State Drive ("SSD").
- a buffer necessary for processing data can be included in the storage unit 2010 .
- the storage unit 2010 can store a database necessary for searching information.
- the communication unit 2020 transmits and receives data and performs communications by using various protocols associated with the display device.
- the communication unit 2020 is connected to the external networks through wire or wirelessly, and transmits and receives digital data.
- the display device receives media data by using the communication unit 2020 , or transmits a search query and receives the search result of the query.
- the sensor unit 2030 may recognize a user's input or the environment of the device by using a plurality of sensors and transmit the result to the controller 2090 .
- the sensor unit 2030 can have a plurality of sensing means.
- a plurality of sensing means can include a gravity sensor, terrestrial magnetism sensor, motion sensor, gyro sensor, acceleration sensor, inclination sensor, brightness sensor, olfactory sensor, temperature sensor, depth sensor, pressure sensor, bending sensor, audio sensor, video sensor, Global Positioning System ("GPS") sensor, and touch sensors.
- the sensor unit 2030 senses the user's various inputs and conditions, and transmits the result of the sensing so that the device can perform necessary functions based on it.
- the sensors may be included in the device as a different element or combined as at least one element.
- the sensor unit 2030 may be selectively equipped according to an embodiment.
- the audio input/output unit 2040 includes an audio output means such as a speaker and an audio input means such as a microphone.
- the audio input/output unit 2040 may perform audio outputting of the device or audio inputting toward the device.
- the audio input/output unit 2040 can be used as an audio sensor.
- the audio input/output unit 2040 processes audio data and transmits the audio data to an external device, or receives audio data from the external device and processes it.
- An audio input unit and an audio output unit may be separately equipped and an embodiment will be illustrated as follows.
- the camera unit 2050 records movie clips and takes pictures and may be selectively equipped according to an embodiment.
- the camera unit 2050 can be used as a motion sensor or a visual sensor as pre-described.
- the display unit 2060 can output images on a display screen. If the display is a touch-sensitive display, the display unit 2060 can be used as a touch sensor. If the display or the device is flexible, they may be used as a bending sensor. However, according to an embodiment, if the display device does not include a display panel or a screen such as a set-top box and a computer, the display unit processes display data and transmits the display data to an external device like a monitor.
- the display unit 2060 may be referred to as a video output unit hereinafter.
- the power unit 2070 provides power to the device as a power source connected with an internal battery or an external power.
- the processor 2080 executes various applications stored in the storage unit 2010 and processes internal data in the device.
- the controller 2090 controls the units of the device, and manages transmitting and receiving data between the units and the functions of each unit.
- the processor 2080 and the controller 2090 may be combined in a single chip and implement the functions above-described. In that case, they may be called a control unit 2100 .
- the speech search method of the present invention can be performed by the control unit 2100 and according to an embodiment, performed by modules controlled by the control unit 2100 . Further illustration is as follows.
- FIG. 2 is a block diagram of a display device according to an embodiment of the present invention, and the separately illustrated blocks are shown as the elements of the device.
- the elements of the device can be combined in one chip or a plurality of chips as designed.
- a speech search method can be performed in the control unit 2100 in FIG. 2 and according to an embodiment, the speech search method can be executed by an application which is stored in the storage unit 2010 and operated by the control unit 2100 .
- further description of the control unit 2100 executing such a speech search follows.
- a TV may be used as an example of the display device.
- the display device is not limited to only a TV.
- FIG. 3 shows a speech search method according to an embodiment of the present invention.
- a display device 3010 similar to display device 1040 outputs baseball contents as media data.
- the baseball contents may be received as live broadcast contents or may be pre-stored in the storage unit of the display device 3010 .
- the media data which are contents that the display device outputs, may include audio data and video data.
- the display device 3010 outputs video data through a display screen and audio data through a speaker.
- a user watching the baseball contents on the display device 3010 can search for information about the contents by voice. For example, the user can search for information about a player shown in the images being displayed on the screen, or for an unfamiliar word among the words spoken by the commentator. As shown in FIG. 3 , if the commentator says " . . . a mid-fielder ends the inning with a fly-out" in the broadcast, the user might want to search for "fly-out." In that case, the user in the present invention can search for "fly-out" by a voice command. Especially, by using natural language processing, a normal everyday questioning statement like "What is a fly-out?" can start a speech search function.
- FIG. 4 shows a flowchart of a speech search method according to an embodiment of the present invention that may be performed by any device described herein.
- the method begins when a display device outputs media data (S 4010 ).
- the media contents may include video data and audio data. Further, the media contents may include text data depending on contents.
- the display device then receives a speech search command (S 4020 ).
- the speech search command can be a predetermined command or a normal everyday language statement by using natural language processing. In the embodiment of FIG. 3 , “What is a fly-out?” is a speech search command.
- the display device receives the speech search command from the user's voice received via a microphone.
- the display device extracts a query term from a speech search command (S 4030 ).
- a query term which is the object of the search can be extracted from the speech search command.
- “fly-out” is a query term in the speech search command of “what is a fly-out?” That is, the display device recognizes the user's voice of “what is a fly-out?” as a speech search command, and extracts the query term, “fly-out,” to process the search from the recognized speech search command.
- words included in a speech search command are called "query words."
- "fly" and "out" are query words.
- although the display device can search for each query word, a user might want to search for the combined query words, that is, a query term, not for each query word.
- the display device can extract a query term by using voice recognition and natural language processing. In that case, as for the extraction of the query term, if there is only one query word, the query word is extracted, and if there are a plurality of query words, the combined query words, that is, a query term, is extracted.
- the extraction of the query term can be performed based on the context information of the media data. In the embodiments of FIG. 3 and FIG. 4 , the display device can determine whether the user wants to search for "fly" or "out" or for "fly-out" as a baseball term.
- the use of context information allows the search to exclude irrelevant synonyms (e.g., a search for information about the insect fly or the act of flying in a plane).
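- One plausible realization of this disambiguation, sketched below, scores each candidate term against context keywords drawn from the contents; the keyword set and scoring weights are invented for illustration and are not specified by the disclosure.

```python
# Hypothetical context-based disambiguation: candidate query terms are scored
# against keywords taken from the contents' metadata and recent transcript.
# The keyword set below is invented for illustration.

BASEBALL_CONTEXT = {"inning", "hitter", "pitcher", "fly-out", "mid-fielder"}

def pick_query_term(candidates, context_keywords):
    """Prefer exact context hits, then partial ones (e.g. 'fly' in 'fly-out')."""
    def score(term):
        exact = 2 if term in context_keywords else 0
        partial = sum(1 for kw in context_keywords if term != kw and term in kw)
        return exact + partial
    return max(candidates, key=score)

# "fly" and "out" are each searchable, but the context points to the combined
# baseball term, so irrelevant senses (the insect, air travel) never reach
# the search engine.
print(pick_query_term(["fly", "out", "fly-out"], BASEBALL_CONTEXT))  # fly-out
```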
- the display device performs the speech search by using the extracted query term (S 4040 ).
- the display device can search for the query term by using an internal search engine, or transmit the query term through a network to an external search engine having the search function and receive the result of the search.
- a search or a search result of a query term includes the definition of the term and diverse data related to contents which a user is watching.
- the display device provides a user with a search result (S 4050 ).
- the search result can be provided in various ways.
- the display device provides the search result in audio or in display.
- the display device outputs the search result to a user in audio or in captions on the display screen.
- a speech search command may not include a full query term. That is, if a query term is incomplete or ambiguous, a query term for searching may not be extractable. Further illustration is as follows. In the following, the same technical descriptions as in FIG. 3 and FIG. 4 are not repeated.
- FIG. 5 is a speech search method according to another embodiment of the present invention.
- the display device outputs baseball contents as contents.
- the baseball contents may be received as live broadcast contents or be pre-stored in the storage unit of a display device 5010 similar to storage unit 2010 .
- Contents that the display device outputs may include audio data and video data.
- the display device 5010 outputs video data through a display screen and audio data through a speaker.
- a user watching the baseball contents by the display device 5010 can search for information about the contents by voice. For example, for images being displayed on a screen, one can search for information about a player or an unfamiliar word spoken by the commentator. As shown in FIG. 3 , if the commentator says " . . . a mid-fielder ends the inning with a fly-out" in the broadcast, the user might want to search for "fly-out." In that case, the user in the present invention can search for "fly-out" by a voice command. Especially, by using natural language processing, a normal everyday questioning statement like "What is a fly-out?" can start a speech search function.
- while the user in FIG. 3 and FIG. 4 could say an accurate speech command having an accurate query term like "What is a fly-out?," as shown in FIG. 5 , the user might want the search engine to process inaccurate words like "fl . . . what?" or "fly . . . what?"
- FIG. 6 shows a speech search method according to another embodiment of the present invention that may be performed by any device described herein.
- an unfamiliar term can be outputted through the display device while a user is watching contents (S 6010 ).
- the contents include the audio " . . . a mid-fielder ends the inning with a fly-out" in the broadcast, and the user is not familiar with "fly-out."
- the user can speak a speech search command (S 6020 ).
- a speech search command may include an unclear speech command.
- the user may say, "fl . . . what?" Here, "fl . . . " is an unclear term and may not be recognized as a query word.
- the display device determines whether recent audio frames of the contents have a term similar to the term that the user wants to search for (S 6030 ). For example, by using voice recognition and natural language processing, the display device determines from the user's speech search command of "fl . . . what?" that the "what?" is part of the user's speech search command and the "fl . . . " is the object of it. However, the display device does not directly search for "fl . . . " according to FIG. 3 and FIG. 4 . The display device determines that "fl . . . " is not a full term and searches for a query term which the user might want to search from the recent audio data.
- when the display device converts the recent audio data to text by voice recognition, "fly-out" can be found.
- the display device may determine that the query word "fl . . . " is the query word for searching the query term "fly-out."
- the display device may provide a search result which is based on the context of the content (S 6040 ), or may provide a general search result which is not based on the context of the content (S 6050 ).
- if the display device determines a query term matching the query word from the audio data of the contents, the display device provides the result of searching with the determined query term (S 6040 ). If the display device determines that there is no query term matching the query word, the display device directly searches for the received query word and provides the result (S 6050 ).
- The operation of a display device according to another embodiment in FIG. 5 and FIG. 6 will be further described below.
- FIG. 7 shows a logical block diagram of the display device according to an embodiment of the present invention.
- the display device includes a media data processing module 7010 , a media data output unit 7020 , a speech search module 7030 , and an audio input unit 7040 .
- the media data processing module 7010 and the speech search module 7030 may be included in the control unit, or be an application operated in the control unit 2100 of FIG. 2 .
- the media data processing module 7010 may process media data which includes at least one of text data, audio data and video data.
- the media data processing module 7010 can decode media data and output the decoded media data to the output unit.
- the media data processing module 7010 may be equipped with a buffer 7050 and store a certain amount of the processing media data in the buffer 7050 .
- the buffer 7050 may be included in the storage unit 2010 shown in FIG. 2 .
- the media data processing module 7010 can process media data by streaming or process the pre-stored media data.
- the media data output unit 7020 can output the media data processed in the media data processing module 7010 .
- the media data output unit 7020 may include an audio output unit 7060 and a video output unit 7070 and they output the processed media data in audio and video respectively.
- the video output unit 7070 outputs images of the processed media data and they include visual data such as video clips, still shots, and texts.
- the audio output unit 7060 may be included in an audio input/output unit 2040 and the video output unit 7070 may be included in the display unit 2060 .
- the audio output unit 7060 and the video output unit 7070 may output the processed media data in audio and video.
- the audio input unit 7040 , such as a microphone, receives audio from outside of the display device and transmits it to the speech search module 7030 .
- the speech search module 7030 performs the speech searching method according to an embodiment of the present invention.
- the speech search module 7030 receives a user's speech search command through the audio input unit 7040 .
- the speech search module 7030 receives the media data from the buffer 7050 included in the media data processing module 7010 .
- the speech search module 7030 includes a voice recognition module 7080 which recognizes the user's voice, analyzes its meaning, and extracts a query word or a query term.
- the speech search module 7030 recognizes and analyzes the user's speech search command by using the voice recognition module 7080 .
- the voice recognition module 7080 can perform natural language processing, process audio data and convert it to text data.
- the voice recognition module 7080 determines whether a query word included in the user's speech search command is a searchable and full query term. If it determines that the query word is a searchable and full query term, the speech search module provides the result of searching with the query term by using the search engine 7090 .
- the search result can be transmitted to the media data processing module 7010 or straight to the media data output unit 7020 and then be outputted to the user.
- the search engine 7090 can perform the search by using a data base equipped in the display device or transmit the query term to the external search engine 1020 as shown in FIG. 1 and receive the result.
- if the speech search module 7030 determines that at least one query word included in the speech search command is not a full (or complete) query term,
- the audio data from the buffer 7050 included in the media data processing module 7010 will be received and processed by the voice recognition module 7080 .
- the speech search module 7030 receives the audio data buffered for a predetermined period of time before the time of the user's speech search command being received and can convert it to text data.
- the processed result is analyzed together with the query word, and a full query term that the user intended will be extracted.
- the search engine 7090 may perform searching by using the extracted query term and output the result.
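- A minimal sketch of this buffer hand-off is shown below, assuming a timestamped audio buffer and a transcribe() placeholder standing in for the voice recognition module 7080; the window length and data shapes are illustrative assumptions, not values given by the disclosure.

```python
# Sketch of the hand-off described above: the media data processing module
# keeps recently outputted audio in a buffer, and the speech search module
# pulls a predetermined window of it when the command lacks a full query term.

import time
from collections import deque

class MediaBuffer:
    """Keeps (timestamp, audio_chunk) pairs for recently outputted media."""
    def __init__(self, max_chunks=600):
        self.chunks = deque(maxlen=max_chunks)

    def push(self, audio_chunk):
        self.chunks.append((time.time(), audio_chunk))

    def window(self, seconds):
        cutoff = time.time() - seconds
        return [c for (t, c) in self.chunks if t >= cutoff]

def transcribe(chunks):
    # Placeholder for the voice recognition module converting audio to text.
    return " ".join(chunks)

buf = MediaBuffer()
buf.push("a mid-fielder ends the inning")
buf.push("with a fly-out")

# On an incomplete command, read the recent window and convert it to text;
# the resulting transcript is then matched against the user's query word.
recent_text = transcribe(buf.window(seconds=60))
print(recent_text)  # a mid-fielder ends the inning with a fly-out
```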
- the speech search module 7030 can generate context information.
- the context information indicates information about media data that is being currently processed and outputted.
- the context information includes information about the contents that can be extracted from the metadata of the contents that are being currently outputted.
- the context information includes content related information which is extracted from the predetermined interval of the media data.
- the speech search module 7030 , as mentioned, can extract the audio data of the media data and convert it to text.
- the converted text data is also included in the context information.
- the result of such processed audio data and the text information may be called the audio-related information of the media data, and the audio-related information may also be included in the context information.
- the speech search module 7030 may further include an image processing module.
- the image processing module can process the outputted images of the media data. For example, the image processing module analyzes images outputted from the video output unit 7070 and extracts information about the images. The result of the analyzed images may be called the image-related information of the media data, and the image-related information may also be included in the context information.
- FIG. 8 shows a flowchart for a speech search method according to another embodiment of the present invention that may be performed by any device described herein.
- in FIG. 8 , the same descriptions as shown in FIG. 4 are omitted.
- the display device outputs media data (S 8010 ).
- the media data include video data and audio data or text data depending on the contents.
- the display device receives speech search commands (S 8020 ).
- the speech search command that the display device received may include at least one query word.
- the speech search command can be a predetermined command or a natural command as a normal everyday conversation statement by using natural language processing. According to an embodiment in FIG. 5 , "fl . . . what?" is a speech search command and "fl . . . " is a query word. Also, a speech search command like "what just now?" may not include any query word. For such a case, the next steps will be further illustrated with FIG. 9 .
- the display device determines whether a speech search command includes a query term which is searchable and full (S 8030 ).
- the display device determines whether the speech search command includes a searchable full query term by using at least one query word included in the speech search command.
- the display device determines whether the query word in the speech search command is the complete search word that the user wants to search with. For example, according to the embodiment in FIG. 5 , the user may say the speech search command as “fl . . . what?” or “fly . . . what?” In that case, the display device can determine that the search term with which the user wants to search is “fl” or “fly” or “fly out” by using the user's accent, pronunciation, or context information of the media data.
- the display device determines whether the query word is the searchable full query term based on the user's pronunciation, accent and mumbling. In general, users pronounce unfamiliar words differently than familiar words. Especially, for the unfamiliar words, the user's accent is unclear or they mumble the end of the words. The display device notices the pronunciation patterns for those words and determines whether the display device should search with the query word or should find a searchable full query term.
- the display device determines based on the context information whether the query word is a searchable full query term. The display device can also use both the user's pronunciation patterns and the context information.
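- The disclosure bases this test on pronunciation patterns (unclear accents, mumbled word endings). The sketch below substitutes a common engineering proxy, the per-word confidence score that most speech recognizers report, together with a trailing-ellipsis check; the threshold is an invented value, not one given by the disclosure.

```python
# Heuristic stand-in for the "searchable full query term" test: treat a word
# as incomplete when the recognizer trailed off or reported low confidence.
# The 0.80 threshold is an assumption for illustration.

def is_full_query_term(word, asr_confidence, threshold=0.80):
    """Return True if the query word looks complete enough to search directly."""
    if word.endswith("..."):          # explicitly trailing off ("fl...")
        return False
    return asr_confidence >= threshold

print(is_full_query_term("fly-out", 0.93))  # True  -> search directly
print(is_full_query_term("fl...", 0.41))    # False -> consult context info
```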
- the context information is information extracted from media data and includes information about the contents that are being currently outputted to the user.
- the media data includes at least one of text data, audio data, video data, and metadata.
- the metadata is data about the media data and includes information of title, genre, story, scene, schedule, character, and time.
- the context information is information related to the media data, and especially the contents that the user is watching.
- when the media data displays baseball contents, the metadata can indicate that the contents are related to a sport and that the sport is baseball.
- the display device discovers that the contents are baseball-related by analyzing the audio, images, and texts extracted from the media data. In that case, the display device would rather find "fly-out" as the query term that the user wants to search with than just "fly."
- the display device determines the above by using the context information and comparing the query word with a baseball-related database.
- Context information includes at least one of the metadata of media data, the audio-related information of media data, and the image-related information of media data.
- the metadata of media data includes at least one of the name, genre, character, scene, and schedule information of the contents.
- the display device processes the recent audio data of the media data and extracts a query term (S 8040 ).
- the display device reads and voice-recognizes the audio data stored in the buffer for a predetermined period of time before the time of the user's speech search command being received. The display device then extracts a query term matching the query word by comparing the user's query word with the texts, the texts being the voice-recognized result of the audio data.
- for example, a minute of the audio data before the time of the user saying "fl . . . what?" can be read from the buffer, voice-recognized, and converted to text data.
- such generated text data can also be called the context information.
- the text data will include "a mid-fielder ends the inning with a fly-out" near the time of the user giving the speech search command.
- the display device understands that the query word the user intended to search with is not "fl . . . " but "fly-out," and the full query term, "fly-out," will be extracted. In other words, the display device determines that the query term matching the query word "fl . . . " is "fly" or "fly-out," and that "fly-out" is the query term that the user intended to search with, by using the context information.
- the text data as context information includes "a mid-fielder . . . with a fly-out," and the display device determines that "fly-out" is the object of the search by analyzing the arrangement of words (e.g., the nouns and prefixes) of the speech search command.
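- A minimal sketch of this matching step is given below: the buffered transcript is scanned for words extending the partial query word, and the longest match is preferred. Preferring the longest match is an illustrative stand-in for the disclosure's analysis of word arrangement (nouns and prefixes).

```python
# Match a partial query word ("fl...") against the voice-recognized text of
# the buffered audio. Candidates are transcript words extending the fragment.

def candidate_terms(query_word, transcript_text):
    """Return transcript words that extend the partial query word."""
    prefix = query_word.rstrip(".?").lower()
    words = [w.strip(".,?") for w in transcript_text.lower().split()]
    return [w for w in words if prefix and w.startswith(prefix)]

transcript = "watch the ball fly high a mid-fielder ends the inning with a fly-out"
matches = candidate_terms("fl...", transcript)
print(matches)                 # ['fly', 'fly-out']
print(max(matches, key=len))   # 'fly-out' is taken as the intended query term
```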
- the display device performs searching by using the extracted query term (S 8050 ).
- the display device searches for information about the query term with the internal search engine, or transmits the query term through a network to the external search engine having the search function and receives the result of the search.
- the search or the search result of the query term includes the definition of the term and diverse data related to contents which the user is watching.
- the display device provides a user with the search result (S 8060 ).
- the search result can be provided in various ways.
- the display device provides the search result as audio or as display output.
- the display device outputs the result to the user in voice or in caption on a display screen.
- step S 8030 may include step S 8040 . That is, in the step of determining the query term, recent audio data can be processed and audio-related information can be generated. The audio-related information may be included in the context information as mentioned. The display device may determine the query term by comparing and analyzing the context information with the query word.
- Context information includes processed media data information in addition to media data.
- the display device can process a part of the media data being outputted; the processing of the audio data is as described above.
- the display device image-processes a predetermined time amount of video data and extracts image related information about the media data for that amount.
- the display device may determine that the content that is currently being displayed is a baseball image. Especially when a user searches for a player's name or information, "what is the number four hitter's name?" as an example can be a speech search command. In that case, the display device obtains the image information about the number four hitter by performing image processing on the video data and obtains additional information about the player by image searching technology.
- the display device may include an image processing module in addition to the units shown in FIG. 7 .
- the image processing module processes and analyzes video data stored in the buffer.
- the search engine of the display device receives image information from the image processing module and performs image searching for the image information.
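- The image-search path might be wired as sketched below; the module interfaces, the frame-window size, and the stand-in search engine are all invented for illustration.

```python
# Hypothetical wiring for the image-search path: grab recent video frames from
# the buffer, let an image processing module isolate the region the command
# refers to, and hand that region to an image search engine.

class ImageProcessingModule:
    def find_subject(self, frames, hint):
        # Stand-in for detecting, e.g., the number four hitter in the frames.
        return {"crop": f"{hint} region from {len(frames)} buffered frames"}

def image_search(video_buffer, hint, search_engine):
    frames = video_buffer[-30:]  # a predetermined window of recent frames
    subject = ImageProcessingModule().find_subject(frames, hint)
    return search_engine(subject["crop"])

result = image_search(["frame"] * 120, "number four hitter",
                      lambda query: f"image-search results for: {query}")
print(result)
```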
- FIG. 9 shows an example of a speech search method according to an embodiment of the present invention that may be performed by any device described herein.
- in some cases, a query term which matches a query word cannot be determined, or the query word itself is unclear.
- in some cases, a user's speech search command does not include any query word (e.g., "what?").
- the display device can provide query term candidates to the user even in those cases, as shown in FIG. 9 .
- the query term candidates can be any terms from a predetermined period of time before the time of the user giving the speech search command. For example, the audio data of the thirty seconds before the time of the user giving the speech search command is voice-recognized, searchable terms are extracted, and they are displayed in chronological order by the display device. In that case, the images of the times when the terms were being outputted may be read from the buffer and displayed in the form of thumbnail images. Not only the processed audio data but also the processed video data can be stored in the buffer, as above-mentioned.
- the user can select a query term from the query term candidates and start the search with it. The query term selection can be performed by a remote control, a voice input, a gesture input, etc.
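- Candidate generation over the thirty-second window might look like the sketch below, where each recognized term is paired with the frame shown at that moment so that a thumbnail can accompany it; the data shapes are illustrative assumptions.

```python
# Build the candidate list shown to the user: searchable terms heard in the
# last thirty seconds, in chronological order, each paired with a frame id
# from the video buffer for its thumbnail.

RECENT = [  # (seconds_ago, recognized_term, thumbnail_frame_id)
    (28, "inning", 101),
    (12, "mid-fielder", 214),
    (3, "fly-out", 307),
]

def query_term_candidates(recent, window=30):
    picks = [(t, w, f) for (t, w, f) in recent if t <= window]
    return sorted(picks, key=lambda item: item[0], reverse=True)  # oldest first

for seconds_ago, term, frame in query_term_candidates(RECENT):
    print(f"{seconds_ago:>2}s ago  {term:<12} [thumbnail frame {frame}]")
```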
- FIG. 9 shows an embodiment of displaying query term candidates but the display device can output them in audio.
- Providing query term candidates in FIG. 9 can be performed together with the steps of S 8030 to S 8050 in FIG. 8 .
- the S 8030 step determines whether the query word exists and whether the query word is a searchable full query term at the same time. If a query word is not obtained or does not exist, the display device in the step of S 8040 extracts at least one query term candidate, provides the user with it, and receives the user's choice for it.
- the display device can offer the user the query term candidates and receive the user's choice. Also, in between the steps of S 8040 and S 8050 shown in FIG. 8 and FIG. 9 , the display device can provide the user with a confirmation request for the determined query term. When the confirmation is received through the user's remote control, voice, or gesture input, the display device performs the search with the determined query term and provides the search result. When the user does not confirm, or confirms that the query term is not the one that the user intended, the display device can provide additional query term candidates to the user.
- the display device recognizes "fl" as a query word and "fly" as a query term. In that case, the display device can output a confirmation request such as "Would you like to search with 'fly'?" The confirmation request can be outputted with a pop-up window displaying "Yes" and "No."
- if the user confirms, the display device can perform the search with "fly" and provide the search result.
- the display device reviews the context information and provides at least one query term candidate. And when the user selects one query term from the at least one query term candidate, the display device performs the search with the selected query term and provides the search result.
- a plurality of query term candidates for the query word can be extracted from the context information and provided. For example, "fly" and "fly-out" can be displayed and provided to the user. The search will then be performed with the query term selected by the user.
- the display device can offer the selected query term and send a confirmation request for the selected query term in addition to offering the query term candidates to the user. By doing so, the display device can avoid providing a search result of an unwanted search term.
- the embodiments are not limited to simple words or phrases. That is, in some embodiments, entire query sentences may be deduced based on context and/or search history. In some embodiments, any search history may be based on historic searches performed by the local device and/or for a specific user. In other embodiments, search histories used to develop the query term/phrase may be based on search histories or search trends from Internet-based social media.
- a query word may be “Titanic” and the proposed query term that is based on a social media search history or search trend may be “Who starred in Titanic?” Also, the proposed query term may be a list of query terms such as: “Who starred in Titanic?” “Who directed Titanic?” “When did Titanic sink?”
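- Expanding a bare query word into proposed query sentences might be realized as sketched below, where a canned trend table stands in for an Internet-based search-history or trend service; the table contents follow the example above, and the fallback format is an invented placeholder.

```python
# Deduce full query sentences from a bare query word using search trends.
# A real system would query a trend/history service; a dictionary stands in.

TRENDING = {
    "titanic": ["Who starred in Titanic?",
                "Who directed Titanic?",
                "When did Titanic sink?"],
}

def propose_query_sentences(query_word, fallback="{} meaning"):
    """Return trend-based query sentences, or a generic fallback query."""
    return TRENDING.get(query_word.lower(), [fallback.format(query_word)])

print(propose_query_sentences("Titanic"))   # three trend-based sentences
print(propose_query_sentences("fly-out"))   # ['fly-out meaning']
```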
- with the speech search method of the present invention, information about audio and video that have already passed in the contents that a user is watching can be conveniently searched. Especially, when the user does not recognize a search object accurately, an optimized search result for the media data that the user is watching can be provided by using the user's pronunciation patterns and context information.
- accordingly, the present invention provides an optimized search result. Also, when it is difficult to determine the search object that the user intends even by using the non-full or incomplete query word or the context information, the user can select a query term that he or she wants to search for from the query term candidates of a predetermined period of time suggested by the display device.
Abstract
A speech search method performed by a display device, the method including outputting media data including audio data, receiving a speech search command for additional data about the outputted media data from a user, the speech search command including at least one query word, determining whether the at least one query word matches a query term that is full and searchable, when the at least one query word matches the query term that is full and searchable, performing a search for the additional data using the query term, and when the at least one query word does not match the query term that is full and searchable, determining the query term from a predetermined amount of the audio data prior to receiving the speech search command and performing the search for the additional data using the query term.
Description
- This application is a Continuation of co-pending application Ser. No. 13/761,102 filed on Feb. 6, 2013, which claims the benefit of Korean Application No. 10-2012-0095034, filed on Aug. 29, 2012. The entire contents of all of the above applications are hereby incorporated by reference.
- 1. Field of the Invention
- The present invention relates to a display device and more particularly, to a speech search method for a display device.
- 2. Discussion of Background Art
- As network technology has improved, users can easily search a variety of information. Especially, the users can search for digital contents at the same time they view the digital contents using a display device. They can search for not only information about the contents themselves but also detailed information about a part of the contents that they are viewing or the object of the contents.
- Searching for information about contents can be performed in various ways. Previously, the users inputted their search words by using additional input devices such as a keyboard. However, due to the improvements of the recent voice recognition technology, the users can input various voice commands to a device in order to control the device. Therefore, the users can search for information about contents using their voice commands at the same time they are viewing the contents.
- Accordingly, the present invention is directed to a display device and a speech search method that substantially obviates one or more problems due to limitations and disadvantages of the related art.
- An object of the present invention is to provide a method of searching desired information about contents using a speech search in a more efficient and accurate manner.
- Another object of the present invention is to provide a speech search method that generates search results that a user intended to search using speech search commands and context information associated with media data being viewed by the user even when the user does not exactly know what he or she is trying to search.
- Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
- To achieve these objects and other advantages and in accordance with the purpose of the invention, as embodied and broadly described herein, a speech search method for a display device includes the steps of outputting media data, receiving a speech search command from a user, and determining whether the speech search command includes a user query term which is full and searchable. If the speech search command does not include a user query term which is full and searchable, the method further comprises the step of extracting a media query term which is full and searchable from audio data of the media data which is outputted immediately prior to the speech search command. Finally, the method includes the step of performing a speech search using the extracted media query term.
- In another aspect of the present invention, a display device includes a media data processing module processing media data, a media data output unit outputting the processed media data, and an audio input unit receiving a speech search command from a user. The display device further includes a speech search module determining a query term from the speech search command and performing a speech search using the determined query term. The display device determines whether the speech search command includes a user query term which is full and searchable, extracts a media query term from audio data of the media data which is outputted immediately prior to the speech search command if the speech search command does not include a user query term, and performs a speech search using the extracted media query term.
- It is to be understood that both the foregoing general description and the following detailed description of the present invention are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
- The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principle of the invention. In the drawings:
- FIG. 1 illustrates a conceptual diagram of a network according to an embodiment of the present invention;
- FIG. 2 illustrates a block diagram of a display device according to an embodiment of the present invention;
- FIG. 3 illustrates a speech search method according to an embodiment of the present invention;
- FIG. 4 illustrates a flowchart of a speech search method according to an embodiment of the present invention;
- FIG. 5 illustrates a speech search method according to another embodiment of the present invention;
- FIG. 6 illustrates a flowchart of a speech search method according to another embodiment of the present invention;
- FIG. 7 illustrates a logical block diagram of a display device according to an embodiment of the present invention;
- FIG. 8 illustrates a flowchart of a speech search method according to another embodiment of the present invention; and
- FIG. 9 illustrates a speech search method according to another embodiment of the present invention.
- Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
- It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the inventions. Thus, it is intended that the present invention covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.
- The present invention relates to a display device performing a speech search and providing a user with a result of the speech search. A speech search is a technology that recognizes a user's voice command and performs a search based on the recognized command. A speech search utilizes a voice or speech recognition technology. The voice recognition technology in the present invention includes natural language processing. Natural language processing analyzes the form, meaning, and conversational context of normal everyday language and converts the result of the analysis so that a device can process it. In other words, the device recognizes not a predetermined keyword but spontaneous conversation, and performs the processing according to a user's intention.
- The display device according to the present invention can be any one of a variety of devices that can process and output digital media data or digital contents. The digital contents include at least one of text, audio, and video data. For example, the display device can be a TV, a set-top box, an internet processing device, a recorded media player, a media recording device, a wireless communication device, a cell phone, a Personal Digital Assistant (“PDA”), a computer, a laptop, and a tablet PC. In other words, the display device can be any one of a variety of devices providing a user with processed digital contents, and the display device may be referred to as “device” hereinafter.
- FIG. 1 shows a conceptual diagram of a network according to an embodiment of the present invention. As shown in FIG. 1, display devices 1040 are connected to a network 1030. The network 1030 transmits and receives data using various communication protocols, such as cable, wireless communication, optical communication, or an IP network. The display devices 1040 receive contents from a contents server 1010 through the network 1030. The contents server 1010 is a contents provider that provides digital contents, and a display device 1040 can itself be used as a contents server, depending on the network architecture.
- The display device 1040 provides a user with the contents received from the contents server 1010: it processes the received contents data and displays the processed data. The display device 1040 also receives a search command from a user, transmits a search term to a search engine 1020, and, after receiving the result from the search engine 1020, provides the result of the search back to the user.
- In the following, at least one searchable word which is the target of a search is called a "query term." A query term is the object to be searched by the search engine and includes at least one word. The display device 1040 may perform the search against a database included in the display device 1040 by using the query term, or may transmit the query term to the search engine 1020 and receive the result of the search. A word included in a query term is called a "query word." When the query term includes a plurality of words, each word can be called a query word; if the query term has only one word, the query word is the query term. In the following, however, a query word is a word that the user indicates as the user speaks a speech search command. In other words, the user can speak a partial or unclear word, and such a word can be recognized by the display device as a query word.
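- By way of illustration only, the relationship between query words and a query term can be modeled as a small data structure. The following sketch is not part of the disclosed embodiments; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class QueryWord:
    # One word as actually uttered by the user; may be partial, e.g. "fl..."
    text: str
    is_partial: bool = False

@dataclass
class QueryTerm:
    # The unit sent to the search engine; one or more query words
    words: List[QueryWord]

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)

# A single-word query term is its own query word:
term = QueryTerm(words=[QueryWord(text="fly-out")])
assert term.text == "fly-out"
```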
FIG. 2 shows a block diagram of a display device according to an embodiment of the present invention. -
FIG. 2 shows an example of the display device 1040 of FIG. 1 having a storage unit 2010, a communication unit 2020, a sensor unit 2030, an audio input/output unit 2040, a camera unit 2050, a display unit 2060, a power unit 2070, a processor 2080, and a controller 2090. The display device in FIG. 2 is shown as an example only, and not all of the units shown in FIG. 2 are required. The structural blocks needed by a display device according to an embodiment of the present invention are described as follows.
- The storage unit 2010 stores various digital data such as video, audio, pictures, movie clips, and applications. The storage unit 2010 represents a storage space for such digital data, for example flash memory, a Hard Disk Drive ("HDD"), or a Solid State Drive ("SSD"). In the following, a buffer necessary for processing data can be included in the storage unit 2010. The storage unit 2010 can also store a database used for searching information.
- The communication unit 2020 transmits and receives data and performs communications by using various protocols associated with the display device. The communication unit 2020 is connected to external networks by wire or wirelessly, and transmits and receives digital data. In the present invention, the display device uses the communication unit 2020 to receive media data, or to transmit a search query and receive the search result for that query.
- The sensor unit 2030 may recognize a user's input or the environment of the device by using a plurality of sensors, and transmit the result to the controller 2090. The sensor unit 2030 can have a plurality of sensing means. As an embodiment, the plurality of sensing means can include gravity, terrestrial magnetism, motion, gyro, acceleration, inclination, brightness, olfactory, temperature, depth, pressure, bending, audio, video, Global Positioning System ("GPS"), and touch sensors. The term "sensor unit" 2030 refers to all of these various sensing means collectively. The sensor unit 2030 senses the user's various inputs and conditions and transmits the sensing result so that the device can perform the necessary functions based on it. The sensors may be included in the device as separate elements or combined into at least one element. The sensor unit 2030 may be optionally equipped depending on the embodiment.
- The audio input/output unit 2040 includes an audio output means such as a speaker and an audio input means such as a microphone. The audio input/output unit 2040 may perform audio output from the device or audio input to the device, and can be used as an audio sensor. According to an embodiment of the present invention, when the display device does not include a speaker or a microphone (for example, when the display device is a set-top box), the audio input/output unit 2040 processes audio data and transmits it to an external device, or receives audio data from an external device and processes it. An audio input unit and an audio output unit may also be equipped separately; such an embodiment is illustrated below.
- The camera unit 2050 records movie clips and takes pictures, and may be optionally equipped depending on the embodiment. The camera unit 2050 can be used as a motion sensor or a visual sensor, as described above.
- The display unit 2060 outputs images on a display screen. If the display is a touch-sensitive display, the display unit 2060 can be used as a touch sensor; if the display or the device is flexible, it may be used as a bending sensor. According to an embodiment, if the display device does not include a display panel or screen, such as a set-top box or a computer, the display unit processes display data and transmits it to an external device such as a monitor. The display unit 2060 may be referred to as a video output unit hereinafter.
- The power unit 2070 provides power to the device as a power source connected to an internal battery or to external power.
- The processor 2080 executes various applications stored in the storage unit 2010 and processes internal data in the device.
- The controller 2090 controls the units of the device, and manages the transmission and reception of data between the units and the functions of each unit.
- The processor 2080 and the controller 2090 may be combined in a single chip that implements the functions described above; in that case, they may be called a control unit 2100. The speech search method of the present invention can be performed by the control unit 2100 and, according to an embodiment, by modules controlled by the control unit 2100. Further illustration follows.
-
FIG. 2 is a block diagram of a display device according to an embodiment of the present invention, in which the separately illustrated blocks are the logical elements of the device. Thus, the elements of the device can be combined in one chip or in a plurality of chips, as designed.
- In the following, the speech search method can be performed by the control unit 2100 of FIG. 2 and, according to an embodiment, can be executed by an application stored in the storage unit 2010 and operated by the control unit 2100. How the control unit 2100 executes such a speech search is described further below. In addition, in the embodiments below, a TV is used as an example of the display device for convenience of explanation. However, as mentioned, it is obvious to a person of ordinary skill in the art that the display device is not limited to a TV.
-
FIG. 3 shows a speech search method according to an embodiment of the present invention. - As an embodiment, a
display device 3010, similar to the display device 1040, outputs baseball contents as media data. The baseball contents may be received as live broadcast contents or may be pre-stored in the storage unit of the display device 3010. The media data, i.e., the contents that the display device outputs, may include audio data and video data. The display device 3010 outputs the video data through a display screen and the audio data through a speaker.
- A user watching the baseball contents on the display device 3010 can search for information about the contents by voice. For example, the user can search for information about a player shown in the images being displayed on the screen, or about an unfamiliar word among the words spoken by the commentator. As shown in FIG. 3, if the commentator says " . . . a mid-fielder ends the inning with a fly-out" in the broadcast, the user might want to search for "fly-out." In that case, the user in the present invention can search for "fly-out" by a voice command. In particular, by using natural language processing, an ordinary everyday question such as "What is a fly-out?" can start the speech search function.
-
FIG. 4 shows a flowchart of a speech search method according to an embodiment of the present invention that may be performed by any device described herein. - The method begins when a display device outputs media data (S4010). As mentioned relative to
FIG. 3, the media contents may include video data and audio data; the media contents may further include text data, depending on the contents.
- The display device then receives a speech search command (S4020). The speech search command can be a predetermined command or, by using natural language processing, an ordinary everyday language statement. In the embodiment of FIG. 3, "What is a fly-out?" is a speech search command. Using voice-recognition technology, the display device receives the speech search command from the user's voice captured via a microphone.
- The display device extracts a query term from the speech search command (S4030). When the display device recognizes a speech search command from a user's voice, the query term which is the object of the search can be extracted from the speech search command. In the embodiment of FIG. 3, "fly-out" is the query term in the speech search command "what is a fly-out?" That is, the display device recognizes the user's utterance "what is a fly-out?" as a speech search command, and extracts the query term "fly-out" from the recognized speech search command in order to process the search.
- In the following, the words included in a speech search command are called "query words." According to the embodiments of FIG. 3 and FIG. 4, "fly" and "out" are query words. Although the display device can search for each query word, a user usually wants to search for the combined query words, that is, the query term, not for each query word. The display device can extract the query term by using voice recognition and natural language processing: if there is only one query word, that query word is extracted as the query term, and if there are a plurality of query words, the combined query words, that is, a query term, is extracted. The extraction can be performed based on the context information of the media data. In the embodiments of FIG. 3 and FIG. 4, since the user is watching baseball contents, the display device can determine whether the user wants to search for "fly," "out," or "fly-out" as a baseball term. The use of context information allows the search to exclude irrelevant senses (e.g., a search for information about the insect "fly" or the act of flying in a plane).
- The display device performs the speech search by using the extracted query term (S4040). The display device can search for the query term with an internal search engine, or transmit the query term through a network to an external search engine that has the search function and receive the result of the search. The search result for a query term includes the definition of the term and diverse data related to the contents that the user is watching.
- The display device provides the user with the search result (S4050). The search result can be provided in various ways; for example, the display device provides the search result as audio or on the display. In other words, the display device outputs the search result to the user as audio or as captions on the display screen.
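- For illustration, the S4010 to S4050 flow can be sketched as pseudocode. Every method on the hypothetical device object below is an assumption introduced for readability, not an API disclosed by the embodiments.

```python
def speech_search(device, utterance_audio):
    """Sketch of the S4010-S4050 flow; every method on `device` is assumed."""
    # S4020: recognize the speech search command from the microphone input
    command_text = device.recognize_speech(utterance_audio)  # e.g. "What is a fly-out?"

    # S4030: extract the query term, using natural language processing and
    # the context information of the media data (here, baseball contents)
    query_term = device.extract_query_term(command_text, device.context_info())

    # S4040: search with an internal engine, or fall back to an external one
    result = device.local_search(query_term) or device.remote_search(query_term)

    # S4050: present the result as audio or as captions on the display screen
    device.present(result, mode="caption")
    return result
```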
- However, in contrast to the embodiments of FIG. 3 and FIG. 4, a speech search command may not always include a full query term. That is, if a query term is incomplete or ambiguous, a query term for searching may not be extractable. This is illustrated further below; descriptions that are the same as for FIG. 3 and FIG. 4 are not repeated.
-
FIG. 5 shows a speech search method according to another embodiment of the present invention. -
FIG. 3 , the display device outputs baseball contents as contents. The baseball contents may be received as live broadcast contents or be pre-stored in the storage unit of adisplay device 5010 similar tostorage unit 2010. Contents that the display device outputs may include audio data and video data. Thedisplay device 5010 outputs video data through a display screen and audio data through a speaker. - A user watching the baseball contents by the
display device 5010 can search information about the contents by voice. For example, for images being displayed on a screen, one can search information about a player or an unfamiliar word spoken by the commentator. As shown inFIG. 3 , if the user says, “ . . . a mid-fielder ends the inning with a fly-out” in the broadcast, the user might want to search for “fly-out.” In that case, the user in the present invention can search for fly-out by a voice command. Especially, by using natural language processing, a normal everyday questioning statement like “What is a fly-out?” can start a speech search function. - However, when words in the audio data of the contents are searched, a user often wants to search using an unfamiliar word. In that case, it is often difficult for a user to accurately hear a word and to then command the search correctly. That is, as shown in
FIG. 5 , when the user does not know the word, “fly-out,” it is often very difficult for the user to accurately give the right search command for “fly-out.” Especially, unlike the video search, words included in audio data go away as they are outputted. Unless the words are outputted again, it is difficult to even know what they were. In other words, words included in audio data are temporarily short-lived, the probability of an inaccurate recognition of a search object is high compared to video data. - Thus, although the user shown in
FIG. 3 andFIG. 4 could say an accurate speech command having an accurate query term like “what is a fly-out?,” as shown inFIG. 5 , the user might want the search engine to process for inaccurate words like “fl . . . what?” or “fly . . . what?” -
FIG. 6 shows a speech search method according to another embodiment of the present invention that may be performed by any device described herein. - As seen in
FIG. 5, an unfamiliar term can be outputted through the display device while a user is watching contents (S6010). In the example shown in FIG. 5, the user is watching baseball contents, the contents include the audio " . . . a mid-fielder ends the inning with a fly-out" in the broadcast, and the user is not familiar with "fly-out."
- The user can utter a speech search command (S6020). In that case, the speech search command may include an unclear portion. In the example of FIG. 5, the user may say "fl . . . what?"; "fl . . . " is an unclear term and may not be recognized as a query word.
- The display device determines whether recent audio frames of the contents contain a term similar to the term that the user wants to search for (S6030). For example, using voice recognition and natural language processing, the display device determines from the user's speech search command "fl . . . what?" that "what?" is the command part and "fl . . . " is its object. However, the display device does not directly search for "fl . . . " as in FIG. 3 and FIG. 4. Instead, the display device determines that "fl . . . " is not a full term and searches the recent audio data for a query term that the user might want to search for. If the display device converts the recent audio data to text by voice recognition, "fly-out" can be found in it. Thus, the display device may determine that the query word "fl . . . " stands for the query term "fly-out."
- The display device may then provide a search result that is based on the context of the content (S6040), or provide a general search result that is not based on the context of the content (S6050).
- In the above description, "fl . . . what" has been used as an example of a speech search command including an incomplete search term. However, the user may utter speech search commands in various forms, including various search words. Thus, when the display device determines a query term matching a query word from the audio data of the contents, the display device provides the result of searching with the determined query term (S6040). If the display device determines that there is no query term matching the query word, the display device simply searches for the received query word itself and provides the result (S6050).
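- The matching of an incomplete query word such as "fl . . . " against the recent audio data can be sketched as follows. This is an illustrative assumption only: it prefers the most recent prefix match in the buffered transcript and falls back to fuzzy matching; the function name and thresholds are hypothetical.

```python
import difflib
from typing import List, Optional

def match_partial_word(partial: str, recent_words: List[str]) -> Optional[str]:
    # recent_words: words recognized from the buffered audio of the last
    # N seconds, oldest first. Prefer the most recent prefix match; otherwise
    # fall back to fuzzy similarity. Thresholds here are arbitrary.
    fragment = partial.lower().rstrip(". ")
    for word in reversed(recent_words):
        if word.lower().startswith(fragment):
            return word
    hits = difflib.get_close_matches(fragment, recent_words, n=1, cutoff=0.6)
    return hits[0] if hits else None

words = ["a", "mid-fielder", "ends", "the", "inning", "with", "a", "fly-out"]
print(match_partial_word("fl", words))  # -> "fly-out"
```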
- The operation of a display device according to the embodiment of FIG. 5 and FIG. 6 is further described below.
-
FIG. 7 shows a logical block diagram of the display device according to an embodiment of the present invention.
- In FIG. 7, the display device includes a media data processing module 7010, a media data output unit 7020, a speech search module 7030, and an audio input unit 7040. The media data processing module 7010 and the speech search module 7030 may be included in the control unit, or may be an application operated by the control unit 2100 of FIG. 2.
- The media data processing module 7010 may process media data that includes at least one of text data, audio data, and video data. The media data processing module 7010 can decode media data and output the decoded media data to the output unit. According to an embodiment, the media data processing module 7010 may be equipped with a buffer 7050 and store a certain amount of the media data being processed in the buffer 7050 (a minimal buffering sketch is given after this block). The buffer 7050 may be included in the storage unit 2010 shown in FIG. 2. The media data processing module 7010 can process streamed media data or pre-stored media data.
- The media data output unit 7020 outputs the media data processed by the media data processing module 7010. The media data output unit 7020 may include an audio output unit 7060 and a video output unit 7070, which output the processed media data as audio and video, respectively. The video output unit 7070 outputs the images of the processed media data, which include visual data such as video clips, still shots, and text. In the embodiment of FIG. 2, the audio output unit 7060 may be included in the audio input/output unit 2040 and the video output unit 7070 may be included in the display unit 2060. Also, as mentioned, if the display device, like a set-top box, does not include its own output devices, the audio output unit 7060 and the video output unit 7070 may process the media data and transmit it to external devices for audio and video output.
- The audio input unit 7040, for example a microphone, receives audio from outside the display device and transmits it to the speech search module 7030.
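- The buffering of a predetermined amount of recent media data, as in the buffer 7050, could be realized along the following lines. This is a minimal sketch under the assumption of a simple time-windowed deque; the class and method names are hypothetical.

```python
import collections
import time

class RecentMediaBuffer:
    # Keeps roughly the last `window_s` seconds of payloads (e.g. recognized
    # text segments, decoded frames), discarding anything older.
    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self._items = collections.deque()  # (timestamp, payload) pairs

    def push(self, payload, ts=None):
        ts = time.monotonic() if ts is None else ts
        self._items.append((ts, payload))
        # Drop entries that have fallen out of the time window
        while self._items and ts - self._items[0][0] > self.window_s:
            self._items.popleft()

    def snapshot(self):
        # Everything still inside the window, oldest first
        return [payload for _, payload in self._items]
```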
- The speech search module 7030 performs the speech search method according to an embodiment of the present invention. The speech search module 7030 receives the user's speech search command through the audio input unit 7040, and receives the media data from the buffer 7050 included in the media data processing module 7010. The speech search module 7030 includes a voice recognition module 7080, which recognizes the user's voice, analyzes its meaning, and extracts a query word or a query term.
- The speech search module 7030 recognizes and analyzes the user's speech search command by using the voice recognition module 7080. The voice recognition module 7080 can perform natural language processing, and can process audio data and convert it to text data. The voice recognition module 7080 determines whether a query word included in the user's speech search command is a full, searchable query term. If so, the speech search module provides the result of searching with that query term by using the search engine 7090. The search result can be transmitted to the media data processing module 7010, or directly to the media data output unit 7020, and then outputted to the user. The search engine 7090 can perform the search using a database equipped in the display device, or can transmit the query term to the external search engine 1020 shown in FIG. 1 and receive the result.
- When the speech search module 7030 determines that at least one query word included in the speech search command is not a full (or complete) query term, the audio data from the buffer 7050 included in the media data processing module 7010 is received and processed by the voice recognition module 7080. The speech search module 7030 receives the buffered audio data for a predetermined period of time before the user's speech search command was received, and can convert it to text data. The processed result is analyzed together with the query word, and the full query term that the user intended is extracted. The search engine 7090 may then perform the search using the extracted query term and output the result.
- The speech search module 7030 can generate context information. The context information describes the media data that is currently being processed and outputted. First, the context information includes information about the contents that can be extracted from the metadata of the contents currently being outputted. The context information also includes content-related information extracted from a predetermined interval of the media data. As mentioned, the speech search module 7030 can extract the audio data of the media data and convert it to text; the converted text data is also included in the context information. The result of such processed audio data and the text information may be called the audio-related information of the media data, and this audio-related information may also be included in the context information.
- The speech search module 7030 may further include an image processing module. The image processing module can process the outputted images of the media data; for example, it analyzes images outputted from the video output unit 7070 and extracts information about them. The result of the image analysis may be called the image-related information of the media data, and this image-related information may also be included in the context information.
-
FIG. 8 shows a flowchart of a speech search method according to another embodiment of the present invention that may be performed by any device described herein. For FIG. 8, descriptions that are the same as for FIG. 4 are omitted.
- The display device outputs media data (S8010). As described above, the media data include video data and audio data, or text data, depending on the contents.
- The display device receives a speech search command (S8020). The speech search command that the display device receives may include at least one query word. The speech search command can be a predetermined command or, by using natural language processing, a natural, ordinary conversational statement. In the embodiment of FIG. 5, "fl . . . what?" is a speech search command and "fl . . . " is a query word. A speech search command like "what just now?" may also include no query word at all; that case is further illustrated with FIG. 9 below.
- The display device determines whether the speech search command includes a query term that is searchable and full (S8030).
- In other words, the display device determines whether the speech search command includes a searchable full query term by using the at least one query word included in the speech search command; that is, it determines whether the query word in the speech search command is the complete search word that the user wants to search with. For example, in the embodiment of FIG. 5, the user may utter the speech search command "fl . . . what?" or "fly . . . what?" In that case, the display device can determine whether the term the user wants to search with is "fl," "fly," or "fly-out" by using the user's accent and pronunciation, or the context information of the media data.
- The display device can determine whether the query word is a searchable full query term based on the user's pronunciation, accent, and mumbling. In general, users pronounce unfamiliar words differently than familiar words; in particular, for unfamiliar words, the user's accent is unclear or the user mumbles the end of the word. The display device notices these pronunciation patterns and determines whether it should search with the query word as spoken or should instead find a searchable full query term.
- The display device can also determine, based on the context information, whether the query word is a searchable full query term, and it can use the user's pronunciation patterns and the context information together.
- The context information is information extracted from the media data and includes information about the contents currently being outputted to the user. For example, the media data includes at least one of text data, audio data, video data, and metadata. The metadata is data about the media data and includes title, genre, story, scene, schedule, character, and time information. The context information is information related to the media data, and especially to the contents that the user is watching. In the embodiment, if the media data is baseball contents, the metadata can indicate that the contents are related to a sport and that the sport is baseball. The display device can also discover that the contents are baseball-related by analyzing the audio, images, and text extracted from the media data. In that case, the display device would rather select "fly-out" than just "fly" as the query term that the user wants to search with; the display device makes this determination by using the context information and comparing the query word against a baseball-related database.
- The context information includes at least one of the metadata of the media data, the audio-related information of the media data, and the image-related information of the media data. The metadata of the media data includes at least one of the name, genre, character, scene, and schedule information of the contents.
- The display device receives from and voice-recognizes the audio data stored in the buffer of a predetermined period of time from a time of the user's speech search command being received. And the display device extracts a query term matching the query word by comparing the texts with the user's query word, the text being voice-recognized result of the audio data.
- For example, a minute of the audio data from the time of the user saying “fl . . . what?” can be read from the buffer, voice-recognized and converted to text data. Such generated text data can also be called as the context information. The text data will include “a midfielder ends the inning with a fly-out” near the time of the user giving the speech search command. Thus, the display device understands that the query word is not “fl . . . ” but “fly-out” that the user intended to search with and the full query term, “fly-out,” will be extracted. In other words, the display device determines that the query term is “fly” or “fly-out” that matches the query word “fl . . . ” and that the “fly-out” is the query term that the user intended to search with by using the context information. In the example above, the text data as context information includes “a mid-fielder . . . with fly-out” and the query term will be determined that the “fly-out” is the object of the search by analyzing the arrangements of words (e.g. the nouns and prefixes) of the speech search command.
- The display device performs searching by using the extracted query term (S8050). The display device searches information about the query term with the internal search engine or transmits the query term through a network to the external search engine having the search function and receives the result of the search. The search or the search result of the query term includes the definition of the term and diverse data related to contents which the user is watching.
- The display device provides a user with the search result (S8060). The search result can be provided in various ways. For example, the display device provides the search result as audio or as display output. In other words, the display device outputs the result to the user in voice or in caption on a display screen.
- Depending on an embodiment, S8030 step may include S8040. That is, in the step of determining the query term, recent audio data can be processed and audio related information can be generated. The audio related information may be included in the context information as mentioned. The display device may determine the query term by comparing and analyzing the context information with the query word.
- Context information includes processed media data information in addition to media data. The display device can process a part of the media data being outputted and as for the audio data, it is as pre-described. The display device image-processes a predetermined time amount of video data and extracts image related information about the media data for that amount. In the above-mentioned embodiment, by using image processing, the display device may determine that content that is currently being displayed is baseball image. Especially when a user searches for a player's name or information, “what is the number four hitter's name?” as an example can be a speech search command. In that case, the display device obtains the image information about the number four hitter by performing image processing for the video data and obtains additional information about him by the image searching technology. In that case, the display device may include an image processing module in addition to the units shown in
FIG. 7 . The image processing module processes and analyzes video data stored in the buffer. Also, in this case, the search engine of the display device receives image information from the image processing module and performs image searching for the image information. -
FIG. 9 shows an example of a speech search method according to an embodiment of the present invention that may be performed by any device described herein. - In
FIG. 8, it may happen that a query term matching a query word cannot be determined, or that the query word itself is unclear. Further, as shown in FIG. 9, a user's speech search command may not include any query word at all (e.g., "what?"). Even in that case, the display device can provide query term candidates to the user, as shown in FIG. 9. The query term candidates can be any terms from a predetermined period of time before the user gives the speech search command. For example, the audio data of the thirty seconds before the user gives the speech search command is voice-recognized, searchable terms are extracted, and the display device displays them in chronological order. In that case, the images that were shown when each term was outputted may be read from the buffer and displayed as thumbnail images; as mentioned above, not only the audio data but also the processed video data can be stored in the buffer. The user can select a query term from the query term candidates and start the search with it. The query term selection can be performed by a remote control, a voice input, a gesture input, etc.
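- The generation of query term candidates from a predetermined period of buffered media data, in chronological order and paired with thumbnail frames, can be sketched as follows; the tuple layout and stopword filter are illustrative assumptions.

```python
import time

STOPWORDS = {"a", "an", "the", "with", "ends"}  # toy filter for searchable terms

def candidate_terms(transcript, window_s=30.0, now=None):
    # transcript: iterable of (timestamp, word, frame) tuples from the buffer,
    # ordered by time; `frame` is the image shown when the word was spoken.
    now = time.monotonic() if now is None else now
    candidates = []
    for ts, word, frame in transcript:
        if now - ts <= window_s and word.lower() not in STOPWORDS:
            candidates.append({"term": word, "at": ts, "thumbnail": frame})
    return candidates  # chronological, ready for display with thumbnails
```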
-
FIG. 9 shows an embodiment in which the query term candidates are displayed, but the display device can also output them as audio. - Providing the query term candidates as in
FIG. 9 can be performed together with steps S8030 to S8050 of FIG. 8. In that case, step S8030 determines at the same time whether a query word exists and whether the query word is a searchable full query term. If a query word is not obtained or does not exist, the display device, in step S8040, extracts at least one query term candidate, provides it to the user, and receives the user's choice.
- As described with reference to FIG. 8 and FIG. 9, if it is difficult for the display device to determine the query term, the display device can offer the user the query term candidates and receive the user's choice. Also, between steps S8040 and S8050 shown in FIG. 8 and FIG. 9, the display device can provide the user with a confirmation request for the determined query term. When the confirmation is received through the user's remote control, audio, or gesture input, the display device performs the search with the determined query term and provides the search result. When the user does not confirm, or confirms that the query term is not the one intended, the display device can provide additional query term candidates to the user.
- For example, in the embodiments of FIG. 5 and FIG. 9, the display device recognizes "fl" as a query word and "fly" as a query term. In that case, the display device can output a confirmation request such as "Would you like to search with 'fly'?" The confirmation request can be outputted with a pop-up window displaying "Yes" and "No." When the user replies to the confirmation request by inputting "Yes," the display device can perform the search with "fly" and provide the search result. When the user replies by inputting "No," or does not input anything for a predetermined period of time, the display device reviews the context information and provides at least one query term candidate. When the user then selects one query term from the at least one query term candidate, the display device performs the search with the selected query term and provides the search result.
- Further, a plurality of query term candidates for the query word can be extracted from the context information and provided; for example, "fly" and "fly-out" can both be displayed to the user. The display device then performs the search with the query term selected by the user.
- Thus, according to the speech search method of the present invention, information about audio and video that have already passed from the contents that a user is watching can be conveniently searched. Especially, when the user does not recognize a search object accurately, the optimized search result of the media data that the user is watching can be provided by using the user's pronunciation patterns and context information.
- Although the user's speech search command includes an unclear query word or does not include a query word, the present invention provides the optimized search result. Also, when it is difficult to determine the search object that the user intends by even using the non-full or incomplete query word or the context information, the user can select a query term that he or she wants to search for from query term candidates of a predetermined period of time suggested by the display device.
Claims (18)
1. A speech search method performed by a display device, the method comprising:
outputting media data including audio data;
receiving a speech search command for additional data about the outputted media data from a user, the speech search command including at least one query word;
determining whether the at least one query word matches a query term that is full and searchable;
when the at least one query word matches the query term that is full and searchable, performing a search for the additional data using the query term; and
when the at least one query word does not match the query term that is full and searchable, determining the query term from a predetermined amount of the audio data prior to receiving the speech search command and performing the search for the additional data using the query term.
2. The method of claim 1 , wherein the step of determining whether the at least one query word matches a query term that is full and searchable comprises:
determining whether the at least one query word matches the query term based on at least one of pronunciation patterns of the user and context information of the media data.
3. The method of claim 2 , wherein the context information comprises at least one of title information, genre information, character information, scene information, schedule information, audio related information, and image related information of the media data.
4. The method of claim 1 , wherein the step of determining the query term from the predetermined amount of the audio data comprises:
voice-recognizing the predetermined amount of the audio data;
extracting at least one query term candidate from a corresponding voice recognition result; and
determining the at least one query word that matches the query term that is full and searchable from the at least one query term candidate.
5. The method of claim 4 , the method further comprising:
outputting the at least one query term candidate when the at least one query term candidate does not match the at least one query word.
6. The method of claim 1 , wherein the step of determining the query term from the predetermined amount of the audio data comprises:
voice-recognizing the predetermined amount of the audio data;
extracting at least one query term candidate from a corresponding voice recognition result;
outputting the at least one query term candidate; and
receiving a command for selecting the query term from the at least one query term candidate.
7. The method of claim 6 , wherein the outputting step of the at least one query term candidate comprises
providing the user with the at least one query term candidate in a chronological order; and
providing the user with an image of the media data while the at least one query term candidate is being outputted.
8. The method of claim 1 , wherein the step of performing the search further comprises:
providing the user with a confirmation request for the determined query term; and
when the determined query term is confirmed by the user, performing the search using the confirmed query term.
9. The method of claim 1 , wherein the step of determining whether the at least one query word matches the query term that is full and searchable further comprises:
determining whether the at least one query word corresponds to the full and searchable query term or a partial query term.
10. A display device, comprising:
a media data processing module configured to process media data;
a media data output unit configured to output the processed media data;
an audio input unit configured to receive a speech search command for additional data about the outputted media data from a user, the speech search command including at least one query word; and
a processor configured to
determine whether the at least one query word matches a query term that is full and searchable,
when the at least one query word matches a query term that is full and searchable, perform a search for the additional data using the query term, and
when the at least one query word does not match the query term that is full and searchable, determine the query term from a predetermined amount of the audio data prior to receiving the speech search command and perform the search for the additional data using the query term.
11. The display device of claim 10 , wherein the processor is further configured to
determine whether the at least one query word matches the query term based on at least one of pronunciation patterns of the user and context information of the media data.
12. The display device of claim 11 , wherein the context information comprises at least one of title information, genre information, character information, scene information, schedule information, audio related information, and image related information of the media data.
13. The display device of claim 10 , wherein the processor is further configured to
voice-recognize the predetermined amount of the audio data;
extract at least one query term candidate from a corresponding voice recognition result; and
determine the at least one query word that matches the query term that is full and searchable from the at least one query term candidate.
14. The display device of claim 13 , wherein the processor is further configured to
output the at least one query term candidate when the at least one query term candidate does not match the at least one query word.
15. The display device of claim 10 , wherein the processor is further configured to
voice-recognize the predetermined amount of the audio data;
extract at least one query term candidate from a corresponding voice recognition result;
output the at least one query term candidate; and
receive a command for selecting the query term from the at least one query term candidate.
16. The display device of claim 15 , wherein the processor is further configured to
provide the user with the at least one query term candidate in a chronological order; and
provide the user with an image of the media data while the at least one query term candidate is being outputted.
17. The display device of claim 10 , wherein the processor is further configured to
provide the user with a confirmation request for the determined query term; and
when the determined query term is confirmed by the user, perform the search using the confirmed query term.
18. The display device of claim 10 , wherein the processor is further configured to
determine whether the at least one query word corresponds to a full and searchable query term or a partial query term.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/953,313 US9547716B2 (en) | 2012-08-29 | 2013-07-29 | Displaying additional data about outputted media data by a display device for a speech search command |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2012-0095034 | 2012-08-29 | ||
KR1020120095034A KR102081925B1 (en) | 2012-08-29 | 2012-08-29 | display device and speech search method thereof |
US13/761,102 US8521531B1 (en) | 2012-08-29 | 2013-02-06 | Displaying additional data about outputted media data by a display device for a speech search command |
US13/953,313 US9547716B2 (en) | 2012-08-29 | 2013-07-29 | Displaying additional data about outputted media data by a display device for a speech search command |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/761,102 Continuation US8521531B1 (en) | 2012-08-29 | 2013-02-06 | Displaying additional data about outputted media data by a display device for a speech search command |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140067402A1 true US20140067402A1 (en) | 2014-03-06 |
US9547716B2 US9547716B2 (en) | 2017-01-17 |
Family
ID=48999837
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/761,102 Active US8521531B1 (en) | 2012-08-29 | 2013-02-06 | Displaying additional data about outputted media data by a display device for a speech search command |
US13/953,313 Active 2034-05-31 US9547716B2 (en) | 2012-08-29 | 2013-07-29 | Displaying additional data about outputted media data by a display device for a speech search command |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/761,102 Active US8521531B1 (en) | 2012-08-29 | 2013-02-06 | Displaying additional data about outputted media data by a display device for a speech search command |
Country Status (4)
Country | Link |
---|---|
US (2) | US8521531B1 (en) |
EP (1) | EP2891084A4 (en) |
KR (1) | KR102081925B1 (en) |
WO (1) | WO2014035061A1 (en) |
Cited By (126)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017019929A1 (en) * | 2015-07-29 | 2017-02-02 | Simplifeye, Inc. | System and method for facilitating access to a database |
DK201770641A1 (en) * | 2016-09-23 | 2018-04-03 | Apple Inc | Intelligent automated assistant |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10332518B2 (en) | 2017-05-09 | 2019-06-25 | Apple Inc. | User interface for correcting recognition errors |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10354652B2 (en) | 2015-12-02 | 2019-07-16 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US10390213B2 (en) | 2014-09-30 | 2019-08-20 | Apple Inc. | Social reminders |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US10403283B1 (en) | 2018-06-01 | 2019-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10417405B2 (en) | 2011-03-21 | 2019-09-17 | Apple Inc. | Device access using voice authentication |
US10417344B2 (en) | 2014-05-30 | 2019-09-17 | Apple Inc. | Exemplar-based natural language processing |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10438595B2 (en) | 2014-09-30 | 2019-10-08 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10453443B2 (en) | 2014-09-30 | 2019-10-22 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US10529332B2 (en) | 2015-03-08 | 2020-01-07 | Apple Inc. | Virtual assistant activation |
US10580409B2 (en) | 2016-06-11 | 2020-03-03 | Apple Inc. | Application integration with a digital assistant |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10684703B2 (en) | 2018-06-01 | 2020-06-16 | Apple Inc. | Attention aware virtual assistant dismissal |
US10692504B2 (en) | 2010-02-25 | 2020-06-23 | Apple Inc. | User profiling for voice input processing |
US10699717B2 (en) | 2014-05-30 | 2020-06-30 | Apple Inc. | Intelligent assistant for home automation |
US10714117B2 (en) | 2013-02-07 | 2020-07-14 | Apple Inc. | Voice trigger for a digital assistant |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10741185B2 (en) | 2010-01-18 | 2020-08-11 | Apple Inc. | Intelligent automated assistant |
US10748546B2 (en) | 2017-05-16 | 2020-08-18 | Apple Inc. | Digital assistant services based on device capabilities |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10769385B2 (en) | 2013-06-09 | 2020-09-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10789945B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Low-latency intelligent automated assistant |
EP3716636A1 (en) * | 2019-03-29 | 2020-09-30 | Spotify AB | Systems and methods for delivering relevant media content by inferring past media content consumption |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10942702B2 (en) | 2016-06-11 | 2021-03-09 | Apple Inc. | Intelligent device arbitration and control |
US10942703B2 (en) | 2015-12-23 | 2021-03-09 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10957321B2 (en) | 2016-07-21 | 2021-03-23 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof |
US10956666B2 (en) | 2015-11-09 | 2021-03-23 | Apple Inc. | Unconventional virtual assistant interactions |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US11010127B2 (en) | 2015-06-29 | 2021-05-18 | Apple Inc. | Virtual assistant for media playback |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US11048473B2 (en) | 2013-06-09 | 2021-06-29 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11069336B2 (en) | 2012-03-02 | 2021-07-20 | Apple Inc. | Systems and methods for name pronunciation |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US11094323B2 (en) * | 2016-10-14 | 2021-08-17 | Samsung Electronics Co., Ltd. | Electronic device and method for processing audio signal by electronic device |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11127397B2 (en) | 2015-05-27 | 2021-09-21 | Apple Inc. | Device voice control |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US11231904B2 (en) | 2015-03-06 | 2022-01-25 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11314370B2 (en) | 2013-12-06 | 2022-04-26 | Apple Inc. | Method for extracting salient dialog usage from live data |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11350253B2 (en) | 2011-06-03 | 2022-05-31 | Apple Inc. | Active transport based notifications |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US20220237273A1 (en) * | 2013-03-15 | 2022-07-28 | Google Llc | Authentication of audio-based input signals |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11495218B2 (en) | 2018-06-01 | 2022-11-08 | Apple Inc. | Virtual assistant operation in multi-device environments |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11810578B2 (en) | 2020-05-11 | 2023-11-07 | Apple Inc. | Device arbitration for digital assistant-based intercom systems |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11928604B2 (en) | 2005-09-08 | 2024-03-12 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US12010262B2 (en) | 2013-08-06 | 2024-06-11 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
US12223282B2 (en) | 2016-06-09 | 2025-02-11 | Apple Inc. | Intelligent automated assistant in a home environment |
US12293763B2 (en) | 2023-07-26 | 2025-05-06 | Apple Inc. | Application integration with a digital assistant |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102081925B1 (en) * | 2012-08-29 | 2020-02-26 | LG Electronics Inc. | Display device and speech search method thereof |
JP6064629B2 (en) * | 2013-01-30 | 2017-01-25 | Fujitsu Limited | Voice input/output database search method, program, and apparatus |
JP6335437B2 (en) * | 2013-04-26 | 2018-05-30 | Canon Inc. | Communication device, communication method, and program |
JP2015011170A (en) * | 2013-06-28 | 2015-01-19 | ATR-Trek Co., Ltd. | Voice recognition client device performing local voice recognition |
CN104427350A (en) * | 2013-08-29 | 2015-03-18 | ZTE Corporation | Associated content processing method and system |
US20170286049A1 (en) * | 2014-08-27 | 2017-10-05 | Samsung Electronics Co., Ltd. | Apparatus and method for recognizing voice commands |
KR102348084B1 (en) * | 2014-09-16 | 2022-01-10 | Samsung Electronics Co., Ltd. | Image Displaying Device, Driving Method of Image Displaying Device, and Computer Readable Recording Medium |
US9830321B2 (en) | 2014-09-30 | 2017-11-28 | Rovi Guides, Inc. | Systems and methods for searching for a media asset |
US10504509B2 (en) | 2015-05-27 | 2019-12-10 | Google LLC | Providing suggested voice-based action queries |
US10134386B2 (en) * | 2015-07-21 | 2018-11-20 | Rovi Guides, Inc. | Systems and methods for identifying content corresponding to a language spoken in a household |
KR102453603B1 (en) | 2015-11-10 | 2022-10-12 | Samsung Electronics Co., Ltd. | Electronic device and method for controlling thereof |
US10915234B2 (en) * | 2016-06-01 | 2021-02-09 | Motorola Mobility LLC | Responsive, visual presentation of informational briefs on user requested topics |
KR102403149B1 (en) * | 2016-07-21 | 2022-05-30 | Samsung Electronics Co., Ltd. | Electric device and method for controlling thereof |
JP2018054850A (en) * | 2016-09-28 | 2018-04-05 | Toshiba Corporation | Information processing system, information processor, information processing method, and program |
JP6697373B2 (en) * | 2016-12-06 | 2020-05-20 | Casio Computer Co., Ltd. | Sentence generating device, sentence generating method and program |
US10558421B2 (en) * | 2017-05-22 | 2020-02-11 | International Business Machines Corporation | Context based identification of non-relevant verbal communications |
KR102353486B1 (en) * | 2017-07-18 | 2022-01-20 | LG Electronics Inc. | Mobile terminal and method for controlling the same |
US10602234B2 (en) * | 2018-07-12 | 2020-03-24 | Rovi Guides, Inc. | Systems and methods for gamification of real-time instructional commentating |
US10878013B1 (en) * | 2018-11-26 | 2020-12-29 | CSC Holdings, LLC | Bi-directional voice enabled system for CPE devices |
CN113365100B (en) * | 2021-06-02 | 2022-11-22 | Postal Savings Bank of China Co., Ltd. | Video processing method and device |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030115057A1 (en) * | 2001-12-13 | 2003-06-19 | Junqua Jean-Claude | Constraint-based speech recognition system and method |
US20090063151A1 (en) * | 2007-08-28 | 2009-03-05 | Nexidia Inc. | Keyword spotting using a phoneme-sequence index |
US20090150159A1 (en) * | 2007-12-06 | 2009-06-11 | Sony Ericsson Mobile Communications AB | Voice Searching for Media Files |
US20090276417A1 (en) * | 2008-05-02 | 2009-11-05 | Ido Shapira | Method and system for database query term suggestion |
US20100145938A1 (en) * | 2008-12-04 | 2010-06-10 | AT&T Intellectual Property I, L.P. | System and Method of Keyword Detection |
US20100274667A1 (en) * | 2009-04-24 | 2010-10-28 | Nexidia Inc. | Multimedia access |
US20100286984A1 (en) * | 2007-07-18 | 2010-11-11 | Michael Wandinger | Method for speech recognition |
US8015013B2 (en) * | 2005-12-12 | 2011-09-06 | Creative Technology Ltd | Method and apparatus for accessing a digital file from a collection of digital files |
US8019604B2 (en) * | 2007-12-21 | 2011-09-13 | Motorola Mobility, Inc. | Method and apparatus for uniterm discovery and voice-to-voice search on mobile device |
US20110320197A1 (en) * | 2010-06-23 | 2011-12-29 | Telefonica S.A. | Method for indexing multimedia information |
US20120215533A1 (en) * | 2011-01-26 | 2012-08-23 | Veveo, Inc. | Method of and System for Error Correction in Multiple Input Modality Search Engines |
US8521531B1 (en) * | 2012-08-29 | 2013-08-27 | LG Electronics Inc. | Displaying additional data about outputted media data by a display device for a speech search command |
US20140122059A1 (en) * | 2012-10-31 | 2014-05-01 | TiVo Inc. | Method and system for voice based media search |
US8751690B2 (en) * | 2009-05-27 | 2014-06-10 | Spot411 Technologies, Inc. | Tracking time-based selection of search results |
US20150019203A1 (en) * | 2011-12-28 | 2015-01-15 | Elliot Smith | Real-time natural language processing of datastreams |
US20150379094A1 (en) * | 2005-08-19 | 2015-12-31 | AT&T Intellectual Property II, L.P. | System and Method for Using Speech for Data Searching During Presentations |
Family Cites Families (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7831204B1 (en) * | 1981-11-03 | 2010-11-09 | Personalized Media Communications, LLC | Signal processing apparatus and methods |
US5835667A (en) * | 1994-10-14 | 1998-11-10 | Carnegie Mellon University | Method and apparatus for creating a searchable digital video library and a system and method of using such a library |
DE4442999A1 (en) | 1994-12-02 | 1996-06-05 | Hexal Pharma GmbH | Pharmaceutical composition containing an active loratadine metabolite |
US5809471A (en) * | 1996-03-07 | 1998-09-15 | IBM Corporation | Retrieval of additional information not found in interactive TV or telephony signal by application using dynamically extracted vocabulary |
US6480819B1 (en) * | 1999-02-25 | 2002-11-12 | Matsushita Electric Industrial Co., Ltd. | Automatic search of audio channels by matching viewer-spoken words against closed-caption/audio content for interactive television |
US6941268B2 (en) * | 2001-06-21 | 2005-09-06 | Tellme Networks, Inc. | Handling of speech recognition in a declarative markup language |
US7950033B2 (en) * | 2001-10-10 | 2011-05-24 | OpenTV, Inc. | Utilization of relational metadata in a television system |
US7467398B2 (en) * | 2002-03-21 | 2008-12-16 | International Business Machines Corporation | Apparatus and method of searching for desired television content |
US20040078814A1 (en) * | 2002-03-29 | 2004-04-22 | Digeo, Inc. | Module-based interactive television ticker |
AU2003267006A1 (en) * | 2002-09-27 | 2004-04-19 | International Business Machines Corporation | System and method for enhancing live speech with information accessed from the world wide web |
US20040210443A1 (en) * | 2003-04-17 | 2004-10-21 | Roland Kuhn | Interactive mechanism for retrieving information from audio and multimedia files containing speech |
US20060041926A1 (en) * | 2004-04-30 | 2006-02-23 | Vulcan Inc. | Voice control of multimedia content |
JP2006201749A (en) * | 2004-12-21 | 2006-08-03 | Matsushita Electric Ind Co Ltd | Device in which selection is activated by voice, and method in which selection is activated by voice |
US7640160B2 (en) * | 2005-08-05 | 2009-12-29 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance |
JP2007081768A (en) | 2005-09-14 | 2007-03-29 | Fujitsu Ten Ltd | Multimedia apparatus |
US8209724B2 (en) * | 2007-04-25 | 2012-06-26 | Samsung Electronics Co., Ltd. | Method and system for providing access to information of potential interest to a user |
US20070225970A1 (en) * | 2006-03-21 | 2007-09-27 | Kady Mark A | Multi-context voice recognition system for long item list searches |
KR100807745B1 (en) * | 2006-03-23 | 2008-02-28 | Beyonwiz Co., Ltd. | EPG information provision method and system |
US20080059522A1 (en) * | 2006-08-29 | 2008-03-06 | International Business Machines Corporation | System and method for automatically creating personal profiles for video characters |
US7272558B1 (en) | 2006-12-01 | 2007-09-18 | Coveo Solutions Inc. | Speech recognition training method for audio and video file indexing on a search engine |
JP5029030B2 (en) * | 2007-01-22 | 2012-09-19 | Fujitsu Limited | Information grant program, information grant device, and information grant method |
US20080270110A1 (en) | 2007-04-30 | 2008-10-30 | Yurick Steven J | Automatic speech recognition with textual content input |
US7983915B2 (en) | 2007-04-30 | 2011-07-19 | Sonic Foundry, Inc. | Audio content search engine |
US8904442B2 (en) * | 2007-09-06 | 2014-12-02 | AT&T Intellectual Property I, L.P. | Method and system for information querying |
US8312022B2 (en) * | 2008-03-21 | 2012-11-13 | Ramp Holdings, Inc. | Search engine optimization |
US8108214B2 (en) * | 2008-11-19 | 2012-01-31 | Robert Bosch GmbH | System and method for recognizing proper names in dialog systems |
US8296141B2 (en) * | 2008-11-19 | 2012-10-23 | AT&T Intellectual Property I, L.P. | System and method for discriminative pronunciation modeling for voice search |
US8707381B2 (en) * | 2009-09-22 | 2014-04-22 | Caption Colorado L.L.C. | Caption and/or metadata synchronization for replay of previously or simultaneously recorded live programs |
KR20110103626A (en) * | 2010-03-15 | 2011-09-21 | Samsung Electronics Co., Ltd. | Apparatus and method for providing tag information on multimedia data in a portable terminal |
US8660355B2 (en) | 2010-03-19 | 2014-02-25 | Digimarc Corporation | Methods and systems for determining image processing operations relevant to particular imagery |
JP2011199698A (en) * | 2010-03-23 | 2011-10-06 | Yamaha Corp | Av equipment |
CA2800489A1 (en) * | 2010-03-24 | 2011-09-29 | Annaburne Pty Ltd | Method of searching recorded media content |
KR101009973B1 (en) * | 2010-04-07 | 2011-01-21 | 김덕훈 | Method of providing media content, and apparatus therefor |
US8918803B2 (en) * | 2010-06-25 | 2014-12-23 | AT&T Intellectual Property I, L.P. | System and method for automatic identification of key phrases during a multimedia broadcast |
US20130007043A1 (en) * | 2011-06-30 | 2013-01-03 | Phillips Michael E | Voice description of time-based media for indexing and searching |
2012
- 2012-08-29 KR KR1020120095034A patent/KR102081925B1/en active Active
2013
- 2013-02-06 US US13/761,102 patent/US8521531B1/en active Active
- 2013-07-29 WO PCT/KR2013/006765 patent/WO2014035061A1/en unknown
- 2013-07-29 US US13/953,313 patent/US9547716B2/en active Active
- 2013-07-29 EP EP13833521.1A patent/EP2891084A4/en not_active Ceased
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030115057A1 (en) * | 2001-12-13 | 2003-06-19 | Junqua Jean-Claude | Constraint-based speech recognition system and method |
US20150379094A1 (en) * | 2005-08-19 | 2015-12-31 | AT&T Intellectual Property II, L.P. | System and Method for Using Speech for Data Searching During Presentations |
US8015013B2 (en) * | 2005-12-12 | 2011-09-06 | Creative Technology Ltd | Method and apparatus for accessing a digital file from a collection of digital files |
US20100286984A1 (en) * | 2007-07-18 | 2010-11-11 | Michael Wandinger | Method for speech recognition |
US20090063151A1 (en) * | 2007-08-28 | 2009-03-05 | Nexidia Inc. | Keyword spotting using a phoneme-sequence index |
US20090150159A1 (en) * | 2007-12-06 | 2009-06-11 | Sony Ericsson Mobile Communications AB | Voice Searching for Media Files |
US8019604B2 (en) * | 2007-12-21 | 2011-09-13 | Motorola Mobility, Inc. | Method and apparatus for uniterm discovery and voice-to-voice search on mobile device |
US20090276417A1 (en) * | 2008-05-02 | 2009-11-05 | Ido Shapira | Method and system for database query term suggestion |
US20100145938A1 (en) * | 2008-12-04 | 2010-06-10 | AT&T Intellectual Property I, L.P. | System and Method of Keyword Detection |
US8510317B2 (en) * | 2008-12-04 | 2013-08-13 | AT&T Intellectual Property I, L.P. | Providing search results based on keyword detection in media content |
US20100274667A1 (en) * | 2009-04-24 | 2010-10-28 | Nexidia Inc. | Multimedia access |
US8751690B2 (en) * | 2009-05-27 | 2014-06-10 | Spot411 Technologies, Inc. | Tracking time-based selection of search results |
US20110320197A1 (en) * | 2010-06-23 | 2011-12-29 | Telefonica S.A. | Method for indexing multimedia information |
US20120215533A1 (en) * | 2011-01-26 | 2012-08-23 | Veveo, Inc. | Method of and System for Error Correction in Multiple Input Modality Search Engines |
US20150019203A1 (en) * | 2011-12-28 | 2015-01-15 | Elliot Smith | Real-time natural language processing of datastreams |
US8521531B1 (en) * | 2012-08-29 | 2013-08-27 | LG Electronics Inc. | Displaying additional data about outputted media data by a display device for a speech search command |
US20140122059A1 (en) * | 2012-10-31 | 2014-05-01 | TiVo Inc. | Method and system for voice based media search |
Cited By (221)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11928604B2 (en) | 2005-09-08 | 2024-03-12 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US11979836B2 (en) | 2007-04-03 | 2024-05-07 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10741185B2 (en) | 2010-01-18 | 2020-08-11 | Apple Inc. | Intelligent automated assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US12165635B2 (en) | 2010-01-18 | 2024-12-10 | Apple Inc. | Intelligent automated assistant |
US12087308B2 (en) | 2010-01-18 | 2024-09-10 | Apple Inc. | Intelligent automated assistant |
US10692504B2 (en) | 2010-02-25 | 2020-06-23 | Apple Inc. | User profiling for voice input processing |
US10417405B2 (en) | 2011-03-21 | 2019-09-17 | Apple Inc. | Device access using voice authentication |
US11350253B2 (en) | 2011-06-03 | 2022-05-31 | Apple Inc. | Active transport based notifications |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11069336B2 (en) | 2012-03-02 | 2021-07-20 | Apple Inc. | Systems and methods for name pronunciation |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US12009007B2 (en) | 2013-02-07 | 2024-06-11 | Apple Inc. | Voice trigger for a digital assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US10714117B2 (en) | 2013-02-07 | 2020-07-14 | Apple Inc. | Voice trigger for a digital assistant |
US12277954B2 (en) | 2013-02-07 | 2025-04-15 | Apple Inc. | Voice trigger for a digital assistant |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US20220237273A1 (en) * | 2013-03-15 | 2022-07-28 | Google LLC | Authentication of audio-based input signals |
US11880442B2 (en) * | 2013-03-15 | 2024-01-23 | Google LLC | Authentication of audio-based input signals |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US11048473B2 (en) | 2013-06-09 | 2021-06-29 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10769385B2 (en) | 2013-06-09 | 2020-09-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US12073147B2 (en) | 2013-06-09 | 2024-08-27 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US12010262B2 (en) | 2013-08-06 | 2024-06-11 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US11314370B2 (en) | 2013-12-06 | 2022-04-26 | Apple Inc. | Method for extracting salient dialog usage from live data |
US12067990B2 (en) | 2014-05-30 | 2024-08-20 | Apple Inc. | Intelligent assistant for home automation |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US10714095B2 (en) | 2014-05-30 | 2020-07-14 | Apple Inc. | Intelligent assistant for home automation |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US10657966B2 (en) | 2014-05-30 | 2020-05-19 | Apple Inc. | Better resolution when referencing to concepts |
US10699717B2 (en) | 2014-05-30 | 2020-06-30 | Apple Inc. | Intelligent assistant for home automation |
US10417344B2 (en) | 2014-05-30 | 2019-09-17 | Apple Inc. | Exemplar-based natural language processing |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US12118999B2 (en) | 2014-05-30 | 2024-10-15 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US12200297B2 (en) | 2014-06-30 | 2025-01-14 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10453443B2 (en) | 2014-09-30 | 2019-10-22 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10438595B2 (en) | 2014-09-30 | 2019-10-08 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10390213B2 (en) | 2014-09-30 | 2019-08-20 | Apple Inc. | Social reminders |
US11231904B2 (en) | 2015-03-06 | 2022-01-25 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US10529332B2 (en) | 2015-03-08 | 2020-01-07 | Apple Inc. | Virtual assistant activation |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US10930282B2 (en) | 2015-03-08 | 2021-02-23 | Apple Inc. | Competing devices responding to voice triggers |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US12236952B2 (en) | 2015-03-08 | 2025-02-25 | Apple Inc. | Virtual assistant activation |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US12001933B2 (en) | 2015-05-15 | 2024-06-04 | Apple Inc. | Virtual assistant in a communication session |
US12154016B2 (en) | 2015-05-15 | 2024-11-26 | Apple Inc. | Virtual assistant in a communication session |
US11127397B2 (en) | 2015-05-27 | 2021-09-21 | Apple Inc. | Device voice control |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10681212B2 (en) | 2015-06-05 | 2020-06-09 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US11010127B2 (en) | 2015-06-29 | 2021-05-18 | Apple Inc. | Virtual assistant for media playback |
WO2017019929A1 (en) * | 2015-07-29 | 2017-02-02 | Simplifeye, Inc. | System and method for facilitating access to a database |
US12204932B2 (en) | 2015-09-08 | 2025-01-21 | Apple Inc. | Distributed personal assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10956666B2 (en) | 2015-11-09 | 2021-03-23 | Apple Inc. | Unconventional virtual assistant interactions |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US10354652B2 (en) | 2015-12-02 | 2019-07-16 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10942703B2 (en) | 2015-12-23 | 2021-03-09 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US12223282B2 (en) | 2016-06-09 | 2025-02-11 | Apple Inc. | Intelligent automated assistant in a home environment |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US12175977B2 (en) | 2016-06-10 | 2024-12-24 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US10942702B2 (en) | 2016-06-11 | 2021-03-09 | Apple Inc. | Intelligent device arbitration and control |
US10580409B2 (en) | 2016-06-11 | 2020-03-03 | Apple Inc. | Application integration with a digital assistant |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10957321B2 (en) | 2016-07-21 | 2021-03-23 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
DK201770641A1 (en) * | 2016-09-23 | 2018-04-03 | Apple Inc. | Intelligent automated assistant |
US11094323B2 (en) * | 2016-10-14 | 2021-08-17 | Samsung Electronics Co., Ltd. | Electronic device and method for processing audio signal by electronic device |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US12260234B2 (en) | 2017-01-09 | 2025-03-25 | Apple Inc. | Application integration with a digital assistant |
US10332518B2 (en) | 2017-05-09 | 2019-06-25 | Apple Inc. | User interface for correcting recognition errors |
US10741181B2 (en) | 2017-05-09 | 2020-08-11 | Apple Inc. | User interface for correcting recognition errors |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10847142B2 (en) | 2017-05-11 | 2020-11-24 | Apple Inc. | Maintaining privacy of personal information |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US10789945B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Low-latency intelligent automated assistant |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US12254887B2 (en) | 2017-05-16 | 2025-03-18 | Apple Inc. | Far-field extension of digital assistant services for providing a notification of an event to a user |
US12026197B2 (en) | 2017-05-16 | 2024-07-02 | Apple Inc. | Intelligent automated assistant for media exploration |
US10909171B2 (en) | 2017-05-16 | 2021-02-02 | Apple Inc. | Intelligent automated assistant for media exploration |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US10748546B2 (en) | 2017-05-16 | 2020-08-18 | Apple Inc. | Digital assistant services based on device capabilities |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US12211502B2 (en) | 2018-03-26 | 2025-01-28 | Apple Inc. | Natural assistant interaction |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US12067985B2 (en) | 2018-06-01 | 2024-08-20 | Apple Inc. | Virtual assistant operations in multi-device environments |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US10684703B2 (en) | 2018-06-01 | 2020-06-16 | Apple Inc. | Attention aware virtual assistant dismissal |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US10403283B1 (en) | 2018-06-01 | 2019-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US12061752B2 (en) | 2018-06-01 | 2024-08-13 | Apple Inc. | Attention aware virtual assistant dismissal |
US12080287B2 (en) | 2018-06-01 | 2024-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11495218B2 (en) | 2018-06-01 | 2022-11-08 | Apple Inc. | Virtual assistant operation in multi-device environments |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US10504518B1 (en) | 2018-06-03 | 2019-12-10 | Apple Inc. | Accelerated task performance |
US10944859B2 (en) | 2018-06-03 | 2021-03-09 | Apple Inc. | Accelerated task performance |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US12136419B2 (en) | 2019-03-18 | 2024-11-05 | Apple Inc. | Multimodality in digital assistant systems |
US10965976B2 (en) | 2019-03-29 | 2021-03-30 | Spotify AB | Systems and methods for delivering relevant media content by inferring past media content consumption |
EP3716636A1 (en) * | 2019-03-29 | 2020-09-30 | Spotify AB | Systems and methods for delivering relevant media content by inferring past media content consumption |
US11653048B2 (en) | 2019-03-29 | 2023-05-16 | Spotify AB | Systems and methods for delivering relevant media content by inferring past media content consumption |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US12216894B2 (en) | 2019-05-06 | 2025-02-04 | Apple Inc. | User configurable task triggers |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US12154571B2 (en) | 2019-05-06 | 2024-11-26 | Apple Inc. | Spoken notifications |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11360739B2 (en) | 2019-05-31 | 2022-06-14 | Apple Inc. | User activity shortcut suggestions |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11810578B2 (en) | 2020-05-11 | 2023-11-07 | Apple Inc. | Device arbitration for digital assistant-based intercom systems |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US12197712B2 (en) | 2020-05-11 | 2025-01-14 | Apple Inc. | Providing relevant data items based on context |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US12219314B2 (en) | 2020-07-21 | 2025-02-04 | Apple Inc. | User identification using headphones |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US12293763B2 (en) | 2023-07-26 | 2025-05-06 | Apple Inc. | Application integration with a digital assistant |
Also Published As
Publication number | Publication date |
---|---|
US9547716B2 (en) | 2017-01-17 |
KR20140028540A (en) | 2014-03-10 |
KR102081925B1 (en) | 2020-02-26 |
WO2014035061A1 (en) | 2014-03-06 |
US8521531B1 (en) | 2013-08-27 |
EP2891084A4 (en) | 2016-05-25 |
EP2891084A1 (en) | 2015-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9547716B2 (en) | Displaying additional data about outputted media data by a display device for a speech search command | |
US20230222153A1 (en) | Content Analysis to Enhance Voice Search | |
US8620658B2 (en) | Voice chat system, information processing apparatus, speech recognition method, keyword data electrode detection method, and program for speech recognition | |
US10643036B2 (en) | Language translation device and language translation method | |
US11556302B2 (en) | Electronic apparatus, document displaying method thereof and non-transitory computer readable recording medium | |
US20160189711A1 (en) | Speech recognition for internet video search and navigation | |
KR102029276B1 (en) | Answering questions using environmental context | |
US10672379B1 (en) | Systems and methods for selecting a recipient device for communications | |
JPWO2015098109A1 (en) | Speech recognition processing device, speech recognition processing method, and display device | |
US10152298B1 (en) | Confidence estimation based on frequency | |
US10255321B2 (en) | Interactive system, server and control method thereof | |
KR20140089836A (en) | Interactive server, display apparatus and controlling method thereof | |
KR100970711B1 (en) | Apparatus for searching the internet while watching TV and method therefor |
US10841411B1 (en) | Systems and methods for establishing a communications session | |
EP3994686B1 (en) | Contextual voice-based presentation assistance | |
US11657805B2 (en) | Dynamic context-based routing of speech processing | |
US11587571B2 (en) | Electronic apparatus and control method thereof | |
KR20240124243A (en) | Electronic apparatus and control method thereof | |
CN117812323A (en) | Display device, voice recognition method, voice recognition device and storage medium | |
KR20170081418A (en) | Image display apparatus and method for displaying image | |
CN115862615A (en) | Display device, voice search method and storage medium | |
WO2022271555A1 (en) | Early invocation for contextual data processing | |
US20230267934A1 (en) | Display apparatus and operating method thereof | |
US20250131910A1 (en) | Automated prediction of pronunciation of text entities based on co-emitted speech recognition predictions | |
JP2007213554A (en) | Method for rendering rank-ordered result set for probabilistic query, executed by computer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 8 |