CN103544140A - Data processing method, display method and corresponding devices - Google Patents
- Publication number
- CN103544140A (application CN201210241787.1A)
- Authority
- CN
- China
- Prior art keywords
- keyword
- region
- confidence degree
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F40/00—Handling natural language data
        - G06F40/20—Natural language analysis
          - G06F40/279—Recognition of textual entities
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
            - G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
        - G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of speech recognition and discloses a data processing method. The method comprises the following steps: acquiring text information corresponding to display content, the display content comprising a plurality of regions; performing text analysis on the text information to obtain a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one of the plurality of regions; acquiring voice information related to the display content, the voice information comprising at least a current speech segment; and analyzing the current speech segment with a first model network to determine the region corresponding to the current speech segment, the first model network comprising the first keyword sequence. Accordingly, the invention further discloses a display method, a corresponding device for data processing, and a corresponding device for display. With this technical scheme, speech segments can be associated with the different regions of the display content, and the display content can jump automatically between those regions.
Description
Technical field
The present invention relates to the field of speech recognition, and more specifically to a data processing method, a display method, and corresponding devices.
Background technology
With the development of modern society, people increasingly need to accompany a talk or lecture with displayed material, whether to aid an audience's understanding or to hold its attention. For example, when sales staff demonstrate a product or proposal to a client, they usually rely on electronic slides, audio, and video; engineers use the same means when explaining a technical scheme; and in distance teaching, teachers depend on them even more to convey information to students.

At present, however, the displayed content cannot jump automatically to the region that corresponds to what the presenter is currently explaining; that is, the presenter's live narration cannot be associated with the different regions of the displayed content. Jumping between regions therefore requires manual intervention, which raises the labor cost of a presentation, tends to interrupt it, and makes it feel less complete and fluent.

In view of the above problems in the prior art, a technique is needed that associates live voice information with the different regions of displayed content.
Summary of the invention
To associate voice information with displayed content, the present invention provides a data processing method, a display method, a device for data processing, and a device for display.

According to one aspect of the present invention, a data processing method is provided. The method comprises: acquiring text information corresponding to display content, the display content comprising a plurality of regions; performing text analysis on the text information to obtain a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one of the plurality of regions; acquiring voice information related to the display content, the voice information comprising at least a current speech segment; and analyzing the current speech segment with a first model network to determine the region corresponding to the current speech segment, the first model network comprising the first keyword sequence.

According to another aspect of the present invention, a display method is provided. The method comprises: acquiring text information corresponding to display content, the display content comprising a plurality of regions; performing text analysis on the text information to obtain a plurality of second keyword sequences, at least one of which corresponds to at least one of the plurality of regions and comprises at least one keyword; acquiring voice information related to the display content, the voice information comprising at least a current speech segment; obtaining the confidence of at least one keyword of at least one second keyword sequence, a keyword's confidence being higher the more similar it is to the current speech segment; obtaining, from the keyword confidences, the confidence of the second keyword sequence corresponding to the current region; and, in response to that confidence being below a tenth threshold, jumping away from the current region.

According to a further aspect of the present invention, a device for data processing is provided. The device comprises: a text acquisition module configured to acquire text information corresponding to display content, the display content comprising a plurality of regions; a text analysis module configured to perform text analysis on the text information to obtain a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one of the plurality of regions; a voice acquisition module configured to acquire voice information related to the display content, the voice information comprising at least a current speech segment; and a first speech analysis module configured to analyze the current speech segment with a first model network to determine the region corresponding to the current speech segment, the first model network comprising the first keyword sequence.

According to a further aspect of the present invention, a device for display is provided. The device comprises: a text acquisition module configured to acquire text information corresponding to display content, the display content comprising a plurality of regions; a text analysis module configured to perform text analysis on the text information to obtain a plurality of second keyword sequences, at least one of which corresponds to at least one of the plurality of regions and comprises at least one keyword; a voice acquisition module configured to acquire voice information related to the display content, the voice information comprising at least a current speech segment; a first confidence module configured to obtain the confidence of at least one keyword of at least one second keyword sequence, a keyword's confidence being higher the more similar it is to the current speech segment; a second confidence module configured to obtain, from the keyword confidences, the confidence of the second keyword sequence corresponding to the current region; and a jump module configured to jump away from the current region in response to that confidence being below a twenty-third threshold.

The technical scheme provided by the present invention associates speech with the regions of the display content, so that the display content can jump automatically from region to region.
Accompanying drawing explanation
Through the more detailed description of exemplary embodiments of the present disclosure in conjunction with the accompanying drawings, the above and other objects, features, and advantages of the present disclosure will become more apparent. In the exemplary embodiments of the present disclosure, the same reference numerals generally denote the same components.

Fig. 1 shows a block diagram of an exemplary computer system 100 suitable for implementing embodiments of the present invention;

Fig. 2 shows a schematic flowchart of a data processing method in an embodiment of the present invention;

Fig. 3 shows an example of the first model network and the second model network in an embodiment of the present invention;

Fig. 4 shows a schematic flowchart of a display method of an embodiment of the present invention;

Fig. 5 shows a schematic structural diagram of a device for data processing of an embodiment of the present invention;

Fig. 6 shows a schematic structural diagram of a device for display of an embodiment of the present invention.
Embodiment
Preferred embodiments of the present disclosure are described below in more detail with reference to the accompanying drawings. Although the drawings show preferred embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Fig. 1 shows a block diagram of an exemplary computer system 100 suitable for implementing embodiments of the present invention. As shown in Fig. 1, computer system 100 may comprise: a CPU (central processing unit) 101, RAM (random access memory) 102, ROM (read-only memory) 103, a system bus 104, a hard disk controller 105, a keyboard controller 106, a serial interface controller 107, a parallel interface controller 108, a display controller 109, a hard disk 110, a keyboard 111, a serial peripheral device 112, a parallel peripheral device 113, and a display 114. Of these devices, the CPU 101, RAM 102, ROM 103, hard disk controller 105, keyboard controller 106, serial interface controller 107, parallel interface controller 108, and display controller 109 are coupled to the system bus 104. The hard disk 110 is coupled to the hard disk controller 105; the keyboard 111 to the keyboard controller 106; the serial peripheral device 112 to the serial interface controller 107; the parallel peripheral device 113 to the parallel interface controller 108; and the display 114 to the display controller 109. It should be understood that the structural block diagram in Fig. 1 is shown only by way of example and does not limit the scope of the invention. In some cases, certain devices may be added or removed as circumstances require.
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system." Furthermore, in some embodiments, the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.

Any combination of one or more computer-readable media may be utilized. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of such blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed thereon to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Referring now to Fig. 2, Fig. 2 shows a data processing method provided by an embodiment of the present invention. The method comprises the following steps: step 210, acquiring text information corresponding to display content; step 220, performing text analysis on the text information to obtain a first keyword sequence; step 230, acquiring voice information related to the display content; and step 240, analyzing the current speech segment with a first model network to determine the region corresponding to the current speech segment.

In one embodiment of the invention, the display content in step 210 comprises a plurality of regions. The regions may be divided by different criteria: by theme, by fixed size, by page or paragraph, and so on; the present invention is not limited in this respect. Taking the electronic slides of a product demonstration as an example, the functions of the product may form one region, its structure another, and so forth. If the display content is a document, each paragraph or each first-level heading may form a region. If it is a picture, different people in the picture may form different regions, or each picture may form a region. If it is video or audio, segments of fixed duration may form regions, or segments on different themes may form different regions. In one embodiment of the invention, if the display content is mainly textual, such as electronic slides, step 210 may take the text in the display content directly as the corresponding text information. If the display content is audio or video, step 210 may obtain the corresponding text information by performing speech recognition on the presenter's rehearsal, from subtitles corresponding to the audio or video, or from a script corresponding to the audio or video. Those skilled in the art will understand that both the region division and the text information may be adjusted manually.
The text analysis in step 220 may adopt text analysis techniques known in the prior art, which are not repeated here. The first keyword sequence of step 220 comprises region keywords associated with regions of the display content. A region keyword is a keyword that can be used to identify a region, such as a heading at any level, a region high-frequency word, or a control command word. A region high-frequency word used as a region keyword normally does not appear in other regions. Those skilled in the art will understand that when region high-frequency words are used as region keywords, common words such as conjunctions and pronouns can be filtered out, so that they are not chosen as region keywords merely because they occur frequently. In one embodiment of the invention, the region keywords may be manually adjusted or specified, so that they are better associated with their regions. Take a presentation about a certain forest as an example: the display content comprises a plurality of regions, covering the location of the forest, the tree species it contains, its animal resources, its effect on the surrounding climate, and so on. In the example shown in Fig. 3, the first keyword sequence may comprise: geographic position, Jilin Province, plant resources, kahikatea, animal resources, golden eagle, climate effect, humidity, etc. Here the two region keywords "geographic position" and "Jilin Province" are both associated with the region about the forest's location.
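The selection of region high-frequency words described above can be sketched as follows. This is a minimal illustration, not the patent's actual implementation: the stopword list, the scoring, and the `top_n` cutoff are all assumptions.

```python
from collections import Counter

# Assumed stopword list; a real system would use a proper one per language.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "it", "is", "to"}

def region_keywords(regions, top_n=3):
    """Pick candidate region keywords: words frequent inside one region,
    not stopwords, and absent from every other region."""
    counts = {
        rid: Counter(w for w in text.lower().split() if w not in STOPWORDS)
        for rid, text in regions.items()
    }
    keywords = {}
    for rid, counter in counts.items():
        elsewhere = set()
        for other_id, other_counter in counts.items():
            if other_id != rid:
                elsewhere.update(other_counter)
        # Keep only words unique to this region, most frequent first.
        unique = [w for w, _ in counter.most_common() if w not in elsewhere]
        keywords[rid] = unique[:top_n]
    return keywords

regions = {
    "location": "the forest lies in jilin province jilin",
    "animals": "golden eagles and other animals live in the forest",
}
print(region_keywords(regions))
```

Words shared between regions, such as "forest" above, are rejected, which matches the observation that a usable region high-frequency word normally does not occur in other regions.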
The first model network may further comprise a first phoneme sequence, which consists of a plurality of phonemes. As is known to those skilled in the art, a phoneme is the smallest unit of speech: for Chinese, for example, the initials and finals; for English, the phonetic symbols. In one embodiment of the invention, the first phoneme sequence comprises all the phonemes of a language, for example all the initials and finals of Chinese.

In step 230, the voice information comprises at least a current speech segment. The present invention does not restrict how the speech is segmented; any speech segmentation method in the prior art may be used. The voice information related to the display content is normally the presenter's live speech during the presentation. In one embodiment of the invention, this speech may be the presenter's natural language rather than specially issued command statements.

In step 240, analyzing the current speech segment with the first model network produces a corresponding output. If the current speech segment is judged to be one of the region keywords in the first keyword sequence, the output may be that region keyword; if it is judged not to be any region keyword in the first keyword sequence, the output is the phonemes of the speech segment obtained from the first phoneme sequence. In one embodiment of the invention, this judgment may be made from the competing confidences of the elements contained in the first keyword sequence and the first phoneme sequence.
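The competition between the keyword hypothesis and the phoneme hypothesis might look like the following sketch. The confidence values and the gap threshold are invented for illustration; in the described system the confidences come from the model network itself.

```python
def decode_segment(keyword_conf, phoneme_conf, gap_threshold=0.3):
    """Output of the first model network for one speech segment.

    keyword_conf: confidence per region keyword for this segment.
    phoneme_conf: confidence of the best phoneme-sequence hypothesis.
    If the phoneme hypothesis wins by at least gap_threshold, the segment
    is judged to contain no region keyword and the phonemes are output.
    """
    best_kw, best = max(keyword_conf.items(), key=lambda kv: kv[1])
    if phoneme_conf - best >= gap_threshold:
        return ("phonemes", None)   # no region keyword in this segment
    return ("keyword", best_kw)

print(decode_segment({"jilin province": 0.9, "golden eagle": 0.2}, 0.5))
```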
In an embodiment of the present invention, if the output is a region keyword, the region corresponding to the current speech segment may be judged to be the region associated with that keyword.

In one embodiment of the invention, step 240 may be followed by a jump step: jumping the display content to the region corresponding to the current speech segment. In this way the display content jumps automatically between regions, reducing manual operation. Optionally, if the region corresponding to the current speech segment is the same as the current region, the jump is not performed and the current region remains displayed; if it differs from the current region, the jump is performed, the corresponding region is displayed, and that region becomes the current region. Concretely, for a document or electronic slides, the jump may go directly to the page or section corresponding to the region; for video or audio, it may go directly to the timestamp corresponding to the region. The pages, sections, or timestamps corresponding to the different regions may be preset or obtained by text analysis.
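The optional jump step can be sketched as follows. `goto_page` and `seek` are hypothetical stand-ins for a slide viewer's and a media player's APIs, and the region-to-target table is an assumption.

```python
def maybe_jump(current_region, detected_region, targets, goto_page, seek):
    """Jump only when the detected region differs from the one on screen.

    targets maps a region id either to an int page/section number
    (documents, slides) or to a float timestamp in seconds (audio, video).
    Returns the region that is current after the (possible) jump.
    """
    if detected_region is None or detected_region == current_region:
        return current_region            # keep displaying the current region
    target = targets[detected_region]
    if isinstance(target, float):
        seek(target)                     # audio/video: jump to timestamp
    else:
        goto_page(target)                # slides/document: jump to page
    return detected_region

jumps = []
now = maybe_jump("location", "animals", {"animals": 7},
                 goto_page=jumps.append, seek=jumps.append)
print(now, jumps)
```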
Those skilled in the art will appreciate that the data processing method provided by the embodiments of the present invention is not limited to making display content jump automatically between regions; it has other applications, for example processing the display content by deleting or moving the region corresponding to the current speech segment.

With the method provided by the above embodiments, display content can jump automatically between regions according to the presenter's speech, so that neither the presenter nor anyone else needs to perform the jumps manually during the presentation; the presentation becomes more complete and fluent, and no coordination between the presenter and other operators is needed. Further, because the method can process the presenter's natural language and is not limited to command statements, the whole presentation is more natural, the presenter need not memorize specific commands, and the method is simpler to apply. This is particularly valuable in remote presentations, where only the presenter's voice can be heard at the far end: the scheme provided by the above embodiments analyzes the presenter's voice information and realizes automatic jumping of the display content, avoiding the difficulty of controlling the display content remotely.
In an embodiment of the present invention, step 240 may specifically comprise: obtaining the confidence of at least one region keyword in the first keyword sequence, a region keyword's confidence being higher the more similar it is to the current speech segment; and, if the confidence of a region keyword reaches a threshold, determining that the region corresponding to the current speech segment is the region associated with that keyword. In another embodiment, the region corresponding to the current speech segment may be determined to be a given region only if the confidences of several region keywords associated with that same region all reach a threshold; the required number of such keywords may be preset. In yet another embodiment, the determination may be made when the sum of the confidences of several region keywords associated with the same region reaches a threshold, where the sum may be a plain sum or a weighted sum. Judging from the confidences of several region keywords favors accurate region identification and reduces the chance of misjudgment. Those skilled in the art will understand that the implementations above are only examples: they may be combined, and the first model network may also be used for speech analysis in other ways.
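The decision rules just listed — a single keyword over a threshold, several keywords of one region each over a threshold, or their summed confidence over a threshold — can be sketched together as below. All parameter values are invented for illustration.

```python
def region_from_keywords(keyword_conf, keyword_region,
                         threshold=0.6, min_hits=1, use_sum=False):
    """Decide the segment's region from region-keyword confidences.

    keyword_conf: confidence per region keyword.
    keyword_region: which region each keyword is associated with.
    Either require min_hits keywords of one region above threshold, or
    (use_sum=True) require the region's summed confidence to reach it.
    Returns the best qualifying region, or None.
    """
    per_region = {}
    for kw, conf in keyword_conf.items():
        per_region.setdefault(keyword_region[kw], []).append(conf)

    best = None
    for region, confs in per_region.items():
        score = sum(confs)
        qualifies = (score >= threshold) if use_sum else \
                    sum(c >= threshold for c in confs) >= min_hits
        if qualifies and (best is None or score > best[1]):
            best = (region, score)
    return best[0] if best else None
```

Raising `min_hits` corresponds to the stricter variant in which several keywords of the same region must all clear the threshold before a jump is considered.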
In an embodiment of the present invention, the confidences of all region keywords in the first keyword sequence may be obtained. During speech analysis, the similarity between the current speech segment and every region keyword in the first keyword sequence is judged, higher similarity giving higher confidence; the region keyword with the highest confidence is taken, and if its confidence reaches a threshold, the region corresponding to the current speech segment is judged to be the region associated with that keyword. Another implementation compares the current speech segment with the region keywords of the first keyword sequence in order, and as soon as some keyword's confidence reaches a threshold, directly judges the corresponding region to be the region associated with that keyword. Those skilled in the art will understand that the similarity between a region keyword and the current speech segment may be pronunciation similarity or text similarity.

In an embodiment of the present invention, the similarity between the current speech segment and the phonemes of the first phoneme sequence may further be computed, higher similarity giving higher phoneme confidence, and the phoneme with the highest confidence, or whose confidence reaches a threshold, is obtained. If the gap between the confidence of the region keyword obtained as above and the confidence of the obtained phoneme reaches a threshold, it is judged that the current speech segment contains no region keyword.
In an embodiment of the present invention, step 240 may also determine the region corresponding to the current speech segment through the first phoneme sequence. Specifically, during speech analysis, at least one phoneme adjacent to the current speech segment may be obtained according to the first phoneme sequence; the pronunciation similarity between this at least one phoneme and the text information corresponding to at least one region keyword is then judged, where the text information corresponding to a region keyword comprises the context of that region keyword in the text information. If the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one region keyword reaches a threshold, the confidence of the region keyword whose pronunciation similarity reaches the threshold is raised. In embodiments of the present invention, the phonemes adjacent to the current speech segment may be obtained regardless of whether the current speech segment contains a region keyword, or only when the current speech segment may contain a region keyword, for example when the confidence of a region keyword is above a threshold. Optionally, more adjacent phonemes may be obtained in order to judge the pronunciation similarity between the adjacent phonemes and the context more accurately. Each region keyword in the first keyword sequence has its context in the text information, i.e. its corresponding text information; the obtained adjacent phonemes can be compared with these pieces of corresponding text information, and when the pronunciation similarity reaches a threshold, the confidence of the corresponding region keyword is raised. Those skilled in the art will understand that this scheme has other implementations; for example, only the corresponding text information with the highest pronunciation similarity may be selected, and the confidence of the region keyword corresponding to that text information raised. Alternatively, the confidence may be adjusted by different amounts for different pronunciation similarities: the higher the pronunciation similarity, the larger the increase in confidence. Adjusting the confidence of the region keywords makes the determination of the region more accurate. Moreover, because the judgment is based on pronunciation similarity rather than text similarity, the method remains usable even when the presenter mispronounces words or speaks with an accent.
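The confidence adjustment just described can be illustrated with a short sketch. This is not the claimed implementation: the function names, the additive boost, and the similarity callable are assumptions; in practice the pronunciation similarity would come from an acoustic model.

```python
def boost_confidence_by_context(confidences, contexts, adjacent_phonemes,
                                similarity, threshold=0.8, boost=0.1):
    """Raise the confidence of each region keyword whose surrounding context
    is pronounced similarly to the phonemes adjacent to the current speech
    segment (hypothetical names; `similarity` returns a value in [0, 1])."""
    adjusted = dict(confidences)
    for keyword, context in contexts.items():
        sim = similarity(adjacent_phonemes, context)
        if sim >= threshold:
            # Variant from the embodiment: a higher similarity earns a
            # larger increase in confidence.
            adjusted[keyword] = min(1.0, adjusted[keyword] + boost * sim)
    return adjusted
```

A judgment based on pronunciation rather than text tolerates accented or imprecise speech, which is the rationale given in the embodiment.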
Those skilled in the art will understand that the above embodiments are all described with the example in which higher similarity yields higher confidence; the confidence may also be set the opposite way, with higher similarity yielding lower confidence, in which case the corresponding judgment conditions are likewise reversed.
In one embodiment of the present invention, not only can a single-layer model network be built in the above manner; a two-layer model network can also be built. With the structure of a two-layer model network, not only can keywords in the presentation content be judged, but the accuracy of region identification can be further improved. An example of the second model network is shown in Fig. 3. The construction of the two-layer model network is described in detail below.
In the embodiment shown in Fig. 2, the method may further comprise: obtaining a plurality of second keyword sequences, wherein at least one second keyword sequence corresponds to at least one of the plurality of regions, and at least one second keyword sequence comprises at least one keyword; and obtaining, according to the plurality of second keyword sequences, the confidence of at least one keyword of at least one second keyword sequence, where a keyword with higher similarity to the current speech segment receives a larger confidence. Accordingly, when judging the region corresponding to the current speech segment, not only the confidence of the region keywords but also the confidence of the second keyword sequence corresponding to the current region is judged. Specifically, it is judged whether the confidence of the second keyword sequence corresponding to the current region is below a threshold; if it is, and the confidence of a region keyword meets the requirement described in the above embodiment, the region corresponding to the current speech segment is judged to be the region associated with the region keyword satisfying the condition. The confidence of a second keyword sequence is obtained from the confidences of the keywords it comprises, for example as their sum or weighted sum. It can be seen that using the second keyword sequence in the second model network to assist region confirmation further strengthens the accuracy of region confirmation.
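The sequence-confidence computation and the leave-region test above can be sketched as follows. This is a toy illustration under assumed names; the weighted sum is one of the two variants the embodiment mentions.

```python
def sequence_confidence(keyword_confidences, weights=None):
    """Confidence of a second keyword sequence: the plain sum of its
    keywords' confidences, or a weighted sum when weights are given."""
    if weights is None:
        return sum(keyword_confidences)
    return sum(w * c for w, c in zip(weights, keyword_confidences))

def should_leave_region(current_seq_conf, seq_threshold, region_kw_condition_met):
    """Low confidence of the current region's sequence, combined with a
    satisfied region-keyword condition, attributes the current speech
    segment to the new keyword's associated region."""
    return current_seq_conf < seq_threshold and region_kw_condition_met
```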
In one embodiment of the present invention, the second phoneme sequence in the second model network can also be used to assist region confirmation. When the second model network is used to analyze the speech information, it is judged whether the number of outputs obtained from the second phoneme sequence reaches a threshold; if it does, and the confidence of a region keyword meets the requirement described in the above embodiment, the region corresponding to the current speech segment is judged to be the region associated with the region keyword satisfying the condition.
In order to associate the keywords contained in the presentation content with the current speech segment, in one embodiment of the present invention the data processing method further comprises: using the second model network to analyze the current speech segment so as to judge the keyword corresponding to the current speech segment. The above embodiments may be referred to when using the second model network to analyze the speech information; for example, the confidence of at least one keyword in a second keyword sequence is obtained, where a keyword with higher similarity to the current speech segment receives a larger confidence, and the keyword corresponding to the current speech segment is judged according to the keyword confidences. With this method, keywords in the presentation content can be associated according to the presenter's speech, without the presenter or anyone else marking them manually. Optionally, this embodiment may further include a marking step, i.e. marking the keyword corresponding to the current speech segment in the presentation content. Automatically marking keywords in the presentation content preserves the integrity of the presentation and also saves labor. For example, when the presenter talks about the geographic position of the forest region and mentions the longitude, the longitude in the presentation content can be marked, thereby drawing the attention of the audience. Those skilled in the art will understand that various marking techniques of the prior art can be adopted, for example highlighting the keyword, underlining it, or showing it in video content. Moreover, the structure of the two-layer model network avoids the slow speech recognition that too many keywords would otherwise cause, and can also improve the granularity of speech recognition. Those skilled in the art will understand that once the keyword corresponding to the current speech segment has been determined, there are other applications, such as recording the keyword, collecting statistics, and so on.
In the above embodiments, the first keyword sequence is set for the regions, and each second keyword sequence comprises the keywords within a region. It will be appreciated that second keyword sequences and regions are not necessarily in one-to-one correspondence: some regions may have no corresponding second keyword sequence, and one second keyword sequence may correspond to multiple regions — for example, when several regions share the same keywords, the same second keyword sequence can be used for all of them. As mentioned in the earlier example, a high-frequency word that appears in multiple regions is usually not used as a region keyword, but such a word can serve as a keyword in a second keyword sequence, because a second keyword sequence is specific to a region. Moreover, the keywords in a second keyword sequence can be adjusted and set manually; for example, words the presenter wishes to highlight can also serve as keywords in a second keyword sequence. Typically, the keywords in a second keyword sequence are the high-frequency words of the region, or other words the presenter wishes to have marked or emphasized during the presentation.
In one embodiment of the present invention, the second model network may further comprise a second phoneme sequence. The second phoneme sequence may be the same as or different from the first phoneme sequence. Likewise, the second phoneme sequence consists of phonemes. The second model network may comprise one or more second phoneme sequences, for example one second phoneme sequence corresponding to a plurality of second keyword sequences, or one second phoneme sequence for each second keyword sequence, where the plurality of second phoneme sequences may be the same or different.
In one embodiment of the present invention, after the region corresponding to the current speech segment has been determined through speech analysis, the second model network corresponding to that region can be used to analyze the current speech segment, thereby determining the keyword. In another embodiment of the present invention, the first model network and the second model network can be used to analyze the current speech segment simultaneously, and when the same keyword appears in multiple regions, the determined region is used to judge which region's keyword it is.
In one embodiment of the present invention, the confidence of a keyword can also be changed through the second phoneme sequence. For example, at least one phoneme adjacent to the current speech segment is obtained according to the second phoneme sequence; the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one keyword is judged, where the text information corresponding to a keyword comprises the context of that keyword in the text information; and if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one keyword reaches a threshold, the confidence of the keyword whose pronunciation similarity reaches the threshold is raised.
In one embodiment of the present invention, because a keyword in the presentation content of a region may occur more than once, the second phoneme sequence can be used to assist the judgment of which occurrence should be marked. This may be embodied as: obtaining, according to the second phoneme sequence, at least one phoneme adjacent to the current speech segment; if the confidence of at least one keyword reaches a first threshold, determining the keyword whose confidence reaches the first threshold to be a candidate keyword; judging the pronunciation similarity between the at least one phoneme and the text information corresponding to the candidate keyword, where the text information corresponding to the candidate keyword comprises the context of the candidate keyword in the text information; and if the pronunciation similarity between the at least one phoneme and one piece of the text information corresponding to the candidate keyword reaches a second threshold, determining the keyword corresponding to the current speech segment to be the keyword whose context is the text information whose pronunciation similarity reaches the second threshold. In the above steps, there is no fixed order between obtaining the candidate keyword and obtaining the at least one phoneme adjacent to the current speech segment; they can be performed one after another or simultaneously. For example, when the presenter talks about the animal resources of the forest, "Siberian tiger" appears at two positions in the text information: one is "the mammal resources in the forest include: Siberian tiger, sika deer", and the other is "the animals under first-class state protection include: Siberian tiger, golden eagle". The same keyword thus appears at two positions within the same region, so the phonemes adjacent to the current speech segment are needed to judge which occurrence the current speech segment actually corresponds to. By using the second phoneme sequence, the keywords in the text information can be judged more finely and accurately.
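The disambiguation of repeated occurrences can be sketched as a small routine. The names and the similarity callable are illustrative assumptions, not the claimed implementation.

```python
def locate_keyword_occurrence(occurrence_contexts, adjacent_phonemes,
                              similarity, second_threshold):
    """Among several occurrences of the same candidate keyword, pick the
    one whose surrounding context sounds most like the adjacent phonemes,
    provided the similarity reaches the second threshold; otherwise None."""
    best = max(occurrence_contexts,
               key=lambda ctx: similarity(adjacent_phonemes, ctx))
    if similarity(adjacent_phonemes, best) >= second_threshold:
        return best
    return None
```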
In one embodiment of the present invention, the second keyword sequences can be obtained during the same text analysis of the text information that yields the first keyword sequence; alternatively, the second keyword sequences can be obtained by text analysis after the first keyword sequence has been generated.
In one embodiment of the present invention, predefined text information can serve as the context of a keyword or a region keyword. This makes the judgment of regions and the determination of keywords more flexible. For example, if the presenter rehearses before the presentation and finds that some region is misidentified or some keyword is misjudged, the speech information from the rehearsal, or other text information more conducive to the judgment, can be used as the context of the misidentified region keyword or keyword, thereby improving the recognition accuracy during the formal presentation.
The above method embodiments can be combined with and refer to one another to obtain further embodiments. The methods provided by the above embodiments can realize automatic jumping between regions and automatic marking of keywords in the presentation content. Moreover, using the output of the second phoneme sequence, the keyword that needs to be marked can be located more accurately; since this output is produced in the course of speech analysis anyway, no extra workload is added. The second phoneme sequence assists in judging whether to perform a region jump, and the first phoneme sequence allows the keyword corresponding to the current speech segment to be judged more exactly, so that the region corresponding to the current speech segment is obtained more accurately and the region jump is performed. Therefore, according to the above embodiments, not only can the jumping and marking of presentation content be automated, but the accuracy of speech recognition can also be improved, without increasing the amount of computation or consuming more resources.
Thresholds appear in many places in the above embodiment and in each embodiment below; these thresholds may be the same or different, and the present invention places no specific restriction on them.
Fig. 4 shows a display method provided by an embodiment of the present invention. The method comprises: step 410, obtaining text information corresponding to presentation content, the presentation content comprising a plurality of regions; step 420, performing text analysis on the obtained text information to obtain a plurality of second keyword sequences, wherein at least one second keyword sequence corresponds to at least one of the plurality of regions and comprises at least one keyword; step 430, obtaining speech information related to the presentation content; step 440, obtaining, according to the second keyword sequences, the confidence of at least some keywords in at least some second keyword sequences; and step 450, in response to the confidence of the second keyword sequence corresponding to the current region being below a threshold, jumping away from the current region.
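Steps 440 and 450 can be condensed into a sketch; the data layout (a region-to-keywords map and a keyword-to-confidence map) is an assumption for illustration only.

```python
def display_jump_decision(second_sequences, keyword_confidences,
                          current_region, threshold):
    """Sum the confidences of the keywords in the current region's second
    keyword sequence and jump away when the result falls below the
    threshold."""
    seq_conf = sum(keyword_confidences[k]
                   for k in second_sequences[current_region])
    return seq_conf < threshold  # True means: leave the current region
```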
For specific implementation details of this embodiment, reference may be made to the embodiment shown in Fig. 2. The difference from the embodiment of Fig. 2 is that region identification there relies mainly on judging the region keywords in the first keyword sequence, whereas region identification in the embodiment of Fig. 4 relies mainly on judging the keywords in the second keyword sequences. Since each second keyword sequence corresponds to a region, if the confidence of the second keyword sequence corresponding to the current region is too low, it can be judged that the presenter has left the current region and entered the next one, so a region jump is performed. With this method, automatic region jumping for presentation content is realized, the labor of manual operation is saved, and the integrity of the presentation is improved.
In one embodiment of the present invention, the region jump can also be controlled in combination with the region keywords in the first keyword sequence. Specifically, the confidence of at least one region keyword in the first keyword sequence can be obtained according to the embodiment shown in Fig. 2, and when a first condition is met, a jump is made to the region associated with the region keyword involved in the first condition. The first condition is at least one of the following: the confidence of one region keyword reaches a threshold; the confidences of a plurality of region keywords associated with the same region all reach a threshold; the sum of the confidences of a plurality of region keywords associated with the same region reaches a threshold.
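The three alternatives of the first condition can be written out directly; the three separate thresholds mirror the first, second, and third thresholds of the claims, and the function name is illustrative.

```python
def first_condition_met(region_kw_confidences, t_single, t_all, t_sum):
    """For the keywords tied to one region, any one of three alternatives
    suffices: a single confidence reaches its threshold, all of them do,
    or their sum does."""
    return (any(c >= t_single for c in region_kw_confidences)
            or all(c >= t_all for c in region_kw_confidences)
            or sum(region_kw_confidences) >= t_sum)
```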
In one embodiment of the present invention, the region can also be confirmed jointly with the region keywords in the first keyword sequence; for the specific method, reference may be made to the embodiment shown in Fig. 2.
In one embodiment of the present invention, the region jump can also be controlled according to the confidence of another second keyword sequence. For example, if the confidence of a second keyword sequence reaches a threshold, a jump is made to the region corresponding to that second keyword sequence. When the confidence of the second keyword sequence corresponding to the current region is very low while the confidence of another second keyword sequence is high, it can be judged that the current region should be left and a jump made to the region corresponding to that other second keyword sequence.
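A minimal sketch of this jump rule follows; representing the sequence confidences as a region-to-confidence dict is an assumption made for illustration.

```python
def jump_target(seq_confidences, current_region, high_threshold):
    """Jump to the region whose second keyword sequence's confidence
    reaches the threshold, other than the current region; None if no
    such region exists."""
    for region, conf in seq_confidences.items():
        if region != current_region and conf >= high_threshold:
            return region
    return None
```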
In one embodiment of the present invention, keywords in the presentation content can also be marked. Specifically, if the confidence of a keyword reaches a threshold, the keyword corresponding to the current speech segment is determined to be that keyword, and that keyword is marked in the presentation content.
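The marking decision reduces to a small selection routine; breaking ties by highest confidence is an assumption, as the embodiment does not specify it.

```python
def keyword_to_mark(keyword_confidences, threshold):
    """Pick the keyword whose confidence reaches the threshold as the
    keyword of the current speech segment, to be marked in the content."""
    hits = [(c, kw) for kw, c in keyword_confidences.items() if c >= threshold]
    if not hits:
        return None
    return max(hits)[1]  # highest-confidence hit
```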
In one embodiment of the present invention, the confidence of a keyword can also be changed according to the second phoneme sequence. For the specific method, reference may be made to the embodiment shown in Fig. 2.
The embodiment shown in Fig. 4 can also enjoy the advantages of the two-layer model network; its specific implementation may refer to the embodiment shown in Fig. 2 and is not repeated here.
As shown in Fig. 5, an embodiment of the present invention provides a device 500 for data processing. The device 500 comprises: a text acquisition module 510, configured to obtain text information corresponding to presentation content, wherein the presentation content comprises a plurality of regions; a text analysis module 520, configured to perform text analysis on the text information to obtain a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one of the plurality of regions; a speech acquisition module 530, configured to obtain speech information related to the presentation content, the speech information comprising at least a current speech segment; and a first speech analysis module 540, configured to use a first model network to analyze the current speech segment so as to judge the region corresponding to the current speech segment, wherein the first model network comprises the first keyword sequence.
According to one embodiment of the invention, the first speech analysis module 540 comprises: a first confidence submodule, configured to obtain, according to the first keyword sequence, the confidence of at least one region keyword in the first keyword sequence, where a region keyword with higher similarity to the current speech segment receives a larger confidence; and a region determination submodule, configured to determine, if a first condition is met, that the region corresponding to the current speech segment is the region associated with the region keyword involved in the first condition; wherein the first condition comprises at least one of the following: the confidence of one region keyword reaches a threshold; the confidences of a plurality of region keywords associated with the same region all reach a threshold; the sum of the confidences of a plurality of region keywords associated with the same region reaches a threshold.
According to one embodiment of the invention, the first model network further comprises a first phoneme sequence, and the first speech analysis module 540 further comprises: a first phoneme submodule, configured to obtain, according to the first phoneme sequence, at least one phoneme adjacent to the current speech segment; a first similarity judgment submodule, configured to judge the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one region keyword, where the text information corresponding to a region keyword comprises the context of that region keyword in the text information; and a first adjustment submodule, configured to raise, if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one region keyword reaches a threshold, the confidence of the region keyword whose pronunciation similarity reaches the threshold.
According to one embodiment of the invention, the device 500 further comprises: a keyword module, configured to obtain a plurality of second keyword sequences, wherein at least one second keyword sequence corresponds to at least one of the plurality of regions and comprises at least one keyword; and a second speech analysis module, configured to use a second model network to analyze the current speech segment so as to judge the keyword corresponding to the current speech segment, the second model network comprising the second keyword sequences.
In one embodiment of the invention, the second model network further comprises a second phoneme sequence, and the second speech analysis module comprises: a second phoneme submodule, configured to obtain, according to the second phoneme sequence, at least one phoneme adjacent to the current speech segment; a second confidence submodule, configured to obtain the confidence of at least one keyword in the second keyword sequences, where a keyword with higher similarity to the current speech segment receives a larger confidence; a candidate judgment submodule, configured to determine, if the confidence of at least one keyword reaches a fifth threshold, the keyword whose confidence reaches the fifth threshold to be a candidate keyword; a second similarity judgment submodule, configured to judge the pronunciation similarity between the at least one phoneme and the text information corresponding to the candidate keyword, where the text information corresponding to the candidate keyword comprises the context of the candidate keyword in the text information; and a keyword determination submodule, configured to determine, if the pronunciation similarity between the at least one phoneme and one piece of the text information corresponding to the candidate keyword reaches a sixth threshold, that the keyword corresponding to the current speech segment is the keyword whose context is the text information whose pronunciation similarity reaches the sixth threshold.
In one embodiment of the invention, the device 500 may further comprise a jump module and/or a marking module. The jump module is configured to jump the presentation content to the region corresponding to the current speech information. The marking module is configured to mark, in the presentation content, the keyword corresponding to the current speech information.
In one embodiment of the invention, the device 500 may further comprise other modules configured to perform the other steps of the embodiment shown in Fig. 2; for details, reference may be made to that embodiment and they are not repeated here. The relations among the modules of the device 500 and the technical effects they bring may likewise refer to the embodiment shown in Fig. 2.
The above embodiments relating to Fig. 5 can refer to and combine with one another to obtain further embodiments.
As shown in Fig. 6, an embodiment of the present invention provides a device 600 for display. The device 600 comprises: a text acquisition module 610, configured to obtain text information corresponding to presentation content, wherein the presentation content comprises a plurality of regions; a text analysis module 620, configured to perform text analysis on the text information to obtain a plurality of second keyword sequences, wherein at least one second keyword sequence corresponds to at least one of the plurality of regions and comprises at least one keyword; a speech acquisition module 630, configured to obtain speech information related to the presentation content, the speech information comprising at least a current speech segment; a first confidence module 640, configured to obtain the confidence of at least one keyword of at least one second keyword sequence, where a keyword with higher similarity to the current speech segment receives a larger confidence; a second confidence module 650, configured to obtain, according to the keyword confidences, the confidence of the second keyword sequence corresponding to the current region; and a jump module 660, configured to jump away from the current region in response to the confidence of the second keyword sequence corresponding to the current region being below a threshold.
In one embodiment of the invention, the device 600 further comprises: a region keyword module, configured to obtain a first keyword sequence comprising a region keyword associated with at least one of the plurality of regions; and a third confidence module, configured to obtain the confidence of at least one region keyword in the first keyword sequence, where a region keyword with higher similarity to the current speech segment receives a larger confidence. The jump module 660 is specifically configured to jump, if a third condition is met, to the region associated with the region keyword involved in the third condition; wherein the third condition comprises at least one of the following: the confidence of one region keyword reaches a threshold; the confidences of a plurality of region keywords associated with the same region all reach a threshold; the sum of the confidences of a plurality of region keywords associated with the same region reaches a threshold.
In one embodiment of the invention, the jump module 660 is specifically configured to jump, if a second condition is met, to the region corresponding to the second keyword sequence involved in the second condition; wherein the second condition comprises: the confidence of a second keyword sequence reaches a threshold.
In one embodiment of the invention, the device 600 further comprises: a determination module, configured to determine, if the confidence of a keyword reaches a threshold, that the keyword corresponding to the current speech segment is that keyword; and a marking module, configured to mark that keyword in the presentation content.
In one embodiment of the invention, the device 600 further comprises: a phoneme module, configured to obtain, according to the second phoneme sequence, at least one phoneme adjacent to the current speech segment; a similarity judgment module, configured to judge the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one keyword, where the text information corresponding to a keyword comprises the context of that keyword in the text information; and a confidence adjusting module, configured to raise, if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one keyword reaches a threshold, the confidence of the keyword whose pronunciation similarity reaches the threshold.
The embodiments relating to Fig. 6 can refer to and combine with one another to obtain further embodiments. The implementation details of the above device embodiments may refer to the embodiment shown in Fig. 4.
The flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of the systems, methods, and computer program products according to the embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logic function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Various embodiments of the present invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and changes are apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terms used herein are chosen to best explain the principles, practical applications, or technological improvements of the embodiments, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (20)
1. A data processing method, the method comprising:
obtaining text information corresponding to presentation content, the presentation content comprising a plurality of regions;
performing text analysis on the text information to obtain a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one of the plurality of regions;
obtaining speech information related to the presentation content, the speech information comprising at least a current speech segment;
using a first model network to analyze the current speech segment so as to judge the region corresponding to the current speech segment, wherein the first model network comprises the first keyword sequence.
2. The method according to claim 1, wherein analyzing the current voice segment using the first prototype network to determine the region corresponding to the current voice segment comprises:
obtaining a confidence degree of at least one region keyword in the first keyword sequence, wherein a region keyword with a higher similarity to the current voice segment has a larger confidence degree;
if a first condition is met, determining that the region corresponding to the current voice segment is the region associated with the region keyword involved in the first condition;
wherein the first condition comprises at least one of the following:
the confidence degree of one region keyword reaches a first threshold;
the confidence degrees of a plurality of region keywords associated with the same region all reach a second threshold;
the sum of the confidence degrees of a plurality of region keywords associated with the same region reaches a third threshold.
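The three alternative triggers of the first condition in claim 2 can be sketched as follows. This is an illustrative sketch only, not part of the claims; the function name, region identifiers, and threshold values are assumptions.

```python
# Illustrative sketch of claim 2's "first condition": decide the region for
# the current voice segment from region-keyword confidence degrees.
# Keyword names, region ids, and thresholds are assumptions.

def match_region(confidences, keyword_region, t1, t2, t3):
    """confidences: {keyword: confidence}; keyword_region: {keyword: region}.
    Returns the matched region, or None if no condition is met."""
    # Condition A: a single region keyword's confidence reaches the first threshold.
    for kw, c in confidences.items():
        if c >= t1:
            return keyword_region[kw]
    # Group keyword confidences by their associated region.
    by_region = {}
    for kw, c in confidences.items():
        by_region.setdefault(keyword_region[kw], []).append(c)
    for region, cs in by_region.items():
        # Condition B: all keywords of the same region reach the second threshold.
        if len(cs) > 1 and all(c >= t2 for c in cs):
            return region
        # Condition C: the summed confidence of a region's keywords reaches the third threshold.
        if sum(cs) >= t3:
            return region
    return None
```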
3. The method according to claim 2, wherein
the first prototype network further comprises a first phoneme sequence; and
analyzing the current voice segment using the first prototype network to determine the region corresponding to the current voice segment further comprises:
obtaining, according to the first phoneme sequence, at least one phoneme adjacent to the current voice segment;
determining a pronunciation similarity between the at least one phoneme and text information corresponding to the at least one region keyword, the text information corresponding to the at least one region keyword comprising the context of that region keyword in the text information; and
if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one region keyword reaches a fourth threshold, increasing the confidence degree of the region keyword whose pronunciation similarity reaches the fourth threshold.
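Claim 3's confidence boost can be sketched as follows: if phonemes adjacent to the current voice segment sound like a keyword's textual context, that keyword's confidence is raised. This is an illustrative sketch, not part of the claims; the string-based similarity measure, boost factor, and phoneme transcriptions are assumptions.

```python
from difflib import SequenceMatcher

# Sketch of claim 3: raise a region keyword's confidence when the phonemes
# adjacent to the current voice segment sound like that keyword's context
# in the presentation text. Similarity measure and boost are assumptions.

def pronunciation_similarity(phonemes, context_phonemes):
    # Toy measure: similarity ratio over phoneme strings.
    return SequenceMatcher(None, phonemes, context_phonemes).ratio()

def boost_confidences(confidences, contexts, adjacent_phonemes, t4, boost=1.2):
    """contexts: {keyword: phoneme string of its context in the text}."""
    boosted = dict(confidences)
    for kw, ctx in contexts.items():
        if pronunciation_similarity(adjacent_phonemes, ctx) >= t4:
            boosted[kw] = min(1.0, boosted[kw] * boost)
    return boosted
```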
4. The method according to claim 2, wherein
the method further comprises: obtaining a plurality of second keyword sequences, wherein at least one of the second keyword sequences corresponds to at least one of the plurality of regions, the at least one second keyword sequence comprising at least one keyword; and obtaining a confidence degree of at least one keyword of at least one of the second keyword sequences, wherein a keyword with a higher similarity to the current voice segment has a larger confidence degree;
determining, if the first condition is met, that the region corresponding to the current voice segment is the region associated with the region keyword involved in the first condition comprises: if the first condition is met and a second condition is also met, determining that the region corresponding to the current voice segment is the region associated with the region keyword involved in the first condition, and taking the region corresponding to the current voice segment as the current region;
wherein the second condition comprises: the confidence degree of the second keyword sequence corresponding to the current region is less than a fifth threshold, the confidence degree of the second keyword sequence corresponding to the current region being obtained according to the confidence degrees of the keywords included in that second keyword sequence.
5. The method according to claim 2, wherein
the method further comprises: obtaining a plurality of second keyword sequences, wherein at least one of the second keyword sequences corresponds to at least one of the plurality of regions, the at least one second keyword sequence comprising at least one keyword; and analyzing the voice segment using a second prototype network, the second prototype network comprising the second keyword sequences and a second phoneme sequence;
determining, if the first condition is met, that the region corresponding to the current voice segment is the region associated with the region keyword involved in the first condition comprises: if the first condition is met and a third condition is also met, determining that the region corresponding to the current voice segment is the region associated with the region keyword involved in the first condition;
wherein the third condition comprises: when the second prototype network is used to analyze the current voice segment, the number of times an output is obtained according to the second phoneme sequence reaches a sixth threshold.
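Claim 5's third condition counts how often decoding falls back to the phoneme sequence rather than matching a keyword; frequent phoneme-level outputs suggest the speech no longer matches the current region's keywords. A minimal sketch, not part of the claims; the output labels and data shape are assumptions.

```python
# Sketch of claim 5's third condition: while decoding the current voice
# segment with the second prototype network, count outputs produced by the
# (filler) phoneme sequence instead of a keyword; reaching the sixth
# threshold satisfies the condition. Labels are illustrative assumptions.

def third_condition_met(decoder_outputs, t6):
    """decoder_outputs: list of ('keyword', kw) or ('phoneme', p) tuples."""
    phoneme_outputs = sum(1 for kind, _ in decoder_outputs if kind == "phoneme")
    return phoneme_outputs >= t6
```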
6. The method according to claim 1, further comprising:
obtaining a plurality of second keyword sequences, wherein at least one of the second keyword sequences corresponds to at least one of the plurality of regions, the at least one second keyword sequence comprising at least one keyword;
analyzing the current voice segment using a second prototype network to determine the keyword corresponding to the current voice segment, the second prototype network comprising the second keyword sequences.
7. The method according to claim 6, wherein analyzing the current voice segment using the second prototype network to determine the keyword corresponding to the current voice segment comprises:
obtaining a confidence degree of at least one keyword in the second keyword sequences, wherein a keyword with a higher similarity to the current voice segment has a larger confidence degree;
if the confidence degree of at least one keyword reaches a seventh threshold, determining that the keyword corresponding to the current voice segment is the keyword whose confidence degree reaches the seventh threshold.
8. The method according to claim 7, wherein
the second prototype network further comprises a second phoneme sequence; and
analyzing the current voice segment using the second prototype network to determine the keyword corresponding to the current voice segment comprises:
obtaining, according to the second phoneme sequence, at least one phoneme adjacent to the current voice segment;
determining a pronunciation similarity between the at least one phoneme and text information corresponding to the at least one keyword, the text information corresponding to the at least one keyword comprising the context of that keyword in the text information; and
if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one keyword reaches an eighth threshold, increasing the confidence degree of the keyword whose pronunciation similarity reaches the eighth threshold.
9. The method according to claim 7, wherein
the second prototype network further comprises a second phoneme sequence; and
determining, if the confidence degree of the at least one keyword reaches the seventh threshold, that the keyword corresponding to the current voice segment is the keyword whose confidence degree reaches the seventh threshold comprises:
obtaining, according to the second phoneme sequence, at least one phoneme adjacent to the current voice segment;
if the confidence degree of at least one keyword reaches the seventh threshold, determining the keyword whose confidence degree reaches the seventh threshold as a candidate keyword;
determining a pronunciation similarity between the at least one phoneme and text information corresponding to the candidate keyword, the text information corresponding to the candidate keyword comprising the context of the candidate keyword in the text information; and
if the pronunciation similarity between the at least one phoneme and the text information corresponding to one of the candidate keywords reaches a ninth threshold, determining that the keyword corresponding to the current voice segment is the keyword whose context is the text information whose pronunciation similarity reaches the ninth threshold.
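Claim 9's two-stage selection (confidence threshold to form candidates, then context pronunciation similarity to pick among them) can be sketched as follows. An illustrative sketch only, not part of the claims; the similarity measure and phoneme strings are assumptions.

```python
from difflib import SequenceMatcher

# Sketch of claim 9: keywords whose confidence reaches the seventh threshold
# become candidates; the candidate whose textual context sounds most similar
# to the adjacent phonemes (and reaches the ninth threshold) is chosen.
# Names, transcriptions, and thresholds are assumptions.

def pick_keyword(confidences, contexts, adjacent_phonemes, t7, t9):
    # Stage 1: candidates by confidence threshold.
    candidates = [kw for kw, c in confidences.items() if c >= t7]
    # Stage 2: best context pronunciation similarity that reaches t9.
    best_kw, best_sim = None, t9
    for kw in candidates:
        sim = SequenceMatcher(None, adjacent_phonemes, contexts[kw]).ratio()
        if sim >= best_sim:
            best_kw, best_sim = kw, sim
    return best_kw
```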
10. A presentation method, the method comprising:
obtaining text information corresponding to presentation content, the presentation content comprising a plurality of regions;
performing text analysis on the text information to obtain a plurality of second keyword sequences, wherein at least one of the second keyword sequences corresponds to at least one of the plurality of regions, the at least one second keyword sequence comprising at least one keyword;
obtaining voice information related to the presentation content, the voice information comprising at least a current voice segment;
obtaining a confidence degree of at least one keyword of at least one of the second keyword sequences, wherein a keyword with a higher similarity to the current voice segment has a larger confidence degree;
obtaining, according to the confidence degrees of the keywords, a confidence degree of the second keyword sequence corresponding to a current region of the plurality of regions;
in response to the confidence degree of the second keyword sequence corresponding to the current region being less than a tenth threshold, jumping away from the current region.
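The presentation logic of claim 10 can be sketched as follows: a region's sequence confidence is aggregated from its keywords' confidences, and the presentation jumps away when it falls below the tenth threshold. An illustrative sketch, not part of the claims; aggregating by mean is an assumption (the claim only requires that the sequence confidence be obtained from the keyword confidences).

```python
# Sketch of claim 10: aggregate the current region's keyword confidences
# into a sequence confidence; jump away when it drops below the tenth
# threshold. Mean aggregation and the data shape are assumptions.

def sequence_confidence(keyword_confidences):
    return sum(keyword_confidences) / len(keyword_confidences)

def should_jump(region_keyword_confidences, current_region, t10):
    """region_keyword_confidences: {region: [keyword confidences]}."""
    return sequence_confidence(region_keyword_confidences[current_region]) < t10
```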
11. The method according to claim 10, further comprising:
obtaining a first keyword sequence, the first keyword sequence comprising region keywords associated with at least one of the plurality of regions;
obtaining, according to the first keyword sequence, a confidence degree of at least one region keyword in the first keyword sequence, wherein a region keyword with a higher similarity to the current voice segment has a larger confidence degree;
wherein jumping away from the current region comprises: if a fourth condition is met, jumping to the region associated with the region keyword involved in the fourth condition, and taking the region associated with the region keyword involved in the fourth condition as the current region;
wherein the fourth condition comprises at least one of the following:
the confidence degree of one region keyword reaches an eleventh threshold;
the confidence degrees of a plurality of region keywords associated with the same region all reach a twelfth threshold;
the sum of the confidence degrees of a plurality of region keywords associated with the same region reaches a thirteenth threshold.
12. The method according to claim 10, wherein jumping away from the current region comprises: if a fifth condition is met, jumping to the region corresponding to the second keyword sequence involved in the fifth condition, and taking that region as the current region; wherein the fifth condition comprises:
the confidence degree of at least one second keyword sequence reaches a fourteenth threshold.
13. The method according to claim 10, further comprising:
if the confidence degree of at least one keyword reaches a fifteenth threshold, determining that the keyword corresponding to the current voice segment is the keyword whose confidence degree reaches the fifteenth threshold;
marking, in the presentation content, the keyword whose confidence degree reaches the fifteenth threshold.
14. The method according to any one of claims 10 to 13, further comprising:
obtaining, according to a second phoneme sequence, at least one phoneme adjacent to the current voice segment;
determining a pronunciation similarity between the at least one phoneme and text information corresponding to the at least one keyword, the text information corresponding to the at least one keyword comprising the context of that keyword in the text information; and
if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one keyword reaches a sixteenth threshold, increasing the confidence degree of the keyword whose pronunciation similarity reaches the sixteenth threshold.
15. An apparatus for data processing, the apparatus comprising:
a text acquisition module configured to obtain text information corresponding to presentation content, wherein the presentation content comprises a plurality of regions;
a text analysis module configured to perform text analysis on the text information to obtain a first keyword sequence, the first keyword sequence comprising region keywords associated with at least one of the plurality of regions;
a voice acquisition module configured to obtain voice information related to the presentation content, the voice information comprising at least a current voice segment; and
a first speech analysis module configured to analyze the current voice segment using a first prototype network to determine the region corresponding to the current voice segment, wherein the first prototype network comprises the first keyword sequence.
16. The apparatus according to claim 15, wherein the first speech analysis module comprises:
a first confidence submodule configured to obtain a confidence degree of at least one region keyword in the first keyword sequence, wherein a region keyword with a higher similarity to the current voice segment has a larger confidence degree; and
a region determination submodule configured to, if a sixth condition is met, determine that the region corresponding to the current voice segment is the region associated with the region keyword involved in the sixth condition;
wherein the sixth condition comprises at least one of the following:
the confidence degree of one region keyword reaches a seventeenth threshold;
the confidence degrees of a plurality of region keywords associated with the same region all reach an eighteenth threshold;
the sum of the confidence degrees of a plurality of region keywords associated with the same region reaches a nineteenth threshold.
17. The apparatus according to claim 16, wherein
the first prototype network further comprises a first phoneme sequence; and
the first speech analysis module further comprises:
a first phoneme submodule configured to obtain, according to the first phoneme sequence, at least one phoneme adjacent to the current voice segment;
a first similarity judgment submodule configured to determine a pronunciation similarity between the at least one phoneme and text information corresponding to the at least one region keyword, the text information corresponding to the at least one region keyword comprising the context of that region keyword in the text information; and
a first adjustment submodule configured to, if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one region keyword reaches a twentieth threshold, increase the confidence degree of the region keyword whose pronunciation similarity reaches the twentieth threshold.
18. The apparatus according to claim 15, further comprising:
a keyword module configured to obtain a plurality of second keyword sequences, wherein at least one of the second keyword sequences corresponds to at least one of the plurality of regions, the at least one second keyword sequence comprising at least one keyword; and
a second speech analysis module configured to analyze the current voice segment using a second prototype network to determine the keyword corresponding to the current voice segment, the second prototype network comprising the second keyword sequences.
19. The apparatus according to claim 18, wherein
the second prototype network further comprises a second phoneme sequence; and
the second speech analysis module comprises:
a second phoneme submodule configured to obtain, according to the second phoneme sequence, at least one phoneme adjacent to the current voice segment;
a second confidence submodule configured to obtain a confidence degree of at least one keyword in the second keyword sequences, wherein a keyword with a higher similarity to the current voice segment has a larger confidence degree;
a candidate judgment submodule configured to, if the confidence degree of at least one keyword reaches a twenty-first threshold, determine the keyword whose confidence degree reaches the twenty-first threshold as a candidate keyword;
a second similarity judgment submodule configured to determine a pronunciation similarity between the at least one phoneme and text information corresponding to the candidate keyword, the text information corresponding to the candidate keyword comprising the context of the candidate keyword in the text information; and
a keyword determination submodule configured to, if the pronunciation similarity between the at least one phoneme and the text information corresponding to the candidate keyword reaches a twenty-second threshold, determine that the keyword corresponding to the current voice segment is the keyword whose context is the text information whose pronunciation similarity reaches the twenty-second threshold.
20. An apparatus for presentation, the apparatus comprising:
a text acquisition module configured to obtain text information corresponding to presentation content, wherein the presentation content comprises a plurality of regions;
a text analysis module configured to perform text analysis on the text information to obtain a plurality of second keyword sequences, wherein at least one of the second keyword sequences corresponds to at least one of the plurality of regions, the at least one second keyword sequence comprising at least one keyword;
a voice acquisition module configured to obtain voice information related to the presentation content, the voice information comprising at least a current voice segment;
a first confidence module configured to obtain a confidence degree of at least one keyword of at least one of the second keyword sequences, wherein a keyword with a higher similarity to the current voice segment has a larger confidence degree;
a second confidence module configured to obtain, according to the confidence degrees of the keywords, a confidence degree of the second keyword sequence corresponding to a current region of the plurality of regions; and
a jump module configured to, in response to the confidence degree of the second keyword sequence corresponding to the current region being less than a twenty-third threshold, jump away from the current region.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210241787.1A CN103544140A (en) | 2012-07-12 | 2012-07-12 | Data processing method, display method and corresponding devices |
US13/924,832 US9158752B2 (en) | 2012-07-12 | 2013-06-24 | Data processing method, presentation method, and corresponding apparatuses |
US13/943,308 US9158753B2 (en) | 2012-07-12 | 2013-07-16 | Data processing method, presentation method, and corresponding apparatuses |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210241787.1A CN103544140A (en) | 2012-07-12 | 2012-07-12 | Data processing method, display method and corresponding devices |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103544140A true CN103544140A (en) | 2014-01-29 |
Family
ID=49914715
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210241787.1A Pending CN103544140A (en) | 2012-07-12 | 2012-07-12 | Data processing method, display method and corresponding devices |
Country Status (2)
Country | Link |
---|---|
US (2) | US9158752B2 (en) |
CN (1) | CN103544140A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105895085A (en) * | 2016-03-30 | 2016-08-24 | 科大讯飞股份有限公司 | Multimedia transliteration method and system |
CN107886938A (en) * | 2016-09-29 | 2018-04-06 | 中国科学院深圳先进技术研究院 | Virtual reality guides hypnosis method of speech processing and device |
WO2018157789A1 (en) * | 2017-03-02 | 2018-09-07 | 腾讯科技(深圳)有限公司 | Speech recognition method, computer, storage medium, and electronic apparatus |
CN110265018A (en) * | 2019-07-01 | 2019-09-20 | 成都启英泰伦科技有限公司 | A kind of iterated command word recognition method continuously issued |
CN110364142A (en) * | 2019-06-28 | 2019-10-22 | 腾讯科技(深圳)有限公司 | Phoneme of speech sound recognition methods and device, storage medium and electronic device |
CN110770819A (en) * | 2017-06-15 | 2020-02-07 | 北京嘀嘀无限科技发展有限公司 | Speech recognition system and method |
CN111539197A (en) * | 2020-04-15 | 2020-08-14 | 北京百度网讯科技有限公司 | Text matching method and device, computer system and readable storage medium |
CN111954864A (en) * | 2018-04-11 | 2020-11-17 | 微软技术许可有限责任公司 | Automated presentation control |
CN112041905A (en) * | 2018-04-13 | 2020-12-04 | 德沃特奥金有限公司 | Control device for furniture drive and method for controlling furniture drive |
US20180336275A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Intelligent automated assistant for media exploration |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10916258B2 (en) * | 2017-06-30 | 2021-02-09 | Telegraph Peak Technologies, LLC | Audio channel monitoring by voice to keyword matching with notification |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US20190129591A1 (en) * | 2017-10-26 | 2019-05-02 | International Business Machines Corporation | Dynamic system and method for content and topic based synchronization during presentations |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
CN110072118A (en) * | 2018-01-24 | 2019-07-30 | 优酷网络技术(北京)有限公司 | Video matching method and device |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | Virtual assistant operation in multi-device environments |
DK179822B1 (en) | 2018-06-01 | 2019-07-12 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISMISSAL OF ATTENTION-AWARE VIRTUAL ASSISTANT |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
DK180129B1 (en) | 2019-05-31 | 2020-06-02 | Apple Inc. | USER ACTIVITY SHORTCUT SUGGESTIONS |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
DK201970511A1 (en) | 2019-05-31 | 2021-02-15 | Apple Inc | Voice identification in digital assistant systems |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11227599B2 (en) | 2019-06-01 | 2022-01-18 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
WO2021080033A1 (en) * | 2019-10-23 | 2021-04-29 | 엘지전자 주식회사 | Speech analysis method and device |
JP6758732B1 (en) * | 2020-01-06 | 2020-09-23 | 株式会社インタラクティブソリューションズ | Presentation support system |
CN111767391B (en) * | 2020-03-27 | 2024-04-16 | 北京沃东天骏信息技术有限公司 | Target text generation method, device, computer system and medium |
US11183193B1 (en) | 2020-05-11 | 2021-11-23 | Apple Inc. | Digital assistant hardware abstraction |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
US11620993B2 (en) * | 2021-06-09 | 2023-04-04 | Merlyn Mind, Inc. | Multimodal intent entity resolver |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040210443A1 (en) * | 2003-04-17 | 2004-10-21 | Roland Kuhn | Interactive mechanism for retrieving information from audio and multimedia files containing speech |
US20060100851A1 (en) * | 2002-11-13 | 2006-05-11 | Bernd Schonebeck | Voice processing system, method for allocating acoustic and/or written character strings to words or lexical entries |
CN101034455A (en) * | 2006-03-06 | 2007-09-12 | 腾讯科技(深圳)有限公司 | Method and system for implementing online advertisement |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6272461B1 (en) * | 1999-03-22 | 2001-08-07 | Siemens Information And Communication Networks, Inc. | Method and apparatus for an enhanced presentation aid |
US20020099549A1 (en) * | 2000-12-04 | 2002-07-25 | Nguyen Khang Kv. | Method for automatically presenting a digital presentation |
JP4088131B2 (en) * | 2002-03-28 | 2008-05-21 | 富士通株式会社 | Synchronous content information generation program, synchronous content information generation device, and synchronous content information generation method |
US20040210433A1 (en) * | 2003-04-21 | 2004-10-21 | Gidon Elazar | System, method and apparatus for emulating a web server |
US7725318B2 (en) * | 2004-07-30 | 2010-05-25 | Nice Systems Inc. | System and method for improving the accuracy of audio searching |
US7908141B2 (en) * | 2004-09-29 | 2011-03-15 | International Business Machines Corporation | Extracting and utilizing metadata to improve accuracy in speech to text conversions |
US20070106685A1 (en) * | 2005-11-09 | 2007-05-10 | Podzinger Corp. | Method and apparatus for updating speech recognition databases and reindexing audio and video content using the same |
US8171412B2 (en) * | 2006-06-01 | 2012-05-01 | International Business Machines Corporation | Context sensitive text recognition and marking from speech |
US8090570B2 (en) * | 2006-10-26 | 2012-01-03 | Mobile Technologies, Llc | Simultaneous translation of open domain lectures and speeches |
WO2008106655A1 (en) * | 2007-03-01 | 2008-09-04 | Apapx, Inc. | System and method for dynamic learning |
US7549120B1 (en) * | 2008-04-07 | 2009-06-16 | International Business Machines Corporation | Method and system for analyzing a presentation |
US8712776B2 (en) * | 2008-09-29 | 2014-04-29 | Apple Inc. | Systems and methods for selective text to speech synthesis |
US9031839B2 (en) * | 2010-12-01 | 2015-05-12 | Cisco Technology, Inc. | Conference transcription based on conference data |
US8954329B2 (en) * | 2011-05-23 | 2015-02-10 | Nuance Communications, Inc. | Methods and apparatus for acoustic disambiguation by insertion of disambiguating textual information |
WO2013163494A1 (en) * | 2012-04-27 | 2013-10-31 | Interactive Intelligence, Inc. | Negative example (anti-word) based performance improvement for speech recognition |
US9035955B2 (en) * | 2012-05-16 | 2015-05-19 | Microsoft Technology Licensing, Llc | Synchronizing virtual actor's performances to a speaker's voice |
US10019983B2 (en) * | 2012-08-30 | 2018-07-10 | Aravind Ganapathiraju | Method and system for predicting speech recognition performance using accuracy scores |
CN103971678B (en) * | 2013-01-29 | 2015-08-12 | 腾讯科技(深圳)有限公司 | Keyword spotting method and apparatus |
- 2012
  - 2012-07-12 CN CN201210241787.1A patent/CN103544140A/en active Pending
- 2013
  - 2013-06-24 US US13/924,832 patent/US9158752B2/en not_active Expired - Fee Related
  - 2013-07-16 US US13/943,308 patent/US9158753B2/en not_active Expired - Fee Related
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105895085B (en) * | 2016-03-30 | 2019-10-18 | 讯飞智元信息科技有限公司 | Multimedia transliteration method and system |
CN105895085A (en) * | 2016-03-30 | 2016-08-24 | 科大讯飞股份有限公司 | Multimedia transliteration method and system |
CN107886938B (en) * | 2016-09-29 | 2020-11-17 | 中国科学院深圳先进技术研究院 | Virtual reality guidance hypnosis voice processing method and device |
CN107886938A (en) * | 2016-09-29 | 2018-04-06 | 中国科学院深圳先进技术研究院 | Virtual reality guidance hypnosis voice processing method and device |
WO2018157789A1 (en) * | 2017-03-02 | 2018-09-07 | 腾讯科技(深圳)有限公司 | Speech recognition method, computer, storage medium, and electronic apparatus |
CN110770819B (en) * | 2017-06-15 | 2023-05-12 | 北京嘀嘀无限科技发展有限公司 | Speech recognition system and method |
CN110770819A (en) * | 2017-06-15 | 2020-02-07 | 北京嘀嘀无限科技发展有限公司 | Speech recognition system and method |
CN111954864A (en) * | 2018-04-11 | 2020-11-17 | 微软技术许可有限责任公司 | Automated presentation control |
CN111954864B (en) * | 2018-04-11 | 2024-05-14 | 微软技术许可有限责任公司 | Automated presentation control |
CN112041905A (en) * | 2018-04-13 | 2020-12-04 | 德沃特奥金有限公司 | Control device for furniture drive and method for controlling furniture drive |
CN110364142A (en) * | 2019-06-28 | 2019-10-22 | 腾讯科技(深圳)有限公司 | Speech phoneme recognition method and device, storage medium and electronic device |
CN110364142B (en) * | 2019-06-28 | 2022-03-25 | 腾讯科技(深圳)有限公司 | Speech phoneme recognition method and device, storage medium and electronic device |
CN110265018A (en) * | 2019-07-01 | 2019-09-20 | 成都启英泰伦科技有限公司 | Method for recognizing repeated command words issued in succession |
CN111539197A (en) * | 2020-04-15 | 2020-08-14 | 北京百度网讯科技有限公司 | Text matching method and device, computer system and readable storage medium |
CN111539197B (en) * | 2020-04-15 | 2023-08-15 | 北京百度网讯科技有限公司 | Text matching method and device, computer system and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
US9158753B2 (en) | 2015-10-13 |
US20140019121A1 (en) | 2014-01-16 |
US9158752B2 (en) | 2015-10-13 |
US20140019133A1 (en) | 2014-01-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103544140A (en) | Data processing method, display method and corresponding devices | |
US11350178B2 (en) | Content providing server, content providing terminal and content providing method | |
US10210769B2 (en) | Method and system for reading fluency training | |
US9066049B2 (en) | Method and apparatus for processing scripts | |
JP6928642B2 (en) | Audio broadcasting method and equipment | |
CN111160004B (en) | Method and device for establishing sentence-breaking model | |
CN109754783A (en) | Method and apparatus for determining the boundary of audio sentence | |
CN110600002B (en) | Voice synthesis method and device and electronic equipment | |
CN112632326A (en) | Video production method and device based on video script semantic recognition | |
WO2022228235A1 (en) | Method and apparatus for generating video corpus, and related device | |
CN111142667A (en) | System and method for generating voice based on text mark | |
CN110517668A (en) | A kind of Chinese and English mixing voice identifying system and method | |
CN110750996A (en) | Multimedia information generation method and device and readable storage medium | |
CN113225612A (en) | Subtitle generating method and device, computer readable storage medium and electronic equipment | |
KR102553511B1 (en) | Method, device, electronic equipment and storage medium for video processing | |
CN115883919A (en) | Video processing method, video processing device, electronic equipment and storage medium | |
CN113761865A (en) | Sound and text realignment and information presentation method and device, electronic equipment and storage medium | |
CN113221514A (en) | Text processing method and device, electronic equipment and storage medium | |
CN118784942B (en) | Video generation method, electronic device, storage medium and product | |
CN111475708A (en) | Push method, medium, device and computing equipment for follow-up reading content | |
CN111562864B (en) | Picture display method, electronic device and computer readable medium | |
CN114694657A (en) | Method for cutting audio file and related product | |
KR20100014031A (en) | Device and method for making u-contents by easily, quickly and accurately extracting only wanted part from multimedia file | |
CN116153289A (en) | Processing method and related device for speech synthesis marked text | |
CN116156248A (en) | Video generation method, device, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20140129 ||