CN103544140A - Data processing method, display method and corresponding devices - Google Patents

Data processing method, display method and corresponding devices

Info

Publication number
CN103544140A
CN103544140A (application CN201210241787.1A)
Authority
CN
China
Prior art keywords
keyword
region
confidence
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210241787.1A
Other languages
Chinese (zh)
Inventor
张世磊
刘�文
包胜华
陈健
施勤
苏中
秦勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to CN201210241787.1A priority Critical patent/CN103544140A/en
Priority to US13/924,832 priority patent/US9158752B2/en
Priority to US13/943,308 priority patent/US9158753B2/en
Publication of CN103544140A publication Critical patent/CN103544140A/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 — Handling natural language data
    • G06F40/20 — Natural language analysis
    • G06F40/279 — Recognition of textual entities
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/18 — Speech classification or search using natural language modelling
    • G10L15/183 — Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 — Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of speech recognition and discloses a data processing method comprising the following steps: a text message corresponding to display content is acquired, the display content comprising a plurality of regions; text analysis is performed on the text message to obtain a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one of the plurality of regions; voice information related to the display content is acquired, the voice information comprising at least a current speech segment; and the current speech segment is analyzed using a first model network to determine the region corresponding to the current speech segment, the first model network comprising the first keyword sequence. The invention further discloses a display method, a corresponding device for data processing, and a corresponding device for display. With this technical scheme, speech segments can be associated with different regions of the display content, and the display content can jump automatically between regions.

Description

Data processing method, display method, and corresponding devices
Technical field
The present invention relates to the field of speech recognition, and more specifically to a data processing method, a display method, and corresponding devices.
Background technology
With the development of modern society, on more and more occasions people need to accompany a talk or lecture with a presentation in order to aid the audience's understanding or hold its attention. For example, when sales staff demonstrate a product or proposal to a client, they usually rely on electronic slides, audio, and video; engineers use the same means when explaining a technical solution; and in distance teaching, teachers depend on them even more to convey information to students.
At present, during such presentations the displayed content cannot automatically jump to the region corresponding to what the presenter is currently explaining; that is, the presenter's live commentary cannot be associated with the different regions of the displayed content. Jumping between regions of the displayed content therefore requires manual intervention, which raises the labor cost of a presentation and easily interrupts it, making the whole presentation less complete and less fluent.
In view of the above problems in the prior art, a technique is needed that associates live voice information with the different regions of the displayed content.
Summary of the invention
To associate voice information with display content, the present invention provides a data processing method, a display method, a device for data processing, and a device for display.
According to one aspect of the present invention, a data processing method is provided, the method comprising: acquiring a text message corresponding to display content, the display content comprising a plurality of regions; performing text analysis on the text message to obtain a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one of the plurality of regions; acquiring voice information related to the display content, the voice information comprising at least a current speech segment; and analyzing the current speech segment using a first model network to determine the region corresponding to the current speech segment, the first model network comprising the first keyword sequence.
According to another aspect of the present invention, a display method is provided, the method comprising: acquiring a text message corresponding to display content, the display content comprising a plurality of regions; performing text analysis on the text message to obtain a plurality of second keyword sequences, wherein at least one second keyword sequence corresponds to at least one of the plurality of regions and comprises at least one keyword; acquiring voice information related to the display content, the voice information comprising at least a current speech segment; obtaining the confidence of at least one keyword of at least one second keyword sequence, where a keyword more similar to the current speech segment receives a higher confidence; obtaining, from the keyword confidences, the confidence of the second keyword sequence corresponding to the current region among the plurality of regions; and, in response to the confidence of the second keyword sequence corresponding to the current region being less than a tenth threshold, jumping away from the current region.
According to a further aspect of the present invention, a device for data processing is provided, the device comprising: a text acquisition module configured to acquire a text message corresponding to display content, the display content comprising a plurality of regions; a text analysis module configured to perform text analysis on the text message to obtain a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one of the plurality of regions; a voice acquisition module configured to acquire voice information related to the display content, the voice information comprising at least a current speech segment; and a first speech analysis module configured to analyze the current speech segment using a first model network to determine the region corresponding to the current speech segment, the first model network comprising the first keyword sequence.
According to a further aspect of the present invention, a device for display is provided, the device comprising: a text acquisition module configured to acquire a text message corresponding to display content, the display content comprising a plurality of regions; a text analysis module configured to perform text analysis on the text message to obtain a plurality of second keyword sequences, wherein at least one second keyword sequence corresponds to at least one of the plurality of regions and comprises at least one keyword; a voice acquisition module configured to acquire voice information related to the display content, the voice information comprising at least a current speech segment; a first confidence module configured to obtain the confidence of at least one keyword of at least one second keyword sequence, where a keyword more similar to the current speech segment receives a higher confidence; a second confidence module configured to obtain, from the keyword confidences, the confidence of the second keyword sequence corresponding to the current region among the plurality of regions; and a jump module configured to jump away from the current region in response to the confidence of the second keyword sequence corresponding to the current region being less than a twenty-third threshold.
The technical scheme provided by the present invention can associate speech with regions of the display content, so that the display content jumps automatically between regions.
Accompanying drawing explanation
Through the more detailed description of exemplary embodiments of the present disclosure in conjunction with the accompanying drawings, the above and other objects, features, and advantages of the present disclosure will become more apparent. In the exemplary embodiments of the present disclosure, the same reference numerals generally denote the same components.
Fig. 1 shows a block diagram of an exemplary computer system 100 suitable for implementing embodiments of the present invention;
Fig. 2 shows a flow diagram of a data processing method in an embodiment of the present invention;
Fig. 3 shows an example of the first model network and the second model network in an embodiment of the present invention;
Fig. 4 shows a flow diagram of a display method of an embodiment of the present invention;
Fig. 5 shows a structural diagram of a device for data processing of an embodiment of the present invention;
Fig. 6 shows a structural diagram of a device for display of an embodiment of the present invention.
Embodiment
Preferred embodiments of the present disclosure are described below in more detail with reference to the accompanying drawings. Although the drawings show preferred embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey its scope to those skilled in the art.
Fig. 1 shows a block diagram of an exemplary computer system 100 suitable for implementing embodiments of the present invention. As shown in Fig. 1, computer system 100 may comprise: a CPU (central processing unit) 101, RAM (random access memory) 102, ROM (read-only memory) 103, a system bus 104, a hard disk controller 105, a keyboard controller 106, a serial interface controller 107, a parallel interface controller 108, a display controller 109, a hard disk 110, a keyboard 111, a serial peripheral device 112, a parallel peripheral device 113, and a display 114. Among these devices, the CPU 101, RAM 102, ROM 103, hard disk controller 105, keyboard controller 106, serial interface controller 107, parallel interface controller 108, and display controller 109 are coupled to the system bus 104. The hard disk 110 is coupled to the hard disk controller 105, the keyboard 111 to the keyboard controller 106, the serial peripheral device 112 to the serial interface controller 107, the parallel peripheral device 113 to the parallel interface controller 108, and the display 114 to the display controller 109. It should be understood that the structural block diagram of Fig. 1 is given only by way of example and does not limit the scope of the invention; in some cases, devices may be added or removed as appropriate.
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system." Furthermore, in some embodiments the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.
Any combination of one or more computer-readable media may be utilized. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, which execute via the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means that implement the functions/acts specified in the flowchart and/or block diagram blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed thereon so as to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram blocks.
Referring now to Fig. 2, Fig. 2 shows a data processing method provided by an embodiment of the present invention. The method comprises the following steps: step 210, acquire a text message corresponding to the display content; step 220, perform text analysis on the text message to obtain a first keyword sequence; step 230, acquire voice information related to the display content; step 240, analyze the current speech segment using a first model network to determine the region corresponding to the current speech segment.
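Purely as an illustration of how the four steps fit together — not the patent's implementation — the flow might be sketched as follows, with a trivial word-overlap match standing in for the speech analysis of step 240. All function names and example data are hypothetical:

```python
# Illustrative sketch of the Fig. 2 pipeline (steps 210-240). The speech
# recognizer is replaced by a simple text match so the flow is runnable;
# all names and data here are hypothetical, not taken from the patent.

def extract_region_keywords(display_content):
    """Step 220 stand-in: derive keywords per region from its text."""
    return {region: set(text.lower().split())
            for region, text in display_content.items()}

def match_region(keyword_index, transcript):
    """Step 240 stand-in: pick the region whose keywords overlap the
    transcript of the current speech segment the most."""
    words = set(transcript.lower().split())
    best = max(keyword_index, key=lambda r: len(keyword_index[r] & words))
    return best if keyword_index[best] & words else None

display_content = {            # step 210: display content with two regions
    "location": "geographic position jilin province",
    "animals":  "animal resources golden eagle",
}
index = extract_region_keywords(display_content)
print(match_region(index, "the golden eagle lives here"))  # -> animals
```

A real implementation would of course score acoustic models against audio rather than match transcript words, but the region-decision structure is the same.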
In one embodiment of the invention, in step 210 the display content comprises a plurality of regions. Regions may be divided according to different criteria: by topic, by fixed size, or by page, paragraph, and so on; the invention is not limited in this respect. Taking the electronic slides of a product demonstration as the display content, for example, the product's functions may form one region and the product's structure another; if the display content is a document, each paragraph or each first-level heading may form a region; if it is a picture, different people in the picture may form different regions, or each picture may form a region; if it is video or audio, segments of fixed duration, or segments on different topics, may form different regions. In one embodiment of the invention, if the display content is mainly textual, such as electronic slides, step 210 may directly take the text in the display content as the corresponding text message; if the display content is audio or video, step 210 may obtain the corresponding text message by running speech recognition on the presenter's rehearsal, from the subtitles corresponding to the audio or video, or from the script corresponding to the audio or video. Those skilled in the art will appreciate that both the region division and the text message may be adjusted manually.
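As a hedged illustration of one of the division criteria mentioned above — splitting a document into regions at first-level headings — the following sketch assumes a Markdown-style heading convention; the patent itself prescribes no particular format:

```python
# Illustrative only: divide a text document into regions at lines that
# look like first-level headings ("# Title"). The heading convention is
# an assumption of this example, not something fixed by the patent.

def split_into_regions(document):
    regions, title, lines = {}, None, []
    for line in document.splitlines():
        if line.startswith("# "):
            if title is not None:            # close the previous region
                regions[title] = "\n".join(lines).strip()
            title, lines = line[2:].strip(), []
        else:
            lines.append(line)
    if title is not None:                     # close the final region
        regions[title] = "\n".join(lines).strip()
    return regions

doc = "# Location\nJilin Province\n# Animals\nGolden eagle"
print(split_into_regions(doc))
# -> {'Location': 'Jilin Province', 'Animals': 'Golden eagle'}
```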
The text analysis in step 220 may use text analysis techniques known in the prior art, which are not repeated here. The first keyword sequence of step 220 comprises region keywords associated with regions of the display content. A region keyword is a keyword that can be used to identify a region, such as a heading at any level, a region high-frequency word, or a control command word. A region high-frequency word used as a region keyword usually does not appear in other regions. Those skilled in the art will appreciate that when region high-frequency words serve as region keywords, common words such as conjunctions and pronouns can be filtered out, so that they are not selected as region keywords merely because they occur frequently. In one embodiment of the invention, the region keywords can be manually adjusted or specified so that they are better associated with their regions. Take the presentation of a certain forest as an example: the display content comprises several regions, covering the forest's location, the species it contains, its animal resources, its effect on the surrounding climate, and so on. In the example shown in Fig. 3, the first keyword sequence may comprise: geographic position, Jilin Province, plant resources, kahikatea, animal resources, golden eagle, climate effect, humidity, etc. Here the two region keywords "geographic position" and "Jilin Province" are both associated with the forest-location region.
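One hedged reading of "region high-frequency words that do not appear in other regions" is the sketch below: count word frequencies per region, drop a small stop list, and keep only frequent words unique to one region. The stop list and the frequency cut-off are assumptions of the example, not values from the patent:

```python
# Illustrative sketch: choose region keywords as high-frequency words
# that occur in only one region, after filtering common words. The
# stopword list and threshold are assumptions, not from the patent.
from collections import Counter

STOPWORDS = {"the", "and", "of", "it", "a", "is"}

def region_keywords(region_texts, min_count=2):
    counts = {r: Counter(w for w in t.lower().split() if w not in STOPWORDS)
              for r, t in region_texts.items()}
    keywords = {}
    for region, counter in counts.items():
        # words seen anywhere else are disqualified
        others = set().union(*(counts[o] for o in counts if o != region))
        keywords[region] = sorted(w for w, c in counter.items()
                                  if c >= min_count and w not in others)
    return keywords

texts = {
    "location": "jilin jilin province and the province",
    "animals": "golden eagle golden eagle and deer",
}
print(region_keywords(texts))
# -> {'location': ['jilin', 'province'], 'animals': ['eagle', 'golden']}
```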
The first model network may further comprise a first phoneme sequence, which consists of a plurality of phonemes. As is known to those skilled in the art, a phoneme is the smallest unit of speech: for Chinese, for example, the initials and finals; for English, the phonetic symbols. In one embodiment of the invention, the first phoneme sequence comprises all phonemes of a language, for example all the initials and finals of Chinese.
In step 230, the voice information comprises at least a current speech segment. The invention does not restrict how the speech is segmented; any speech segmentation method in the prior art may be used. The voice information related to the display content is usually the presenter's live speech during the presentation. In one embodiment of the invention, this speech can be the presenter's natural language rather than specially issued command statements.
In step 240, analyzing the current speech segment with the first model network can produce a corresponding output. If the current speech segment is judged to be some region keyword in the first keyword sequence, the output can be that region keyword; if it is judged not to be any region keyword in the first keyword sequence, the output is the phoneme string of the speech segment obtained from the first phoneme sequence. In one embodiment of the invention, this judgment can be made by letting the elements of the first keyword sequence and of the first phoneme sequence compete on confidence.
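This competition between keyword models and a phoneme background path is the familiar keyword-filler pattern from keyword spotting. The following toy version is a hedged sketch only: the scores are plain numbers assumed to come from some hypothetical acoustic model, and the decision margin is invented:

```python
# Toy keyword-vs-filler competition: the segment is labelled with a
# region keyword only if the keyword path beats the phoneme background
# path by a margin. Scores are assumed log-likelihoods supplied by a
# hypothetical acoustic model; the margin is an assumption.

def decode_segment(keyword_scores, filler_score, margin=1.0):
    """keyword_scores: {keyword: score}; filler_score: background path."""
    best_kw = max(keyword_scores, key=keyword_scores.get)
    if keyword_scores[best_kw] - filler_score >= margin:
        return ("keyword", best_kw)   # output the recognized region keyword
    return ("phonemes", None)         # fall back to the phoneme output

print(decode_segment({"jilin": -3.0, "eagle": -7.5}, filler_score=-5.0))
# -> ('keyword', 'jilin')
print(decode_segment({"jilin": -6.0, "eagle": -7.5}, filler_score=-5.0))
# -> ('phonemes', None)
```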
In an embodiment of the present invention, if the output is a region keyword, the region corresponding to the current speech segment can be judged to be the region associated with that keyword.
In one embodiment of the invention, a jump step may follow step 240: the display content jumps to the region corresponding to the current speech segment. In this way the display content jumps automatically between regions, reducing manual operation. Optionally, if the region corresponding to the current speech segment is the same as the current region, no jump is performed and the current region remains displayed; if it differs from the current region, the jump is performed so that the corresponding region is displayed and becomes the new current region. When actually jumping, for a document or electronic slides the display can jump directly to the page or section corresponding to the region; for video or audio, it can jump directly to the timestamp corresponding to the region. The page, section, or timestamp associated with each region may be preset or obtained through text analysis.
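A minimal sketch of this optional jump step, assuming a preset mapping from regions to slide pages (the mapping and its values are invented for the example):

```python
# Hedged sketch of the jump step: only jump when the detected region
# differs from the current one. The region->page table is a preset
# mapping invented for this example.

REGION_PAGE = {"location": 1, "animals": 3, "climate": 5}

def maybe_jump(current_region, detected_region):
    """Return (new_current_region, page_to_show_or_None)."""
    if detected_region is None or detected_region == current_region:
        return current_region, None          # stay put, no jump
    return detected_region, REGION_PAGE[detected_region]

print(maybe_jump("location", "animals"))   # -> ('animals', 3)
print(maybe_jump("animals", "animals"))    # -> ('animals', None)
```

For video or audio the table would map regions to timestamps instead of pages, but the decision logic is unchanged.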
Those skilled in the art will appreciate that the data processing method provided by embodiments of the present invention can be used not only to jump the display content automatically between regions, but also for other applications, for example processing the display content by deleting or moving the region corresponding to the current speech segment.
With the method of the above embodiment, the display content can jump automatically between regions according to the presenter's speech, so neither the presenter nor anyone else needs to jump manually during the presentation; the presentation can thus be more complete and fluent, and no coordination between the presenter and other operators is required. Furthermore, because the method can process the presenter's natural language and is not limited to command statements, the whole presentation is more complete and natural, the presenter need not memorize specific command statements, and the method is simpler to apply. In particular, during a remote presentation the audience site can only hear the presenter's voice; the scheme of the above embodiment can analyze that voice information and jump the display content automatically, avoiding the problem that the display content is hard to control remotely.
In an embodiment of the present invention, step 240 may specifically comprise: obtaining the confidence of at least one region keyword in the first keyword sequence, where a region keyword more similar to the current speech segment receives a higher confidence; if the confidence of a region keyword reaches a threshold, the region associated with that keyword is determined to be the region corresponding to the current speech segment. In another embodiment of the invention, if the confidences of several region keywords associated with the same region all reach a threshold, that region is determined to be the region corresponding to the current speech segment; the required number of such keywords can be preset. In yet another embodiment, if the sum of the confidences of several region keywords associated with the same region reaches a threshold, that region is determined to be the corresponding region; the sum can be a direct sum or a weighted sum. Judging from the confidences of several region keywords makes the region decision more accurate and reduces the chance of misjudgment. Those skilled in the art will appreciate that the implementations above are only examples; they can be combined, and the first model network can also be used for speech analysis in other ways.
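The three decision rules just described (a single keyword over a threshold, several same-region keywords over a threshold, and a possibly weighted confidence sum) can be compared side by side in this hedged sketch; all confidences, thresholds, and weights are invented example values:

```python
# Hedged sketch of the three region-decision rules described above.
# Confidences, thresholds, and weights are invented example values.

def single_rule(confs, thr):
    """Any one region keyword over the threshold decides the region."""
    return any(c >= thr for c in confs)

def count_rule(confs, thr, n):
    """At least n same-region keywords over the threshold."""
    return sum(c >= thr for c in confs) >= n

def sum_rule(confs, thr, weights=None):
    """(Weighted) sum of the region's keyword confidences over a threshold."""
    weights = weights or [1.0] * len(confs)
    return sum(w * c for w, c in zip(weights, confs)) >= thr

confs = [0.9, 0.4]   # e.g. "geographic position", "Jilin Province"
print(single_rule(confs, 0.8))                   # -> True
print(count_rule(confs, 0.8, n=2))               # -> False
print(sum_rule(confs, 1.2, weights=[1.0, 0.5]))  # -> False (0.9 + 0.2 = 1.1)
print(sum_rule(confs, 1.2))                      # -> True  (0.9 + 0.4 = 1.3)
```

Combining the rules, as the embodiment allows, would simply mean requiring more than one of these predicates to hold.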
In an embodiment of the present invention, the confidences of all region keywords in the first keyword sequence can be obtained. During speech analysis, the similarity between the current speech segment and every region keyword in the first keyword sequence can be judged, with higher similarity yielding higher confidence; the region keyword with the maximum confidence is then taken, and if its confidence reaches a threshold, the region associated with that keyword is judged to be the region corresponding to the current speech segment. Another implementation compares the current speech segment with the region keywords of the first keyword sequence in order, and as soon as the confidence of some region keyword reaches a threshold, the region associated with that keyword is directly judged to be the corresponding region. Those skilled in the art will appreciate that the similarity between a region keyword and the current speech segment can be pronunciation similarity or text similarity.
In an embodiment of the present invention, the similarity between the current speech segment and the phonemes in the first phoneme sequence may further be calculated, a phoneme with a higher similarity having a larger phoneme degree of confidence, and the phoneme with the maximum degree of confidence, or a phoneme whose degree of confidence reaches a threshold, is obtained. If the degree of confidence of the region keyword obtained by the above method, when compared with the degree of confidence of the obtained phoneme, falls short by a gap that reaches a threshold, it is judged that the current speech segment does not contain any region keyword.
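As a hedged sketch of this gap test, assuming the two confidences are comparable scores in [0, 1] and that the gap is measured by simple subtraction (both assumptions are ours, not details of the embodiment):

```python
def segment_has_region_keyword(best_keyword_conf, best_phoneme_conf, gap_thr):
    """If the best phoneme confidence exceeds the best region-keyword
    confidence by a gap that reaches the threshold, judge that the
    current speech segment contains no region keyword."""
    return (best_phoneme_conf - best_keyword_conf) < gap_thr
```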
In an embodiment of the present invention, the region corresponding to the current speech segment may also be judged in step 240 by means of the first phoneme sequence. Specifically, during speech analysis, at least one phoneme adjacent to the current speech segment may be obtained according to the first phoneme sequence; the pronunciation similarity between this at least one phoneme and the text information corresponding to the at least one region keyword is judged, the text information corresponding to the at least one region keyword comprising the context of that region keyword in the text information; and, if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one region keyword reaches a threshold, the degree of confidence of the region keyword whose pronunciation similarity reaches the threshold is raised. In embodiments of the present invention, the at least one phoneme adjacent to the current speech segment may be obtained regardless of whether the current speech segment contains a region keyword, or only when the current speech segment may contain a region keyword, for example when the degree of confidence of a region keyword is higher than a threshold. Optionally, in order to judge the pronunciation similarity between the adjacent phonemes and the context more accurately, more adjacent phonemes may be obtained. Each region keyword in the first keyword sequence has its context in the text information, that is, its corresponding text information; the obtained adjacent phonemes can be compared with this corresponding text information, and when the pronunciation similarity reaches a threshold, the degree of confidence of the corresponding region keyword is raised. Those skilled in the art will understand that this scheme has other implementations; for example, only the corresponding text information with the highest pronunciation similarity may be selected, and the degree of confidence of the region keyword corresponding to that text information raised. Alternatively, the adjustment of the degree of confidence may differ according to the pronunciation similarity: the higher the pronunciation similarity, the more the degree of confidence is raised. Adjusting the degrees of confidence of the region keywords makes the determination of the region more accurate. Moreover, because it is pronunciation similarity rather than text similarity that is judged, the method can also be used when the presenter mispronounces words or speaks with an accent.
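One way to picture this confidence adjustment is the routine below; the proportional boost, the cap at 1.0, and representing the context match by a single similarity score are assumptions of this sketch, not details taken from the embodiment:

```python
def boost_confidence(keyword_conf, context_similarity, sim_thr, max_boost=0.2):
    """Raise a region keyword's degree of confidence when the phonemes
    adjacent to the current speech segment pronounce similarly to the
    keyword's textual context; a higher similarity yields a larger boost."""
    if context_similarity >= sim_thr:
        # boost proportional to the similarity, capped at full confidence
        return min(1.0, keyword_conf + max_boost * context_similarity)
    return keyword_conf
```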
Those skilled in the art will understand that the above embodiments are all described with the example that a higher similarity gives a larger degree of confidence; however, the degree of confidence may also be set inversely, so that a higher similarity gives a lower degree of confidence, in which case the corresponding judgment conditions are likewise reversed.
In an embodiment of the present invention, not only can a one-layer prototype network be established in the above manner, but a two-layer structure comprising the first and second prototype networks can also be established. With this two-layer structure, not only can keywords in the presentation content be judged, but the accuracy of region identification can also be further improved. An example of the second prototype network is shown in Fig. 3. The establishment of the two-layer structure is described in detail below.
In the embodiment shown in Fig. 2, the method may further comprise: obtaining a plurality of second keyword sequences, at least one of which corresponds to at least one region of the plurality of regions and comprises at least one keyword; and, according to the plurality of second keyword sequences, obtaining the degree of confidence of at least one keyword of at least one of the second keyword sequences, where a keyword with a higher similarity to the current speech segment has a larger degree of confidence. Accordingly, when judging the region corresponding to the current speech segment, not only the degrees of confidence of the region keywords but also the degree of confidence of the second keyword sequence corresponding to the current region is judged. Specifically, it is judged whether the degree of confidence of the second keyword sequence corresponding to the current region is less than a threshold; if it is less than this threshold, and the degree of confidence of a region keyword satisfies the requirement described in the above embodiments, the region corresponding to the current speech segment is judged to be the region associated with the region keyword that satisfies the condition. The degree of confidence of a second keyword sequence is obtained from the degrees of confidence of the keywords included in that second keyword sequence, for example as their direct sum or weighted sum. It can be seen that using the second keyword sequences in the second prototype network to assist region confirmation further strengthens the accuracy of region confirmation.
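The aggregation of a second keyword sequence's confidence, and the combined check described above, might look as follows; the equal default weights, the function names, and the example thresholds are assumptions of this sketch:

```python
def sequence_confidence(keyword_confs, weights=None):
    """Confidence of a second keyword sequence: the direct or weighted
    sum of the confidences of its included keywords."""
    w = weights or [1.0] * len(keyword_confs)
    return sum(c * wi for c, wi in zip(keyword_confs, w))

def confirm_region_switch(current_seq_confs, seq_thr, region_kw_conf, kw_thr):
    """Judge a new region only when the current region's second-sequence
    confidence is below its threshold AND a region keyword of the new
    region satisfies its own confidence requirement."""
    return sequence_confidence(current_seq_confs) < seq_thr and region_kw_conf >= kw_thr
```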
In an embodiment of the present invention, the second phoneme sequence in the second prototype network may also be used to assist region confirmation. Here, when the second prototype network is used to analyze the voice information, it is judged whether the number of times output is obtained from the second phoneme sequence reaches a threshold; if it does, and the degrees of confidence of the region keywords satisfy the requirement described in the above embodiments, the region corresponding to the current speech segment is judged to be the region associated with the region keyword that satisfies the condition.
In order to associate the keywords contained in the presentation content with the current speech segment, in an embodiment of the present invention the data processing method further comprises: using the second prototype network to analyze the current speech segment, so as to judge the keyword corresponding to the current speech segment. When using the second prototype network to analyze the voice information, reference may be made to the above embodiments: for example, the degree of confidence of at least one keyword in a second keyword sequence is obtained, a keyword with a higher similarity to the current speech segment having a larger degree of confidence, and the keyword corresponding to the current speech segment is judged according to the degrees of confidence of the keywords. By the above method, keywords in the presentation content can be associated according to the presenter's voice, without the presenter or anyone else having to mark them manually. Optionally, the present embodiment may further comprise a marking step, that is, marking the keyword corresponding to the current speech segment in the presentation content. By automatically marking keywords in the presentation content, the integrity of the presentation is ensured and labor is saved. For example, when the presenter is talking about the geographic position of a forest region and mentions its longitude, the longitude in the presentation content can be marked, thereby drawing the audience's attention. Those skilled in the art will understand that the specific marking technique may adopt the prior art and may take various forms, for example highlighting the keyword, underlining the keyword, or showing the keyword in video content. Moreover, the two-layer structure avoids the problem that too many keywords slow down speech recognition, and can also improve the granularity of speech recognition. Those skilled in the art will understand that, after the keyword corresponding to the current speech segment has been determined, there are other applications, such as recording the keywords, compiling statistics, and so on.
In the above embodiments, the first keyword sequence is set for the regions, and each second keyword sequence comprises the keywords of a region. It will be appreciated that the second keyword sequences and the regions need not be in a one-to-one relationship: some regions may have no corresponding second keyword sequence, and some second keyword sequences may correspond to a plurality of regions; for example, when the keywords of several regions are identical, the same second keyword sequence can be used for all of them. As mentioned in an earlier example, a high-frequency word that appears in a plurality of regions is usually not used as a region keyword, but it can be used as a keyword in a second keyword sequence, because each second keyword sequence serves an individual region. Moreover, the keywords in a second keyword sequence can be adjusted and set manually; for example, a word that the presenter wishes to highlight can also be used as a keyword in a second keyword sequence. Generally, the keywords in a second keyword sequence may be the high-frequency words of that region, or other words that the presenter wishes to mark or emphasize during the presentation.
In an embodiment of the present invention, the second prototype network may further comprise a second phoneme sequence. The second phoneme sequence may be identical to or different from the first phoneme sequence. Likewise, the second phoneme sequence consists of phonemes. The second prototype network may comprise one or more second phoneme sequences, for example one second phoneme sequence corresponding to a plurality of second keyword sequences, or one second phoneme sequence corresponding to each second keyword sequence, where the plurality of second phoneme sequences may be the same or different.
In an embodiment of the present invention, after the region corresponding to the current speech segment has been determined by speech analysis, the second prototype network corresponding to that region may be used to analyze the current speech segment, thereby determining the keyword. In another embodiment of the present invention, the first prototype network and the second prototype network may be used simultaneously to analyze the current speech segment, and when the same keyword appears in a plurality of regions, the determined region is used to judge which region's keyword it is.
In an embodiment of the present invention, the degree of confidence of a keyword may also be changed by means of the second phoneme sequence. For example: according to the second phoneme sequence, at least one phoneme adjacent to the current speech segment is obtained; the pronunciation similarity between the at least one phoneme and the text information corresponding to the at least one keyword is judged, the text information corresponding to the at least one keyword comprising the context of that keyword in the text information; and, if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one keyword reaches a threshold, the degree of confidence of the keyword whose pronunciation similarity reaches the threshold is raised.
In an embodiment of the present invention, because a keyword in the presentation content of one region may occur multiple times, the second phoneme sequence may be used to assist in judging more accurately which occurrence of the keyword should be marked. This may specifically be embodied as: obtaining, according to the second phoneme sequence, at least one phoneme adjacent to the current speech segment; if the degree of confidence of at least one keyword reaches a first threshold, determining the keyword whose degree of confidence reaches the first threshold to be a candidate keyword; judging the pronunciation similarity between the at least one phoneme and the text information corresponding to this candidate keyword, the text information corresponding to the candidate keyword comprising the context of the candidate keyword in the text information; and, if the pronunciation similarity between the at least one phoneme and one piece of the text information corresponding to this candidate keyword reaches a second threshold, determining that the keyword corresponding to the current speech segment is the keyword whose context is the text information whose pronunciation similarity reaches the second threshold. In the above method steps there is no fixed order of execution between obtaining the candidate keyword and obtaining the at least one phoneme adjacent to the current speech segment; they may be performed successively or simultaneously. For example, when the presenter tells of the animal resources of a forest, "Siberian tiger" appears at two positions in the text information: one position reads "the mammal resources in the forest include: Siberian tiger, sika deer", and the other reads "animals under first-class state protection include: Siberian tiger, golden eagle". It can be seen that the same keyword appears at two positions within the same region, so the phonemes adjacent to the current speech segment are needed to judge which occurrence the keyword corresponding to the current speech segment actually is. By using the second phoneme sequence, the keywords in the text information can be judged more finely and accurately.
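The "Siberian tiger" disambiguation above can be sketched as follows. Representing each occurrence's context by a precomputed pronunciation-similarity score, and the function and parameter names, are assumptions of this example:

```python
def pick_keyword_occurrence(keyword_conf, first_thr, context_sims, second_thr):
    """keyword_conf: confidence that the current speech segment matched the
    keyword at all; context_sims: pronunciation similarity of the phonemes
    adjacent to the segment against the context of each occurrence of the
    keyword in the text. Returns the index of the chosen occurrence, or None."""
    if keyword_conf < first_thr:
        return None  # does not even qualify as a candidate keyword
    # pick the occurrence whose context best matches the adjacent phonemes
    best = max(range(len(context_sims)), key=lambda i: context_sims[i])
    return best if context_sims[best] >= second_thr else None
```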
In an embodiment of the present invention, the second keyword sequences may be obtained during the same text analysis of the text information that obtains the first keyword sequence; alternatively, the second keyword sequences may be obtained by text analysis after the first keyword sequence has been generated.
In an embodiment of the present invention, predefined text information may be used as the context of a keyword or region keyword. This makes the judgment of regions and the determination of keywords more flexible. For example, if the presenter rehearses before the presentation and finds that some region is misidentified or some keyword is misjudged, the voice information from the rehearsal, or other text information more conducive to the judgment, can be used as the context of the erroneous region keyword or keyword, thereby improving the accuracy of identification during the formal presentation.
The above method embodiments may refer to and be combined with one another to obtain further embodiments. The methods provided by the above embodiments can achieve automatic region jumping and can mark keywords in the presentation content. Moreover, by using the output of the second phoneme sequence, the keywords to be marked can be located more accurately; since this output is produced anyway during speech analysis, no extra workload is added. The second phoneme sequence assists in judging whether to perform a region jump, and the first phoneme sequence allows the keyword corresponding to the current speech segment to be judged more exactly, so that the region corresponding to the current speech segment is obtained more accurately and the region jump performed. Therefore, according to the above embodiments, not only can the jumping and marking of presentation content be automated, but the accuracy of speech recognition can also be improved, without increasing the amount of calculation or consuming more resources.
Thresholds appear in many places in the above embodiments and in the embodiments below; these thresholds may be the same or different, and the present invention places no specific restriction on them.
Fig. 4 shows a presentation method provided by an embodiment of the present invention. The method comprises: step 410, obtaining text information corresponding to presentation content, the presentation content comprising a plurality of regions; step 420, performing text analysis on the obtained text information to obtain a plurality of second keyword sequences, at least one of which corresponds to at least one region of the plurality of regions and comprises at least one keyword; step 430, obtaining voice information relevant to the presentation content; step 440, obtaining, according to the second keyword sequences, the degrees of confidence of at least some keywords in at least some of the second keyword sequences; and step 450, in response to the degree of confidence of the second keyword sequence corresponding to the current region being less than a threshold, jumping away from the current region.
The specific implementation details of the present embodiment may refer to the embodiment shown in Fig. 2. The difference from the embodiment shown in Fig. 2 is that region identification there relies mainly on judging the region keywords in the first keyword sequence, whereas region identification in the embodiment shown in Fig. 4 relies mainly on judging the keywords in the second keyword sequences. It can be seen that, because each second keyword sequence corresponds to a region, if the degree of confidence of the second keyword sequence corresponding to the current region is too low, it can be judged that the presenter has left the current region and entered the next region, and a region jump is therefore performed. By the above method, automatic region jumping for presentation content can be achieved, the labor of manual operation is saved, and the integrity of the presentation is improved.
In an embodiment of the present invention, the region jump may also be controlled in conjunction with the region keywords in the first keyword sequence. Specifically, the degree of confidence of at least one region keyword in the first keyword sequence may be obtained according to the embodiment shown in Fig. 2, and when a first condition is satisfied, a jump is made to the region associated with the region keyword related to the first condition. The first condition is at least one of the following: the degree of confidence of a region keyword reaches a threshold; the degrees of confidence of a plurality of region keywords associated with the same region all reach a threshold; the sum of the degrees of confidence of a plurality of region keywords associated with the same region reaches a threshold.
In an embodiment of the present invention, the region may also be confirmed jointly with the region keywords in the first keyword sequence; for the specific method, reference may be made to the embodiment shown in Fig. 2.
In an embodiment of the present invention, the region jump may also be controlled according to the degrees of confidence of other second keyword sequences. For example, if the degree of confidence of another second keyword sequence reaches a threshold, a jump is made to the region corresponding to that second keyword sequence. When the degree of confidence of the second keyword sequence corresponding to the current region is very low while that of another second keyword sequence is high, it can be judged that the current region should be left and a jump made to the region corresponding to that other second keyword sequence.
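Combining the two jump criteria — leave when the current region's sequence confidence is very low, arrive where another sequence's confidence is high — might be sketched as below; the dictionary of per-region scores and both thresholds are assumptions of the example:

```python
def jump_target(seq_confs_by_region, current, leave_thr, arrive_thr):
    """Return the region to display next: stay if the current region's
    second-sequence confidence is still acceptable, otherwise jump to the
    best other region whose sequence confidence reaches the arrival threshold."""
    if seq_confs_by_region[current] >= leave_thr:
        return current
    best = max(seq_confs_by_region, key=seq_confs_by_region.get)
    if best != current and seq_confs_by_region[best] >= arrive_thr:
        return best
    return current  # no convincing destination: stay put
```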
In an embodiment of the present invention, keywords in the presentation content may also be marked. Specifically, if the degree of confidence of a keyword reaches a threshold, the keyword corresponding to the current speech segment is determined to be that keyword, and that keyword is marked in the presentation content.
In an embodiment of the present invention, the degree of confidence of a keyword may also be changed according to the second phoneme sequence; for the specific method, reference may be made to the embodiment shown in Fig. 2.
The embodiment shown in Fig. 4 can likewise have the advantages of the two-layer structure; its specific implementation may refer to the embodiment shown in Fig. 2 and is not repeated here.
As shown in Fig. 5, an embodiment of the present invention provides a device 500 for data processing. The device 500 comprises: a text acquisition module 510, configured to obtain text information corresponding to presentation content, the presentation content comprising a plurality of regions; a text analysis module 520, configured to perform text analysis on the text information to obtain a first keyword sequence, the first keyword sequence comprising region keywords associated with at least one region of the plurality of regions; a voice acquisition module 530, configured to obtain voice information relevant to the presentation content, the voice information comprising at least a current speech segment; and a first speech analysis module 540, configured to use a first prototype network to analyze the current speech segment so as to judge the region corresponding to the current speech segment, the first prototype network comprising the first keyword sequence.
According to an embodiment of the present invention, the first speech analysis module 540 comprises: a first confidence submodule, configured to obtain, according to the first keyword sequence, the degree of confidence of at least one region keyword in the first keyword sequence, a region keyword with a higher similarity to the current speech segment having a larger degree of confidence; and a region determination submodule, configured to determine, if a first condition is satisfied, that the region corresponding to the current speech segment is the region associated with the region keyword related to the first condition; where the first condition comprises at least one of the following: the degree of confidence of a region keyword reaches a threshold; the degrees of confidence of a plurality of region keywords associated with the same region all reach a threshold; the sum of the degrees of confidence of a plurality of region keywords associated with the same region reaches a threshold.
According to an embodiment of the present invention, the first prototype network further comprises a first phoneme sequence, and the first speech analysis module 540 further comprises: a first phoneme submodule, configured to obtain, according to the first phoneme sequence, at least one phoneme adjacent to the current speech segment; a first similarity judgment submodule, configured to judge the pronunciation similarity between the at least one phoneme and the text information corresponding to the at least one region keyword, the text information corresponding to the at least one region keyword comprising the context of that region keyword in the text information; and a first adjustment submodule, configured to raise, if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one region keyword reaches a threshold, the degree of confidence of the region keyword whose pronunciation similarity reaches the threshold.
According to an embodiment of the present invention, the device 500 further comprises: a keyword module, configured to obtain a plurality of second keyword sequences, at least one of which corresponds to at least one region of the plurality of regions and comprises at least one keyword; and a second speech analysis module, configured to use a second prototype network to analyze the current speech segment so as to judge the keyword corresponding to the current speech segment, the second prototype network comprising the second keyword sequences.
In an embodiment of the present invention, the second prototype network further comprises a second phoneme sequence, and the second speech analysis module comprises: a second phoneme submodule, configured to obtain, according to the second phoneme sequence, at least one phoneme adjacent to the current speech segment; a second confidence submodule, configured to obtain the degree of confidence of at least one keyword in the second keyword sequences, a keyword with a higher similarity to the current speech segment having a larger degree of confidence; a candidate judgment submodule, configured to determine, if the degree of confidence of at least one keyword reaches a fifth threshold, the keyword whose degree of confidence reaches the fifth threshold to be a candidate keyword; a second similarity judgment submodule, configured to judge the pronunciation similarity between the at least one phoneme and the text information corresponding to the candidate keyword, the text information corresponding to the candidate keyword comprising the context of the candidate keyword in the text information; and a keyword determination submodule, configured to determine, if the pronunciation similarity between the at least one phoneme and one piece of the text information corresponding to the candidate keyword reaches a sixth threshold, that the keyword corresponding to the current speech segment is the keyword whose context is the text information whose pronunciation similarity reaches the sixth threshold.
In an embodiment of the present invention, the device 500 may further comprise a jump module and/or a marking module. The jump module is configured to make the presentation content jump to the region corresponding to the current voice information. The marking module is configured to mark, in the presentation content, the keyword corresponding to the current voice information.
In an embodiment of the present invention, the device 500 may further comprise other modules configured to perform other steps of the embodiment shown in Fig. 2; for details, reference may be made to that embodiment, which is not repeated here. The technical effects brought about by the modules included in the device 500 and the relations among them may likewise refer to the embodiment shown in Fig. 2.
The above embodiments relating to Fig. 5 may refer to and be combined with one another to obtain further embodiments.
As shown in Fig. 6, an embodiment of the present invention provides a device 600 for presentation. The device 600 comprises: a text acquisition module 610, configured to obtain text information corresponding to presentation content, the presentation content comprising a plurality of regions; a text analysis module 620, configured to perform text analysis on the text information to obtain a plurality of second keyword sequences, at least one of which corresponds to at least one region of the plurality of regions and comprises at least one keyword; a voice acquisition module 630, configured to obtain voice information relevant to the presentation content, the voice information comprising at least a current speech segment; a first confidence module 640, configured to obtain the degree of confidence of at least one keyword of at least one of the second keyword sequences, a keyword with a higher similarity to the current speech segment having a larger degree of confidence; a second confidence module 650, configured to obtain, according to the degrees of confidence of the keywords, the degree of confidence of the second keyword sequence corresponding to the current region; and a jump module 660, configured to jump away from the current region in response to the degree of confidence of the second keyword sequence corresponding to the current region being less than a threshold.
In an embodiment of the present invention, the device 600 further comprises: a region keyword module, configured to obtain a first keyword sequence, the first keyword sequence comprising region keywords associated with at least one region of the plurality of regions; and a third confidence module, configured to obtain the degree of confidence of at least one region keyword in the first keyword sequence, a region keyword with a higher similarity to the current speech segment having a larger degree of confidence. The jump module 660 is specifically configured to jump, if a third condition is satisfied, to the region associated with the region keyword related to the third condition; where the third condition comprises at least one of the following: the degree of confidence of a region keyword reaches a threshold; the degrees of confidence of a plurality of region keywords associated with the same region all reach a threshold; the sum of the degrees of confidence of a plurality of region keywords associated with the same region reaches a threshold.
In an embodiment of the present invention, the jump module 660 is specifically configured to jump, if a second condition is satisfied, to the region corresponding to the second keyword sequence related to the second condition; where the second condition comprises: the degree of confidence of a second keyword sequence reaches a threshold.
In an embodiment of the present invention, the device 600 further comprises: a determination module, configured to determine, if the degree of confidence of a keyword reaches a threshold, that the keyword corresponding to the current speech segment is that keyword; and a marking module, configured to mark that keyword in the presentation content.
In an embodiment of the present invention, the device 600 further comprises: a phoneme module, configured to obtain, according to the second phoneme sequence, at least one phoneme adjacent to the current speech segment; a similarity judgment module, configured to judge the pronunciation similarity between the at least one phoneme and the text information corresponding to the at least one keyword, the text information corresponding to the at least one keyword comprising the context of that keyword in the text information; and a confidence adjustment module, configured to raise, if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one keyword reaches a threshold, the degree of confidence of the keyword whose pronunciation similarity reaches the threshold.
The embodiments relating to Fig. 6 may refer to and be combined with one another to obtain further embodiments, and the implementation details of the above device embodiments may refer to the embodiment shown in Fig. 4.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by combinations of special-purpose hardware and computer instructions.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration and are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

1. A data processing method, the method comprising:
obtaining text information corresponding to presentation content, the presentation content comprising a plurality of regions;
performing text analysis on the text information to obtain a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one region of the plurality of regions;
obtaining voice information related to the presentation content, the voice information comprising at least a current speech segment; and
analyzing the current speech segment by using a first prototype network to determine the region corresponding to the current speech segment, wherein the first prototype network comprises the first keyword sequence.
2. The method according to claim 1, wherein analyzing the current speech segment by using the first prototype network to determine the region corresponding to the current speech segment comprises:
obtaining a degree of confidence of at least one region keyword in the first keyword sequence, wherein a region keyword with higher similarity to the current speech segment has a larger degree of confidence; and
if a first condition is met, determining that the region corresponding to the current speech segment is the region associated with the region keyword involved in the first condition;
wherein the first condition comprises at least one of the following:
the degree of confidence of one region keyword reaches a first threshold;
the degrees of confidence of a plurality of region keywords associated with a same region all reach a second threshold; and
the sum of the degrees of confidence of a plurality of region keywords associated with a same region reaches a third threshold.
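For illustration only (not part of the claims), the three alternative tests that make up claim 2's first condition can be sketched as follows. The threshold values and all names are illustrative assumptions; the claim fixes only the structure of the condition, not these numbers.

```python
# Sketch of the "first condition": a region is selected when any one of its
# region keywords clears the first threshold, all of them clear the second,
# or their sum clears the third. Thresholds are illustrative.
def region_matches(keyword_confidences, t1=0.9, t2=0.6, t3=1.5):
    """keyword_confidences: degrees of confidence of the region keywords
    associated with one region. True if the first condition is met."""
    return (
        any(c >= t1 for c in keyword_confidences)      # one keyword is enough
        or all(c >= t2 for c in keyword_confidences)   # all are moderately high
        or sum(keyword_confidences) >= t3              # combined evidence suffices
    )

def match_region(regions):
    """regions: {region_id: [confidences of its region keywords]}.
    Returns the first region whose keywords satisfy the condition, else None."""
    for region_id, confs in regions.items():
        if region_matches(confs):
            return region_id
    return None

# slide2 wins: one of its keywords clears the first threshold (0.95 >= 0.9).
result = match_region({"slide1": [0.2, 0.3], "slide2": [0.95, 0.1]})
```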
3. The method according to claim 2, wherein:
the first prototype network further comprises a first phoneme sequence; and
analyzing the current speech segment by using the first prototype network to determine the region corresponding to the current speech segment further comprises:
obtaining, according to the first phoneme sequence, at least one phoneme adjacent to the current speech segment;
determining a pronunciation similarity between the at least one phoneme and text information corresponding to the at least one region keyword, the text information corresponding to the at least one region keyword comprising the context of the at least one region keyword in the text information; and
if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one region keyword reaches a fourth threshold, increasing the degree of confidence of the region keyword whose pronunciation similarity reaches the fourth threshold.
4. The method according to claim 2, wherein:
the method further comprises: obtaining a plurality of second keyword sequences, wherein at least one second keyword sequence of the second keyword sequences corresponds to at least one region of the plurality of regions, and the at least one second keyword sequence comprises at least one keyword; and obtaining a degree of confidence of at least one keyword of at least one second keyword sequence of the second keyword sequences, wherein a keyword with higher similarity to the current speech segment has a larger degree of confidence;
determining, if the first condition is met, that the region corresponding to the current speech segment is the region associated with the region keyword involved in the first condition comprises: if the first condition is met and a second condition is also met, determining that the region corresponding to the current speech segment is the region associated with the region keyword involved in the first condition, and taking the region corresponding to the current speech segment as a current region; and
wherein the second condition comprises: the degree of confidence of the second keyword sequence corresponding to the current region is less than a fifth threshold, the degree of confidence of the second keyword sequence corresponding to the current region being obtained according to the degrees of confidence of the keywords included in the second keyword sequence corresponding to the current region.
5. The method according to claim 2, wherein:
the method further comprises: obtaining a plurality of second keyword sequences, wherein at least one second keyword sequence of the second keyword sequences corresponds to at least one region of the plurality of regions, and the at least one second keyword sequence comprises at least one keyword; and analyzing the speech segment by using a second prototype network, the second prototype network comprising the second keyword sequences and a second phoneme sequence;
determining, if the first condition is met, that the region corresponding to the current speech segment is the region associated with the region keyword involved in the first condition comprises: if the first condition is met and a third condition is also met, determining that the region corresponding to the current speech segment is the region associated with the region keyword involved in the first condition; and
wherein the third condition comprises: when the second prototype network is used to analyze the current speech segment, the number of times an output is obtained according to the second phoneme sequence reaches a sixth threshold.
6. The method according to claim 1, the method further comprising:
obtaining a plurality of second keyword sequences, wherein at least one second keyword sequence of the second keyword sequences corresponds to at least one region of the plurality of regions, and the at least one second keyword sequence comprises at least one keyword; and
analyzing the current speech segment by using a second prototype network to determine the keyword corresponding to the current speech segment, the second prototype network comprising the second keyword sequences.
7. The method according to claim 6, wherein analyzing the current speech segment by using the second prototype network to determine the keyword corresponding to the current speech segment comprises:
obtaining a degree of confidence of at least one keyword in the second keyword sequences, wherein a keyword with higher similarity to the current speech segment has a larger degree of confidence; and
if the degree of confidence of at least one keyword reaches a seventh threshold, determining that the keyword corresponding to the current speech segment is the keyword whose degree of confidence reaches the seventh threshold.
8. The method according to claim 7, wherein:
the second prototype network further comprises a second phoneme sequence; and
analyzing the current speech segment by using the second prototype network to determine the keyword corresponding to the current speech segment comprises:
obtaining, according to the second phoneme sequence, at least one phoneme adjacent to the current speech segment;
determining a pronunciation similarity between the at least one phoneme and text information corresponding to the at least one keyword, the text information corresponding to the at least one keyword comprising the context of the at least one keyword in the text information; and
if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one keyword reaches an eighth threshold, increasing the degree of confidence of the keyword whose pronunciation similarity reaches the eighth threshold.
9. The method according to claim 7, wherein:
the second prototype network further comprises a second phoneme sequence; and
determining, if the degree of confidence of at least one keyword reaches the seventh threshold, that the keyword corresponding to the current speech segment is the keyword whose degree of confidence reaches the seventh threshold comprises:
obtaining, according to the second phoneme sequence, at least one phoneme adjacent to the current speech segment;
if the degree of confidence of at least one keyword reaches the seventh threshold, determining the keyword whose degree of confidence reaches the seventh threshold to be a candidate keyword;
determining a pronunciation similarity between the at least one phoneme and text information corresponding to the candidate keyword, the text information corresponding to the candidate keyword comprising the context of the candidate keyword in the text information; and
if the pronunciation similarity between the at least one phoneme and the text information corresponding to one of the candidate keywords reaches a ninth threshold, determining that the keyword corresponding to the current speech segment is the keyword taking as its context the text information whose pronunciation similarity reaches the ninth threshold.
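For illustration only (not part of the claims), claim 9's disambiguation step can be sketched as follows: when several candidate keywords clear the confidence threshold, the phonemes adjacent to the current speech segment are compared against each candidate's context, and the candidate whose context sounds most similar is kept. The similarity measure and all names are illustrative assumptions.

```python
# Sketch of picking among candidate keywords by the pronunciation similarity
# of their contexts. The phonetic similarity here is an assumed stand-in.
from difflib import SequenceMatcher

def pick_candidate(adjacent_phonemes, candidates, min_similarity=0.5):
    """candidates: {keyword: context_phonemes}. Returns the candidate whose
    context best matches the adjacent phonemes, or None if none is similar
    enough (the ninth-threshold test)."""
    best, best_score = None, min_similarity
    for keyword, context in candidates.items():
        score = SequenceMatcher(None, adjacent_phonemes, context).ratio()
        if score >= best_score:
            best, best_score = keyword, score
    return best
```

Both "name" and "game" might clear the keyword-confidence threshold for a segment sounding like "n ey m"; the context comparison settles the tie.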
10. A presentation method, the method comprising:
obtaining text information corresponding to presentation content, the presentation content comprising a plurality of regions;
performing text analysis on the text information to obtain a plurality of second keyword sequences, wherein at least one second keyword sequence of the second keyword sequences corresponds to at least one region of the plurality of regions, and the at least one second keyword sequence comprises at least one keyword;
obtaining voice information related to the presentation content, the voice information comprising at least a current speech segment;
obtaining a degree of confidence of at least one keyword of at least one second keyword sequence of the second keyword sequences, wherein a keyword with higher similarity to the current speech segment has a larger degree of confidence;
obtaining, according to the degrees of confidence of the keywords, the degree of confidence of the second keyword sequence corresponding to a current region of the plurality of regions; and
in response to the degree of confidence of the second keyword sequence corresponding to the current region being less than a tenth threshold, jumping away from the current region.
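For illustration only (not part of the claims), the jump test of claim 10 can be sketched as follows. The aggregation rule (a mean over keyword confidences) and the threshold value are assumptions; the claim only requires that the sequence confidence be derived from the keyword confidences.

```python
# Sketch of claim 10's jump-away test: aggregate the keyword confidences of
# the current region's second keyword sequence and jump when the aggregate
# falls below a threshold. Names and numbers are illustrative.
def sequence_confidence(keyword_confidences):
    """Aggregate per-keyword confidences into one sequence confidence
    (here: the mean, an assumed choice)."""
    return sum(keyword_confidences) / len(keyword_confidences)

def should_jump_away(current_region, region_sequences, threshold=0.4):
    """True when the speaker no longer appears to be talking about
    current_region, i.e. its sequence confidence fell below the threshold."""
    return sequence_confidence(region_sequences[current_region]) < threshold

# slide1's keywords are barely matched by the speech, so the presentation
# should jump away from it; slide2 would be retained.
sequences = {"slide1": [0.1, 0.2, 0.3], "slide2": [0.7, 0.8]}
```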
11. The method according to claim 10, the method further comprising:
obtaining a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one region of the plurality of regions; and
obtaining, according to the first keyword sequence, a degree of confidence of at least one region keyword in the first keyword sequence, wherein a region keyword with higher similarity to the current speech segment has a larger degree of confidence;
wherein jumping away from the current region comprises: if a fourth condition is met, jumping to the region associated with the region keyword involved in the fourth condition, and taking the region associated with the region keyword involved in the fourth condition as the current region;
wherein the fourth condition comprises at least one of the following:
the degree of confidence of one region keyword reaches an eleventh threshold;
the degrees of confidence of a plurality of region keywords associated with a same region all reach a twelfth threshold; and
the sum of the degrees of confidence of a plurality of region keywords associated with a same region reaches a thirteenth threshold.
12. The method according to claim 10, wherein jumping away from the current region comprises: if a fifth condition is met, jumping to the region corresponding to the second keyword sequence involved in the fifth condition, and taking the region corresponding to the second keyword sequence involved in the fifth condition as the current region; wherein the fifth condition comprises:
the degree of confidence of at least one second keyword sequence reaches a fourteenth threshold.
13. The method according to claim 10, the method further comprising:
if the degree of confidence of at least one keyword reaches a fifteenth threshold, determining that the keyword corresponding to the current speech segment is the keyword whose degree of confidence reaches the fifteenth threshold; and
marking, in the presentation content, the keyword whose degree of confidence reaches the fifteenth threshold.
14. The method according to any one of claims 10 to 13, the method further comprising:
obtaining, according to a second phoneme sequence, at least one phoneme adjacent to the current speech segment;
determining a pronunciation similarity between the at least one phoneme and text information corresponding to the at least one keyword, the text information corresponding to the at least one keyword comprising the context of the at least one keyword in the text information; and
if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one keyword reaches a sixteenth threshold, increasing the degree of confidence of the keyword whose pronunciation similarity reaches the sixteenth threshold.
15. An apparatus for data processing, the apparatus comprising:
a text acquisition module configured to obtain text information corresponding to presentation content, wherein the presentation content comprises a plurality of regions;
a text analysis module configured to perform text analysis on the text information to obtain a first keyword sequence, the first keyword sequence comprising a region keyword associated with at least one region of the plurality of regions;
a voice acquisition module configured to obtain voice information related to the presentation content, the voice information comprising at least a current speech segment; and
a first speech analysis module configured to analyze the current speech segment by using a first prototype network to determine the region corresponding to the current speech segment, wherein the first prototype network comprises the first keyword sequence.
16. The apparatus according to claim 15, wherein the first speech analysis module comprises:
a first confidence submodule configured to obtain a degree of confidence of at least one region keyword in the first keyword sequence, wherein a region keyword with higher similarity to the current speech segment has a larger degree of confidence; and
a region determining submodule configured to, if a sixth condition is met, determine that the region corresponding to the current speech segment is the region associated with the region keyword involved in the sixth condition;
wherein the sixth condition comprises at least one of the following:
the degree of confidence of one region keyword reaches a seventeenth threshold;
the degrees of confidence of a plurality of region keywords associated with a same region all reach an eighteenth threshold; and
the sum of the degrees of confidence of a plurality of region keywords associated with a same region reaches a nineteenth threshold.
17. The apparatus according to claim 16, wherein:
the first prototype network further comprises a first phoneme sequence; and
the first speech analysis module further comprises:
a first phoneme submodule configured to obtain, according to the first phoneme sequence, at least one phoneme adjacent to the current speech segment;
a first similarity determining submodule configured to determine a pronunciation similarity between the at least one phoneme and text information corresponding to the at least one region keyword, the text information corresponding to the at least one region keyword comprising the context of the at least one region keyword in the text information; and
a first adjusting submodule configured to, if the pronunciation similarity between the at least one phoneme and the text information corresponding to at least one region keyword reaches a twentieth threshold, increase the degree of confidence of the region keyword whose pronunciation similarity reaches the twentieth threshold.
18. The apparatus according to claim 15, the apparatus further comprising:
a keyword module configured to obtain a plurality of second keyword sequences, wherein at least one second keyword sequence of the second keyword sequences corresponds to at least one region of the plurality of regions, and the at least one second keyword sequence comprises at least one keyword; and
a second speech analysis module configured to analyze the current speech segment by using a second prototype network to determine the keyword corresponding to the current speech segment, the second prototype network comprising the second keyword sequences.
19. The apparatus according to claim 18, wherein:
the second prototype network further comprises a second phoneme sequence; and
the second speech analysis module comprises:
a second phoneme submodule configured to obtain, according to the second phoneme sequence, at least one phoneme adjacent to the current speech segment;
a second confidence submodule configured to obtain a degree of confidence of at least one keyword in the second keyword sequences, wherein a keyword with higher similarity to the current speech segment has a larger degree of confidence;
a candidate determining submodule configured to, if the degree of confidence of at least one keyword reaches a twenty-first threshold, determine the keyword whose degree of confidence reaches the twenty-first threshold to be a candidate keyword;
a second similarity determining submodule configured to determine a pronunciation similarity between the at least one phoneme and text information corresponding to the candidate keyword, the text information corresponding to the candidate keyword comprising the context of the candidate keyword in the text information; and
a keyword determining submodule configured to, if the pronunciation similarity between the at least one phoneme and the text information corresponding to one of the candidate keywords reaches a twenty-second threshold, determine that the keyword corresponding to the current speech segment is the keyword taking as its context the text information whose pronunciation similarity reaches the twenty-second threshold.
20. An apparatus for presentation, the apparatus comprising:
a text acquisition module configured to obtain text information corresponding to presentation content, wherein the presentation content comprises a plurality of regions;
a text analysis module configured to perform text analysis on the text information to obtain a plurality of second keyword sequences, wherein at least one second keyword sequence of the second keyword sequences corresponds to at least one region of the plurality of regions, and the at least one second keyword sequence comprises at least one keyword;
a voice acquisition module configured to obtain voice information related to the presentation content, the voice information comprising at least a current speech segment;
a first confidence module configured to obtain a degree of confidence of at least one keyword of at least one second keyword sequence of the second keyword sequences, wherein a keyword with higher similarity to the current speech segment has a larger degree of confidence;
a second confidence module configured to obtain, according to the degrees of confidence of the keywords, the degree of confidence of the second keyword sequence corresponding to a current region of the plurality of regions; and
a jump module configured to, in response to the degree of confidence of the second keyword sequence corresponding to the current region being less than a twenty-third threshold, jump away from the current region.

Priority Applications (3)

Application Number / Priority Date / Filing Date / Title
CN201210241787.1A / 2012-07-12 / 2012-07-12 / Data processing method, display method and corresponding devices
US13/924,832 (US9158752B2) / 2012-07-12 / 2013-06-24 / Data processing method, presentation method, and corresponding apparatuses
US13/943,308 (US9158753B2) / 2012-07-12 / 2013-07-16 / Data processing method, presentation method, and corresponding apparatuses


Publications (1)

Publication Number / Publication Date
CN103544140A / 2014-01-29

Family ID: 49914715



DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
DK201770429A1 (en) 2017-05-12 2018-12-14 Apple Inc. Low-latency intelligent automated assistant
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK201770411A1 (en) 2017-05-15 2018-12-20 Apple Inc. MULTI-MODAL INTERFACES
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
DK201770432A1 (en) 2017-05-15 2018-12-21 Apple Inc. Hierarchical belief states for digital assistants
DK179560B1 (en) 2017-05-16 2019-02-18 Apple Inc. Far-field extension for digital assistant services
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US20180336892A1 (en) 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
US20180336275A1 (en) 2017-05-16 2018-11-22 Apple Inc. Intelligent automated assistant for media exploration
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10916258B2 (en) * 2017-06-30 2021-02-09 Telegraph Peak Technologies, LLC Audio channel monitoring by voice to keyword matching with notification
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US20190129591A1 (en) * 2017-10-26 2019-05-02 International Business Machines Corporation Dynamic system and method for content and topic based synchronization during presentations
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
CN110072118A (en) * 2018-01-24 2019-07-30 优酷网络技术(北京)有限公司 Video matching method and device
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. Virtual assistant operation in multi-device environments
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc Disabling of attention-aware virtual assistant
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. USER ACTIVITY SHORTCUT SUGGESTIONS
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
DK201970511A1 (en) 2019-05-31 2021-02-15 Apple Inc Voice identification in digital assistant systems
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11227599B2 (en) 2019-06-01 2022-01-18 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
WO2021080033A1 (en) * 2019-10-23 2021-04-29 엘지전자 주식회사 Speech analysis method and device
JP6758732B1 (en) * 2020-01-06 2020-09-23 株式会社インタラクティブソリューションズ Presentation support system
CN111767391B (en) * 2020-03-27 2024-04-16 北京沃东天骏信息技术有限公司 Target text generation method, device, computer system and medium
US11183193B1 (en) 2020-05-11 2021-11-23 Apple Inc. Digital assistant hardware abstraction
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones
US11620993B2 (en) * 2021-06-09 2023-04-04 Merlyn Mind, Inc. Multimodal intent entity resolver

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040210443A1 (en) * 2003-04-17 2004-10-21 Roland Kuhn Interactive mechanism for retrieving information from audio and multimedia files containing speech
US20060100851A1 (en) * 2002-11-13 2006-05-11 Bernd Schonebeck Voice processing system, method for allocating acoustic and/or written character strings to words or lexical entries
CN101034455A (en) * 2006-03-06 2007-09-12 腾讯科技(深圳)有限公司 Method and system for implementing online advertisement

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6272461B1 (en) * 1999-03-22 2001-08-07 Siemens Information And Communication Networks, Inc. Method and apparatus for an enhanced presentation aid
US20020099549A1 (en) * 2000-12-04 2002-07-25 Nguyen Khang Kv. Method for automatically presenting a digital presentation
JP4088131B2 (en) * 2002-03-28 2008-05-21 富士通株式会社 Synchronous content information generation program, synchronous content information generation device, and synchronous content information generation method
US20040210433A1 (en) * 2003-04-21 2004-10-21 Gidon Elazar System, method and apparatus for emulating a web server
US7725318B2 (en) * 2004-07-30 2010-05-25 Nice Systems Inc. System and method for improving the accuracy of audio searching
US7908141B2 (en) * 2004-09-29 2011-03-15 International Business Machines Corporation Extracting and utilizing metadata to improve accuracy in speech to text conversions
US20070106685A1 (en) * 2005-11-09 2007-05-10 Podzinger Corp. Method and apparatus for updating speech recognition databases and reindexing audio and video content using the same
US8171412B2 (en) * 2006-06-01 2012-05-01 International Business Machines Corporation Context sensitive text recognition and marking from speech
US8090570B2 (en) * 2006-10-26 2012-01-03 Mobile Technologies, Llc Simultaneous translation of open domain lectures and speeches
WO2008106655A1 (en) * 2007-03-01 2008-09-04 Apapx, Inc. System and method for dynamic learning
US7549120B1 (en) * 2008-04-07 2009-06-16 International Business Machines Corporation Method and system for analyzing a presentation
US8712776B2 (en) * 2008-09-29 2014-04-29 Apple Inc. Systems and methods for selective text to speech synthesis
US9031839B2 (en) * 2010-12-01 2015-05-12 Cisco Technology, Inc. Conference transcription based on conference data
US8954329B2 (en) * 2011-05-23 2015-02-10 Nuance Communications, Inc. Methods and apparatus for acoustic disambiguation by insertion of disambiguating textual information
WO2013163494A1 (en) * 2012-04-27 2013-10-31 Interactive Itelligence, Inc. Negative example (anti-word) based performance improvement for speech recognition
US9035955B2 (en) * 2012-05-16 2015-05-19 Microsoft Technology Licensing, Llc Synchronizing virtual actor's performances to a speaker's voice
US10019983B2 (en) * 2012-08-30 2018-07-10 Aravind Ganapathiraju Method and system for predicting speech recognition performance using accuracy scores
CN103971678B (en) * 2013-01-29 2015-08-12 腾讯科技(深圳)有限公司 Keyword spotting method and apparatus

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105895085B (en) * 2016-03-30 2019-10-18 讯飞智元信息科技有限公司 Multimedia transliteration method and system
CN105895085A (en) * 2016-03-30 2016-08-24 科大讯飞股份有限公司 Multimedia transliteration method and system
CN107886938B (en) * 2016-09-29 2020-11-17 中国科学院深圳先进技术研究院 Virtual reality guidance hypnosis voice processing method and device
CN107886938A (en) * 2016-09-29 2018-04-06 中国科学院深圳先进技术研究院 Virtual reality guidance hypnosis speech processing method and device
WO2018157789A1 (en) * 2017-03-02 2018-09-07 腾讯科技(深圳)有限公司 Speech recognition method, computer, storage medium, and electronic apparatus
CN110770819B (en) * 2017-06-15 2023-05-12 北京嘀嘀无限科技发展有限公司 Speech recognition system and method
CN110770819A (en) * 2017-06-15 2020-02-07 北京嘀嘀无限科技发展有限公司 Speech recognition system and method
CN111954864A (en) * 2018-04-11 2020-11-17 微软技术许可有限责任公司 Automated presentation control
CN111954864B (en) * 2018-04-11 2024-05-14 微软技术许可有限责任公司 Automated presentation control
CN112041905A (en) * 2018-04-13 2020-12-04 德沃特奥金有限公司 Control device for furniture drive and method for controlling furniture drive
CN110364142A (en) * 2019-06-28 2019-10-22 腾讯科技(深圳)有限公司 Speech phoneme recognition method and device, storage medium and electronic device
CN110364142B (en) * 2019-06-28 2022-03-25 腾讯科技(深圳)有限公司 Speech phoneme recognition method and device, storage medium and electronic device
CN110265018A (en) * 2019-07-01 2019-09-20 成都启英泰伦科技有限公司 Recognition method for continuously issued iterated command words
CN111539197A (en) * 2020-04-15 2020-08-14 北京百度网讯科技有限公司 Text matching method and device, computer system and readable storage medium
CN111539197B (en) * 2020-04-15 2023-08-15 北京百度网讯科技有限公司 Text matching method and device, computer system and readable storage medium

Also Published As

Publication number Publication date
US9158753B2 (en) 2015-10-13
US20140019121A1 (en) 2014-01-16
US9158752B2 (en) 2015-10-13
US20140019133A1 (en) 2014-01-16

Similar Documents

Publication Publication Date Title
CN103544140A (en) Data processing method, display method and corresponding devices
US11350178B2 (en) Content providing server, content providing terminal and content providing method
US10210769B2 (en) Method and system for reading fluency training
US9066049B2 (en) Method and apparatus for processing scripts
JP6928642B2 (en) Audio broadcasting method and equipment
CN111160004B (en) Method and device for establishing sentence-breaking model
CN109754783A (en) Method and apparatus for determining audio sentence boundaries
CN110600002B (en) Voice synthesis method and device and electronic equipment
CN112632326A (en) Video production method and device based on video script semantic recognition
WO2022228235A1 (en) Method and apparatus for generating video corpus, and related device
CN111142667A (en) System and method for generating voice based on text mark
CN110517668A (en) Chinese-English mixed speech recognition system and method
CN110750996A (en) Multimedia information generation method and device and readable storage medium
CN113225612A (en) Subtitle generating method and device, computer readable storage medium and electronic equipment
KR102553511B1 (en) Method, device, electronic equipment and storage medium for video processing
CN115883919A (en) Video processing method, video processing device, electronic equipment and storage medium
CN113761865A (en) Sound and text realignment and information presentation method and device, electronic equipment and storage medium
CN113221514A (en) Text processing method and device, electronic equipment and storage medium
CN118784942B (en) Video generation method, electronic device, storage medium and product
CN111475708A (en) Push method, medium, device and computing equipment for follow-up reading content
CN111562864B (en) Picture display method, electronic device and computer readable medium
CN114694657A (en) Method for cutting audio file and related product
KR20100014031A (en) Device and method for creating u-contents by easily, quickly and accurately extracting only the desired parts from a multimedia file
CN116153289A (en) Processing method and related device for speech synthesis marked text
CN116156248A (en) Video generation method, device, electronic device and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20140129)