CN102651217A - Method and equipment for voice synthesis and method for training acoustic model used in voice synthesis - Google Patents
- Publication number
- CN102651217A CN102651217A CN2011100465804A CN201110046580A CN102651217A CN 102651217 A CN102651217 A CN 102651217A CN 2011100465804 A CN2011100465804 A CN 2011100465804A CN 201110046580 A CN201110046580 A CN 201110046580A CN 102651217 A CN102651217 A CN 102651217A
- Authority
- CN
- China
- Prior art keywords
- fuzzy
- contextual feature
- data
- mark
- polyphone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a method and equipment for speech synthesis and a method for training an acoustic model used in speech synthesis. The method for speech synthesis includes the following steps: determining that data generated by text analysis are fuzzy polyphone data; performing fuzzy polyphone prediction on the fuzzy polyphone data so as to output a plurality of candidate pronunciations and their probabilities; generating a fuzzy context feature label based on the candidate pronunciations and their probabilities; determining, based on an acoustic model having a fuzzy decision tree, the model parameters for the fuzzy context feature label; generating speech parameters from the model parameters; and synthesizing speech from the speech parameters. With the method and equipment provided by embodiments of the invention, polyphonic characters in Chinese text whose pronunciation is difficult to predict can be fuzzified, thereby improving the synthesis quality of Chinese polyphones.
Description
Technical field
The present invention relates to speech synthesis, and more specifically to the synthesis of Chinese polyphones.
Background art
Producing speech artificially by means of machinery and equipment is called speech synthesis. Speech synthesis is an important component of human-machine speech communication. Speech synthesis technology allows machines to speak like humans, converting information that is otherwise represented or stored into speech, so that people can conveniently obtain that information through hearing.
Text-to-speech (TTS) conversion systems are currently the subject of a large amount of research and application. In such a system, the text to be synthesized is input; the text analyzer included in the system processes it and outputs pronunciation description information, which comprises phonetic symbols at the segmental level and prosodic marks at the suprasegmental level. The text analyzer first decomposes the text to be synthesized, according to a pronunciation dictionary, into words carrying attribute labels and their pronunciation symbols; then, according to semantic rules and phonetic rules, it determines the sentence structure and intonation for each word and syllable, together with the linguistic and prosodic features of the target speech, such as pauses and part of speech. The pronunciation description information is afterwards input to the synthesizer included in the system, and synthesized speech is output through speech synthesis.
In the prior art, speech synthesis based on the hidden Markov model (HMM) acoustic model is widely used, since it allows the synthesized voice to be modified and converted easily. Speech synthesis is usually divided into a model training part and a synthesis part. In the model training stage, the acoustic parameters of each speech unit in the speech corpus, together with the corresponding segmental and prosodic attribute labels, are used to train a statistical model. These labels derive from linguistic and phonetic knowledge, and the context features they compose describe the corresponding speech attributes (for example tone, part of speech, etc.). In the training stage of the HMM acoustic model, the model parameters are estimated by statistical computation over these speech unit parameters.
In the prior art, given the very large number of context combinations and their great variability, decision-tree clustering is generally adopted. A decision tree can cluster candidate units with similar context features and acoustic features into one class, thereby effectively avoiding data sparseness and effectively reducing the number of models. The question set is the set of questions used in building the decision tree; the question chosen when splitting a node is bound to that node, and thereby determines which units enter the same leaf node. The clustering process refers to a predefined question set: every node of the decision tree is bound to a yes/no question, all candidate units allowed to enter the root node must answer the question bound to the node, and the answer determines whether they enter the left branch or the right branch. Consequently, syllables or phonemes with identical or similar context features end up in the same leaf node of the decision tree. The model corresponding to a node is usually an HMM model or state, described by its parameters. At the same time, clustering is also a learning process for handling new situations encountered during synthesis, so that an optimal match can be achieved. Through training and clustering on the training data, the hidden Markov model (HMM) and the decision tree of the corresponding model are obtained.
At the synthesis stage, the context feature label of a polyphone is obtained through the text analyzer and the context label generator. For this context feature label, the corresponding acoustic model parameters (for example the state sequence of the HMM acoustic model) are found on the trained decision tree. These model parameters are then converted into the relevant speech parameters by a parameter generation algorithm, and the speech is synthesized by a vocoder.
The goal of a speech synthesis system is to synthesize speech that is as intelligible and natural as human speech. For a Chinese speech synthesis system, however, the accuracy of polyphone pronunciation prediction is difficult to guarantee, because the pronunciation of a polyphone is often determined by semantics, and semantic understanding is itself a challenging problem. This interdependence makes it difficult for polyphone prediction to reach a satisfactorily high accuracy. In the prior art, even when the pronunciation prediction is not sufficiently confident, the speech synthesis system generally still provides a single definite pronunciation for the polyphone.
In Chinese, different pronunciations convey different meanings. If the speech synthesis system gives a wrong pronunciation, it causes ambiguity in the listener's understanding and leaves a very bad impression. For speech synthesis systems used in daily life, work and scientific research (for example vehicle navigation, automatic voice information services, broadcasting, robot simulation, etc.), obviously wrong polyphone pronunciations lead to a bad user experience and even to inconvenience in use. Therefore, there is a need in the field of speech synthesis for improved polyphone speech synthesis methods and systems.
Summary of the invention
To this end, embodiments of the present invention provide a method and system for speech synthesis and a method for training an acoustic model used in speech synthesis. Embodiments of the present invention can have the following advantage: when the system is not confident enough to provide the correct pronunciation, the pronunciation of the polyphone is fuzzified without affecting the quality of the other, normal sounds of the overall system; the method thus avoids obvious errors and improves the overall subjective listening quality of the synthesis system.
According to one aspect of the present invention, a method for speech synthesis is provided, which may comprise: determining that data generated by text analysis are fuzzy polyphone data; performing fuzzy polyphone prediction on said fuzzy polyphone data to output a plurality of candidate pronunciations of said fuzzy polyphone data and their probabilities; generating a fuzzy context feature label based on said plurality of candidate pronunciations and their probabilities; determining model parameters for said fuzzy context feature label based on a determined acoustic model having a fuzzy decision tree; generating speech parameters from said model parameters; and synthesizing said speech parameters into speech.
Preferably, the step of generating the fuzzy context feature label may further comprise: determining, based on said probabilities, the degree to which the context label of each candidate pronunciation of said fuzzy polyphone data falls into its class; and generating said fuzzy context feature label through quantization and conversion of said degrees, wherein said fuzzy context feature label is a joint representation of the context labels of said candidate pronunciations.
According to another aspect of the present invention, an apparatus for synthesizing speech is provided, which may comprise: a polyphone prediction unit for predicting the pronunciation of fuzzy polyphone data to output a plurality of candidate pronunciations of said fuzzy polyphone data and their prediction probabilities; a fuzzy context feature label generation unit for generating a fuzzy context feature label based on said plurality of candidate pronunciations and their probabilities; a determination unit for determining model parameters for said fuzzy context feature label based on a determined acoustic model having a fuzzy decision tree; a parameter generator for generating speech parameters from said model parameters; and a synthesizer for synthesizing said speech parameters into speech.
Preferably, said fuzzy context feature label generation unit can further be configured to: determine, based on said probabilities, the degree to which the context label of each candidate pronunciation of said fuzzy polyphone data falls into its class; and generate said fuzzy context feature label through quantization and conversion of said degrees, wherein said fuzzy context feature label is a joint representation of the context labels of said candidate pronunciations.
According to another aspect of the present invention, a system for synthesizing speech is provided, which may comprise: means for determining that data generated by text analysis are fuzzy polyphone data; means for performing fuzzy polyphone prediction on said fuzzy polyphone data to output a plurality of candidate pronunciations of said fuzzy polyphone data and their probabilities; means for generating a fuzzy context feature label based on said plurality of candidate pronunciations and their probabilities; means for determining model parameters for said fuzzy context feature label based on an acoustic model having a fuzzy decision tree; means for generating speech parameters from said model parameters; and means for synthesizing said speech parameters into speech.
According to another aspect of the present invention, a method for training an acoustic model is provided, which may comprise: training each speech unit in a speech corpus to generate an acoustic model, said speech units comprising acoustic parameters and context labels; performing decision tree clustering on the context combinations to generate an acoustic model having a decision tree; determining fuzzy data in the speech corpus based on said acoustic model having the decision tree; generating fuzzy context feature labels for said fuzzy data; and performing clustering training on said speech corpus based on said fuzzy context feature labels to generate an acoustic model having a fuzzy decision tree.
Preferably, the step of determining fuzzy data may further comprise: evaluating a speech unit; determining the degree to which each candidate context label of said speech unit falls into its class; and, if said degree satisfies a predetermined threshold, determining that said speech unit is fuzzy data.
Preferably, the step of evaluating the speech unit may further comprise: evaluating the scores of the context feature labels of the candidate pronunciations of said speech unit through a model posterior probability or through the distance between model-generated parameters and the speech unit parameters.
Preferably, the step of generating the fuzzy context feature label may further comprise: determining the scores of the candidate context feature labels corresponding to the pronunciations of said speech unit by evaluating said speech unit; determining, based on said scores, the degree to which each candidate context label of said speech unit falls into its class; and generating said fuzzy context feature label through quantization and conversion of said degrees, wherein said fuzzy context feature label is a joint representation of the context labels of said candidate pronunciations.
Preferably, the step of performing clustering training based on said fuzzy context feature labels may further comprise one of the following: training the training set comprising said fuzzy data, based on said fuzzy context feature labels and a preset fuzzy question set, to generate the acoustic model having said fuzzy decision tree; and training each speech unit in said speech corpus again based on a question set and context feature labels, wherein said question set further comprises a preset fuzzy question set, and the context feature labels of the fuzzy data in said speech corpus are said fuzzy context feature labels.
Description of drawings
The objects, features and advantages of the invention will become apparent from the following detailed description of embodiments of the invention in conjunction with the accompanying drawings, in which:
Fig. 1 shows a flowchart of a method for training an acoustic model having a fuzzy decision tree according to an embodiment of the invention.
Fig. 2 shows a flowchart of the processing for determining fuzzy data in the method according to an embodiment of the invention.
Fig. 3 shows the operation of evaluating training data by model posterior probability in the method according to an embodiment of the invention.
Fig. 4 shows the operation of evaluating training data by the distance between model-generated parameters and actual parameters in the method according to an embodiment of the invention.
Fig. 5 illustrates the quantization and conversion operation performed on fuzzy data to generate a fuzzy context according to an embodiment of the invention.
Fig. 6 illustrates a method of synthesizing speech according to an embodiment of the invention.
Fig. 7 is a block diagram of an apparatus for synthesizing speech according to an embodiment of the invention.
Embodiment
Embodiments of the invention are described in detail below in conjunction with the accompanying drawings.
In general, embodiments of the invention relate to methods and systems for synthesizing speech in electronic equipment (for example telephone systems, mobile terminals, vehicles, automatic voice information service systems, broadcast systems, robots, etc. and/or the like), and to methods of training the acoustic models used therein.
Generally, the basic idea of the invention is the following: for the synthesis of a Chinese polyphone, instead of selecting a single definite candidate pronunciation, the speech of a fuzzy polyphone is fuzzified, thereby avoiding an arbitrary, possibly wrong decision made in advance. In embodiments of the invention, a fuzzy polyphone is a polyphone whose pronunciation is difficult for a prior-art polyphone prediction unit to predict; fuzzy data are speech data in the training speech corpus that are affected by the speaker's continuous-speech coarticulation and occasional pronunciation errors, that satisfy a fuzziness condition (usually a fuzziness threshold defined on a membership function) and that are used for model training; correspondingly, speech for which a candidate pronunciation is difficult to determine is called fuzzy speech. A fuzzy decision tree can be introduced in the training and synthesis stages to better realize this process; fuzzy decision trees are commonly used to handle uncertainty and can help derive more intelligent decisions over complicated and fuzzy boundaries, thereby making the optimal choice under ambiguity. The fuzzified pronunciation is intended to contain the characteristics of each candidate pronunciation, in particular the candidate pronunciations with higher probabilities, so that wrong judgments among candidate pronunciations can be avoided and the probability of synthesizing jarring or wrong speech is reduced.
In embodiments of the invention, in the model training stage, a fuzzy decision tree can be introduced and the speech corpus containing fuzzy data can be further trained to obtain an acoustic model (for example an HMM acoustic model) and the fuzzy decision tree corresponding to this model (for example an HMM acoustic model having a fuzzy decision tree). At the synthesis stage, when the polyphone prediction unit cannot provide a suitable choice, the pronunciation of the word is fuzzified, so that the synthesizer synthesizes speech closer to the candidates with higher predicted probability. The processing at the synthesis stage can operate as follows: the probabilities of a plurality of candidate pronunciations are obtained through the polyphone prediction unit; fuzzy context feature processing is applied to obtain a fuzzy context label having multi-candidate fuzzy characteristics; based on the acoustic model with fuzzy decision tree generated by training, the model parameters corresponding to this fuzzy context label are obtained; these model parameters are converted into the relevant speech parameters by a parameter generation algorithm, and the speech parameters are synthesized into speech by the synthesizer.
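Purely as an orienting sketch of the synthesis-stage flow just described, the following Python pseudocode wires the steps together; every component and function name used here (predictor, make_fuzzy_label, lookup, generate_parameters, vocoder) is a hypothetical placeholder rather than an interface defined by the patent.

def synthesize_word(word, tts):
    """Synthesis-stage flow for one word, following the steps described above.

    `tts` is assumed to bundle the trained components: the polyphone predictor,
    the acoustic model with its fuzzy decision tree, the parameter generation
    algorithm and the vocoder. All attribute and method names are illustrative.
    """
    candidates = tts.predictor.predict_candidates(word)  # [(pronunciation, probability), ...]
    if tts.predictor.is_fuzzy(candidates):
        # Fuzzify: build a joint context label covering the likely candidates.
        label = tts.make_fuzzy_label(candidates)
    else:
        # Confident case: use the single best pronunciation's normal label.
        best = max(candidates, key=lambda c: c[1])[0]
        label = tts.make_label(best)
    model_params = tts.acoustic_model.lookup(label)        # via the (fuzzy) decision tree
    speech_params = tts.generate_parameters(model_params)  # e.g. ML-based parameter generation
    return tts.vocoder.synthesize(speech_params)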
Fig. 1 shows a flowchart of a method for training an acoustic model having a fuzzy decision tree according to an embodiment of the invention. As shown in Fig. 1, at step S110, each speech unit in the speech corpus is trained to generate an acoustic model. In embodiments of the invention, the speech corpus generally consists of pre-recorded reference speech input through a speech input port. Each speech unit comprises acoustic parameters and a context label describing the corresponding segmental and prosodic attributes.
Taking the HMM acoustic model as an example, in the training stage of this model the model parameters are estimated by statistical computation over the speech unit parameters. This is a mature and widely used technique in this field and is not described further here.
At step S120, for the context combinations with their large number of variations, decision-tree clustering is usually applied to the acoustic model to generate an acoustic model having a decision tree, for example a CART (Classification and Regression Tree). Clustering effectively avoids data sparseness and reduces the number of models. At the same time, clustering is also a learning process for handling new situations encountered during synthesis, so that an optimal match can be achieved. The clustering process refers to a predefined question set. The question set is the set of questions used in building the decision tree; the question chosen when splitting a node is bound to that node and thereby determines which units enter the same leaf node. The question set can differ according to the concrete application environment. For example, Chinese has five tone classes {1, 2, 3, 4, 5}, and each class can serve as a question of the decision tree; when the tone of a polyphone is being determined, the question set can be set as shown in Table 1:
Table 1: Questions used in the question set and their values
In code, this is:
QS "phntone==1" { "*|phntone=1|*" }   (Is the tone of class 1?)
QS "phntone==2" { "*|phntone=2|*" }   (Is the tone of class 2?)
QS "phntone==3" { "*|phntone=3|*" }   (Is the tone of class 3?)
QS "phntone==4" { "*|phntone=4|*" }   (Is the tone of class 4?)
QS "phntone==5" { "*|phntone=5|*" }   (Is the tone of class 5?)
To those skilled in the art, the use of decision trees is a common technique in this field; various decision trees can be adopted according to the application environment, and various question sets can be set, on the basis of which the decision tree is built by node splitting. This is not described further here.
In embodiments of the invention, the hidden Markov model (HMM) and the decision tree of the corresponding model can be obtained by training and clustering the training data. However, those skilled in the art should appreciate that other types of acoustic model can also be used in the fuzzification processing of embodiments of the invention.
In embodiments of the invention, the speech unit can be a phoneme, a syllable, an initial/final, or another unit; for simplicity, only initials and finals are illustrated as the speech unit. However, those skilled in the art should appreciate that embodiments of the invention are not limited thereto.
In embodiments of the invention, the acoustic model is also trained again based on fuzzy data. For example, at step S140, the fuzzy data in the speech corpus are determined with respect to the above acoustic model (hidden Markov HMM model) having a decision tree. In embodiments of the invention, all possible labels of certain polyphone-related contexts can be generated, the ability of each label to characterize the real data can be evaluated on that real data, and whether the speech data belong to the fuzzy data can then be determined according to this evaluation result. Afterwards, at step S160, fuzzy context feature labels are generated for the qualifying fuzzy data. Then, at step S180, the fuzzy decision tree is trained on the speech corpus containing the fuzzy data based on these fuzzy context feature labels, to generate an acoustic model having a fuzzy decision tree.
Fig. 2 shows a flowchart of the processing for determining fuzzy data in the method according to an embodiment of the invention. As shown in Fig. 2, at step S210, all possible context feature labels of the speech data in the training corpus are generated. "All possible context labels" means that, for the attributes to be fuzzified for the polyphone, such as tone, all possibilities are generated. In embodiments of the invention, no attention is paid to whether they conform to linguistic norms; all possibilities are generated. For example, for the polyphone "为" (wei), whose theoretical pronunciations are wei4 and wei2, labels are nevertheless generated for all tones, i.e. wei1, wei2, wei3, wei4 and wei5. The context feature label characterizes the linguistic and phonetic attributes of the speech segment, for example the actual initial and final of the word unit, the tone, the syllable, the positions within the syllable, word, phrase and sentence, related information about the preceding and following units, and the type of the sentence, etc. Tone is a key feature of polyphones; taking tone as an example, Mandarin has five tones, so this training data item can have five parallel context feature labels. Those skilled in the art should appreciate that possible context feature labels can also be generated for the different pronunciations of a polyphone, and are handled similarly to tone.
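As a small illustrative sketch (an assumption of how this enumeration might look in code, not something defined by the patent), all parallel tone labels for a syllable can be generated regardless of linguistic validity:

def all_tone_labels(syllable_base: str, n_tones: int = 5) -> list[str]:
    """Generate all parallel candidate tone labels for a polyphone, ignoring
    whether they are linguistically valid (e.g. 'wei1' .. 'wei5' for '为')."""
    return [f"{syllable_base}{t}" for t in range(1, n_tones + 1)]

print(all_tone_labels("wei"))  # ['wei1', 'wei2', 'wei3', 'wei4', 'wei5']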
At step S220, the training data are evaluated based on the acoustic model trained at step S120 (for example the HMM model having a decision tree). For example, for a speech unit with N parallel context feature labels, the corresponding N scores s[1], ..., s[k], ..., s[N] can be computed in turn; these scores reflect the ability of each label to characterize the actual parameters. In embodiments of the invention, any method that quantifies the evaluation can be adopted, for example computing the posterior probability under the model, or the distance between the model-generated parameters and the actual parameters, as described in detail below.
At step S230, whether the speech unit is fuzzy data is judged based on the evaluation result, for example the computed scores reflecting characterization power. In embodiments of the invention, data with low evaluation scores can be determined to be fuzzy data and used for further training. Here, a low evaluation score means that, among the parallel context feature labels, no score has a sufficient advantage to prove that its label is actually the only optimal one for this unit.
In embodiments of the invention, the degree to which this speech unit falls into each class can also be calculated from the score corresponding to each context feature label by a membership function. The membership function m_k represents these parallel scores as follows:
where s[k] is the score corresponding to the k-th context feature label and N is the number of context feature labels.
In embodiments of the invention, data satisfying the fuzziness condition (usually a fuzziness threshold defined on the membership function) are fuzzy data. The fuzziness threshold can be fixed; for example, if no candidate among all candidates occupies more than 50% of the total score, the data can be regarded as fuzzy data. Alternatively, the fuzziness threshold can be dynamic; for example, a certain trailing fraction (e.g. 10%) can be chosen according to the ranking of the scores of the classes defined for the current unit over the current database.
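Purely as an illustrative sketch (not part of the patent), the following Python snippet shows one way the membership computation and the fixed 50% threshold test described above could be realized; the proportional normalization of the scores is an assumption suggested by the normalized example values given further below.

from typing import Sequence

def memberships(scores: Sequence[float]) -> list[float]:
    """Normalize the parallel label scores s[1..N] to degrees that sum to 1.

    Assumption: the membership function is a simple proportional normalization,
    as suggested by the 0-1 normalized example (0.05, 0.45, 0.1, 0.2, 0.2)."""
    total = sum(scores)
    return [s / total for s in scores]

def is_fuzzy(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """A unit is fuzzy data if no candidate label dominates.

    The fixed threshold of 0.5 mirrors the '50% of the total score' example;
    a dynamic threshold could be used instead."""
    return max(memberships(scores)) < threshold

# Example: no label reaches 50%, so this unit would be treated as fuzzy data.
print(is_fuzzy([0.05, 0.45, 0.10, 0.20, 0.20]))  # True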
In embodiments of the invention, selecting the fuzzy data from the training database and thereby changing the whole training is advantageous: this process not only generates the data used for fuzzy decision tree training, it also contributes to improving the training accuracy for normal data, without significantly increasing the training burden.
Fig. 3 shows the operation of evaluating training data by model posterior probability in the method according to an embodiment of the invention. In embodiments of the invention, for simplicity, a certain speech unit is taken as an example of the training data. As shown in Fig. 3, for the N possible context feature labels of this speech unit (16a-1 label 1, ..., 16a-k label k, ..., 16a-N label N), the corresponding acoustic models (21a-1 model 1, ..., 21a-k model k, ..., 21a-N model N) can be found on the model trained at step S120 (for example the HMM model having a decision tree). In embodiments of the invention, the HMM acoustic model is taken as an example to explain the following operation of evaluating the training data. However, it should be understood that embodiments of the invention are not limited thereto.
For a given speech unit, its speech parameter vector sequence is represented as follows:
The posterior probability of the speech parameter vector sequence of this speech unit given the HMM model λ is expressed as:
where Q is the HMM state sequence {q_1, q_2, ..., q_T}.
Each frame of the speech unit is aligned with the model states, and the state index sequence is obtained. The following probability can then be calculated:
where b_j(o_t) is the output probability of the observation o_t at time t for state j of the current model; its Gaussian distribution probability depends on the type of HMM, for example a continuous mixture density HMM.
where ω_ijm is the weight of the i-th mixture component of state j, and μ_ij and Σ_ij are the corresponding mean and covariance.
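For illustration only, here is a sketch of the output probability b_j(o_t) of a continuous mixture density HMM state, assuming diagonal covariances; the patent does not fix the covariance structure, so this is an assumption made for the example.

import numpy as np

def gmm_output_prob(o_t: np.ndarray,
                    weights: np.ndarray,    # (M,) mixture weights of state j
                    means: np.ndarray,      # (M, D) component means
                    variances: np.ndarray   # (M, D) diagonal covariances
                    ) -> float:
    """b_j(o_t) for a diagonal-covariance Gaussian mixture (illustrative only)."""
    diff = o_t - means                                  # (M, D)
    log_det = np.sum(np.log(variances), axis=1)         # (M,)
    maha = np.sum(diff * diff / variances, axis=1)      # (M,)
    d = o_t.shape[0]
    log_gauss = -0.5 * (d * np.log(2 * np.pi) + log_det + maha)
    return float(np.sum(weights * np.exp(log_gauss)))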
Alternatively, in embodiments of the invention, the training data can also be evaluated through the distance between the model-generated parameters and the actual parameters. Fig. 4 shows this operation in the method according to an embodiment of the invention. As shown in Fig. 4, a certain speech unit is again taken as an example; similarly to the embodiment above, it has all the possible context feature labels 16b-1 label 1, ..., 16b-k label k, ..., 16b-N label N, and the corresponding models 21a-1 model 1, ..., 21a-k model k, ..., 21a-N model N are determined. At the same time, speech parameters 25b-1 parameter 1, ..., 25b-k parameter k, ..., 25b-N parameter N (the test parameters) are recovered from each set of model parameters. The scores of these possible context feature labels are evaluated by computing the distance between the speech parameters of this unit (the reference parameters) and the recovered parameters.
As stated above, for a given speech unit, its speech parameter vector sequence O is expressed as:
The recovered speech parameters can be expressed as follows:
The actual parameters of the given speech unit and the recovered speech parameters differ in length, T and T' respectively. A linear mapping is first performed between T and T': usually the recovered speech parameters are expanded or compressed to length T. The Euclidean distance between the two is then calculated as follows:
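A rough sketch of this distance-based evaluation as read here, assuming simple linear interpolation for the expand/compress mapping between the two lengths (the patent does not specify the mapping in detail):

import numpy as np

def warp_length(params: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly expand or compress a (T', D) parameter sequence to target_len frames."""
    src_len, dim = params.shape
    src_idx = np.linspace(0, src_len - 1, target_len)
    out = np.empty((target_len, dim))
    for d in range(dim):
        out[:, d] = np.interp(src_idx, np.arange(src_len), params[:, d])
    return out

def label_distance(reference: np.ndarray, recovered: np.ndarray) -> float:
    """Euclidean distance between the reference parameters (T, D) and the
    recovered parameters after warping them to the same length."""
    warped = warp_length(recovered, reference.shape[0])
    return float(np.sqrt(np.sum((reference - warped) ** 2)))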
In embodiments of the invention, the fuzzy context label can be generated through quantization and mapping conversion. The fuzzy context label characterizes the linguistic and acoustic features of the current speech unit, and gives a graded fuzzy definition of the relevant attributes of the polyphone to be fuzzified; the quantized score of each label of the speech unit can be converted into a corresponding context degree (for example high, low, etc.), and these are jointly represented to generate the fuzzy context label. Note that in embodiments of the invention the fuzzy context label is generated from objective computation and is not restricted by linguistics; for example, the computation may yield combinations such as wei with tone 1, or wei3 together with tone 5, and so on. The fuzzy context label generated for a speech unit with five tones is illustrated below.
As shown in Fig. 5, suppose the candidate tone of this unit is tone 2, expressed here as tone=2. For each possible context feature label (corresponding to tone = (1, 2, 3, 4, 5)), the membership function described above computes the degree to which it falls into that class. Each membership value is then normalized and quantized to a value between 0 and 1, such as (0.05, 0.45, 0.1, 0.2, 0.2), and its context degree is determined, for example high, middle or low. The context feature labels are then jointly represented as a fuzzy context feature label.
In embodiments of the invention, a threshold can be set, for example threshold=0.2; then only the pronunciation candidates that satisfy this baseline requirement, for example tones 2, 4 and 5, are considered when generating the fuzzy context feature label. The fuzzy context label is generated according to the above distribution of degrees over the tones, for example tone=High2_Low4_Low5.
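Illustrative only: a sketch that joins the candidates surviving the threshold into a label of the form tone=High2_Low4_Low5. The 0.2 threshold follows the example in the text, while the cutoff separating "High" from "Low" degrees is an assumed value chosen for the example.

def fuzzy_context_label(memberships: dict[int, float],
                        threshold: float = 0.2,
                        high_cut: float = 0.4) -> str:
    """Join candidates above `threshold` into a label like 'tone=High2_Low4_Low5'."""
    parts = []
    for tone, m in sorted(memberships.items(), key=lambda kv: -kv[1]):
        if m >= threshold:
            band = "High" if m >= high_cut else "Low"
            parts.append(f"{band}{tone}")
    return "tone=" + "_".join(parts)

# The normalized memberships (0.05, 0.45, 0.1, 0.2, 0.2) for tones 1..5:
print(fuzzy_context_label({1: 0.05, 2: 0.45, 3: 0.10, 4: 0.20, 5: 0.20}))
# -> tone=High2_Low4_Low5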
Those skilled in the art should appreciate that there are many ways of generating a fuzzy context feature label; for example, the quantized fuzzy context can also be obtained from the score distribution of similar segments computed over the whole training corpus, according to a histogram of distribution proportions. It should be noted that the embodiments here are only illustrative, and the way of generating the fuzzy context feature label in embodiments of the invention is not limited thereto.
In embodiments of the invention, generating a fuzzy context feature label preserves the diversity of the fuzzified characteristics, so that rigid classification in uncertain attribute classes caused by bad data can be avoided.
In embodiments of the invention, after the fuzzy context feature labels have been generated for the fuzzy data, fuzzy decision tree training can be performed, and the model parameters of the acoustic model are updated during this decision tree training. Here tone determination is still taken as an example, but those skilled in the art will understand that the method is equally applicable to determining candidate pronunciations for polyphones with different pronunciations. The above instance is used again for brief explanation. As shown in Table 2, the corresponding fuzzy question set can be set as:
Table 2: Questions used in the fuzzy question set and their values
The questions illustrated above can include multiple cases of combined tone classes, and each case can be queried. The combinations of these cases can come from linguistic knowledge, or from the practical combinations that occur during training, etc.
In embodiments of the invention, multiple clustering modes can be adopted, for example re-clustering the whole training corpus, or clustering only a secondary training corpus composed of fuzzy data, etc. When the whole training corpus is re-clustered, if a training data item in the corpus is fuzzy data, its label is replaced by the fuzzy context feature label generated as above, and the corresponding fuzzy questions are added to the question set.
In embodiments of the invention, when the secondary training corpus is clustered, only the fuzzy context labels and the fuzzy question set are used for training, based on the acoustic model and decision tree that have already been trained.
Clustering performed as described above yields the acoustic model having the fuzzy decision tree.
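As a rough illustration of the data preparation for whole-corpus re-clustering described above (the data structures and function names are hypothetical; the decision-tree training itself is not shown):

from dataclasses import dataclass

@dataclass
class Unit:
    context_label: str              # e.g. "tone=2"
    is_fuzzy: bool
    fuzzy_label: str | None = None  # e.g. "tone=High2_Low4_Low5"

def prepare_reclustering(corpus: list[Unit],
                         question_set: list[str],
                         fuzzy_questions: list[str]) -> tuple[list[str], list[str]]:
    """Replace the labels of fuzzy units with their fuzzy context labels and
    extend the question set with the fuzzy questions before re-clustering."""
    labels = [u.fuzzy_label if (u.is_fuzzy and u.fuzzy_label) else u.context_label
              for u in corpus]
    return labels, question_set + fuzzy_questions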
In embodiments of the invention, the acoustic model having the fuzzy decision tree is obtained by training on real speech in order to improve the quality of speech synthesis, so that the fuzzification processing becomes more reasonable, flexible and intelligent, while conventional speech is also trained more accurately.
Fig. 6 illustrates a method of synthesizing speech according to an embodiment of the invention. This method for speech synthesis can comprise: determining that data generated by text analysis are fuzzy polyphone data; performing fuzzy polyphone prediction on said fuzzy polyphone data to output a plurality of candidate pronunciations of said fuzzy polyphone data and their probabilities; generating a fuzzy context feature label based on said plurality of candidate pronunciations and their probabilities; determining model parameters for said fuzzy context feature label based on a determined acoustic model having a fuzzy decision tree; generating speech parameters from said model parameters; and synthesizing said speech parameters into speech.
As shown in Fig. 6, at step S610, it is determined that the data generated by text analysis are fuzzy polyphone data. In embodiments of the invention, the text analyzer performs word segmentation on the text to be synthesized, decomposing it into words with attribute labels and their pronunciation symbols; then, according to semantic rules and phonetic rules, the sentence structure and intonation are determined for each word and syllable, together with prosodic features of the target speech such as pauses. Multi-character words and single-character words can be obtained from the segmentation result. The pronunciation of multi-character words can generally be determined from the dictionary, including any polyphonic characters they contain, and such polyphones are then not treated as fuzzy polyphone data in the present invention. The polyphones in embodiments of the invention generally refer to single characters that still have several possible pronunciations after word segmentation. When pronunciation prediction is performed on such a polyphone, a prediction result is produced for each candidate pronunciation, describing the probability of each pronunciation of the polyphone in the concrete context. There are many ways of judging whether this polyphone constitutes fuzzy polyphone data; for example, a threshold can be set, and a polyphone satisfying the threshold is treated as fuzzy polyphone data. For example, if no candidate among all candidates has a probability above 70%, the polyphone can be regarded as fuzzy polyphone data. The principle of determining fuzzy polyphone data is similar to the principle of determining fuzzy data in the training stage, and is not repeated here.
Afterwards, at step S620, fuzzy polyphone prediction is performed on said fuzzy polyphone data to output a plurality of candidate pronunciations of said fuzzy polyphone data and their probabilities. In embodiments of the invention, the pronunciation of non-fuzzy polyphone data can be determined with high confidence, so fuzzification processing is not needed; conventional polyphone prediction processing is performed instead, to output the single determined candidate pronunciation. If the polyphone is fuzzy polyphone data, fuzzification processing is performed and a plurality of candidate pronunciations and their corresponding probabilities are output.
Next, at step S630, a fuzzy context feature label is generated based on said plurality of candidate pronunciations and their probabilities. In embodiments of the invention, the execution of this step is similar to step S160 of generating the fuzzy context feature label in the training process; it can be realized through quantization and mapping conversion or in other ways, and is not repeated here.
At step S640, the corresponding model parameters are determined for said fuzzy context feature label based on the acoustic model having the fuzzy decision tree. In embodiments of the invention, for the HMM acoustic model, the corresponding model parameters are the distributions of the components under the states of the HMM model.
At step S650, speech parameters are generated from said model parameters. A parameter generation algorithm commonly used in this field can be adopted, for example a parameter generation algorithm based on the maximum likelihood criterion, which is not repeated here.
Finally, at step S660, said speech parameters are synthesized into speech.
In embodiments of the invention, speech is synthesized by fuzzifying the pronunciation of fuzzy polyphone data, so that the pronunciation can vary under different contexts, thereby improving the quality of speech synthesis.
Based on the same inventive concept, Fig. 7 is a block diagram of an apparatus for synthesizing speech according to an embodiment of the invention. This embodiment is described below in conjunction with the figure; for parts identical to the preceding embodiments, the explanation is omitted as appropriate.
The apparatus 700 for synthesizing speech can comprise: a polyphone prediction unit 703 for performing fuzzy prediction on fuzzy polyphone data to output a plurality of candidate pronunciations of said fuzzy polyphone data and their prediction probabilities; a fuzzy context feature label generation unit 704 for generating a fuzzy context feature label based on said plurality of candidate pronunciations and their probabilities; a determination unit 705 for determining model parameters for said fuzzy context feature label based on a determined acoustic model having a fuzzy decision tree; a parameter generator 706 for generating speech parameters from said model parameters; and a synthesizer 707 for synthesizing said speech parameters into speech.
The apparatus 700 for synthesizing speech of the present invention can implement the above-described method for synthesizing speech; for its concrete operation, refer to the content above, which is not repeated here.
In embodiments of the invention, the apparatus 700 can further comprise a text analyzer 702 for decomposing the text to be synthesized into words with attribute labels and their pronunciation symbols. Optionally, the apparatus 700 can further comprise an input/output unit 701 for inputting the text to be synthesized and outputting the synthesized speech. Optionally, in embodiments of the invention, a symbol stream that has already undergone text analysis can also be input directly from outside. Therefore, as shown in Fig. 7, the text analyzer 702 and the input/output unit 701 are drawn with broken lines.
In embodiments of the invention, the apparatus 700 for synthesizing speech and its parts can, in operation, implement the method for synthesizing speech of the embodiments described above, or the steps thereof.
The apparatus 700 for synthesizing speech in this embodiment and its components can be constituted by dedicated circuits or chips, or can be realized by a computer (processor) executing corresponding programs.
Those of ordinary skill in the art will appreciate that the above methods and apparatus can be realized using computer-executable instructions and/or processor control code, provided for example on a carrier medium such as a disk, CD or DVD-ROM, on a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The methods and apparatus of this embodiment can also be realized by hardware circuits such as very-large-scale integrated circuits or gate arrays, by semiconductors such as logic chips and transistors, by programmable hardware devices such as field-programmable gate arrays and programmable logic devices, or by a combination of such hardware circuits and software such as firmware.
Although the method for training an acoustic model and the method and apparatus for synthesizing speech of the present invention have been described in detail above with reference to specific embodiments, the present invention is not limited thereto. Those of ordinary skill in the art will understand that various transformations, replacements and modifications can be made without departing from the spirit and scope of the present invention; the protection scope of the present invention is defined by the appended claims.
Claims (10)
1. A method for speech synthesis, comprising:
determining that data generated by text analysis are fuzzy polyphone data;
performing fuzzy polyphone prediction on said fuzzy polyphone data to output a plurality of candidate pronunciations of said fuzzy polyphone data and their probabilities;
generating a fuzzy context feature label based on said plurality of candidate pronunciations and their probabilities;
determining model parameters for said fuzzy context feature label based on an acoustic model having a fuzzy decision tree;
generating speech parameters from said model parameters; and
synthesizing said speech parameters into speech.
2. The method of claim 1, wherein the step of generating the fuzzy context feature label further comprises:
determining, based on said probabilities, the degree to which the context label of each candidate pronunciation of said fuzzy polyphone data falls into its class; and
generating said fuzzy context feature label through quantization and conversion of said degrees, wherein said fuzzy context feature label is a joint representation of the context labels of said candidate pronunciations.
3. An apparatus for synthesizing speech, comprising:
a polyphone prediction unit for fuzzily predicting the pronunciation of fuzzy polyphone data to output a plurality of candidate pronunciations of said fuzzy polyphone data and their prediction probabilities;
a fuzzy context feature label generation unit for generating a fuzzy context feature label based on said plurality of candidate pronunciations and their probabilities;
a determination unit for determining model parameters for said fuzzy context feature label based on an acoustic model having a fuzzy decision tree;
a parameter generator for generating speech parameters from said model parameters; and
a synthesizer for synthesizing said speech parameters into speech.
4. The apparatus of claim 3, wherein said fuzzy context feature label generation unit is further configured to:
determine, based on said probabilities, the degree to which the context label of each candidate pronunciation of said fuzzy polyphone data falls into its class; and
generate said fuzzy context feature label through quantization and conversion of said degrees, wherein said fuzzy context feature label is a joint representation of the context labels of said candidate pronunciations.
5. A system for synthesizing speech, comprising:
means for determining that data generated by text analysis are fuzzy polyphone data;
means for performing fuzzy polyphone prediction on said fuzzy polyphone data to output a plurality of candidate pronunciations of said fuzzy polyphone data and their probabilities;
means for generating a fuzzy context feature label based on said plurality of candidate pronunciations and their probabilities;
means for determining model parameters for said fuzzy context feature label based on an acoustic model having a fuzzy decision tree;
means for generating speech parameters from said model parameters; and
means for synthesizing said speech parameters into speech.
6. A method for training an acoustic model, comprising:
training each speech unit in a speech corpus to generate an acoustic model, said speech units comprising acoustic parameters and context labels;
performing decision tree clustering on the context combinations to generate an acoustic model having a decision tree;
determining fuzzy data in the speech corpus based on said acoustic model having the decision tree;
generating fuzzy context feature labels for said fuzzy data; and
performing clustering training on said speech corpus based on said fuzzy context feature labels to generate an acoustic model having a fuzzy decision tree.
7. The method of claim 6, wherein the step of determining fuzzy data further comprises:
evaluating a speech unit;
determining the degree to which each candidate context label of said speech unit falls into its class; and
determining that said speech unit is fuzzy data if said degree satisfies a predetermined threshold.
8. The method of claim 7, wherein the step of evaluating the speech unit further comprises:
evaluating the scores of the context feature labels of the candidate pronunciations of said speech unit through a model posterior probability or through the distance between model-generated parameters and the speech unit parameters.
9. The method of claim 6, wherein the step of generating the fuzzy context feature labels further comprises:
determining the scores of the context feature labels of the candidate pronunciations of said speech unit by evaluating said speech unit;
determining, based on said scores, the degree to which each candidate context label of said speech unit falls into its class; and
generating said fuzzy context feature label through quantization and conversion of said degrees, wherein said fuzzy context feature label is a joint representation of the context labels of said candidate pronunciations.
10. The method of claim 6, wherein the step of performing clustering training based on said fuzzy context feature labels further comprises one of the following:
training the training set comprising said fuzzy data, based on said fuzzy context feature labels and a preset fuzzy question set, to generate the acoustic model having said fuzzy decision tree; and
training each speech unit in said speech corpus again based on a question set and context feature labels, wherein said question set further comprises a preset fuzzy question set, and the context feature labels of the fuzzy data in said speech corpus are said fuzzy context feature labels.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011100465804A CN102651217A (en) | 2011-02-25 | 2011-02-25 | Method and equipment for voice synthesis and method for training acoustic model used in voice synthesis |
US13/402,602 US9058811B2 (en) | 2011-02-25 | 2012-02-22 | Speech synthesis with fuzzy heteronym prediction using decision trees |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011100465804A CN102651217A (en) | 2011-02-25 | 2011-02-25 | Method and equipment for voice synthesis and method for training acoustic model used in voice synthesis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102651217A true CN102651217A (en) | 2012-08-29 |
Family
ID=46693212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011100465804A Pending CN102651217A (en) | 2011-02-25 | 2011-02-25 | Method and equipment for voice synthesis and method for training acoustic model used in voice synthesis |
Country Status (2)
Country | Link |
---|---|
US (1) | US9058811B2 (en) |
CN (1) | CN102651217A (en) |
CN115440205A (en) * | 2021-06-04 | 2022-12-06 | 中国移动通信集团浙江有限公司 | Speech processing method, device, terminal and program product |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US12010262B2 (en) | 2013-08-06 | 2024-06-11 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US12067985B2 (en) | 2018-06-01 | 2024-08-20 | Apple Inc. | Virtual assistant operations in multi-device environments |
US12073147B2 (en) | 2013-06-09 | 2024-08-27 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US12087308B2 (en) | 2010-01-18 | 2024-09-10 | Apple Inc. | Intelligent automated assistant |
US12223282B2 (en) | 2016-06-09 | 2025-02-11 | Apple Inc. | Intelligent automated assistant in a home environment |
US12254887B2 (en) | 2017-05-16 | 2025-03-18 | Apple Inc. | Far-field extension of digital assistant services for providing a notification of an event to a user |
US12277954B2 (en) | 2024-04-16 | 2025-04-15 | Apple Inc. | Voice trigger for a digital assistant |
Families Citing this family (133)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
WO2010067118A1 (en) | 2008-12-11 | 2010-06-17 | Novauris Technologies Limited | Speech recognition involving a mobile device |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US8706472B2 (en) * | 2011-08-11 | 2014-04-22 | Apple Inc. | Method for disambiguating multiple readings in language conversion |
US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
CN102982019B (en) * | 2012-11-26 | 2019-01-15 | 百度国际科技(深圳)有限公司 | Input method corpus phonetic notation method, the method and electronic device for generating evaluation and test corpus |
US9396723B2 (en) | 2013-02-01 | 2016-07-19 | Tencent Technology (Shenzhen) Company Limited | Method and device for acoustic language model training |
CN103971677B (en) * | 2013-02-01 | 2015-08-12 | 腾讯科技(深圳)有限公司 | A kind of acoustics language model training method and device |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
WO2014144579A1 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
WO2014144949A2 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | Training an at least partial voice command system |
US20140351196A1 (en) * | 2013-05-21 | 2014-11-27 | Sas Institute Inc. | Methods and systems for using clustering for splitting tree nodes in classification decision trees |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
WO2014197336A1 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
KR101809808B1 (en) | 2013-06-13 | 2017-12-15 | 애플 인크. | System and method for emergency calls initiated by voice command |
CN105531757B (en) * | 2013-09-20 | 2019-08-06 | 株式会社东芝 | Voice selecting auxiliary device and voice selecting method |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
CA2934298C (en) * | 2014-01-14 | 2023-03-07 | Interactive Intelligence Group, Inc. | System and method for synthesis of speech from provided text |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
KR20160058470A (en) * | 2014-11-17 | 2016-05-25 | 삼성전자주식회사 | Speech synthesis apparatus and control method thereof |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US10152299B2 (en) | 2015-03-06 | 2018-12-11 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
DK179588B1 (en) | 2016-06-09 | 2019-02-22 | Apple Inc. | Intelligent automated assistant in a home environment |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
KR102392094B1 (en) | 2016-09-06 | 2022-04-28 | 딥마인드 테크놀로지스 리미티드 | Sequence processing using convolutional neural networks |
US11080591B2 (en) | 2016-09-06 | 2021-08-03 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
KR102353284B1 (en) | 2016-09-06 | 2022-01-19 | 딥마인드 테크놀로지스 리미티드 | Generate audio using neural networks |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
JP6756916B2 (en) | 2016-10-26 | 2020-09-16 | ディープマインド テクノロジーズ リミテッド | Processing text sequences using neural networks |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
CN107122179A (en) | 2017-03-31 | 2017-09-01 | 阿里巴巴集团控股有限公司 | The function control method and device of voice |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10431203B2 (en) * | 2017-09-05 | 2019-10-01 | International Business Machines Corporation | Machine training for native language and fluency identification |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
CN110047463B (en) * | 2019-01-31 | 2021-03-02 | 北京捷通华声科技股份有限公司 | Voice synthesis method and device and electronic equipment |
CN109767755A (en) * | 2019-03-01 | 2019-05-17 | 广州多益网络股份有限公司 | A kind of phoneme synthesizing method and system |
CN115116427B (en) * | 2022-06-22 | 2023-11-14 | 马上消费金融股份有限公司 | Labeling method, voice synthesis method, training method and training device |
CN115512696B (en) * | 2022-09-20 | 2024-09-13 | 中国第一汽车股份有限公司 | Simulation training method and vehicle |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6366883B1 (en) * | 1996-05-15 | 2002-04-02 | Atr Interpreting Telecommunications | Concatenation of speech segments by use of a speech synthesizer |
US6081781A (en) * | 1996-09-11 | 2000-06-27 | Nippon Telegraph And Telephone Corporation | Method and apparatus for speech synthesis and program recorded medium |
JP3587048B2 (en) * | 1998-03-02 | 2004-11-10 | 株式会社日立製作所 | Prosody control method and speech synthesizer |
ATE298453T1 (en) * | 1998-11-13 | 2005-07-15 | Lernout & Hauspie Speechprod | SPEECH SYNTHESIS BY CONTACTING SPEECH WAVEFORMS |
EP1159733B1 (en) * | 1999-03-08 | 2003-08-13 | Siemens Aktiengesellschaft | Method and array for determining a representative phoneme |
US7657102B2 (en) * | 2003-08-27 | 2010-02-02 | Microsoft Corp. | System and method for fast on-line learning of transformed hidden Markov models |
US7881934B2 (en) * | 2003-09-12 | 2011-02-01 | Toyota Infotechnology Center Co., Ltd. | Method and system for adjusting the voice prompt of an interactive system based upon the user's state |
FR2861491B1 (en) * | 2003-10-24 | 2006-01-06 | Thales Sa | METHOD FOR SELECTING SYNTHESIS UNITS |
US7983910B2 (en) * | 2006-03-03 | 2011-07-19 | International Business Machines Corporation | Communicating across voice and text channels with emotion preservation |
US20080120093A1 (en) * | 2006-11-16 | 2008-05-22 | Seiko Epson Corporation | System for creating dictionary for speech synthesis, semiconductor integrated circuit device, and method for manufacturing semiconductor integrated circuit device |
US20090299731A1 (en) * | 2007-03-12 | 2009-12-03 | Mongoose Ventures Limited | Aural similarity measuring system for text |
GB0704772D0 (en) * | 2007-03-12 | 2007-04-18 | Mongoose Ventures Ltd | Aural similarity measuring system for text |
BRPI0809759A2 (en) * | 2007-04-26 | 2014-10-07 | Ford Global Tech Llc | "EMOTIVE INFORMATION SYSTEM, EMOTIVE INFORMATION SYSTEMS, EMOTIVE INFORMATION DRIVING METHODS, EMOTIVE INFORMATION SYSTEMS FOR A PASSENGER VEHICLE AND COMPUTER IMPLEMENTED METHOD" |
US8321222B2 (en) * | 2007-08-14 | 2012-11-27 | Nuance Communications, Inc. | Synthesis by generation and concatenation of multi-form segments |
CN101452699A (en) * | 2007-12-04 | 2009-06-10 | 株式会社东芝 | Rhythm self-adapting and speech synthesizing method and apparatus |
JP5422754B2 (en) * | 2010-01-04 | 2014-02-19 | 株式会社東芝 | Speech synthesis apparatus and method |
WO2012001457A1 (en) * | 2010-06-28 | 2012-01-05 | Kabushiki Kaisha Toshiba | Method and apparatus for fusing voiced phoneme units in text-to-speech |
US9009050B2 (en) * | 2010-11-30 | 2015-04-14 | At&T Intellectual Property I, L.P. | System and method for cloud-based text-to-speech web services |
US8706472B2 (en) * | 2011-08-11 | 2014-04-22 | Apple Inc. | Method for disambiguating multiple readings in language conversion |
- 2011-02-25: CN application CN2011100465804A filed, published as CN102651217A (status: Pending)
- 2012-02-22: US application US13/402,602 filed, granted as US9058811B2 (status: Expired - Fee Related)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6098042A (en) * | 1998-01-30 | 2000-08-01 | International Business Machines Corporation | Homograph filter for speech synthesis system |
CN1836226A (en) * | 2003-08-21 | 2006-09-20 | 熊锦棠 | Method and apparatus for converting characters of non-alphabetic languages |
US20060277045A1 (en) * | 2005-06-06 | 2006-12-07 | International Business Machines Corporation | System and method for word-sense disambiguation by recursive partitioning |
Non-Patent Citations (3)
Title |
---|
K. TOKUDA ET AL: "AN HMM-BASED SPEECH SYNTHESIS SYSTEM APPLIED TO ENGLISH", 《PROC. OF 2002 IEEE SSW》, 30 September 2002 (2002-09-30) * |
LU HENG ET AL: "HETERONYM VERIFICATION FOR MANDARIN SPEECH SYNTHESIS", 《INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING》, 19 December 2008 (2008-12-19) * |
ZHANG ZIRONG, CHU MIN: "A STATISTICAL LEARNING METHOD FOR POLYPHONE GRAPHEME-TO-PHONEME CONVERSION", 《中文信息学报》 (JOURNAL OF CHINESE INFORMATION PROCESSING), vol. 16, no. 3, 31 December 2002 (2002-12-31) *
Cited By (139)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US12087308B2 (en) | 2010-01-18 | 2024-09-10 | Apple Inc. | Intelligent automated assistant |
US12165635B2 (en) | 2010-01-18 | 2024-12-10 | Apple Inc. | Intelligent automated assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
CN103854643A (en) * | 2012-11-29 | 2014-06-11 | 株式会社东芝 | Method and apparatus for speech synthesis |
CN103854643B (en) * | 2012-11-29 | 2017-03-01 | 株式会社东芝 | Method and apparatus for synthesizing voice |
CN103902600A (en) * | 2012-12-27 | 2014-07-02 | 富士通株式会社 | Keywords list forming device and method and electronic equipment |
CN103902600B (en) * | 2012-12-27 | 2017-12-01 | 富士通株式会社 | Lists of keywords forming apparatus and method and electronic equipment |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US12073147B2 (en) | 2013-06-09 | 2024-08-27 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
CN105340004B (en) * | 2013-06-28 | 2019-09-10 | 谷歌有限责任公司 | Computer implemented method, computer-readable medium and system for word pronunciation learning |
CN105340004A (en) * | 2013-06-28 | 2016-02-17 | 谷歌公司 | Computer-implemented method, computer-readable medium and system for pronunciation learning |
US12010262B2 (en) | 2013-08-06 | 2024-06-11 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
CN108364639A (en) * | 2013-08-23 | 2018-08-03 | 株式会社东芝 | Speech processing system and method |
CN104464731A (en) * | 2013-09-20 | 2015-03-25 | 株式会社东芝 | Data collection device, method, voice talking device and method |
CN103578467A (en) * | 2013-10-18 | 2014-02-12 | 威盛电子股份有限公司 | Acoustic model building method, speech recognition method and electronic device thereof |
US10114809B2 (en) | 2014-05-07 | 2018-10-30 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for phonetically annotating text |
CN104142909A (en) * | 2014-05-07 | 2014-11-12 | 腾讯科技(深圳)有限公司 | Method and device for phonetic annotation of Chinese characters |
CN104142909B (en) * | 2014-05-07 | 2016-04-27 | 腾讯科技(深圳)有限公司 | A kind of phonetic annotation of Chinese characters method and device |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
CN104200803A (en) * | 2014-09-16 | 2014-12-10 | 北京开元智信通软件有限公司 | Voice broadcasting method, device and system |
CN105702248B (en) * | 2014-12-09 | 2019-11-19 | 苹果公司 | Electronic device and method, storage medium for operating an intelligent automated assistant |
CN105702248A (en) * | 2014-12-09 | 2016-06-22 | 苹果公司 | Disambiguating heteronyms in speech synthesis |
CN104599670A (en) * | 2015-01-30 | 2015-05-06 | 成都星炫科技有限公司 | Voice recognition method of touch and talk pen |
US10930282B2 (en) | 2015-03-08 | 2021-02-23 | Apple Inc. | Competing devices responding to voice triggers |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US12001933B2 (en) | 2015-05-15 | 2024-06-04 | Apple Inc. | Virtual assistant in a communication session |
US12154016B2 (en) | 2015-05-15 | 2024-11-26 | Apple Inc. | Virtual assistant in a communication session |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
CN104867491B (en) * | 2015-06-17 | 2017-08-18 | 百度在线网络技术(北京)有限公司 | Rhythm model training method and device for phonetic synthesis |
CN104867491A (en) * | 2015-06-17 | 2015-08-26 | 百度在线网络技术(北京)有限公司 | Training method and device for prosody model used for speech synthesis |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US12204932B2 (en) | 2015-09-08 | 2025-01-21 | Apple Inc. | Distributed personal assistant |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
CN105336322A (en) * | 2015-09-30 | 2016-02-17 | 百度在线网络技术(北京)有限公司 | Polyphone model training method, and speech synthesis method and device |
CN105225657A (en) * | 2015-10-22 | 2016-01-06 | 百度在线网络技术(北京)有限公司 | Polyphone mark template generation method and device |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
CN105304081A (en) * | 2015-11-09 | 2016-02-03 | 上海语知义信息技术有限公司 | Smart household voice broadcasting system and voice broadcasting method |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
CN105931635A (en) * | 2016-03-31 | 2016-09-07 | 北京奇艺世纪科技有限公司 | Audio segmentation method and device |
US12223282B2 (en) | 2016-06-09 | 2025-02-11 | Apple Inc. | Intelligent automated assistant in a home environment |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
CN108346423A (en) * | 2017-01-23 | 2018-07-31 | 北京搜狗科技发展有限公司 | The treating method and apparatus of phonetic synthesis model |
US10741181B2 (en) | 2017-05-09 | 2020-08-11 | Apple Inc. | User interface for correcting recognition errors |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US12254887B2 (en) | 2017-05-16 | 2025-03-18 | Apple Inc. | Far-field extension of digital assistant services for providing a notification of an event to a user |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
CN108305612B (en) * | 2017-11-21 | 2020-07-31 | 腾讯科技(深圳)有限公司 | Text processing method, text processing device, model training method, model training device, storage medium and computer equipment |
CN108305612A (en) * | 2017-11-21 | 2018-07-20 | 腾讯科技(深圳)有限公司 | Text-processing, model training method, device, storage medium and computer equipment |
CN109996149A (en) * | 2017-12-29 | 2019-07-09 | 深圳市赛菲姆科技有限公司 | A kind of parking lot Intelligent voice broadcasting system |
CN108389577A (en) * | 2018-02-12 | 2018-08-10 | 广州视源电子科技股份有限公司 | Method, system, device and storage medium for optimizing speech recognition acoustic model |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US12067985B2 (en) | 2018-06-01 | 2024-08-20 | Apple Inc. | Virtual assistant operations in multi-device environments |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US12080287B2 (en) | 2018-06-01 | 2024-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11360739B2 (en) | 2019-05-31 | 2022-06-14 | Apple Inc. | User activity shortcut suggestions |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
CN111681641B (en) * | 2020-05-26 | 2024-02-06 | 微软技术许可有限责任公司 | Phrase-based end-to-end text-to-speech (TTS) synthesis |
CN111681641A (en) * | 2020-05-26 | 2020-09-18 | 微软技术许可有限责任公司 | Phrase-based end-to-end text-to-speech (TTS) synthesis |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
CN111968676A (en) * | 2020-08-18 | 2020-11-20 | 北京字节跳动网络技术有限公司 | Pronunciation correction method and device, electronic equipment and storage medium |
CN115440205A (en) * | 2021-06-04 | 2022-12-06 | 中国移动通信集团浙江有限公司 | Speech processing method, device, terminal and program product |
CN114360494A (en) * | 2021-12-29 | 2022-04-15 | 广州酷狗计算机科技有限公司 | Rhythm labeling method and device, computer equipment and storage medium |
US12277954B2 (en) | 2024-04-16 | 2025-04-15 | Apple Inc. | Voice trigger for a digital assistant |
Also Published As
Publication number | Publication date |
---|---|
US20120221339A1 (en) | 2012-08-30 |
US9058811B2 (en) | 2015-06-16 |
Similar Documents
Publication | Title |
---|---|
CN102651217A (en) | Method and equipment for voice synthesis and method for training acoustic model used in voice synthesis |
Qian et al. | Contentvec: An improved self-supervised speech representation by disentangling speakers | |
Kharitonov et al. | Text-free prosody-aware generative spoken language modeling | |
US11837216B2 (en) | Speech recognition using unspoken text and speech synthesis | |
CN110782870B (en) | Speech synthesis method, device, electronic equipment and storage medium | |
Stoller et al. | End-to-end lyrics alignment for polyphonic music using an audio-to-character recognition model | |
US10332508B1 (en) | Confidence checking for speech processing and query answering | |
US10388274B1 (en) | Confidence checking for speech processing and query answering | |
Franco et al. | Automatic pronunciation scoring for language instruction | |
Morgan | Deep and wide: Multiple layers in automatic speech recognition | |
EP2815398B1 (en) | Audio human interactive proof based on text-to-speech and semantics | |
CN101828218B (en) | Synthesis by generation and concatenation of multi-form segments | |
CN106297800B (en) | A method and device for adaptive speech recognition | |
WO2022148176A1 (en) | Method, device, and computer program product for english pronunciation assessment | |
CN101551947A (en) | Computer system for assisting spoken language learning | |
CN110459202B (en) | Rhythm labeling method, device, equipment and medium | |
Abdou et al. | Computer aided pronunciation learning system using speech recognition techniques. | |
CN110415725A (en) | Use the method and system of first language data assessment second language pronunciation quality | |
US20020040296A1 (en) | Phoneme assigning method | |
JP6810580B2 (en) | Language model learning device and its program | |
CN102651218A (en) | Method and equipment for creating voice tag | |
Chang et al. | Speechprompt: Prompting speech language models for speech processing tasks | |
Barbany et al. | FastVC: Fast Voice Conversion with non-parallel data | |
Li et al. | Improving mandarin tone mispronunciation detection for non-native learners with soft-target tone labels and blstm-based deep models | |
Janyoi et al. | An Isarn dialect HMM-based text-to-speech system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | Effective date of abandoning: 20161130 ||
C20 | Patent right or utility model deemed to be abandoned or is abandoned | ||