CN104867491B - Rhythm model training method and device for phonetic synthesis - Google Patents


Info

Publication number
CN104867491B
CN104867491B (application CN201510337430.7A)
Authority
CN
China
Prior art keywords
text
rhythm model
rhythm
participle
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510337430.7A
Other languages
Chinese (zh)
Other versions
CN104867491A (en
Inventor
徐扬凯
李秀林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510337430.7A priority Critical patent/CN104867491B/en
Publication of CN104867491A publication Critical patent/CN104867491A/en
Application granted granted Critical
Publication of CN104867491B publication Critical patent/CN104867491B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a rhythm model training method and device for speech synthesis. The training method includes: S1, extracting the text features and label features corresponding to the word segments in a training corpus text; S2, generalizing the word segments in the training corpus text based on a synonym thesaurus; and S3, training the rhythm model according to the text features, the label features, and the generalized segments. By extracting the text features and label features of the word segments from the training corpus text, generalizing the segments with a synonym thesaurus, and then training on the text features, label features, and generalized segments, the method and device make the rhythm model more complete and thereby improve the accuracy of prosody prediction.

Description

Rhythm model training method and device for speech synthesis
Technical field
The present invention relates to the technical field of text-to-speech, and more particularly to a rhythm model training method and device for speech synthesis.
Background art
Speech synthesis, also known as text-to-speech, is a technology that converts text information into speech and reads it aloud. With continuing technological progress, applications of speech synthesis have become increasingly widespread, such as the broadcasting of news and the reading of audiobooks. In daily life, messages such as SMS and e-mail can also be read out through speech synthesis, offering users one more way to obtain information.
In a speech synthesis system, prosody prediction is the foundation of the whole system; an error in predicting prosodic pauses directly degrades the synthesis result. For example, for the text "if a passerby hands it an empty bottle", the correct prosody should be "if #1 a passerby #1 hands it #2 an #1 empty bottle", while the actual prosody prediction result is "if #1 a passerby #1 hands it #1 an #2 empty bottle", where #1 denotes a small pause and #2 a large pause. The pause prediction errors make the synthesized sentence sound insufficiently natural and fluent, giving the user a poor experience.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. A first object of the present invention is therefore to propose a rhythm model training method for speech synthesis that improves the rhythm model and thereby raises the accuracy of prosody prediction.
A second object of the present invention is to propose a speech synthesis method.
A third object of the present invention is to propose a rhythm model training device for speech synthesis.
A fourth object of the present invention is to propose a speech synthesis device.
To achieve these objects, an embodiment of the first aspect of the present invention proposes a rhythm model training method for speech synthesis, including: S1, extracting the text features and label features corresponding to the word segments from a training corpus text; S2, generalizing the word segments in the training corpus text based on a synonym thesaurus; and S3, training the rhythm model according to the text features, the label features, and the generalized segments.
In the rhythm model training method for speech synthesis of this embodiment, the text features and label features corresponding to the word segments are extracted from the training corpus text, the word segments in the corpus are generalized based on a synonym thesaurus, and the rhythm model is then trained according to the text features, the label features, and the generalized segments. The resulting rhythm model is more complete, which in turn improves the accuracy of prosody prediction.
An embodiment of the second aspect of the present invention proposes a speech synthesis method, including: S4, extracting text features from a text to be predicted, and inputting the text features into the rhythm model; S5, performing prosody prediction on the text to be predicted according to the rhythm model; S6, further performing acoustic prediction on the text to be predicted to generate an acoustic parameter sequence; and S7, splicing the acoustic parameter sequence to generate a speech synthesis result.
In the speech synthesis method of this embodiment, text features are extracted from the text to be predicted and input into the rhythm model, prosody prediction is performed on the text according to the rhythm model, acoustic prediction then generates an acoustic parameter sequence, and the sequence is spliced into the speech synthesis result. Because the rhythm model is built with a synonym thesaurus, the accuracy of prosody prediction improves, the prosodic pauses sound more natural and fluent, and the user experience is better.
An embodiment of the third aspect of the present invention proposes a rhythm model training device for speech synthesis, including: an extraction module for extracting the text features and label features corresponding to the word segments from a training corpus text; a generalization module for generalizing the word segments in the training corpus text based on a synonym thesaurus; and a training module for training the rhythm model according to the text features, the label features, and the generalized segments.
In the rhythm model training device for speech synthesis of this embodiment, the text features and label features corresponding to the word segments are extracted from the training corpus text, the word segments in the corpus are generalized based on a synonym thesaurus, and the rhythm model is then trained according to the text features, the label features, and the generalized segments. The resulting rhythm model is more complete, which in turn improves the accuracy of prosody prediction.
An embodiment of the fourth aspect of the present invention proposes a speech synthesis device, including: an analysis module for extracting text features from a text to be predicted and inputting the text features into the rhythm model; a prosody prediction module for performing prosody prediction on the text to be predicted according to the rhythm model; an acoustic prediction module for further performing acoustic prediction on the text to be predicted to generate an acoustic parameter sequence; and a generation module for splicing the acoustic parameter sequence to generate a speech synthesis result.
In the speech synthesis device of this embodiment, text features are extracted from the text to be predicted and input into the rhythm model, prosody prediction is performed on the text according to the rhythm model, acoustic prediction then generates an acoustic parameter sequence, and the sequence is spliced into the speech synthesis result. Because the rhythm model is built with a synonym thesaurus, the accuracy of prosody prediction improves, the prosodic pauses sound more natural and fluent, and the user experience is better.
Brief description of the drawings
Fig. 1 is a flow chart of a rhythm model training method for speech synthesis according to an embodiment of the present invention.
Fig. 2 is a flow chart of a speech synthesis method according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a rhythm model training device for speech synthesis according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a speech synthesis device according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, with examples shown in the accompanying drawings, where identical or similar reference numbers denote, throughout, identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
The rhythm model training method and device for speech synthesis, and the speech synthesis method and device, of embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of a rhythm model training method for speech synthesis according to an embodiment of the present invention.
As shown in Fig. 1, the rhythm model training method for speech synthesis may include the following steps.
S1, extracting the text features and label features corresponding to the word segments from a training corpus text.
The training corpus can be split into multiple word segments, each of which has corresponding text features and a label feature. Text features may include part of speech, word length, and the like. The label feature is the prosodic pause level corresponding to the segment's prosodic category: for example, a rhythm word corresponds to pause level #1, a prosodic phrase to pause level #2, and an intonation phrase to pause level #3.
For example, take the sentence "The EU #2 decides #1 to establish #2 a joint force #3 to crack down on #2 Mediterranean #1 people-smuggling #1 activities #3". The word sequence x is: EU / decides / establish / joint force / crack down on / Mediterranean / people-smuggling / activities, and the label sequence y is: #2 #1 #2 #3 #2 #1 #1 #3. The label sequence y is made up of the label features of the individual segments.
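The segment, text-feature, and label-feature representation described above can be sketched in code. This is a minimal sketch only; the helper names and the exact feature set are illustrative, not taken from the patent.

```python
# Sketch of one training sentence as word segments with text features and
# label features. All helper names are illustrative, not from the patent.

def make_example(segments):
    """Turn (word, part-of-speech, label) triples into feature/label lists."""
    features, labels = [], []
    for word, pos, label in segments:
        # Text features: part of speech and word length, as in the patent.
        features.append({"word": word, "pos": pos, "length": len(word)})
        # Label feature: the prosodic pause level for this segment
        # (#1 rhythm word, #2 prosodic phrase, #3 intonation phrase).
        labels.append(label)
    return features, labels

# First four segments of the example sentence "The EU #2 decides #1
# to establish #2 a joint force #3 ...".
sentence = [
    ("EU", "n", "#2"),
    ("decides", "v", "#1"),
    ("establish", "v", "#2"),
    ("joint force", "n", "#3"),
]

x, y = make_example(sentence)
```

The parallel lists x (observations with their text features) and y (label sequence) mirror the word sequence x and label sequence y of the example above.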
S2, generalizing the word segments in the training corpus text based on a synonym thesaurus.
Specifically, synonyms that are identical in grammatical function, meaning, part of speech, and so on can be added to a shared feature, generalizing and extending the feature set.
For example, the synonyms of "establish" may include "set up", "form", and the like.
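Generalization can be pictured as mapping each segment to a synonym-class identifier so that synonymous segments share features. A minimal sketch follows; the class name and thesaurus entries are assumptions made for illustration, not the patent's actual thesaurus.

```python
# Illustrative synonym classes; the thesaurus entries actually used by the
# patent are not reproduced here, so these groupings are assumptions.
SYNONYM_CLASSES = {
    "establish": "C_ESTABLISH",
    "set up": "C_ESTABLISH",
    "form": "C_ESTABLISH",
}

def generalize(word):
    """Map a segment to its synonym-class id; unknown words pass through."""
    return SYNONYM_CLASSES.get(word, word)

generalized = [generalize(w) for w in ["EU", "decides", "establish"]]
```

After this mapping, a feature that fires for "establish" also fires for "set up" and "form", which is what lets the trained weights transfer across synonyms.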
S3, training the rhythm model according to the text features, the label features, and the generalized segments.
Specifically, the rhythm model can be trained with a conditional random field of the following form:
$$P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i,k} \lambda_k\, t_k(y_{i-1}, y_i, x, i) + \sum_{i,k} \mu_k\, s_k(y_i, x, i) \Big)$$
where x is the word sequence; y is the label sequence; P(y|x) is the probability of label sequence y given word sequence x; Z(x) is the normalization factor, $Z(x) = \sum_{y'} \exp\big( \sum_{i,k} \lambda_k\, t_k(y'_{i-1}, y'_i, x, i) + \sum_{i,k} \mu_k\, s_k(y'_i, x, i) \big)$, summed over all label sequences y'; t_k(y_{i-1}, y_i, x, i), a feature of the whole observation sequence and the labels at positions i-1 and i, is a transition function; s_k(y_i, x, i), a feature of the whole observation sequence and the label at position i, is a state function; λ_k is the weight parameter of transition function t_k to be estimated by training; and μ_k is the weight parameter of state function s_k to be estimated by training.
For example, in the training corpus "The EU #2 decides #1 to establish #2 a joint force #3 to crack down on #2 Mediterranean #1 people-smuggling #1 activities #3", the segment "establish" can be generalized with its synonyms "set up" and "form", forming the following real-valued feature:
$$b(x, i) = \begin{cases} 1 & \text{if } x_i \in \{\text{establish, set up, form}\} \\ 0 & \text{otherwise} \end{cases}$$
Its characteristic function is the state function
$$s_k(y_i, x, i) = \begin{cases} b(x, i) & \text{if } y_i = \#2 \\ 0 & \text{otherwise} \end{cases}$$
The weight parameters λ_k and μ_k can then be trained.
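To make the formula concrete, the toy sketch below evaluates P(y|x) for a three-segment sequence, computing the normalization factor Z(x) by brute force over all label sequences. The weights are illustrative stand-ins, not trained values, and the feature key "C_ESTABLISH" is a hypothetical synonym-class id.

```python
from itertools import product
from math import exp

LABELS = ["#1", "#2", "#3"]

# Toy weights (illustrative, not trained): mu scores a (feature, label)
# pair via a state function; lam scores a label bigram via a transition
# function, matching the mu_k and lambda_k of the formula above.
mu = {("C_ESTABLISH", "#2"): 1.5}
lam = {("#1", "#2"): 0.5}

def score(x, y):
    """Unnormalized log-score: weighted state plus transition features."""
    s = sum(mu.get((xi, yi), 0.0) for xi, yi in zip(x, y))
    s += sum(lam.get((y[i - 1], y[i]), 0.0) for i in range(1, len(y)))
    return s

def probability(x, y):
    """P(y | x) with Z(x) summed by brute force over all label sequences."""
    z = sum(exp(score(x, yp)) for yp in product(LABELS, repeat=len(x)))
    return exp(score(x, y)) / z

x = ["EU", "decides", "C_ESTABLISH"]  # segments after generalization
good = probability(x, ["#2", "#1", "#2"])
bad = probability(x, ["#2", "#1", "#1"])
```

Because the state feature rewards labeling the generalized segment #2 and the transition feature rewards the bigram #1 followed by #2, the sequence ending in #1 #2 gets a higher probability than the one ending in #1 #1.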
In the rhythm model training method for speech synthesis of this embodiment, the text features and label features corresponding to the word segments are extracted from the training corpus text, the word segments in the corpus are generalized based on a synonym thesaurus, and the rhythm model is then trained according to the text features, the label features, and the generalized segments. The resulting rhythm model is more complete, which in turn improves the accuracy of prosody prediction.
Fig. 2 is a flow chart of a speech synthesis method according to an embodiment of the present invention.
As shown in Fig. 2, the speech synthesis method may include the following steps.
S4, extracting text features from a text to be predicted, and inputting the text features into the rhythm model.
In an embodiment of the present invention, the text to be predicted can be split into multiple word segments; the part of speech, word length, and other features of each segment are then obtained, and these text features are input into the rhythm model generated in the preceding embodiment.
S5, performing prosody prediction on the text to be predicted according to the rhythm model.
Specifically, prosody prediction is performed on the text to be predicted using the trained weight parameters λ_k and μ_k of the characteristic functions.
The observation features used for prosody prediction on the text to be predicted are of the form b(x, i), where x is the word sequence, i is the position in the sequence, b(x, i) is a feature of word sequence x at position i, and x_i is the state of x at position i.
The state function is
$$s_k(y_i, x, i) = \begin{cases} b(x, i) & \text{if } y_i \text{ equals the label associated with feature } k \\ 0 & \text{otherwise} \end{cases}$$
and the transition function is
$$t_k(y_{i-1}, y_i, x, i) = \begin{cases} b(x, i) & \text{if } (y_{i-1}, y_i) \text{ equals the label pair associated with feature } k \\ 0 & \text{otherwise} \end{cases}$$
where y is the label sequence, i is the position in the sequence, b(x, i) is the feature of word sequence x at position i, and y_i is the state of y at position i.
For example, after the segments have been generalized with the synonym thesaurus, the rhythm model contains, for x_i = "establish", the real-valued feature b(x, i) together with the trained weight parameters λ_k and μ_k of the corresponding characteristic functions; in the word sequence for "decides to establish a joint force", the prosody prediction for x_i = "establish" is therefore y_i = #2. Before synonym generalization, this real-valued feature did not exist, the weight parameters of the corresponding characteristic functions could not be obtained, and the relevant probability could not be given accurately. Adding the synonym thesaurus therefore improves the accuracy of prosody prediction.
Prosody prediction is performed on the whole segment sequence in this way, obtaining the pause level of each segment and thus completing the prosody prediction.
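The patent does not name the algorithm used to choose the pause levels of the whole segment sequence jointly; for linear-chain models of this kind the standard choice is Viterbi decoding. The sketch below applies it with the same illustrative toy weights as before (all names and weights are assumptions, not trained values).

```python
LABELS = ["#1", "#2", "#3"]
mu = {("C_ESTABLISH", "#2"): 1.5}   # toy state weights (illustrative)
lam = {("#1", "#2"): 0.5}           # toy transition weights (illustrative)

def viterbi(x):
    """Jointly decode the best pause-level sequence for segment list x."""
    # delta[l]: best score of any label prefix ending in label l.
    delta = {l: mu.get((x[0], l), 0.0) for l in LABELS}
    backptrs = []
    for xi in x[1:]:
        new_delta, ptr = {}, {}
        for l in LABELS:
            prev, best = max(
                ((p, delta[p] + lam.get((p, l), 0.0)) for p in LABELS),
                key=lambda t: t[1],
            )
            new_delta[l] = best + mu.get((xi, l), 0.0)
            ptr[l] = prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace the best path back from the best final label.
    last = max(delta, key=delta.get)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

path = viterbi(["EU", "decides", "C_ESTABLISH"])
```

Under these toy weights the generalized segment is labeled #2, as in the example above, and the preceding segment gets #1 because of the rewarded #1 to #2 transition.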
S6, further performing acoustic prediction on the text to be predicted to generate an acoustic parameter sequence.
The pause levels are input into an acoustic prediction model, which performs acoustic prediction on the text to be predicted and can generate the corresponding acoustic parameter sequences, such as spectrum and fundamental frequency.
S7, splicing the acoustic parameter sequence to generate the speech synthesis result.
Finally, a vocoder performs waveform concatenation on the acoustic parameter sequence to generate the final speech synthesis result.
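Steps S4 through S7 chain into a pipeline. The sketch below wires them together with stand-in stubs for the prosody, acoustic, and vocoder stages, since the patent does not specify those components; every function body here is an assumption made purely for illustration.

```python
# End-to-end sketch of steps S4-S7. The models are stand-in stubs
# (the patent does not specify them); all bodies are illustrative.

def extract_features(text):                 # S4: segment and featurize
    return [{"word": w, "length": len(w)} for w in text.split()]

def predict_prosody(features):              # S5: one pause level per segment
    # Stub standing in for CRF decoding: longer segments pause harder.
    return ["#2" if f["length"] > 4 else "#1" for f in features]

def predict_acoustics(features, pauses):    # S6: spectrum/F0-like params
    # Stub: one (duration_s, f0_hz) pair per segment; #2 lengthens it.
    return [(0.5 if p == "#2" else 0.3, 200.0) for p in pauses]

def splice(params):                         # S7: vocoder stand-in
    return sum(duration for duration, _f0 in params)  # total seconds

feats = extract_features("EU decides to establish a force")
pauses = predict_prosody(feats)
seconds = splice(predict_acoustics(feats, pauses))
```

Each stage consumes exactly what the previous one produces, which is the essential structure of S4 to S7 regardless of how the individual models are implemented.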
In the speech synthesis method of this embodiment, text features are extracted from the text to be predicted and input into the rhythm model, prosody prediction is performed on the text according to the rhythm model, acoustic prediction then generates an acoustic parameter sequence, and the sequence is spliced into the speech synthesis result. Because the rhythm model is built with a synonym thesaurus, the accuracy of prosody prediction improves, the prosodic pauses sound more natural and fluent, and the user experience is better.
To achieve the above objects, the present invention further proposes a rhythm model training device for speech synthesis.
Fig. 3 is a schematic structural diagram of a rhythm model training device for speech synthesis according to an embodiment of the present invention.
As shown in Fig. 3, the rhythm model training device for speech synthesis may include an extraction module 110, a generalization module 120, and a training module 130.
The extraction module 110 extracts the text features and label features corresponding to the word segments from a training corpus text.
The training corpus can be split into multiple word segments, each of which has corresponding text features and a label feature. Text features may include part of speech, word length, and the like. The label feature is the prosodic pause level corresponding to the segment's prosodic category: for example, a rhythm word corresponds to pause level #1, a prosodic phrase to pause level #2, and an intonation phrase to pause level #3.
For example, take the sentence "The EU #2 decides #1 to establish #2 a joint force #3 to crack down on #2 Mediterranean #1 people-smuggling #1 activities #3". The word sequence x is: EU / decides / establish / joint force / crack down on / Mediterranean / people-smuggling / activities, and the label sequence y is: #2 #1 #2 #3 #2 #1 #1 #3. The label sequence y is made up of the label features of the individual segments.
The generalization module 120 generalizes the word segments in the training corpus text based on a synonym thesaurus.
Specifically, the generalization module 120 can add synonyms that are identical in grammatical function, meaning, part of speech, and so on to a shared feature, generalizing and extending the feature set.
For example, the synonyms of "establish" may include "set up", "form", and the like.
The training module 130 trains the rhythm model according to the text features, the label features, and the generalized segments.
Specifically, the training module 130 can train the rhythm model with a conditional random field of the following form:
$$P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i,k} \lambda_k\, t_k(y_{i-1}, y_i, x, i) + \sum_{i,k} \mu_k\, s_k(y_i, x, i) \Big)$$
where x is the word sequence; y is the label sequence; P(y|x) is the probability of label sequence y given word sequence x; Z(x) is the normalization factor, $Z(x) = \sum_{y'} \exp\big( \sum_{i,k} \lambda_k\, t_k(y'_{i-1}, y'_i, x, i) + \sum_{i,k} \mu_k\, s_k(y'_i, x, i) \big)$, summed over all label sequences y'; t_k(y_{i-1}, y_i, x, i), a feature of the whole observation sequence and the labels at positions i-1 and i, is a transition function; s_k(y_i, x, i), a feature of the whole observation sequence and the label at position i, is a state function; λ_k is the weight parameter of transition function t_k to be estimated by training; and μ_k is the weight parameter of state function s_k to be estimated by training.
For example, in the training corpus "The EU #2 decides #1 to establish #2 a joint force #3 to crack down on #2 Mediterranean #1 people-smuggling #1 activities #3", the segment "establish" can be generalized with its synonyms "set up" and "form", forming the following real-valued feature:
$$b(x, i) = \begin{cases} 1 & \text{if } x_i \in \{\text{establish, set up, form}\} \\ 0 & \text{otherwise} \end{cases}$$
Its characteristic function is the state function
$$s_k(y_i, x, i) = \begin{cases} b(x, i) & \text{if } y_i = \#2 \\ 0 & \text{otherwise} \end{cases}$$
The weight parameters λ_k and μ_k can then be trained.
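The patent trains the weight parameters through an objective function without detailing the optimizer; one standard approach for CRF weights is gradient ascent on the log-likelihood log P(y|x). The sketch below does this with a numeric gradient on a one-segment toy problem, keeping only state features for brevity (all names and values are illustrative assumptions).

```python
from itertools import product
from math import exp, log

LABELS = ["#1", "#2"]

def log_likelihood(x, y, w):
    """log P(y|x) for a state-feature-only toy CRF with weights w."""
    def score(ys):
        return sum(w.get((xi, yi), 0.0) for xi, yi in zip(x, ys))
    z = sum(exp(score(ys)) for ys in product(LABELS, repeat=len(x)))
    return score(y) - log(z)

# One training pair: the generalized segment should be labeled #2.
x, y = ["C_ESTABLISH"], ["#2"]
w = {("C_ESTABLISH", "#1"): 0.0, ("C_ESTABLISH", "#2"): 0.0}

for _ in range(200):                       # gradient ascent on log P(y|x)
    for key in list(w):
        eps = 1e-4                         # forward-difference gradient
        w_up = dict(w)
        w_up[key] += eps
        grad = (log_likelihood(x, y, w_up) - log_likelihood(x, y, w)) / eps
        w[key] += 0.5 * grad

trained_gap = w[("C_ESTABLISH", "#2")] - w[("C_ESTABLISH", "#1")]
```

Ascent pushes the weight of the observed (feature, label) pair up and the competing label's weight down, so after training the #2 weight dominates, which is exactly the effect described in the example above.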
In the rhythm model training device for speech synthesis of this embodiment, the text features and label features corresponding to the word segments are extracted from the training corpus text, the word segments in the corpus are generalized based on a synonym thesaurus, and the rhythm model is then trained according to the text features, the label features, and the generalized segments. The resulting rhythm model is more complete, which in turn improves the accuracy of prosody prediction.
Fig. 4 is a schematic structural diagram of a speech synthesis device according to an embodiment of the present invention.
As shown in Fig. 4, the speech synthesis device may include an analysis module 140, a prosody prediction module 150, an acoustic prediction module 160, and a generation module 170.
The analysis module 140 extracts text features from a text to be predicted and inputs the text features into the rhythm model.
In an embodiment of the present invention, the analysis module 140 can split the text to be predicted into multiple word segments, obtain the part of speech, word length, and other features of each segment, and then input these text features into the rhythm model generated in the preceding embodiment.
The prosody prediction module 150 performs prosody prediction on the text to be predicted according to the rhythm model.
Specifically, the prosody prediction module 150 can perform prosody prediction on the text to be predicted using the trained weight parameters λ_k and μ_k of the characteristic functions.
The observation features used are of the form b(x, i), where x is the word sequence, i is the position in the sequence, b(x, i) is a feature of word sequence x at position i, and x_i is the state of x at position i.
For example, after the segments have been generalized with the synonym thesaurus, the rhythm model contains, for x_i = "establish", the real-valued feature b(x, i) together with the trained weight parameters λ_k and μ_k of the corresponding characteristic functions; in the word sequence for "decides to establish a joint force", the prosody prediction for x_i = "establish" is therefore y_i = #2. Before synonym generalization, this real-valued feature did not exist, the weight parameters of the corresponding characteristic functions could not be obtained, and the relevant probability could not be given accurately. Adding the synonym thesaurus therefore improves the accuracy of prosody prediction.
Prosody prediction is performed on the whole segment sequence in this way, obtaining the pause level of each segment and thus completing the prosody prediction.
The acoustic prediction module 160 further performs acoustic prediction on the text to be predicted to generate an acoustic parameter sequence.
Specifically, the acoustic prediction module 160 can input the pause levels into an acoustic prediction model, which performs acoustic prediction on the text to be predicted and can generate the corresponding acoustic parameter sequences, such as spectrum and fundamental frequency.
The generation module 170 splices the acoustic parameter sequence to generate the speech synthesis result.
Specifically, the generation module 170 can use a vocoder to perform waveform concatenation on the acoustic parameter sequence, generating the final speech synthesis result.
In the speech synthesis device of this embodiment, text features are extracted from the text to be predicted and input into the rhythm model, prosody prediction is performed on the text according to the rhythm model, acoustic prediction then generates an acoustic parameter sequence, and the sequence is spliced into the speech synthesis result. Because the rhythm model is built with a synonym thesaurus, the accuracy of prosody prediction improves, the prosodic pauses sound more natural and fluent, and the user experience is better.
In the description of the present invention, it should be understood that terms indicating orientation or positional relationships, such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", and "circumferential", are based on the orientations or positional relationships shown in the drawings. They are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they are therefore not to be construed as limiting the present invention.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. A feature defined by "first" or "second" may thus explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless specifically defined otherwise.
In the present invention, unless expressly specified and limited otherwise, terms such as "mounted", "connected", "coupled", and "fixed" are to be understood broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediary; internal to two elements, or an interaction between two elements, unless expressly limited otherwise. For those of ordinary skill in the art, the specific meanings of these terms in the present invention can be understood according to the particular situation.
In the present invention, unless expressly specified and limited otherwise, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediary. Moreover, a first feature being "on", "above", or "over" a second feature may mean that the first feature is directly above or obliquely above the second feature, or merely that the first feature is at a higher level than the second; a first feature being "under", "below", or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or merely that the first feature is at a lower level than the second.
In the description of this specification, references to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", and the like mean that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided there is no contradiction, those skilled in the art may combine different embodiments or examples, and features of different embodiments or examples, described in this specification.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (8)

1. A rhythm model training method for speech synthesis, characterized by comprising the following steps:
S1, extracting the text features and label features corresponding to the word segments from a training corpus text;
S2, generalizing the word segments in the training corpus text based on a synonym thesaurus; and
S3, training the rhythm model according to the text features, the label features, and the generalized segments.
2. The method according to claim 1, characterized in that training the rhythm model according to the text features, the label features, and the generalized segments specifically comprises:
training the rhythm model through an objective function to obtain the weight parameters of the transition functions and the weight parameters of the state functions.
3. A method for performing speech synthesis using the rhythm model according to claim 1 or 2, characterized by comprising the following steps:
S4, extracting text features from a text to be predicted, and inputting the text features into the rhythm model;
S5, performing prosody prediction on the text to be predicted according to the rhythm model;
S6, further performing acoustic prediction on the text to be predicted to generate an acoustic parameter sequence; and
S7, splicing the acoustic parameter sequence to generate a speech synthesis result.
4. The method according to claim 3, characterized in that performing prosody prediction on the text to be predicted according to the rhythm model specifically comprises:
judging, according to the transition functions and state functions, whether the text features have corresponding transition-function and state-function weight parameters, and if so, obtaining the prosodic pause levels corresponding to the text to be predicted.
5. A rhythm model training device for speech synthesis, comprising an extraction module for extracting the text features and label features corresponding to the word segments from a training corpus text, characterized by further comprising:
a generalization module for generalizing the word segments in the training corpus text based on a synonym thesaurus; and
a training module for training the rhythm model according to the text features, the label features, and the generalized segments.
6. The device according to claim 5, characterized in that the training module is specifically configured to:
train the rhythm model through an objective function to obtain the weight parameters of the transition functions and the weight parameters of the state functions.
7. A device for performing speech synthesis using the rhythm model according to claim 5 or 6, characterized by comprising:
an analysis module for extracting text features from a text to be predicted and inputting the text features into the rhythm model;
a prosody prediction module for performing prosody prediction on the text to be predicted according to the rhythm model;
an acoustic prediction module for further performing acoustic prediction on the text to be predicted to generate an acoustic parameter sequence; and
a generation module for splicing the acoustic parameter sequence to generate a speech synthesis result.
8. The device according to claim 7, characterized in that the prosody prediction module is specifically configured to:
judge, according to the transition functions and state functions, whether the text features have corresponding transition-function and state-function weight parameters, and if so, obtain the prosodic pause levels corresponding to the text to be predicted.
CN201510337430.7A 2015-06-17 2015-06-17 Rhythm model training method and device for phonetic synthesis Active CN104867491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510337430.7A CN104867491B (en) 2015-06-17 2015-06-17 Rhythm model training method and device for phonetic synthesis


Publications (2)

Publication Number Publication Date
CN104867491A CN104867491A (en) 2015-08-26
CN104867491B true CN104867491B (en) 2017-08-18

Family

ID=53913283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510337430.7A Active CN104867491B (en) 2015-06-17 2015-06-17 Rhythm model training method and device for phonetic synthesis

Country Status (1)

Country Link
CN (1) CN104867491B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105551481B (en) * 2015-12-21 2019-05-31 百度在线网络技术(北京)有限公司 The prosodic labeling method and device of voice data
CN106601228B (en) * 2016-12-09 2020-02-04 百度在线网络技术(北京)有限公司 Sample labeling method and device based on artificial intelligence rhythm prediction
CN109739968A (en) * 2018-12-29 2019-05-10 北京猎户星空科技有限公司 A kind of data processing method and device
CN110164413B (en) * 2019-05-13 2021-06-04 北京百度网讯科技有限公司 Speech synthesis method, apparatus, computer device and storage medium
CN112084766B (en) * 2019-06-12 2024-01-23 阿里巴巴集团控股有限公司 Text processing method and device, storage medium and processor
CN110516110B (en) * 2019-07-22 2023-06-23 平安科技(深圳)有限公司 Song generation method, song generation device, computer equipment and storage medium
CN111164674B (en) * 2019-12-31 2024-05-03 深圳市优必选科技股份有限公司 Speech synthesis method, device, terminal and storage medium
CN111226275A (en) * 2019-12-31 2020-06-02 深圳市优必选科技股份有限公司 Voice synthesis method, device, terminal and medium based on rhythm characteristic prediction
CN111210803B (en) * 2020-04-21 2021-08-03 南京硅基智能科技有限公司 System and method for training clone timbre and rhythm based on bottleneck features
CN111754978B (en) * 2020-06-15 2023-04-18 北京百度网讯科技有限公司 Prosodic hierarchy labeling method, device, equipment and storage medium
CN112786023B (en) * 2020-12-23 2024-07-02 竹间智能科技(上海)有限公司 Mark model construction method and voice broadcasting system
CN114707503B (en) * 2022-02-14 2023-04-07 慧言科技(天津)有限公司 Front-end text analysis method based on multi-task learning
CN118214907A (en) * 2024-03-06 2024-06-18 深圳市超时代软件有限公司 Text-to-video conversion system based on artificial intelligence and control method thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101572083A (en) * 2008-04-30 2009-11-04 富士通株式会社 Method and device for making up words by using prosodic words
CN102063898A (en) * 2010-09-27 2011-05-18 北京捷通华声语音技术有限公司 Method for predicting prosodic phrases
CN102651217A (en) * 2011-02-25 2012-08-29 株式会社东芝 Method and equipment for voice synthesis and method for training acoustic model used in voice synthesis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1872361A4 (en) * 2005-03-28 2009-07-22 Lessac Technologies Inc Hybrid speech synthesizer, method and use


Also Published As

Publication number Publication date
CN104867491A (en) 2015-08-26

Similar Documents

Publication Publication Date Title
CN104867491B (en) Rhythm model training method and device for phonetic synthesis
CN102354495B (en) Test method and system for semi-open oral test questions
Jing et al. Prominence features: Effective emotional features for speech emotion recognition
CN105185372B (en) Training method for multiple personalized acoustic models, and voice synthesis method and voice synthesis device
US8990089B2 (en) Text to speech synthesis for texts with foreign language inclusions
CN101064104B (en) Emotion voice creating method based on voice conversion
CN102034475B (en) Method for interactively scoring open short conversation by using computer
CN102360543A (en) HMM-based bilingual (mandarin-english) TTS techniques
CN110147451A (en) A kind of session command understanding method of knowledge based map
CN106782603A (en) Intelligent sound evaluating method and system
Raza et al. Design and development of phonetically rich Urdu speech corpus
KR100669241B1 (en) Interactive Speech Synthesis System and Method Using Speech Act Information
CN105895076B (en) A kind of phoneme synthesizing method and system
Lane A Latin grammar for schools and colleges
Kyriakopoulos et al. Automatic characterisation of the pronunciation of non-native English speakers using phone distance features
Raptis et al. Expressive speech synthesis for storytelling: the innoetics’ entry to the blizzard challenge 2016
KR20130067854A (en) Apparatus and method for language model discrimination training based on corpus
Nguyen Hmm-based vietnamese text-to-speech: Prosodic phrasing modeling, corpus design system design, and evaluation
KR101669408B1 (en) Apparatus and method for reading foreign language
Kim et al. Designing a large recording script for open-domain English speech synthesis
Narupiyakul et al. A stochastic knowledge-based Thai text-to-speech system
ELothmany Arabic text-to-speech including prosody (ATTSIP): for mobile devices
Schmiedel et al. Development of Speech Syntheses for Lower Sorbian and Upper Sorbian using MaryTTS
Hansakunbuntheung et al. Mongolian speech corpus for text-to-speech development
Kato et al. Perceptual study on the effects of language transfer on the naturalness of Japanese prosody for isolated words

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant