CN104867491B - Prosody model training method and device for speech synthesis - Google Patents
Prosody model training method and device for speech synthesis
- Publication number
- CN104867491B CN104867491B CN201510337430.7A CN201510337430A CN104867491B CN 104867491 B CN104867491 B CN 104867491B CN 201510337430 A CN201510337430 A CN 201510337430A CN 104867491 B CN104867491 B CN 104867491B
- Authority
- CN
- China
- Prior art keywords
- text
- prosody model
- prosody
- word segment
- predicted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Machine Translation (AREA)
Abstract
The invention discloses a prosody model training method and device for speech synthesis. The prosody model training method for speech synthesis includes: S1, extracting the text features and label features corresponding to the word segments from a training corpus text; S2, generalizing the word segments in the training corpus text on the basis of a synonym thesaurus; and S3, training the prosody model according to the text features, the label features, and the generalized word segments. By extracting the text features and label features corresponding to the word segments from the training corpus text, generalizing the word segments in the training corpus text on the basis of the synonym thesaurus, and then training the prosody model on the text features, the label features, and the generalized word segments, the method and device of the embodiments of the present invention make the prosody model more complete and thereby improve the accuracy of prosody prediction.
Description
Technical field
The present invention relates to the technical field of text-to-speech, and in particular to a prosody model training method and device for speech synthesis.
Background technology
Speech synthesis, also known as text-to-speech (TTS), is a technology that converts text information into speech and reads it aloud. With the continuous progress of science and technology, speech synthesis is applied ever more widely, for example in the broadcasting of news and information and in audio novels. In daily life, information such as short messages and e-mails can also be converted into speech, giving users one more way to obtain information.
In a speech synthesis system, prosody prediction is the foundation of the whole system: an error in predicting prosodic pauses directly degrades the effect of the synthesized speech. For example, for the text "if the passerby hands it an empty bottle", the correct prosody is "if #1 the passerby #1 hands it #2 an #1 empty bottle", whereas the actual prosody prediction result is "if #1 the passerby #1 hands it #1 an #2 empty bottle", where #1 denotes a minor pause and #2 a major pause. The pause-prediction error makes the synthesized sentence insufficiently natural and fluent, resulting in a poor user experience.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. Accordingly, a first object of the present invention is to propose a prosody model training method for speech synthesis, which can improve the prosody model and thereby raise the accuracy of prosody prediction.
A second object of the present invention is to propose a speech synthesis method.
A third object of the present invention is to propose a prosody model training device for speech synthesis.
A fourth object of the present invention is to propose a speech synthesis device.
To achieve these objects, an embodiment of the first aspect of the present invention proposes a prosody model training method for speech synthesis, including: S1, extracting the text features and label features corresponding to the word segments from a training corpus text; S2, generalizing the word segments in the training corpus text on the basis of a synonym thesaurus; and S3, training the prosody model according to the text features, the label features, and the generalized word segments.
In the prosody model training method for speech synthesis of this embodiment, the text features and label features corresponding to the word segments are extracted from the training corpus text, the word segments in the training corpus text are generalized on the basis of the synonym thesaurus, and the prosody model is then trained on the text features, the label features, and the generalized word segments, so that the prosody model becomes more complete and the accuracy of prosody prediction is improved.
An embodiment of the second aspect of the present invention proposes a speech synthesis method, including: S4, extracting text features from a text to be predicted and inputting the text features into the prosody model; S5, performing prosody prediction on the text to be predicted according to the prosody model; S6, further performing acoustic prediction on the text to be predicted to generate an acoustic parameter sequence; and S7, concatenating the acoustic parameter sequence to generate the speech synthesis result.
In the speech synthesis method of this embodiment, text features are extracted from the text to be predicted and input into the prosody model, prosody prediction is performed on the text according to the prosody model, acoustic prediction is then performed to generate an acoustic parameter sequence, and the acoustic parameter sequence is concatenated to generate the speech synthesis result. Because the prosody model is based on a synonym thesaurus, the accuracy of prosody prediction is improved, prosodic pauses become more natural and fluent, and the user experience is enhanced.
An embodiment of the third aspect of the present invention proposes a prosody model training device for speech synthesis, including: an extraction module for extracting the text features and label features corresponding to the word segments from a training corpus text; a generalization module for generalizing the word segments in the training corpus text on the basis of a synonym thesaurus; and a training module for training the prosody model according to the text features, the label features, and the generalized word segments.
In the prosody model training device for speech synthesis of this embodiment, the text features and label features corresponding to the word segments are extracted from the training corpus text, the word segments in the training corpus text are generalized on the basis of the synonym thesaurus, and the prosody model is then trained on the text features, the label features, and the generalized word segments, so that the prosody model becomes more complete and the accuracy of prosody prediction is improved.
An embodiment of the fourth aspect of the present invention proposes a speech synthesis device, including: an extraction module for extracting text features from a text to be predicted and inputting the text features into the prosody model; a prosody prediction module for performing prosody prediction on the text to be predicted according to the prosody model; an acoustic prediction module for further performing acoustic prediction on the text to be predicted to generate an acoustic parameter sequence; and a generation module for concatenating the acoustic parameter sequence to generate the speech synthesis result.
In the speech synthesis device of this embodiment, text features are extracted from the text to be predicted and input into the prosody model, prosody prediction is performed on the text according to the prosody model, acoustic prediction is then performed to generate an acoustic parameter sequence, and the acoustic parameter sequence is concatenated to generate the speech synthesis result. Because the prosody model is based on a synonym thesaurus, the accuracy of prosody prediction is improved, prosodic pauses become more natural and fluent, and the user experience is enhanced.
Brief description of the drawings
Fig. 1 is a flow chart of a prosody model training method for speech synthesis according to an embodiment of the present invention.
Fig. 2 is a flow chart of a speech synthesis method according to an embodiment of the present invention.
Fig. 3 is a structural schematic diagram of a prosody model training device for speech synthesis according to an embodiment of the present invention.
Fig. 4 is a structural schematic diagram of a speech synthesis device according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
The prosody model training method and device for speech synthesis, and the speech synthesis method and device, of the embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a flow chart of a prosody model training method for speech synthesis according to an embodiment of the present invention.
As shown in Fig. 1, the prosody model training method for speech synthesis may include:
S1, extracting the text features and label features corresponding to the word segments from a training corpus text.
The training corpus can be split into multiple word segments, each of which has a corresponding text feature and label feature. The text features may include part of speech, word length, and similar features. The label feature may be the prosodic pause level corresponding to the category of the word segment; for example, the pause level of a prosodic word is #1, that of a prosodic phrase is #2, and that of an intonational phrase is #3.
For example, consider the annotated sentence "The EU #2 decided #1 to establish #2 a joint force #3 to combat #2 Mediterranean #1 illegal-immigration #1 activity #3". The word sequence x is: "The EU / decided / to establish / a joint force / to combat / Mediterranean / illegal-immigration / activity", and the label sequence y is: #2 #1 #2 #3 #2 #1 #1 #3. The label sequence y is made up of multiple label features.
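As a minimal sketch (not taken from the patent), a training sentence of this kind could be represented as one feature dictionary per word segment, paired with a pause-level label sequence; the feature names and part-of-speech tags below are illustrative assumptions.

```python
# Represent each word segment by simple text features (part of speech,
# word length) plus sentence-position flags; labels are the pause levels.

def extract_features(segments):
    """Build one feature dict per word segment from (word, pos) pairs."""
    feats = []
    for i, (word, pos) in enumerate(segments):
        feats.append({
            "word": word,
            "pos": pos,
            "word_len": len(word.split()),   # word length in tokens
            "is_first": i == 0,
            "is_last": i == len(segments) - 1,
        })
    return feats

# Word sequence x and label sequence y from the example sentence.
segments = [("The EU", "NOUN"), ("decided", "VERB"), ("to establish", "VERB"),
            ("a joint force", "NOUN"), ("to combat", "VERB"),
            ("Mediterranean", "NOUN"), ("illegal-immigration", "NOUN"),
            ("activity", "NOUN")]
labels = ["#2", "#1", "#2", "#3", "#2", "#1", "#1", "#3"]

x = extract_features(segments)
assert len(x) == len(labels)
```

Each (feature dict, label) pair is one training observation for the model trained in step S3.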
S2, generalizing the word segments in the training corpus text on the basis of a synonym thesaurus.
Specifically, synonyms that share the same usage, meaning, part of speech, and so on can be added to the features, generalizing and extending them.
For example, the synonyms of "establish" may include "set up", "found", and so on.
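The generalization step can be sketched as mapping every word of a synonym class to a single class identifier, so that a feature learned for one member also fires for the others; the thesaurus entries and class names below are made-up assumptions, not the patent's data.

```python
# Synonym-based generalization: collapse each synonym class to one id.

SYNONYM_CLASSES = {  # hypothetical thesaurus entries
    "ESTABLISH": {"establish", "set up", "found"},
    "COMBAT": {"combat", "fight", "crack down on"},
}

# Invert the thesaurus for O(1) word-to-class lookup.
WORD_TO_CLASS = {w: c for c, ws in SYNONYM_CLASSES.items() for w in ws}

def generalize(word):
    """Return the synonym-class id for a word, or the word itself."""
    return WORD_TO_CLASS.get(word, word)

print(generalize("set up"))      # same class id as "establish"
print(generalize("establish"))
```

With this mapping, a pause pattern observed in the corpus for "establish" generalizes to sentences containing "set up" or "found".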
S3, training the prosody model according to the text features, the label features, and the generalized word segments.
Specifically, the prosody model can be trained with a conditional random field of the following form:

  P(y|x) = (1/Z(x)) * exp( Σi Σk λk·tk(yi-1, yi, x, i) + Σi Σk μk·sk(yi, x, i) )

where x is the word sequence; y is the label sequence; P(y|x) is the probability of the label sequence y given the word sequence x; Z(x) is the normalization factor, obtained by summing the same exponential over all possible label sequences y; tk(yi-1, yi, x, i) is a feature of the whole observation sequence and the corresponding labels at positions i-1 and i, called a transfer function; sk(yi, x, i) is a feature of the whole observation sequence and the label at position i, called a state function; λk is the weight parameter of the transfer function to be estimated in training; and μk is the weight parameter of the state function to be estimated in training.
For example, in the training corpus "The EU #2 decided #1 to establish #2 a joint force #3 to combat #2 Mediterranean #1 illegal-immigration #1 activity #3", the word segment "establish" can be generalized to "set up" and "found", forming a real-valued feature that fires for any word in the synonym class, from which the characteristic functions are built.
The weight parameters λk and μk can thus be trained.
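The formula above can be made concrete with a toy brute-force computation of P(y|x) on a three-word sentence with two pause labels; all feature definitions and weight values here are assumptions chosen only to show the mechanics, not the patent's trained model.

```python
# Compute P(y|x) = exp(score(x, y)) / Z(x) by enumerating every label
# sequence, with one state feature and one transfer feature.
import itertools
import math

LABELS = ["#1", "#2"]

def state_feature(y_i, x, i):
    # s_k(y_i, x, i): fires when a synonym of "establish" gets a #2 pause.
    return 1.0 if x[i] in {"establish", "set up", "found"} and y_i == "#2" else 0.0

def transfer_feature(y_prev, y_i, x, i):
    # t_k(y_{i-1}, y_i, x, i): fires on a #1 -> #2 transition.
    return 1.0 if (y_prev, y_i) == ("#1", "#2") else 0.0

LAMBDA, MU = 0.8, 1.5  # weight parameters (assumed; normally learned)

def score(x, y):
    s = sum(MU * state_feature(y[i], x, i) for i in range(len(x)))
    s += sum(LAMBDA * transfer_feature(y[i - 1], y[i], x, i)
             for i in range(1, len(x)))
    return s

def prob(x, y):
    z = sum(math.exp(score(x, y2))
            for y2 in itertools.product(LABELS, repeat=len(x)))  # Z(x)
    return math.exp(score(x, y)) / z

x = ["decided", "set up", "joint force"]
# The generalized feature fires for "set up", so label sequences giving it
# a #2 pause receive more probability mass.
assert prob(x, ("#1", "#2", "#1")) > prob(x, ("#1", "#1", "#1"))
```

Real training would estimate λk and μk by maximizing the log-likelihood of the annotated corpus, and Z(x) would be computed with dynamic programming rather than enumeration.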
In the prosody model training method for speech synthesis of this embodiment, the text features and label features corresponding to the word segments are extracted from the training corpus text, the word segments in the training corpus text are generalized on the basis of the synonym thesaurus, and the prosody model is then trained on the text features, the label features, and the generalized word segments, so that the prosody model becomes more complete and the accuracy of prosody prediction is improved.
Fig. 2 is a flow chart of a speech synthesis method according to an embodiment of the present invention.
As shown in Fig. 2, the speech synthesis method may include:
S4, extracting text features from a text to be predicted, and inputting the text features into the prosody model.
In an embodiment of the present invention, the text to be predicted can be split into multiple word segments, the part of speech, word length, and similar features corresponding to each word segment can be obtained, and these text features can then be input into the prosody model generated in the previous embodiment.
S5, performing prosody prediction on the text to be predicted according to the prosody model.
Specifically, prosody prediction is performed on the text to be predicted using the weight parameters λk and μk of the characteristic functions.
The feature used for prosody prediction on the text to be predicted is b(x, i), where x is the word sequence, i is the position in the sequence, b(x, i) is the feature of the word sequence x at position i, and xi is the state of x at position i.
The state function is:

  sk(yi, x, i) = b(x, i) if yi matches the label associated with the feature, and 0 otherwise.

The transfer function is:

  tk(yi-1, yi, x, i) = b(x, i) if the label pair (yi-1, yi) matches the pair associated with the feature, and 0 otherwise.

where y is the label sequence, i is the position in the sequence, and yi is the state of y at position i.
For example, after the word segments are generalized on the basis of the synonym thesaurus, when xi = "establish" the prosody model contains the real-valued feature of the synonym class, together with the weight parameters λk and μk of the corresponding characteristic functions; thus for the word sequence "decided to set up a joint force", when xi = "set up" the predicted prosody label is yi = #2. Before the synonym generalization, this real-valued feature does not exist, the weight parameters λk and μk of the corresponding characteristic functions cannot be obtained, and the relevant probability cannot be given accurately. Adding the synonym thesaurus therefore improves the accuracy of prosody prediction.
Prosody prediction is performed on the whole segment sequence with the above method, and the prosodic pause level of each word segment is obtained, completing the prosody prediction.
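Decoding the pause level of every segment at once is typically done with the Viterbi algorithm; the sketch below is an assumption about how such decoding could look, with toy state and transition scores standing in for the scores a trained model would supply.

```python
# Viterbi decoding of the highest-scoring pause-level sequence.

def viterbi(words, labels, state_score, trans_score):
    """Return the highest-scoring label sequence for the word list."""
    n = len(words)
    best = [{lab: state_score(words, 0, lab) for lab in labels}]
    back = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for lab in labels:
            prev, s = max(
                ((p, best[i - 1][p] + trans_score(p, lab)
                  + state_score(words, i, lab)) for p in labels),
                key=lambda t: t[1])
            best[i][lab], back[i][lab] = s, prev
    # Trace back from the best final label.
    lab = max(labels, key=lambda l: best[-1][l])
    path = [lab]
    for i in range(n - 1, 0, -1):
        lab = back[i][lab]
        path.append(lab)
    return path[::-1]

# Toy scores: synonyms of "establish" prefer a #2 pause; others prefer #1.
ESTABLISH = {"establish", "set up", "found"}
state = lambda ws, i, lab: 1.5 if (ws[i] in ESTABLISH) == (lab == "#2") else 0.0
trans = lambda a, b: 0.3 if (a, b) == ("#1", "#2") else 0.0

print(viterbi(["decided", "set up", "joint force"], ["#1", "#2"], state, trans))
```

Viterbi runs in O(n · |labels|²), so the whole segment sequence is decoded in one pass instead of scoring every label sequence separately.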
S6, further performing acoustic prediction on the text to be predicted to generate an acoustic parameter sequence.
The prosodic pause levels are input into an acoustic prediction model, which performs acoustic prediction on the text to be predicted and can generate the corresponding acoustic parameter sequences, such as spectrum and fundamental frequency.
S7, concatenating the acoustic parameter sequence to generate the speech synthesis result.
Finally, waveform concatenation is performed on the acoustic parameter sequence using a vocoder, generating the final speech synthesis result.
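Steps S4 through S7 form a pipeline; the sketch below stubs every component (all names and behaviors are illustrative assumptions, not the patent's implementation) simply to show how the pause levels flow into the acoustic stage and how the parameter frames are concatenated.

```python
# Pipeline sketch: prosody prediction -> acoustic model -> concatenation.

def predict_prosody(words):
    """S4+S5 stub: assign a pause level per word segment."""
    return ["#2" if w in {"establish", "set up", "found"} else "#1"
            for w in words]

def acoustic_model(word, pause):
    """S6 stub: emit (spectrum, f0) frames; a major pause adds silence."""
    frames = [(hash((word, t)) % 100, 120.0) for t in range(3)]
    if pause == "#2":
        frames.append((0, 0.0))  # silent frame for the major pause
    return frames

def synthesize(words):
    pauses = predict_prosody(words)                       # S5
    params = [f for w, p in zip(words, pauses)
              for f in acoustic_model(w, p)]              # S6
    return params  # S7 would pass this sequence to a vocoder

seq = synthesize(["decided", "set up", "joint force"])
assert len(seq) == 3 + 4 + 3  # the "#2" word adds one extra silent frame
```

In a real system the acoustic model would be statistical (e.g. HMM- or DNN-based) and the vocoder would turn the parameter sequence into a waveform; here the point is only the data flow between the steps.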
In the speech synthesis method of this embodiment, text features are extracted from the text to be predicted and input into the prosody model, prosody prediction is performed on the text according to the prosody model, acoustic prediction is then performed to generate an acoustic parameter sequence, and the acoustic parameter sequence is concatenated to generate the speech synthesis result. Because the prosody model is based on a synonym thesaurus, the accuracy of prosody prediction is improved, prosodic pauses become more natural and fluent, and the user experience is enhanced.
To achieve the above objects, the present invention also proposes a prosody model training device for speech synthesis.
Fig. 3 is a structural schematic diagram of a prosody model training device for speech synthesis according to an embodiment of the present invention.
As shown in Fig. 3, the prosody model training device for speech synthesis may include: an extraction module 110, a generalization module 120, and a training module 130.
The extraction module 110 is used to extract text features and label features from the training corpus text.
The training corpus can be split into multiple word segments, each of which has a corresponding text feature and label feature. The text features may include part of speech, word length, and similar features. The label feature may be the prosodic pause level corresponding to the category of the word segment; for example, the pause level of a prosodic word is #1, that of a prosodic phrase is #2, and that of an intonational phrase is #3.
For example, consider the annotated sentence "The EU #2 decided #1 to establish #2 a joint force #3 to combat #2 Mediterranean #1 illegal-immigration #1 activity #3". The word sequence x is: "The EU / decided / to establish / a joint force / to combat / Mediterranean / illegal-immigration / activity", and the label sequence y is: #2 #1 #2 #3 #2 #1 #1 #3. The label sequence y is made up of multiple label features.
The generalization module 120 is used to generalize the multiple word segments in the training corpus text on the basis of the synonym thesaurus.
Specifically, the generalization module 120 can add synonyms that share the same usage, meaning, part of speech, and so on to the features, generalizing and extending them.
For example, the synonyms of "establish" may include "set up", "found", and so on.
The training module 130 is used to train the prosody model.
Specifically, the training module 130 can train the prosody model with a conditional random field of the following form:

  P(y|x) = (1/Z(x)) * exp( Σi Σk λk·tk(yi-1, yi, x, i) + Σi Σk μk·sk(yi, x, i) )

where x is the word sequence; y is the label sequence; P(y|x) is the probability of the label sequence y given the word sequence x; Z(x) is the normalization factor, obtained by summing the same exponential over all possible label sequences y; tk(yi-1, yi, x, i) is a feature of the whole observation sequence and the corresponding labels at positions i-1 and i, called a transfer function; sk(yi, x, i) is a feature of the whole observation sequence and the label at position i, called a state function; λk is the weight parameter of the transfer function to be estimated in training; and μk is the weight parameter of the state function to be estimated in training.
For example, in the training corpus "The EU #2 decided #1 to establish #2 a joint force #3 to combat #2 Mediterranean #1 illegal-immigration #1 activity #3", the word segment "establish" can be generalized to "set up" and "found", forming a real-valued feature that fires for any word in the synonym class, from which the characteristic functions are built.
The weight parameters λk and μk can thus be trained.
In the prosody model training device for speech synthesis of this embodiment, the text features and label features corresponding to the word segments are extracted from the training corpus text, the word segments in the training corpus text are generalized on the basis of the synonym thesaurus, and the prosody model is then trained on the text features, the label features, and the generalized word segments, so that the prosody model becomes more complete and the accuracy of prosody prediction is improved.
Fig. 4 is a structural schematic diagram of a speech synthesis device according to an embodiment of the present invention.
As shown in Fig. 4, the speech synthesis device may include: an analysis module 140, a prosody prediction module 150, an acoustic prediction module 160, and a generation module 170.
The analysis module 140 is used to extract text features from the text to be predicted and input the text features into the prosody model.
In an embodiment of the present invention, the analysis module 140 can split the text to be predicted into multiple word segments, obtain the part of speech, word length, and similar features corresponding to each word segment, and then input these text features into the prosody model generated in the previous embodiment.
The prosody prediction module 150 is used to perform prosody prediction on the text to be predicted according to the prosody model.
Specifically, the prosody prediction module 150 can use the weight parameters λk and μk of the characteristic functions to perform prosody prediction on the text to be predicted.
The feature used for prosody prediction on the text to be predicted is b(x, i), where x is the word sequence, i is the position in the sequence, b(x, i) is the feature of the word sequence x at position i, and xi is the state of x at position i.
For example, after the word segments are generalized on the basis of the synonym thesaurus, when xi = "establish" the prosody model contains the real-valued feature of the synonym class, together with the weight parameters λk and μk of the corresponding characteristic functions; thus for the word sequence "decided to set up a joint force", when xi = "set up" the predicted prosody label is yi = #2. Before the synonym generalization, this real-valued feature does not exist, the weight parameters λk and μk of the corresponding characteristic functions cannot be obtained, and the relevant probability cannot be given accurately. Adding the synonym thesaurus therefore improves the accuracy of prosody prediction.
Prosody prediction is performed on the whole segment sequence with the above method, and the prosodic pause level of each word segment is obtained, completing the prosody prediction.
The acoustic prediction module 160 is used to further perform acoustic prediction on the text to be predicted, generating an acoustic parameter sequence.
Specifically, the acoustic prediction module 160 can input the prosodic pause levels into an acoustic prediction model, which performs acoustic prediction on the text to be predicted and can generate the corresponding acoustic parameter sequences, such as spectrum and fundamental frequency.
The generation module 170 is used to concatenate the acoustic parameter sequence to generate the speech synthesis result.
Specifically, the generation module 170 can perform waveform concatenation on the acoustic parameter sequence using a vocoder, generating the final speech synthesis result.
In the speech synthesis device of this embodiment, text features are extracted from the text to be predicted and input into the prosody model, prosody prediction is performed on the text according to the prosody model, acoustic prediction is then performed to generate an acoustic parameter sequence, and the acoustic parameter sequence is concatenated to generate the speech synthesis result. Because the prosody model is based on a synonym thesaurus, the accuracy of prosody prediction is improved, prosodic pauses become more natural and fluent, and the user experience is enhanced.
In the description of the present invention, it is to be understood that terms indicating orientation or positional relationships, such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", and "circumferential", are based on the orientations or positional relationships shown in the drawings. They are used only to simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the present invention.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. A feature defined with "first" or "second" may thus explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless specifically defined otherwise.
In the present invention, unless expressly specified and limited otherwise, the terms "mounted", "connected", "coupled", "fixed", and the like are to be understood broadly: the connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements, unless expressly limited otherwise. For those of ordinary skill in the art, the specific meaning of the above terms in the present invention can be understood according to the specific circumstances.
In the present invention, unless expressly specified and limited otherwise, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Moreover, the first feature being "on", "above", or "over" the second feature may mean that the first feature is directly above or obliquely above the second feature, or merely that the first feature is at a higher level than the second feature. The first feature being "under", "below", or "beneath" the second feature may mean that the first feature is directly below or obliquely below the second feature, or merely that the first feature is at a lower level than the second feature.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples, and the features of the different embodiments or examples, described in this specification, provided they do not conflict with one another.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.
Claims (8)
1. a kind of rhythm model training method for phonetic synthesis, it is characterised in that comprise the following steps:
S1, the corresponding text feature of extraction participle and marker characteristic from training corpus text;
S2, based on Chinese thesaurus in the training corpus text participle carry out it is extensive;And
S3, according to the text feature, the marker characteristic and it is extensive after participle, the rhythm model is trained.
2. the method as described in claim 1, it is characterised in that it is described according to the text feature, the marker characteristic and
Participle after extensive, is trained to the rhythm model, specifically includes:
The rhythm model is trained by object function, to obtain the weight parameter of transfer function and the power of function of state
Weight parameter.
3. a kind of method that phonetic synthesis is carried out using rhythm model as claimed in claim 1 or 2, it is characterised in that including
Following steps:
S4, extract text feature from text to be predicted, and the text feature is inputted into the rhythm model;
S5, prosody prediction carried out to the text to be predicted according to the rhythm model;
S6, acoustical predictions further are carried out to the text to be predicted, to generate parameters,acoustic sequence;And
S7, the parameters,acoustic sequence is spliced, to generate phonetic synthesis result.
4. method as claimed in claim 3, it is characterised in that described to be entered according to the rhythm model to the text to be predicted
Row prosody prediction, is specifically included:
According to transfer function and function of state, judge the text feature with the presence or absence of the weight parameter of corresponding transfer function and
The weight parameter of function of state, if in the presence of obtaining the corresponding rhythm pause level of the text to be predicted.
5. a kind of rhythm model trainer for phonetic synthesis, including:Extraction module, for being carried from training corpus text
Take the corresponding text feature of participle and marker characteristic, it is characterised in that also include:
Extensive module, it is extensive for being carried out based on Chinese thesaurus to the participle in the training corpus text;And
Training module, for according to the text feature, the marker characteristic and it is extensive after participle, to the rhythm model
It is trained.
6. device as claimed in claim 5, it is characterised in that the training module, specifically for:
The rhythm model is trained by object function, to obtain the weight parameter of transfer function and the power of function of state
Weight parameter.
7. a kind of rhythm model using as described in claim 5 or 6 carries out the device of phonetic synthesis, it is characterised in that including:
Analysis module, the rhythm model is inputted for extracting text feature from text to be predicted, and by the text feature;
Prosody prediction module, for carrying out prosody prediction to the text to be predicted according to the rhythm model;
Acoustical predictions module, for further carrying out acoustical predictions to the text to be predicted, to generate parameters,acoustic sequence;With
And
Generation module, for splicing to the parameters,acoustic sequence, to generate phonetic synthesis result.
8. The device according to claim 7, characterized in that the prosody prediction module is specifically configured to:
determine, according to the transfer functions and the state functions, whether the text features have corresponding transfer-function weight parameters and state-function weight parameters, and if so, obtain the rhythm pause level corresponding to the text to be predicted.
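Claims 5 and 6 describe a training device that extracts text and marker features from a segmented corpus and generalizes the segmented words through a Chinese thesaurus before training. The following is a minimal illustrative sketch, not the patented implementation: the thesaurus categories, corpus, and pause labels are all invented for demonstration.

```python
# Hypothetical sketch of the claim-5 pipeline: thesaurus-based generalization
# of segmented words, then per-word feature extraction. All data is invented.

THESAURUS = {
    "北京": "CITY", "上海": "CITY",
    "喜欢": "LIKE_VERB", "热爱": "LIKE_VERB",
}

def generalize(words):
    """Replace each segmented word with its thesaurus category when one exists."""
    return [THESAURUS.get(w, w) for w in words]

def extract_features(words, tags):
    """Pair each generalized word with simple context features and its marker tag."""
    gen = generalize(words)
    feats = []
    for i, w in enumerate(gen):
        feats.append({
            "word": w,
            "prev": gen[i - 1] if i > 0 else "<BOS>",
            "next": gen[i + 1] if i < len(gen) - 1 else "<EOS>",
            "tag": tags[i],  # marker feature from the annotated corpus
        })
    return feats

words = ["我", "喜欢", "北京"]
tags = ["O", "B1", "B2"]  # toy rhythm-pause labels
print(generalize(words))  # ['我', 'LIKE_VERB', 'CITY']
```

Mapping synonymous words to shared categories in this way lets a model trained on "喜欢 北京" also cover "热爱 上海", which is the generalization benefit the abstract claims.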
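Claims 4 and 8 describe prediction via weight parameters of transfer functions (label-to-label transitions) and state functions (feature-to-label scores), which is the structure of a linear-chain CRF decoder. A toy Viterbi sketch under that reading; the pause levels, feature names, and all weights are invented for illustration:

```python
# Toy decoder in the spirit of claims 4 and 8: state-function weights score
# (feature, label) pairs, transfer-function weights score label transitions.

STATES = ["NB", "PW", "PPH"]  # hypothetical rhythm-pause levels

STATE_W = {  # state-function weight parameters (invented)
    ("short", "NB"): 1.0, ("short", "PW"): 0.2, ("short", "PPH"): 0.1,
    ("long", "NB"): 0.1, ("long", "PW"): 0.6, ("long", "PPH"): 1.2,
}
TRANS_W = {  # transfer-function weight parameters (invented)
    (a, b): (0.5 if a == b else 0.1) for a in STATES for b in STATES
}

def predict_pauses(features):
    """Viterbi-decode the best rhythm-pause sequence for a feature sequence."""
    v = [{s: STATE_W.get((features[0], s), 0.0) for s in STATES}]
    back = []
    for f in features[1:]:
        row, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: v[-1][p] + TRANS_W[(p, s)])
            row[s] = v[-1][best] + TRANS_W[(best, s)] + STATE_W.get((f, s), 0.0)
            ptr[s] = best
        v.append(row)
        back.append(ptr)
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(predict_pauses(["short", "short", "long"]))  # ['NB', 'NB', 'PPH']
```

The "if the weight parameters exist" check in the claims corresponds to the `STATE_W.get(..., 0.0)` lookup: features never seen in training simply contribute no score.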
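Claim 7 chains four modules: analysis, prosody prediction, acoustic prediction, and concatenation. A stand-in sketch of that flow, with every function body invented for illustration; a real system would call trained models at each step and splice waveform parameters rather than summing toy durations:

```python
# Hypothetical end-to-end flow of the claim-7 device. All logic is invented.

def analyze(text):
    """Analysis module: split text and derive a trivial per-word feature."""
    return ["long" if len(w) > 3 else "short" for w in text.split()]

def predict_prosody(features):
    """Prosody prediction module: mark a pause after every 'long' feature."""
    return ["PAUSE" if f == "long" else "NONE" for f in features]

def predict_acoustics(pauses):
    """Acoustic prediction module: emit a toy parameter frame per word."""
    return [{"dur": 2 if p == "PAUSE" else 1} for p in pauses]

def synthesize(text):
    """Generation module: concatenate acoustic frames into one result."""
    frames = predict_acoustics(predict_prosody(analyze(text)))
    return sum(f["dur"] for f in frames)  # stand-in for waveform splicing

print(synthesize("hello to the world"))  # 6
```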
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510337430.7A CN104867491B (en) | 2015-06-17 | 2015-06-17 | Rhythm model training method and device for phonetic synthesis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104867491A CN104867491A (en) | 2015-08-26 |
CN104867491B true CN104867491B (en) | 2017-08-18 |
Family
ID=53913283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510337430.7A Active CN104867491B (en) | 2015-06-17 | 2015-06-17 | Rhythm model training method and device for phonetic synthesis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104867491B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105551481B (en) * | 2015-12-21 | 2019-05-31 | 百度在线网络技术(北京)有限公司 | The prosodic labeling method and device of voice data |
CN106601228B (en) * | 2016-12-09 | 2020-02-04 | 百度在线网络技术(北京)有限公司 | Sample labeling method and device based on artificial intelligence rhythm prediction |
CN109739968A (en) * | 2018-12-29 | 2019-05-10 | 北京猎户星空科技有限公司 | A kind of data processing method and device |
CN110164413B (en) * | 2019-05-13 | 2021-06-04 | 北京百度网讯科技有限公司 | Speech synthesis method, apparatus, computer device and storage medium |
CN112084766B (en) * | 2019-06-12 | 2024-01-23 | 阿里巴巴集团控股有限公司 | Text processing method and device, storage medium and processor |
CN110516110B (en) * | 2019-07-22 | 2023-06-23 | 平安科技(深圳)有限公司 | Song generation method, song generation device, computer equipment and storage medium |
CN111164674B (en) * | 2019-12-31 | 2024-05-03 | 深圳市优必选科技股份有限公司 | Speech synthesis method, device, terminal and storage medium |
CN111226275A (en) * | 2019-12-31 | 2020-06-02 | 深圳市优必选科技股份有限公司 | Voice synthesis method, device, terminal and medium based on rhythm characteristic prediction |
CN111210803B (en) * | 2020-04-21 | 2021-08-03 | 南京硅基智能科技有限公司 | System and method for training clone timbre and rhythm based on Bottle sock characteristics |
CN111754978B (en) * | 2020-06-15 | 2023-04-18 | 北京百度网讯科技有限公司 | Prosodic hierarchy labeling method, device, equipment and storage medium |
CN112786023B (en) * | 2020-12-23 | 2024-07-02 | 竹间智能科技(上海)有限公司 | Mark model construction method and voice broadcasting system |
CN114707503B (en) * | 2022-02-14 | 2023-04-07 | 慧言科技(天津)有限公司 | Front-end text analysis method based on multi-task learning |
CN118214907A (en) * | 2024-03-06 | 2024-06-18 | 深圳市超时代软件有限公司 | Text-to-video conversion system based on artificial intelligence and control method thereof |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101572083A (en) * | 2008-04-30 | 2009-11-04 | 富士通株式会社 | Method and device for making up words by using prosodic words |
CN102063898A (en) * | 2010-09-27 | 2011-05-18 | 北京捷通华声语音技术有限公司 | Method for predicting prosodic phrases |
CN102651217A (en) * | 2011-02-25 | 2012-08-29 | 株式会社东芝 | Method and equipment for voice synthesis and method for training acoustic model used in voice synthesis |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1872361A4 (en) * | 2005-03-28 | 2009-07-22 | Lessac Technologies Inc | Hybrid speech synthesizer, method and use |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104867491B (en) | Rhythm model training method and device for phonetic synthesis | |
CN102354495B (en) | Test method and system for semi-open oral test questions | |
Jing et al. | Prominence features: Effective emotional features for speech emotion recognition | |
CN105185372B (en) | Training method for multiple personalized acoustic models, and voice synthesis method and voice synthesis device | |
US8990089B2 (en) | Text to speech synthesis for texts with foreign language inclusions | |
CN101064104B (en) | Emotion voice creating method based on voice conversion | |
CN102034475B (en) | Method for interactively scoring open short conversation by using computer | |
CN102360543A (en) | HMM-based bilingual (mandarin-english) TTS techniques | |
CN110147451A (en) | A kind of session command understanding method of knowledge based map | |
CN106782603A (en) | Intelligent sound evaluating method and system | |
Raza et al. | Design and development of phonetically rich Urdu speech corpus | |
KR100669241B1 (en) | Interactive Speech Synthesis System and Method Using Speech Act Information | |
CN105895076B (en) | A kind of phoneme synthesizing method and system | |
Lane | A Latin grammar for schools and colleges | |
Kyriakopoulos et al. | Automatic characterisation of the pronunciation of non-native English speakers using phone distance features | |
Raptis et al. | Expressive speech synthesis for storytelling: the innoetics’ entry to the blizzard challenge 2016 | |
KR20130067854A (en) | Apparatus and method for language model discrimination training based on corpus | |
Nguyen | Hmm-based vietnamese text-to-speech: Prosodic phrasing modeling, corpus design system design, and evaluation | |
KR101669408B1 (en) | Apparatus and method for reading foreign language | |
Kim et al. | Designing a large recording script for open-domain English speech synthesis | |
Narupiyakul et al. | A stochastic knowledge-based Thai text-to-speech system | |
ELothmany | Arabic text-to-speech including prosody (ATTSIP): for mobile devices | |
Schmiedel et al. | Development of Speech Syntheses for Lower Sorbian and Upper Sorbian using MaryTTS | |
Hansakunbuntheung et al. | Mongolian speech corpus for text-to-speech development | |
Kato et al. | Perceptual study on the effects of language transfer on the naturalness of Japanese prosody for isolated words |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication |
| PB01 | Publication |
| EXSB | Decision made by SIPO to initiate substantive examination |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |