The present invention relates to a synthesizer implementation that changes pitch, an important feature parameter of speech, to synthesize a single voice input through a microphone into multiple voices. Speech is generally modeled as an excitation signal (sound source) passing through a filter (the vocal tract); the excitation can be modeled by the pitch component and the filter by the formant components. The formants depend on the geometric shape of the vocal tract. For example, the sounds "ah" and "er" are produced by different configurations of the human vocal tract, and their formant frequencies differ. Formants therefore carry phonological information and play an important role in speech signal modeling. Pitch is produced by the periodic vibration of the vocal cords and is a parameter to which human hearing is very sensitive; it is used to distinguish the speaker of a speech signal and has a strong influence on the naturalness of the signal. Accurate pitch analysis is therefore an important factor determining the quality of speech synthesis, and accurate pitch extraction and reconstruction play a decisive role in the quality of speech coding. Pitch information is also used as a parameter for deciding whether a portion of the speech signal is voiced or unvoiced. Pitch is constrained by the structure of the human vocal cords, lying generally in the range 50-250 Hz for men and 120-150 Hz for women, and it varies with the individual speaker, stress, intonation, and emotion. By changing a pitch with these characteristics, one person's voice can be synthesized to sound like the voices of several people. The fields of application of the invention are diverse: a cheering synthesizer that makes one person sound like many people cheering at a sports stadium, a celebration synthesizer for birthday or party venues, singing toys, sound effects for films and plays, and a burglary-prevention system for homes left empty for long periods. It can also be applied to voice modulators that imitate the voices of celebrities such as Jolaman, who is currently popular.
Description (translated from Korean)

Multiple Speech Synthesizer using Pitch Alteration Method

The present invention synthesizes a single voice into multiple voices by changing its pitch, and can be classified in the field of voice communication technology or audio signal processing. The technology currently in use receives one person's voice and, after changing its pitch, synthesizes it back into a single voice rather than into multiple voices. It therefore has the disadvantage that it cannot synthesize a variety of voices.
The present invention remedies this shortcoming and synthesizes a variety of voices.
The present invention proposes a synthesizer that synthesizes a single voice into multiple voices by changing pitch, an important parameter of speech. Fig. 1 shows a general speech production model. The input that travels from the lungs through the vocal cords can be divided into two types: voiced sound is modeled as an impulse train based on the pitch period, and unvoiced sound as random noise. The signal selected by switching between these two sources is multiplied by a gain according to the energy of the input signal, and a speech signal is produced by passing the result through the vocal-tract filter. Analyzing a speech signal according to this speech production model shows that it consists of excitation information, which conveys the speaker's individuality and emotion, and the formant information of the vocal-tract filter, which conveys the spoken content. Pitch, which represents the excitation information, is produced by the periodic vibration of the vocal cords and is a parameter to which human hearing is very sensitive; it is used to distinguish the speaker of a speech signal and strongly influences the naturalness of the signal. By changing a pitch that carries such prosodic information, a variety of synthesized sounds can be produced.
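For illustration only, a minimal sketch of this source-filter production model is given below (this code is not part of the original disclosure; the sampling rate, resonance frequency, and gain values are arbitrary assumptions). Voiced sound is generated as an impulse train at the pitch period and unvoiced sound as random noise, each passed through a simple all-pole filter standing in for the vocal tract.

```python
import numpy as np
from scipy.signal import lfilter

def synthesize_voiced(f0=120.0, fs=8000, duration=0.5, gain=0.3):
    """Voiced source-filter model: impulse train (pitch) -> gain -> vocal-tract filter."""
    n = int(fs * duration)
    period = int(round(fs / f0))               # pitch period in samples
    excitation = np.zeros(n)
    excitation[::period] = 1.0                 # impulse train at the pitch period

    # Toy "vocal tract": one resonance near 500 Hz standing in for a formant.
    r, theta = 0.97, 2 * np.pi * 500.0 / fs
    a = [1.0, -2 * r * np.cos(theta), r * r]   # all-pole (IIR) filter coefficients
    return lfilter([1.0], a, gain * excitation)

def synthesize_unvoiced(fs=8000, duration=0.2, gain=0.05):
    """Unvoiced sound: random noise through a crude spectral-shaping filter."""
    noise = np.random.randn(int(fs * duration))
    return lfilter([1.0], [1.0, -0.5], gain * noise)
```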
Fig. 1 is a block diagram illustrating a conventional speech production model.
Fig. 2 is a block diagram of a typical pitch change system.
Fig. 3 is a block diagram of the pitch change system applied in the present invention.
Fig. 4 is a block diagram of the pitch-point detection method applied in the present invention.
Fig. 5 shows the pitch change method (PSOLA synthesis) applied in the present invention.
Fig. 6 is a hardware configuration diagram of the multi-voice synthesis system.
Fig. 7 is a software flow chart of the multi-voice synthesis system.
The pitch change system is configured as shown in Fig. 2. In the analysis stage, the pitch of the original signal input through the microphone and of the target signal is detected and passed to the change-rule generation stage. The change-rule generation stage uses this information to determine the pitch change rate and a suitable pitch change method. This pitch change rule is supplied to the actual pitch change stage, where the pitch of the original signal is changed by the predetermined method and rate, and the synthesis stage uses the result to produce a synthesized sound with an altered voice. This process requires an accurate pitch detector together with a pitch change technique that introduces little distortion. Numerous pitch detection methods for speech signals have been proposed over the last 40 years (see references). The autocorrelation method is commonly used for pitch detection: the correlation between adjacent segments of the speech waveform is computed to detect the period of the repetitive waveform (see references). For pitch change, the pitch must first be detected accurately, and the pitch is then changed on that basis. Many pitch change methods have also been proposed (see references). One example is the PSOLA (Pitch Synchronous Overlap and Add) method, which segments the speech waveform in the time domain in units of the pitch period and then reconstructs the waveform by overlapping the segments at the changed pitch period (see references).
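A minimal autocorrelation pitch detector of the kind referred to above might be sketched as follows (illustrative only; the sampling rate, search range, and peak threshold are assumed values, not figures from the disclosure).

```python
import numpy as np

def detect_pitch_autocorr(frame, fs=8000, f0_min=50.0, f0_max=400.0):
    """Estimate the fundamental frequency of one frame via autocorrelation.

    Returns the estimated F0 in Hz, or 0.0 if no clear periodicity is found
    (e.g. for an unvoiced frame).
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags >= 0
    if ac[0] <= 0:
        return 0.0
    ac = ac / ac[0]                               # normalize by frame energy

    lag_min = int(fs / f0_max)                    # shortest allowed period
    lag_max = min(int(fs / f0_min), len(ac) - 1)  # longest allowed period
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))

    # Require a reasonably strong correlation peak before calling the frame periodic.
    return fs / lag if ac[lag] > 0.3 else 0.0
```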
Fig. 3 is a block diagram of the pitch change system used in the present invention. For pitch detection, the present invention uses the detection method required for prosody control shown in Fig. 4. First, the signal whose high-frequency region has been emphasized by a pre-emphasis filter is passed inversely through the filter represented by the linear prediction coefficients, and the pitch detection process is performed by applying the amplitude and periodicity characteristics of the glottal waveform obtained for each analysis section (see references). After the pitch is detected in this way, pitches stretched to 140% and 120% and pitches compressed to 80% and 60% are produced with the PSOLA pitch change method shown in Fig. 5 and combined with slight delays, yielding a multi-voice synthesized sound.
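The combination of pitch-shifted copies described above could be expressed roughly as in the sketch below (not taken from the disclosure): `pitch_shift` is a placeholder for any pitch changer such as the PSOLA method of Fig. 5, and the per-voice delay values are illustrative assumptions.

```python
import numpy as np

def synthesize_multi_voice(voice, fs, pitch_shift,
                           ratios=(1.4, 1.2, 0.8, 0.6),
                           delays_ms=(10, 20, 30, 40)):
    """Mix the original voice with pitch-shifted, slightly delayed copies.

    `pitch_shift(signal, ratio)` must return a pitch-changed copy of `signal`
    (ratio > 1 raises the pitch, ratio < 1 lowers it).
    """
    out = np.asarray(voice, dtype=float).copy()
    for ratio, delay_ms in zip(ratios, delays_ms):
        shifted = pitch_shift(voice, ratio)           # 140%, 120%, 80%, 60% pitch
        delay = int(fs * delay_ms / 1000.0)           # small per-voice delay
        end = min(len(out), delay + len(shifted))
        out[delay:end] += shifted[:end - delay]
    return out / (len(ratios) + 1)                    # keep amplitude in range
```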
[Configuration of the Hardware Device]
Fig. 6 shows an apparatus for synthesizing multiple voices by changing the pitch of an analog voice signal 600 input from a microphone. The voice signal waveform 600, input in analog form, is amplified by the amplifier 601 and then passed through the low-pass filter 602 to remove aliasing. It then passes through the analog-to-digital converter 603, which performs quantization and coding and converts it into a digital signal in linear pulse code modulation (PCM) form, after which it is processed (604) by software or firmware on a general-purpose CPU or digital signal processor (DSP).
While processing the signal, the processor 604 may refer to peripheral devices 609 installed internally or externally, and may also use the peripheral memory 605 to store input digital signals or processing results.
The digital signal, synthesized into multiple voices by the pitch-changing software running on the CPU, is converted into a sampled analog waveform by the digital-to-analog converter 608. Passing this signal through the low-pass filter 607 yields an analog signal from which the quantization noise has been removed; after suitable amplification (606), it becomes an analog signal 610 that can again be heard through a speaker or similar device.
[Software Process]
The multi-voice synthesizer using the pitch change method adds software or firmware that applies multiple pitch changes instead of the conventional single pitch change. Fig. 7 shows the software flow chart of the multi-voice synthesizer used in the present invention.
Data samples 701 input from the analog-to-digital converter (ADC) are processed one frame at a time. First, it is determined whether the data in the current frame belong to a voiced section; if not (703), the occupancy ratio (buffer rate, BR) of the ring buffer is calculated. The memory buffer in which processed data wait is called the ring buffer (710).
The ring buffer occupancy ratio (BR) represents the proportion of time that processed data wait in the ring buffer. If the current frame is a non-voiced section and the waiting time in the ring buffer exceeds a predetermined time (e.g., BT = 1.5 seconds or more), the voice processing time is shortened (708) to speed up processing. This makes it possible to remove the processing delay that accumulates when multiple pitch changes are performed. In other words, in voiced sections the data are output slowly so that the pitch change is performed smoothly, while in non-voiced sections the data are output quickly, eliminating the overall time delay.
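The buffer-occupancy check described above might be organized as in the following sketch (an assumption for illustration; the threshold BT and the `shorten_unvoiced` callback are placeholders rather than values or routines from the disclosure).

```python
from collections import deque

class RingBufferScheduler:
    """Track how long processed frames wait before playback and shorten
    unvoiced frames when the backlog exceeds the threshold BT."""

    def __init__(self, fs=8000, bt_seconds=1.5):
        self.fs = fs
        self.bt_seconds = bt_seconds      # waiting-time threshold (BT)
        self.buffer = deque()             # processed frames waiting for the DAC

    def buffer_rate(self):
        """Waiting time (seconds) represented by the frames queued in the buffer."""
        return sum(len(f) for f in self.buffer) / self.fs

    def push(self, frame, is_voiced, shorten_unvoiced):
        # If we are falling behind and the frame is unvoiced, compress it in time
        # so that the overall delay introduced by the multi-pitch change shrinks.
        if not is_voiced and self.buffer_rate() > self.bt_seconds:
            frame = shorten_unvoiced(frame)
        self.buffer.append(frame)

    def pop(self):
        """Next frame for the digital-to-analog converter, or None if empty."""
        return self.buffer.popleft() if self.buffer else None
```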
Many methods (702) of determining whether the current frame is a voiced or non-voiced section are described in speech processing textbooks (see references); for example, the decision can easily be made by measuring the energy level. That is, if the average energy of the current frame is below a predetermined threshold, the section is classified as unvoiced.
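Such an energy-threshold decision can be as simple as the sketch below (illustrative only; the threshold value is an arbitrary placeholder and would in practice be tuned or made adaptive).

```python
import numpy as np

def is_voiced_frame(frame, energy_threshold=1e-3):
    """Classify a frame as voiced if its average energy exceeds a fixed threshold."""
    mean_energy = float(np.mean(np.asarray(frame, dtype=float) ** 2))
    return mean_energy > energy_threshold
```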
If the input data belong to a voiced section, the pitch period must be detected using the pitch-point detection method (705). Numerous pitch period detection methods for speech signals have been proposed over the last 40 years (see references). The autocorrelation method is commonly used: the correlation between adjacent segments of the speech waveform is computed to detect the period of the repetitive waveform (see references).
The present invention uses the detection method required for prosody control described above.
In addition, to limit the change of intonation within a voiced section to a certain range (e.g., within a factor of 1.5), the pitch period of the continuous voiced section is detected, the rate of change per frame is computed, and if the change is large the pitch period is modified to stabilize the voice (706). The pitch period is changed on the basis of an accurate pitch period detection. Many methods of changing the pitch period have been proposed (see references). The present invention performs the multiple pitch changes using the PSOLA (Pitch Synchronous Overlap and Add) method (see references), which segments the speech waveform in the time domain in units of the pitch period and then reconstructs the waveform by overlapping the segments at the changed pitch period.
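A heavily simplified time-domain, pitch-synchronous overlap-add pitch change, assuming the pitch period (in samples) has already been detected for the section, might look like the sketch below. Real PSOLA implementations place analysis marks on glottal closure instants and duplicate or drop segments to preserve duration; this naive version changes duration along with pitch and is illustrative only.

```python
import numpy as np

def psola_pitch_shift(signal, pitch_period, ratio):
    """Shift pitch by `ratio` (>1 raises pitch) with pitch-synchronous overlap-add.

    `pitch_period` is the detected period in samples. Hann-windowed segments,
    two periods long, are cut at the analysis marks and re-overlapped at the
    modified period. Note: this sketch also stretches/shrinks the signal;
    a full PSOLA repeats or drops segments to keep the duration unchanged.
    """
    signal = np.asarray(signal, dtype=float)
    out_period = int(round(pitch_period / ratio))        # new spacing of pitch pulses
    win_len = 2 * pitch_period
    window = np.hanning(win_len)

    # Pitch-synchronous, windowed segments centred on each analysis mark.
    marks = np.arange(pitch_period, len(signal) - pitch_period, pitch_period)
    segments = [window * signal[m - pitch_period:m + pitch_period] for m in marks]

    # Overlap-add the segments at the changed pitch period.
    out = np.zeros(len(signal) + win_len)
    pos = pitch_period
    for seg in segments:
        out[pos - pitch_period:pos + pitch_period] += seg
        pos += out_period
        if pos + pitch_period > len(out):
            break
    return out[:len(signal)]
```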
The processed voice data are stored in the ring buffer (709) and output through the speaker, one block of voice samples at a time, via the digital-to-analog converter (DAC) in the order in which they were stored (710). The multi-voice synthesizer operates in real time: the processing (709) of one frame of data received from the analog-to-digital converter (ADC) (701) must finish before the next frame of data arrives.
[References]
[1] Myung-Jin Bae and Sang-Hyo Lee, Digital Speech Analysis, Dong Young Publishing Co., 1998.
[2] Myung-Jin Bae, Digital Speech Synthesis, Dong Young Publishing Co., 1999.
[3] Myung-Jin Bae, Digital Speech Coding, Dong Young Publishing Co., 2000.
[4] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
[5] Hyung-Bin Park and Myung-Jin Bae, "A Study on Pitch-Point Detection for Tone Change," Proceedings of the Acoustical Society of Korea Summer Conference, Vol. 19, No. 1(s), pp. 149-152, July 7-8, 2000.
As described above, the present invention synthesizes a single voice into multiple voices by changing pitch, an important parameter carrying the prosodic information of speech. Voice information technology has been selected as one of the ten key technologies of the 21st century designated by MIT and one of the ten most promising technologies chosen by the Samsung Economic Research Institute. Beyond the importance of the technology itself, the market for voice technology is expected to grow rapidly. The domestic voice technology market is currently in its initial stage, estimated at about 20 billion won last year, but with annual growth of more than 50% it is expected to reach about 100 billion won by 2005. In this growing voice technology market, the present invention can be applied in various fields: a cheering synthesizer that makes one person sound like many people cheering at a sports stadium, a celebration synthesizer for birthday or party venues, singing toys, sound effects for films and plays, and a burglary-prevention system for homes left empty for long periods. It can also be applied to voice modulators that imitate the voices of celebrities such as Jolaman, who is currently popular. As it can be applied in such diverse fields, its ripple effect is expected to be very large.
Claims (1) (translated from Korean)

A method of implementing a synthesizer that synthesizes a single voice into multiple voices by changing the pitch, an important speech parameter carrying prosodic information, while keeping the formant components unchanged, wherein: the pitch change is applied in the time domain in a manner that allows the prosody to be controlled in real time; to preserve the individuality and clarity of the speaker during the time-domain pitch change, the pitch change is made with respect to the pitch central to the speaker's phonation; to perform the pitch change, a pitch-point detection method based on linear prediction analysis, capable of detecting the pitch points of the phonation, is used; and for real-time pitch change in the time domain the PSOLA synthesis method is applied, so that several pitch-changed voices are synthesized simultaneously into multiple voices.
KR1020030009198A (publication KR20030031936A, ceased), filed 2003-02-13; PCT/KR2003/001238, published as WO2004072951A1.