PROBLEM TO BE SOLVED: To provide a method that changes the pitch and playback speed of an acoustic signal without distortion or noise, by calculating the mean absolute error of the acoustic signal to determine the optimal overlap-splice point. SOLUTION: An analog-to-digital converter 14 converts an analog audio signal into a digital signal. The digital signal is then divided into acoustic frames in a pitch shifting processor 76, and the pitch and playback speed of the digital signal in a given frame are changed. Next, the changed acoustic frame is overlap-spliced with an unchanged acoustic frame so that the unchanged acoustic frame overlaps the end region of the changed acoustic frame. For this overlap, a differential mean absolute error is calculated so as to minimize audible noise. Alternatively, the pitch and playback speed are changed by taking as the overlap-splice point the best overlap point, at which no distortion or noise is present.
Description (translated from Japanese)
DETAILED DESCRIPTION OF THE INVENTION
[0001]
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to algorithms for altering the pitch and playback (or playing) speed of an audio signal. More particularly, it relates to a highly efficient algorithm that changes the pitch and speed of an audio signal by overlap-splicing various sections of the acoustic signal, calculating the mean absolute error to find the best splice point so that the pitch and speed can be changed.
[0002]
Prior Art and Problems to Be Solved by the Invention
In audio signal recording, efforts have been made to change the pitch and playback speed of acoustic signals in certain audio applications. Such changes have been attempted in a variety of applications, for example sampling synthesizers, harmonizers, vocoders, language learning devices, telephone answering machines, and software for computer-synthesized music. Where a human voice signal is to be altered, compression techniques have been used to alter the acoustic signal according to the singer's pitch and to adjust the amplitude of the signal. In general, the range over which the input acoustic signal can be adjusted is within one octave: the signal can be adjusted over a total of 24 semitones, comprising 12 descending semitones and 12 ascending semitones. Such a change must meet the demand for real-time data processing with a relatively simple hardware design, and it must not introduce any detectable distortion into the sound.
[0003] Conventionally, acoustic signals have been modified by resampling and by split-and-splice methods that use formants. These methods, however, introduce an unacceptable level of distortion into the sound. Resampling techniques concentrate on changing the sampling frequency, which changes not only the pitch of the acoustic signal but also the length of the signal and the shape of the formant envelope. To maintain the original signal length, further operations such as compression and expansion are carried out after the acoustic signal has been resampled. These compression/expansion stages, however, often generate short pop noises, and a change in the shape of the formant envelope generates high-pitched noise. The split/splice method uses a linear prediction filter and a Fourier transform to preserve the formant shape, but the number of calculation steps required is enormous. Still other methods use multiple oscillators and filter banks to change the acoustic pitch; these generate low-frequency and high-frequency noise and also require many calculation steps.
Accordingly, it is an object of the present invention to provide a method of changing the pitch and playback speed of an acoustic signal that does not have the deficiencies of the prior art methods. It is another object of the present invention to provide a method of changing the pitch and playback speed of an acoustic signal in which the optimal overlap-splice point is determined by calculating the mean absolute error of the acoustic signal. It is a further object of the present invention to provide a method of changing the pitch and playback speed of an acoustic signal in which the mean absolute error of the acoustic signal is calculated by incorporating a block binary search method.
[0004]
Means for Solving the Problems
In accordance with the objects stated above, a first aspect of the present invention provides a method of changing parameters of an audio signal in which an analog audio signal is first converted to a digital signal. The digital signal is then divided into acoustic frames, and the pitch and playback speed of the digital signal in a given frame are changed. Next, the acoustic frame changed in this way is overlap-spliced with an unchanged acoustic frame so that, for crossfading, the unchanged acoustic frame overlaps the end region of the changed acoustic frame. This overlap is carried out using a part of the frame whose acoustic structure is similar to that end region. The similarity of acoustic structure is determined by defining the differential mean absolute error of the overlap splice, in a form that keeps the number of calculation steps to a minimum, by the function
$$\mathrm{DMAE} = \sum_{m}\Bigl(\lvert x_1(m) - x_2(m+\tau)\rvert + \lvert x_1(m+1) - x_1(m) - x_2(m+1+\tau) - x_2(m+\tau)\rvert\Bigr)$$
$$\phantom{\mathrm{DMAE}} = \sum_{m}\Bigl(\lvert x_1(m) - x_2(m+\tau)\rvert + \lvert x_1(m+1) - x_2(m+1+\tau) - [\,x_1(m) + x_2(m+\tau)\,]\rvert\Bigr)$$
where DMAE is the differential mean absolute error of the overlap splice; the sum over m is taken over some combination of points between 0 and cs (cs being the size of the crossfade); 0 ≤ τ < sr, where sr is the search region; x_1 is the changed frame; and x_2 is the unchanged frame. The changing and overlap-splicing steps are repeated for the unchanged acoustic frames, and again for the remaining unchanged acoustic frames of the digital signal, to obtain the changed digital signal. Finally, the changed digital signal is converted back to analog form.
[0005] When the changing step makes an acoustic frame longer, the excess unchanged acoustic frame is discarded so that the playback time does not change. Conversely, when the changing step makes an acoustic frame shorter, the missing acoustic frame is taken in from the original digital signal so that the playback time does not change. The DMAE is defined at points spaced nτ apart from one another (n is an integer that depends on the acceptable range of calculation accuracy). The search region is divided into a plurality of sections, a DMAE is defined for each section, the DMAEs so defined are compared with one another, and the section having the minimum DMAE is selected as the optimal overlap-splice position. The number of calculations required to find the section having the minimum DMAE is
$$n\,\bigl[\,3 + 2\,\bigl(\log_2(MS/n) - 2\bigr)\bigr]$$
where n is the number of sections and MS is the length of the search region.
A second aspect of the present invention provides a method of changing parameters of an audio signal in which an analog audio signal is first converted to a digital signal. The digital signal is then divided into acoustic frames, and the playback time of a given frame is changed. Next, the acoustic frame changed in this way is overlap-spliced with an unchanged acoustic frame so that, for crossfading, the unchanged acoustic frame overlaps the end region of the changed acoustic frame. This overlap is carried out using a part of the frame whose acoustic structure is similar to that end region. The similarity of acoustic structure is determined by defining the differential mean absolute error of the overlap splice, in a form that keeps the number of calculation steps to a minimum, by the function
$$\mathrm{DMAE} = \sum_{m}\Bigl(\lvert x_1(m) - x_2(m+\tau)\rvert + \lvert x_1(m+1) - x_1(m) - x_2(m+1+\tau) - x_2(m+\tau)\rvert\Bigr)$$
$$\phantom{\mathrm{DMAE}} = \sum_{m}\Bigl(\lvert x_1(m) - x_2(m+\tau)\rvert + \lvert x_1(m+1) - x_2(m+1+\tau) - [\,x_1(m) + x_2(m+\tau)\,]\rvert\Bigr)$$
where DMAE is the differential mean absolute error of the overlap splice; the sum over m is taken over some combination of points between 0 and cs (cs being the size of the crossfade); 0 ≤ τ < sr, where sr is the search region; x_1 is the changed frame; and x_2 is the unchanged frame. The changing and overlap-splicing steps are repeated for the unchanged acoustic frames, and again for the remaining unchanged acoustic frames of the digital signal, to obtain the changed digital signal. Finally, the changed digital signal is converted back to analog form.
[0006] In this case as well, if the audio signal processing increases the amplitude of the audio signal, the changing step, which changes the playback time, lengthens it while keeping the playback speed and amplitude of the audio signal unchanged. Likewise, if the audio signal processing decreases the amplitude of the audio signal, the changing step shortens the playback time while keeping the playback speed and amplitude of the audio signal unchanged. The DMAE is defined at points spaced nτ apart from one another (n is an integer that depends on the acceptable range of calculation accuracy). The search region is divided into a plurality of sections, a DMAE is defined for each section, the DMAEs so defined are compared with one another, and the section having the minimum DMAE is selected as the optimal overlap-splice position. The number of calculations required to find the section having the minimum DMAE is
$$n\,\bigl[\,3 + 2\,\bigl(\log_2(MS/n) - 2\bigr)\bigr]$$
where n is the number of sections and MS is the length of the search region.
An apparatus for changing parameters of an audio signal is also provided. According to the invention, the apparatus comprises an input amplifier and an output amplifier, first and second low-pass filters, an analog-to-digital converter, a digital-to-analog converter, and a pitch shifting processor. The input amplifier, the first low-pass filter, and the analog-to-digital converter are connected in series to the input of the pitch shifting processor, while the digital-to-analog converter, the second low-pass filter, and the output amplifier are connected in series to the output of the pitch shifting processor.
[0007] The pitch shifting processor comprises: an input unit connected to an input buffer; an output unit connected to an output buffer; a crossfading data memory that stores the portion of the audio signal requiring crossfading; an address unit connected to the input and output buffers and to the crossfading data memory; a register file unit; a digital processing unit that calculates the mean absolute error and the crossfading values; and a control unit. The input buffer, crossfading data memory, register file unit, digital processing unit, control unit, and output buffer are operatively interconnected through a bus system. Other objects, features, and advantages of the present invention will become apparent from the following description based on the accompanying drawings.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
In accordance with the present invention, there is provided a method of changing the pitch and playback speed of an acoustic signal without the deficiencies of the prior art methods. The simplest way to change the pitch of an acoustic signal is to produce the same effect as playing a tape recorder at high or low speed. This effect can be produced in two different ways. The first is to periodically increase or decrease the number of sampling points while the playback speed is kept constant. This is shown in the corresponding figure. The original acoustic signal is shown at 10. Acoustic signal 12 shows the sampling points being periodically reduced to obtain the effect of sound played back at high speed. Acoustic signal 12 shows the sampling points being periodically increased to produce the effect of sound played back at low speed. The second way is to keep the sampling points constant while increasing or decreasing the playback speed, which resembles the principle of playing a tape recorder at high or low speed.
One of the deficiencies of either method, however, is that the resulting playback time changes. To correct this, an overlap/discard method is applied to the changed acoustic signal: the continuous acoustic signal is first divided into several sections called acoustic frames. When the pitch is lowered and the acoustic frame becomes longer, the excess acoustic signal is discarded. Conversely, when the pitch is raised and the acoustic frame becomes shorter, the missing portion of the acoustic signal is filled in with another section of acoustic frame. With this technique, the length of each acoustic frame can be kept constant.
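The first way of changing pitch described above, periodically adding or removing sampling points at a fixed playback rate, can be sketched as follows. This is only an illustrative sketch: the function name `resample_linear`, the `pitch_ratio` parameter, and the use of linear interpolation between neighbouring samples are assumptions made for the example and are not taken from the text.

```python
def resample_linear(samples, pitch_ratio):
    """Re-read `samples` in steps of `pitch_ratio` (hypothetical helper).

    pitch_ratio > 1 removes sampling points (higher pitch, shorter frame);
    pitch_ratio < 1 adds sampling points (lower pitch, longer frame).
    Linear interpolation is an assumption of this sketch.
    """
    if pitch_ratio <= 0:
        raise ValueError("pitch_ratio must be positive")
    if not samples:
        return []
    out = []
    last = len(samples) - 1
    pos = 0.0
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        # Weighted mean of the two neighbouring input samples.
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += pitch_ratio
    return out


frame = [0, 1, 2, 3, 4, 5, 6, 7]
faster = resample_linear(frame, 2.0)   # fewer points: frame gets shorter
slower = resample_linear(frame, 0.5)   # more points: frame gets longer
```

Played back at the original sampling rate, the shorter frame sounds higher and the longer frame sounds lower, which is why the playback time changes unless frames are afterwards filled or trimmed.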
[0009] The filling of an acoustic signal of insufficient length with another acoustic frame can be carried out as follows. For an acoustic frame with a playback time of M milliseconds, if the pitch is raised by increasing the frequency x times, the playback time of the sound is shortened and the output acoustic frame becomes M/x milliseconds long. The acoustic frame missing at the end of this time scale can be filled by taking a section of acoustic frame from the original acoustic signal (that is, the acoustic frame from M/x to M/x + M milliseconds of the original acoustic signal) and overlap-splicing it onto the end of the shortened acoustic frame. A small region 20 of the acoustic signal must be added (that is, linearly added) to each acoustic frame for crossfading. This is shown in the corresponding figure. The section of acoustic frame of the input acoustic signal indicated at 16 is shortened to the length indicated at 18 after the sampling points have been proportionally reduced, that is, after the sampling frequency has been increased. As a result, from the end of acoustic frame 18 (not including crossfading portion 20) onward, the signal matches the original acoustic signal. This is indicated at 22 in the figure. This step is repeated for the remaining sections of the acoustic signal.
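The frame bookkeeping described above can be sketched as follows. The sketch assumes that each input frame of M samples is read from the original signal starting where the previous output frame ended (at M/x), so that the output keeps roughly the original duration. Frames are simply abutted here, without the crossfade region, and linear interpolation is used for the resampling; both are simplifications of this example. The names `resample_linear` and `shift_pitch_keep_duration` are hypothetical.

```python
def resample_linear(samples, ratio):
    """Hypothetical helper: re-read `samples` in steps of `ratio`."""
    out = []
    last = len(samples) - 1
    pos = 0.0
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += ratio
    return out


def shift_pitch_keep_duration(signal, frame_len, x):
    """Shift pitch by factor x while keeping the duration roughly constant.

    Each frame of `frame_len` input samples becomes about frame_len / x
    output samples. The next input frame starts where the output frame
    ended, so input is re-used when x > 1 and skipped when x < 1.
    """
    if frame_len < 1 or x <= 0:
        raise ValueError("frame_len must be >= 1 and x must be positive")
    out = []
    start = 0
    while start + frame_len <= len(signal):
        shifted = resample_linear(signal[start:start + frame_len], x)
        out.extend(shifted)
        start += len(shifted)  # next frame begins at M/x (or xM when lowering)
    return out


signal = list(range(32))
raised = shift_pitch_keep_duration(signal, 8, 2.0)    # 7 frames of 4 samples
lowered = shift_pitch_keep_duration(signal, 8, 0.5)   # 2 frames of 15 samples
```

In both cases the output length stays close to the 32 input samples; the small shortfall comes from the final partial frame, which this sketch does not process.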
[0010] Conversely, when the pitch of an acoustic signal is lowered so that the frequency becomes 1/x, the total playback time becomes xM milliseconds. This is shown in the corresponding figure. As in the case described above, a section of acoustic frame is joined to the end of the acoustic output by taking the corresponding part of the original acoustic signal (that is, the part from xM to xM + M milliseconds of the original acoustic signal). Crossfading is carried out in the same way at the boundary of each acoustic frame. For example, acoustic frame 32 is a section of the input acoustic signal; after the sampling points are increased, that is, after the sampling frequency is lowered, its length increases as indicated by numeral 34. At the rear end of the acoustic frame, a subsection 36 is used for crossfading. As a result, the rear end of acoustic frame 34 (not including crossfading section 36) matches the original acoustic signal, as shown by acoustic frame 38 in the figure. This step is repeated to complete the process.
For an acoustic signal changed by the method of the present invention, the degree of change in the acoustic scale is related to the sizes of the acoustic frame and of the crossfade. Generally speaking, the higher the pitch is raised, the shorter the acoustic frame and the crossfade become, which makes echo less noticeable. It has also been found that the longer the crossfade, the lower the noise generated. If the crossfade is too long, however, the tone quality of the sound may be degraded. Even though the crossfading method gives a smooth transition when acoustic frames are overlap-spliced, noise can still arise from the relative position of the acoustic frames. It is therefore desirable to improve the present invention further so that the region of an acoustic frame most similar to the other acoustic frame is located and the frames are overlap-spliced there without generating excessive noise. FIG. 4 shows a method of locating these positions. For example, the small acoustic frame section 42 at the rear end of acoustic frame 40 is compared with the front section 44 of a second acoustic frame 46. Subsection 42 represents the size of the crossfading region, which is smaller than the front section 44 of acoustic frame 46. To overlap-splice acoustic frame 46 with acoustic frame 40, a similar section 48 must be found within acoustic frame 46.
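The crossfade at a frame boundary can be sketched as a linear mix over the last cs samples of the first frame and the first cs samples of the second. The text states only that the crossfade region is linearly added; the exact weight schedule below, and the name `crossfade_splice`, are assumptions of this example.

```python
def crossfade_splice(first, second, cs):
    """Overlap-splice two frames with a linear crossfade of `cs` samples.

    The last `cs` samples of `first` fade out while the first `cs`
    samples of `second` fade in. The weights (k + 1) / (cs + 1) are an
    assumption of this sketch.
    """
    if cs <= 0 or cs > len(first) or cs > len(second):
        raise ValueError("cs must fit inside both frames")
    body = first[:-cs]
    tail = first[-cs:]
    mixed = []
    for k in range(cs):
        w = (k + 1) / (cs + 1)          # weight of the incoming frame
        mixed.append(tail[k] * (1.0 - w) + second[k] * w)
    return body + mixed + second[cs:]


spliced = crossfade_splice([1.0] * 6, [0.0] * 6, 3)
```

A longer `cs` gives a smoother transition at the cost of blending more of the two frames together, which is consistent with the trade-off between noise and tone quality described above.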
[0011] A mathematical method is proposed for finding the most similar overlap-splice region for the acoustic frames. The method is based on calculating the differential mean absolute error (DMAE) of the overlap splice, which keeps the number of calculation steps to a minimum and therefore allows the overlap splice to be made with high efficiency. The calculation is carried out according to
$$\mathrm{DMAE} = \sum_{m}\Bigl(\lvert x_1(m) - x_2(m+\tau)\rvert + \lvert x_1(m+1) - x_1(m) - x_2(m+1+\tau) - x_2(m+\tau)\rvert\Bigr)$$
$$\phantom{\mathrm{DMAE}} = \sum_{m}\Bigl(\lvert x_1(m) - x_2(m+\tau)\rvert + \lvert x_1(m+1) - x_2(m+1+\tau) - [\,x_1(m) + x_2(m+\tau)\,]\rvert\Bigr)$$
where DMAE is the differential mean absolute error of the overlap splice; the sum over m is taken over some combination of points between 0 and cs (cs being the size of the crossfade); 0 ≤ τ < sr, where sr is the search region; x_1 is the changed frame; and x_2 is the unchanged frame. The more points m are used, the better the sound quality. The position of the minimum DMAE is the best overlap-splice point for the acoustic frame. The DMAE calculation uses only additions and subtractions; because no multiplication is needed, it is a simple process.
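The search for the best splice offset can be sketched as follows. This sketch reads the second DMAE term as the change in the error between points m and m + 1, that is, as a comparison of the sample-to-sample differences of the two frames. That grouping is an interpretation made for this example and should be checked against the expression as printed in the text. The names `dmae` and `best_offset` are hypothetical.

```python
def dmae(x1, x2, tau, cs):
    """Differential mean absolute error at offset `tau` (sketch).

    Uses only additions, subtractions, and absolute values. The second
    term is taken as the difference between successive errors, which is
    an assumed reading of the printed formula.
    """
    total = 0.0
    for m in range(cs - 1):
        err_now = x1[m] - x2[m + tau]
        err_next = x1[m + 1] - x2[m + 1 + tau]
        total += abs(err_now) + abs(err_next - err_now)
    return total


def best_offset(x1, x2, cs, sr):
    """Exhaustively evaluate every offset 0 <= tau < sr and keep the minimum."""
    if len(x1) < cs or len(x2) < cs + sr - 1:
        raise ValueError("frames too short for the requested cs and sr")
    return min(range(sr), key=lambda tau: dmae(x1, x2, tau, cs))


# Toy data: x1 is an exact copy of x2 starting at offset 5.
x2 = [(7 * i) % 11 for i in range(40)]
x1 = x2[5:13]
tau = best_offset(x1, x2, cs=8, sr=10)
```

With this toy data the exhaustive search recovers the offset at which the two frames coincide, where the error sum is zero.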
[0012] When the DMAE method is applied to find the best overlap-splice position, every sample in the acoustic frame is used in the calculation. Because an acoustic signal has a certain regularity, the difference between any two adjacent points is extremely small. One of every two points can therefore be taken for the calculation by a subsampling method. Subsampling halves the total number of calculations without significantly reducing the accuracy of the calculation. Table 1 shows the signal-to-noise ratios (SNR) calculated by both the DMAE method and the DMAE/subsampling method for a man's voice, a violin sound, and electronic music.

Table 1: SNR
                     DMAE        DMAE and subsampling
Man's voice          26.25415    26.20773
Violin sound         31.56789    31.14602
Electronic music     19.85814    19.737

As Table 1 shows, the SNR values obtained for the different acoustic signals do not differ greatly whether or not subsampling is used. In actual listening tests, ordinary listeners could not detect a difference. As long as the loss of accuracy stays within an acceptable range, it is also possible to take one sampling point out of every three points, or one out of every four, to reduce the number of calculations further.
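Subsampling can be sketched by stepping through the crossfade points with a stride. As in any sketch of the DMAE here, the second term is read as the change in the error between neighbouring points, which is an assumption; the names `dmae_subsampled` and `terms_evaluated` are hypothetical.

```python
def dmae_subsampled(x1, x2, tau, cs, step=2):
    """DMAE over every `step`-th point of the crossfade region (sketch)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    total = 0.0
    for m in range(0, cs - 1, step):
        err_now = x1[m] - x2[m + tau]
        err_next = x1[m + 1] - x2[m + 1 + tau]
        total += abs(err_now) + abs(err_next - err_now)
    return total


def terms_evaluated(cs, step):
    """Number of points m visited for a crossfade size `cs` and stride `step`."""
    return len(range(0, cs - 1, step))


x2 = [(7 * i) % 11 for i in range(40)]
x1 = x2[5:13]
best = min(range(10), key=lambda t: dmae_subsampled(x1, x2, t, cs=8, step=2))
```

Taking one point in two halves the number of terms per offset, and larger strides reduce it further at some cost in accuracy.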
In an alternative embodiment, the present invention also uses a motion estimation method of the kind commonly used to process moving images. Incorporating the motion estimation method greatly reduces the total number of calculations required to find the DMAE. In other words, when searching for the best splice position, the two-dimensional method can be reduced to a one-dimensional binary search. To improve the accuracy of this search, the search area is divided into many sections and the DMAE value of each section is determined. These DMAE values are then compared, and the smallest value is selected as the optimal splice position. This modified method is called Block Binary Search (BBS) and is shown in FIG. 5, where one of the acoustic regions is indicated by 52. The acoustic region 52 is divided into four equal parts, and subsections 54, 56, and 58 represent the 1/4, 2/4, and 3/4 regions, respectively. The DMAE value of each of these regions is determined, and region 58 is found to be the best match. Next, with the corresponding subsection 60 as the center position, subsection 62, located 1/8 ahead, and subsection 64, located 1/8 behind, are used to determine the best matching position among them. As shown in FIG. 5, the subregion 62 at the 5/8 position is found to be the best match. The method continues in this way until the best matching position 66 is determined, as the splice position of the two acoustic frames, to be the only point separating the three adjacent subregions.
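The search described above can be sketched as follows. This is only one plausible reading of the description, under stated assumptions: the function names `binary_refine` and `block_binary_search` are hypothetical, `err` stands for any function that returns the splice error (such as the DMAE) at an integer offset, and the step is halved with rounding up so that the last probe uses neighboring points.

```python
def binary_refine(err, lo, hi):
    """Refine an offset inside [lo, hi) by probing three points and halving the step."""
    cache = {}

    def cost(t):
        if t not in cache:          # evaluate each offset at most once
            cache[t] = err(t)
        return cache[t]

    def clamp(t):
        return min(max(t, lo), hi - 1)

    step = max((hi - lo) // 4, 1)
    # First pass: probe the 1/4, 2/4 and 3/4 positions of the section.
    best = min(sorted({clamp(lo + k * step) for k in (1, 2, 3)}), key=cost)
    while step > 1:
        step = (step + 1) // 2      # halve the step, ending at 1
        probes = sorted({clamp(best - step), best, clamp(best + step)})
        best = min(probes, key=cost)
    return best, cost(best)


def block_binary_search(err, sr, n):
    """Split [0, sr) into n sections, refine each, and keep the overall minimum."""
    bounds = [i * sr // n for i in range(n + 1)]
    results = [binary_refine(err, bounds[i], bounds[i + 1])
               for i in range(n) if bounds[i] < bounds[i + 1]]
    return min(results, key=lambda r: r[1])[0]
```

Because each section is refined independently, a poor local minimum in one section cannot hide a better match in another, which appears to be the reason the text divides the search area before applying the binary search. The sketch assumes `sr` is at least 1.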
Assuming that the search area is divided into n sections, the number of calculations required to find each best matching point is $n \cdot [3 + 2 \cdot (\log_2 (MS/n) - 2)]$, where MS is the length of the search area. For example, if n = 4 and MS = 10 ms × 22.05 kHz = 220.5, applying the block binary search method reduces the total number of calculations required to 42, which is only 20% of the original number. If the sub-sampling method is also employed, the total is halved again, to 10% of the original number of calculations. Table 2 shows the calculation efficiency of the block binary search method. As the table shows, the difference in signal-to-noise ratio for three different acoustic signals, determined with or without the BBS method, is very small. An ordinary listener cannot hear these differences.
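The arithmetic in this example can be checked with a few lines. The function name `bbs_cost` is hypothetical, and the comparison assumes that the original exhaustive search costs one evaluation per candidate offset, that is, MS evaluations in total.

```python
import math


def bbs_cost(n, ms):
    """Number of calculations n * [3 + 2 * (log2(ms / n) - 2)] from the text."""
    return n * (3 + 2 * (math.log2(ms / n) - 2))


ms = 10e-3 * 22.05e3          # 10 ms at 22.05 kHz gives 220.5 points
cost = bbs_cost(4, ms)        # about 42 calculations for n = 4
ratio = cost / ms             # about 0.19, which the text rounds to 20%
```

Halving the per-evaluation work with sub-sampling brings the ratio to roughly 0.1, in line with the 10% figure quoted above.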
Table 2: SNR
Signal              DMAE        DMAE and BBS    DMAE, BBS and sub-sampling
Male voice          26.25415    25.66386        25.32933
Violin sound        31.56789    31.11732        31.06021
Electronic music    19.85814    19.60205        19.76816

Therefore, the present invention can change the sampling points by changing the sound reproduction speed. The calculations described above allow the modified sound to be reproduced at the same playback speed, without changing the pitch, while the playback time is increased or decreased. For example, if the calculation increases the amplitude of a certain acoustic signal, the amount of data contained in the acoustic signal increases; at the same playback speed, the total playback time increases while the amplitude remains the same. Conversely, if the calculation reduces the amplitude, the amount of data contained in the acoustic signal decreases, and the playback time can be shortened while the amplitude remains the same.
Normally, audio signals are supplied as analog signals. However, when these signals are to be processed, digital processing methods must be used. After the digital signals are processed, they are converted back to analog signals and output. FIG. 6 is a block diagram of acoustic signal processing that incorporates a pitch change. First, a microphone converts the sound into an analog electronic signal x(τ) for processing. The analog signal x(τ) is amplified by input amplifier 70 to strengthen the signal. The amplified signal passes through low-pass filter 72 to reject noise. The filtered signal is applied to analog-to-digital converter 74, which converts it to a digital signal. At this point the digital signal is a PCM signal, and it is sent to pitch shifting processor 76 for processing. The processed signal is then sent to digital-to-analog converter 78, which converts it to an analog signal. The analog signal is next sent to another low-pass filter 80, from which it is fed to output amplifier 82 and output through a speaker as an audible sound x'(τ) with a modified pitch. FIG. 7 shows the architecture of the pitch shifting processor. The acoustic data is sent to input buffer 92 through PI 90. Crossfading data 94 stores the rear part of the preceding acoustic frame that requires crossfading. DPU 96 is used to calculate the DMAE and the crossfading. The processed acoustic signal is output to the outside through output buffer 98 and PO 100.
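The crossfading step handled by the DPU can be sketched as follows. The text does not state the weighting used, so this example assumes a linear ramp; the names `crossfade` and `splice` are hypothetical, `tau` is the splice offset chosen by the search, and `cs` is the crossfading size.

```python
def crossfade(tail, head):
    """Blend two equal-length segments with a linear ramp (an assumed weighting)."""
    n = len(tail)
    if n != len(head):
        raise ValueError("segments must have the same length")
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)       # weight of the incoming segment rises toward 1
        out.append((1 - w) * tail[i] + w * head[i])
    return out


def splice(prev_frame, next_frame, tau, cs):
    """Join prev_frame to next_frame at offset tau, crossfading over cs samples."""
    if cs < 1:
        raise ValueError("cs must be at least 1")
    tail = prev_frame[-cs:]                 # rear part of the preceding frame
    head = next_frame[tau:tau + cs]         # matching part of the next frame
    return prev_frame[:-cs] + crossfade(tail, head) + next_frame[tau + cs:]
```

Keeping the rear part of the preceding frame in its own buffer, as the crossfading data 94 does, means the blend can be computed as soon as the offset for the next frame is known.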
The present invention has been described above for purposes of illustration. It is to be understood that the terminology used in the description is intended to be in the nature of words of description rather than of limitation. Furthermore, although the present invention has been described in terms of its preferred embodiments, one skilled in the art can readily apply these teachings to other possible variations of the invention. The present invention is not limited to these embodiments, but is limited only by the claims.
FIG. 1 is a diagram showing acoustic signals reproduced at the same reproduction speed with the number of sampling points increased and decreased.
FIG. 2 is a diagram showing the acoustic frame splicing method of the present invention for increasing the acoustic scale.
FIG. 3 is a diagram showing the acoustic frame splicing method of the present invention for reducing the acoustic scale.
FIG. 4 is a diagram showing the search range and the method for finding the best splice position of an acoustic frame.
FIG. 5 is a diagram showing the binary search method of the present invention for finding the best splice position.
FIG. 6 is a block diagram of the device according to the present invention.
FIG. 7 is a block diagram of the pitch shifting processor of the device of FIG. 6.
Reference Signs List
70 input amplifier
72 low-pass filter
74 analog-to-digital converter
76 pitch shifting processor
78 digital-to-analog converter
80 low-pass filter
82 output amplifier
90 PI
92 input buffer
94 crossfading data
96 DPU
98 output buffer
100 PO
Claims (14) Translated from Japanese
[Claims]
1. A method for changing parameters of an audio signal, comprising:
(a) converting an analog audio signal into a digital signal;
(b) dividing the digital signal into acoustic frames;
(c) changing the pitch and the reproduction speed of the digital signal in a frame;
(d) splicing the changed acoustic frame and an unchanged acoustic frame;
(e) repeating steps (c) and (d) for the unchanged acoustic frame and for the remaining unchanged acoustic frames of the digital signal to generate a changed digital signal; and
(f) converting the changed digital signal back to an analog signal,
wherein the splicing step (d) includes overlapping, for crossfading, an end region of the changed acoustic frame with a portion of the unchanged acoustic frame whose acoustic structure is similar to the end region, and the similarity of the acoustic structure is determined by defining a differential mean absolute error of the splice that minimizes the number of calculation steps, by the function
\[ \mathrm{DMAE} = \sum_m |x_1(m) - x_2(m+\tau)| + |x_1(m+1) - x_1(m) - x_2(m+1+\tau) - x_2(m+\tau)| \]
\[ = \sum_m |x_1(m) - x_2(m+\tau)| + |x_1(m+1) - x_2(m+1+\tau) - [x_1(m) + x_2(m+\tau)]| \]
where DMAE is the differential mean absolute error of the splice, the sum is taken over some combination of points m between 0 and cs, cs is the size of the crossfading, sr is the search area with $0 \le \tau < sr$, $x_1$ is the changed frame, and $x_2$ is the unchanged frame.
2. The method for changing parameters of an audio signal according to claim 1, wherein, when the change lengthens the acoustic frame, the excess unchanged acoustic frame is discarded and the playback time is kept unchanged.
3. The method for changing parameters of an audio signal according to claim 1, wherein, when the change shortens the acoustic frame, the missing unchanged acoustic frame is taken from the original digital signal and the playback time is kept unchanged.
4. The method for changing parameters of an audio signal according to claim 1, wherein the DMAE is defined over points spaced nτ apart from one another, where n is an integer depending on the range of allowable calculation accuracy.
5. The method for changing parameters of an audio signal according to claim 1, wherein the search area is divided into a plurality of sections, the DMAE is defined for each of the sections, the defined DMAEs are compared with one another, and the section having the minimum DMAE is selected as the optimum splice position.
6. The method for changing parameters of an audio signal according to claim 5, wherein the number of calculations required to find the section having the minimum DMAE is $n \, [3 + 2 \, (\log_2 (MS/n) - 2)]$, where n is the number of sections and MS is the length of the search area.
7. A method for changing parameters of an audio signal, comprising:
(a) converting an analog audio signal into a digital signal;
(b) dividing the digital signal into acoustic frames;
(c) changing the playback time of a frame;
(d) splicing the changed acoustic frame and an unchanged acoustic frame;
(e) repeating steps (c) and (d) for the unchanged acoustic frame and for the remaining unchanged acoustic frames of the digital signal to generate a changed digital signal; and
(f) converting the changed digital signal back to an analog signal,
wherein the splicing step (d) includes overlapping, for crossfading, an end region of the changed acoustic frame with a portion of the unchanged acoustic frame whose acoustic structure is similar to the end region, and the similarity of the acoustic structure is determined by defining a differential mean absolute error of the splice that minimizes the number of calculation steps, by the function
\[ \mathrm{DMAE} = \sum_m |x_1(m) - x_2(m+\tau)| + |x_1(m+1) - x_1(m) - x_2(m+1+\tau) - x_2(m+\tau)| \]
\[ = \sum_m |x_1(m) - x_2(m+\tau)| + |x_1(m+1) - x_2(m+1+\tau) - [x_1(m) + x_2(m+\tau)]| \]
where DMAE is the differential mean absolute error of the splice, the sum is taken over some combination of points m between 0 and cs, cs is the size of the crossfading, sr is the search area with $0 \le \tau < sr$, $x_1$ is the changed frame, and $x_2$ is the unchanged frame.
8. The method for changing parameters of an audio signal according to claim 7, wherein changing the playback time includes increasing the time, and the processing of the audio signal includes maintaining the playback time and the sampling points of the audio signal by increasing the sampling points of the audio signal.
9. The method for changing parameters of an audio signal according to claim 7, wherein changing the playback time includes reducing the time, and the processing of the audio signal includes maintaining the playback time and the sampling points of the audio signal by reducing the sampling points of the audio signal.
10. The method for changing parameters of an audio signal according to claim 7, wherein the DMAE is defined over points spaced nτ apart from one another, where n is an integer depending on the range of allowable calculation accuracy.
11. The method for changing parameters of an audio signal according to claim 7, wherein the search area is divided into a plurality of sections, the DMAE is defined for each of the sections, the defined DMAEs are compared with one another, and the section having the minimum DMAE is selected as the optimum splice position.
12. The method for changing parameters of an audio signal according to claim 11, wherein the number of calculations required to find the section having the minimum DMAE is $n \, [3 + 2 \, (\log_2 (MS/n) - 2)]$, where n is the number of sections and MS is the length of the search area.
13. An apparatus for changing parameters of an audio signal, comprising: an input amplifier and an output amplifier; first and second low-pass filters; an analog-to-digital converter; a digital-to-analog converter; and a pitch shifting processor, wherein the input amplifier, the first low-pass filter, and the analog-to-digital converter are connected in series to an input of the pitch shifting processor, and the digital-to-analog converter, the second low-pass filter, and the output amplifier are connected in series to an output of the pitch shifting processor.
14. The apparatus for changing parameters of an audio signal according to claim 13, wherein the pitch shifting processor comprises: an input unit connected to an input buffer; an output unit connected to an output buffer; a crossfading data memory for storing a portion of the audio signal that requires crossfading; an address unit connected to the input and output buffers and to the crossfading data memory; a register file unit; a digital processing unit for calculating the mean absolute error and the crossfading values; and a control unit, and wherein the input buffer, the crossfading data memory, the register file unit, the digital processing unit, the control unit, and the output buffer are operatively connected to one another through a bus system.