Detailed Description
Figure 1 illustrates a typical system 1 in which the present invention can advantageously be used. A transmitter 10 comprises an antenna 12, including the associated hardware and software, to be able to transmit radio signals 5 to a receiver 20. The transmitter 10 comprises, among other parts, a multi-channel encoder 14, which transforms the signals of a number of input channels 16 into output signals suitable for radio transmission. Examples of suitable multi-channel encoders 14 are described in further detail below. The signals of the input channels 16 can be provided from, for example, an audio signal storage 18, such as a data file holding a digital representation of an audio recording, a magnetic tape or a vinyl disc. The signals of the input channels 16 can also be provided "live", for example from a set of microphones 19. If the audio signals are not already in digital form, they are digitized before entering the multi-channel encoder 14.
卿¥æ¶æº20ä¾§ï¼å ·æç¸å ³ç¡¬ä»¶å软件ç天线22å¤ç表示å¤é³é³é¢ä¿¡å·çæ 线çµä¿¡å·5çæ¥æ¶ã卿¤æ§è¡é常çåè½ï¼ä¾å¦è¯¯å·®æ ¡æ£ãè§£ç å¨24è§£ç ææ¥æ¶çæ 线çµä¿¡å·5ï¼å¹¶ä¸å°ç±æ¤æºå¸¦çé³é¢æ°æ®åæ¢æå¤ä¸ªè¾åºå£°é26çä¿¡å·ãè¾åºä¿¡å·å¯ä»¥è¢«æä¾ç»ä¾å¦æ¬å£°å¨29è¿è¡ç«å³åç°ï¼æè å¯ä»¥è¢«åå¨å¨ä»»ä½ç§ç±»çé³é¢ä¿¡å·åå¨å¨28ä¸ãOn the receiver 20 side, an antenna 22 with associated hardware and software handles the reception of a radio signal 5 representing a multi-tone audio signal. The usual functions, such as error correction, are performed here. The decoder 24 decodes the received radio signal 5 and transforms the audio data carried thereby into signals of a plurality of output channels 26 . The output signal may be provided eg to a loudspeaker 29 for immediate presentation, or may be stored in any kind of audio signal memory 28 .
The system 1 can be, for example, a teleconferencing system, a system for supplying audio services, or another audio application. In some systems, such as teleconferencing systems, the communication has to be of a duplex type, whereas the distribution of music from a service provider to a subscriber can be essentially of a one-way type. The transmission of signals from the transmitter 10 to the receiver 20 can also be performed by any other means, for example by different kinds of electromagnetic waves, cables or optical fibers, as well as combinations thereof.
Figure 2a illustrates an embodiment of an encoder according to the invention. In this embodiment, the polyphonic signal is a stereo signal comprising two channels a and b, received at inputs 16A and 16B. The signals of channels a and b are provided to a pre-processing unit 32, where various signal conditioning procedures can be performed. The (possibly modified) signals from the output of the pre-processing unit 32 are summed in an addition unit 34, which also divides the resulting sum by a factor of two. The signal x_mono produced in this way is the main signal of the stereo signal, since it basically comprises all the data from both channels. In this embodiment, the main signal thus represents a pure "mono" signal. The main signal x_mono is provided to a main signal encoder unit 38, which encodes the main signal according to any suitable encoding principle. Such principles are available in the prior art and are therefore not discussed further here. The main signal encoder unit 38 outputs p_mono, the encoding parameters representing the main signal.
In a subtraction unit 36, the difference of the channel signals (divided by a factor of two) is provided as a side signal x_side. In this embodiment, the side signal represents the difference between the two channels of the stereo signal. The side signal x_side is provided to a side signal encoder unit 30. Preferred embodiments of the side signal encoder unit 30 are discussed further below. By the side signal encoding procedure, which is discussed in more detail below, the side signal x_side is converted into encoding parameters p_side representing the side signal x_side. In certain embodiments, the encoding also makes use of information from the main signal x_mono. Arrow 42 indicates such an arrangement, in which the original, unencoded main signal x_mono is used. In further embodiments, the main signal information used in the side signal encoder unit 30 can be derived from the encoding parameters p_mono representing the main signal, as indicated by the dashed line 44.
The encoding parameters p_mono representing the main signal x_mono form a first output signal, and the encoding parameters p_side representing the side signal x_side form a second output signal. In the typical case, these two output signals p_mono and p_side, which together represent the complete stereo sound, are multiplexed into one transmission signal 52 in a multiplexer unit 40. In other embodiments, however, the first and second output signals p_mono and p_side may be transmitted separately.
In Fig. 2b, an embodiment of a decoder 24 according to the invention is illustrated as a block diagram. The received signal 54, comprising encoding parameters representing the main and side signal information, is provided to a demultiplexer unit 56, which separates a first and a second input signal. The first input signal, corresponding to the encoding parameters p_mono of the main signal, is provided to a main signal decoder unit 64. In a conventional manner, the encoding parameters p_mono representing the main signal are used to generate a decoded main signal x"_mono that is as similar as possible to the main signal x_mono of the encoder 14 (Fig. 2a).
Similarly, the second input signal, corresponding to the side signal, is provided to a side signal decoder unit 60. Here, the encoding parameters p_side representing the side signal are used to recover a decoded side signal x"_side. In some embodiments, the decoding procedure uses information about the decoded main signal x"_mono, as indicated by the arrow.
The decoded main and side signals x"_mono and x"_side are provided to an addition unit 70, which provides an output signal representing the original signal of channel a. Similarly, the difference provided by a subtraction unit 68 gives an output signal representing the original signal of channel b. These channel signals may be post-processed in a post-processor unit 74 according to prior-art signal processing procedures. Finally, the channel signals a and b are provided at the outputs 26A and 26B of the decoder.
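The sum/difference structure of Figs. 2a and 2b can be sketched as follows. This is a minimal illustration that ignores pre-processing, post-processing and all coding loss (the decoded main and side signals are assumed equal to the originals); the function names `ms_encode` and `ms_decode` are hypothetical and do not appear in the description.

```python
def ms_encode(a, b):
    """Form the main (mono) and side signals from channels a and b.

    Mirrors addition unit 34 and subtraction unit 36, each including
    the division by a factor of two.
    """
    x_mono = [0.5 * (ai + bi) for ai, bi in zip(a, b)]
    x_side = [0.5 * (ai - bi) for ai, bi in zip(a, b)]
    return x_mono, x_side


def ms_decode(x_mono, x_side):
    """Recover channels a and b (addition unit 70, subtraction unit 68)."""
    a = [m + s for m, s in zip(x_mono, x_side)]
    b = [m - s for m, s in zip(x_mono, x_side)]
    return a, b


# Illustrative sample values only.
a = [0.25, -0.5, 1.0, 0.0]
b = [0.75, 0.5, -1.0, 0.0]
x_mono, x_side = ms_encode(a, b)
a_out, b_out = ms_decode(x_mono, x_side)
```

With lossless main and side signals the decoder output equals the input channels; in a real codec the decoded signals only approximate them.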
As mentioned in the Summary of the Invention, encoding is typically performed one frame at a time. A frame comprises the audio samples within a predefined time period. At the bottom of Fig. 3a, a frame SF2 of duration L is illustrated. The audio samples within the unshaded portion are to be encoded together. The preceding and the following samples are encoded in other frames. Dividing the samples into frames will in any case introduce some discontinuities at the frame boundaries. Shifting sounds give shifting encoding parameters, which basically change at each frame boundary. This gives rise to perceptible errors. One way to compensate somewhat for this is to base the encoding not only on the samples to be encoded, but also on samples in the immediate vicinity of the frame, as indicated by the shaded portions. In this way, the transitions between different frames become softer. As an alternative or a complement, interpolation techniques are sometimes used to reduce the perceptible artifacts caused by frame boundaries. However, all such procedures require large additional computational resources, and for certain encoding techniques it may be difficult to provide such resources at all.
It is therefore beneficial to use frames that are as long as possible, since the number of frame boundaries is then small. The coding efficiency also typically becomes high, and the necessary transmission bit rate is typically minimized. However, long frames give rise to pre-echo artifacts and phantom sounds.
If shorter frames are used instead, such as SF1 or even SF0, with durations of L/2 and L/4 respectively, anyone skilled in the art realizes that the coding efficiency is reduced, the transmission bit rate has to be higher, and the problems with frame boundary artifacts increase. However, shorter frames suffer less from other perceptible artifacts, such as phantom sounds and pre-echoes. To minimize the coding error as far as possible, a frame length as short as possible should be used.
According to the present invention, the audio perception can be improved by encoding the side signal with a frame length that depends on the present signal content. Since the influence of different frame lengths on the audio perception differs with the nature of the sound to be encoded, an improvement can be obtained by letting the nature of the signal itself affect the frame length that is used. The encoding of the main signal is not the object of the present invention and is therefore not described in detail. The frame length used for the main signal may or may not be equal to the frame length used for the side signal.
In some cases, for example, it is beneficial to encode the side signal with relatively long frames, because the temporal variations are small. This may be the case for recordings with a large amount of diffuse sound field, such as concert recordings. In other cases, such as stereo speech conversation, short frames are probably preferable. The decision as to which frame length to use can be made in two basic ways.
One embodiment of a side signal encoder unit 30 according to the present invention, in which a closed-loop decision is used, is illustrated in Fig. 3b. A basic encoding frame of length L is used here. A number of encoding schemes 81 are created, each characterized by a separate set 80 of subframes. Each set 80 of subframes comprises one or more subframes of equal or differing lengths. The total length of a set 80 of subframes is, however, always equal to the basic encoding frame length L. Referring to Fig. 3b, the top encoding scheme is characterized by a set of subframes comprising only one subframe of length L. The next set comprises two subframes of length L/2. The third set comprises two subframes of length L/4 followed by one subframe of length L/2.
The signal x_side provided to the side signal encoder unit 30 is encoded by all encoding schemes 81. In the top encoding scheme, the entire basic encoding frame is encoded in one piece. In the other encoding schemes, however, the signal x_side is encoded in each subframe separately from the others. The result from each encoding scheme is provided to a selector 85. A fidelity measurement means 83 determines a fidelity measure for each of the encoded signals. The fidelity measure is an objective quality value, preferably a signal-to-noise measure or a weighted signal-to-noise ratio. The fidelity measures associated with the encoding schemes are compared, and the result controls a switching means 87, which selects the encoding parameters representing the side signal from the encoding scheme giving the best fidelity measure as the output signal p_side from the side signal encoder unit 30.
Preferably, all possible combinations of frame lengths are tested, and the set of subframes that gives the best objective quality, for example the best signal-to-noise ratio, is selected.
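The closed-loop decision can be sketched as below: every candidate set of subframes is encoded, a signal-to-noise ratio is computed for each result, and the best one is kept. The coder `toy_code` is purely a stand-in invented for this illustration (a quantized per-subframe mean with a fixed, hypothetical bit budget per frame); the description does not prescribe the coder used inside each scheme.

```python
import math


def snr_db(original, decoded):
    """Signal-to-noise ratio in dB, used here as the fidelity measure."""
    signal = sum(v * v for v in original)
    noise = sum((o - d) ** 2 for o, d in zip(original, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def toy_code(x, scheme, bits_per_frame=16):
    """Hypothetical stand-in coder: one quantized mean value per subframe.

    The bit budget of the frame is shared by the subframes, so more
    subframes give finer time resolution but coarser quantization.
    """
    bits = max(1, bits_per_frame // len(scheme))
    step = 2.0 / (2 ** bits)
    decoded, pos = [], 0
    for length in scheme:
        block = x[pos:pos + length]
        mean = sum(block) / len(block)
        decoded.extend([round(mean / step) * step] * length)
        pos += length
    return decoded


def closed_loop_select(x, schemes, coder=toy_code):
    """Encode with every scheme and return the one with the best SNR.

    On ties, max() keeps the first scheme in the list.
    """
    best = max(schemes, key=lambda s: snr_db(x, coder(x, s)))
    return best, snr_db(x, coder(x, best))


# One frame of 8 samples with a sign change in the middle.
x = [0.5] * 4 + [-0.5] * 4
schemes = [(8,), (4, 4), (2, 2, 4)]
best, best_snr = closed_loop_select(x, schemes)
```

Because the selection only looks at the resulting fidelity, the coder passed as `coder` can be exchanged without touching the selection logic.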
卿¬å®æ½ä¾ä¸ï¼æ ¹æ®ä¸å¼éæ©æç¨çå帧çé¿åº¦ï¼In this embodiment, the length of the subframe used is selected according to the following formula:
$$l_{sf} = l_f / 2^n,$$
where l_sf is the length of a subframe, l_f is the length of the encoding frame and n is an integer. In the present embodiment, n is selected between 0 and 3. However, any frame lengths could be used, as long as the total length of the set is kept constant.
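As an illustration of how many candidate sets this rule produces, the sketch below enumerates every ordered sequence of subframes with lengths l_f / 2^n, n = 0..3, whose total length equals l_f. It assumes that any ordering of subframes is allowed, which the description does not state explicitly, and the frame length of 320 samples is an arbitrary example value.

```python
def subframe_sets(frame_len, max_n=3):
    """List all ordered subframe sets whose lengths sum to frame_len.

    Allowed subframe lengths are frame_len / 2**n for n = 0..max_n.
    Assumption: subframes may appear in any order within the frame.
    """
    if frame_len % (2 ** max_n) != 0:
        raise ValueError("frame_len must be divisible by 2**max_n")
    lengths = [frame_len // (2 ** n) for n in range(max_n + 1)]

    def extend(remaining):
        # Recursively fill the remaining part of the frame.
        if remaining == 0:
            yield ()
            return
        for length in lengths:
            if length <= remaining:
                for tail in extend(remaining - length):
                    yield (length,) + tail

    return list(extend(frame_len))


sets = subframe_sets(320)  # example: l_f = 320 samples
```

Under these assumptions there are 56 candidate sets per frame, which gives a feel for the computational cost of testing all combinations.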
Another embodiment of a side signal encoder unit 30 according to the present invention is illustrated in Fig. 3c. Here, the frame length decision is an open-loop decision, based on the statistics of the signal. In other words, the spectral characteristics of the side signal are used as a basis for deciding which encoding scheme to use. As before, different encoding schemes, characterized by different sets of subframes, are available. In this embodiment, however, the selector 85 is placed before the actual encoding. The input side signal x_side enters the selector 85 and a signal analysis unit 84. The result of the analysis becomes the input of a switch 86, in which only one of the encoding schemes 81 is used. The output from that encoding scheme is also the output signal p_side from the side signal encoder unit 30.
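A very reduced sketch of an open-loop decision is given below. The analysis criterion, a comparison of the energy in the two halves of the frame, and the threshold value are assumptions made only for this illustration; the description says merely that statistics or spectral characteristics of the side signal are analysed and does not specify the analysis.

```python
def energy(block):
    """Sum of squared samples of a block."""
    return sum(v * v for v in block)


def open_loop_select(x_side, long_scheme, short_scheme, ratio_threshold=4.0):
    """Pick one encoding scheme before any encoding takes place.

    Hypothetical analysis: a frame is treated as transient when the
    energies of its two halves differ by more than ratio_threshold.
    Transient frames get the scheme with short subframes.
    """
    half = len(x_side) // 2
    e_first, e_second = energy(x_side[:half]), energy(x_side[half:])
    low, high = min(e_first, e_second), max(e_first, e_second)
    transient = high > ratio_threshold * low
    return short_scheme if transient else long_scheme


steady = [0.1] * 8               # stationary content
onset = [0.0] * 4 + [1.0] * 4    # sudden onset in the second half
choice_steady = open_loop_select(steady, (8,), (2, 2, 2, 2))
choice_onset = open_loop_select(onset, (8,), (2, 2, 2, 2))
```

Only the selected scheme is then run, so the actual encoding is performed once per frame.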
The advantage of an open-loop decision is that only one actual encoding has to be performed. The disadvantage is that the analysis of the signal characteristics can be very complicated indeed, and that it may be difficult to predict the possible behaviours in advance so that an appropriate choice can be made in the switch 86. A large amount of statistical analysis of the sound must be performed and built into the signal analysis unit 84. Any small change in the encoding schemes may completely overturn the statistical behaviour.
With closed-loop selection (Fig. 3b), encoding schemes can be exchanged without any change to the rest of the unit. On the other hand, if many encoding schemes are to be investigated, the computational requirements are high.
The benefit of such variable frame length coding of the side signal is that one can select between fine temporal resolution with coarse frequency resolution on the one hand, and coarse temporal resolution with fine frequency resolution on the other. The above embodiments preserve the stereo image in the best possible manner.
There are also some requirements on the actual encoding used in the different encoding schemes. In particular, when closed-loop selection is used, the computational resources needed to perform a number of more or less simultaneous encodings must be large. The more complicated the encoding process, the more computational power is needed. Furthermore, a low bit rate at transmission is also preferred.
The method presented in US 5,434,948 uses a filtered version of the mono (main) signal to approximate the side or difference signal. The parameters of the filter are optimized and allowed to vary over time. The filter parameters are then transmitted as the encoding of the side signal. In one embodiment, a residual side signal is also transmitted. In many cases, such an approach could be used as the side signal encoding method within the scope of the present invention. The approach has some drawbacks, however. Since the filter order has to be high to give an accurate side signal estimate, quantization of the filter coefficients and of any residual side signal often requires a relatively high transmission bit rate. The estimation of the filter itself may also be problematic, especially for transient-rich music. Estimation errors give a modified side signal that is sometimes larger in magnitude than the unmodified signal, which leads to higher bit rate demands. Moreover, if a new set of filter coefficients is computed every N samples, the filter coefficients need to be interpolated to yield a smooth transition from one set of filter coefficients to the next, as discussed above. Interpolation of filter coefficients is a complex task, and errors in the interpolation show up as large side error signals, leading to a higher bit rate required by the difference error signal encoder.
One way to avoid the need for interpolation is to update the filter coefficients on a sample-by-sample basis and to rely on backward-adaptive analysis. For this to work well, the bit rate of the residual encoder has to be fairly high. It is therefore not a good alternative for low-rate stereo coding.
There are cases, quite common in music for example, where the mono and difference signals are almost uncorrelated. The filter estimation then becomes very difficult, with the added risk of merely making things worse for the difference error signal encoder.
The solution according to US 5,434,948 can work well in cases where the filter coefficients vary slowly over time, for example in conference telephone systems. For music signals, the approach does not work very well, since the filters need to change rapidly to track the stereo image. This means that subframe lengths of very different magnitudes have to be used, so the number of combinations to test increases rapidly. This in turn means that the requirements for computing all possible encoding schemes become impractically high.
Therefore, in a preferred embodiment, the encoding of the side signal is based on the idea of reducing the redundancy between the mono and side signals by using a simple balance factor instead of a complex, bit-rate-consuming predictor filter. The residual of this operation is then encoded. The magnitude of this residual is relatively low, and it does not demand a very high bit rate for its transfer. This idea is very well suited to being combined with the variable frame set approach described above, since the computational complexity is low.
Using a balance factor in combination with the variable frame length approach removes the need for complex interpolation and the problems that interpolation may cause. Moreover, using a simple balance factor instead of a complex filter gives fewer estimation problems, since possible estimation errors in the balance factor have less impact. The preferred solution is able to reproduce both panned signals and diffuse sound fields with good quality, with limited bit rate requirements and computational resources.
Figure 4 illustrates a preferred embodiment of a stereo encoder according to the present invention. This embodiment is very similar to the one shown in Fig. 2a, but the details of the side signal encoder unit 30 are revealed. The encoder 14 of this embodiment has no pre-processing unit, and the input signals are provided directly to the addition and subtraction units 34, 36. In a multiplier 33, the mono signal x_mono is multiplied by a certain balance factor g_sm. In a subtraction unit 35, the multiplied mono signal is subtracted from the side signal x_side (which is essentially the difference between the two channels) to produce a side residual signal. The balance factor g_sm is determined by an optimizer 37, based on the content of the mono and side signals, so as to minimize the side residual signal according to a quality criterion. The quality criterion is preferably a least mean square criterion. The side residual signal is encoded in a side residual encoder 39 according to any encoder procedure. Preferably, the side residual encoder 39 is a low bit rate transform encoder or a Codebook Excited Linear Prediction (CELP) encoder. The encoding parameters p_side representing the side signal then comprise the encoding parameters p_side residual representing the side residual signal and the optimized balance factor 49.
In the embodiment of Fig. 4, the mono signal 42 used for synthesizing the side signal is the target signal x_mono of the mono encoder 38. As mentioned above (in connection with Fig. 2a), the local synthesis signal of the mono encoder 38 can also be used. In the latter case, the total encoder delay may increase, and so may the computational complexity for the side signal. On the other hand, the quality may be better, since it then becomes possible to repair coding errors made in the mono encoder.
The basic encoding scheme can be described more precisely as follows. Denote the two channel signals as a and b, which may be the left and right channels of a stereo pair. The channel signals are combined into a mono signal by addition and into a side signal by subtraction. In equation form, the operations are:
$$x_{mono}(n) = 0.5\,(a(n) + b(n))$$
$$x_{side}(n) = 0.5\,(a(n) - b(n)).$$
It is beneficial to scale the x_mono and x_side signals down by a factor of two. This also implies that other ways of creating x_mono and x_side exist. One can, for example, use:
$$x_{mono}(n) = \gamma\,a(n) + (1-\gamma)\,b(n)$$
$$x_{side}(n) = \gamma\,a(n) - (1-\gamma)\,b(n)$$
$$0 \le \gamma \le 1.0.$$
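A short sketch of this weighted down-mix follows. The function name `weighted_downmix` is hypothetical; the formulas are those given above, and γ = 0.5 reproduces the plain sum/difference form with the factor 0.5.

```python
def weighted_downmix(a, b, gamma=0.5):
    """Create x_mono and x_side with channel weight gamma (0 <= gamma <= 1)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    x_mono = [gamma * ai + (1.0 - gamma) * bi for ai, bi in zip(a, b)]
    x_side = [gamma * ai - (1.0 - gamma) * bi for ai, bi in zip(a, b)]
    return x_mono, x_side


# Illustrative sample values only.
a = [1.0, 0.5]
b = [0.0, 0.5]
mono_half, side_half = weighted_downmix(a, b, gamma=0.5)
mono_one, side_one = weighted_downmix(a, b, gamma=1.0)
```

At the extreme γ = 1.0, both signals reduce to channel a, so channel b can no longer be recovered from them.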
On blocks of the input signals, a modified, or residual, side signal is computed according to:
$$x_{side\,residual}(n) = x_{side}(n) - f(x_{mono}, x_{side})\,x_{mono}(n),$$
where f(x_mono, x_side) is a balance factor function that, based on a block of N samples (that is, a subframe) from the side and mono signals, strives to remove as much as possible from the side signal. In other words, the balance factor is used to minimize the residual side signal. In the special case where the minimization is performed in a mean square sense, this is equivalent to minimizing the energy of the residual side signal x_side residual.
In the special case above, f(x_mono, x_side) is described as:
$$f(x_{mono}, x_{side}) = \frac{R_{sm}}{R_{mm}}$$
$$R_{mm} = \left[\sum_{n=\text{frame start}}^{\text{frame end}} x_{mono}(n)\,x_{mono}(n)\right]$$
$$R_{sm} = \left[\sum_{n=\text{frame start}}^{\text{frame end}} x_{side}(n)\,x_{mono}(n)\right],$$
where x_side is the side signal and x_mono is the mono signal. Note that the function is based on a block starting at "frame start" and ending at "frame end".
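The mean-square balance factor and the resulting residual can be sketched for one block as follows. The function names are hypothetical, and returning a factor of zero for a silent mono block (R_mm = 0) is an assumption of this sketch, since the description does not cover that case.

```python
def balance_factor(x_side, x_mono):
    """Least-squares balance factor R_sm / R_mm for one block (subframe)."""
    r_sm = sum(s * m for s, m in zip(x_side, x_mono))
    r_mm = sum(m * m for m in x_mono)
    if r_mm == 0.0:
        return 0.0  # assumption: nothing to predict from a silent mono block
    return r_sm / r_mm


def side_residual(x_side, x_mono, g):
    """Residual side signal x_side(n) - g * x_mono(n)."""
    return [s - g * m for s, m in zip(x_side, x_mono)]


# A purely panned block: the side signal is a scaled copy of the mono signal.
x_mono = [1.0, -2.0, 0.5, 4.0]
x_side = [0.25 * m for m in x_mono]
g = balance_factor(x_side, x_mono)
residual = side_residual(x_side, x_mono, g)

# A block where the side signal is not a scaled copy of the mono signal.
x_side2 = [1.0, 0.0, 0.0, 0.0]
g2 = balance_factor(x_side2, x_mono)
residual2 = side_residual(x_side2, x_mono, g2)
```

For the panned block the residual vanishes. In the general case the residual is orthogonal to the mono signal, which is the property that makes its energy minimal.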
It is possible to add weighting in the frequency domain to the computation of the balance factor. This is done by convolving the x_side and x_mono signals with the impulse response of a weighting filter. It then becomes possible to move the estimation error to a frequency range where it is less easy to hear. This is referred to as perceptual weighting.
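A minimal sketch of such a weighted computation is shown below. The two-tap impulse response used in the example is an arbitrary placeholder, not a filter taken from the description, and the block-wise convolution ignores filter state carried over from previous blocks, which a real encoder would probably keep.

```python
def convolve(x, h):
    """FIR filtering of block x with impulse response h, truncated to len(x)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def weighted_balance_factor(x_side, x_mono, h):
    """Balance factor computed on perceptually weighted signals."""
    w_side = convolve(x_side, h)
    w_mono = convolve(x_mono, h)
    r_sm = sum(s * m for s, m in zip(w_side, w_mono))
    r_mm = sum(m * m for m in w_mono)
    return r_sm / r_mm if r_mm != 0.0 else 0.0


x_mono = [1.0, -2.0, 0.5, 4.0]
x_side = [0.25 * m for m in x_mono]
g_plain = weighted_balance_factor(x_side, x_mono, [1.0])         # no weighting
g_weighted = weighted_balance_factor(x_side, x_mono, [1.0, -0.9])  # placeholder filter
```

For a purely panned block the weighting does not change the result; it matters when the side signal is only partly predictable from the mono signal.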
A quantized version of the balance factor value given by the function $f(x_{mono}, x_{side})$ is sent to the decoder. The quantization is preferably already taken into account when the modified side signal is generated, which gives the following expressions:
$$x_{sideresidual}(n) = x_{side}(n) - g_Q\, x_{mono}(n),$$
$$g_Q = Q_g^{-1}\!\left(Q_g\!\left(\frac{R_{sm}}{R_{mm}}\right)\right).$$
$Q_g(\ldots)$ is a quantization function applied to the balance factor given by $f(x_{mono}, x_{side})$. The balance factor is transmitted over the transmission channel. For normal left-right panned signals, the balance factor is confined to the interval [-1.0, 1.0]. If, on the other hand, the channels are out of phase with each other, the balance factor can exceed these limits.
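The following sketch illustrates why the residual is formed with the dequantized factor $g_Q$ rather than the unquantized one: encoder and decoder then use exactly the same value. The uniform quantizer, its step size `STEP` and its range `G_MAX` are assumptions chosen for illustration; the text does not define $Q_g$.

```python
STEP = 0.125   # hypothetical quantizer step size
G_MAX = 2.0    # hypothetical range, wider than 1.0 to allow out-of-phase input


def quantize_g(g):
    """Q_g: clip the balance factor and map it to an integer index."""
    clipped = max(-G_MAX, min(G_MAX, g))
    return int(round(clipped / STEP))


def dequantize_g(index):
    """Inverse mapping of Q_g: integer index back to a balance factor."""
    return index * STEP


def quantized_residual(x_side, x_mono):
    """Return the transmitted index, g_Q and the residual formed with g_Q."""
    r_sm = sum(s * m for s, m in zip(x_side, x_mono))
    r_mm = sum(m * m for m in x_mono)
    g = r_sm / r_mm if r_mm > 0.0 else 0.0
    index = quantize_g(g)        # this index is what would be transmitted
    g_q = dequantize_g(index)    # the value the decoder reconstructs as well
    residual = [s - g_q * m for s, m in zip(x_side, x_mono)]
    return index, g_q, residual


index, g_q, residual = quantized_residual([0.6, -1.0, 1.4, -0.5],
                                          [1.0, -2.0, 3.0, -1.0])
```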
As an optional way of stabilizing the stereo image, the balance factor can be limited when the normalized cross-correlation between the mono signal and the side signal is poor, as given by:
$$g_Q = Q_g^{-1}\!\left(Q_g\!\left(\left|\bar{R}_{sm}\right| \frac{R_{sm}}{R_{mm}}\right)\right),$$
where
$$\bar{R}_{sm} = \frac{R_{sm}}{\sqrt{R_{ss} \cdot R_{mm}}},$$
$$R_{sm} = \sum_{n=\text{frame start}}^{\text{frame end}} x_{side}(n)\, x_{mono}(n),$$
with $R_{ss}$ denoting, by analogy with $R_{mm}$, the sum of $x_{side}(n)\, x_{side}(n)$ over the same block.
These situations occur quite frequently in classical music or studio music with a large amount of diffuse sound, where in some cases the a and b channels almost cancel each other when the mono signal is created. The effect on the balance factor is that it may jump rapidly, causing a confused stereo image. The limitation above alleviates this problem.
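A minimal sketch of this limitation, before quantization, is given below. It assumes that $R_{ss}$ is the energy of the side block and returns 0.0 when either block is silent; both choices, like the function name, are assumptions made for illustration.

```python
import math


def limited_balance_factor(x_side, x_mono):
    """Balance factor scaled by the magnitude of the normalized cross-correlation."""
    r_sm = sum(s * m for s, m in zip(x_side, x_mono))
    r_mm = sum(m * m for m in x_mono)
    r_ss = sum(s * s for s in x_side)  # assumed definition of R_ss
    if r_mm == 0.0 or r_ss == 0.0:
        return 0.0
    r_norm = r_sm / math.sqrt(r_ss * r_mm)
    return abs(r_norm) * r_sm / r_mm


x_mono = [1.0, -2.0, 3.0, -1.0]
g_correlated = limited_balance_factor([0.5 * m for m in x_mono], x_mono)
g_partial = limited_balance_factor([1.0, 1.0], [1.0, 0.0])
g_orthogonal = limited_balance_factor([1.0, -1.0], [1.0, 1.0])
```

For perfectly correlated blocks the factor is unchanged, while weakly correlated blocks pull it towards zero, which damps the rapid jumps described above.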
The filter-based approach in US 5,434,948 has a similar problem, but in that case the solution is not as simple.
If $E_s$ is the encoding function (e.g. a transform coder) of the residual side signal and $E_m$ is the encoding function of the mono signal, the decoded $a''$ and $b''$ signals at the decoder end can be described as (assuming here that γ = 0.5):
$$a''(n) = (1 + g_Q)\, x''_{mono}(n) + x''_{side}(n)$$
$$b''(n) = (1 - g_Q)\, x''_{mono}(n) - x''_{side}(n)$$
$$x''_{side} = E_s^{-1}\!\left(E_s\!\left(x_{sideresidual}\right)\right)$$
$$x''_{mono} = E_m^{-1}\!\left(E_m\!\left(x_{mono}\right)\right)$$
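The reconstruction can be checked with a short round trip. The sketch assumes that the mono and side signals are formed as γ(a + b) and γ(a - b) with γ = 0.5, and that $E_s$ and $E_m$ are lossless (identity) codecs; both are simplifying assumptions, and the function names are illustrative.

```python
def encode_stereo(a, b, g_q):
    """Form mono and residual side signals (assumed downmix with gamma = 0.5)."""
    x_mono = [0.5 * (a_n + b_n) for a_n, b_n in zip(a, b)]
    x_side = [0.5 * (a_n - b_n) for a_n, b_n in zip(a, b)]
    x_sideresidual = [s - g_q * m for s, m in zip(x_side, x_mono)]
    return x_mono, x_sideresidual


def decode_stereo(x_mono_dec, x_side_dec, g_q):
    """a'' = (1 + g_Q) x''_mono + x''_side and b'' = (1 - g_Q) x''_mono - x''_side."""
    a_dec = [(1.0 + g_q) * m + r for m, r in zip(x_mono_dec, x_side_dec)]
    b_dec = [(1.0 - g_q) * m - r for m, r in zip(x_mono_dec, x_side_dec)]
    return a_dec, b_dec


a = [1.0, 0.5, -0.25, 0.0]
b = [0.5, -0.5, 0.75, 0.25]
G_Q = 0.375  # any dequantized balance factor shared by encoder and decoder
x_mono, x_res = encode_stereo(a, b, G_Q)
a_dec, b_dec = decode_stereo(x_mono, x_res, G_Q)  # identity codecs assumed
```

With lossless codecs the two channels are recovered exactly for any value of $g_Q$, so in a real system the reconstruction error stems only from the coding of the mono and residual signals.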
An important benefit of computing the balance factor for each frame is that interpolation is avoided. Instead, the frame processing is normally performed with overlapping frames, as described above.
The coding principle based on balance factors works particularly well for music signals, where fast changes are often needed to track the stereo image.
Multi-channel coding has recently become popular. One example is 5.1-channel surround sound on DVD movies, where the channels are arranged as front left, front centre, front right, rear left, rear right and subwoofer. Fig. 5 shows an embodiment of an encoder according to the invention that encodes the three front channels of such an arrangement by exploiting inter-channel redundancy.
Three channel signals L, C, R are provided at three inputs 16A-C, and the mono signal $x_{mono}$ is created as the sum of all three signals. A centre signal encoder unit 130 is added, which receives the centre signal $x_{centre}$. In this embodiment, the mono signal 42 is the encoded and decoded mono signal $x''_{mono}$, and it is multiplied by a balance factor $g_Q$ in a multiplier 133. In a subtraction unit 135, the scaled mono signal is subtracted from the centre signal $x_{centre}$ to produce a centre residual signal. An optimizer 137 determines the balance factor $g_Q$ from the content of the mono and centre signals so as to minimize the centre residual signal according to a quality criterion. The centre residual signal is encoded in a centre residual encoder 139 according to any encoding procedure. Preferably, the centre residual encoder 139 is a low bit-rate transform coder or a CELP coder. The encoding parameters $p_{centre}$ representing the centre signal then comprise the encoding parameters $p_{centre\,residual}$ representing the centre residual signal together with the optimized balance factor 149. In an addition unit 235, the centre residual signal is added to the scaled mono signal, producing a modified centre signal 142 that is compensated for coding errors.
As in the previous embodiments, the side signal $x_{side}$ (i.e. the difference between the left L and right R channels) is provided to the side signal encoder unit 30. Here, however, the optimizer 37 also depends on the modified centre signal 142 provided by the centre signal encoder unit 130. The side residual signal is therefore produced in the subtraction unit 35 as an optimal linear combination of the mono signal 42, the modified centre signal 142 and the side signal.
The variable frame length concept described above can be applied to either or both of the side and centre signals.
Fig. 6 illustrates a decoder unit suitable for receiving encoded audio signals from the encoder unit of Fig. 5. The received signal 54 is divided into encoding parameters $p_{mono}$ representing the main signal, encoding parameters $p_{centre}$ representing the centre signal and encoding parameters $p_{side}$ representing the side signal. In a decoder 64, the encoding parameters $p_{mono}$ are used to generate the main signal $x''_{mono}$. In a decoder 160, the encoding parameters $p_{centre}$ are used to generate the centre signal $x''_{centre}$ based on the main signal $x''_{mono}$. In a decoder 60, the encoding parameters $p_{side}$ are decoded based on the main signal $x''_{mono}$ and the centre signal $x''_{centre}$ to generate the side signal $x''_{side}$.
The procedure can be expressed mathematically as follows.
The input signals $x_{left}$, $x_{right}$ and $x_{centre}$ are combined into one mono channel according to:
$$x_{mono}(n) = \alpha\, x_{left}(n) + \beta\, x_{right}(n) + \chi\, x_{centre}(n).$$
For simplicity, α, β and χ are set to 1.0 in the remainder of this section, but they can be set to arbitrary values. They can be constants, or they can depend on the signal content in order to emphasize one or two channels and thereby achieve optimum quality.
The normalized cross-correlation between the mono and centre signals is computed as:
$$\bar{R}_{cm} = \frac{R_{cm}}{\sqrt{R_{cc} \cdot R_{mm}}},$$
where
$$R_{cc} = \sum_{n=\text{frame start}}^{\text{frame end}} x_{centre}(n)\, x_{centre}(n)$$
$$R_{mm} = \sum_{n=\text{frame start}}^{\text{frame end}} x_{mono}(n)\, x_{mono}(n)$$
$$R_{cm} = \sum_{n=\text{frame start}}^{\text{frame end}} x_{centre}(n)\, x_{mono}(n).$$
$x_{centre}$ is the centre signal and $x_{mono}$ is the mono signal. The mono signal is taken from the mono target signal, but the local synthesis of the mono encoder can also be used.
The centre residual signal to be encoded is:
$$x_{centreresidual}(n) = x_{centre}(n) - g_Q\, x_{mono}(n),$$
$$g_Q = Q_g^{-1}\!\left(Q_g\!\left(\frac{R_{cm}}{R_{mm}}\right)\right).$$
$Q_g(\ldots)$ is the quantization function applied to the balance factor. The balance factor is transmitted over the transmission channel.
If $E_c$ is the encoding function (e.g. a transform coder) of the centre residual signal and $E_m$ is the encoding function of the mono signal, the decoded signal $x''_{centre}$ at the decoder end is described as:
$$x''_{centre}(n) = g_Q\, x''_{mono}(n) + x''_{centreresidual}(n)$$
$$x''_{centreresidual} = E_c^{-1}\!\left(E_c\!\left(x_{centreresidual}\right)\right)$$
$$x''_{mono} = E_m^{-1}\!\left(E_m\!\left(x_{mono}\right)\right)$$
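The centre-channel path can be sketched in the same way as the stereo case: downmix, predict the centre signal from the mono signal, and add the prediction back at the decoder. The sketch uses unit downmix weights, skips quantization of the balance factor and treats $E_c$ and $E_m$ as lossless; these simplifications and all names are assumptions.

```python
def downmix(x_left, x_right, x_centre, w_left=1.0, w_right=1.0, w_centre=1.0):
    """Weighted sum of the three front channels (unit weights by default)."""
    return [w_left * l + w_right * r + w_centre * c
            for l, r, c in zip(x_left, x_right, x_centre)]


def centre_balance_factor(x_centre, x_mono):
    """R_cm / R_mm for one block; 0.0 for a silent mono block (assumption)."""
    r_cm = sum(c * m for c, m in zip(x_centre, x_mono))
    r_mm = sum(m * m for m in x_mono)
    return r_cm / r_mm if r_mm > 0.0 else 0.0


x_left = [1.0, 0.0, -1.0, 0.5]
x_right = [0.5, 0.5, -0.5, 0.0]
x_centre = [0.8, 0.2, -0.6, 0.3]

x_mono = downmix(x_left, x_right, x_centre)
g = centre_balance_factor(x_centre, x_mono)
x_centreresidual = [c - g * m for c, m in zip(x_centre, x_mono)]
# Decoder side, with identity codecs standing in for E_c and E_m.
x_centre_dec = [g * m + r for m, r in zip(x_mono, x_centreresidual)]
```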
The side residual signal to be encoded is:
$$x_{sideresidual}(n) = \left(x_{left}(n) - x_{right}(n)\right) - g_{Qsm}\, x''_{mono}(n) - g_{Qsc}\, x''_{centre}(n),$$
where $g_{Qsm}$ and $g_{Qsc}$ are quantized values of the parameters $g_{sm}$ and $g_{sc}$ that minimize the expression:
$$\sum_{n=\text{frame start}}^{\text{frame end}} \left[\, \left| \left(x_{left}(n) - x_{right}(n)\right) - g_{sm}\, x''_{mono}(n) - g_{sc}\, x''_{centre}(n) \right| \,\right]^{\eta}.$$
η can for example be equal to 2, which gives a least-squares minimization of the error. The parameters $g_{sm}$ and $g_{sc}$ can be quantized jointly or separately.
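For η = 2 the two parameters follow from a 2x2 system of normal equations, which the sketch below solves with Cramer's rule. The fallback used when the mono and centre blocks are linearly dependent is an assumption, as are the function names; quantization is left out.

```python
def dot(u, v):
    """Inner product of two equally long blocks."""
    return sum(a * b for a, b in zip(u, v))


def joint_side_gains(x_side, x_mono_dec, x_centre_dec):
    """Least-squares g_sm and g_sc for side ~ g_sm * mono + g_sc * centre."""
    r_mm = dot(x_mono_dec, x_mono_dec)
    r_cc = dot(x_centre_dec, x_centre_dec)
    r_mc = dot(x_mono_dec, x_centre_dec)
    r_sm = dot(x_side, x_mono_dec)
    r_sc = dot(x_side, x_centre_dec)
    det = r_mm * r_cc - r_mc * r_mc
    if abs(det) < 1e-12:
        # Assumed fallback: predict from the mono signal only.
        return (r_sm / r_mm if r_mm > 0.0 else 0.0), 0.0
    g_sm = (r_sm * r_cc - r_sc * r_mc) / det
    g_sc = (r_sc * r_mm - r_sm * r_mc) / det
    return g_sm, g_sc


x_mono_dec = [1.0, 2.0, -1.0, 0.5]
x_centre_dec = [0.5, -1.0, 2.0, 1.0]
# Synthetic side block that is an exact combination of the two predictors.
x_side = [0.3 * m - 0.7 * c for m, c in zip(x_mono_dec, x_centre_dec)]
g_sm, g_sc = joint_side_gains(x_side, x_mono_dec, x_centre_dec)
```

For other values of η no closed form exists in general, and an iterative search would be needed.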
If $E_s$ is the encoding function of the side residual signal, the decoded channel signals $x''_{left}$ and $x''_{right}$ are given by:
$$x''_{left}(n) = x''_{mono}(n) - x''_{centre}(n) + x''_{side}(n)$$
$$x''_{right}(n) = x''_{mono}(n) - x''_{centre}(n) - x''_{side}(n)$$
$$x''_{side}(n) = x''_{sideresidual}(n) + g_{Qsm}\, x''_{mono}(n) + g_{Qsc}\, x''_{centre}(n)$$
$$x''_{sideresidual} = E_s^{-1}\!\left(E_s\!\left(x_{sideresidual}\right)\right).$$
One of the most annoying perceptible artefacts is the pre-echo effect, which is illustrated in Figs. 7a-b. Assume a signal component with a time development as shown by curve 100. At the beginning, starting from t0, no signal component is present in the audio samples. At a time t between t1 and t2, the signal component suddenly appears. When the signal component is encoded using a frame length of t2-t1, its onset is smeared out over the entire frame, as indicated by curve 101. When curve 101 is decoded, the signal component appears a time Δt before its intended onset, and a "pre-echo" is perceived.
The pre-echo artefact becomes more pronounced when long encoding frames are used, and is somewhat suppressed when shorter frames are used. Another way to deal with the pre-echo problem is to exploit the fact that the mono signal is available at both the encoder and the decoder end. This makes it possible to scale the side signal according to the energy contour of the mono signal. At the decoder end, the inverse scaling is performed, which alleviates some of the pre-echo problems.
The energy contour of the mono signal is computed over the frame as:
$$E_c(m) = \sum_{n=m-L}^{m+L} w(n)\, x_{mono}^2(n), \quad \text{frame start} \le m \le \text{frame end},$$
where $w(n)$ is a windowing function. The simplest windowing function is a rectangular window, but other window types, such as a Hamming window, may be preferable.
The side residual signal is then scaled as:
$$\bar{x}_{sideresidual}(n) = \frac{x_{sideresidual}(n)}{E_c(n)}, \quad \text{frame start} \le n \le \text{frame end}.$$
In a more general form, the equation above can be written as:
$$\bar{x}_{sideresidual}(n) = \frac{x_{sideresidual}(n)}{f(E_c(n))}, \quad \text{frame start} \le n \le \text{frame end},$$
where $f(\ldots)$ is a monotonic continuous function. In the decoder, the energy contour is computed from the decoded mono signal and applied to the decoded side signal:
$$x''_{side}(n) = \bar{x}''_{side}(n)\, f(E_c(n)), \quad \text{frame start} \le n \le \text{frame end}.$$
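The scaling and its inverse can be sketched as below. The sketch assumes a rectangular window applied relative to the centre sample m, truncation of the window at the block edges, and $f(E) = \sqrt{E} + \varepsilon$ with a small floor ε to avoid division by zero in silent passages. These are illustrative choices; the text only requires f to be monotonic and continuous.

```python
import math


def energy_contour(x_mono, half_len):
    """E_c(m): mono energy in a rectangular window of 2 * half_len + 1 samples."""
    last = len(x_mono) - 1
    contour = []
    for m in range(len(x_mono)):
        lo = max(0, m - half_len)       # window truncated at the block edges
        hi = min(last, m + half_len)
        contour.append(sum(x_mono[n] ** 2 for n in range(lo, hi + 1)))
    return contour


def f(energy, floor=1e-3):
    """Assumed monotonic continuous mapping: square root plus a small floor."""
    return math.sqrt(energy) + floor


def scale_residual(x_res, contour):
    """Encoder side: divide each residual sample by f(E_c(n))."""
    return [r / f(e) for r, e in zip(x_res, contour)]


def unscale_residual(x_scaled, contour):
    """Decoder side: multiply each decoded sample by f(E_c(n))."""
    return [r * f(e) for r, e in zip(x_scaled, contour)]


x_mono = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]   # onset at sample 4
x_res = [0.01, -0.01, 0.01, 0.0, 0.5, -0.4, 0.3, -0.2]
contour = energy_contour(x_mono, half_len=1)
x_scaled = scale_residual(x_res, contour)
x_restored = unscale_residual(x_scaled, contour)
```

Samples that precede the onset are amplified before coding and attenuated again after decoding, so coding noise introduced in that region is attenuated as well. In a real decoder the contour would be computed from the decoded mono signal, which differs slightly from the encoder-side signal.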
Since energy contour scaling is in some sense an alternative to using shorter frame lengths, the concept is particularly well suited to being combined with the variable frame length concept described further above. By having some encoding schemes that apply energy contour scaling, some that do not, and some that apply it only during certain subframes, a more flexible set of encoding schemes can be provided. Fig. 8 illustrates an embodiment of a side signal encoder unit 30 according to the invention. Here, the different encoding schemes 81 comprise hatched subframes, representing encoding with energy contour scaling applied, and unhatched subframes, representing encoding without it. In this way, combinations not only of subframes of different lengths but also of subframes with different encoding principles are available. In the present illustrative example, the application of energy contour scaling differs between the encoding schemes. More generally, any encoding principle can be combined with the variable length concept in an analogous manner.
The set of encoding schemes in Fig. 8 comprises schemes that handle, for example, pre-echo artefacts in different ways. Some schemes use longer subframes with pre-echo minimization based on the energy contour principle. Other schemes use shorter subframes without energy contour scaling. Depending on the signal content, one of the alternatives is more advantageous. For very severe pre-echo cases, encoding schemes that use short subframes together with energy contour scaling have to be used.
The proposed solution can be used over the full frequency band or in one or more distinct sub-bands. Sub-bands can be applied to both the main and side signals or to either of them separately. A preferred embodiment splits the side signal into several frequency bands, simply because possible redundancy is easier to remove in an isolated frequency band than over the entire band. This is particularly important when the signal being coded has rich spectral content.
One possible use is to encode the frequency band below a predetermined threshold with the method described above. The predetermined threshold can preferably be 2 kHz, or even more preferably 1 kHz. For the remaining part of the frequency range of interest, a further band can be encoded with the method described above, or a completely different method can be used.
One motivation for preferring the above method at low frequencies is that diffuse sound fields generally have little energy content at high frequencies, the natural reason being that sound absorption typically increases with frequency. Moreover, the diffuse sound field components appear to play a less important role for the human auditory system at higher frequencies. It is therefore beneficial to employ the solution at low frequencies (below 1 or 2 kHz) and to rely on other, more bit-efficient encoding schemes at higher frequencies. Applying the scheme only at low frequencies saves a considerable amount of bit rate, since the bit rate required by the proposed method is proportional to the required bandwidth. In most cases, the mono encoder can encode the entire frequency band, while the proposed side signal encoding is performed only in the lower part of the band, as illustrated schematically in Fig. 9. Reference numeral 301 refers to an encoding scheme of the side signal according to the invention, reference numeral 302 to any other encoding scheme of the side signal, and reference numeral 303 to an encoding scheme of the side signal.
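One way to picture the band restriction is to split a side-signal block into a part below the threshold and a remainder, and to hand only the low band to the scheme described here. The sketch uses a naive DFT with hard bin masking purely for illustration; the block length, the sample rate and the splitting method are all assumptions, and a practical codec would more likely use a filter bank or its own transform.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, adequate for short demonstration blocks."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns the real part of each output sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def split_bands(x, sample_rate, cutoff_hz):
    """Split x into a band below cutoff_hz and the remainder; low + high == x."""
    n = len(x)
    spec = dft(x)
    low = []
    for k, value in enumerate(spec):
        freq = min(k, n - k) * sample_rate / n   # bin k and its mirror n - k
        low.append(value if freq < cutoff_hz else 0.0)
    high = [v - l for v, l in zip(spec, low)]
    return idft(low), idft(high)


SAMPLE_RATE = 8000   # hypothetical sample rate in Hz
N = 32               # hypothetical block length
tone_low = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(N)]
tone_high = [0.5 * math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE) for t in range(N)]
x_side = [l + h for l, h in zip(tone_low, tone_high)]
low_band, high_band = split_bands(x_side, SAMPLE_RATE, 2000.0)
```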
It is also possible to use the proposed method for several distinct frequency bands.
Fig. 10 is a flow diagram illustrating the main steps of an embodiment of an encoding method according to the invention. The procedure starts in step 200. In step 210, a main signal deduced from the multi-tone signals is encoded. In step 212, encoding schemes are provided that comprise subframes of differing lengths and/or order. In step 214, a side signal deduced from the multi-tone signals is encoded with an encoding scheme selected at least partly according to the actual signal content of the present multi-tone signals. The procedure ends in step 299.
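Steps 212 and 214 can be sketched as a closed-loop choice among candidate subframe partitions. The selection criterion used here, residual energy plus a fixed penalty per subframe standing in for the cost of transmitting one balance factor each, is an assumption made for illustration, as are the scheme table and all names.

```python
# Hypothetical scheme table: each entry lists subframe lengths for an 8-sample frame.
SCHEMES = {"1x8": [8], "2x4": [4, 4], "4+2+2": [4, 2, 2], "4x2": [2, 2, 2, 2]}
SUBFRAME_PENALTY = 0.01  # assumed cost per subframe (one balance factor each)


def scheme_cost(x_side, x_mono, lengths):
    """Residual energy with one balance factor per subframe, plus the penalty."""
    total = 0.0
    start = 0
    for length in lengths:
        side = x_side[start:start + length]
        mono = x_mono[start:start + length]
        r_mm = sum(m * m for m in mono)
        g = sum(s * m for s, m in zip(side, mono)) / r_mm if r_mm > 0.0 else 0.0
        total += sum((s - g * m) ** 2 for s, m in zip(side, mono))
        start += length
    return total + SUBFRAME_PENALTY * len(lengths)


def select_scheme(x_side, x_mono, schemes):
    """Return the name of the scheme with the lowest cost for this frame."""
    return min(schemes, key=lambda name: scheme_cost(x_side, x_mono, schemes[name]))


x_mono = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
side_steady = [0.5 * m for m in x_mono]                            # stable stereo image
side_flip = [0.5 * m for m in x_mono[:4]] + [-0.5 * m for m in x_mono[4:]]
choice_steady = select_scheme(side_steady, x_mono, SCHEMES)
choice_flip = select_scheme(side_flip, x_mono, SCHEMES)
```

A frame with a stable stereo image is served by a single long subframe, while a frame whose image changes halfway benefits from two shorter subframes.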
Fig. 11 is a flow diagram illustrating the main steps of an embodiment of a decoding method according to the invention. The procedure starts in step 200. In step 220, a received encoded main signal is decoded. In step 222, encoding schemes are provided that comprise subframes of differing lengths and/or order. In step 224, a received side signal is decoded with a selected encoding scheme. In step 226, the decoded main and side signals are combined into a multi-tone signal. The procedure ends in step 299.
The embodiments described above are to be understood as a few illustrative examples of the invention. Those skilled in the art will understand that various modifications, combinations and changes can be made to the embodiments without departing from the scope of the invention. In particular, different part solutions from the different embodiments can be combined in other configurations where technically possible. The scope of the invention is, however, defined by the appended claims.
References
European Patent 0497413
US Patent 5,285,498
US Patent 5,434,948
C. Faller et al., "Binaural cue coding applied to stereo and multi-channel audio compression", 112th AES Convention, Munich, Germany, May 2002.