Showing content from https://patents.google.com/patent/CN100559465C/en below:

CN100559465C - Fidelity optimized variable frame length encoding

Detailed Description

Figure 1 illustrates a typical system 1 in which the present invention may be used to advantage. A transmitter 10 comprises an antenna 12, including the associated hardware and software, for transmitting a radio signal 5 to a receiver 20. Among other components, the transmitter 10 includes a multi-channel encoder 14 that transforms the signals of a plurality of input channels 16 into output signals suitable for radio transmission. Examples of suitable multi-channel encoders 14 are described in further detail below. The signals of the input channels 16 may be provided from, for example, an audio signal memory 18, such as a data file holding a digital representation of an audio recording, a magnetic tape or a vinyl audio disc. The signals of the input channels 16 may also be provided "live", for example from a set of microphones 19. If the audio signals are not already in digital form, they are digitized before entering the multi-channel encoder 14.

On the receiver 20 side, an antenna 22 with associated hardware and software handles the reception of the radio signal 5 representing a polyphonic audio signal. The usual functions, such as error correction, are performed here. A decoder 24 decodes the received radio signal 5 and transforms the audio data it carries into signals of a plurality of output channels 26. The output signals may be provided, for example, to loudspeakers 29 for immediate presentation, or may be stored in an audio signal memory 28 of any kind.

The system 1 may be, for example, a teleconferencing system, a system for supplying audio services or a system for other audio applications. In some systems, such as teleconferencing systems, the communication must be duplex, whereas the distribution of music from a service provider to subscribers can be essentially one-way. The signal may also be transmitted from the transmitter 10 to the receiver 20 by any other means, for example by different kinds of electromagnetic waves, cables or optical fibers, or by combinations thereof.

Figure 2a illustrates an embodiment of an encoder according to the invention. In this embodiment, the polyphonic signal is a stereo signal comprising two channels a and b, received at inputs 16A and 16B respectively. The signals of channels a and b are supplied to a pre-processing unit 32, where various signal conditioning procedures can be performed. The (possibly modified) signals from the output of the pre-processing unit 32 are summed in an addition unit 34. The addition unit 34 also divides the resulting sum by a factor of two. The signal x_mono produced in this way is the main signal of the stereo signal, since it basically contains all the data from both channels. In this embodiment, the main signal thus represents a pure "mono" signal. The main signal x_mono is provided to a main signal encoder unit 38, which encodes the main signal according to any suitable encoding principle. Such principles are available in the prior art and are not discussed further here. The main signal encoder unit 38 outputs p_mono, the encoding parameters representing the main signal.

In a subtraction unit 36, the difference of the channel signals (divided by a factor of two) is provided as a side signal x_side. In this embodiment, the side signal represents the difference between the two channels of the stereo signal. The side signal x_side is supplied to a side signal encoder unit 30. Preferred embodiments of the side signal encoder unit 30 are discussed further below. According to a side signal encoding procedure, which is discussed in more detail below, the side signal x_side is converted into encoding parameters p_side representing the side signal x_side. In some embodiments, this encoding also makes use of information from the main signal x_mono. Arrow 42 indicates such an arrangement, in which the original, unencoded main signal x_mono is used. In still other embodiments, the main signal information used in the side signal encoder unit 30 may be deduced from the encoding parameters p_mono representing the main signal, as indicated by the dashed line 44.

The encoding parameters p_mono representing the main signal x_mono form a first output signal, and the encoding parameters p_side representing the side signal x_side form a second output signal. In the typical case, these two output signals p_mono and p_side, which together represent the complete stereo sound, are multiplexed in a multiplexer unit 40 into one transmission signal 52. In other embodiments, however, the first and second output signals p_mono and p_side may be transmitted separately.

Fig. 2b illustrates an embodiment of a decoder 24 according to the invention in block diagram form. The received signal 54, comprising encoding parameters that represent the main and side signal information, is supplied to a demultiplexer unit 56, which separates it into a first and a second input signal. The first input signal, corresponding to the encoding parameters p_mono of the main signal, is supplied to a main signal decoder unit 64. In a conventional manner, the encoding parameters p_mono representing the main signal are used to generate a decoded main signal x"_mono that resembles the main signal x_mono of the encoder 14 (Fig. 2a) as closely as possible.

Similarly, the second input signal, corresponding to the side signal, is supplied to a side signal decoder unit 60. Here, the encoding parameters p_side representing the side signal are used to recover a decoded side signal x"_side. In some embodiments, the decoding procedure makes use of information about the main signal x"_mono, as indicated by the arrow.

The decoded main and side signals x"_mono and x"_side are supplied to an addition unit 70, which provides an output signal representing the original signal of channel a. Similarly, the difference formed by a subtraction unit 68 provides an output signal representing the original signal of channel b. These channel signals may be post-processed in a post-processor unit 74 according to prior-art signal processing procedures. Finally, the channel signals a and b are provided at the decoder outputs 26A and 26B.

As mentioned in the Summary of the Invention, encoding is typically performed one frame at a time. A frame comprises the audio samples within a predetermined time period. At the bottom of Fig. 3a, a frame SF2 of duration L is illustrated. The audio samples within the unshaded portion are to be encoded together. The preceding and following samples are encoded in other frames. Dividing the samples into frames will in any case introduce some discontinuities at the frame boundaries. A changing sound gives changing encoding parameters, which in practice change at every frame boundary. This produces perceptible errors. One way to compensate somewhat for this is to base the encoding not only on the samples to be encoded but also on samples in the immediate vicinity of the frame, as indicated by the shaded portions. In this way there will be smoother transitions between frames. As an alternative or a complement, interpolation techniques are sometimes used to reduce the perceptible artifacts caused by frame boundaries. However, all of these procedures require considerable additional computing resources, and for certain specific encoding techniques they may be difficult to provide regardless of the resources available.

It is therefore beneficial to use frames that are as long as possible, since the number of frame boundaries will then be small. The coding efficiency also typically becomes high, and the necessary transmission bit rate is typically minimized. The drawback of long frames, however, is pre-echo artifacts and phantom sounds.

If shorter frames are used instead, such as SF1 or even SF0, with durations of L/2 and L/4 respectively, anyone skilled in the art will realize that the coding efficiency decreases, the transmission bit rate has to be higher and the problems with frame boundary artifacts increase. However, shorter frames suffer less from other perceptible artifacts, such as phantom sounds and pre-echoes. To minimize the coding error as far as possible, the frame length should be as short as possible.

According to the invention, the audio perception can be improved by encoding the side signal with a frame length that depends on the current signal content. Since the influence of different frame lengths on the audio perception differs with the nature of the sound to be encoded, an improvement can be obtained by letting the nature of the signal itself determine the frame length used. The encoding of the main signal is not the object of the present invention and is therefore not described in detail. The frame lengths used for the main signal may, however, be equal to the frame lengths used for the side signal, or they may differ.

In some cases the temporal variations are small, and it is then beneficial to encode the side signal using relatively long frames. This may be the case for recordings with a large amount of diffuse sound field, such as concert recordings. In other cases, such as stereo speech conversation, short frames are probably preferable. The decision on which frame length to use can be made in two basic ways.

Fig. 3b illustrates an embodiment of a side signal encoder unit 30 according to the invention in which a closed-loop decision is used. A basic encoding frame of length L is used here. A number of encoding schemes 81 are created, each characterized by a separate set 80 of subframes. Each set 80 of subframes comprises one or more subframes of equal or differing lengths. The total length of a set 80 of subframes is, however, always equal to the basic encoding frame length L. Referring to Fig. 3b, the top encoding scheme is characterized by a set of subframes comprising a single subframe of length L. The next set of subframes comprises two subframes of length L/2. The third set comprises two subframes of length L/4 followed by one subframe of length L/2.

The signal x_side provided to the side signal encoder unit 30 is encoded by all of the encoding schemes 81. In the top encoding scheme, the entire basic encoding frame is encoded in one piece. In the other encoding schemes, however, the signal x_side is encoded in each subframe separately. The result from each encoding scheme is provided to a selector 85. A fidelity measurement means 83 determines a fidelity measure for each of the encoded signals. The fidelity measure is an objective quality value, preferably a signal-to-noise ratio measure or a weighted signal-to-noise ratio. The fidelity measures associated with the encoding schemes are compared, and the result controls a switching means 87, which selects the encoding parameters representing the side signal from the encoding scheme giving the best fidelity measure as the output signal p_side from the side signal encoder unit 30.

Preferably, all possible combinations of frame lengths are tested, and the set of subframes that gives the best objective quality (for example, signal-to-noise ratio) is selected.
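The closed-loop selection of Fig. 3b can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the encoder of the invention: the function names, the representation of a set of subframes as a tuple of sample counts, and the toy codec that replaces each subframe by its mean value are all hypothetical. Only the structure follows the description above: encode with every scheme, compute a signal-to-noise ratio for each result, and keep the scheme with the best value.

```python
import math


def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB, used here as the objective fidelity measure."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def mean_codec(subframe):
    """Toy stand-in for a real coder (assumption): one mean value per subframe."""
    mean = sum(subframe) / len(subframe)
    return [mean] * len(subframe)


def encode_with_scheme(frame, scheme, codec):
    """Encode each subframe of the scheme separately and concatenate the results."""
    decoded, start = [], 0
    for length in scheme:
        decoded.extend(codec(frame[start:start + length]))
        start += length
    return decoded


def select_scheme(frame, schemes, codec):
    """Try every scheme and return (scheme, snr, decoded) for the best fidelity."""
    best = None
    for scheme in schemes:
        decoded = encode_with_scheme(frame, scheme, codec)
        score = snr_db(frame, decoded)
        if best is None or score > best[1]:
            best = (scheme, score, decoded)
    return best


frame = [1.0] * 4 + [-1.0] * 4           # abrupt change in the middle of the frame
schemes = [(8,), (4, 4), (2, 2, 4)]      # subframe lengths always sum to the frame length
print(select_scheme(frame, schemes, mean_codec)[0])
```

With this toy codec a finer subdivision can never score worse than a coarser one, so the example only shows the selection mechanism. A practical coder works within a bit budget, which is what makes the comparison between long and short subframes meaningful.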

In this embodiment, the lengths of the subframes used are selected according to:

$$ l_{sf} = \frac{l_f}{2^n}, $$

where l_sf is the length of a subframe, l_f is the length of the encoding frame and n is an integer. In this embodiment, n is chosen between 0 and 3. However, any frame lengths could be used, as long as the total length of the set remains constant.
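To give a feeling for how many sets of subframes such a rule produces, the following Python sketch enumerates them. It rests on assumptions not stated in the text: the function name is hypothetical, the frame length is taken to be divisible by 2 to the power of the largest n, and every ordering of the allowed subframe lengths that fills the frame exactly is counted as a separate set.

```python
def subframe_sets(frame_len, max_n=3):
    """List every ordered set of subframes with lengths frame_len / 2**n
    (n = 0..max_n) whose total length equals frame_len."""
    allowed = [frame_len // (2 ** n) for n in range(max_n + 1)]

    def extend(remaining):
        # Recursively fill the remaining part of the frame.
        if remaining == 0:
            yield ()
            return
        for length in allowed:
            if length <= remaining:
                for rest in extend(remaining - length):
                    yield (length,) + rest

    return list(extend(frame_len))


sets = subframe_sets(8)
print(len(sets))            # number of candidate encoding schemes
print((2, 2, 4) in sets)    # the third set of Fig. 3b: L/4, L/4, L/2
```

Under these assumptions a frame with n between 0 and 3 yields 56 candidate sets, which illustrates why the computational load of an exhaustive closed-loop search grows quickly with the number of allowed subframe lengths.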

Fig. 3c illustrates another embodiment of a side signal encoder unit 30 according to the invention. Here, the frame length decision is an open-loop decision based on the statistics of the signal. In other words, the spectral characteristics of the side signal are used as the basis for deciding which encoding scheme to use. As before, different encoding schemes, characterized by different sets of subframes, are available. In this embodiment, however, the selector 85 is placed before the actual encoding. The input side signal x_side enters the selector 85 and a signal analysis unit 84. The result of the analysis becomes the input of a switch 86, in which only one of the encoding schemes 81 is used. The output from that encoding scheme is then also the output signal p_side from the side signal encoder unit 30.
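An open-loop decision of this kind might be sketched as follows. The text does not specify which statistics the signal analysis unit 84 evaluates, so everything in this Python example is an assumption made for illustration: the function name, the use of the energy spread between the four quarters of the frame as a crude indicator of temporal variation, the threshold value and the restriction to two candidate schemes.

```python
def choose_scheme_open_loop(frame, threshold=4.0):
    """Choose a set of subframes from the frame's energy profile, before any encoding.

    Hypothetical rule: if the energy is spread evenly over the frame, use one
    long frame; if one quarter dominates, use four short subframes.
    """
    quarter = len(frame) // 4
    energies = [
        sum(x * x for x in frame[i * quarter:(i + 1) * quarter])
        for i in range(4)
    ]
    floor = 1e-12  # avoids division by zero for silent quarters
    ratio = (max(energies) + floor) / (min(energies) + floor)
    if ratio < threshold:
        return (len(frame),)
    return (quarter,) * 4


print(choose_scheme_open_loop([1.0] * 8))                        # stationary content
print(choose_scheme_open_loop([0.0] * 6 + [5.0, 5.0]))           # sudden onset
```

Only the selected scheme is then actually encoded, which is the appeal of the open-loop approach.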

The advantage of an open-loop decision is that only one actual encoding has to be performed. The disadvantage, however, is that the analysis of the signal characteristics may be very complicated indeed, and it may be difficult to predict the possible behaviour in advance so that an appropriate choice can be made in the switch 86. A great deal of statistical analysis of sound has to be performed and built into the signal analysis unit 84. Any small change in the encoding schemes may completely overturn the statistical behaviour.

With closed-loop selection (Fig. 3b), encoding schemes can be exchanged without making any changes to the rest of the unit. On the other hand, if many encoding schemes are to be investigated, the computational requirements will be high.

The benefit of such variable frame length coding of the side signal is that one can choose between fine time resolution with coarse frequency resolution on the one hand, and coarse time resolution with fine frequency resolution on the other. The above embodiments will preserve the stereo image in the best possible manner.
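The trade-off can be made concrete with a short calculation. As an assumption for illustration only, suppose that a subframe of N samples, taken at a sampling rate f_s (a symbol that does not appear in the text), is analysed as a single transform block of N points. The time span covered by one set of parameters and the spacing of the frequency bins are then

```latex
\Delta t = \frac{N}{f_s}, \qquad
\Delta f = \frac{f_s}{N}, \qquad
\Delta t \,\Delta f = 1 .
```

Halving the subframe length therefore halves the time span and doubles the frequency spacing. For subframes of length l_f/2^n, the time resolution becomes 2^n times finer and the frequency resolution 2^n times coarser than for the full encoding frame.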

There are also some requirements on the actual encoding used in the different encoding schemes. In particular, when closed-loop selection is used, the computational resources needed to perform a number of more or less simultaneous encodings are large. The more complicated the encoding process, the more computing power is needed. Furthermore, a low bit rate in transmission is also preferred.

The method presented in US 5,434,948 uses a filtered version of the mono (main) signal to resemble the side or difference signal. The parameters of the filter are optimized and are allowed to vary in time. The filter parameters representing the encoding of the side signal are then transmitted. In one embodiment, a residual side signal is also transmitted. In many cases, such an approach could be used as the side signal encoding method within the scope of the present invention. The approach has some disadvantages, however. Since the filter order has to be high to provide an accurate estimate of the side signal, the quantization of the filter coefficients and of any residual side signal often requires a relatively high transmission bit rate. The estimation of the filter itself may also be problematic, especially for transient-rich music. Estimation errors give a modified side signal that is sometimes larger in magnitude than the unmodified signal, which leads to higher bit rate demands. Moreover, if a new set of filter coefficients is computed every N samples, the filter coefficients need to be interpolated to yield a smooth transition from one set of filter coefficients to the next, as discussed above. Interpolation of filter coefficients is a complex task, and errors in the interpolation appear as large side error signals, leading to higher bit rates needed for the difference error signal encoder.

One way to avoid the need for interpolation is to update the filter coefficients on a sample-by-sample basis and to rely on backward adaptive analysis. For this to work well, the bit rate of the residual encoder must be fairly high. This is therefore not a good alternative for low-rate stereo coding.

There are cases, quite common for music for example, in which the mono signal and the difference signal are almost uncorrelated. The filter estimation then becomes very difficult, with the added risk of simply making matters worse for the difference error signal encoder.

The solution according to US 5,434,948 can work fairly well in cases where the filter coefficients vary slowly in time, for example in conference telephony systems. In the case of music signals, the approach does not work very well, since the filter needs to change quickly to track the stereo image. This means that subframe lengths of very different magnitudes have to be used, which in turn means that the number of combinations to test increases rapidly. The requirements for computing all possible encoding schemes then become impractically high.

Therefore, in a preferred embodiment, the encoding of the side signal is based on the idea of reducing the redundancy between the mono signal and the side signal by using a simple balance factor instead of a complex, bit-rate-consuming predictor filter. The residual of this operation is then encoded. The magnitude of the residual is relatively low and does not call for a very high bit rate for its transfer. This idea is very well suited to being combined with the variable frame set approach described above, since the computational complexity is low.

Using a balance factor in combination with the variable frame length approach removes the need for complex interpolation and the associated problems that interpolation may cause. Moreover, using a simple balance factor instead of a complex filter gives fewer estimation problems, since possible estimation errors in the balance factor have less impact. The preferred solution is able to reproduce both panned signals and diffuse sound fields with good quality while keeping bit rate requirements and computational resources limited.

Figure 4 illustrates a preferred embodiment of a stereo encoder according to the invention. This embodiment is very similar to the one shown in Fig. 2a, but the details of the side signal encoder unit 30 are revealed. The encoder 14 of this embodiment has no pre-processing unit, and the input signals are provided directly to the addition and subtraction units 34, 36. In a multiplier 33, the mono signal x_mono is multiplied by a certain balance factor g_sm. In a subtraction unit 35, the multiplied mono signal is subtracted from the side signal x_side (which is essentially the difference between the two channels) to produce a side residual signal. The balance factor g_sm is determined by an optimizer 37 on the basis of the content of the mono and side signals, so as to minimize the side residual signal according to a quality criterion. The quality criterion is preferably a least-mean-square criterion. The side residual signal is encoded in a side residual encoder 39 according to any encoding procedure. Preferably, the side residual encoder 39 is a low bit rate transform encoder or a Codebook Excited Linear Prediction (CELP) encoder. The encoding parameters p_side representing the side signal then comprise the encoding parameters p_{side residual} representing the side residual signal and the optimized balance factor 49.

In the embodiment of Fig. 4, the mono signal 42 used for synthesizing the side signal is the target signal x_mono of the mono encoder 38. As mentioned above (in connection with Fig. 2a), the locally synthesized signal of the mono encoder 38 can also be used. In the latter case, the total encoder delay may increase, as may the computational complexity for the side signal. On the other hand, the quality may be better, since it then becomes possible to repair coding errors made in the mono encoder.

The basic encoding scheme can be described more precisely as follows. Denote the two channel signals by a and b; they may be the left and right channels of a stereo pair. The channel signals are combined into a mono signal by addition and into a side signal by subtraction. In equation form, the operations are:

$$ x_{mono}(n) = 0.5\,\bigl(a(n) + b(n)\bigr) $$

$$ x_{side}(n) = 0.5\,\bigl(a(n) - b(n)\bigr). $$

It is beneficial to scale the x_mono and x_side signals down by a factor of two. It is implied here that other ways of creating x_mono and x_side exist. One can, for instance, use:

$$ x_{mono}(n) = \gamma\, a(n) + (1-\gamma)\, b(n) $$

$$ x_{side}(n) = \gamma\, a(n) - (1-\gamma)\, b(n) $$

$$ 0 \le \gamma \le 1.0. $$
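The channel combination above can be written out directly. The following Python sketch uses hypothetical function names and plain lists of samples. The `upmix` function covers only the case gamma = 0.5, where the sum and difference of the mono and side signals return the original channels, as in the decoder of Fig. 2b.

```python
def downmix(a, b, gamma=0.5):
    """Combine two channel signals into a mono and a side signal.

    With gamma = 0.5 this is the basic scheme 0.5*(a + b) and 0.5*(a - b).
    """
    mono = [gamma * x + (1.0 - gamma) * y for x, y in zip(a, b)]
    side = [gamma * x - (1.0 - gamma) * y for x, y in zip(a, b)]
    return mono, side


def upmix(mono, side):
    """Recover the channels for the gamma = 0.5 case: a = mono + side, b = mono - side."""
    a = [m + s for m, s in zip(mono, side)]
    b = [m - s for m, s in zip(mono, side)]
    return a, b


a = [1.0, 2.0, -3.0]
b = [0.5, -2.0, 1.0]
mono, side = downmix(a, b)
print(mono, side)
print(upmix(mono, side))
```

In the actual codec the decoder works on decoded, and therefore approximate, versions of the two signals, so the reconstruction is only as good as the encoding of each of them.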

On blocks of the input signals, a modified, or residual, side signal is computed according to:

$$ x_{side\,residual}(n) = x_{side}(n) - f(x_{mono}, x_{side})\, x_{mono}(n), $$

where f(x_mono, x_side) is a balance factor function that, based on a block of N samples (that is, a subframe) from the side and mono signals, strives to remove as much as possible from the side signal. In other words, the balance factor is used to minimize the residual side signal. In the special case where the minimization is performed in the mean-square sense, this is equivalent to minimizing the energy of the residual side signal x_{side residual}.

In the special case above, f(x_mono, x_side) is given by:

$$f(x_{mono}, x_{side}) = \frac{R_{sm}}{R_{mm}}$$

$$R_{mm} = \sum_{n=\mathrm{frame\,start}}^{\mathrm{frame\,end}} x_{mono}(n)\,x_{mono}(n)$$

$$R_{sm} = \sum_{n=\mathrm{frame\,start}}^{\mathrm{frame\,end}} x_{side}(n)\,x_{mono}(n),$$

where x_side is the side signal and x_mono is the mono signal. Note that the function is based on a block starting at "frame start" and ending at "frame end".
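
As a minimal sketch of the mean-square case, the balance factor for one block can be computed directly from the two sums. The helper names and sample values below are hypothetical, and returning 0.0 for a silent mono block is an assumption rather than something the text specifies.

```python
def correlation(x, y, start, end):
    """Sum of x(n)*y(n) for n = start..end (inclusive), i.e. R_xy over the block."""
    return sum(x[n] * y[n] for n in range(start, end + 1))


def balance_factor(x_mono, x_side, start, end):
    """f(x_mono, x_side) = R_sm / R_mm over the block [start, end]."""
    r_mm = correlation(x_mono, x_mono, start, end)
    r_sm = correlation(x_side, x_mono, start, end)
    if r_mm == 0.0:
        return 0.0  # assumption: nothing to predict from a silent mono block
    return r_sm / r_mm


# Hypothetical block of four samples.
x_mono = [1.0, -2.0, 3.0, -1.0]
x_side = [1.0, 0.0, 1.0, 0.0]
g = balance_factor(x_mono, x_side, 0, 3)          # R_sm = 4, R_mm = 15
residual = [s - g * m for s, m in zip(x_side, x_mono)]
# For the least-squares factor the residual is orthogonal to the mono signal.
leftover = sum(r * m for r, m in zip(residual, x_mono))
```

The orthogonality check in the last line is what "removing as much as possible" means in the mean-square sense.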

It is possible to add weighting in the frequency domain to the computation of the balance factor. This is done by convolving the x_side and x_mono signals with the impulse response of a weighting filter. The estimation error can then be moved to a frequency range where it is less audible. This is referred to as perceptual weighting.
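
A possible reading of the weighting step is sketched below: both signals are filtered with the same impulse response before the correlations are formed. The filter taps `h` are placeholders chosen for illustration, not a perceptual filter taken from the text, and the function names are hypothetical.

```python
def convolve(x, h):
    """Causal FIR filtering y(n) = sum_k h(k) x(n-k), truncated to len(x)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def weighted_balance_factor(x_mono, x_side, h):
    """Balance factor R_sm / R_mm computed on the weighted signals."""
    m_w = convolve(x_mono, h)
    s_w = convolve(x_side, h)
    r_mm = sum(m * m for m in m_w)
    r_sm = sum(s * m for s, m in zip(s_w, m_w))
    return r_sm / r_mm if r_mm else 0.0


x_mono = [1.0, -2.0, 3.0, -1.0]
x_side = [1.0, 0.0, 1.0, 0.0]
g_plain = weighted_balance_factor(x_mono, x_side, [1.0])          # no weighting
g_weighted = weighted_balance_factor(x_mono, x_side, [1.0, 0.5])  # placeholder taps
```

The weighting changes which errors the factor trades off; the residual itself would still be formed from the unweighted signals.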

A quantized version of the balance factor value given by the function f(x_mono, x_side) is transmitted to the decoder. The quantization is preferably already taken into account when the modified side signal is created. The following expressions are then obtained:

$$x_{sideresidual}(n) = x_{side}(n) - g_Q\,x_{mono}(n)$$

$$g_Q = Q_g^{-1}\left(Q_g\left(\frac{R_{sm}}{R_{mm}}\right)\right).$$

Q_g(...) is a quantization function that is applied to the balance factor given by the function f(x_mono, x_side). The balance factor is transmitted on the transmission channel. In normal left-right panned signals, the balance factor is limited to the interval [-1.0, 1.0]. If, on the other hand, the channels are out of phase with each other, the balance factor may extend beyond these limits.
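
The sketch below shows why the encoder should use the dequantized factor: the residual is then formed with exactly the value the decoder will see. The uniform quantizer, its step size and its index range are hypothetical; the text does not specify Q_g.

```python
import math

STEP = 0.125    # hypothetical quantizer step size
MAX_INDEX = 16  # hypothetical index range, covering [-2.0, 2.0]


def q_g(g):
    """Q_g: balance factor -> integer index that would be transmitted."""
    index = math.floor(g / STEP + 0.5)
    return max(-MAX_INDEX, min(MAX_INDEX, index))


def q_g_inv(index):
    """Q_g^{-1}: index -> reconstructed balance factor g_Q."""
    return index * STEP


def side_residual(x_mono, x_side):
    """Return (index, g_Q, residual) with the residual built from g_Q, not g."""
    r_mm = sum(m * m for m in x_mono)
    r_sm = sum(s * m for s, m in zip(x_side, x_mono))
    g = r_sm / r_mm if r_mm else 0.0
    index = q_g(g)
    g_q = q_g_inv(index)
    residual = [s - g_q * m for s, m in zip(x_side, x_mono)]
    return index, g_q, residual


index, g_q, residual = side_residual([1.0, -2.0, 3.0, -1.0], [1.0, 0.0, 1.0, 0.0])
```

Here the unquantized factor 4/15 is mapped to index 2, so both ends work with g_Q = 0.25.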

As an optional means of stabilizing the stereo image, the balance factor can be limited when the normalized cross-correlation between the mono signal and the side signal is poor, as given by:

$$g_Q = Q_g^{-1}\left(Q_g\left(\left|\bar{R}_{sm}\right|\,\frac{R_{sm}}{R_{mm}}\right)\right),$$

where

$$\bar{R}_{sm} = \frac{R_{sm}}{\sqrt{R_{ss}\cdot R_{mm}}}$$

$$R_{sm} = \sum_{n=\mathrm{frame\,start}}^{\mathrm{frame\,end}} x_{side}(n)\,x_{mono}(n).$$
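
A small sketch of the limiting step, before quantization, is given below. It assumes that R_ss is the energy of the side signal over the block, by analogy with R_mm, and that the normalization uses the square root of R_ss times R_mm; the function name and sample values are hypothetical.

```python
import math


def limited_balance_factor(x_mono, x_side):
    """|R_sm_normalized| * R_sm / R_mm for one block (unquantized)."""
    r_mm = sum(m * m for m in x_mono)
    r_ss = sum(s * s for s in x_side)  # assumed definition, by analogy with R_mm
    r_sm = sum(s * m for s, m in zip(x_side, x_mono))
    if r_mm == 0.0 or r_ss == 0.0:
        return 0.0  # assumption: no usable correlation in a silent block
    norm_corr = r_sm / math.sqrt(r_ss * r_mm)
    return abs(norm_corr) * r_sm / r_mm


x_mono = [1.0, -2.0, 3.0, -1.0]
# Side fully correlated with mono: the limiter leaves the factor at 0.5.
g_full = limited_balance_factor(x_mono, [0.5 * m for m in x_mono])
# Weakly correlated side: the factor is pulled below the plain value 4/15.
g_weak = limited_balance_factor(x_mono, [1.0, 0.0, 1.0, 0.0])
```

Because the normalized correlation has magnitude at most 1, the limited factor never exceeds the plain one in magnitude.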

These situations occur quite frequently in classical music or studio music with a large amount of diffuse sound, where in some cases the a and b channels may almost cancel each other out when the mono signal is created. The effect on the balance factor is that it can jump rapidly, causing a confused stereo image. The adjustment above alleviates this problem.

The filter-based approach in US 5,434,948 has a similar problem, but in that case the solution is not as simple.

If E_s is the encoding function of the residual side signal (e.g. a transform encoder) and E_m is the encoding function of the mono signal, then the decoded a″ and b″ signals at the decoder end can be described as (assuming γ = 0.5 here):

$$a''(n) = (1 + g_Q)\,x''_{mono}(n) + x''_{side}(n)$$

$$b''(n) = (1 - g_Q)\,x''_{mono}(n) - x''_{side}(n)$$

$$x''_{side} = E_s^{-1}\left(E_s\left(x_{sideresidual}\right)\right)$$

$$x''_{mono} = E_m^{-1}\left(E_m\left(x_{mono}\right)\right).$$
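
The decoder equations can be checked with a round trip in which the encoders E_m and E_s are replaced by lossless stand-ins, so that x″_mono = x_mono and x″_side = x_sideresidual. This is a sketch under that assumption; the function names and sample values are hypothetical.

```python
def encode_frame(a, b, g_q):
    """Form the mono signal and the residual side signal for gamma = 0.5."""
    x_mono = [0.5 * (ai + bi) for ai, bi in zip(a, b)]
    x_side = [0.5 * (ai - bi) for ai, bi in zip(a, b)]
    residual = [s - g_q * m for s, m in zip(x_side, x_mono)]
    return x_mono, residual


def decode_frame(x_mono_dec, x_side_dec, g_q):
    """a'' = (1 + g_Q) x''_mono + x''_side,  b'' = (1 - g_Q) x''_mono - x''_side."""
    a_dec = [(1.0 + g_q) * m + s for m, s in zip(x_mono_dec, x_side_dec)]
    b_dec = [(1.0 - g_q) * m - s for m, s in zip(x_mono_dec, x_side_dec)]
    return a_dec, b_dec


a = [1.0, 0.5, -0.25, 0.0]
b = [0.5, 0.5, 0.25, -1.0]
g_q = 0.25  # hypothetical dequantized balance factor
x_mono, residual = encode_frame(a, b, g_q)
# Lossless stand-ins for E_m and E_s: the decoder receives the signals unchanged.
a_dec, b_dec = decode_frame(x_mono, residual, g_q)
```

With real, lossy encoders the reconstruction is only approximate, but the balance factor terms still cancel in the same way.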

An important benefit of computing the balance factor for each frame is that interpolation is avoided. Instead, the frame processing is normally performed with overlapping frames, as described above.

The coding principle using balance factors works particularly well for music signals, where fast changes are typically needed to track the stereo image.

Multi-channel coding has recently become common. One example is 5.1 channel surround sound in DVD movies. The channels are arranged there as front left, front centre, front right, rear left, rear right and subwoofer. Fig. 5 shows an embodiment of an encoder according to the invention that encodes the three front channels of such an arrangement by exploiting inter-channel redundancy.

Three channel signals L, C and R are provided at three inputs 16A-C, and the mono signal x_mono is created as the sum of all three signals. A central signal encoder unit 130 is added, which receives the central signal x_centre. In this embodiment the mono signal 42 is the encoded and decoded mono signal x″_mono, and it is multiplied by a certain balance factor g_Q in a multiplier 133. In a subtraction unit 135, the multiplied mono signal is subtracted from the central signal x_centre to produce a central residual signal. The balance factor g_Q is determined by an optimizer 137, based on the content of the mono signal and the central signal, so as to minimize the central residual signal according to a quality criterion. The central residual signal is encoded in a central residual encoder 139 according to any encoding procedure. Preferably, the central residual encoder 139 is a low bit-rate transform encoder or a CELP encoder. The encoding parameters p_centre representing the central signal then comprise the encoding parameters p_centreresidual representing the central residual signal and the optimized balance factor 149. The central residual signal is added to the scaled mono signal in an addition unit 235, producing a modified central signal 142 that is compensated for encoding errors.

As in the previous embodiments, the side signal x_side, i.e. the difference between the left L and right R channels, is provided to the side signal encoder unit 30. Here, however, the optimizer 37 also depends on the modified central signal 142 provided by the central signal encoder unit 130. The side residual signal is therefore produced in the subtraction unit 35 as an optimum linear combination of the mono signal 42, the modified central signal 142 and the side signal.

The variable frame length concept described above can be applied to either the side signal or the central signal, or to both.

Fig. 6 illustrates a decoder unit suitable for receiving encoded audio signals from the encoder unit of Fig. 5. The received signal 54 is divided into encoding parameters p_mono representing the main signal, encoding parameters p_centre representing the central signal and encoding parameters p_side representing the side signal. In a decoder 64, the encoding parameters p_mono are used to generate the main signal x″_mono. In a decoder 160, the encoding parameters p_centre are used to generate the central signal x″_centre, based on the main signal x″_mono. In a decoder 60, the encoding parameters p_side are decoded, based on the main signal x″_mono and the central signal x″_centre, to generate the side signal x″_side.

The procedure can be expressed mathematically as follows.

The input signals x_left, x_right and x_centre are combined into one mono channel according to:

$$x_{mono}(n) = \alpha\,x_{left}(n) + \beta\,x_{right}(n) + \chi\,x_{centre}(n).$$

For simplicity, α, β and χ are set to 1.0 in the remainder of this section, but they can be set to arbitrary values. The values of α, β and χ can be constant or can depend on the signal content, in order to emphasize one or two channels and thereby achieve optimal quality.

The normalized cross-correlation between the mono signal and the central signal is computed as:

$$\bar{R}_{cm} = \frac{R_{cm}}{\sqrt{R_{cc}\cdot R_{mm}}},$$

where

$$R_{cc} = \sum_{n=\mathrm{frame\,start}}^{\mathrm{frame\,end}} x_{centre}(n)\,x_{centre}(n)$$

$$R_{mm} = \sum_{n=\mathrm{frame\,start}}^{\mathrm{frame\,end}} x_{mono}(n)\,x_{mono}(n)$$

$$R_{cm} = \sum_{n=\mathrm{frame\,start}}^{\mathrm{frame\,end}} x_{centre}(n)\,x_{mono}(n).$$

x_centre is the central signal and x_mono is the mono signal. The mono signal comes from the mono target signal, but it is also possible to use the local synthesis of the mono encoder.

The central residual signal to be encoded is:

$$x_{centreresidual}(n) = x_{centre}(n) - g_Q\,x_{mono}(n)$$

$$g_Q = Q_g^{-1}\left(Q_g\left(\frac{R_{cm}}{R_{mm}}\right)\right).$$

Q_g(...) is a quantization function that is applied to the balance factor. The balance factor is transmitted on the transmission channel.

If E_c is the encoding function of the central residual signal (e.g. a transform encoder) and E_m is the encoding function of the mono signal, then the decoded signal x″_centre at the decoder end is described as:

$$x''_{centre}(n) = g_Q\,x''_{mono}(n) + x''_{centreresidual}(n)$$

$$x''_{centreresidual} = E_c^{-1}\left(E_c\left(x_{centreresidual}\right)\right)$$

$$x''_{mono} = E_m^{-1}\left(E_m\left(x_{mono}\right)\right).$$

The side residual signal to be encoded is:

$$x_{sideresidual}(n) = \left(x_{left}(n) - x_{right}(n)\right) - g_{Qsm}\,x''_{mono}(n) - g_{Qsc}\,x''_{centre}(n),$$

where g_Qsm and g_Qsc are quantized values of the parameters g_sm and g_sc that minimize the expression:

$$\sum_{n=\mathrm{frame\,start}}^{\mathrm{frame\,end}} \left|\left(x_{left}(n) - x_{right}(n)\right) - g_{sm}\,x''_{mono}(n) - g_{sc}\,x''_{centre}(n)\right|^{\eta}.$$

For least-mean-square minimization of the error, η can for instance be equal to 2. The g_sm and g_sc parameters can be quantized jointly or separately.
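
For η = 2 the two unquantized gains follow from a 2x2 system of normal equations. The sketch below solves that system in closed form; it is an illustration only, the function names and sample values are hypothetical, and quantization of the gains is left out.

```python
def dot(x, y):
    """Inner product of two equal-length sample lists."""
    return sum(xi * yi for xi, yi in zip(x, y))


def side_gains(x_left, x_right, mono_dec, centre_dec):
    """Least-squares g_sm, g_sc minimizing sum |d - g_sm*m - g_sc*c|^2, d = left - right."""
    d = [l - r for l, r in zip(x_left, x_right)]
    r_mm = dot(mono_dec, mono_dec)
    r_cc = dot(centre_dec, centre_dec)
    r_mc = dot(mono_dec, centre_dec)
    r_dm = dot(d, mono_dec)
    r_dc = dot(d, centre_dec)
    det = r_mm * r_cc - r_mc * r_mc
    if det == 0.0:
        raise ValueError("mono and centre are collinear; gains are not unique")
    g_sm = (r_dm * r_cc - r_dc * r_mc) / det
    g_sc = (r_dc * r_mm - r_dm * r_mc) / det
    return g_sm, g_sc


# Hypothetical decoded mono and centre blocks.
mono_dec = [1.0, 2.0, -1.0, 0.0]
centre_dec = [0.0, 1.0, 1.0, 2.0]
# Build left/right so that left - right = 0.5*mono - 0.25*centre exactly.
x_left = [0.5 * m - 0.25 * c for m, c in zip(mono_dec, centre_dec)]
x_right = [0.0, 0.0, 0.0, 0.0]
g_sm, g_sc = side_gains(x_left, x_right, mono_dec, centre_dec)
```

For other values of η no closed form exists in general, and an iterative search would be needed.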

If E_s is the encoding function of the side residual signal, the decoded channel signals x″_left and x″_right are given by:

$$x''_{left}(n) = x''_{mono}(n) - x''_{centre}(n) + x''_{side}(n)$$

$$x''_{right}(n) = x''_{mono}(n) - x''_{centre}(n) - x''_{side}(n)$$

$$x''_{side}(n) = x''_{sideresidual}(n) + g_{Qsm}\,x''_{mono}(n) + g_{Qsc}\,x''_{centre}(n)$$

$$x''_{sideresidual} = E_s^{-1}\left(E_s\left(x_{sideresidual}\right)\right).$$

One of the most annoying perceptual artifacts is the pre-echo effect. Figs. 7a-b illustrate this artifact. Assume that a signal component has a time development as shown by curve 100. At the start, from t0 onwards, the signal component is absent from the audio samples. At a time t between t1 and t2, the signal component suddenly appears. When the signal component is encoded using a frame length of t2-t1, its onset is "smeared out" over the entire frame, as indicated by curve 101. When curve 101 is decoded, the signal component appears a time Δt before its intended onset, and a "pre-echo" is perceived.

The pre-echo artifact becomes more pronounced if long encoding frames are used. Using shorter frames suppresses the artifact somewhat. Another way to deal with the pre-echo problem is to exploit the fact that the mono signal is available at both the encoder and the decoder end. This makes it possible to scale the side signal according to the energy contour of the mono signal. At the decoder end, the inverse scaling is performed, which alleviates some of the pre-echo problems.

The energy contour of the mono signal is computed over the frame as:

$$E_c(m) = \sum_{n=m-L}^{m+L} w(n)\,x_{mono}^2(n), \quad \mathrm{frame\,start} \le m \le \mathrm{frame\,end},$$

where w(n) is a windowing function. The simplest windowing function is a rectangular window, but other window types, such as a Hamming window, may be more desirable. The side residual signal is then scaled as:

$$\bar{x}_{sideresidual}(n) = \frac{x_{sideresidual}(n)}{E_c(n)}, \quad \mathrm{frame\,start} \le n \le \mathrm{frame\,end}.$$

The equation above can be written in a more general form as:

$$\bar{x}_{sideresidual}(n) = \frac{x_{sideresidual}(n)}{f(E_c(n))}, \quad \mathrm{frame\,start} \le n \le \mathrm{frame\,end},$$

where f(...) is a monotonic continuous function. In the decoder, the energy contour is computed on the decoded mono signal and applied to the decoded side signal, inverting the scaling performed in the encoder:

$$x''_{side}(n) = x''_{side}(n)\,f(E_c(n)), \quad \mathrm{frame\,start} \le n \le \mathrm{frame\,end}.$$
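
The scaling and its inverse can be sketched as follows. Several details are assumptions made for illustration: the window is applied relative to the position m, samples outside the frame count as zero, the same decoded mono signal is used at both ends so that the contours match, f defaults to the identity, and a small floor `EPS` avoids division by zero in silent regions.

```python
EPS = 1e-3  # hypothetical floor so that silent regions do not divide by zero


def energy_contour(x_mono, half_len, window=None):
    """E_c(m): windowed sum of x_mono(n)^2 for n = m-L..m+L (zero outside the frame)."""
    if window is None:
        window = [1.0] * (2 * half_len + 1)  # rectangular window
    contour = []
    for m in range(len(x_mono)):
        acc = 0.0
        for k, w in enumerate(window):
            n = m - half_len + k
            if 0 <= n < len(x_mono):
                acc += w * x_mono[n] ** 2
        contour.append(acc)
    return contour


def scale_down(residual, contour, f=lambda e: e):
    """Encoder side: divide the side residual by f(E_c(n))."""
    return [r / (f(e) + EPS) for r, e in zip(residual, contour)]


def scale_up(side_dec, contour, f=lambda e: e):
    """Decoder side: multiply the decoded side signal by f(E_c(n))."""
    return [s * (f(e) + EPS) for s, e in zip(side_dec, contour)]


x_mono_dec = [0.0, 0.0, 1.0, 2.0]   # hypothetical decoded mono with a late onset
residual = [0.1, -0.2, 0.3, 0.4]
contour = energy_contour(x_mono_dec, 1)
scaled = scale_down(residual, contour)
restored = scale_up(scaled, contour)  # lossless stand-in for the side encoder
```

Because the contour is derived from the mono signal alone, no extra side information is needed for the decoder to undo the scaling.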

Since this energy contour scaling is, in some sense, an alternative to using shorter frame lengths, the concept is particularly well suited to being combined with the variable frame length concept described above. By having some encoding schemes that apply energy contour scaling, some that do not, and some that apply it only during certain subframes, a more flexible set of encoding schemes can be provided. Fig. 8 illustrates an embodiment of a signal encoder unit 30 according to the invention. Here, the different encoding schemes 81 comprise hatched subframes, representing encoding with energy contour scaling applied, and unhatched subframes, representing encoding without energy contour scaling. In this way, combinations not only of subframes of differing lengths but also of subframes with differing encoding principles become available. In the present illustrative example, the application of energy contour scaling differs between the encoding schemes. In a more general case, any encoding principle can be combined with the variable length concept in an analogous manner.

The set of encoding schemes of Fig. 8 comprises schemes that handle, for example, pre-echo artifacts in different ways. In some schemes, longer subframes with pre-echo minimization according to the energy contour principle are used. In other schemes, shorter subframes without energy contour scaling are used. Depending on the signal content, one of the alternatives may be more advantageous. For very severe pre-echo cases, encoding schemes using short subframes with energy contour scaling must be used.

The proposed solution can be used in the full frequency band or in one or more distinct subbands. The use of subbands can be applied to both the main and side signals, or to one of them separately. A preferred embodiment comprises splitting the side signal into several frequency bands. The reason is simply that it is easier to remove possible redundancy in an isolated frequency band than in the entire frequency band. This is particularly important when the signal being coded has rich spectral content.

One possible use is to encode the frequency band below a predetermined threshold with the method described above. The predetermined threshold can preferably be 2 kHz, or even more preferably 1 kHz. For the remaining part of the frequency range of interest, one can either encode a further band with the method described above or use a completely different method.

One motivation for using the method described above preferably at low frequencies is that diffuse sound fields generally have little energy content at high frequencies. The natural reason is that sound absorption typically increases with frequency. The diffuse sound field components also seem to play a less important role for the human auditory system at higher frequencies. It is therefore beneficial to employ the solution at low frequencies (below 1 or 2 kHz) and, depending on other conditions, to use more bit-efficient coding schemes at higher frequencies. Applying the scheme only at low frequencies gives a large saving in bit rate, since the bit rate required by the proposed method is proportional to the required bandwidth. In most cases the mono encoder can encode the entire frequency band, while the proposed side signal encoding is suggested to be performed only in the lower part of the frequency band, as schematically illustrated in Fig. 9. Reference numeral 301 refers to an encoding scheme of the side signal according to the invention, reference numeral 302 refers to any other encoding scheme of the side signal, and reference numeral 303 refers to an encoding scheme of the side signal.

It is also possible to use the proposed method on several distinct frequency bands.

Fig. 10 is a flow diagram illustrating the main steps of an embodiment of an encoding method according to the invention. The procedure starts in step 200. In step 210, a main signal derived from the multi-tone signals is encoded. In step 212, encoding schemes are provided that comprise subframes of differing lengths and/or order. In step 214, a side signal derived from the multi-tone signals is encoded with an encoding scheme that is selected, at least in part, according to the actual signal content of the present multi-tone signals. The procedure ends in step 299.
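
Step 214 can be pictured with the sketch below, which picks a subframe partition for one frame of the side signal. The selection criterion is an assumption made for illustration: the total residual energy after per-subframe balance factor prediction, plus a hypothetical fixed penalty per subframe that stands in for the side information each subframe costs. The function names and sample values are likewise hypothetical.

```python
def residual_energy(x_mono, x_side, start, end):
    """Energy of the side residual over [start, end) with that subframe's own balance factor."""
    m = x_mono[start:end]
    s = x_side[start:end]
    r_mm = sum(v * v for v in m)
    r_sm = sum(a * b for a, b in zip(s, m))
    g = r_sm / r_mm if r_mm else 0.0
    return sum((a - g * b) ** 2 for a, b in zip(s, m))


def select_scheme(x_mono, x_side, schemes, penalty=0.01):
    """Return the scheme (tuple of subframe lengths) with the lowest assumed cost."""
    best_cost, best_scheme = None, None
    for scheme in schemes:
        if sum(scheme) != len(x_mono):
            raise ValueError("subframe lengths must add up to the frame length")
        cost, start = 0.0, 0
        for length in scheme:
            cost += residual_energy(x_mono, x_side, start, start + length) + penalty
            start += length
        if best_cost is None or cost < best_cost:
            best_cost, best_scheme = cost, scheme
    return best_scheme


schemes = [(8,), (4, 4), (2, 2, 2, 2)]
mono = [1.0] * 8
# The stereo image flips mid-frame: two subframes track it, one long frame cannot.
choice_jump = select_scheme(mono, [0.5] * 4 + [-0.5] * 4, schemes)
# A steady stereo image is served best by a single long frame.
choice_steady = select_scheme(mono, [0.5] * 8, schemes)
```

Without the penalty term the finest partition would never lose, so some measure of the cost of extra subframes is needed for the choice to depend on the signal.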

Fig. 11 is a flow diagram illustrating the main steps of an embodiment of a decoding method according to the invention. The procedure starts in step 200. In step 220, a received encoded main signal is decoded. In step 222, encoding schemes are provided that comprise subframes of differing lengths and/or order. In step 224, a received side signal is decoded with a selected encoding scheme. In step 226, the decoded main and side signals are combined into a multi-tone signal. The procedure ends in step 299.

The embodiments described above are to be understood as a few illustrative examples of the invention. Those skilled in the art will understand that various modifications, combinations and changes may be made to the embodiments without departing from the scope of the invention. In particular, different partial solutions from the different embodiments can be combined in other configurations, where technically possible. The scope of the invention is, however, defined by the appended claims.

References

C. Faller et al., "Binaural cue coding applied to stereo and multi-channel audio compression", 112th AES conference, Munich, Germany, May 2002.
