Showing content from https://patents.google.com/patent/CN100559465C/en below:

CN100559465C - Fidelity optimized variable frame length encoding

Detailed Description

Figure 1 illustrates a typical system 1 in which the present invention can be used to advantage. A transmitter 10 comprises an antenna 12, including the associated hardware and software, for transmitting radio signals 5 to a receiver 20. Among other parts, the transmitter 10 comprises a multi-channel encoder 14, which transforms the signals of a number of input channels 16 into output signals suitable for radio transmission. Examples of suitable multi-channel encoders 14 are described in further detail below. The signals of the input channels 16 can be provided from, e.g., an audio signal storage 18, such as a data file holding a digital representation of an audio recording, a magnetic tape or a vinyl disc. The signals of the input channels 16 can also be provided "live", e.g. from a set of microphones 19. If the audio signals are not already in digital form, they are digitized before entering the multi-channel encoder 14.

At the receiver 20 side, an antenna 22 with associated hardware and software handles the reception of the radio signals 5 representing polyphonic audio signals. Typical functions, such as error correction, are performed here. A decoder 24 decodes the received radio signals 5 and transforms the audio data carried thereby into signals of a number of output channels 26. The output signals can be provided to, e.g., loudspeakers 29 for immediate presentation, or can be stored in an audio signal storage 28 of any kind.

The system 1 can be, for example, a teleconferencing system, a system for supplying audio services, or another audio application. In some systems, such as teleconferencing systems, the communication has to be of a duplex type, whereas the distribution of music from a service provider to a subscriber can be essentially one-way. The transmission of signals from the transmitter 10 to the receiver 20 can also be performed by any other means, e.g. by different kinds of electromagnetic waves, cables or optical fibres, or combinations thereof.

Figure 2a illustrates an embodiment of an encoder according to the invention. In this embodiment, the polyphonic signal is a stereo signal comprising two channels a and b, received at inputs 16A and 16B. The signals of channels a and b are provided to a pre-processing unit 32, where various signal conditioning procedures may be performed. The (possibly modified) signals from the output of the pre-processing unit 32 are summed in an addition unit 34, which also divides the sum by a factor of two. The signal x_mono produced in this way is the main signal of the stereo signal, since it basically comprises all data from both channels. In this embodiment the main signal thus represents a pure "mono" signal. The main signal x_mono is provided to a main signal encoder unit 38, which encodes it according to any suitable encoding principle. Such principles are available in the prior art and are not discussed further here. The main signal encoder unit 38 outputs p_mono, the encoding parameters representing the main signal.

In a subtraction unit 36, the difference of the channel signals (divided by a factor of two) is provided as a side signal x_side. In this embodiment, the side signal represents the difference between the two channels of the stereo signal. The side signal x_side is provided to a side signal encoder unit 30, preferred embodiments of which are discussed further below. According to a side signal encoding procedure, discussed in more detail below, the side signal x_side is converted into encoding parameters p_side representing it. In certain embodiments, this encoding also uses information from the main signal x_mono. Arrow 42 indicates such an arrangement, in which the original, unencoded main signal x_mono is used. In still other embodiments, the main signal information used in the side signal encoder unit 30 can be deduced from the encoding parameters p_mono representing the main signal, as indicated by the dashed line 44.

The encoding parameters p_mono representing the main signal x_mono are a first output signal, and the encoding parameters p_side representing the side signal x_side are a second output signal. In the typical case, these two output signals p_mono and p_side, which together represent the full stereo sound, are multiplexed into one transmission signal 52 in a multiplexer unit 40. In other embodiments, however, the first and second output signals p_mono and p_side may be transmitted separately.

Figure 2b illustrates, as a block diagram, an embodiment of a decoder 24 according to the invention. The received signal 54, comprising encoding parameters that represent the main and side signal information, is provided to a demultiplexer unit 56, which separates a first and a second input signal. The first input signal, corresponding to the encoding parameters p_mono of the main signal, is provided to a main signal decoder unit 64. In a conventional manner, the encoding parameters p_mono are used to generate a decoded main signal x''_mono that is as similar as possible to the main signal x_mono in the encoder 14 (Fig. 2a).

Similarly, the second input signal, corresponding to the side signal, is provided to a side signal decoder unit 60. Here, the encoding parameters p_side representing the side signal are used to recover a decoded side signal x''_side. In some embodiments, the decoding procedure uses information about the main signal x''_mono, as indicated by an arrow.

The decoded main and side signals x''_mono and x''_side are provided to an addition unit 70, which provides an output signal representing the original signal of channel a. Similarly, the difference provided by a subtraction unit 68 gives an output signal representing the original signal of channel b. These channel signals may be post-processed in a post-processor unit 74 according to prior-art signal processing procedures. Finally, the channel signals a and b are provided at the outputs 26A and 26B of the decoder.

As mentioned in the Summary, encoding is typically performed one frame at a time. A frame comprises the audio samples within a predefined time period. At the bottom of Fig. 3a, a frame SF2 of duration L is illustrated. The audio samples within the unshaded portion are encoded together; the preceding and the following samples are encoded in other frames. Dividing the samples into frames inevitably introduces some discontinuities at the frame boundaries. Shifting sounds give shifting encoding parameters, which change essentially at every frame boundary, and this produces perceptible errors. One way to compensate somewhat for this is to base the encoding not only on the samples to be encoded but also on samples in the immediate vicinity of the frame, as indicated by the shaded portions. In this way, the transitions between frames become softer. As an alternative or a complement, interpolation techniques are sometimes used to reduce the perceptible artifacts caused by frame boundaries. However, all such procedures require large additional computational resources, and for certain encoding techniques such resources may be hard to provide at all.

From this point of view, it is beneficial to use frames that are as long as possible, since the number of frame boundaries then becomes small. The coding efficiency also typically becomes high, and the necessary transmission bit rate is typically minimized. However, long frames give rise to pre-echo artifacts and phantom sounds.

If shorter frames are used instead, such as SF1 or even SF0, with durations of L/2 and L/4 respectively, anyone skilled in the art realizes that the coding efficiency may decrease, the transmission bit rate may have to be higher, and the problems with frame-boundary artifacts will increase. On the other hand, shorter frames suffer less from other perceptible artifacts, such as phantom sounds and pre-echoes. To minimize the coding error as far as possible, the frame length should be as short as possible.

According to the present invention, audio perception is improved by encoding the side signal with a frame length that depends on the present signal content. Since the influence of different frame lengths on audio perception differs with the nature of the sound to be encoded, an improvement can be obtained by letting the nature of the signal itself affect the frame length that is used. The encoding of the main signal is not the object of the present invention and is therefore not described in detail. The frame length used for the main signal may or may not be equal to the frame length used for the side signal.

When the temporal variations are small, it can be beneficial to encode the side signal with relatively long frames. This may be the case for recordings with a great amount of diffuse sound field, such as concert recordings. In other cases, such as stereo speech conversation, short frames are probably preferable. The decision on which frame length to use can be made in two basic ways: closed loop or open loop.

Figure 3b illustrates an embodiment of a side signal encoder unit 30 according to the invention in which a closed-loop decision is used. A basic encoding frame of length L is used here. A number of encoding schemes 81 are provided, each characterized by a separate set 80 of subframes. Each set 80 comprises one or more subframes of equal or differing lengths, but the total length of a set 80 is always equal to the basic encoding frame length L. In Fig. 3b, the top encoding scheme is characterized by a set containing a single subframe of length L. The next set comprises two subframes of length L/2. The third set comprises two subframes of length L/4 followed by one subframe of length L/2.

The signal x_side provided to the side signal encoder unit 30 is encoded by all encoding schemes 81. In the top encoding scheme, the entire basic encoding frame is encoded in one piece. In the other encoding schemes, the signal x_side is encoded in each subframe separately. The result from each encoding scheme is provided to a selector 85. A fidelity measurement means 83 determines a fidelity measure for each of the encoded signals. The fidelity measure is an objective quality value, preferably a signal-to-noise ratio or a weighted signal-to-noise ratio. The fidelity measures associated with the encoding schemes are compared, and the result controls a switching means 87, which selects the encoding parameters from the encoding scheme giving the best fidelity measure as the output signal p_side of the side signal encoder unit 30.
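The closed-loop decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_codec` is a hypothetical stand-in for a real subframe coder (it keeps just the subframe mean), the frame is eight samples long, and plain unweighted SNR is used as the fidelity measure.

```python
import math

def snr_db(original, decoded):
    """Signal-to-noise ratio in dB between a signal and its decoded version."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)

def toy_codec(subframe):
    """Hypothetical subframe coder: represents the subframe by its mean only."""
    mean = sum(subframe) / len(subframe)
    return [mean] * len(subframe)

def encode_with_scheme(frame, scheme):
    """Code each subframe of the scheme separately and concatenate the results."""
    decoded, start = [], 0
    for length in scheme:
        decoded.extend(toy_codec(frame[start:start + length]))
        start += length
    return decoded

def closed_loop_select(frame, schemes):
    """Run every scheme and return (best_scheme, best_snr_db)."""
    best = None
    for scheme in schemes:
        if sum(scheme) != len(frame):
            raise ValueError("subframes must add up to the frame length")
        score = snr_db(frame, encode_with_scheme(frame, scheme))
        if best is None or score > best[1]:
            best = (scheme, score)
    return best

# The three subframe sets of Fig. 3b for a frame of L = 8 samples.
frame = [1.0, 1.0, 3.0, 3.0, 0.0, 0.0, 0.0, 2.0]
schemes = [(8,), (4, 4), (2, 2, 4)]
best_scheme, best_snr = closed_loop_select(frame, schemes)
```

For this frame the third set (two short subframes followed by a long one) follows the early level change best and is the one selected. Every scheme is actually run, which is what makes the closed-loop approach costly when many schemes are tried.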

Preferably, all possible combinations of frame lengths are tested, and the set of subframes that gives the best objective quality, e.g. signal-to-noise ratio, is selected.

In the present embodiment, the lengths of the subframes are selected according to:

$$l_{sf} = \frac{l_f}{2^n},$$

where l_sf is the subframe length, l_f is the length of the encoding frame, and n is an integer. In the present embodiment, n is chosen between 0 and 3. However, any frame lengths may be used, as long as the total length of the set is kept constant.
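To get a feeling for how many candidate sets this rule produces, the following sketch enumerates every ordered sequence of subframes of length l_f / 2^n (n from 0 to 3) that fills one encoding frame. The frame length of 160 samples is a hypothetical value, and treating every ordering as a distinct candidate is an assumption; the text only requires that the total length stay constant.

```python
def subframe_sets(frame_length, max_n=3):
    """Yield every ordered tuple of subframe lengths frame_length / 2**n
    (0 <= n <= max_n) whose lengths add up to frame_length."""
    if frame_length % 2 ** max_n != 0:
        raise ValueError("frame length must be divisible by 2**max_n")
    allowed = [frame_length // 2 ** n for n in range(max_n + 1)]

    def extend(prefix, remaining):
        if remaining == 0:
            yield tuple(prefix)
            return
        for length in allowed:
            if length <= remaining:
                yield from extend(prefix + [length], remaining - length)

    yield from extend([], frame_length)

# Hypothetical encoding frame of 160 samples: subframes of 160, 80, 40 or 20.
sets = list(subframe_sets(160))
print(len(sets))  # number of candidate sets a closed-loop search would test
```

With n limited to 0..3 there are 56 such sets, including the three shown in Fig. 3b. The count grows quickly if shorter subframes are allowed, which is why the cost of each individual encoding matters.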

Figure 3c illustrates another embodiment of a side signal encoder unit 30 according to the invention. Here, the frame length decision is an open-loop decision based on the statistics of the signal. In other words, the spectral characteristics of the side signal are used as the basis for deciding which encoding scheme to use. As before, different encoding schemes, characterized by different sets of subframes, are available. In this embodiment, however, the selector 85 is placed before the actual encoding. The input side signal x_side enters the selector 85 and a signal analysis unit 84. The result of the analysis becomes the input of a switch 86, so that only one of the encoding schemes 81 is used. The output of that encoding scheme is also the output signal p_side of the side signal encoder unit 30.
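The structure of an open-loop decision can be sketched as follows. The analysis rule here is purely hypothetical: the text does not specify what the signal analysis unit 84 computes, so a crude energy-based transient measure and an arbitrary threshold stand in for it. Only the control flow (analyse first, then run a single encoding scheme) reflects the description.

```python
def block_energies(frame, n_blocks):
    """Energy of each of n_blocks equal-length blocks of the frame."""
    size = len(frame) // n_blocks
    return [sum(x * x for x in frame[i * size:(i + 1) * size])
            for i in range(n_blocks)]

def open_loop_select(frame, threshold=4.0):
    """Choose a subframe set before any encoding is done.

    Hypothetical rule: if the energy varies strongly across the frame,
    treat it as transient and use four short subframes; otherwise use
    one long frame.
    """
    energies = block_energies(frame, 4)
    floor = 1e-12  # avoids division by zero for silent blocks
    ratio = (max(energies) + floor) / (min(energies) + floor)
    length = len(frame)
    if ratio < threshold:
        return (length,)
    return (length // 4,) * 4

steady = [1.0, -1.0] * 8                 # stationary content
onset = [0.0] * 12 + [1.0, -1.0] * 2     # silence followed by an onset
print(open_loop_select(steady), open_loop_select(onset))
```

Only the chosen scheme is then encoded, once. The difficulty noted in the text is visible even here: the quality of the result depends entirely on how well the analysis rule predicts what the encoder would have done.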

The advantage of an open-loop decision is that only one actual encoding has to be performed. The disadvantage is that the analysis of the signal characteristics may be very complicated, and that it may be difficult to predict the possible behaviour in advance so that an appropriate choice can be made in the switch 86. A lot of statistical analysis of sound has to be performed and built into the signal analysis unit 84. Any small change in the encoding schemes may turn the statistical behaviour upside down.

With closed-loop selection (Fig. 3b), encoding schemes can be exchanged without changing the rest of the unit. On the other hand, if many encoding schemes are to be investigated, the computational requirements become high.

The benefit of such variable frame length coding of the side signal is that one can choose between fine temporal resolution with coarse frequency resolution on the one hand, and coarse temporal resolution with fine frequency resolution on the other. The above embodiments preserve the stereo image in the best possible manner.

There are also some requirements on the actual encoding used in the different encoding schemes. In particular, when closed-loop selection is used, the computational resources needed to perform several more or less simultaneous encodings must be large. The more complicated the encoding process, the more computational power is needed. Furthermore, a low transmission bit rate is also preferred.

The method presented in US 5,434,948 uses a filtered version of the mono (main) signal to approximate the side or difference signal. The filter parameters are optimized and allowed to vary in time. The filter parameters are then transmitted as the encoding of the side signal. In one embodiment, a residual side signal is also transmitted. In many cases, such an approach could be used as the side signal encoding method within the scope of the present invention. It has some disadvantages, however. Since the filter order has to be high to provide an accurate side signal estimate, the quantization of the filter coefficients and of any residual side signal often requires a relatively high transmission bit rate. The estimation of the filter itself may also be problematic, especially for transient-rich music. Estimation errors give a modified side signal that is sometimes larger in magnitude than the unmodified signal, which leads to higher bit rate demands. Moreover, if a new set of filter coefficients is computed every N samples, the coefficients need to be interpolated to give a smooth transition from one set to the next, as discussed above. Interpolation of filter coefficients is a complex task, and errors in the interpolation show up as large side error signals, leading to a higher bit rate for the difference error signal encoder.

One way to avoid the need for interpolation is to update the filter coefficients on a sample-by-sample basis and rely on backward-adaptive analysis. For this to work well, the bit rate of the residual encoder has to be fairly high. It is therefore not a good alternative for low-rate stereo coding.

There are cases, quite common for music, in which the mono and difference signals are almost uncorrelated. The filter estimation then becomes very difficult, with the added risk of merely making things worse for the difference error signal encoder.

The solution according to US 5,434,948 can work well in cases where the filter coefficients vary slowly in time, e.g. in conference telephone systems. For music signals the approach does not work very well, since the filter has to change quickly to track the stereo image. This means that subframe lengths of very different magnitude have to be used, so that the number of combinations to test increases rapidly. This in turn means that the requirements for computing all possible encoding schemes become impractically high.

Therefore, in a preferred embodiment, the encoding of the side signal is based on the idea of reducing the redundancy between the mono and side signals by using a simple balance factor instead of a complex, bit-rate-consuming predictor filter. The residual of this operation is then encoded. The magnitude of this residual is relatively low, and it does not call for a very high bit rate for its transmission. The idea is very well suited to being combined with the variable frame set approach described above, since its computational complexity is low.

Using a balance factor in combination with the variable frame length approach removes the need for complex interpolation and the associated problems that interpolation may cause. Moreover, using a simple balance factor instead of a complex filter causes fewer estimation problems, since possible estimation errors in the balance factor have less impact. The preferred solution is able to reproduce both panned signals and diffuse sound fields with good quality, with limited bit rate requirements and computational resources.

Figure 4 illustrates a preferred embodiment of a stereo encoder according to the invention. This embodiment is very similar to the one shown in Fig. 2a, but the details of the side signal encoder unit 30 are revealed. The encoder 14 of this embodiment has no pre-processing unit, and the input signals are provided directly to the addition and subtraction units 34, 36. In a multiplier 33, the mono signal x_mono is multiplied by a certain balance factor g_sm. In a subtraction unit 35, the multiplied mono signal is subtracted from the side signal x_side (i.e. essentially the difference between the two channels) to produce a side residual signal. The balance factor g_sm is determined by an optimizer 37, based on the content of the mono and side signals, so as to minimize the side residual signal according to a quality criterion. The quality criterion is preferably a least-mean-square criterion. The side residual signal is encoded in a side residual encoder 39 according to any encoding procedure. Preferably, the side residual encoder 39 is a low-bit-rate transform encoder or a Codebook Excited Linear Prediction (CELP) encoder. The encoding parameters p_side representing the side signal then comprise the encoding parameters p_side_residual representing the side residual signal and the optimized balance factor 49.

In the embodiment of Fig. 4, the mono signal 42 used for synthesizing the side signal is the target signal x_mono of the mono encoder 38. As mentioned above (in connection with Fig. 2a), the local synthesis signal of the mono encoder 38 can also be used. In the latter case, the total encoder delay may increase, and the computational complexity for the side signal may increase. On the other hand, the quality may be better, since it then becomes possible to repair coding errors made in the mono encoder.

The basic encoding scheme can be described more precisely as follows. Denote the two channel signals by a and b; they may be the left and right channels of a stereo pair. The channel signals are combined into a mono signal by addition and into a side signal by subtraction. In equation form, the operations are:

$$x_{\text{mono}}(n) = 0.5\,\bigl(a(n) + b(n)\bigr)$$

$$x_{\text{side}}(n) = 0.5\,\bigl(a(n) - b(n)\bigr).$$

It is beneficial to scale the x_mono and x_side signals down by a factor of two. This implies that other ways of creating x_mono and x_side exist. One can, for instance, use:

$$x_{\text{mono}}(n) = \gamma\,a(n) + (1-\gamma)\,b(n)$$

$$x_{\text{side}}(n) = \gamma\,a(n) - (1-\gamma)\,b(n)$$

$$0 \le \gamma \le 1.0.$$
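The two downmix equations can be checked with a small sketch. The function names are hypothetical, and the inverse mapping `upmix` is not given in the text; it is derived here from the two equations (their sum is 2γ·a and their difference is 2(1−γ)·b) and therefore only works for γ strictly between 0 and 1.

```python
def downmix(a, b, gamma=0.5):
    """Form mono and side signals from channels a and b.

    gamma = 0.5 gives x_mono = 0.5 (a + b) and x_side = 0.5 (a - b).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    mono = [gamma * x + (1.0 - gamma) * y for x, y in zip(a, b)]
    side = [gamma * x - (1.0 - gamma) * y for x, y in zip(a, b)]
    return mono, side

def upmix(mono, side, gamma=0.5):
    """Recover a and b (derived inverse; requires 0 < gamma < 1)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    a = [(m + s) / (2.0 * gamma) for m, s in zip(mono, side)]
    b = [(m - s) / (2.0 * (1.0 - gamma)) for m, s in zip(mono, side)]
    return a, b

a = [1.0, 2.0, -3.0]
b = [0.5, -2.0, 1.0]
mono, side = downmix(a, b)       # mono = [0.75, 0.0, -1.0], side = [0.25, 2.0, -2.0]
a_rec, b_rec = upmix(mono, side)
```

For γ = 0.5 the inverse reduces to a = x_mono + x_side and b = x_mono − x_side, which is what the addition and subtraction units of the decoder compute.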

On blocks of the input signals, a modified, or residual, side signal is computed according to:

$$x_{\text{side residual}}(n) = x_{\text{side}}(n) - f(x_{\text{mono}}, x_{\text{side}})\,x_{\text{mono}}(n),$$

where f(x_mono, x_side) is a balance factor function that, based on a block of N samples (i.e. a subframe) of the side and mono signals, strives to remove as much as possible from the side signal. In other words, the balance factor is used to minimize the residual side signal. In the special case where it is minimized in a mean-square sense, this is equivalent to minimizing the energy of the residual side signal x_side_residual.

In this special case, f(x_mono, x_side) is given by:

$$f(x_{\text{mono}}, x_{\text{side}}) = \frac{R_{sm}}{R_{mm}}$$

$$R_{mm} = \sum_{n=\text{frame start}}^{\text{frame end}} x_{\text{mono}}(n)\,x_{\text{mono}}(n)$$

$$R_{sm} = \sum_{n=\text{frame start}}^{\text{frame end}} x_{\text{side}}(n)\,x_{\text{mono}}(n),$$

where x_side is the side signal and x_mono is the mono signal. Note that the function is based on a block starting at "frame start" and ending at "frame end".
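A minimal sketch of the mean-square balance factor for one block follows. The function names are hypothetical, and returning 0 for a silent mono block is an assumption made here to avoid a division by zero; the text does not cover that case.

```python
def balance_factor(side, mono):
    """Least-squares balance factor R_sm / R_mm for one block (subframe)."""
    r_sm = sum(s * m for s, m in zip(side, mono))
    r_mm = sum(m * m for m in mono)
    if r_mm == 0.0:
        return 0.0  # assumption: nothing can be predicted from a silent mono block
    return r_sm / r_mm

def side_residual(side, mono, g):
    """Residual side signal x_side(n) - g * x_mono(n)."""
    return [s - g * m for s, m in zip(side, mono)]

def energy(x):
    return sum(v * v for v in x)

mono = [1.0, 2.0, -1.0, 0.5]
side = [0.6, 0.9, -0.5, 0.25]

g = balance_factor(side, mono)          # 3.025 / 6.25 = 0.484
residual = side_residual(side, mono, g)

# The least-squares choice gives a lower residual energy than nearby values of g.
e_opt = energy(residual)
e_low = energy(side_residual(side, mono, g - 0.1))
e_high = energy(side_residual(side, mono, g + 0.1))
```

At the optimum the residual is orthogonal to the mono signal, so no further scaling of x_mono can reduce its energy. For a purely panned block, where x_side is an exact multiple of x_mono, the residual vanishes.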

It is possible to add weighting in the frequency domain to the computation of the balance factor. This is done by convolving the x_side and x_mono signals with the impulse response of a weighting filter. The estimation error can then be moved to a frequency range where it is less audible. This is referred to as perceptual weighting.
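The weighted computation can be sketched as below. The impulse response `h` is an arbitrary placeholder chosen for illustration, not a perceptual filter taken from the text, and the function names are hypothetical. What the sketch shows is only the mechanism: both signals are convolved with the same impulse response before the correlations are formed.

```python
def convolve(x, h):
    """Full linear convolution of x with the impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

def weighted_balance_factor(side, mono, h):
    """Balance factor computed on the weighted side and mono signals."""
    w_side = convolve(side, h)
    w_mono = convolve(mono, h)
    r_sm = sum(s * m for s, m in zip(w_side, w_mono))
    r_mm = sum(m * m for m in w_mono)
    return r_sm / r_mm if r_mm != 0.0 else 0.0

mono = [1.0, 2.0, -1.0, 0.5]
side = [0.6, 0.9, -0.5, 0.25]
h = [1.0, -0.9]  # placeholder weighting filter (assumption)

g_plain = weighted_balance_factor(side, mono, [1.0])  # no weighting
g_weighted = weighted_balance_factor(side, mono, h)
```

With `h = [1.0]` the result equals the unweighted ratio R_sm / R_mm. With any other filter the fit is biased towards the frequencies that the filter emphasizes, so the remaining error is pushed towards the frequencies it attenuates.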

A quantized version of the balance factor value given by the function f(x_mono, x_side) is transmitted to the decoder. It is preferable to account for this quantization already when the modified side signal is generated. The following expressions are then obtained:

$$x_{\text{side residual}}(n) = x_{\text{side}}(n) - g_Q\,x_{\text{mono}}(n)$$

$$g_Q = Q_g^{-1}\!\left(Q_g\!\left(\frac{R_{sm}}{R_{mm}}\right)\right).$$

Q_g(...) is a quantization function applied to the balance factor given by the function f(x_mono, x_side). The balance factor is transmitted over the transmission channel. For normal left-right panned signals, the balance factor is limited to the interval [-1.0, 1.0]. If, on the other hand, the channels are out of phase with each other, the balance factor may extend beyond these limits.
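The point of forming the residual with the quantized factor g_Q, rather than with the unquantized ratio, is that the decoder only ever sees g_Q. A sketch with a hypothetical uniform quantizer follows; the step size of 0.125 and the clipping range of ±2.0 are assumptions, since the text does not specify Q_g.

```python
STEP = 0.125   # hypothetical quantizer step size
G_MAX = 2.0    # hypothetical clipping range, wider than [-1, 1] to allow
               # for out-of-phase channels

def quantize_g(g):
    """Q_g: map a balance factor to an integer index."""
    clipped = max(-G_MAX, min(G_MAX, g))
    return round(clipped / STEP)

def dequantize_g(index):
    """Inverse of Q_g: map an index back to a balance factor value."""
    return index * STEP

def encode_block(side, mono):
    """Return (index, g_q, residual) with the residual formed using g_q."""
    r_sm = sum(s * m for s, m in zip(side, mono))
    r_mm = sum(m * m for m in mono)
    g = r_sm / r_mm if r_mm != 0.0 else 0.0
    index = quantize_g(g)
    g_q = dequantize_g(index)
    residual = [s - g_q * m for s, m in zip(side, mono)]
    return index, g_q, residual

mono = [1.0, 2.0, -1.0, 0.5]
side = [0.6, 0.9, -0.5, 0.25]
index, g_q, residual = encode_block(side, mono)

# The decoder rebuilds the side signal from the same g_q, so the
# quantization of g introduces no mismatch by itself.
rebuilt = [r + g_q * m for r, m in zip(residual, mono)]
```

Here the unquantized factor 0.484 maps to index 4, i.e. g_Q = 0.5. Because the encoder subtracts exactly what the decoder will add back, only the coding error of the residual itself remains.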

As an optional means of stabilizing the stereo image, the balance factor can be limited when the normalized cross-correlation between the mono and side signals is poor, as given by the equation below:

$$g_Q = Q_g^{-1}\!\left(Q_g\!\left(\left|\bar{R}_{sm}\right|\,\frac{R_{sm}}{R_{mm}}\right)\right),$$

where

$$\bar{R}_{sm} = \frac{R_{sm}}{\sqrt{R_{ss} \cdot R_{mm}}}$$

$$R_{sm} = \sum_{n=\text{frame start}}^{\text{frame end}} x_{\text{side}}(n)\,x_{\text{mono}}(n).$$
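The limiting rule can be sketched as follows, without the quantization step. Two things are assumptions here: the normalized cross-correlation is taken in its usual form R_sm / sqrt(R_ss · R_mm), and R_ss is taken to be the energy of the side signal over the block, by analogy with R_mm. The function name is hypothetical.

```python
import math

def limited_balance_factor(side, mono):
    """Balance factor scaled by the magnitude of the normalized cross-correlation."""
    r_sm = sum(s * m for s, m in zip(side, mono))
    r_mm = sum(m * m for m in mono)
    r_ss = sum(s * s for s in side)  # assumption: side-signal energy
    if r_mm == 0.0 or r_ss == 0.0:
        return 0.0
    normalized = r_sm / math.sqrt(r_ss * r_mm)  # lies in [-1, 1]
    return abs(normalized) * r_sm / r_mm

# Fully correlated (panned) block: the factor is left unchanged.
mono = [1.0, 2.0, -1.0, 0.5]
g_panned = limited_balance_factor([0.5 * m for m in mono], mono)

# Uncorrelated block: the factor is forced to zero.
g_diffuse = limited_balance_factor([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, 1.0, 1.0])

# Partly correlated block: the factor is reduced (unlimited value would be 1.0).
g_partial = limited_balance_factor([1.0, 1.0], [1.0, 0.0])
```

Since the magnitude of the normalized cross-correlation never exceeds one, the limited factor is never larger in magnitude than R_sm / R_mm, and it shrinks smoothly towards zero as the mono and side signals decorrelate.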

Such situations occur quite frequently in classical music or studio music with a great amount of diffuse sound, where in some cases the a and b channels may almost cancel each other out when the mono signal is created. The effect on the balance factor is that it may jump rapidly, causing a confused stereo image. The adjustment above alleviates this problem.

The filter-based approach in US 5,434,948 has similar problems, but in that case the solution is not so simple.

If E_s is the encoding function (e.g. a transform encoder) of the residual side signal, and E_m is the encoding function of the mono signal, then the decoded signals a'' and b'' at the decoder end can be described as (assuming γ = 0.5):

$$a''(n) = (1 + g_Q)\,x''_{\text{mono}}(n) + x''_{\text{side}}(n)$$

$$b''(n) = (1 - g_Q)\,x''_{\text{mono}}(n) - x''_{\text{side}}(n)$$

$$x''_{\text{side}} = E_s^{-1}\bigl(E_s(x_{\text{side residual}})\bigr)$$

$$x''_{\text{mono}} = E_m^{-1}\bigl(E_m(x_{\text{mono}})\bigr)$$
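The decoder equations can be verified with a short round trip. In this sketch the encoding functions E_s and E_m are replaced by the identity (a lossless stand-in, which is an assumption made only so that the reconstruction can be checked exactly), γ = 0.5, and the function names are hypothetical.

```python
def encode(a, b, g_q):
    """Downmix with gamma = 0.5 and form the side residual using g_q."""
    mono = [0.5 * (x + y) for x, y in zip(a, b)]
    side = [0.5 * (x - y) for x, y in zip(a, b)]
    residual = [s - g_q * m for s, m in zip(side, mono)]
    return mono, residual

def decode(mono_dec, residual_dec, g_q):
    """a'' = (1 + g_Q) x''_mono + x''_side,  b'' = (1 - g_Q) x''_mono - x''_side,
    where x''_side is the decoded side residual."""
    a = [(1.0 + g_q) * m + r for m, r in zip(mono_dec, residual_dec)]
    b = [(1.0 - g_q) * m - r for m, r in zip(mono_dec, residual_dec)]
    return a, b

a = [1.0, 0.5, -0.25, 2.0]
b = [0.5, 0.5, 0.75, -1.0]
g_q = 0.375  # any transmitted balance factor value works for the identity

mono, residual = encode(a, b, g_q)
a_dec, b_dec = decode(mono, residual, g_q)  # identity codecs: E^-1(E(x)) = x
```

The reconstruction is exact for any value of g_Q as long as encoder and decoder use the same one; the balance factor only changes how much energy is left in the residual that has to be coded.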

An important benefit of computing the balance factor for each frame is that the use of interpolation is avoided. Instead, the frame processing is performed with overlapping frames, in the general manner described above.

The encoding principle using balance factors works particularly well for music signals, where fast changes are typically needed to track the stereo image.

Multi-channel coding has recently become popular. One example is 5.1-channel surround sound in DVD movies. The channels are there arranged as front left, front centre, front right, rear left, rear right and subwoofer. Fig. 5 shows an embodiment of an encoder according to the invention that encodes the three front channels of such an arrangement by exploiting inter-channel redundancy.

Three channel signals L, C and R are provided at three inputs 16A-C, and the mono signal x_mono is created as the sum of all three signals. A centre signal encoder unit 130 is added, which receives the centre signal x_centre. In this embodiment, the mono signal 42 is the encoded and decoded mono signal x''_mono, which is multiplied by a certain balance factor g_Q in a multiplier 133. In a subtraction unit 135, the multiplied mono signal is subtracted from the centre signal x_centre to produce a centre residual signal. The balance factor g_Q is determined by an optimizer 137, based on the content of the mono and centre signals, so as to minimize the centre residual signal according to a quality criterion. The centre residual signal is encoded in a centre residual encoder 139 according to any encoding procedure. Preferably, the centre residual encoder 139 is a low-bit-rate transform encoder or a CELP encoder. The encoding parameters p_centre representing the centre signal then comprise the encoding parameters p_centre_residual representing the centre residual signal and the optimized balance factor 149. In an addition unit 235, the centre residual signal is added to the scaled mono signal, creating a modified centre signal 142 that compensates for encoding errors.

如前面的实施例中那样,侧信号xside(即左L与右R声道之间的差)被提供给侧信号编码器单元30。然而,在这里,优化器37也依赖于由中央信号编码器单元130所提供的修改后的中央信号142。因此将在减法单元35中产生侧残留信号以作为单声道信号42、修改后的中央信号142以及侧信号的最佳线性组合。As in the previous embodiments, the side signal x side (ie the difference between the left L and right R channels) is supplied to the side signal encoder unit 30 . Here, however, the optimizer 37 also relies on the modified central signal 142 provided by the central signal encoder unit 130 . The side residual signal will thus be generated in the subtraction unit 35 as an optimal linear combination of the mono signal 42, the modified central signal 142 and the side signal.

上述可变帧长度的概念可以被应用到侧信号和中央信号的任一上或者全部上。The concept of variable frame length described above can be applied to either or both of the side and center signals.

图6说明适于从图5的编码器单元接收编码的音频信号的解码器单元。所接收的信号54被分成表示主信号的编码参数pmono、表示中央信号的编码参数pcentre以及表示侧信号的编码参数pside。在解码器64中,表示主信号的编码参数pmono被用来产生主信号x”mono。在解码器160中,表示中央信号的编码参数pcentre被用于基于主信号x”mono来产生中央信号x”centre。在解码器60中,根据主信号x”mono和中央信号x”centre来解码表示侧信号的编码参数pside,从而产生侧信号x”side。FIG. 6 illustrates a decoder unit adapted to receive encoded audio signals from the encoder unit of FIG. 5. The received signal 54 is divided into encoding parameters p_mono representing the main signal, encoding parameters p_centre representing the central signal, and encoding parameters p_side representing the side signal. In decoder 64, the parameters p_mono are used to generate the main signal x″_mono. In decoder 160, the parameters p_centre are used, together with the main signal x″_mono, to generate the central signal x″_centre. In decoder 60, the parameters p_side are decoded using the main signal x″_mono and the central signal x″_centre, generating the side signal x″_side.

该过程可以在数学上表示如下:The process can be expressed mathematically as follows:

根据下式将输入信号xleft、xright以及xcentre组合为一个单声道:Combine the input signals x left , x right and x center into one mono channel according to:

$$x_{\mathrm{mono}}(n) = \alpha\, x_{\mathrm{left}}(n) + \beta\, x_{\mathrm{right}}(n) + \chi\, x_{\mathrm{centre}}(n).$$

为了简单起见,在剩余部分中将α、β以及χ设置为1.0,但是它们可以被设置为任意值。α、β以及χ的值可以是常数,或者取决于信号内容,以便强调一个或者两个声道,从而获得一个最佳质量。For simplicity, α, β, and χ are set to 1.0 in the remainder, but they can be set to arbitrary values. The values of α, β and χ can be constant or depend on the signal content in order to emphasize one or two channels and thus obtain an optimum quality.

如下计算在单声道和中央信号之间的归一化的互相关:The normalized cross-correlation between the mono and center signals is calculated as follows:

$$R = \frac{R_{cm}}{\sqrt{R_{cc} \cdot R_{mm}}},$$

其中 where

$$R_{cc} = \sum_{n=\text{frame start}}^{\text{frame end}} x_{\mathrm{centre}}(n)\, x_{\mathrm{centre}}(n)$$

$$R_{mm} = \sum_{n=\text{frame start}}^{\text{frame end}} x_{\mathrm{mono}}(n)\, x_{\mathrm{mono}}(n)$$

$$R_{cm} = \sum_{n=\text{frame start}}^{\text{frame end}} x_{\mathrm{centre}}(n)\, x_{\mathrm{mono}}(n).$$

xcentre是中央信号,以及xmono是单声道信号。单声道信号来自于单声道目标信号,但是也可能使用单声道编码器的本地合成。Here x_centre is the central signal and x_mono is the mono signal. The mono signal is taken from the mono target signal, but the local synthesis of the mono encoder may be used instead.
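As a rough illustration of the downmix and the per-frame correlation terms defined above, the following Python sketch computes them for one frame. The function names (`downmix`, `inner`, `normalized_cross_correlation`) are hypothetical, and the square-root normalization is the conventional definition of a normalized cross-correlation, assumed here rather than taken from the patent text.

```python
import math


def downmix(x_left, x_right, x_centre, alpha=1.0, beta=1.0, chi=1.0):
    """Weighted sum of the three front channels into one mono frame."""
    return [alpha * l + beta * r + chi * c
            for l, r, c in zip(x_left, x_right, x_centre)]


def inner(a, b):
    """Sum of sample-wise products over one frame (R_cc, R_mm or R_cm)."""
    return sum(x * y for x, y in zip(a, b))


def normalized_cross_correlation(x_centre, x_mono):
    """R_cm / sqrt(R_cc * R_mm); returns 0.0 for a silent frame (assumption)."""
    r_cc = inner(x_centre, x_centre)
    r_mm = inner(x_mono, x_mono)
    r_cm = inner(x_centre, x_mono)
    denom = math.sqrt(r_cc * r_mm)
    return r_cm / denom if denom > 0.0 else 0.0
```

With this definition, a value near 1 indicates that the central signal is almost a scaled copy of the mono signal within the frame, so little remains in the residual after prediction.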

要编码的中央残留信号为:The central residual signal to encode is:

$$x_{\mathrm{centre\,residual}}(n) = x_{\mathrm{centre}}(n) - g_Q\, x_{\mathrm{mono}}(n)$$

$$g_Q = Q_g^{-1}\!\left( Q_g\!\left( \frac{R_{cm}}{R_{mm}} \right) \right).$$

Qg(...)是被应用于平衡因子的量化函数。在传输信道中发送所述平衡因子。Qg(...) is the quantization function applied to the balance factor. The balance factor is sent in a transport channel.
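The balance-factor step can be sketched as follows. The uniform quantizer with step 1/16 is purely a stand-in for Q_g, which the text does not specify, and the names `quantize_gain` and `centre_residual` are hypothetical.

```python
def inner(a, b):
    """Sum of sample-wise products over one frame."""
    return sum(x * y for x, y in zip(a, b))


def quantize_gain(g, step=1.0 / 16.0):
    """Stand-in for Q_g followed by its inverse: uniform rounding (assumption)."""
    return round(g / step) * step


def centre_residual(x_centre, x_mono, step=1.0 / 16.0):
    """Return (g_Q, residual) with residual(n) = x_centre(n) - g_Q * x_mono(n)."""
    r_cm = inner(x_centre, x_mono)
    r_mm = inner(x_mono, x_mono)
    g = r_cm / r_mm if r_mm > 0.0 else 0.0  # unquantized balance factor
    g_q = quantize_gain(g, step)
    residual = [c - g_q * m for c, m in zip(x_centre, x_mono)]
    return g_q, residual
```

Computing the residual with the quantized factor, rather than the unquantized one, means the encoder subtracts exactly the prediction that the decoder will later add back.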

如果Ec是中央残留信号的编码函数(例如变换编码器),以及Em是单声道信号的编码函数,则在解码器末尾的解码信号x”centre被描述为:If E c is the encoding function of the central residual signal (e.g. a transform coder), and E m is the encoding function of the mono signal, then the decoded signal x” center at the end of the decoder is described as:

$$x''_{\mathrm{centre}}(n) = g_Q\, x''_{\mathrm{mono}}(n) + x''_{\mathrm{centre\,residual}}(n)$$

$$x''_{\mathrm{centre\,residual}} = E_c^{-1}\!\left( E_c\!\left( x_{\mathrm{centre\,residual}} \right) \right)$$

$$x''_{\mathrm{mono}} = E_m^{-1}\!\left( E_m\!\left( x_{\mathrm{mono}} \right) \right)$$

要编码的侧残留信号为:The side residual signal to be encoded is:

$$x_{\mathrm{side\,residual}}(n) = \left( x_{\mathrm{left}}(n) - x_{\mathrm{right}}(n) \right) - g_{Qsm}\, x''_{\mathrm{mono}}(n) - g_{Qsc}\, x''_{\mathrm{centre}}(n),$$

其中gQsm和gQsc是参数gsm和gsc的量化值,其最小化了表达式:where g Qsm and g Qsc are quantized values of the parameters g sm and g sc that minimize the expression:

$$\sum_{n=\text{frame start}}^{\text{frame end}} \left| \left( x_{\mathrm{left}}(n) - x_{\mathrm{right}}(n) \right) - g_{sm}\, x''_{\mathrm{mono}}(n) - g_{sc}\, x''_{\mathrm{centre}}(n) \right|^{\eta}.$$

对于误差的最小均方最小化,η例如可以等于2。gsm和gsc参数可以被共同量化或者分开量化。For a least-mean-square minimization of the error, η may for example be set to 2. The parameters g_sm and g_sc can be quantized jointly or separately.
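For the case η = 2, the minimization reduces to a two-parameter linear least-squares problem. The sketch below solves the 2x2 normal equations directly. Quantization of the gains is omitted, and the fallback to a mono-only prediction when the two predictors are linearly dependent is an assumption, not something the text prescribes. The names `side_gains` and `side_residual` are hypothetical.

```python
def inner(a, b):
    """Sum of sample-wise products over one frame."""
    return sum(x * y for x, y in zip(a, b))


def side_gains(x_side, mono_dec, centre_dec, eps=1e-12):
    """Least-squares (eta = 2) gains g_sm, g_sc for predicting the side signal."""
    a11 = inner(mono_dec, mono_dec)
    a12 = inner(mono_dec, centre_dec)
    a22 = inner(centre_dec, centre_dec)
    b1 = inner(x_side, mono_dec)
    b2 = inner(x_side, centre_dec)
    det = a11 * a22 - a12 * a12
    if abs(det) < eps:
        # Degenerate frame: predict from the mono signal only (assumption).
        return (b1 / a11 if a11 > 0.0 else 0.0), 0.0
    g_sm = (b1 * a22 - b2 * a12) / det  # Cramer's rule on the normal equations
    g_sc = (a11 * b2 - a12 * b1) / det
    return g_sm, g_sc


def side_residual(x_left, x_right, mono_dec, centre_dec, g_sm, g_sc):
    """(x_left - x_right) minus the prediction from decoded mono and centre."""
    return [(l - r) - g_sm * m - g_sc * c
            for l, r, m, c in zip(x_left, x_right, mono_dec, centre_dec)]
```

In a complete coder the two gains would be quantized before `side_residual` is evaluated, mirroring the use of g_Qsm and g_Qsc in the residual equation.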

如果Es是侧残留信号的编码函数,则解码后的声道信号x”left和x”right被给出为:If E_s is the encoding function of the side residual signal, the decoded channel signals x″_left and x″_right are given by:

$$x''_{\mathrm{left}}(n) = x''_{\mathrm{mono}}(n) - x''_{\mathrm{centre}}(n) + x''_{\mathrm{side}}(n)$$

$$x''_{\mathrm{right}}(n) = x''_{\mathrm{mono}}(n) - x''_{\mathrm{centre}}(n) - x''_{\mathrm{side}}(n)$$

$$x''_{\mathrm{side}}(n) = x''_{\mathrm{side\,residual}}(n) + g_{Qsm}\, x''_{\mathrm{mono}}(n) + g_{Qsc}\, x''_{\mathrm{centre}}(n)$$

$$x''_{\mathrm{side\,residual}} = E_s^{-1}\!\left( E_s\!\left( x_{\mathrm{side\,residual}} \right) \right).$$

最令人讨厌的可感知人工产物之一是预回声效应。在图7a-b中,所述图说明了这种人工产物。假设信号分量具有如曲线100所示的时间发展。在开始(从t0开始),在音频采样中不存在信号分量。在t1和t2之间的时间t,突然出现信号分量。当使用t2-t1的帧长度对该信号分量编码时,该信号分量的出现会被“渗透”在整个帧上,如曲线101所示。如果产生该曲线101的解码,则该信号分量在该信号分量的预期出现之前出现时间Δt,由此感知到“预回声”。One of the most annoying perceptual artifacts is the pre-echo effect, illustrated in FIGS. 7a-b. Assume that a signal component has the time evolution shown by curve 100. Initially, starting from t0, no signal component is present in the audio samples. At a time t between t1 and t2, the signal component suddenly appears. When the signal component is encoded with a frame length of t2-t1, its onset is "smeared" over the entire frame, as shown by curve 101. When curve 101 is decoded, the signal component appears a time Δt before its intended onset, and a "pre-echo" is perceived.

如果使用长的编码帧,则预回声的人工产物变得进一步增强。通过使用较短的帧,该人工产物稍微得到抑止。处理上述预回声问题的另一方法是利用以下事实,即在编码器和解码器末尾都可以利用单声道信号。这使得有可能根据该单声道信号的能量轮廓来缩放侧信号。在解码器末尾,执行相反的缩放,因而可以减轻一些预回声问题。If long coded frames are used, the artifacts of the pre-echo become further enhanced. By using shorter frames, this artifact is somewhat suppressed. Another way to deal with the above-mentioned pre-echo problem is to take advantage of the fact that a mono signal is available at both the encoder and decoder end. This makes it possible to scale the side signal according to the energy contour of the mono signal. At the end of the decoder, the inverse scaling is performed, thus mitigating some pre-echo issues.

在整个帧上计算该单声道信号的能量轮廓为:Compute the energy contour of this mono signal over the entire frame as:

$$E_c(m) = \sum_{n=m-L}^{m+L} w(n)\, x_{\mathrm{mono}}^{2}(n), \qquad \text{frame start} \le m \le \text{frame end},$$

其中w(n)是加窗函数。最简单的加窗函数是一个矩形窗,但是也许更期望其它的窗口类型,例如汉明窗。where w(n) is the windowing function. The simplest windowing function is a rectangular window, but other window types, such as Hamming windows, may be more desirable.

然后缩放侧残留信号为:The side residual signal is then scaled as:

$$\bar{x}_{\mathrm{side\,residual}}(n) = \frac{x_{\mathrm{side\,residual}}(n)}{E_c(n)}, \qquad \text{frame start} \le n \le \text{frame end}.$$

上述等式可以使用更一般的形式被写为:The above equation can be written in a more general form as:

$$\bar{x}_{\mathrm{side\,residual}}(n) = \frac{x_{\mathrm{side\,residual}}(n)}{f\!\left( E_c(n) \right)}, \qquad \text{frame start} \le n \le \text{frame end},$$

其中f(...)是单调连续函数。在解码器中,对所解码的单声道信号计算能量轮廓,并且将所述轮廓应用到解码的侧信号上:where f(...) is a monotonic continuous function. In the decoder, the energy contour is computed on the decoded mono signal and applied to the decoded side signal:

$$x''_{\mathrm{side}}(n) = x''_{\mathrm{side}}(n)\, f\!\left( E_c(n) \right), \qquad \text{frame start} \le n \le \text{frame end}.$$
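The scaling and its inverse can be sketched as below. Several details are assumptions made only to keep the example runnable: the window weights are indexed by the offset n - m, samples outside the frame are treated as zero, f defaults to the identity, and a small floor keeps the division defined on silent stretches. All function names are hypothetical.

```python
def energy_contour(x_mono, half_len, weights=None, floor=1e-9):
    """E_c(m): windowed energy of the mono signal around each sample m."""
    if weights is None:
        weights = [1.0] * (2 * half_len + 1)  # rectangular window
    contour = []
    for m in range(len(x_mono)):
        acc = 0.0
        for k in range(-half_len, half_len + 1):
            n = m + k
            if 0 <= n < len(x_mono):  # samples outside the frame count as zero
                acc += weights[k + half_len] * x_mono[n] ** 2
        contour.append(max(acc, floor))  # floor avoids division by zero
    return contour


def scale_side(side_residual, contour, f=lambda e: e):
    """Encoder side: divide the side residual by f(E_c(n))."""
    return [s / f(e) for s, e in zip(side_residual, contour)]


def unscale_side(side_scaled, contour, f=lambda e: e):
    """Decoder side: multiply the decoded side signal by f(E_c(n))."""
    return [s * f(e) for s, e in zip(side_scaled, contour)]
```

In this sketch the same contour is passed to both functions. In the scheme described above the decoder recomputes the contour from the decoded mono signal, so the two contours agree only to the extent that the mono coding is accurate.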

由于在某种程度上缩放的此能量轮廓是使用较短帧长度的替代,因此这一概念特别适于与可变帧长度的概念相结合,如上面进一步描述的。通过拥有一些应用能量轮廓缩放的编码方案、一些不应用以及一些仅在某些子帧期间应用能量轮廓缩放的编码方案,可以提供一个更灵活的编码方案的集合。在图8中说明了根据本发明的一个信号编码器单元30的实施例。在此,不同编码方案81包含了加阴影的子帧(表示应用了能量轮廓缩放的编码)和未加阴影的子帧(表示没有应用能量轮廓缩放的编码过程)。以这种方式,不仅可以获得不同长度的子帧的组合,而且可以获得具有不同编码原理的子帧的组合。在当前的说明性实例中,在不同编码方案之间应用的能量轮廓缩放不同。在更一般的情形下,可以用类似的方式将任何的编码原理与可变长度的概念相结合。Since energy contour scaling is to some extent an alternative to using shorter frame lengths, this concept is particularly well suited to being combined with the variable frame length concept described above. By having some encoding schemes that apply energy contour scaling, some that do not, and some that apply it only during certain subframes, a more flexible set of encoding schemes can be provided. An embodiment of a signal encoder unit 30 according to the invention is illustrated in FIG. 8. Here, the different encoding schemes 81 comprise shaded subframes (encoded with energy contour scaling) and unshaded subframes (encoded without energy contour scaling). In this way, combinations not only of subframes of different lengths but also of subframes with different encoding principles become available. In the present illustrative example, it is the application of energy contour scaling that differs between the encoding schemes. More generally, any encoding principle can be combined with the variable frame length concept in an analogous manner.

图8的编码方案的集合包括以不同的方式处理例如预回声人工产物的方案。在一些方案中,使用了根据能量轮廓原理具有预回声最小化的较长子帧。在其它方案中,利用了没有进行能量轮廓缩放的较短的子帧。根据信号的内容,其中的一个备选方案会更为有益。对于十分严重的预回声情形,必须使用进行能量轮廓缩放的短子帧的编码方案。The set of encoding schemes of FIG. 8 includes schemes that handle artifacts such as pre-echo in different ways. In some schemes, longer subframes with pre-echo minimization according to the principle of energy contouring are used. In other schemes, shorter subframes are utilized without energy contour scaling. Depending on the content of the signal, one of the alternatives may be more beneficial. For very severe pre-echo situations, short subframe coding schemes with energy contour scaling must be used.

所提出的解决方案可以用在全部频带中或者在一个或多个不同的子带中。子带的使用可以被施加于主信号和侧信号的二者上或者单独施加在其中一个上。优选实施例包括将侧信号分成几个频带。原因只是由于在隔离的频带中除去可能的冗余比在整个频带中除去更容易。当解码具有丰富的频谱内容时这一点特别重要。The proposed solution can be used over the full frequency band or in one or more distinct sub-bands. Sub-bands can be applied to both the main and the side signal, or to either one alone. A preferred embodiment splits the side signal into several frequency bands, simply because possible redundancy is easier to remove in an isolated band than across the whole band. This is particularly important when the coded signal has rich spectral content.

一种可能的用途是利用上述方法来编码低于预定阈值的频带。所述预定阈值优选可以为2kHz,或者甚至更优选为1kHz。对于感兴趣的频率范围的其余部分,可以利用上述方法对另一个附加频带进行编码,或者使用一个完全不同的方法。One possible use is to use the method described above to encode frequency bands below a predetermined threshold. The predetermined threshold may preferably be 2 kHz, or even more preferably 1 kHz. For the rest of the frequency range of interest, another additional frequency band can be coded using the method described above, or a completely different method can be used.

优选为低频使用上述方法的一个动机是扩散的声场通常在高频没有多少能量内容。自然原因是声音吸收通常随着频率而增加。而且,扩散声场分量在较高频率对于人类听觉系统似乎起到不太重要的作用。因此,在低频时(低于1或2kHz)采用所述解决方案是有益的,并且依赖于其它条件而在较高频率使用比特效率更高的编码方案。只在低频时应用所述方案可以大量节省比特率,因为提出的方法所必须的比特率与所需要的带宽成正比。在大多数情形下,单声道编码器可以对整个频带编码,而建议只是在频带的较低部分执行所提出的侧信号编码,如图9示意性地说明的。参考数字301指的是根据本发明的侧信号编码方案,参考数字302指的是任何其它的侧信号编码方案,以及参考数字303指的是侧信号的一个编码方案。One motivation for using the above method preferably for low frequencies is that diffuse sound fields generally have little energy content at high frequencies. The natural reason is that sound absorption generally increases with frequency. Also, diffuse sound field components seem to play a less important role for the human auditory system at higher frequencies. Therefore, it is beneficial to employ the described solution at low frequencies (below 1 or 2 kHz), and use a more bit-efficient coding scheme at higher frequencies, depending on other conditions. Applying the scheme only at low frequencies can save a lot of bit rate, since the bit rate necessary for the proposed method is directly proportional to the required bandwidth. In most cases, a mono coder can encode the entire frequency band, whereas it is proposed to perform the proposed side signal encoding only in the lower part of the frequency band, as schematically illustrated in FIG. 9 . Reference numeral 301 refers to the coding scheme of the side signal according to the invention, reference numeral 302 refers to any other coding scheme of the side signal, and reference numeral 303 refers to a coding scheme of the side signal.

也有可能对于几个不同的频带使用所提出的方法。It is also possible to use the proposed method for several different frequency bands.

在图10中,用流程图说明了根据本发明的编码方法的实施例的主要步骤。该过程开始于步骤200。在步骤210,编码从多音信号中推导出的主信号。在步骤212,提供编码方案,其包括具有不同长度和/或顺序的子帧。在步骤214利用一个至少部分地根据当前多音信号的实际信号内容而选择的编码方案来对从多音信号中推导出的侧信号进行编码。该过程结束于步骤299。In Fig. 10, the main steps of an embodiment of the encoding method according to the invention are illustrated with a flowchart. The process starts at step 200 . In step 210, the main signal derived from the multi-tone signal is encoded. At step 212, a coding scheme is provided that includes subframes of different lengths and/or order. The side signal derived from the multi-tone signal is encoded at step 214 using a coding scheme selected at least in part based on the actual signal content of the current multi-tone signal. The process ends at step 299.

在图11中,用流程图说明了根据本发明的解码方法的实施例的主要步骤。该过程始于步骤200。在步骤220,解码所接收的编码的主信号。在步骤222,提供编码方案,其包括具有不同长度和/或顺序的子帧。在步骤224中通过一个选定的编码方案对所接收的侧信号解码。在步骤226中,将所解码的主和侧信号组合为一个多音信号。所述过程结束于步骤299。In Fig. 11, the main steps of an embodiment of the decoding method according to the invention are illustrated with a flowchart. The process starts at step 200 . In step 220, the received encoded main signal is decoded. At step 222, a coding scheme is provided that includes subframes of different lengths and/or order. In step 224 the received side signal is decoded by a selected coding scheme. In step 226, the decoded main and side signals are combined into one multi-tone signal. The process ends at step 299 .

上述实施例应当被理解为本发明的一些说明性的实例。本领域的技术人员将会理解,可以对这些实施例进行各种修改、组合和变化而不同脱离本发明的范围。特别是,在其它方案中可以组合不同实施例中的不同的部分解决方案,只要其在技术上是可行的。然而,本发明的范围由所附的权利要求书加以限定。The above-described embodiments should be understood as some illustrative examples of the invention. Those skilled in the art will understand that various modifications, combinations and changes can be made to these embodiments without departing from the scope of the present invention. In particular, different partial solutions from the different exemplary embodiments can be combined in other solutions as far as this is technically possible. However, the scope of the present invention is defined by the appended claims.

参考文献references

欧洲专利0497413European Patent 0497413

美国专利5,285,498US Patent 5,285,498

美国专利5,434,948US Patent 5,434,948

由C.Faller等人在德国慕尼黑2002年5月举行的第112届AES会议上的“Binaural cue coding applied to stereo and multi-channel audio compression”。C. Faller et al., "Binaural cue coding applied to stereo and multi-channel audio compression", 112th AES Convention, Munich, Germany, May 2002.

