Showing content from https://patents.google.com/patent/CN104604256B/en below:

CN104604256B - Reflected sound rendering of object-based audio

This application claims priority to U.S. Provisional Patent Application No. 61/695,893, filed August 31, 2012, the entire contents of which are incorporated herein by reference.

Embodiments are directed to a reflected sound rendering system configured to work with a sound format and processing system that may be referred to as a "spatial audio system" or "adaptive audio system," and that is based on an audio format and rendering technology allowing enhanced audience immersion, greater artistic control, and system flexibility and scalability. An overall adaptive audio system generally comprises an audio encoding, distribution, and decoding system configured to generate one or more bitstreams containing both conventional channel-based audio elements and audio object coding elements. This combined approach provides greater coding efficiency and rendering flexibility than either the channel-based or the object-based approach taken separately. An example of an adaptive audio system that may be used in conjunction with the present embodiments is described in pending U.S. Provisional Patent Application 61/636,429, filed April 20, 2012 and entitled "System and Method for Adaptive Audio Signal Generation, Coding and Rendering," the entire contents of which are hereby incorporated by reference.

An example implementation of an adaptive audio system and associated audio format is the Atmos™ platform. Such a system incorporates a height (up/down) dimension that may be implemented as a 9.1 surround system or a similar surround sound configuration. Figure 1 illustrates the speaker placement in a present surround system (e.g., 9.1 surround) that provides height speakers for playback of height channels. The speaker configuration of the 9.1 system 100 is composed of five speakers 102 in the floor plane and four speakers 104 in the height plane. In general, these speakers may be used to produce sound that is designed to emanate more or less accurately from any position within the listening environment. Predefined speaker configurations, such as the one shown in Figure 1, naturally limit the ability to accurately represent the position of a given sound source. For example, a sound source cannot be panned further left than the left speaker itself. This applies to every speaker, and therefore forms a one-dimensional (e.g., left-right), two-dimensional (e.g., front-back), or three-dimensional (e.g., left-right, front-back, up-down) geometric shape within which the downmix is constrained. Various speaker configurations and types may be used in such a system. For example, certain enhanced audio systems may use speakers in a 9.1, 11.1, 13.1, 19.4, or other configuration. The speaker types may include full-range direct speakers, speaker arrays, surround speakers, subwoofers, tweeters, and other types of speakers.

An audio object can be considered a group of sound elements that may be perceived to emanate from one or more particular physical locations in the listening environment. Such objects can be static (i.e., stationary) or dynamic (i.e., moving). Audio objects are controlled by metadata that defines the position of the sound at a given point in time, along with other functions. When objects are played back, they are rendered according to the positional metadata using the speakers that are present, rather than necessarily being output to a predefined physical channel. A track in a session can be an audio object, and standard panning data is analogous to positional metadata. In this way, content placed on the screen can be panned in effectively the same way as channel-based content, while content placed in the surrounds can be rendered to an individual speaker if desired. While the use of audio objects provides the desired control for discrete effects, other aspects of a soundtrack may work effectively in a channel-based environment. For example, many ambient effects or reverberation actually benefit from being fed to arrays of speakers. Although these could be treated as objects with sufficient width to fill an array, it is beneficial to retain some channel-based functionality.

The adaptive audio system is configured to support "beds" in addition to audio objects, where beds are effectively channel-based sub-mixes or stems. These can be delivered for final playback (rendering) either individually or combined into a single bed, depending on the intent of the content creator. Beds can be created in different channel-based configurations, such as 5.1, 7.1, and 9.1, and in arrays that include overhead speakers, such as the one shown in Figure 1. Figure 2 illustrates the combination of channel-based and object-based data to produce an adaptive audio mix, under an embodiment. As shown in process 200, channel-based data 202, which may for example be 5.1 or 7.1 surround sound data provided in the form of pulse-code modulated (PCM) data, is combined with audio object data 204 to produce an adaptive audio mix 208. The audio object data 204 is produced by combining elements of the original channel-based data with associated metadata that specifies certain parameters pertaining to the position of the audio objects. As shown conceptually in Figure 2, the authoring tools provide the ability to create audio programs that contain a combination of speaker channel groups and object channels simultaneously. For example, an audio program could contain one or more speaker channels optionally organized into groups (or tracks, e.g., a stereo or 5.1 track), descriptive metadata for the one or more speaker channels, one or more object channels, and descriptive metadata for the one or more object channels.
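
The combination of beds and objects described above can be pictured as a simple container. The following Python sketch is illustrative only: the class names `Bed`, `AudioObject`, and `AdaptiveMix`, the keyframe layout, and the step-hold lookup in `position_at` are all assumptions made for this example and do not come from the patent or from any real bitstream format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Bed:
    """A channel-based sub-mix or stem (hypothetical structure)."""
    layout: str                   # e.g. "5.1", "7.1", "9.1"
    channels: List[List[float]]   # one list of PCM samples per speaker channel


@dataclass
class AudioObject:
    """A mono audio element plus positional metadata (hypothetical structure)."""
    samples: List[float]
    # Keyframes of (time_seconds, x, y, z), assumed sorted by time.
    positions: List[Tuple[float, float, float, float]] = field(default_factory=list)


@dataclass
class AdaptiveMix:
    """An audio program holding both channel-based beds and audio objects."""
    beds: List[Bed] = field(default_factory=list)
    objects: List[AudioObject] = field(default_factory=list)


def combine(beds: List[Bed], objects: List[AudioObject]) -> AdaptiveMix:
    """Bundle channel-based data and object data into one program."""
    return AdaptiveMix(beds=list(beds), objects=list(objects))


def position_at(obj: AudioObject, t: float) -> Optional[Tuple[float, float, float]]:
    """Return the most recent keyframe position at or before time t, if any."""
    current = None
    for time, x, y, z in obj.positions:
        if time <= t:
            current = (x, y, z)
        else:
            break
    return current
```

A renderer would query `position_at` for each object at playback time and leave the bed channels mapped to their nominal speakers; a real system would likely interpolate between keyframes rather than hold the last value.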

As a means of distributing spatial audio, the adaptive audio system effectively moves beyond simple "speaker feeds," and advanced model-based audio descriptions have been developed that allow the listener the freedom to select a playback configuration that suits their individual needs or budget and have the audio rendered specifically for their individually chosen configuration. At a high level, there are four main spatial audio description formats: (1) speaker feed, where the audio is described as signals intended for loudspeakers located at nominal speaker positions; (2) microphone feed, where the audio is described as signals captured by actual or virtual microphones in a predefined configuration (the number of microphones and their relative positions); (3) model-based description, where the audio is described in terms of a sequence of audio events at described times and positions; and (4) binaural, where the audio is described by the signals that arrive at the two ears of a listener.

The four description formats are often associated with the following common rendering technologies, where the term "rendering" means conversion to electrical signals used as speaker feeds: (1) panning, where the audio stream is converted to speaker feeds using a set of panning laws and known or assumed speaker positions (typically rendered prior to distribution); (2) Ambisonics, where the microphone signals are converted to feeds for a scalable array of loudspeakers (typically rendered after distribution); (3) wave field synthesis (WFS), where sound sources are converted to the appropriate speaker signals to synthesize a sound field (typically rendered after distribution); and (4) binaural, where the L/R binaural signals are delivered to the L/R ears, typically through headphones but also through speakers in conjunction with crosstalk cancellation.
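
To make the first of these rendering technologies concrete, the sketch below applies one widely used panning law, the constant-power (sine/cosine) law, to a two-speaker pair. The text does not say which panning laws the system uses, so this law, the function name `constant_power_pan`, and the 0-to-1 pan convention are assumptions chosen for illustration.

```python
import math


def constant_power_pan(sample: float, pan: float) -> tuple:
    """Split one sample into (left, right) speaker feeds.

    pan runs from 0.0 (fully left) to 1.0 (fully right). The sine/cosine
    gains keep left**2 + right**2 constant, so perceived loudness stays
    roughly steady as the source moves between the two speakers.
    """
    if not 0.0 <= pan <= 1.0:
        raise ValueError("pan must lie in [0, 1]")
    angle = pan * math.pi / 2
    return sample * math.cos(angle), sample * math.sin(angle)
```

Note that the pan value is clamped to the span between the two speakers, which mirrors the earlier observation that a source cannot be panned further left than the left speaker itself.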

In general, any format can be converted to another format (though this may require blind source separation or a similar technology) and rendered using any of the aforementioned technologies; however, not all transformations yield good results in practice. The speaker-feed format is the most common because it is simple and effective. The best sonic results (that is, the most accurate and reliable) are achieved by mixing and monitoring directly in the speaker feeds and then distributing them, because no processing is required between the content creator and the listener. If the playback system is known in advance, a speaker-feed description provides the highest fidelity; however, the playback system and its configuration are often not known beforehand. In contrast, the model-based description is the most adaptable because it makes no assumptions about the playback system and is therefore most easily applied to multiple rendering technologies. The model-based description can efficiently capture spatial information, but it becomes very inefficient as the number of audio sources increases.

The adaptive audio system combines the benefits of both channel-based and model-based systems, with specific benefits including high timbre quality, optimal reproduction of artistic intent when mixing and rendering using the same channel configuration, a single inventory with "downward" adaptation to the rendering configuration, relatively low impact on the system pipeline, and increased immersion via finer horizontal speaker spatial resolution and new height channels. The adaptive audio system provides several new features, including: a single inventory with downward and upward adaptation to a specific cinema rendering configuration, i.e., delayed rendering and optimal use of the available speakers in a playback environment; increased envelopment, including optimized downmixing to avoid inter-channel correlation (ICC) artifacts; increased spatial resolution via steer-thru arrays (e.g., allowing an audio object to be dynamically assigned to one or more loudspeakers within a surround array); and increased front channel resolution via a high-resolution center or similar speaker configuration.

The spatial effects of audio signals are critical in providing an immersive experience for the listener. Sounds that are meant to emanate from a specific region of a viewing screen or listening environment should be played through speakers located at that same relative location. Thus, the primary audio metadatum of a sound event in a model-based description is position, though other parameters such as size, orientation, velocity, and acoustic dispersion can also be described. To convey position, a model-based 3D audio spatial description requires a 3D coordinate system. The coordinate system used for transmission (Euclidean, spherical, or cylindrical) is generally chosen for convenience or compactness; however, other coordinate systems may be used for the rendering processing. In addition to a coordinate system, a frame of reference is required to represent the locations of objects in space. Selecting the proper frame of reference is critical for a system to accurately reproduce position-based sound in a variety of different environments. With an allocentric frame of reference, an audio source position is defined relative to features within the rendering environment, such as room walls and corners, standard speaker locations, and the screen location. In an egocentric frame of reference, locations are represented with respect to the perspective of the listener, such as "in front of me" or "slightly to the left." Scientific studies of spatial perception (audio and otherwise) have shown that the egocentric perspective is used almost universally. For cinema, however, the allocentric frame of reference is generally more appropriate. For example, the precise location of an audio object is most important when there is an associated object on screen. With an allocentric reference, the sound will localize at the same relative position on the screen for every listening position and for any screen size, e.g., "one third of the way left of the middle of the screen." Another reason is that mixers tend to think and mix in allocentric terms, and panning tools are laid out with an allocentric frame (i.e., the room walls); mixers expect sounds to be rendered that way, e.g., "this sound should be on screen," "this sound should be off screen," or "from the left wall."
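
The difference between the two frames of reference can be shown with a small conversion. The sketch below assumes a hypothetical allocentric convention (room-normalized coordinates with x from left wall to right wall, y from the screen wall to the back wall, z from floor to ceiling) and a listener who faces the screen; none of these conventions, nor the function name, are specified in the text.

```python
import math


def allocentric_to_egocentric(pos, room_size, listener):
    """Convert a room-relative position to listener-relative angles.

    pos:       normalized (x, y, z) in [0, 1], relative to the room (assumed convention)
    room_size: (width, depth, height) of the room in meters
    listener:  (x, y, z) of the listener in meters, facing the screen at y = 0
    Returns (azimuth_deg, elevation_deg, distance_m); positive azimuth is
    to the listener's right, positive elevation is above ear height.
    """
    # Scale the normalized room position to meters.
    px, py, pz = (p * s for p, s in zip(pos, room_size))
    lx, ly, lz = listener
    dx = px - lx   # positive: to the listener's right
    dy = ly - py   # positive: in front of the listener (toward the screen)
    dz = pz - lz   # positive: above the listener
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.degrees(math.atan2(dx, dy))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation, distance
```

For a 6 m by 8 m by 3 m room, a source at screen center is straight ahead for a listener on the center line but roughly 27 degrees to the right for a listener seated 2 m to the left. The allocentric description stays the same for both seats, which is why it suits cinema mixing, while the egocentric description changes with each seat.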

Despite the use of the allocentric frame of reference in the cinema environment, there are some cases where an egocentric frame of reference may be useful and more appropriate. These include non-diegetic sounds, i.e., those that are not present in the "story space," such as mood music, for which an egocentrically uniform presentation may be desirable. Another case is near-field effects (e.g., a buzzing mosquito in the listener's left ear) that require an egocentric representation. In addition, infinitely distant sound sources (and the resulting plane waves) may appear to come from a constant egocentric position (e.g., 30 degrees to the left), and such sounds are easier to describe in egocentric terms than in allocentric terms. In some cases, it is possible to use an allocentric frame of reference as long as a nominal listening position is defined, while some examples require an egocentric representation that is not yet possible to render. Although an allocentric reference may be more useful and appropriate, the audio representation should be extensible, since many new features, including egocentric representation, may be more desirable in certain applications and listening environments.

Embodiments of the adaptive audio system include a hybrid spatial description approach that combines a recommended channel configuration, used for optimal fidelity and for rendering diffuse or complex multi-point sources (e.g., a stadium crowd or ambience) with an egocentric reference, with an allocentric, model-based sound description that efficiently enables increased spatial resolution and scalability. Figure 3 is a block diagram of a playback architecture for use in an adaptive audio system, under an embodiment. The system of Figure 3 includes processing blocks that perform legacy, object, and channel audio decoding, object rendering, channel remapping, and signal processing before the audio is sent to the post-processing and/or amplification and speaker stages.

The playback system 300 is configured to render and play back audio content that is generated through one or more capture, pre-processing, authoring, and coding components. An adaptive audio pre-processor may include source separation and content type detection functionality that automatically generates appropriate metadata through analysis of the input audio. For example, positional metadata may be derived from a multi-channel recording through an analysis of the relative levels of correlated input between channel pairs. Detection of content type, such as "speech" or "music," may be achieved, for example, by feature extraction and classification. Certain authoring tools allow the authoring of audio programs by optimizing the input and codification of the sound engineer's creative intent, allowing the engineer to create, once, a final audio mix that is optimized for playback in practically any playback environment. This can be accomplished through the use of audio objects and positional data that is associated and encoded with the original audio content. In order to accurately place sounds around an auditorium, the sound engineer needs control over how the sound will ultimately be rendered based on the actual constraints and features of the playback environment. The adaptive audio system provides this control by allowing the sound engineer to change how the audio content is designed and mixed through the use of audio objects and positional data. Once the adaptive audio content has been authored and coded in the appropriate codec devices, it is decoded and rendered in the various components of playback system 300.
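
The idea of deriving positional metadata from the relative levels of a channel pair can be sketched as follows. This is a deliberately simplified illustration, not the pre-processor's actual method: it assumes the source was placed with a constant-power pan law, compares only RMS levels, and skips the correlation analysis that the text mentions. The function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pan(left, right):
    """Estimate a pan position in [0, 1] from a left/right channel pair.

    0.0 means fully left, 1.0 fully right. Returns None for silence,
    where the relative level carries no positional information.
    """
    l, r = rms(left), rms(right)
    if l == 0.0 and r == 0.0:
        return None
    # Inverse of a constant-power pan law: gains are cos(angle), sin(angle).
    return math.atan2(r, l) / (math.pi / 2)
```

A practical analysis would also need to confirm that the two channels actually carry the same (correlated) source before trusting the level ratio.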

As shown in Figure 3, (1) legacy surround sound audio 302, (2) object audio 304 including object metadata, and (3) channel audio 306 including channel metadata are input to decoder states 308, 309 within processing block 310. The object metadata is rendered in object renderer 312, while the channel metadata may be remapped as necessary. Listening environment configuration information 307 is provided to the object renderer and the channel remapping component. The hybrid audio data is then processed through one or more signal processing stages, such as equalizers and limiters 314, before being output to the B-chain processing stage 316 and played back through speakers 318. System 300 represents one example of a playback system for adaptive audio; other configurations, components, and interconnections are also possible.
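
The order of operations in this playback chain (render objects, combine with channel audio, then apply signal processing) can be sketched in a few lines. The functions below are hypothetical stand-ins rather than the components of Figure 3: the per-speaker gains are supplied by hand instead of being computed from object metadata and listening environment configuration, and the limiter is a plain hard clip, whereas practical limiters apply smoothed gain reduction.

```python
def render_object(samples, gains):
    """Spread a mono object over the speakers using one gain per speaker."""
    return [[g * s for s in samples] for g in gains]


def mix_feeds(*feeds):
    """Sum several sets of speaker feeds, speaker by speaker and sample by sample."""
    n_speakers = len(feeds[0])
    n_samples = len(feeds[0][0])
    return [
        [sum(f[ch][i] for f in feeds) for i in range(n_samples)]
        for ch in range(n_speakers)
    ]


def limit(feeds, ceiling=1.0):
    """Hard-clip every speaker feed to the range [-ceiling, ceiling]."""
    return [[max(-ceiling, min(ceiling, s)) for s in ch] for ch in feeds]


# Two-speaker example: a channel bed plus one rendered object, then limiting.
bed = [[0.5, 0.5], [0.0, 0.0]]
obj_feeds = render_object([0.8, -0.2], gains=[1.0, 0.5])
speaker_feeds = limit(mix_feeds(bed, obj_feeds))
```

In this example the first sample of the first speaker sums to 1.3 and is clipped to the ceiling of 1.0, while the remaining samples pass through unchanged.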

The system of Figure 3 illustrates an embodiment in which the renderer comprises a component that applies object metadata to the input audio channels in order to process object-based audio content together with optional channel-based audio content. Embodiments may also be directed to the case in which the input audio channels comprise only legacy channel-based content and the renderer comprises a component that generates speaker feeds for transmission to an array of drivers in a surround sound configuration. In this case, the input is not necessarily object-based content but legacy 5.1 or 7.1 (or other non-object-based) content, such as that provided in Dolby Digital, Dolby Digital Plus, or similar systems.

Playback Application

As described above, the initial implementation of the adaptive audio format and system is in the digital cinema (D-cinema) context, which includes content capture (objects and channels) that is authored using novel authoring tools, packaged using an adaptive audio cinema encoder, and distributed using PCM or a proprietary lossless codec through the existing Digital Cinema Initiative (DCI) distribution mechanism. In this case, the audio content is intended to be decoded and rendered in a digital cinema to create an immersive spatial audio cinema experience. However, as with previous cinema improvements such as analog surround sound and digital multi-channel audio, there is a pressing need to deliver the enhanced user experience provided by the adaptive audio format directly to users in their homes. This requires that certain characteristics of the format and system be adapted for use in more limited listening environments. For example, homes, rooms, small auditoriums, and similar venues may have reduced space, acoustic properties, and equipment capabilities compared to a cinema or theater environment. For purposes of description, the term "consumer-based environment" is intended to include any non-cinema environment that comprises a listening environment for use by regular consumers or professionals, such as a house, studio, room, console area, or auditorium. The audio content may be sourced and rendered alone, or it may be associated with graphical content, e.g., still pictures, light displays, or video.

Figure 4A is a block diagram that illustrates the functional components for adapting cinema-based audio content for use in a listening environment, under an embodiment. As shown in Figure 4A, cinema content, typically comprising a motion picture soundtrack, is captured and/or authored using appropriate equipment and tools in block 402. In an adaptive audio system, this content is processed through encoding/decoding and rendering components and interfaces in block 404. The resulting object and channel audio feeds are then sent to the appropriate speakers in the cinema or theater 406. In system 400, the cinema content is also processed for playback in a listening environment 416, such as a home theater system. It is presumed that the listening environment is not as comprehensive, or as capable of reproducing all of the sound content, as the content creator intended, owing to limited space, a reduced speaker count, and so on. Embodiments are nevertheless directed to systems and methods that allow the original audio content to be rendered in a manner that minimizes the restrictions imposed by the reduced capacity of the listening environment, and that allow the positional cues to be processed in a way that makes maximum use of the available equipment. As shown in Figure 4A, the cinema audio content is processed through a cinema-to-consumer translator component 408 and then in the consumer content coding and rendering chain 414. This chain also processes original audio content that is captured and/or authored in block 412. The original content and/or the translated cinema content is then played back in the listening environment 416. In this manner, the relevant spatial information that is coded in the audio content can be used to render the sound in a more immersive manner, even with the possibly limited speaker configuration of the home or listening environment 416.

Figure 4B illustrates the components of Figure 4A in greater detail. Figure 4B shows an example distribution mechanism for adaptive audio cinema content throughout an audio playback ecosystem. As shown in diagram 420, original cinema and TV content is captured 422 and authored 423 for playback in a variety of different environments to provide a cinema experience 427 or consumer environment experiences 434. Likewise, certain user-generated content (UGC) or consumer content is captured 423 and authored 425 for playback in the listening environment 434. Cinema content for playback in the cinema environment 427 is processed through known cinema processes 426. In system 420, however, the output of the cinema authoring tools box 423 also consists of audio objects, audio channels, and metadata that convey the artistic intent of the sound mixer. This can be thought of as a mezzanine-style audio package that can be used to create multiple versions of the cinema content for playback. In an embodiment, this functionality is provided by a cinema-to-consumer adaptive audio translator 430. This translator takes the adaptive audio content as input and distills from it the audio and metadata content appropriate for the desired consumer endpoints 434. The translator creates separate, and possibly different, audio and metadata outputs depending on the distribution mechanism and endpoint.

As shown in the example of system 420, the cinema-to-consumer translator 430 feeds sound for picture (broadcast, disc, OTT, etc.) and game audio bitstream creation modules 428. These two modules, which are appropriate for delivering cinema content, can feed multiple distribution pipelines 432, all of which may deliver to the consumer endpoints. For example, adaptive audio cinema content may be encoded using a codec suitable for broadcast purposes, such as Dolby Digital Plus, which may be modified to convey channels, objects, and associated metadata; the content is then transmitted through the broadcast chain via cable or satellite, and decoded and rendered in the home for home theater or television playback. Similarly, the same content could be encoded using a codec suitable for online distribution where bandwidth is limited, transmitted over a 3G or 4G mobile network, and then decoded and rendered for playback via a mobile device using headphones. Other content sources such as TV, live broadcast, games, and music may also use the adaptive audio format to create and provide content for a next-generation audio format.

The system of Figure 4B provides an enhanced user experience throughout the entire consumer audio ecosystem, which may include home theater (A/V receiver, soundbar, and Blu-ray), E-media (PC, tablet, and mobile devices including headphone playback), broadcast (TV and set-top box), music, gaming, live sound, user-generated content ("UGC"), and so on. Such a system provides: enhanced immersion for the audience on all endpoint devices, expanded artistic control for audio content creators, improved content-dependent (descriptive) metadata for improved rendering, expanded flexibility and scalability for playback systems, timbre preservation and matching, and the opportunity for dynamic rendering of content based on user position and interaction. The system includes several components, including new mixing tools for content creators, updated and new packaging and coding tools for distribution and playback, in-home dynamic mixing and rendering (appropriate for different configurations), and additional speaker locations and designs.

The adaptive audio ecosystem is configured to be a fully comprehensive, end-to-end, next-generation audio system using the adaptive audio format, encompassing content creation, packaging, distribution, and playback/rendering across a wide range of endpoint devices and use cases. As shown in Figure 4B, the system originates with content captured from and for a number of different use cases, 422 and 424. These capture points include all relevant content formats, including cinema, TV, live broadcast (and sound), UGC, games, and music. As the content passes through the ecosystem, it goes through several key phases: pre-processing and authoring tools; translation tools (i.e., translation of adaptive audio content for cinema into consumer content distribution applications); specific adaptive audio packaging/bitstream encoding, which captures the audio essence data as well as additional metadata and audio reproduction information; distribution encoding using existing or new codecs (e.g., DD+, TrueHD, Dolby Pulse) for efficient distribution through various audio channels; transmission through the relevant distribution channels (broadcast, disc, mobile, Internet, etc.); and finally endpoint-aware dynamic rendering, which reproduces and conveys the adaptive audio user experience defined by the content creator and so delivers the benefits of the spatial audio experience. The adaptive audio system can be used during rendering for a widely varying range of consumer endpoints, and the rendering technique that is applied can be optimized depending on the endpoint device. For example, home theater systems and soundbars may have 2, 3, 5, 7, or even 9 separate speakers in various locations. Many other types of systems have only two speakers (TVs, laptops, music docks), and nearly all commonly used devices have a headphone output (PCs, laptops, tablets, cell phones, music players, etc.).

Current authoring and distribution systems for surround-sound audio create and deliver audio intended for reproduction at predefined and fixed speaker locations, with limited knowledge of the type of content conveyed in the audio essence (i.e., the actual audio that is played back by the reproduction system). The adaptive audio system, however, provides a new hybrid approach to audio creation that includes the option for both fixed speaker location-specific audio (left channel, right channel, etc.) and object-based audio elements that have generalized 3D spatial information, including position, size, and velocity. This hybrid approach provides a balance between fidelity (provided by fixed speaker locations) and flexibility in rendering (generalized audio objects). The system also provides additional useful information about the audio content through new metadata that the content creator pairs with the audio essence at the time of content creation/authoring. This information provides detailed information about the attributes of the audio that can be used during rendering. Such attributes may include content type (dialog, music, effects, Foley, background/ambience, etc.), audio object information such as spatial attributes (3D position, object size, velocity, etc.), and useful rendering information (alignment to speaker location, channel weights, gain, bass management information, etc.). The audio content and reproduction intent metadata can either be created manually by the content creator or created through automatic media intelligence algorithms that run in the background during the authoring process and can be reviewed by the content creator during a final quality control phase if desired.
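
The metadata attributes listed above could be represented in software as a simple per-object record. The following Python sketch is illustrative only: the class name `AudioObjectMetadata`, its field names, and the `ContentType` values are assumptions made for this example and are not taken from the adaptive audio bitstream format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Tuple


class ContentType(Enum):
    """Content types named in the description (labels are hypothetical)."""
    DIALOG = "dialog"
    MUSIC = "music"
    EFFECTS = "effects"
    FOLEY = "foley"
    AMBIENCE = "ambience"


@dataclass
class AudioObjectMetadata:
    """Hypothetical record pairing one audio object with its descriptive metadata."""
    content_type: ContentType
    position: Tuple[float, float, float]                     # 3D position (x, y, z)
    size: float = 0.0                                        # 0.0 means a point source
    velocity: Tuple[float, float, float] = (0.0, 0.0, 0.0)   # change in position per second
    gain_db: float = 0.0                                     # rendering gain
    snap_to_speaker: bool = False                            # align to a speaker location
    channel_weights: Dict[str, float] = field(default_factory=dict)

    def is_elevated(self, ear_height: float = 0.0) -> bool:
        """True when the object sits above the listener's horizontal plane."""
        return self.position[2] > ear_height


# Example: an effect placed above the listener.
flyover = AudioObjectMetadata(ContentType.EFFECTS, position=(0.5, 0.5, 1.0))
```

A renderer could use a check such as `is_elevated` to decide which objects carry height information, although the actual decision logic is not specified in this description.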

Figure 4C is a block diagram of the functional components of an adaptive audio environment, under an embodiment. As shown in diagram 450, the system processes an encoded bitstream 452 that carries a hybrid object-based and channel-based audio stream. The bitstream is processed by rendering/signal processing block 454. In an embodiment, at least part of this functional block may be implemented in the rendering block 312 shown in Figure 3. The rendering function 454 implements various rendering algorithms for adaptive audio, as well as certain post-processing algorithms such as upmixing, processing of direct versus reflected sound, and the like. Output from the renderer is provided to the speakers 458 through bidirectional interconnect 456. In an embodiment, the speakers 458 comprise a number of individual drivers that may be arranged in a surround-sound or similar configuration. The drivers are individually addressable and may be embodied in individual enclosures or in multi-driver cabinets or arrays. The system 450 may also include a microphone 460 that provides measurements of listening environment or room characteristics that can be used to calibrate the rendering process. System configuration and calibration functions are provided in block 462. These functions may be included as part of the rendering component, or they may be implemented as separate components that are functionally coupled to the renderer. The bidirectional interconnect 456 provides the feedback signal path from the speakers in the listening environment back to the calibration component 462.

Listening Environment

Implementations of the adaptive audio system can be deployed in a variety of different listening environments. These include three primary areas of audio playback applications: home theater systems, televisions and soundbars, and headphones. Figure 5 illustrates the deployment of an adaptive audio system in an example home theater environment. The system of Figure 5 illustrates a superset of the components and functions that may be provided by an adaptive audio system, and certain aspects may be reduced or removed based on the user's needs while still providing an enhanced experience. The system 500 includes various different speakers and drivers in a variety of different cabinets or arrays 504. The speakers include individual drivers that provide front, side, and upward-firing options, as well as dynamic virtualization of audio using certain audio processing techniques. Diagram 500 illustrates a number of speakers deployed in a standard 9.1 speaker configuration. These include left and right height speakers (LH, RH), left and right speakers (L, R), a center speaker (shown as a modified center speaker), and left and right surround and back speakers (LS, RS, LB, and RB; the low-frequency element LFE is not shown).

Figure 5 illustrates the use of a center channel speaker 510 in a central location of the listening environment. In an embodiment, this speaker is implemented using a modified center channel or high-resolution center channel 510. Such a speaker may be a front-firing center channel array with individually addressable speakers that allow discrete panning of audio objects through the array to match the movement of video objects on the screen. It may be embodied as a high-resolution center channel (HRC) speaker, such as that described in International Application No. PCT/US2011/028783, which is hereby incorporated by reference in its entirety. The HRC speaker 510 may also include side-firing speakers, as shown. These can be activated and used if the HRC speaker is used not only as a center speaker but also as a speaker with soundbar capabilities. The HRC speaker may also be incorporated above and/or to the sides of the screen 502 to provide a two-dimensional, high-resolution panning option for audio objects. The center speaker 510 could also include additional drivers and implement a steerable sound beam with separately controlled sound zones.

The system 500 also includes a near-field effect (NFE) speaker 512 that may be located directly in front of, or close in front of, the listener, such as on a table in front of a seating location. With adaptive audio, it is possible to bring audio objects into the room rather than having them locked to the perimeter of the room. Having objects traverse the three-dimensional space is therefore an option. For example, an object may originate in the L speaker, travel through the listening environment via the NFE speaker, and terminate in the RS speaker. Various different speakers may be suitable for use as an NFE speaker, such as a wireless, battery-powered speaker.

Figure 5 illustrates the use of dynamic speaker virtualization to provide an immersive user experience in the home theater environment. Dynamic speaker virtualization is enabled through dynamic control of the speaker virtualization algorithm parameters based on the object spatial information provided by the adaptive audio content. This dynamic virtualization is shown in Figure 5 for the L and R speakers, where it is natural to consider it for creating the perception of objects moving along the sides of the listening environment. A separate virtualizer may be used for each relevant object, and the combined signal can be sent to the L and R speakers to create a multiple-object virtualization effect. The dynamic virtualization effects are shown for the L and R speakers as well as the NFE speaker, which is intended to be a stereo speaker (with two independent inputs). This speaker, along with audio object size and position information, could be used to create either a diffuse or a point-source near-field audio experience. Similar virtualization effects can also be applied to any or all of the other speakers in the system. In an embodiment, a camera may provide additional listener position and identity information that the adaptive audio renderer can use to provide a more compelling experience that is truer to the artistic intent of the mixer.

The adaptive audio renderer understands the spatial relationship between the mix and the playback system. In some instances of a playback environment, discrete speakers may be available in all relevant areas of the listening environment, including overhead positions, as shown in Figure 1. In these cases where discrete speakers are available at certain locations, the renderer can be configured to "snap" objects to the nearest speaker instead of creating a phantom image between two or more speakers through panning or speaker virtualization algorithms. While this slightly distorts the spatial representation of the mix, it also allows the renderer to avoid unintended phantom images. For example, if the angular position of the mixing stage's left speaker does not correspond to the angular position of the playback system's left speaker, enabling this function would avoid having a constant phantom image of the initial left channel.
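
The snap behavior could be approximated by a nearest-neighbor search over the known speaker positions. The Python sketch below is a minimal illustration under stated assumptions: the function name `snap_to_nearest_speaker`, the `max_distance` threshold, and the Cartesian coordinate convention are hypothetical and are not defined by this description.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def snap_to_nearest_speaker(
    object_pos: Vec3,
    speakers: Dict[str, Vec3],
    max_distance: float,
) -> Optional[str]:
    """Return the name of the closest speaker, or None if none is close enough.

    A None result means the caller should fall back to panning or
    virtualization between speakers instead of snapping.
    """
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, speaker_pos in speakers.items():
        dist = math.dist(object_pos, speaker_pos)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


# Hypothetical layout: two floor speakers and one height speaker (metres).
layout = {"L": (-1.0, 1.0, 0.0), "R": (1.0, 1.0, 0.0), "LH": (-1.0, 1.0, 1.0)}
target = snap_to_nearest_speaker((-0.9, 1.0, 0.1), layout, max_distance=0.3)
```

Here `target` is `"L"`, because the object lies about 0.14 m from the left speaker. An object midway between L and R would exceed the 0.3 m threshold and return `None`, leaving the renderer to create a phantom image instead.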

In many cases, however, and especially in a home environment, certain speakers, such as ceiling-mounted overhead speakers, are not available. In this case, certain virtualization techniques are implemented by the renderer to reproduce overhead audio content through existing floor-mounted or wall-mounted speakers. In an embodiment, the adaptive audio system includes a modification to the standard configuration through the inclusion of both a front-firing capability and a top (or "upward") firing capability for each speaker. In traditional home applications, speaker manufacturers have attempted to introduce new driver configurations other than front-firing transducers and have been confronted with the problem of identifying which of the original audio signals (or modifications to them) should be sent to these new drivers. With the adaptive audio system, there is very specific information about which audio objects should be rendered above the standard horizontal plane. In an embodiment, height information present in the adaptive audio system is rendered using the upward-firing drivers. Likewise, side-firing speakers can be used to render certain other content, such as ambience effects.

One advantage of upward-firing drivers is that they can be used to reflect sound off a hard ceiling surface to simulate the presence of overhead/height speakers positioned in the ceiling. A compelling attribute of adaptive audio content is that spatially diverse audio is reproduced using an array of overhead speakers. As stated above, however, in many cases installing overhead speakers is too expensive or impractical in a home environment. By simulating height speakers using normally positioned speakers in the horizontal plane, a compelling 3D experience can be created with speakers that are easy to position. In this case, the adaptive audio system uses the upward-firing/height-simulating drivers in a new way, in that audio objects and their spatial reproduction information are used to create the audio being reproduced by the upward-firing drivers.

Figure 6 illustrates the use of an upward-firing driver that uses reflected sound to simulate a single overhead speaker in a home theater. It should be noted that any number of upward-firing drivers could be used in combination to create multiple simulated height speakers. Alternatively, a number of upward-firing drivers may be configured to transmit sound to substantially the same spot on the ceiling to achieve a certain sound intensity or effect. Diagram 600 illustrates an example in which the usual listening position 602 is located at a particular place within a listening environment. The system does not include any height speakers for transmitting audio content containing height cues. Instead, the speaker cabinet or speaker array 604 includes an upward-firing driver along with the front-firing driver(s). The upward-firing driver is configured (with respect to location and inclination angle) to send its sound wave 606 up to a particular point on the ceiling 608, where it is reflected back down to the listening position 602. It is assumed that the ceiling is made of an appropriate material and composition to adequately reflect sound down into the listening environment. The relevant characteristics of the upward-firing driver (e.g., size, power, location, etc.) may be selected based on the ceiling composition, room size, and other relevant characteristics of the listening environment. Although only one upward-firing driver is shown in Figure 6, multiple upward-firing drivers may be incorporated into a reproduction system in some embodiments.
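
The geometry of a single ceiling bounce can be estimated with the image-source (mirror) construction from basic acoustics. The Python sketch below assumes a flat, perfectly specular ceiling and treats the driver and the listener's ear as points; the function name and parameters are hypothetical, and the description itself does not prescribe any particular formula.

```python
import math
from typing import Tuple


def ceiling_reflection(
    driver_height: float,
    ear_height: float,
    ceiling_height: float,
    distance: float,
) -> Tuple[float, float, float]:
    """Return (tilt_deg, bounce_x, path_length) for one ceiling bounce.

    tilt_deg    : driver inclination above the horizontal, in degrees
    bounce_x    : horizontal distance from the driver to the ceiling point
    path_length : total length of the reflected path, same unit as inputs
    """
    if driver_height >= ceiling_height or ear_height >= ceiling_height:
        raise ValueError("driver and listener must be below the ceiling")
    if distance <= 0:
        raise ValueError("horizontal distance must be positive")

    rise = ceiling_height - driver_height   # driver up to ceiling
    drop = ceiling_height - ear_height      # ceiling down to ear
    vertical = rise + drop                  # height of the mirrored listener

    tilt_deg = math.degrees(math.atan2(vertical, distance))
    bounce_x = distance * rise / vertical   # similar triangles
    path_length = math.hypot(distance, vertical)
    return tilt_deg, bounce_x, path_length


# Driver and ear both 1 m high, 2 m ceiling, listener 2 m away.
tilt, bounce, path = ceiling_reflection(1.0, 1.0, 2.0, 2.0)
```

For this symmetric case the driver tilts 45 degrees and the sound strikes the ceiling halfway between speaker and listener. Real ceilings scatter as well as reflect, so a result like this is only a starting point for placement.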

In an embodiment, the adaptive audio system uses upward-firing drivers to provide the height element. In general, it has been shown that incorporating signal processing that introduces perceptual height cues into the audio signal fed to the upward-firing drivers improves the positioning and perceived quality of the virtual height signal. For example, a parametric perceptual binaural hearing model has been developed to create a height cue filter which, when used to process audio being reproduced by an upward-firing driver, improves the perceived quality of the reproduction. In an embodiment, the height cue filter is derived from both the physical speaker location (approximately level with the listener) and the reflected speaker location (above the listener). For the physical speaker location, a directional filter is determined based on a model of the outer ear (or pinna). An inverse of this filter is next determined and used to remove the height cues from the physical speaker. Next, for the reflected speaker location, a second directional filter is determined using the same model of the outer ear. This filter is applied directly, essentially reproducing the cues the ear would receive if the sound were above the listener. In practice, these filters may be combined in a way that allows a single filter to both (1) remove the height cue from the physical speaker location and (2) insert the height cue from the reflected speaker location. Figure 16 is a graph that illustrates the frequency response of such a combined filter. The combined filter may be used in a fashion that allows some adjustability with respect to the aggressiveness or amount of filtering that is applied. For example, in some cases it may be beneficial not to fully remove the physical speaker height cue, or not to fully apply the reflected speaker height cue, since only some of the sound from the physical speaker arrives directly at the listener (with the remainder being reflected off the ceiling).
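
One way to picture the combined filter is in the magnitude (dB) domain, where applying the inverse of the physical-location filter is a subtraction and applying the reflected-location filter is an addition. The Python sketch below assumes both directional responses are already available as per-band dB values from some pinna model; the function name, the `amount` parameter, and the sample values are hypothetical, and the description does not state this formula.

```python
from typing import List, Sequence


def combined_height_filter_db(
    physical_db: Sequence[float],
    reflected_db: Sequence[float],
    amount: float = 1.0,
) -> List[float]:
    """Combine cue removal and cue insertion into one per-band gain curve.

    physical_db  : directional response for the physical speaker location
    reflected_db : directional response for the reflected (overhead) location
    amount       : 0.0 applies no filtering, 1.0 applies the full correction
    """
    if len(physical_db) != len(reflected_db):
        raise ValueError("both responses must use the same frequency bands")
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0.0 and 1.0")
    # Subtracting removes the physical cue; adding inserts the overhead cue.
    return [amount * (r - p) for p, r in zip(physical_db, reflected_db)]


# Placeholder three-band responses (illustrative numbers, not measured data).
full = combined_height_filter_db([0.0, 2.0, -3.0], [1.0, -1.0, 4.0])
half = combined_height_filter_db([0.0, 2.0, -3.0], [1.0, -1.0, 4.0], amount=0.5)
```

Scaling by `amount` is one simple reading of the adjustability mentioned above; an implementation could equally weight the removal and insertion terms separately.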

Speaker Configuration

A main consideration of the adaptive audio system is the speaker configuration. The system uses individually addressable drivers, and an array of such drivers is configured to provide a combination of both direct and reflected sound sources. A bidirectional link to the system controller (e.g., A/V receiver, set-top box) allows audio and configuration data to be sent to the speaker, and speaker and sensor information to be sent back to the controller, creating an active, closed-loop system.

For purposes of description, the term "driver" means a single electroacoustic transducer that produces sound in response to an electrical audio input signal. A driver may be implemented in any appropriate type, geometry, and size, and may include horns, cones, ribbon transducers, and the like. The term "speaker" means one or more drivers in a unitary enclosure. Figure 7A illustrates a speaker having a plurality of drivers in a first configuration, under an embodiment. As shown in Figure 7A, a speaker enclosure 700 has a number of individual drivers mounted within the enclosure. Typically the enclosure will include one or more front-firing drivers 702, such as woofers, midrange speakers, or tweeters, or any combination thereof. One or more side-firing drivers 704 may also be included. The front-firing and side-firing drivers are typically mounted flush against the side of the enclosure such that they project sound perpendicularly outward from the vertical plane defined by the speaker, and these drivers are usually permanently fixed within the cabinet 700. For the adaptive audio system that features the rendering of reflected sound, one or more upward-tilted drivers 706 are also provided. These drivers are positioned such that they project sound at an angle up to the ceiling, where it can then bounce back down to a listener, as shown in Figure 6. The degree of tilt may be set depending on listening environment characteristics and system requirements. For example, the upward driver 706 may be tilted up between 30 and 60 degrees and may be positioned above the front-firing driver 702 in the speaker enclosure 700 so as to minimize interference with the sound waves produced by the front-firing driver 702. The upward-firing driver 706 may be installed at a fixed angle, or it may be installed such that the tilt angle can be adjusted manually. Alternatively, a servo mechanism may be used to allow automatic or electrical control of the tilt angle and projection direction of the upward-firing driver. For certain sounds, such as ambient sound, the upward-firing driver may be pointed straight up out of the upper surface of the speaker enclosure 700 to create what might be referred to as a "top-firing" driver. In this case, a large component of the sound may reflect back down onto the speaker, depending on the acoustic characteristics of the ceiling. In most cases, however, some tilt angle is used to help project the sound, through reflection off the ceiling, to different or multiple central locations within the listening environment, as shown in Figure 6.

Figure 7A is intended to illustrate one example of a speaker and driver configuration, and many other configurations are possible. For example, the upward-firing driver may be provided in its own enclosure to allow use with existing speakers. Figure 7B illustrates a speaker system having drivers distributed in multiple enclosures, under an embodiment. As shown in Figure 7B, the upward-firing driver 712 is provided in a separate enclosure 710, which can be placed proximate to or on top of an enclosure 714 having front-firing and/or side-firing drivers 716 and 718. The drivers may also be enclosed within a soundbar, such as those used in many home theater environments, in which a number of small or medium-sized drivers are arrayed along an axis within a single horizontal or vertical enclosure. Figure 7C illustrates the placement of drivers within a soundbar, under an embodiment. In this example, soundbar enclosure 730 is a horizontal soundbar that includes side-firing drivers 734, upward-firing drivers 736, and front-firing driver(s) 732. Figure 7C is intended to be an example configuration only, and any practical number of drivers may be used for each of the functions: front-firing, side-firing, and upward-firing.

For the embodiments of Figures 7A-C, it should be noted that the drivers may be of any appropriate shape, size, and type, depending on the frequency response characteristics required, as well as any other relevant constraints, such as size, power rating, component cost, and so on.

In a typical adaptive audio environment, a number of speaker enclosures will be contained within the listening environment. Figure 8 illustrates an example placement of speakers having individually addressable drivers, including upward-firing drivers, within a listening environment. As shown in Figure 8, listening environment 800 includes four individual speakers 806, each having at least one front-firing, side-firing, and upward-firing driver. The listening environment may also contain fixed drivers used for surround-sound applications, such as a center speaker 802 and a subwoofer or LFE 804. As can be seen in Figure 8, depending on the size of the listening environment and the respective speaker units, proper placement of the speakers 806 within the listening environment can provide a rich audio environment resulting from the reflection of sound off the ceiling from the number of upward-firing drivers. The speakers can be aimed to provide reflection off one or more points on the ceiling plane, depending on content, listening environment size, listener position, acoustic characteristics, and other relevant parameters.

The speakers used in an adaptive audio system for a home theater or similar listening environment may use a configuration that is based on existing surround-sound configurations (e.g., 5.1, 7.1, 9.1, etc.). In this case, a number of drivers are provided and defined according to the known surround-sound convention, with additional drivers and definitions provided for the upward-firing sound components.

Figure 9A illustrates a speaker configuration for an adaptive audio 5.1 system that uses multiple addressable drivers for reflected audio, under an embodiment. In configuration 900, the standard 5.1 speakers, comprising LFE 901, center speaker 902, L/R front speakers 904/906, and L/R rear speakers 908/910, are provided with eight additional drivers, giving a total of 14 addressable drivers. In each of the speaker units 902-910, these eight additional drivers are those denoted "upward" and "sideward," in addition to the drivers denoted "forward" (or "front"). The direct forward drivers would be driven by sub-channels that contain adaptive audio objects and any other components that are designed to have a high degree of directionality. The upward-firing (reflected) drivers could contain sub-channel content that is more omnidirectional or directionless, but they are not so limited. Examples would include background music or environmental sounds. If the input to the system comprises legacy surround-sound content, then this content could be intelligently factored into direct and reflected sub-channels and fed to the appropriate drivers.

For the direct sub-channels, the speaker enclosure would contain drivers in which the median axis of the driver bisects the "sweet spot," or acoustic center, of the listening environment. The upward-firing drivers would be positioned such that the angle between the median plane of the driver and the acoustic center is some angle in the range of 45 to 180 degrees. In the case of a driver positioned at 180 degrees, the back-facing driver could provide sound diffusion by reflecting off a back wall. This configuration uses the acoustic principle that, after time alignment of the upward-firing drivers with the direct drivers, the early-arriving signal component would be coherent, while the late-arriving components would benefit from the natural diffusion provided by the listening environment.
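
Time alignment between a direct driver and an upward-firing driver can be estimated from the difference in their acoustic path lengths. The Python sketch below is a simplified illustration: it assumes a speed of sound of 343 m/s, assumes the two path lengths have already been measured or computed, and uses a hypothetical function name. It returns the delay, in samples, that would be applied to the direct driver's feed so that both arrivals line up.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for dry air at about 20 degrees C


def alignment_delay_samples(
    direct_path_m: float,
    reflected_path_m: float,
    sample_rate_hz: int = 48000,
) -> int:
    """Delay (in samples) for the direct feed to match the reflected arrival."""
    extra_path = reflected_path_m - direct_path_m
    if extra_path < 0:
        raise ValueError("the reflected path cannot be shorter than the direct path")
    delay_seconds = extra_path / SPEED_OF_SOUND_M_S
    return round(delay_seconds * sample_rate_hz)


# A reflected path 0.686 m longer than the direct path is 2 ms later.
delay = alignment_delay_samples(direct_path_m=3.0, reflected_path_m=3.686)
```

At 48 kHz this gives a delay of 96 samples. A practical system would more likely derive the offset from microphone measurements than from nominal geometry.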

To achieve the height cues provided by the adaptive audio system, the upward-firing drivers could be angled upward from the horizontal plane and, in the extreme, could be positioned to radiate straight up and reflect off one or more reflective surfaces, such as a flat ceiling or an acoustic diffuser placed immediately above the enclosure. To provide additional directionality, the center speaker could use a soundbar configuration (such as that shown in Figure 7C) with the ability to steer sound across the screen to provide a high-resolution center channel.

The 5.1 configuration of Figure 9A could be expanded by adding two additional rear enclosures, similar to a standard 7.1 configuration. Figure 9B illustrates a speaker configuration for an adaptive audio 7.1 system that uses multiple addressable drivers for reflected audio, under such an embodiment. As shown in configuration 920, the two additional enclosures 922 and 924 are placed in the "left side surround" and "right side surround" positions, with the side speakers pointing toward the side walls in a similar fashion to the front enclosures and the upward-firing drivers set to bounce off the ceiling midway between the existing front and rear pairs. Such incremental additions can be made as many times as desired, with the additional pairs filling the gaps along the side or rear walls. Figures 9A and 9B illustrate only some examples of possible configurations of extended surround-sound speaker layouts that can be used in conjunction with upward-firing and side-firing speakers in an adaptive audio system for listening environments, and many others are also possible.

As an alternative to the n.1 configurations described above, a more flexible pod-based system may be used, in which each driver is contained within its own enclosure, which can then be mounted in any convenient location. This would use a driver configuration such as that shown in Figure 7B. These individual units may then be clustered in a similar manner to the n.1 configurations, or they could be spread individually around the listening environment. The pods are not necessarily restricted to being placed at the edges of the listening environment; they could also be placed on any surface within it (e.g., coffee table, bookshelf, etc.). Such a system would be easy to expand, allowing the user to add more speakers over time to create a more immersive experience. If the speakers are wireless, the pod system could include the ability to dock speakers for recharging purposes. In this design, the pods could be docked together such that they act as a single speaker while they recharge, perhaps for listening to stereo music, and then undocked and positioned around the listening environment for adaptive audio content.

To enhance the configurability and accuracy of the adaptive audio system using upward-firing addressable drivers, a number of sensors and feedback devices can be added to the enclosures to inform the renderer of characteristics that can be used in the rendering algorithm. For example, a microphone installed in each enclosure would allow the system to measure the phase, frequency, and reverberation characteristics of the listening environment, and to measure the position of the speakers relative to each other using triangulation and HRTF-like functions of the enclosures themselves. Inertial sensors (e.g., gyroscopes, compasses, etc.) can be used to detect the orientation and angle of the enclosures, and optical and visual sensors (e.g., a laser-based infrared rangefinder) can be used to provide positional information relative to the listening environment itself. These represent only a few of the additional sensors that could be used in the system; others are possible as well.

Such sensor systems can be further enhanced by allowing the position of the drivers and/or the acoustic modifiers of the enclosures to be adjusted automatically via electromechanical servos. This would allow the directionality of the drivers to be changed at runtime to suit their positioning in the listening environment relative to the walls and other drivers ("active steering"). Similarly, any acoustic modifiers (such as baffles, horns, or waveguides) could be tuned to provide the correct frequency and phase responses for optimal playback in any listening environment configuration ("active tuning"). Both active steering and active tuning could be performed during initial listening environment configuration (e.g., in conjunction with an auto-EQ/auto-room configuration system) or during playback in response to the content being rendered.

Bidirectional Interconnect

Once configured, the speakers must be connected to the rendering system. Traditional interconnects are typically of two types: speaker-level input for passive speakers and line-level input for active speakers. As shown in Figure 4C, the adaptive audio system 450 includes a bidirectional interconnection function. This interconnection is embodied within a set of physical and logical connections between the rendering stage 454 and the amplifier/speaker 458 and microphone stages 460. The ability to address multiple drivers in each speaker cabinet is supported by these intelligent interconnects between the sound source and the speaker. The bidirectional interconnect allows signals, including both control signals and audio signals, to be transmitted from the sound source (renderer) to the speaker. The signal from the speaker to the sound source also consists of both control signals and audio signals, where the audio signals in this case are audio sourced from the optional built-in microphones. Power may also be provided as part of the bidirectional interconnect, at least for the case where the speakers/drivers are not separately powered.

Figure 10 is a diagram 1000 that illustrates the composition of a bidirectional interconnection, under an embodiment. The sound source 1002, which may represent a renderer plus amplifier/sound processor chain, is logically and physically coupled to the speaker cabinet 1004 through a pair of interconnect links 1006 and 1008. The interconnect 1006 from the sound source 1002 to the drivers 1005 within the speaker cabinet 1004 comprises an electroacoustic signal for each driver, one or more control signals, and optional power. The interconnect 1008 from the speaker cabinet 1004 back to the sound source 1002 comprises sound signals from the microphone 1007 or other sensors, for calibration of the renderer or other similar sound processing functions. The feedback interconnect 1008 also carries certain driver definitions and parameters that the renderer uses to modify or process the sound signals sent to the drivers over interconnect 1006.

In an embodiment, each driver in each of the cabinets of the system is assigned an identifier (e.g., a numerical assignment) during system setup. Each speaker cabinet (enclosure) can also be uniquely identified. This numerical assignment is used by the speaker cabinet to determine which audio signal is sent to which driver within the cabinet. The assignment is stored in the speaker cabinet in an appropriate memory device. Alternatively, each driver may be configured to store its own identifier in local memory. In a further alternative, such as one in which the drivers/speakers have no local storage capacity, the identifiers can be stored in the rendering stage or another component within the sound source 1002. During speaker discovery, the sound source queries each speaker (or a central database) for its profile. The profile defines certain driver definitions, including the number of drivers in a speaker cabinet or other defined array, the acoustic characteristics of each driver (e.g., driver type, frequency response, etc.), the x, y, z position of the center of each driver relative to the center of the front face of the speaker cabinet, the angle of each driver with respect to a defined plane (e.g., ceiling, floor, vertical axis of the cabinet, etc.), and the number of microphones and microphone characteristics. Other relevant driver and microphone/sensor parameters may also be defined. In an embodiment, the driver definitions and speaker cabinet profile may be expressed as one or more XML documents used by the renderer.
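
As a rough illustration of how a renderer might read such a profile, the following sketch parses a small XML document with Python's standard library. The element and attribute names (`speaker`, `driver`, `microphone`, `angle_deg`, and so on) are assumptions made for this example; the text only states that profiles may be expressed as XML and does not define a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile layout: the schema below is invented for illustration.
PROFILE_XML = """
<speaker id="cab-01">
  <driver id="1" type="direct" x="0.00" y="0.00" z="0.10" angle_deg="0"/>
  <driver id="2" type="upward" x="0.00" y="0.05" z="0.30" angle_deg="70"/>
  <microphone id="1" pattern="omni"/>
</speaker>
"""


def parse_profile(xml_text):
    """Turn one speaker-cabinet profile into a plain dictionary."""
    root = ET.fromstring(xml_text)
    drivers = [
        {
            "id": int(d.get("id")),
            "type": d.get("type"),
            # Driver center relative to the center of the cabinet's front face.
            "position": tuple(float(d.get(axis)) for axis in ("x", "y", "z")),
            # Angle relative to a defined plane (assumed here: cabinet front axis).
            "angle_deg": float(d.get("angle_deg")),
        }
        for d in root.findall("driver")
    ]
    return {
        "speaker_id": root.get("id"),
        "drivers": drivers,
        "microphone_count": len(root.findall("microphone")),
    }


profile = parse_profile(PROFILE_XML)
```

A renderer could build one such dictionary per cabinet during speaker discovery and use the driver identifiers to route per-driver signals.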

In one possible implementation, an Internet Protocol (IP) control network is created between the sound source 1002 and the speaker cabinet 1004. Each speaker cabinet and sound source acts as a single network endpoint and is given a link-local address upon initialization or power-on. An auto-discovery mechanism such as zero-configuration networking (zeroconf) can be used to allow the sound source to locate each speaker on the network. Zero-configuration networking is one example of a process that automatically creates a usable IP network without manual operator intervention or special configuration servers; other similar techniques may be used. Given an intelligent network system, multiple sources may reside on the IP network alongside the speakers. This allows multiple sources to drive the speakers directly without routing sound through a "master" audio source (e.g., a traditional A/V receiver). If another source attempts to address the speakers, communication is performed among all sources to determine which source is currently "active", whether being active is necessary, and whether control can be transitioned to the new sound source. Sources may be pre-assigned a priority during manufacturing based on their classification; for example, a telecommunications source may have a higher priority than an entertainment source. In a multi-room environment, such as a typical home environment, all speakers within the overall environment may reside on a single network but may not need to be addressed simultaneously. During setup and auto-configuration, the sound level provided back over interconnect 1008 can be used to determine which speakers are located in the same physical space. Once this information is determined, the speakers can be grouped into clusters. In this case, cluster IDs can be assigned and made part of the driver definitions. The cluster ID is sent to each speaker, and each cluster can be addressed simultaneously by the sound source 1002.
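
One way the cluster grouping could work is sketched below. It assumes each speaker plays a test signal while the microphones in the other cabinets report the level they hear, and that two speakers share a room if that level exceeds a threshold. The threshold value, the data layout, and the function name are assumptions for illustration; the text does not specify the grouping rule.

```python
def cluster_speakers(levels_db, threshold_db=-40.0):
    """Group speakers that hear each other into clusters.

    levels_db maps (playing_speaker, listening_speaker) to the level in dB
    measured by the listening speaker's microphone. Returns a dict mapping
    each speaker name to an integer cluster ID.
    """
    speakers = sorted({s for pair in levels_db for s in pair})
    parent = {s: s for s in speakers}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), level in levels_db.items():
        if level >= threshold_db:  # loud enough: assume same physical space
            parent[find(a)] = find(b)

    roots = {}
    cluster_ids = {}
    for s in speakers:
        cluster_ids[s] = roots.setdefault(find(s), len(roots))
    return cluster_ids


measured = {
    ("kitchen-L", "kitchen-R"): -18.0,
    ("living-L", "living-R"): -20.0,
    ("living-R", "living-C"): -22.0,
    ("kitchen-L", "living-L"): -62.0,  # barely audible through the wall
}
ids = cluster_speakers(measured)
```

In this example the two kitchen speakers end up in one cluster and the three living-room speakers in another, so each room can be addressed as a unit.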

As shown in Figure 10, an optional power signal can be transmitted over the bidirectional interconnect. Speakers may be either passive (requiring external power from the sound source) or active (requiring power from an electrical outlet). If the speaker system consists of active speakers without wireless support, the input to the speaker consists of an IEEE 802.3-compliant wired Ethernet input. If the speaker system consists of active speakers with wireless support, the input to the speaker consists of an IEEE 802.11-compliant wireless Ethernet input or, alternatively, a wireless standard specified by the WISA organization. Passive speakers may be powered by appropriate power signals provided directly by the sound source.

System Configuration and Calibration

As shown in Figure 4C, the functionality of the adaptive audio system includes a calibration function 462. This function is enabled by the microphone 1007 and interconnect 1008 links shown in Figure 10. The function of the microphone component in system 1000 is to measure the response of the individual drivers in the listening environment in order to derive an overall system response. Multiple microphone topologies can be used for this purpose, including a single microphone or an array of microphones. The simplest case is one in which a single omnidirectional measurement microphone positioned in the center of the listening environment is used to measure the response of each driver. If the listening environment and playback conditions warrant a more refined analysis, multiple microphones can be used instead. The most convenient location for multiple microphones is within the physical speaker cabinets of the particular speaker configuration used in the listening environment. Microphones installed in each enclosure allow the system to measure the response of each driver at multiple positions in the listening environment. An alternative to this topology is to use multiple omnidirectional measurement microphones positioned at likely listener locations in the listening environment.

The microphones are used to enable automatic configuration and calibration of the renderer and post-processing algorithms. In the adaptive audio system, the renderer is responsible for converting a hybrid object and channel-based audio stream into individual audio signals designated for specific addressable drivers within one or more physical speakers. The post-processing component may include delay, equalization, gain, speaker virtualization, and upmixing. The speaker configuration often represents critical information that the renderer component can use to convert a hybrid object and channel-based audio stream into individual per-driver audio signals that provide optimum playback of the audio content. System configuration information includes: (1) the number of physical speakers in the system, (2) the number of individually addressable drivers in each speaker, and (3) the position and direction of each individually addressable driver relative to the listening environment geometry. Other characteristics are also possible. Figure 11 illustrates the function of an automatic configuration and system calibration component, under an embodiment. As shown in diagram 1100, an array 1102 of one or more microphones provides acoustic information to the configuration and calibration component 1104. This acoustic information captures certain relevant characteristics of the listening environment. The configuration and calibration component 1104 then provides this information to the renderer 1106 and any relevant post-processing components 1108, so that the audio signals ultimately sent to the speakers are adjusted and optimized for the listening environment.

The number of physical speakers in the system and the number of individually addressable drivers in each speaker are physical speaker properties. These properties are transmitted directly from the speakers to the renderer 454 via the bidirectional interconnect 456. The renderer and speakers use a common discovery protocol, so that when speakers are connected to or disconnected from the system, the renderer is notified of the change and can reconfigure the system accordingly.

The geometry (size and shape) of the listening environment is a necessary item of information in the configuration and calibration process. The geometry can be determined in a number of different ways. In a manual configuration mode, the width, length, and height of the minimum bounding cube for the listening environment are entered into the system by the listener or a technician through a user interface that provides input to the renderer or another processing unit within the adaptive audio system. Various user interface techniques and tools can be used for this purpose. For example, the listening environment geometry can be sent to the renderer by a program that automatically maps or traces the geometry of the listening environment. Such a system may use a combination of computer vision, sonar, and 3D laser-based physical mapping.

The renderer uses the position of the speakers within the listening environment geometry to derive the audio signals for each individually addressable driver, including both direct and reflected (upward-firing) drivers. Direct drivers are those aimed such that the majority of their dispersion pattern intersects the listening position before being diffused by one or more reflective surfaces (such as a floor, wall, or ceiling). Reflected drivers are those aimed such that the majority of their dispersion pattern is reflected before intersecting the listening position, as illustrated in Figure 6. If the system is in manual configuration mode, the 3D coordinates of each direct driver can be entered into the system through a UI. For reflected drivers, the 3D coordinates of the primary reflection are entered into the UI. Lasers or similar techniques can be used to visualize the dispersion pattern of the diffused drivers on the surfaces of the listening environment, so that the 3D coordinates can be measured and manually entered into the system.
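
For an upward-firing driver, the location of the primary reflection can be estimated geometrically rather than measured. The sketch below uses the image-source method and assumes an idealized flat, horizontal ceiling that reflects specularly; real ceilings diffuse sound, so this is only a starting estimate and not the method described in the text. The function name and coordinate convention (z is height) are assumptions.

```python
def ceiling_reflection_point(driver, listener, ceiling_height):
    """Estimate where sound from `driver` bounces off the ceiling to reach `listener`.

    driver and listener are (x, y, z) tuples with z below ceiling_height.
    The listener is mirrored above the ceiling; the reflection point is where
    the straight line from the driver to that mirror image crosses the ceiling.
    """
    dx, dy, dz = driver
    lx, ly, lz = listener
    mirrored_z = 2.0 * ceiling_height - lz
    # Fraction of the way along the driver-to-image line at which z == ceiling_height.
    t = (ceiling_height - dz) / (mirrored_z - dz)
    return (dx + t * (lx - dx), dy + t * (ly - dy), ceiling_height)


# Driver and listener both 1 m high and 4 m apart, under a 3 m ceiling.
point = ceiling_reflection_point((0.0, 0.0, 1.0), (4.0, 0.0, 1.0), 3.0)
```

With the driver and listener at equal height, the estimated reflection point lies on the ceiling midway between them, which matches the symmetric case.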

Driver positioning and aiming is typically performed using manual or automatic techniques. In some cases, inertial sensors may be incorporated into each speaker. In this mode, the center speaker is designated as the "master" and its compass measurement is taken as the reference. The other speakers then transmit the dispersion patterns and compass positions for each of their individually addressable drivers. Coupled with the listening environment geometry, the difference between the reference angle of the center speaker and that of each additional driver provides enough information for the system to determine automatically whether a driver is direct or reflected.

The speaker position configuration can be fully automated if a 3D positional (i.e., Ambisonic) microphone is used. In this mode, the system sends a test signal to each driver and records the response. Depending on the microphone type, the signals may need to be transformed into an x, y, z representation. These signals are analyzed to find the x, y, and z components of the dominant first arrival. Coupled with the listening environment geometry, this usually provides enough information for the system to set the 3D coordinates of all speaker positions, direct or reflected, automatically. Depending on the listening environment geometry, a hybrid combination of the three described methods for configuring the speaker coordinates may be more effective than using any one technique alone.

Speaker configuration information is one component required to configure the renderer. Speaker calibration information is also needed to configure the post-processing chain (delay, equalization, and gain). Figure 12 is a flowchart that illustrates the process steps of performing automatic speaker calibration using a single microphone, under an embodiment. In this mode, the delay, equalization, and gain are calculated automatically by the system using a single omnidirectional measurement microphone located in the middle of the listening position. As shown in diagram 1200, the process begins in block 1202 by measuring the room impulse response of each individual driver. In block 1204, the delay for each driver is then calculated by finding the offset of the peak of the cross-correlation between the acoustic impulse response (captured with the microphone) and the directly captured electrical impulse response. In block 1206, the calculated delay is applied to the directly captured (reference) impulse response. In block 1208, the process then determines the wideband and per-band gain values that, when applied to the measured impulse response, result in the minimum difference between it and the directly captured (reference) impulse response. This step can be performed by taking the windowed FFT of the measured and reference impulse responses, calculating the per-bin magnitude ratios between the two signals, applying a median filter to the per-bin magnitude ratios, calculating per-band gain values by averaging the gains of all bins that fall completely within a band, calculating a wideband gain by averaging all per-band gains, subtracting the wideband gain from the per-band gains, and applying a small-room X-curve (-2 dB/octave above 2 kHz). Once the gain values are determined in block 1208, the process determines the final delay values in block 1210 by subtracting the minimum delay from the others, so that at least one driver in the system always has zero additional delay.
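
The delay and gain steps of Figure 12 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the patented implementation: impulse responses are short Python lists, only non-negative lags are searched, the per-bin ratios are assumed to be already computed in dB, the median filter is three bins wide, averaging is done in the dB domain, and the X-curve step is omitted. All function names are hypothetical.

```python
import statistics


def estimate_delay(reference, measured):
    """Block 1204: lag (in samples) at which the cross-correlation peaks."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(measured)):
        score = sum(
            r * measured[i + lag]
            for i, r in enumerate(reference)
            if i + lag < len(measured)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def band_gains_db(bin_ratios_db, bands):
    """Block 1208 (partial): wideband gain and per-band gains relative to it.

    bands is a list of (first_bin, last_bin) index pairs, inclusive.
    """
    n = len(bin_ratios_db)
    # 3-point median filter across neighboring bins.
    smoothed = [
        statistics.median(bin_ratios_db[max(0, i - 1):min(n, i + 2)])
        for i in range(n)
    ]
    per_band = [statistics.fmean(smoothed[lo:hi + 1]) for lo, hi in bands]
    wideband = statistics.fmean(per_band)
    return wideband, [g - wideband for g in per_band]


def normalize_delays(delays):
    """Block 1210: subtract the minimum so one driver has zero extra delay."""
    earliest = min(delays.values())
    return {driver: d - earliest for driver, d in delays.items()}


reference = [1.0, 0.5]
delays = {
    "L": estimate_delay(reference, [0.0, 0.0, 0.0, 0.8, 0.4, 0.0, 0.0, 0.0]),
    "R": estimate_delay(reference, [0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.4, 0.0]),
}
final_delays = normalize_delays(delays)
wideband, relative = band_gains_db([0, 0, 0, 0, 6, 6, 6, 6], [(0, 3), (4, 7)])
```

Here the left driver arrives three samples late and the right driver five, so after normalization the left driver has zero extra delay and the right driver two samples. A production system would use an FFT-based correlation on full-length impulse responses.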

In the case of automatic calibration using multiple microphones, the delay, equalization, and gain are calculated automatically by the system using multiple omnidirectional measurement microphones. The process is substantially the same as the single-microphone technique, except that it is repeated for each of the microphones and the results are averaged.

Alternative Applications

Rather than implementing an adaptive audio system in an entire listening environment or theater, aspects of the adaptive audio system can be implemented in more localized applications, such as televisions, computers, game consoles, or similar devices. This case effectively relies on speakers arrayed in a flat plane corresponding to the viewing screen or monitor surface. Figure 13 illustrates the use of an adaptive audio system in an example television and sound bar use case. In general, the television use case presents challenges to creating an immersive audio experience, owing to the often reduced quality of the equipment (TV speakers, sound bar speakers, etc.) and to speaker locations and configurations that may be limited in spatial resolution (i.e., no surround or rear speakers). The system 1300 of Figure 13 includes speakers in the standard television left and right locations (TV-L and TV-R) as well as left and right upward-firing drivers (TV-LH and TV-RH). The television 1302 may also include a sound bar 1304 or speakers in some sort of height array. In general, the size and quality of television speakers are reduced by cost constraints and design choices compared to standalone or home theater speakers. The use of dynamic virtualization, however, can help to overcome these deficiencies. In Figure 13, the dynamic virtualization effect is illustrated for the TV-L and TV-R speakers, so that a person at a specific listening position 1308 would hear the horizontal elements associated with appropriate audio objects rendered individually in the horizontal plane. In addition, the height elements associated with appropriate audio objects will be rendered correctly through reflected audio transmitted by the LH and RH drivers. The use of stereo virtualization in the television L and R speakers is similar to that in L and R home theater speakers, where a potentially immersive dynamic speaker virtualization user experience is possible through dynamic control of the speaker virtualization algorithm parameters based on the object spatial information provided by the adaptive audio content. This dynamic virtualization can be used to create the perception of objects moving along the sides of the listening environment.

The television environment may also include an HRC speaker, as shown within sound bar 1304. Such an HRC speaker may be a steerable unit that allows panning through the HRC array. There can be benefits, particularly for larger screens, in having a front-firing center channel array with individually addressable speakers that allow discrete pans of audio objects through the array to match the movement of video objects on the screen. This speaker is also shown as having side-firing speakers. If the speaker is used as a sound bar, these side-firing speakers can be activated and used so that the side-firing drivers provide more immersion, compensating for the lack of surround or rear speakers. The dynamic virtualization concept is also shown for the HRC/sound bar speaker: dynamic virtualization is shown for the L and R speakers on the farthest sides of the front-firing speaker array. Again, this can be used to create the perception of objects moving along the sides of the listening environment. This modified center speaker could also include more speakers and implement a steerable sound beam with separately controlled sound zones. Also shown in the example implementation of Figure 13 is an NFE speaker 1306 located in front of the main listening position 1308. Including the NFE speaker may enhance the sense of envelopment provided by the adaptive audio system by moving sound away from the front of the listening environment and nearer to the listener.

With respect to headphone rendering, the adaptive audio system maintains the creator's original intent by matching HRTFs to the spatial position. When audio is reproduced over headphones, binaural spatial virtualization can be achieved by applying a head-related transfer function (HRTF), which processes the audio and adds perceptual cues that create the perception of the audio being played in three-dimensional space rather than over standard stereo headphones. The accuracy of the spatial reproduction depends on the selection of an appropriate HRTF, which can vary with several factors, including the spatial position of the audio channels or objects being rendered. Using the spatial information provided by the adaptive audio system can result in the selection of one HRTF, or a continuously varying number of HRTFs, representing 3D space, which greatly improves the reproduction experience.
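
A very simple form of position-driven HRTF selection is a nearest-neighbor lookup: pick the measured HRTF whose direction is closest to the object's direction. The sketch below assumes HRTFs are stored in a dictionary keyed by (azimuth, elevation) in degrees; that layout, the placeholder filter labels, and the function names are assumptions for illustration, and practical systems typically interpolate between neighboring HRTFs instead.

```python
import math


def direction(azimuth_deg, elevation_deg):
    """Unit vector for a direction given in degrees (x front, y left, z up)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def nearest_hrtf(object_direction, hrtf_set):
    """Return the key in hrtf_set whose direction is angularly closest."""
    target = direction(*object_direction)

    def closeness(key):
        # Dot product of unit vectors: larger means a smaller angle between them.
        return sum(a * b for a, b in zip(target, direction(*key)))

    return max(hrtf_set, key=closeness)


# Placeholder entries; real values would be measured filter pairs.
hrtf_set = {(0, 0): "front", (90, 0): "left", (180, 0): "behind", (0, 90): "above"}
overhead_choice = nearest_hrtf((10, 70), hrtf_set)
side_choice = nearest_hrtf((80, 5), hrtf_set)
```

As an object's position metadata changes over time, repeating this lookup yields the continuously varying HRTF selection that the paragraph describes.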

The system also facilitates adding guided, three-dimensional binaural rendering and virtualization. As with spatial rendering, when new and modified speaker types and locations are used, three-dimensional HRTFs make it possible to create cues that simulate sound coming from both the horizontal plane and the vertical axis. Previous audio formats, which provide only channel and fixed speaker location information for rendering, are more limited. With the adaptive audio format information, a binaural three-dimensional rendering headphone system has detailed and useful information that can be used to direct which elements of the audio are suitable for rendering in both the horizontal and vertical planes. Some content may rely on overhead speakers to provide a greater sense of envelopment. These audio objects and information could be used for binaural rendering that is perceived as being above the listener's head when headphones are used. Figure 14 shows a simplified representation of a three-dimensional binaural headphone virtualization experience for use in an adaptive audio system, under an embodiment. As shown in Figure 14, a headphone set 1402 used to reproduce audio from an adaptive audio system includes audio signals 1404 in the standard x, y plane as well as in the z plane, so that the height associated with certain audio objects or sounds is played back so that they sound as if they originate above or below the x, y originated sounds.

Metadata Definitions

In an embodiment, the adaptive audio system includes components that generate metadata from the original spatial audio format. The methods and components of system 300 comprise an audio rendering system configured to process one or more bitstreams containing both conventional channel-based audio elements and audio object coding elements. A new extension layer containing the audio object coding elements is defined and added to either the channel-based audio codec bitstream or the audio object bitstream. This approach enables bitstreams that include the extension layer to be processed by renderers for use with existing speaker and driver designs or with next-generation speakers that use individually addressable drivers and driver definitions. The spatial audio content from the spatial audio processor comprises audio objects, channels, and position metadata. When an object is rendered, it is assigned to one or more speakers according to the position metadata and the location of the playback speakers. Additional metadata may be associated with the object to alter the playback location or otherwise limit the speakers that are to be used for playback. Metadata is generated in the audio workstation in response to the engineer's mixing inputs, to provide rendering cues that control spatial parameters (e.g., position, velocity, intensity, timbre, etc.) and specify which drivers or speakers in the listening environment play the respective sounds during exhibition. The metadata is associated with the respective audio data in the workstation for packaging and transport by the spatial audio processor.

Figure 15 is a table that illustrates certain metadata definitions for use in an adaptive audio system for listening environments, under an embodiment. As shown in table 1500, the metadata definitions include: audio content type, driver definitions (number, characteristics, position, projection angle), control signals for active steering/tuning, and calibration information including room and speaker information.

Features and Capabilities

As stated above, the adaptive audio ecosystem allows the content creator to embed the spatial intent of the mix (position, size, velocity, etc.) within the bitstream via metadata. This allows a great deal of flexibility in the spatial reproduction of audio. From a spatial rendering standpoint, the adaptive audio format enables the content creator to adapt the mix to the exact position of the speakers in the listening environment, avoiding the spatial distortion that arises when the geometry of the playback system differs from that of the authoring system. In current audio reproduction systems, where only audio for a speaker channel is sent, the intent of the content creator is unknown for locations in the listening environment other than the fixed speaker locations. Under the current channel/speaker paradigm, the only known information is that a specific audio channel should be sent to a specific speaker that has a predefined location in the listening environment. In the adaptive audio system, the reproduction system can use the metadata conveyed through the creation and distribution pipeline to reproduce the content in a manner that matches the original intent of the content creator. For example, the relationship between speakers is known for different audio objects. By providing the spatial location of an audio object, the intent of the content creator is known, and this can be "mapped" onto the speaker configuration, including the speaker locations. With a dynamic audio rendering system, this rendering can be updated and improved by adding additional speakers.

The system also enables guided, three-dimensional spatial rendering to be added. There have been many attempts to create a more immersive audio rendering experience through new speaker designs and configurations. These include the use of bipolar speakers and side-firing, rear-firing, and upward-firing drivers. With previous channel-based systems and fixed speaker locations, determining which elements of the audio should be sent to these modified speakers was relatively difficult. With an adaptive audio format, the rendering system has detailed and useful information about which elements of the audio (objects or otherwise) are suitable for sending to new speaker configurations. That is, the system allows control over which audio signals are sent to the front-firing drivers and which are sent to the upward-firing drivers. For example, adaptive audio cinema content relies heavily on overhead speakers to provide a greater sense of envelopment. These audio objects and their information may be sent to upward-firing drivers to provide reflected audio in the listening environment and create a similar effect.

The system also allows the mix to be adapted to the exact hardware configuration of the reproduction system. Many different speaker types and configurations exist in rendering equipment such as televisions, home theaters, soundbars, portable music player docks, and the like. When these systems are sent channel-specific audio information (i.e., left and right channels or standard multichannel audio), the system must process the audio to match the capabilities of the rendering equipment. A typical example is when standard stereo (left, right) audio is sent to a soundbar that has more than two speakers. In current audio systems, where only audio for a speaker channel is sent, the intent of the content creator is unknown, and the more immersive audio experience made possible by the enhanced equipment must be created by algorithms that make assumptions about how to modify the audio for reproduction on the hardware. An example of this is the use of PLII, PLII-z, or Next Generation Surround to "upmix" channel-based audio to more speakers than the original number of channel feeds. With the adaptive audio system, the reproduction system can use the metadata conveyed throughout the creation and distribution pipeline to reproduce the content in a manner that more closely matches the original intent of the content creator. For example, some soundbars have side-firing speakers to create a sense of envelopment. With adaptive audio, the spatial information and the content type information (i.e., dialog, music, ambient effects, etc.) can be used by the soundbar, when controlled by a rendering system such as a TV or A/V receiver, to send only the appropriate audio to these side-firing speakers.

The spatial information conveyed by adaptive audio allows content to be rendered dynamically with an awareness of the location and type of the speakers present. In addition, information on the relationship of the listener or listeners to the audio reproduction equipment is now potentially available and may be used in rendering. Most gaming consoles include a camera accessory and intelligent image processing that can determine the position and identity of a person in the listening environment. This information may be used by an adaptive audio system to alter the rendering so that it more accurately conveys the creative intent of the content creator based on the listener's position. For example, in nearly all cases, audio rendered for playback assumes that the listener is located in an ideal "sweet spot," which is often equidistant from each speaker and is the same position the sound mixer occupied during content creation. However, people are often not in this ideal position, and their experience does not match the creative intent of the mixer. A typical example is a listener seated on a chair or couch on the left side of the listening environment. In this case, sound reproduced from the nearer speakers on the left is perceived as louder, and the spatial perception of the audio mix is skewed to the left. By knowing the position of the listener, the system can adjust the rendering to lower the level of the left speakers and raise the level of the right speakers, rebalancing the audio mix so that it is perceptually correct. Delaying the audio to compensate for the listener's distance from the sweet spot is also possible. The listener position can be detected either with a camera or with a modified remote control that has some built-in signaling to report the listener position to the rendering system.
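The level and delay compensation described above can be sketched as below. This is a simplified illustration under stated assumptions: free-field 1/r level decay, a fixed speed of sound, and hypothetical names (`compensate_for_listener`, `min_distance_m`). Rather than boosting far speakers, the sketch attenuates and delays the nearer ones relative to the farthest, which rebalances the mix in the same direction without adding gain.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value at room temperature

def compensate_for_listener(speakers, listener, min_distance_m=0.1):
    """Return {speaker: (delay_ms, gain_db)} aligning arrivals at the listener."""
    # Distance from each speaker to the listener, floored to avoid log10(0).
    dists = {
        name: max(math.dist(pos, listener), min_distance_m)
        for name, pos in speakers.items()
    }
    farthest = max(dists.values())
    trims = {}
    for name, d in dists.items():
        # Nearer speakers are delayed so all wavefronts arrive together.
        delay_ms = (farthest - d) / SPEED_OF_SOUND_M_S * 1000.0
        # Nearer speakers are attenuated following the 1/r assumption.
        gain_db = 20.0 * math.log10(d / farthest)
        trims[name] = (delay_ms, gain_db)
    return trims

# Listener seated 1 m from the left speaker and 3 m from the right one.
trims = compensate_for_listener(
    {"L": (-2.0, 0.0), "R": (2.0, 0.0)}, listener=(-1.0, 0.0)
)
```

In this example the left speaker is delayed by roughly 5.8 ms and trimmed by roughly 9.5 dB, while the right speaker is left untouched. Real rooms deviate from the 1/r assumption because of reflections, so measured levels would normally refine these values.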

In addition to using standard speakers and speaker locations to address the listening position, beam steering technologies can be used to create sound field "zones" that vary with listener position and content. Audio beamforming uses an array of speakers (typically 8 to 16 horizontally spaced speakers) together with phase manipulation and processing to create a steerable sound beam. A beamforming speaker array allows the creation of audio zones in which the audio is primarily audible, and these zones can be used to direct specific, selectively processed sounds or objects to a specific spatial location. An obvious use case is to process the dialog in a soundtrack with a dialog enhancement post-processing algorithm and beam that audio object directly to a hearing-impaired user.
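One common way to steer a beam from a uniformly spaced line array is delay-and-sum, in which each speaker plays the same signal with a small per-element delay. The sketch below computes those delays; it is an assumption-laden illustration (far-field target, uniform spacing, angle measured from broadside toward the higher-indexed speakers), not the beamforming method of the patent.

```python
import math

def steering_delays(num_speakers, spacing_m, angle_deg, c=343.0):
    """Per-speaker delays in seconds that steer a line array to angle_deg."""
    theta = math.radians(angle_deg)
    # Element n sits at x = n * spacing; elements closer to the target
    # direction must fire later so that the wavefronts add coherently.
    raw = [n * spacing_m * math.sin(theta) / c for n in range(num_speakers)]
    # Shift so the smallest delay is zero (delays must be non-negative).
    offset = min(raw)
    return [t - offset for t in raw]

# Eight speakers spaced 5 cm apart, beam steered 30 degrees off broadside.
delays = steering_delays(8, 0.05, 30.0)
```

Steering to 0 degrees yields all-zero delays (a broadside beam), and a negative angle mirrors the delay pattern across the array.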

Matrix Encoding and Spatial Upmixing

In some cases, audio objects may be a desired component of adaptive audio content; however, because of bandwidth limitations, it may not be possible to send both channel/speaker audio and audio objects. In the past, matrix encoding has been used to convey more audio information than a given distribution system could otherwise carry. For example, this was the case in the early days of cinema, when sound mixers created multichannel audio but the film formats provided only stereo audio. Matrix encoding was used to intelligently downmix the multichannel audio to two stereo channels, which were then processed with certain algorithms to recreate a close approximation of the multichannel mix from the stereo audio. Similarly, it is possible to intelligently downmix audio objects into the base speaker channels and then, using adaptive audio metadata and sophisticated time- and frequency-sensitive next-generation surround algorithms, extract the objects and render them spatially correctly with an adaptive audio rendering system.
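The principle of matrix encoding can be illustrated with a deliberately simplified sum/difference matrix: center is folded into both stereo channels in phase, a single surround channel is folded in with opposite polarity, and a passive decoder recovers them from the sum and difference. The function names and the -3 dB coefficient are illustrative assumptions. Practical matrix encoders also phase-shift the surround signal, and the object extraction described above relies on metadata and time/frequency analysis, neither of which is modeled here.

```python
import math

G = 1.0 / math.sqrt(2.0)  # -3 dB fold-down coefficient (illustrative choice)

def matrix_encode(left, right, center, surround):
    """Fold four channels into a two-channel (Lt, Rt) pair."""
    lt = [l + G * c - G * s for l, c, s in zip(left, center, surround)]
    rt = [r + G * c + G * s for r, c, s in zip(right, center, surround)]
    return lt, rt

def passive_decode(lt, rt):
    """Recover approximate center and surround from sum and difference."""
    center = [G * (a + b) for a, b in zip(lt, rt)]
    surround = [G * (b - a) for a, b in zip(lt, rt)]
    return lt, rt, center, surround

# A center-only signal survives the round trip; surround stays silent.
silence = [0.0, 0.0, 0.0]
dialog = [1.0, -0.5, 0.25]
lt, rt = matrix_encode(silence, silence, dialog, silence)
_, _, center_out, surround_out = passive_decode(lt, rt)
```

The decode is only approximate in general: left and right content leaks into the recovered center and surround, which is why steering logic is added to practical decoders.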

Additionally, when the transmission system for the audio has bandwidth limitations (for example, 3G and 4G wireless applications), there is also a benefit in transmitting spatially diverse multichannel beds that are matrix encoded along with individual audio objects. One use case of such a transmission methodology would be a sports broadcast with two distinct audio beds and multiple audio objects. The audio beds could represent the multichannel audio captured in the stands of the two different teams, and the audio objects could represent different announcers who may be sympathetic to one team or the other. Using standard coding, a 5.1 representation of each bed along with two or more objects may exceed the bandwidth constraints of the transmission system. In this case, if each of the 5.1 beds is matrix encoded to a stereo signal, the two beds that were originally captured as 5.1 channels can be transmitted as two-channel bed 1, two-channel bed 2, object 1, and object 2, giving only four audio channels instead of 5.1+5.1+2 or 12.1 channels.

Position- and Content-Dependent Processing

The adaptive audio ecosystem allows the content creator to create individual audio objects and add information about the content that can be conveyed to the reproduction system. This allows a large amount of flexibility in the processing of audio prior to reproduction. Processing can be adapted to the position and type of an object through dynamic control of speaker virtualization based on object position and size. Speaker virtualization refers to a method of processing audio such that a virtual speaker is perceived by the listener. This method is often used for stereo speaker reproduction when the source audio is multichannel audio that includes surround speaker channel feeds. Virtual speaker processing modifies the surround speaker channel audio in such a way that, when it is played back on stereo speakers, the surround audio elements are virtualized to the sides and back of the listener as if a virtual speaker were located there. Currently, the location attributes of the virtual speaker position are static because the intended location of the surround speakers is fixed. With adaptive audio content, however, the spatial locations of different audio objects are dynamic and distinct (i.e., unique to each object). Post-processing such as speaker virtualization can now be controlled in a more informed way by dynamically controlling parameters such as the speaker position angle for each object and then combining the rendered outputs of several virtualized objects, creating a more immersive audio experience that more closely represents the intent of the sound mixer.

In addition to the standard horizontal virtualization of audio objects, perceptual height cues can be used to process fixed-channel and dynamic-object audio so that a perception of height reproduction is obtained from a standard pair of stereo speakers in their normal, horizontal-plane location.

Certain effects or enhancement processes can be judiciously applied to the appropriate types of audio content. For example, dialog enhancement may be applied to dialog objects only. Dialog enhancement refers to a method of processing audio that contains dialog such that the audibility and/or intelligibility of the dialog is increased and/or improved. In many cases, the audio processing that is applied to dialog is inappropriate for non-dialog audio content (i.e., music, ambient effects, etc.) and can result in objectionable audible artifacts. With adaptive audio, an audio object can contain only the dialog in a piece of content and can be labeled accordingly, so that a rendering solution selectively applies dialog enhancement to the dialog content only. In addition, if the audio object is only dialog (and not a mixture of dialog and other content, as is often the case), then the dialog enhancement processing can process the dialog exclusively (thereby limiting any processing being performed on any other content).
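Selective processing driven by a content-type label can be sketched as follows. The `content_type` key, the dictionary layout, and the use of a plain gain boost as a stand-in for dialog enhancement are all assumptions made for illustration; actual dialog enhancement algorithms are considerably more involved.

```python
def apply_dialog_enhancement(objects, boost_db=6.0):
    """Boost only the objects whose metadata labels them as dialog."""
    gain = 10.0 ** (boost_db / 20.0)  # convert dB to a linear factor
    processed = []
    for obj in objects:
        samples = obj["samples"]
        if obj.get("content_type") == "dialog":
            # Stand-in for a real enhancement algorithm: a simple gain.
            samples = [s * gain for s in samples]
        # Non-dialog objects pass through untouched.
        processed.append({**obj, "samples": samples})
    return processed

mix = [
    {"name": "narrator", "content_type": "dialog", "samples": [0.1, -0.2]},
    {"name": "score", "content_type": "music", "samples": [0.3, 0.3]},
]
enhanced = apply_dialog_enhancement(mix)
```

The same pattern applies to other content-dependent processing: the renderer inspects the label carried in the metadata and routes each object through only the processing suited to it.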

Similarly, audio response or equalization management can be tailored to specific audio characteristics. For example, bass management (filtering, attenuation, gain) can be targeted at specific objects based on their type. Bass management refers to selectively isolating and processing only the bass (or lower) frequencies in a particular piece of content. With current audio systems and delivery mechanisms, this is a "blind" process that is applied to all of the audio. With adaptive audio, the specific audio objects for which bass management is appropriate can be identified by metadata, and the rendering processing can be applied appropriately.

The adaptive audio system also facilitates object-based dynamic range compression. Traditional audio tracks have the same duration as the content itself, whereas an audio object may occur for only a limited amount of time in the content. The metadata associated with an object may contain level-related information about its average and peak signal amplitude, as well as its onset or attack time (particularly for transient material). This information allows a compressor to better adapt its compression and time constants (attack, release, etc.) to suit the content.
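As a rough illustration of how level metadata could inform a compressor, the sketch below derives a threshold, ratio, and time constants from an object's average level, peak level, and an attack-time hint. Every heuristic here (threshold at the average level, the 12 dB crest-factor split, the clamping ranges, the function and parameter names) is a hypothetical choice for the example, not something specified in the text.

```python
def compressor_settings(avg_db, peak_db, attack_hint_ms, target_peak_db=-3.0):
    """Pick static compressor parameters from per-object level metadata."""
    threshold_db = avg_db  # assumption: start compressing at the average level
    if peak_db <= target_peak_db:
        ratio = 1.0        # the object never exceeds the target: no compression
    elif target_peak_db <= threshold_db:
        ratio = 20.0       # target below threshold: fall back to limiting
    else:
        # Ratio that maps the known peak exactly onto the target peak.
        ratio = (peak_db - threshold_db) / (target_peak_db - threshold_db)
    # React in about half the onset time, clamped to a plausible range.
    attack_ms = min(max(0.5 * attack_hint_ms, 0.1), 50.0)
    # Transient material (high crest factor) gets a faster release.
    crest_db = peak_db - avg_db
    release_ms = 50.0 if crest_db > 12.0 else 200.0
    return {
        "threshold_db": threshold_db,
        "ratio": ratio,
        "attack_ms": attack_ms,
        "release_ms": release_ms,
    }

# A percussive object: -20 dB average, 0 dB peak, 2 ms onset.
settings = compressor_settings(avg_db=-20.0, peak_db=0.0, attack_hint_ms=2.0)
```

The point of the sketch is that these values are known before playback from the metadata, so the compressor does not have to estimate them from the signal as it arrives.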

The system also facilitates automatic speaker-room equalization. Speaker and listening environment acoustics play a significant role in introducing audible coloration to the sound, thereby affecting the timbre of the reproduced sound. Furthermore, the acoustics are position-dependent because of listening environment reflections and variations in speaker directivity, and because of this variation the perceived timbre differs significantly between listening positions. The AutoEQ (automatic room equalization) function provided in the system helps mitigate some of these issues through automatic speaker-room spectral measurement and equalization, automated time-delay compensation (which provides proper imaging and possibly least-squares based relative speaker location detection) and level setting, bass redirection based on speaker headroom capability, and optimal splicing of the main speakers with the subwoofer(s). In a home theater or other listening environment, the adaptive audio system includes certain additional functions, such as: (1) automated target curve computation based on playback room acoustics (which is considered an open problem in research on equalization in domestic listening environments), (2) the influence of modal decay control using time-frequency analysis, (3) understanding the parameters derived from measurements that govern envelopment/spaciousness/source-width/intelligibility and controlling these to provide the best possible listening experience, (4) directional filtering incorporating head models for matching timbre between the front and "other" speakers, and (5) detecting the spatial positions of speakers in a discrete setup relative to the listener, and spatial remapping (e.g., Summit wireless would be an example). The timbre mismatch between speakers is especially revealed on certain content panned between a front-anchor speaker (e.g., center) and the surround/back/wide/height speakers.

Overall, the adaptive audio system also enables a compelling audio/video reproduction experience, particularly with larger screen sizes in a home environment, if the reproduced spatial location of some audio elements matches image elements on the screen. An example is having the dialog in a film or television program spatially coincide with the person or character speaking on the screen. With normal speaker channel-based audio, there is no easy method of determining where the dialog should be positioned spatially to match the location of the person or character on screen. With the audio information available in an adaptive audio system, this type of audio/visual alignment can be achieved easily, even in home theater systems featuring ever larger screens. The visual position and audio spatial alignment can also be used for non-character/dialog objects such as cars, trucks, animation, and so on.

The adaptive audio ecosystem also allows for enhanced content management by allowing the content creator to create individual audio objects and add information about the content that can be conveyed to the reproduction system. This allows a large amount of flexibility in the content management of audio. From a content management standpoint, adaptive audio makes possible operations such as changing the language of the audio content by replacing only the dialog object, which reduces content file size and/or download time. Film, television, and other entertainment programs are typically distributed internationally. This often requires that the language in the piece of content be changed depending on where it will be reproduced (French for films shown in France, German for TV programs shown in Germany, etc.). Today this often requires a completely independent audio soundtrack to be created, packaged, and distributed for each language. With the adaptive audio system and the inherent concept of audio objects, the dialog for a piece of content can be an independent audio object. This allows the language of the content to be changed easily without updating or altering other elements of the audio soundtrack such as music, effects, etc. This applies not only to foreign languages but also to language that is inappropriate for certain audiences, to targeted advertising, and so on.
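The language swap described above amounts to replacing the dialog objects while leaving every other object in place. The following sketch assumes a hypothetical in-memory layout in which each object carries a `content_type` label and alternate dialog is keyed by language code; a real bitstream would carry this differently.

```python
def select_language(program, dialog_tracks, language):
    """Return the program with its dialog objects swapped for `language`."""
    if language not in dialog_tracks:
        raise KeyError(f"no dialog available for language {language!r}")
    # Music, effects, and other objects are kept exactly as they are.
    kept = [obj for obj in program if obj["content_type"] != "dialog"]
    return kept + list(dialog_tracks[language])

program = [
    {"name": "score", "content_type": "music"},
    {"name": "dialog_en", "content_type": "dialog"},
    {"name": "rain", "content_type": "effects"},
]
dialog_tracks = {
    "fr": [{"name": "dialog_fr", "content_type": "dialog"}],
    "de": [{"name": "dialog_de", "content_type": "dialog"}],
}
french = select_language(program, dialog_tracks, "fr")
```

Only the dialog object for the requested language needs to be delivered alongside the shared music and effects objects, which is where the file-size and download-time savings come from.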

Aspects of the audio environment described herein represent the playback of audio or audio/visual content through appropriate speakers and playback devices, and may represent any environment in which a listener is experiencing playback of the captured content, such as a cinema, concert hall, outdoor theater, home or room, listening booth, car, game console, headphone or headset system, public address (PA) system, or any other playback environment. Although embodiments have been described primarily with respect to examples and implementations in a home theater environment in which the spatial audio content is associated with television content, it should be noted that embodiments may also be implemented in other systems. The spatial audio content comprising object-based audio and channel-based audio may be used in conjunction with any related content (associated audio, video, graphics, etc.), or it may constitute standalone audio content. The playback environment may be any appropriate listening environment, from headphones or near-field monitors to small or large rooms, cars, open-air arenas, concert halls, and so on.

Aspects of the systems described herein may be implemented in an appropriate computer-based sound processing network environment for processing digital or digitized audio files. Portions of the adaptive audio system may include one or more networks that comprise any desired number of individual machines, including one or more routers (not shown) that serve to buffer and route the data transmitted among the computers. Such a network may be built on various different network protocols and may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof. In an embodiment in which the network comprises the Internet, one or more machines may be configured to access the Internet through web browser programs.

One or more of the components, blocks, processes, or other functional components may be implemented through a computer program that controls execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed herein may be described using any number of combinations of hardware and firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory), non-volatile storage media in various forms, such as optical, magnetic, or semiconductor storage media.

Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of "including, but not limited to." Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words "herein," "hereunder," "above," "below," and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word "or" is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.

While one or more implementations have been described by way of example and in terms of specific embodiments, it is to be understood that the one or more implementations are not limited to the disclosed embodiments. On the contrary, the description is intended to cover various modifications and similar arrangements that would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.

