This application claims priority to U.S. Provisional Patent Application No. 61/695,893, filed August 31, 2012, the entire contents of which are incorporated herein by reference.
Embodiments are directed to a reflected sound rendering system configured to work with a sound format and processing system, which may be referred to as a "spatial audio system" or "adaptive audio system," based on an audio format and rendering technology that allows enhanced audience immersion, greater artistic control, and system flexibility and scalability. An overall adaptive audio system generally comprises an audio encoding, distribution, and decoding system configured to generate one or more bitstreams containing both conventional channel-based audio elements and audio object coding elements. This combined approach provides greater coding efficiency and rendering flexibility compared to either channel-based or object-based approaches taken separately. An example of an adaptive audio system that may be used with the present embodiments is described in pending U.S. Provisional Patent Application 61/636,429, filed April 20, 2012, entitled "System and Method for Adaptive Audio Signal Generation, Coding and Rendering," the entire contents of which are hereby incorporated by reference.
An exemplary implementation of an adaptive audio system and associated audio format is the Atmos™ platform. Such a system incorporates a height (up/down) dimension that may be implemented as a 9.1 surround system, or a similar surround sound configuration. Figure 1 illustrates the speaker placement in a present surround system (e.g., 9.1 surround) that provides height speakers for playback of height channels. The speaker configuration of the 9.1 system 100 is composed of five speakers 102 in the floor plane and four speakers 104 in the height plane. In general, these speakers may be used to produce sound that is designed to emanate from virtually any position within the listening environment. Predefined speaker configurations, such as those shown in Figure 1, naturally limit the ability to accurately represent the position of a given sound source. For example, a sound source cannot be panned further left than the left speaker itself. This applies to every speaker, therefore forming a one-dimensional (e.g., left-right), two-dimensional (e.g., front-back), or three-dimensional (e.g., left-right, front-back, up-down) geometric shape in which the downmix is constrained. Various different speaker configurations and types may be used in such a speaker configuration. For example, certain enhanced audio systems may use speakers in a 9.1, 11.1, 13.1, 19.4, or other configuration. Speaker types may include full-range direct speakers, speaker arrays, surround speakers, subwoofers, tweeters, and other types of speakers.
An audio object can be considered to be a group of sound elements that may be perceived to emanate from one or more particular physical locations in the listening environment. Such objects can be static (i.e., stationary) or dynamic (i.e., moving). Audio objects are controlled by metadata that defines the position of the sound at a given point in time, along with other functions. When objects are played back, they are rendered using the speakers that are present according to the positional metadata, rather than necessarily being output to a predefined physical channel. A track in a session can be an audio object, and standard panning data is analogous to positional metadata. In this way, content placed on the screen may pan in effectively the same way as channel-based content, but content placed in the surrounds can be rendered to an individual speaker, if desired. While the use of audio objects provides the desired control over discrete effects, other aspects of a soundtrack may work effectively in a channel-based environment. For example, many ambient effects or reverberation actually benefit from being fed to arrays of speakers. Although these could be treated as objects with sufficient width to fill an array, it is beneficial to retain some channel-based functionality.
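The time-varying positional metadata described above can be illustrated with a small sketch. The class names, the keyframe structure, and the use of linear interpolation below are illustrative assumptions for this description, not the metadata format defined by the adaptive audio system itself:

```python
import bisect
from dataclasses import dataclass

@dataclass
class PositionKeyframe:
    """Hypothetical metadata entry: the object's position at one instant."""
    time: float   # seconds
    xyz: tuple    # normalized room coordinates, e.g. (0.0, 0.5, 0.0)

@dataclass
class AudioObject:
    """A sound element plus the metadata that drives its rendering."""
    samples: list    # mono audio payload (placeholder)
    keyframes: list  # PositionKeyframe entries, sorted by time

    def position_at(self, t: float) -> tuple:
        """Linearly interpolate the object's position metadata at time t,
        clamping to the first/last keyframe outside the defined range."""
        times = [k.time for k in self.keyframes]
        i = bisect.bisect_right(times, t)
        if i == 0:
            return self.keyframes[0].xyz
        if i == len(self.keyframes):
            return self.keyframes[-1].xyz
        a, b = self.keyframes[i - 1], self.keyframes[i]
        f = (t - a.time) / (b.time - a.time)
        return tuple(pa + f * (pb - pa) for pa, pb in zip(a.xyz, b.xyz))
```

At render time, a panner would query `position_at` for the current playback time and derive speaker gains from the returned coordinates, rather than reading a fixed channel assignment.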
The adaptive audio system is configured to support "beds" in addition to audio objects, where beds are effectively channel-based sub-mixes or stems. Depending on the intent of the content creator, these can be delivered individually, or combined into a single bed, for final playback (rendering). These beds can be created in different channel-based configurations, such as 5.1, 7.1, and 9.1, and in arrays that include overhead speakers, such as shown in Figure 1. Figure 2 illustrates the combination of channel-based and object-based data to produce an adaptive audio mix, under an embodiment. As shown in process 200, the channel-based data 202 (for example, 5.1 or 7.1 surround sound data provided in the form of pulse-code modulated (PCM) data) is combined with audio object data 204 to produce an adaptive audio mix 208. The audio object data 204 is produced by combining the elements of the original channel-based data with associated metadata specifying certain parameters pertaining to the location of the audio objects.
As conceptually shown in Figure 2, the authoring tools provide the ability to create audio programs that contain a combination of speaker channel groups and object channels simultaneously. For example, an audio program could contain one or more speaker channels optionally organized into groups (or tracks, e.g., a stereo or 5.1 track), descriptive metadata for one or more speaker channels, one or more object channels, and descriptive metadata for one or more object channels.
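As a rough illustration of process 200, a channel-based bed and a set of positional objects might be carried together in one mix structure. The function name, dictionary layout, and field names below are hypothetical, not the actual bitstream format:

```python
def make_adaptive_mix(bed_channels, audio_objects):
    """Combine a channel-based bed (e.g., 5.1/7.1 PCM stems keyed by
    channel label, cf. 202) with positional audio objects (cf. 204)
    into a single adaptive audio mix structure (cf. 208). Sketch only."""
    return {
        "bed": dict(bed_channels),       # rendered to fixed speaker positions
        "objects": list(audio_objects),  # rendered from positional metadata
    }

# A 5.1 bed plus one object; Ellipsis stands in for PCM sample data.
mix = make_adaptive_mix(
    {"L": ..., "R": ..., "C": ..., "Ls": ..., "Rs": ..., "LFE": ...},
    [{"audio": ..., "metadata": {"pos": (0.2, 0.9, 0.5)}}],
)
```

The point of keeping the two parts separate is that the bed can be remapped like legacy channel content, while each object is rendered individually from its metadata at playback time.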
As a means of distributing spatial audio, adaptive audio systems effectively move beyond simple "speaker feeds," and advanced model-based audio descriptions have been developed that allow the listener the freedom to select a playback configuration that suits their individual needs or budget, with the audio rendered specifically for their individually chosen configuration. At a high level, there are four main spatial audio description formats: (1) speaker feed, where the audio is described as signals intended for loudspeakers located at nominal speaker positions; (2) microphone feed, where the audio is described as signals captured by actual or virtual microphones in a predefined configuration (the number of microphones and their relative positions); (3) model-based description, where the audio is described in terms of a sequence of audio events at described times and positions; and (4) binaural, where the audio is described by the signals that arrive at the two ears of a listener.
The four description formats are often associated with the following common rendering technologies, where the term "rendering" means conversion to electrical signals used as speaker feeds: (1) panning, where the audio stream is converted to speaker feeds using a set of panning laws and known or assumed speaker positions (typically rendered prior to distribution); (2) Ambisonics, where the microphone signals are converted to feeds for a scalable array of loudspeakers (typically rendered after distribution); (3) Wave Field Synthesis (WFS), where sound sources are converted to the appropriate loudspeaker signals to synthesize the sound field (typically rendered after distribution); and (4) binaural, where the L/R binaural signals are delivered to the L/R ears, typically over headphones, but also over loudspeakers in conjunction with crosstalk cancellation.
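The panning laws mentioned in (1) can be illustrated with the conventional constant-power (sine/cosine) law for the two-speaker case. This is a textbook example given under the assumption of a symmetric stereo pair, not the specific law used by the system described here:

```python
import math

def constant_power_pan(x: float) -> tuple:
    """Constant-power stereo panning law. x is the pan position in
    [-1, 1], with -1 = hard left. Returns (left_gain, right_gain)
    satisfying L^2 + R^2 == 1, so the total radiated power (and thus
    perceived loudness) stays constant as the source is panned."""
    theta = (x + 1.0) * math.pi / 4.0   # map [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)

# A centered source is fed to both speakers at -3 dB (1/sqrt(2)).
l, r = constant_power_pan(0.0)
```

A simple linear crossfade (L = 1 - p, R = p) would instead dip in loudness at the center, which is why constant-power laws are the usual choice for speaker-feed rendering.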
In general, any format can be converted to another format (though this may require blind source separation or similar technology) and rendered using any of the aforementioned technologies; however, not all conversions yield good results in practice. The speaker-feed format is the most common because it is simple and effective. The best sonic results (i.e., the most accurate and reliable) are achieved by mixing/monitoring directly in the speaker feeds and then distributing them, because no processing is required between the content creator and the listener. If the playback system is known in advance, a speaker-feed description provides the highest fidelity; however, the playback system and its configuration are often not known beforehand. By contrast, the model-based description is the most adaptable because it makes no assumptions about the playback system and is therefore most easily applied to multiple rendering technologies. The model-based description can efficiently capture spatial information, but becomes very inefficient as the number of audio sources grows large.
The adaptive audio system combines the benefits of both channel-based and model-based systems, with distinct advantages including: high timbre quality; optimal reproduction of artistic intent when mixing and rendering using the same channel configuration; a single inventory with "downward" adaptation to the rendering configuration; relatively low impact on the system pipeline; and increased immersion via finer horizontal speaker spatial resolution and new height channels. The adaptive audio system provides several new features, including: a single inventory with downward and upward adaptation to a specific theater rendering configuration, that is, delayed rendering and optimal use of the speakers available in the playback environment; increased envelopment, including optimized downmixing to avoid inter-channel correlation (ICC) distortion; increased spatial resolution via steer-thru arrays (e.g., allowing an audio object to be dynamically assigned to one or more loudspeakers within a surround array); and increased front channel resolution via a high-resolution center or similar speaker configuration.
The spatial effects of audio signals are critical in providing an immersive experience for the listener. Sounds that are meant to emanate from a specific region of a viewing screen or the listening environment should be played through speakers located at the same relative position. As such, the primary audio metadatum of a sound event in a model-based description is position, though other parameters such as size, orientation, velocity, and acoustic dispersion can also be described. To convey position, a model-based 3D audio spatial description requires a 3D coordinate system. The coordinate system used for transmission (Euclidean, spherical, cylindrical) is generally chosen for convenience or compression; however, other coordinate systems may be used for the rendering processing.
In addition to a coordinate system, a frame of reference is required to represent the positions of objects in space. Selecting the proper frame of reference is critical for a system to accurately reproduce position-based sound in a variety of different environments. With an allocentric frame of reference, an audio source position is defined relative to features within the rendering environment, such as the room walls and corners, standard speaker locations, and screen location. In an egocentric frame of reference, positions are represented with respect to the perspective of the listener, such as "in front of me," "slightly to the left," and so on. Scientific studies of spatial perception (auditory and otherwise) have shown that the egocentric perspective is used almost universally. For cinema, however, an allocentric frame of reference is generally more suitable. For example, the precise location of an audio object is most important when there is an associated object on screen. When using an allocentric reference, for every listening position and for any screen size, the sound will localize at the same relative position on the screen, for example, "the left third of the middle of the screen." Another reason is that sound engineers tend to think and mix in allocentric terms, and panning tools are laid out with an allocentric frame of reference (i.e., the room walls), and sound engineers expect them to be rendered that way, for example, "this sound should be on screen," "this sound should be off screen," or "from the left wall," and so on.
Although an allocentric frame of reference is used in the cinema environment, there are some cases where an egocentric frame of reference may be useful and more appropriate. These include voice-overs, i.e., those sounds that are not present in the "story space," such as mood music, for which an egocentrically uniform presentation may be desired. Another case is near-field effects (e.g., a mosquito buzzing in the listener's left ear) that require an egocentric representation. In addition, an infinitely distant sound source (and the resulting plane wave) may appear to come from a constant egocentric position (e.g., 30 degrees to the left), and such a sound is easier to describe in egocentric terms than in allocentric terms. In some cases, an allocentric frame of reference can be used as long as a nominal listening position is defined, while some examples require an egocentric representation that is not yet possible to render. Though an allocentric reference may be more useful and appropriate, the audio representation should be extensible, since many new features, including egocentric representations, may be more desirable in certain applications and listening environments.
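The coordinate-system and frame-of-reference choices discussed above can be sketched as two simple conversions: a spherical-to-Cartesian transform (a transmission-friendly form mapped to a rendering-friendly one) and a translation from listener-relative (egocentric) to room (allocentric) coordinates. The axis conventions below, with azimuth measured to the left of straight ahead, are assumptions chosen for illustration:

```python
import math

def spherical_to_cartesian(r, azimuth_deg, elevation_deg):
    """Convert a (distance, azimuth, elevation) position -- a convenient
    form for transmission -- to Cartesian coordinates for rendering.
    Assumed convention: azimuth 0 = straight ahead, positive to the left;
    elevation 0 = the horizontal plane, positive upward."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = r * math.cos(el) * math.sin(az)   # left(+) / right(-)
    y = r * math.cos(el) * math.cos(az)   # front(+) / back(-)
    z = r * math.sin(el)                  # up(+) / down(-)
    return x, y, z

def egocentric_to_allocentric(ego_xyz, listener_xyz):
    """Re-express a listener-relative (egocentric) position in room
    (allocentric) coordinates by translating by the listener position;
    a rotation for listener orientation is omitted for brevity."""
    return tuple(e + l for e, l in zip(ego_xyz, listener_xyz))
```

For example, the plane-wave source "30 degrees to the left" mentioned above maps to a fixed direction vector regardless of distance, which is why it is naturally expressed egocentrically.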
An embodiment of the adaptive audio system includes a hybrid spatial description approach that includes a recommended channel configuration for optimal fidelity and for rendering diffuse or complex, multi-point sources (e.g., stadium crowds, ambience) using an egocentric reference, plus an allocentric, model-based sound description to efficiently enable increased spatial resolution and scalability. Figure 3 is a block diagram of a playback architecture for use in an adaptive audio system, under an embodiment. The system of Figure 3 includes processing blocks that perform legacy, object, and channel audio decoding, object rendering, channel remapping, and signal processing before the audio is sent to the post-processing and/or amplification and speaker stages.
The playback system 300 is configured to render and play back audio content that is generated through one or more capture, pre-processing, authoring, and codec components. An adaptive audio pre-processor may include source separation and content type detection functionality that automatically generates appropriate metadata through analysis of the input audio. For example, positional metadata may be derived from a multi-channel recording through an analysis of the relative levels of correlated input between channel pairs. Detection of content type, such as "speech" or "music," may be achieved, for example, by feature extraction and classification. Certain authoring tools allow the authoring of audio programs by optimizing the input and codification of the sound engineer's creative intent, allowing the final audio mix to be created once and optimized for playback in practically any playback environment. This can be accomplished through the use of audio objects and positional data that is associated and encoded with the original audio content. In order to accurately place sounds around an auditorium, the sound engineer needs control over how the sound will ultimately be rendered based on the actual constraints and features of the playback environment.
The adaptive audio system provides this control by allowing the sound engineer to change how the audio content is designed and mixed through the use of audio objects and positional data. Once the adaptive audio content has been authored and coded in the appropriate codec devices, it is decoded and rendered in the various components of the playback system 300.
As shown in Figure 3, (1) legacy surround-sound audio 302, (2) object audio 304 including object metadata, and (3) channel audio 306 including channel metadata are input to decoder states 308, 309 within processing block 310. The object metadata is rendered in object renderer 312, while the channel metadata may be remapped as necessary. Listening environment configuration information 307 is provided to the object renderer and channel remapping components. The hybrid audio data is then processed through one or more signal processing stages, such as equalizers and limiters 314, prior to being output to the B-chain processing stage 316 and played back through speakers 318. System 300 represents an example of a playback system for adaptive audio, and other configurations, components, and interconnections are also possible.
The system of Figure 3 illustrates an embodiment in which the renderer comprises a component that applies object metadata to the input audio channels to process object-based audio content along with optional channel-based audio content. Embodiments may also be directed to a case in which the input audio channels comprise only legacy channel-based content, and the renderer comprises a component that generates speaker feeds for transmission to a driver array in a surround-sound configuration. In this case, the input is not necessarily object-based content, but legacy 5.1 or 7.1 (or other non-object-based) content, such as provided in Dolby Digital or Dolby Digital Plus, or similar systems.
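A toy sketch of the Figure 3 dispatch: object streams are panned from their positional metadata, while channel streams are remapped onto the available speakers. The stream field names, the inverse-distance panner, and the fall-back-to-center remapping rule are all illustrative assumptions, not the behavior of components 310-316:

```python
def render_to_feeds(streams, speakers):
    """Produce one feed level per speaker from a mix of object-based and
    channel-based streams. `speakers` maps speaker name -> (x, y, z)
    position; each stream carries a "type", a "level", and either a
    positional "pos" (objects) or a "channel" label (channel beds)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    feeds = {name: 0.0 for name in speakers}
    for s in streams:
        if s["type"] == "object":
            # Toy object panner: weight each speaker by inverse distance
            # to the position supplied in the object's metadata.
            w = {n: 1.0 / (1e-3 + dist(s["pos"], p)) for n, p in speakers.items()}
            total = sum(w.values())
            for n in feeds:
                feeds[n] += s["level"] * w[n] / total
        else:
            # Channel stream: remap its label onto an available speaker,
            # falling back to the center speaker if the label is absent.
            target = s["channel"] if s["channel"] in speakers else "C"
            feeds[target] += s["level"]
    return feeds
```

The remapping branch corresponds to the channel-remapping path in Figure 3, and the object branch to the object renderer 312 driven by the configuration information 307.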
Playback Applications
As described above, an initial implementation of the adaptive audio format and system is in the digital cinema (D-cinema) context, which includes content capture (objects and channels) that is authored using novel authoring tools, packaged using an adaptive audio cinema encoder, and distributed using PCM or a proprietary lossless codec using the existing Digital Cinema Initiative (DCI) distribution mechanism. In this case, the audio content is intended to be decoded and rendered in a digital cinema to create an immersive spatial audio cinema experience. However, as with previous cinema improvements (such as analog surround sound, digital multi-channel audio, and so on), there is an imperative to deliver the enhanced user experience provided by the adaptive audio format directly to users in their homes. This requires that certain characteristics of the format and system be adapted for use in more limited listening environments. For example, a home, room, small auditorium, or similar venue may have reduced space, acoustic properties, and equipment capabilities as compared to a cinema or theater environment.
For purposes of description, the term "consumer-based environment" is intended to include any non-cinema environment that comprises a listening environment for use by ordinary consumers or professionals, such as a house, studio, room, console area, auditorium, and the like. The audio content may be sourced and rendered alone, or it may be associated with graphics content, e.g., still pictures, light displays, video, and so on.
Figure 4A is a block diagram illustrating the functional components for adapting cinema-based audio content for use in a listening environment, under an embodiment. As shown in Figure 4A, in block 402, cinema content that typically comprises a motion picture soundtrack is captured and/or authored using appropriate equipment and tools. In an adaptive audio system, this content is processed through encoding/decoding and rendering components and interfaces in block 404. The resulting object and channel audio feeds are then sent to the appropriate speakers in a cinema or theater 406. In system 400, the cinema content is also processed for playback in a listening environment 416, such as a home theater system. It is presumed that the listening environment is not as comprehensive or capable of reproducing all of the sound content as intended by the content creator, due to limited space, a reduced number of speakers, and so on.
However, embodiments are directed to systems and methods that allow the original audio content to be rendered in a manner that minimizes the restrictions imposed by the reduced capacity of the listening environment, and allow the positional cues to be processed in a manner that maximizes the available equipment. As shown in Figure 4A, the cinema audio content is processed through a cinema-to-consumer translator component 408, where it is processed in a consumer content coding and rendering chain 414. This chain also processes original audio content that is captured and/or authored in block 412. The original content and/or the translated cinema content is then played back in the listening environment 416. In this way, the relevant spatial information that is coded in the audio content may be used to render the sound in a more immersive manner, even using the possibly limited speaker configuration of the home or listening environment 416.
Figure 4B illustrates the components of Figure 4A in greater detail. Figure 4B illustrates an example distribution mechanism for adaptive audio cinema content throughout an audio playback ecosystem. As shown in diagram 420, original cinema and TV content is captured 422 and authored 423 for playback in a variety of different environments to provide a cinema experience 427 or consumer environment experiences 434. Likewise, certain user generated content (UGC) or consumer content is captured 423 and authored 425 for playback in the listening environment 434. Cinema content for playback in the cinema environment 427 is processed through known cinema processes 426. In system 420, however, the output of the cinema authoring tools block 423 also consists of audio objects, audio channels, and metadata that convey the artistic intent of the sound engineer. This can be thought of as a mezzanine-style audio package that can be used to create multiple versions of the cinema content for playback. In an embodiment, this functionality is provided by a cinema-to-consumer adaptive audio translator 430. This translator has an input to the adaptive audio content and distills from it the appropriate audio and metadata content for the desired consumer endpoint 434.
Depending on the distribution mechanism and endpoint, the translator creates separate, and possibly different, audio and metadata outputs.
As shown in the example of system 420, the theater-to-consumer translator 430 feeds the sound-for-picture (broadcast, disc, OTT, etc.) and game audio bitstream creation modules 428. These two modules, suitable for delivering theater content, can feed into multiple distribution pipelines 432, all of which can deliver to consumer endpoints. For example, adaptive audio cinema content can be encoded using a codec suitable for broadcast purposes, such as Dolby Digital Plus, modified to convey channels, objects, and associated metadata, and transmitted through the broadcast chain via cable or satellite, then decoded and rendered in the home for home theater or TV playback. Similarly, the same content can be encoded using a codec suitable for bandwidth-limited online distribution, transmitted over a 3G or 4G mobile network, and then decoded and rendered for playback on a mobile device using headphones. Other content sources such as TV, live broadcast, games, and music can also use the adaptive audio format to create and provide content in a next-generation audio format.
The system of FIG. 4B provides an enhanced user experience throughout the consumer audio ecosystem, which may include home theater (A/V receiver, speakers, and Blu-ray), E-media (PC, tablet, and mobile devices including headphone playback), broadcast (TV and set-top box), music, games, live sound, user-generated content ("UGC"), and so on. Such a system provides: enhanced immersion for listeners on all endpoint devices, expanded artistic control for audio content creators, improved content-dependent (descriptive) metadata for improved rendering, expanded flexibility and scalability of playback systems, timbre maintenance and matching, and the opportunity for dynamic rendering of content based on user position and interaction. The system includes several components, including new mixing tools for content creators, updated and new packaging and encoding tools for distribution and playback, in-home dynamic mixing and rendering (adapted to different configurations), and additional speaker locations and designs.
The adaptive audio ecosystem is configured to be a comprehensive, end-to-end, next-generation audio system using the adaptive audio format, encompassing content creation, packaging, distribution, and playback/rendering across a large number of endpoint devices and use cases. As shown in FIG. 4B, the system authors content captured from, and intended for, a number of different use cases 422 and 424. These capture points include all relevant content formats, including cinema, TV, live broadcast (and sound), UGC, games, and music.
Content, as it travels through the ecosystem, passes through several key stages: pre-processing and authoring tools; translation tools (i.e., translation of adaptive audio content for cinema into consumer content distribution applications); specific adaptive audio packaging/bitstream encoding (capturing the audio essence data as well as additional metadata and audio reproduction information); distribution encoding using existing or new codecs (e.g., DD+, TrueHD, Dolby Pulse) for efficient distribution over various audio channels; transmission through the associated distribution channels (broadcast, disc, mobile, Internet, etc.); and finally endpoint-aware dynamic rendering to reproduce and convey the adaptive audio user experience defined by the content creator, providing the benefits of a spatial audio experience. The adaptive audio system can be used during rendering to a widely varying number of consumer endpoints, and the rendering technique applied can be optimized depending on the endpoint device. For example, home theater systems and soundbars may have 2, 3, 5, 7, or even 9 individual speakers in various locations. Many other types of systems have only two speakers (TV, laptop, music dock), and almost all commonly used devices have a headphone output (PC, laptop, tablet, mobile phone, music player, etc.).
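The endpoint-dependent optimization of the rendering technique described above can be sketched, for illustration only, as a simple dispatch table. The endpoint categories and strategy names here are assumptions made for the sketch, not part of any adaptive audio specification or bitstream syntax.

```python
# Hypothetical endpoint categories mapped to rendering strategies; the names
# are illustrative only, not part of the adaptive audio system's actual API.
RENDER_STRATEGIES = {
    "home_theater_9ch": "discrete speaker rendering with height channels",
    "soundbar_5ch": "discrete rendering plus reflected-sound virtualization",
    "tv_2ch": "two-speaker virtualization",
    "headphones": "binaural rendering",
}

def pick_renderer(endpoint: str) -> str:
    """Select a rendering technique for a given endpoint device class,
    falling back to a plain stereo downmix for unknown devices."""
    return RENDER_STRATEGIES.get(endpoint, "stereo downmix")
```

The same object-and-metadata content thus drives very different rendering back ends, which is the flexibility the translated bitstream is intended to preserve.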
Current authoring and distribution systems for surround sound audio create and deliver audio intended for reproduction at predefined, fixed speaker positions, with limited knowledge of the type of content conveyed in the audio essence (i.e., the actual audio played back by the reproduction system). The adaptive audio system, however, provides a new, hybrid approach to audio creation that includes options for both fixed-speaker-position-specific audio (left channel, right channel, etc.) and object-based audio elements having generalized 3D spatial information, including position, size, and velocity. This hybrid approach provides a balance between fidelity (provided by fixed speaker positions) and flexibility in rendering (generalized audio objects). The system also provides additional useful information about the audio content via new metadata that is paired with the audio essence by the content creator at the time of content creation/authoring. This information provides detailed information about the attributes of the audio that can be used during rendering.
Such attributes may include content type (dialogue, music, effects, Foley, background/ambience, etc.) as well as audio object information such as spatial attributes (3D position, object size, velocity, etc.) and useful rendering information (alignment to speaker position, channel weights, gain, bass management information, etc.). The audio content and reproduction intent metadata can be created manually by the content creator or through the use of automatic media intelligence algorithms that can run in the background during authoring and, if desired, be reviewed by the content creator during a final quality control phase.
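For illustration, the attribute set described above could be collected into a metadata record along the following lines. The field names and defaults are assumptions made for this sketch, not the actual metadata syntax of the adaptive audio format.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectMetadata:
    """Illustrative metadata record for one audio object (hypothetical
    field names; not the actual adaptive audio metadata syntax)."""
    content_type: str                    # "dialogue", "music", "effects", ...
    position: tuple = (0.0, 0.0, 0.0)    # 3D position (x, y, z)
    size: float = 0.0                    # 0 = point source, larger = diffuse
    velocity: tuple = (0.0, 0.0, 0.0)    # direction and speed of movement
    snap_to_speaker: bool = False        # align object to a speaker position
    channel_weights: dict = field(default_factory=dict)
    gain_db: float = 0.0
    bass_managed: bool = True

# A dialogue object anchored near screen center:
dialog = ObjectMetadata(content_type="dialogue", position=(0.5, 0.9, 0.0))
```

A renderer can inspect such a record at playback time, for example treating `content_type` differently for dialogue versus ambience, or honoring `snap_to_speaker` as discussed later for discrete speaker layouts.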
FIG. 4C is a block diagram of the functional components of an adaptive audio environment, under an embodiment. As shown in diagram 450, the system processes an encoded bitstream 452 that carries a hybrid object-based and channel-based audio stream. The bitstream is processed by a rendering/signal-processing block 454. In an embodiment, at least a portion of this functional block may be implemented in the rendering block 312 shown in FIG. 3. The rendering function 454 implements various rendering algorithms for adaptive audio as well as certain post-processing algorithms, such as upmixing and the processing of direct versus reflected sound. Output from the renderer is provided to the speakers 458 via a bi-directional interconnect 456. In an embodiment, the speakers 458 comprise a number of individual drivers that may be arranged in a surround sound or similar configuration. The drivers are individually addressable and may be embodied in a single enclosure or in multiple-driver cabinets or arrays. The system 450 may also include a microphone 460 that provides measurements of listening environment or room characteristics for calibrating the rendering process. System configuration and calibration functions are provided in block 462.
These functions can be included as part of the rendering component, or they can be implemented as a separate component functionally coupled to the renderer. Bi-directional interconnect 456 provides a feedback signal path from the speakers in the listening environment back to calibration component 462 .
Listening Environment
Implementations of the adaptive audio system may be deployed in a variety of different listening environments. These include three primary areas of audio playback applications: home theater systems, televisions and soundbars, and headphones. FIG. 5 illustrates the deployment of an adaptive audio system in an exemplary home theater environment. The system of FIG. 5 illustrates a superset of the components and functions that may be provided by the adaptive audio system, and certain aspects may be reduced or removed based on the user's needs while still providing an enhanced experience. The system 500 includes various speakers and drivers in a variety of different cabinets or arrays 504. The speakers include individual drivers that provide front-firing, side-firing, and upward-firing options, as well as dynamic virtualization of the audio using certain audio processing techniques. Diagram 500 shows a number of speakers deployed in a standard 9.1 speaker configuration. These include left and right height speakers (LH, RH), left and right speakers (L, R), a center speaker (shown as a modified center speaker), and left and right surround and back speakers (LS, RS, LB, and RB; the low-frequency element LFE is not shown).
FIG. 5 shows the use of a center channel speaker 510 in a central location of the listening environment. In an embodiment, this speaker is implemented using a modified center channel or high-resolution center channel 510. Such a speaker may be a front-firing center channel array with individually addressable speakers that allow discrete panning of audio objects through the array, matching the movement of video objects on the screen. It may be embodied as a high-resolution center channel (HRC) speaker, such as that described in International Application No. PCT/US2011/028783, which is hereby incorporated by reference in its entirety. The HRC speaker 510 may also include side-firing speakers, as shown. These can be activated and used if the HRC speaker is used not only as a center speaker but also as a speaker with soundbar capabilities. The HRC speaker may also be incorporated above and/or to the sides of the screen 502 to provide a two-dimensional, high-resolution panning option for audio objects. The center speaker 510 may also include additional drivers and implement a steerable sound beam with separately controlled sound zones.
The system 500 also includes a near-field effect (NFE) speaker 512, which may be located in front of, or close in front of, the listener, such as on a table in front of the seating position. With adaptive audio, audio objects can be brought into the room rather than merely being locked to the perimeter of the room. Having objects traverse three-dimensional space is therefore an option. One example is an object that originates in the L speaker, travels through the listening environment via the NFE speaker, and terminates in the RS speaker. A variety of different speakers may be suitable for use as an NFE speaker, such as a wireless, battery-powered speaker.
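The L-to-NFE-to-RS trajectory described above can be sketched with a piecewise-linear object path and a simple distance-based panning law. The 2D room coordinates and the inverse-distance panning here are assumptions for illustration; an actual renderer would use the system's own panning algorithms.

```python
import math

# Assumed 2D room coordinates in meters (x across the room, y front-to-back).
SPEAKERS = {"L": (-2.0, 3.0), "NFE": (0.0, 0.5), "RS": (2.0, -2.0)}

def object_position(t):
    """Piecewise-linear path: at the L speaker at t=0, at the NFE speaker
    at t=0.5, and at the RS speaker at t=1."""
    if t <= 0.5:
        a, b, u = SPEAKERS["L"], SPEAKERS["NFE"], t / 0.5
    else:
        a, b, u = SPEAKERS["NFE"], SPEAKERS["RS"], (t - 0.5) / 0.5
    return (a[0] + u * (b[0] - a[0]), a[1] + u * (b[1] - a[1]))

def speaker_gains(pos):
    """Naive inverse-distance panning, normalized to unit total power."""
    w = {n: 1.0 / (math.dist(pos, p) + 1e-3) for n, p in SPEAKERS.items()}
    norm = math.sqrt(sum(g * g for g in w.values()))
    return {n: g / norm for n, g in w.items()}

# At t=0 the object coincides with the L speaker, so L dominates;
# by t=0.5 the NFE speaker carries most of the signal.
gains_start = speaker_gains(object_position(0.0))
gains_mid = speaker_gains(object_position(0.5))
```

Stepping `t` from 0 to 1 at the audio block rate would move the perceived source smoothly from the front-left, past the listener via the NFE speaker, and out to the right surround.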
FIG. 5 also illustrates the use of dynamic speaker virtualization to provide an immersive user experience in the home theater environment. Dynamic speaker virtualization is enabled through dynamic control of the speaker virtualization algorithm parameters based on object spatial information provided by the adaptive audio content. This dynamic virtualization is shown in FIG. 5 for the L and R speakers, where it is natural to consider it for creating the perception of objects moving along the sides of the listening environment. A separate virtualizer may be used for each relevant object, and the combined signal can be sent to the L and R speakers to create a multiple-object virtualization effect. The dynamic virtualization effects are shown for the L and R speakers, as well as for the NFE speaker, which is intended to be a stereo speaker (with two independent inputs). This speaker, along with audio object size and position information, can be used to create either a diffuse or a point-source near-field audio experience. Similar virtualization effects can also be applied to any or all of the other speakers in the system.
In an embodiment, a camera may provide additional listener position and identity information that can be used by the adaptive audio renderer to provide a more compelling experience that is more faithful to the sound engineer's artistic intent.
The adaptive audio renderer understands the spatial relationship between the mix and the playback system. In some instances of a playback environment, discrete speakers may be available in all relevant areas of the listening environment, including overhead positions, as shown in FIG. 1. In these cases where discrete speakers are available at certain locations, the renderer can be configured to "snap" objects to the closest speakers instead of creating a phantom image between two or more speakers through panning or the use of speaker virtualization algorithms. While this slightly distorts the spatial representation of the mix, it also allows the renderer to avoid unintended phantom images. For example, if the angular position of the mixing stage's left speaker does not correspond to the angular position of the playback system's left speaker, enabling this function would avoid having a constant phantom image of the initial left channel.
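The snap-versus-pan decision described above can be sketched as follows. The speaker coordinates, the snap radius, and the fallback inverse-distance panning law are all assumptions made for illustration; the actual renderer's panning algorithms are not specified here.

```python
import math

def render_object(obj_pos, speakers, snap=True, snap_radius=1.5):
    """Return per-speaker gains for one audio object. With snapping enabled
    and a speaker close enough, the object is placed entirely in the nearest
    speaker instead of being phantom-imaged between several speakers."""
    dists = {name: math.dist(obj_pos, pos) for name, pos in speakers.items()}
    nearest = min(dists, key=dists.get)
    if snap and dists[nearest] <= snap_radius:
        return {name: (1.0 if name == nearest else 0.0) for name in speakers}
    # Otherwise fall back to a naive inverse-distance pan (phantom imaging).
    w = {name: 1.0 / (d + 1e-3) for name, d in dists.items()}
    norm = math.sqrt(sum(g * g for g in w.values()))
    return {name: g / norm for name, g in w.items()}

spk = {"L": (-2.0, 3.0), "R": (2.0, 3.0), "LS": (-2.0, -2.0)}
gains = render_object((-1.5, 2.5), spk)  # close to L, so it snaps to L
```

The snap branch trades a small positional error for the guarantee that the object is reproduced by a single physical source, which is exactly the trade-off the paragraph above describes.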
In many cases, however, and especially in the home environment, certain speakers, such as ceiling-mounted overhead speakers, are not available. In this case, certain virtualization techniques are implemented by the renderer to reproduce overhead audio content through existing floor- or wall-mounted speakers. In an embodiment, the adaptive audio system includes a modification to the standard configuration through the inclusion of both a front-firing capability and a top-firing (or "upward-firing") capability for each speaker. In traditional home applications, speaker manufacturers have attempted to introduce new driver configurations other than front-firing transducers and have been confronted with the problem of trying to identify which of the original audio signals (or modifications to them) should be sent to these new drivers. With the adaptive audio system, there is very specific information regarding which audio objects should be rendered above the standard horizontal plane. In an embodiment, height information present in the adaptive audio system is rendered using the upward-firing drivers. Likewise, the side-firing speakers can be used to render certain other content, such as ambience effects.
One advantage of the upward-firing drivers is that they can be used to reflect sound off of a hard ceiling surface to simulate the presence of overhead/height speakers positioned in the ceiling. A compelling attribute of the adaptive audio content is that spatially diverse audio can be reproduced using an array of overhead speakers. As stated above, however, in many cases installing overhead speakers is too expensive or impractical in a home environment. By simulating height speakers using speakers normally positioned in the horizontal plane, a compelling 3D experience can be created with easily positioned speakers. In this case, the adaptive audio system uses the upward-firing/height-simulating drivers in a new way, in which audio objects and their spatial reproduction information are used to create the audio that is reproduced by the upward-firing drivers.
FIG. 6 illustrates the use of an upward-firing driver using reflected sound to simulate a single overhead speaker in a home theater. It should be noted that any number of upward-firing drivers could be used in combination to create multiple simulated height speakers. Alternatively, a number of upward-firing drivers may be configured to transmit sound to substantially the same spot on the ceiling to achieve a certain sound intensity or effect. Diagram 600 shows an example in which the usual listening position 602 is located at a particular place within a listening environment. The system does not include any height speakers for transmitting audio content containing height cues. Instead, the speaker cabinet or speaker array 604 includes an upward-firing driver along with the front-firing driver(s). The upward-firing driver is configured (with respect to location and inclination angle) to send its sound wave 606 up to a particular point on the ceiling 608, where it will be reflected back down to the listening position 602. It is assumed that the ceiling is made of an appropriate material and composition to adequately reflect sound down into the listening environment.
The relevant characteristics of the upward-firing driver (e.g., size, power, location, etc.) may be selected based on the ceiling composition, room size, and other relevant characteristics of the listening environment. Although only one upward-firing driver is shown in FIG. 6, multiple upward-firing drivers may be incorporated into a reproduction system in some embodiments.
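The required inclination of the upward-firing driver can be estimated with the standard mirror-image method: aim the driver at the listener's reflection in the ceiling plane, so that the angle of incidence equals the angle of reflection at the bounce point. The specific room dimensions below are assumptions chosen for illustration.

```python
import math

def upfiring_tilt_deg(speaker_h, listener_h, ceiling_h, horiz_dist):
    """Elevation angle (degrees above horizontal) at which an upward-firing
    driver should aim so that its single ceiling bounce arrives at the
    listening position, assuming a flat, acoustically reflective ceiling.
    Mirror-image method: aim at the listener mirrored across the ceiling."""
    image_h = 2 * ceiling_h - listener_h  # listener's mirror-image height
    return math.degrees(math.atan2(image_h - speaker_h, horiz_dist))

# Example: driver 0.9 m above the floor, ear height 1.1 m,
# ceiling at 2.7 m, listener 3 m away horizontally.
angle = upfiring_tilt_deg(0.9, 1.1, 2.7, 3.0)  # roughly 49 degrees
```

For typical room dimensions such as these, the result falls within the 30-to-60-degree tilt range that the embodiments contemplate for the upward-firing driver.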
In an embodiment, the adaptive audio system utilizes upward-firing drivers to provide the height element. In general, it has been shown that incorporating signal processing to introduce perceptual height cues into the audio signal being fed to the upward-firing drivers improves the positioning and perceived quality of the virtual height signal. For example, a parametric perceptual binaural hearing model has been developed to create a height cue filter, which, when used to process audio being reproduced by an upward-firing driver, improves the perceived quality of that reproduction. In an embodiment, the height cue filter is derived from both the physical speaker location (approximately level with the listener) and the reflected speaker location (above the listener). For the physical speaker location, a directional filter is determined based on a model of the outer ear (or pinna).
An inverse of this filter is next determined and used to remove the height cues from the physical speaker. Next, for the reflected speaker location, a second directional filter is determined using the same model of the outer ear. This filter is applied directly, essentially reproducing the cues the ear would receive if the sound were above the listener. In practice, these filters may be combined in a way that allows a single filter to both (1) remove the height cues from the physical speaker location and (2) insert the height cues from the reflected speaker location. FIG. 16 is a graph that illustrates the frequency response of such a combined filter. The combined filter may be used in a manner that allows for some adjustability with respect to the aggressiveness or amount of filtering that is applied. For example, in certain cases, it may be beneficial not to fully remove the physical speaker height cues or fully apply the reflected speaker height cues, since only some of the sound from the physical speaker reaches the listener directly (the remainder being reflected off the ceiling).
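The combination and its adjustability can be sketched per frequency bin as follows. The two directional magnitude responses used here are made-up placeholder values, not measured pinna data, and the exponent-based blending is one plausible way to scale the filter's aggressiveness, not necessarily the method used in the embodiments.

```python
def combined_height_filter(h_physical, h_reflected, amount=1.0):
    """Per-bin magnitude of a combined height-cue filter: divide out the
    physical (ear-level) directional cue and apply the reflected (overhead)
    cue. `amount` in [0, 1] scales the aggressiveness: 0 leaves the signal
    unfiltered, 1 applies the full cue swap. Partial values account for the
    direct sound from the physical speaker that still reaches the listener."""
    out = []
    for hp, hr in zip(h_physical, h_reflected):
        full = hr / max(hp, 1e-6)   # remove physical cue, insert overhead cue
        out.append(full ** amount)
    return out

# Placeholder magnitude responses over a few bins (illustrative values only).
h_phys = [0.7, 1.0, 1.3, 1.1]   # stand-in for the ear-level directional cue
h_refl = [0.9, 1.2, 0.8, 1.0]   # stand-in for the overhead directional cue

h_full = combined_height_filter(h_phys, h_refl, amount=1.0)
h_off = combined_height_filter(h_phys, h_refl, amount=0.0)  # flat response
```

With `amount=0` the filter is flat (no cue manipulation), and intermediate values interpolate toward the full cue swap, mirroring the adjustability described above.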
Speaker Configuration
A main consideration of the adaptive audio system is the speaker configuration. The system utilizes individually addressable drivers, and an array of such drivers is configured to provide a combination of both direct and reflected sound sources. A bi-directional link to the system controller (e.g., A/V receiver, set-top box) allows audio and configuration data to be sent to the speakers, and speaker and sensor information to be sent back to the controller, creating an active, closed-loop system.
For purposes of description, the term "driver" means a single electroacoustic transducer that produces sound in response to an electrical audio input signal. A driver may be implemented in any appropriate type, geometry, and size, and may include horns, cones, ribbon transducers, and the like. The term "speaker" means one or more drivers in a unitary enclosure. FIG. 7A illustrates a speaker having a plurality of drivers in a first configuration, under an embodiment. As shown in FIG. 7A, a speaker enclosure 700 has a number of individual drivers mounted within the enclosure.
Typically, the enclosure will include one or more front-firing drivers 702, such as woofers, midrange drivers, or tweeters, or any combination thereof. One or more side-firing drivers 704 may also be included. The front-firing and side-firing drivers are typically mounted flush with the sides of the enclosure so that they project sound perpendicularly outward from the vertical plane defined by the speaker, and these drivers are usually permanently fixed within the cabinet 700. For an adaptive audio system that features the rendering of reflected sound, one or more upwardly tilted drivers 706 are also provided. These drivers are positioned such that they project sound at an angle up to the ceiling, where it can then be bounced back down to a listener, as shown in FIG. 6. The degree of tilt may be set depending on listening environment characteristics and system requirements. For example, the upward driver 706 may be tilted up between 30 and 60 degrees and may be positioned above the front-firing driver 702 in the speaker enclosure 700 so as to minimize interference with the sound waves produced by the front-firing driver 702. The upward-firing driver 706 may be installed at a fixed angle, or it may be installed such that the tilt angle can be adjusted manually. Alternatively, a servo mechanism may be used to allow automatic or electrical control of the tilt angle and projection direction of the upward-firing driver. For certain sounds, such as ambient sound, the upward-firing driver may be pointed straight up out of an upper surface of the speaker enclosure 700 to create what might be referred to as a "top-firing" driver. In this case, depending on the acoustic characteristics of the ceiling, a large component of the sound may be reflected back down onto the speaker. In most cases, however, some tilt angle is typically used to help project the sound through reflection off the ceiling to a different, or multiple central, locations within the listening environment, as shown in FIG. 6.
Figure 7A is intended to illustrate one example of a speaker and driver configuration, and many other configurations are possible. For example, an upward-firing driver may be provided in its own enclosure to allow use with existing speakers. Figure 7B shows a speaker system having drivers distributed in multiple enclosures, under an embodiment. As shown in Figure 7B, an upward-firing driver 712 is provided in a separate enclosure 710, which may be located near or on top of an enclosure 714 having front-firing and/or side-firing drivers 716 and 718. The drivers may also be enclosed within a speaker cabinet, such as is used in many home theater environments, in which a number of small or medium-sized drivers are arrayed along a single axis within a single horizontal or vertical enclosure. Figure 7C shows the placement of drivers within such a cabinet, under an embodiment. In this example, the cabinet enclosure 730 is a horizontal cabinet that includes a side-firing driver 734, an upward-firing driver 736, and a front-firing driver 732. Figure 7C is intended to be only one example configuration, and any practical number of drivers may be used for each function - front-firing, side-firing, and upward-firing.
With regard to the embodiments of Figures 7A-C, it should be noted that the drivers may be of any suitable shape, size, and type, depending on the required frequency response characteristics as well as any other relevant constraints, such as size, power rating, component cost, and so on.
In a typical adaptive audio environment, a number of speaker enclosures will be contained within the listening environment. Figure 8 shows an example placement of speakers having individually addressable drivers, including upward-firing drivers, placed within a listening environment. As shown in Figure 8, the listening environment 800 includes four individual speakers 806, each having at least one front-firing, side-firing, and upward-firing driver. The listening environment may also contain fixed drivers used for surround sound applications, such as a center speaker 802 and a subwoofer or LFE 804. As can be seen in Figure 8, depending on the size of the listening environment and of the individual speaker units, the proper placement of the speakers 806 within the listening environment can provide a rich audio environment produced by the sound from the many upward-firing drivers reflecting off the ceiling. Depending on the content, listening environment size, listener position, acoustic characteristics, and other relevant parameters, the speakers may be aimed to provide reflection from one or more points on the ceiling plane.
Speakers used in an adaptive audio system for a home theater or similar listening environment may use a configuration based on existing surround sound configurations (e.g., 5.1, 7.1, 9.1, etc.). In this case, a number of drivers are provided and defined according to the known surround sound conventions, with additional drivers and definitions provided for the upward-firing sound components.
Figure 9A shows a speaker configuration for an adaptive audio 5.1 system using multiple addressable drivers for reflected audio, under an embodiment. In configuration 900, a standard 5.1 speaker complement comprising LFE 901, center speaker 902, L/R front speakers 904/906, and L/R rear speakers 908/910 is provided with eight additional drivers, giving a total of 14 individually addressable drivers. In each speaker unit 902-910, these eight additional drivers are the drivers labeled "upward" and "sideward," in addition to the drivers labeled "forward" (or "front"). The direct forward drivers would be driven by sub-channels containing adaptive audio objects and any other components that are designed to be highly directional. The upward-firing (reflected) drivers could contain sub-channel content that is more omnidirectional or directionless, but are not so limited. Examples would include background music or ambient sounds. If the input to the system comprises legacy surround sound content, then this content could be intelligently factored into direct and reflected sub-channels and fed to the appropriate drivers.
For the direct sub-channels, the speaker enclosure would contain drivers whose center axis bisects the "sweet spot," or acoustic center, of the listening environment. The upward-firing drivers would be positioned such that the angle between the driver's median plane and the acoustic center is some angle in the range of 45 to 180 degrees. In the case of positioning a driver at 180 degrees, the rear-facing driver can provide sound diffusion by reflecting off a rear wall. This configuration utilizes the acoustic principle that, after time-aligning the upward-firing drivers with the direct drivers, the early-arriving signal components will be coherent, while the late-arriving components will benefit from the natural diffusion provided by the listening environment.
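The time alignment mentioned above amounts to delaying the direct driver by the extra propagation time of the longer reflected path. A minimal sketch (the distances are hypothetical, and the speed of sound is taken at roughly room temperature):

```python
SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def alignment_delay_ms(direct_dist, driver_to_ceiling, ceiling_to_listener):
    """Delay (in ms) to apply to the direct driver so that its wavefront
    arrives coherently with the ceiling-reflected wavefront from the
    upward-firing driver. Distances are in metres."""
    reflected_dist = driver_to_ceiling + ceiling_to_listener
    path_diff = reflected_dist - direct_dist
    return 1000.0 * path_diff / SPEED_OF_SOUND

# Example: 3 m direct path vs. a 2 m + 2.5 m reflected path (assumed)
delay = alignment_delay_ms(3.0, 2.0, 2.5)  # about 4.37 ms
```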
To achieve the height cues provided by the adaptive audio system, the upward-firing drivers can be angled upward from the horizontal plane, and in the extreme case can be positioned to radiate straight up and be reflected by one or more reflective surfaces, such as a flat ceiling or an acoustic diffuser placed directly above the enclosure. To provide additional directionality, the center speaker could utilize a cabinet configuration (such as that shown in Figure 7C) with the ability to steer sound across the screen to provide a high-resolution center channel.
The 5.1 configuration of Figure 9A can be expanded by adding two additional rear enclosures, similar to a standard 7.1 configuration. Figure 9B shows a speaker configuration for an adaptive audio 7.1 system using multiple addressable drivers for reflected audio, under such an embodiment. As shown in configuration 920, the two additional enclosures 922 and 924 are placed in the "left side surround" and "right side surround" positions, with the side speakers pointing toward the side walls in a fashion similar to the front enclosures, and with the upward-firing drivers set to bounce off the ceiling midway between the existing front and rear pairs. Such incremental additions can be made as many times as desired, with the additional pairs filling the gaps along the side or rear walls. Figures 9A and 9B show only some examples of possible configurations of extended surround sound speaker layouts that can be used in conjunction with upward-firing and side-firing speakers in an adaptive audio system for listening environments, and many others are possible.
As an alternative to the n.1 configurations described above, a more flexible pod-based system may be utilized, whereby each driver is contained within its own enclosure, which can then be mounted in any convenient location. This would use a driver configuration such as that shown in Figure 7B. These individual units may then be clustered in a manner similar to the n.1 configurations, or they could be spread individually around the listening environment. The pods are not necessarily restricted to being placed at the edges of the listening environment; they could also be placed on any surface within it (e.g., a coffee table, bookshelf, etc.). Such a system would be easy to expand, allowing the user to add more speakers over time to create a more immersive experience. If the speakers are wireless, then the pod system could include the ability to dock the speakers for recharging purposes. In this design, the pods could be docked together so that they act as a single speaker while they recharge, perhaps for listening to stereo music, and then undocked and positioned around the listening environment for adaptive audio content.
To enhance the configurability and accuracy of the adaptive audio system using the upward-firing addressable drivers, a number of sensors and feedback devices could be added to the enclosures to inform the renderer of characteristics that can be used in the rendering algorithm. For example, a microphone installed in each enclosure would allow the system to measure the phase, frequency, and reverberation characteristics of the listening environment, together with the position of the speakers relative to each other, using triangulation and HRTF-like functions of the enclosures themselves. Inertial sensors (e.g., gyroscopes, compasses, etc.) can be used to detect the direction and angle of the enclosures, and optical and visual sensors (e.g., using a laser-based infrared rangefinder) can be used to provide positional information relative to the listening environment itself. These represent just a few of the possibilities of additional sensors that could be used in the system, and others are possible as well.
Such sensor systems can be further enhanced by allowing the position of the drivers and/or the acoustic modifiers of the enclosures to be automatically adjustable via electromechanical servos. This would allow the directionality of the drivers to be changed at runtime to suit their positioning in the listening environment relative to the walls and other drivers ("active steering"). Similarly, any acoustic modifiers (such as baffles, horns, or waveguides) could be tuned to provide the correct frequency and phase response for optimal playback in any listening environment configuration ("active tuning"). Both active steering and active tuning can be performed during initial listening environment configuration (e.g., in conjunction with an auto-EQ/auto-room-configuration system) or during playback in response to the content being rendered.
Bi-directional Interconnection
Once configured, the speakers must be connected to the rendering system. Traditional interconnects are typically of two types: speaker-level inputs for passive speakers and line-level inputs for active speakers. As shown in Figure 4C, the adaptive audio system 450 includes bi-directional interconnection functionality. This interconnection is embodied within a set of physical and logical connections between the rendering stage 454 and the amplifier/speaker 458 and microphone stage 460. The capability to address multiple drivers in each speaker cabinet is supported by these intelligent interconnects between the sound source and the speakers. The bi-directional interconnect allows the transmission of signals from the sound source (renderer) to the speakers, including both control signals and audio signals. The signals from the speaker to the sound source also comprise both control signals and audio signals, where the audio signals in this case are audio sourced from the optional built-in microphones. Power may also be provided as part of the bi-directional interconnect, at least for the case in which the speakers/drivers are not separately powered.
Figure 10 is a diagram 1000 that illustrates the composition of a bi-directional interconnection, under an embodiment. The sound source 1002, which may represent a renderer plus an amplifier/sound processor chain, is logically and physically coupled to the speaker cabinet 1004 through a pair of interconnect links 1006 and 1008. The interconnect 1006 from the sound source 1002 to the drivers 1005 within the speaker cabinet 1004 comprises an electro-acoustic signal for each driver, one or more control signals, and optional power. The interconnect 1008 from the speaker cabinet 1004 back to the sound source 1002 comprises sound signals from the microphone 1007 or other sensors, used for calibration of the renderer and other similar sound processing functions. The feedback interconnect 1008 also contains certain driver definitions and parameters that are used by the renderer to modify or process the sound signals sent to the drivers over interconnect 1006.
In an embodiment, during system setup, each driver in each cabinet of the system is assigned an identifier (e.g., a numerical assignment). Each speaker cabinet (enclosure) may also be uniquely identified. This numerical assignment is used by the speaker cabinet to determine which audio signal is sent to which driver within the cabinet. The assignments are stored in the speaker cabinet in an appropriate memory device. Alternatively, each driver may be configured to store its own identifier in local memory. In a further alternative, such as one in which the drivers/speakers have no local storage capacity, the identifiers can be stored in the rendering stage or other component within the sound source 1002. During a speaker discovery process, each speaker (or a central database) is queried by the sound source for its profile. A profile defines certain driver definitions, including the number of drivers in a speaker cabinet or other defined array, the acoustic characteristics of each driver (e.g., driver type, frequency response, and so on), the x, y, z position of the center of each driver relative to the center of the front face of the speaker cabinet, the angle of each driver with respect to a defined plane (e.g., ceiling, floor, cabinet vertical axis, etc.), and the number of microphones and the microphone characteristics. Other relevant driver and microphone/sensor parameters may also be defined. In an embodiment, the driver definitions and speaker cabinet profile may be expressed as one or more XML documents used by the renderer.
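The text states that the profile may be expressed as XML but does not specify a schema, so the element and attribute names below are purely illustrative assumptions. The sketch shows how a renderer might parse such a profile into driver definitions:

```python
import xml.etree.ElementTree as ET

# Hypothetical enclosure profile; the tag and attribute names are
# illustrative only -- the actual schema is not given in the text.
PROFILE_XML = """
<enclosure id="spk-01">
  <driver id="1" type="front"  x="0.0" y="0.1" z="0.2" angle="0"/>
  <driver id="2" type="side"   x="0.1" y="0.0" z="0.2" angle="90"/>
  <driver id="3" type="upward" x="0.0" y="0.1" z="0.3" angle="45"/>
  <microphone count="1"/>
</enclosure>
"""

def parse_profile(xml_text):
    """Return a list of per-driver definitions a renderer could consume."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": int(d.get("id")),
            "type": d.get("type"),
            "pos": (float(d.get("x")), float(d.get("y")), float(d.get("z"))),
            "angle_deg": float(d.get("angle")),
        }
        for d in root.findall("driver")
    ]

drivers = parse_profile(PROFILE_XML)
```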
In one possible implementation, an Internet Protocol (IP) control network is created between the sound source 1002 and the speaker cabinet 1004. Each speaker cabinet and sound source acts as a single network endpoint and is given a link-local address upon initialization or power-on. An auto-discovery mechanism such as zero-configuration networking (zeroconf) may be used to allow the sound source to locate each speaker on the network. Zero-configuration networking is an example of a process that automatically creates a usable IP network without operator intervention or special configuration servers, and other similar techniques may be used. Given an intelligent network system, multiple sources may reside on the IP network along with the speakers. This allows multiple sources to directly drive the speakers without routing sound through a "master" audio source (e.g., a traditional A/V receiver).
If another source attempts to address the speakers, communication is performed among all of the sources to determine which source is currently "active," whether being active is still necessary, and whether control can transition to the new sound source. Sources can be pre-assigned priorities during manufacturing based on their classification; for example, a telecommunications source may have a higher priority than an entertainment source. In a multi-room environment, such as a typical home environment, all of the speakers within the overall environment may reside on a single network, but may not need to be addressed simultaneously. During setup and auto-configuration, the sound levels provided back over the interconnect 1008 can be used to determine which speakers are located in the same physical space. Once this information is determined, the speakers can be grouped into clusters. In this case, cluster IDs can be assigned and made part of the driver definitions. The cluster ID is sent to each speaker, and each cluster can be addressed simultaneously by the sound source 1002.
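The grouping step described above could be sketched as a connected-components pass over pairwise sound-level measurements: speakers that hear each other's test signals above some threshold are assumed to share a physical space. The threshold and the level matrix below are assumptions, not values from the text:

```python
def cluster_speakers(levels_db, threshold_db=-40.0):
    """Group speakers into clusters given levels_db[i][j]: the level at
    which speaker i's microphone heard speaker j's test signal. Pairs
    that hear each other above the threshold are merged with a simple
    union-find (connected components)."""
    n = len(levels_db)
    parent = list(range(n))  # provisional cluster ID per speaker

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if levels_db[i][j] > threshold_db and levels_db[j][i] > threshold_db:
                parent[find(i)] = find(j)

    return [find(i) for i in range(n)]

# Speakers 0 and 1 share a room; speaker 2 is elsewhere (hypothetical levels)
levels = [[0, -20, -70], [-20, 0, -65], [-70, -65, 0]]
ids = cluster_speakers(levels)
```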
As shown in Figure 10, an optional power signal can be transmitted over the bi-directional interconnect. The speakers may be passive (requiring external power from the sound source) or active (requiring power from an electrical outlet). If the speaker system comprises active speakers without wireless support, the input to the speakers consists of an IEEE 802.3 compliant wired Ethernet input. If the speaker system comprises active speakers with wireless support, the input to the speakers consists of an IEEE 802.11 compliant wireless Ethernet input, or alternatively a wireless standard specified by the WISA organization. Passive speakers may be supplied by appropriate power signals provided directly by the sound source.
System Configuration and Calibration
As shown in Figure 4C, the functionality of the adaptive audio system includes a calibration function 462. This function is enabled by the microphone 1007 and interconnect 1008 links shown in Figure 10. The function of the microphone component in system 1000 is to measure the response of the individual drivers in the listening environment in order to derive an overall system response. Multiple microphone topologies can be used for this purpose, including a single microphone or an array of microphones. The simplest case is to use a single omnidirectional measurement microphone, positioned in the center of the listening environment, to measure the response of each driver. If the listening environment and playback conditions warrant a more refined analysis, multiple microphones can be used instead. The most convenient location for multiple microphones is within the physical speaker cabinets of the particular speaker configuration used in the listening environment. Microphones installed in each enclosure allow the system to measure the response of each driver at multiple positions in the listening environment. An alternative to this topology is to use multiple omnidirectional measurement microphones positioned at likely listener locations in the listening environment.
The microphone(s) are used to enable the automatic configuration and calibration of the renderer and post-processing algorithms. In the adaptive audio system, the renderer is responsible for converting a hybrid object- and channel-based audio stream into individual audio signals designated for specific individually addressable drivers within one or more physical speakers. The post-processing component may include: delay, equalization, gain, speaker virtualization, and upmixing. The speaker configuration represents often-critical information that the renderer component can use to convert a hybrid object- and channel-based audio stream into the individual per-driver audio signals to provide optimal playback of the audio content. The system configuration information includes: (1) the number of physical speakers in the system, (2) the number of individually addressable drivers in each speaker, and (3) the position and orientation of each individually addressable driver relative to the listening environment geometry. Other characteristics are also possible. Figure 11 illustrates the functionality of an automatic configuration and system calibration component, under an embodiment.
As shown in diagram 1100, an array 1102 of one or more microphones provides acoustic information to the configuration and calibration component 1104. This acoustic information captures certain relevant characteristics of the listening environment. The configuration and calibration component 1104 then provides this information to the renderer 1106 and to any relevant post-processing components 1108, so that the audio signals ultimately sent to the speakers are adjusted and optimized for the listening environment.
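The three items of system configuration information enumerated above could be held in a small data structure on the renderer side. The type and field names here are illustrative assumptions, not an actual API:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DriverConfig:
    """One individually addressable driver: position (x, y, z) in metres
    within the listening-environment geometry, plus its aim direction."""
    position: Tuple[float, float, float]
    orientation_deg: Tuple[float, float]  # (azimuth, elevation)
    kind: str = "front"                   # front / side / upward

@dataclass
class SpeakerConfig:
    drivers: List[DriverConfig] = field(default_factory=list)

@dataclass
class SystemConfig:
    """Captures the three items named in the text: the speaker count is
    len(speakers); driver counts and placements hang off each speaker."""
    speakers: List[SpeakerConfig] = field(default_factory=list)

    def total_drivers(self):
        return sum(len(s.drivers) for s in self.speakers)

# A single two-driver speaker (hypothetical positions and angles)
cfg = SystemConfig(speakers=[
    SpeakerConfig(drivers=[
        DriverConfig((0.0, 0.0, 1.0), (0.0, 0.0), "front"),
        DriverConfig((0.0, 0.0, 1.2), (0.0, 45.0), "upward"),
    ]),
])
```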
The number of physical speakers in the system and the number of individually addressable drivers in each speaker are physical speaker properties. These properties are transmitted directly from the speakers to the renderer 454 via the bi-directional interconnect 456. The renderer and the speakers use a common discovery protocol, so that when speakers are connected to or disconnected from the system, the renderer is notified of the change and can reconfigure the system accordingly.
The geometry (size and shape) of the listening environment is a necessary item of information in the configuration and calibration process. The geometry can be determined in a number of different ways. In a manual configuration mode, the width, length, and height of a minimum bounding cube for the listening environment are entered into the system by the listener or a technician through a user interface that provides input to the renderer or other processing unit within the adaptive audio system. Various different user interface techniques and tools may be used for this purpose. For example, the listening environment geometry could be sent to the renderer by a program that automatically maps or traces the geometry of the listening environment. Such a system could use a combination of computer vision, sonar, and 3D laser-based physical mapping.
The renderer uses the position of the speakers within the listening environment geometry to derive the audio signals for each individually addressable driver, including the direct and reflected (upward-firing) drivers. Direct drivers are those that are aimed such that the majority of their dispersion pattern intersects the listening position before being diffused by one or more reflective surfaces, such as a floor, wall, or ceiling. Reflected drivers are those that are aimed such that the majority of their dispersion pattern is reflected before intersecting the listening position, such as illustrated in Figure 6. If the system is in a manual configuration mode, the 3D coordinates of each direct driver can be entered into the system through a UI. For the reflected drivers, the 3D coordinates of the primary reflection are entered into the UI. A laser or similar technique can be used to visualize the dispersion pattern of the reflected drivers on the surfaces of the listening environment, so that the 3D coordinates can be measured and manually entered into the system.
Driver positioning and aiming are typically performed using manual or automatic techniques. In some cases, inertial sensors may be incorporated into each speaker. In this mode, the center speaker is designated as the "master," and its compass measurement is considered the reference. The other speakers then transmit the dispersion patterns and compass positions for each of their individually addressable drivers. Coupled with the listening environment geometry, the difference between the reference angle of the center speaker and each additional driver provides enough information for the system to automatically determine whether a driver is direct or reflected.
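As a rough sketch of how the direct/reflected determination could work: treat a driver as reflected when its aim axis meets the ceiling before reaching the listening distance. The geometry and numbers here are hypothetical simplifications, not the actual classification rule:

```python
import math

def is_reflective(elevation_deg, driver_height, ceiling_height,
                  dist_to_listener):
    """Classify a driver as reflected when its aim axis (elevation above
    horizontal) intersects the ceiling closer than the listening
    distance. All geometry is illustrative."""
    if elevation_deg <= 0:
        return False  # axis never reaches the ceiling
    rise = ceiling_height - driver_height
    # Horizontal distance at which the aim axis meets the ceiling
    hit_dist = rise / math.tan(math.radians(elevation_deg))
    return hit_dist < dist_to_listener

# A 45-degree upward-firing driver vs. a forward-facing one (assumed numbers)
up = is_reflective(45.0, 1.0, 2.4, 3.0)
front = is_reflective(0.0, 1.0, 2.4, 3.0)
```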
If a 3D positional (i.e., Ambisonic) microphone is used, speaker position configuration can be fully automated. In this mode, the system sends a test signal to each driver and records the response. Depending on the microphone type, these signals may need to be transformed into an x, y, z representation. The signals are analyzed to find the x, y, and z components of the dominant first arrival. Coupled with the listening environment geometry, this usually provides enough information for the system to automatically set the 3D coordinates of all speaker positions, whether direct or reflected. Depending on the listening environment geometry, a hybrid combination of the three described methods for configuring speaker coordinates may be more effective than any one technique used alone.
Speaker configuration information is one component needed to configure the renderer. Speaker calibration information is also necessary to configure the post-processing chain (delay, equalization, and gain). FIG. 12 is a flow diagram illustrating the process steps for performing automatic speaker calibration using a single microphone, under an embodiment. In this mode, the delays, equalization, and gains are calculated automatically by the system using a single omnidirectional measurement microphone located in the middle of the listening position. As shown in diagram 1200, the process begins, in block 1202, by measuring the room impulse response for each individual driver. In block 1204, the delay for each driver is then calculated by finding the offset of the peak of the cross-correlation between the acoustic impulse response (captured with the microphone) and the directly captured electrical impulse response. In block 1206, the calculated delay is applied to the directly captured (reference) impulse response. In block 1208, the process then determines the wideband and per-band gain values that, when applied to the measured impulse response, result in the minimum difference between it and the directly captured (reference) impulse response. This step may be performed by: taking the windowed FFT of the measured and reference impulse responses; computing the per-bin magnitude ratio between the two signals; applying a median filter to the per-bin magnitude ratios; computing the gain value for each band by averaging the gains of all bins that fall completely within that band; computing a wideband gain by averaging all of the per-band gains; subtracting the wideband gain from each per-band gain; and applying a small room X-curve (-2 dB/octave above 2 kHz). Once the gain values are determined in block 1208, the process determines, in block 1210, the final delay values by subtracting the minimum delay from the others, such that at least one driver in the system always has zero additional delay.
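A minimal sketch of the delay and gain math in blocks 1204-1208, assuming synthetic impulse responses; the band splitting, median filtering, and small room X-curve steps are omitted for brevity.

```python
import numpy as np

# Sketch of the single-microphone calibration math described above: delay
# from the cross-correlation peak, then a wideband gain from the magnitude
# ratio of the two responses. The per-band splitting, median filter, and
# small room X-curve are omitted; the signals below are synthetic assumptions.

def estimate_delay(measured, reference, fs):
    """Delay (in samples) at the peak of the cross-correlation of the
    acoustic (measured) and electrical (reference) impulse responses."""
    xcorr = np.correlate(measured, reference, mode="full")
    lag = np.argmax(np.abs(xcorr)) - (len(reference) - 1)
    return lag  # samples; divide by fs for seconds

def wideband_gain(measured, reference):
    """Per-bin magnitude ratio of the FFTs, averaged into one gain."""
    m = np.abs(np.fft.rfft(measured))
    r = np.abs(np.fft.rfft(reference))
    ratio = m / np.maximum(r, 1e-12)
    return float(np.mean(ratio))

fs = 48000
reference = np.zeros(1024); reference[0] = 1.0   # electrical impulse
measured = np.zeros(1024); measured[96] = 0.5    # arrives 2 ms later at half amplitude
print(estimate_delay(measured, reference, fs))   # -> 96 samples (2 ms at 48 kHz)
print(round(wideband_gain(measured, reference), 2))  # -> 0.5
```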
In the case of automatic calibration using multiple microphones, the delays, equalization, and gains are calculated automatically by the system using multiple omnidirectional measurement microphones. The process is essentially the same as the single-microphone technique, except that it is repeated for each microphone and the results are averaged.
Alternative Applications
Aspects of the adaptive audio system may be implemented in more localized applications, such as televisions, computers, game consoles, or similar devices, rather than in an entire listening environment or theater. This case effectively relies on speakers arranged in a plane corresponding to the viewing screen or monitor surface. FIG. 13 illustrates the use of the adaptive audio system in an exemplary television and soundbar use case. In general, the television use case presents challenges to creating an immersive audio experience, based on the often reduced quality of the equipment (TV speakers, soundbar speakers, etc.) and on speaker locations/configurations that are limited in terms of spatial resolution (i.e., no surround or rear speakers). The system 1300 of FIG. 13 includes speakers in the standard television left and right locations (TV-L and TV-R) as well as left and right upward-firing drivers (TV-LH and TV-RH). The television 1302 may also include a soundbar 1304 or speakers in some height array. In general, television speakers are reduced in size and quality compared to standalone or home theater speakers, due to cost constraints and design choices. The use of dynamic virtualization, however, can help overcome these deficiencies. In FIG. 13, the dynamic virtualization effect is illustrated for the TV-L and TV-R speakers, so that a person at the specific listening position 1308 would hear horizontal elements associated with appropriate audio objects individually rendered in the horizontal plane. In addition, height elements associated with appropriate audio objects will be rendered correctly through the reflected audio transmitted by the LH and RH drivers. The use of stereo virtualization in the television L and R speakers is similar to that in L and R home theater speakers, where a potentially immersive dynamic speaker virtualization user experience is possible through dynamic control of the speaker virtualization algorithm parameters based on the object spatial information provided by the adaptive audio content. This dynamic virtualization may be used to create the perception of objects moving along the sides of the listening environment.
The television environment may also include an HRC speaker, shown within the soundbar 1304. Such an HRC speaker may be a steerable unit that allows panning through the HRC array. There may be benefits (particularly for larger screens) to having a front-firing center channel array with individually addressable speakers that allow discrete panning of audio objects through the array, matched to the movement of video objects across the screen. This speaker is also shown as having side-firing speakers. If the speaker is used as a soundbar, these side-firing speakers could be activated and used so that the side-firing drivers provide greater immersion given the lack of surround or rear speakers. The dynamic virtualization concept is also shown for the HRC/soundbar speaker, with dynamic virtualization illustrated for the L and R speakers on the farthest sides of the front-firing speaker array. Again, this could be used to create the perception of objects moving along the sides of the listening environment. This modified center speaker could also include more speakers and implement steerable sound beams with separately controlled sound zones. Also shown in the exemplary implementation of FIG. 13 is an NFE speaker 1306 positioned in front of the main listening position 1308. The inclusion of the NFE speaker may provide the greater envelopment offered by the adaptive audio system, by moving sound away from the front of the listening environment and closer to the listener.
With respect to headphone rendering, the adaptive audio system maintains the creator's original intent by matching HRTFs to spatial positions. When audio is reproduced over headphones, binaural spatial virtualization can be achieved by applying a head-related transfer function that processes the audio and adds perceptual cues that create the perception of the audio being played in three-dimensional space rather than over standard stereo headphones. The accuracy of the spatial reproduction depends on the selection of an appropriate HRTF, which can vary based on several factors, including the spatial position of the audio channel or object being rendered. Using the spatial information provided by the adaptive audio system can result in the selection of one, or a continuously varying number, of HRTFs representing 3D space, greatly improving the reproduction experience.
The system also facilitates adding guided, three-dimensional binaural rendering and virtualization. Similar to the spatial rendering case, using new and modified speaker types and positions, cues can be created through the use of three-dimensional HRTFs to simulate the sound of audio coming from both the horizontal plane and the vertical axis. Previous audio formats that provided only channel and fixed speaker position information for rendering were more limited. With adaptive audio format information, a binaural three-dimensional rendering headphone system has detailed and useful information available to indicate which elements of the audio are suitable for rendering in both the horizontal and vertical planes. Some content may rely on the use of overhead speakers to provide a greater sense of envelopment. These audio objects and this information could be used for binaural rendering that is perceived to be above the listener's head when headphones are used. FIG. 14 shows a simplified representation of a three-dimensional binaural headphone virtualization experience for use in an adaptive audio system, under an embodiment. As shown in FIG. 14, a headphone 1402 for reproducing audio from the adaptive audio system includes audio signals 1404 in the standard x, y plane as well as the z plane, so that heights associated with certain audio objects or sounds are played back such that they sound as if they originate above or below the sounds produced in the x, y plane.
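As a rough illustration of HRTF selection from object position, the sketch below picks the nearest entry from a hypothetical HRTF catalog measured on a grid of directions; the catalog layout and grid spacing are assumptions, and a real system would interpolate between measured filters rather than snap to the nearest one.

```python
import math

# Hedged sketch of the HRTF-selection idea above: given an object's spatial
# position from the adaptive audio metadata, pick the nearest entry from a
# hypothetical HRTF catalog measured on a grid of directions. The catalog
# entries and grid spacing here are placeholders, not real filter data.

def nearest_hrtf(azimuth_deg, elevation_deg, catalog):
    """Return the catalog key whose (azimuth, elevation) is closest on the
    sphere to the requested direction (simple angular distance)."""
    def ang_dist(a1, e1, a2, e2):
        a1, e1, a2, e2 = map(math.radians, (a1, e1, a2, e2))
        # spherical law of cosines
        c = (math.sin(e1) * math.sin(e2)
             + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
        return math.acos(max(-1.0, min(1.0, c)))
    return min(catalog, key=lambda k: ang_dist(azimuth_deg, elevation_deg, *k))

# Placeholder grid: keys are (azimuth, elevation) in degrees.
catalog = {(az, el): None for az in range(0, 360, 15) for el in (-30, 0, 30, 60)}
print(nearest_hrtf(37.0, 25.0, catalog))   # -> (30, 30)
```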
Metadata Definitions
In an embodiment, the adaptive audio system includes components that generate metadata from the original spatial audio format. The methods and components of system 300 comprise an audio rendering system configured to process one or more bitstreams containing both conventional channel-based audio elements and audio object coding elements. A new extension layer containing the audio object coding elements is defined and added to either the channel-based audio codec bitstream or the audio object bitstream. This approach allows bitstreams that include the extension layer to be processed by renderers for use with existing speaker and driver designs, or with next-generation speakers defined using individually addressable drivers and driver definitions. The spatial audio content from the spatial audio processor comprises audio objects, channels, and position metadata. When an object is rendered, it is assigned to one or more speakers according to the position metadata and the locations of the playback speakers. Additional metadata may be associated with the object to alter the playback position or otherwise limit the speakers to be used for playback. Metadata is generated in the audio workstation in response to the engineer's mixing inputs, providing rendering cues that control spatial parameters (e.g., position, velocity, intensity, timbre, etc.) and specify which driver(s) or speaker(s) in the listening environment play the respective sounds during exhibition. The metadata is associated with the respective audio data in the workstation for packaging and transport by the spatial audio processor.
FIG. 15 is a table illustrating certain metadata definitions for an adaptive audio system for a listening environment, under an embodiment. As shown in table 1500, the metadata definitions include: audio content type; driver definitions (number, characteristics, position, throw angle); control signals for active steering/tuning; and calibration information including room and speaker information.
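The metadata categories of table 1500 might be organized as a data structure along the following lines; the field names and example values are assumptions for exposition, not the actual bitstream syntax.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the metadata categories of table 1500 as a data
# structure. Field names and example values are assumptions for exposition;
# this passage does not specify the actual bitstream syntax.

@dataclass
class DriverDefinition:
    count: int
    characteristics: str      # e.g. "full-range", "upward-firing"
    position: tuple           # assumed (x, y, z) coordinates in the room
    throw_angle: float        # degrees

@dataclass
class AdaptiveAudioMetadata:
    content_type: str                                      # e.g. "dialog", "music"
    drivers: list = field(default_factory=list)            # DriverDefinition entries
    steering_controls: dict = field(default_factory=dict)  # active steering/tuning
    calibration: dict = field(default_factory=dict)        # room + speaker info

meta = AdaptiveAudioMetadata(
    content_type="dialog",
    drivers=[DriverDefinition(1, "upward-firing", (0.0, 1.2, 0.9), 20.0)],
    calibration={"room": "living room", "delay_ms": {"TV-LH": 2.0}},
)
print(meta.content_type)   # -> dialog
```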
Features and Capabilities
As described above, the adaptive audio ecosystem allows the content creator to embed the spatial intent of the mix (position, size, velocity, etc.) within the bitstream via metadata. This allows tremendous flexibility in the spatial reproduction of audio. From a spatial rendering standpoint, the adaptive audio format enables the content creator to adapt the mix to the exact positions of the speakers in the listening environment, avoiding the spatial distortion caused by differences between the geometry of the playback system and that of the authoring system. In current audio reproduction systems, where only audio for speaker channels is sent, the intent of the content creator is unknown for positions in the listening environment other than the fixed speaker locations. Under the current channel/speaker paradigm, the only known information is that a specific audio channel should be sent to a specific speaker at a predefined position in the listening environment. In an adaptive audio system, using the metadata conveyed through the creation and distribution pipeline, the reproduction system can use this information to reproduce the content in a manner that matches the content creator's original intent. For example, the relationship between the speakers is known for the different audio objects. By providing the spatial position of an audio object, the intent of the content creator is known, and this can be "mapped" onto the speaker configuration, including the speaker locations. With a dynamically rendering audio rendering system, this rendering can be updated and improved by adding additional speakers.
The system also enables adding guided three-dimensional spatial rendering. There have been many attempts to create a more immersive audio rendering experience through the use of new speaker designs and configurations. These include the use of bipole speakers and side-firing, rear-firing, and upward-firing drivers. With previous channel and fixed speaker location systems, determining which elements of the audio should be sent to these modified speakers was relatively difficult. Using the adaptive audio format, the rendering system has detailed and useful information about which elements of the audio (objects or otherwise) are suitable to be sent to the new speaker configurations. That is, the system allows control over which audio signals are sent to the front-firing drivers and which are sent to the upward-firing drivers. For example, adaptive audio cinema content relies heavily on the use of overhead speakers to provide a greater sense of envelopment. These audio objects and this information may be sent to the upward-firing drivers to provide reflected audio in the listening environment and create a similar effect.
The system also allows the mix to be adapted to the exact hardware configuration of the reproduction system. There are many different possible speaker types and configurations in rendering equipment such as televisions, home theaters, soundbars, portable music player docks, and the like. When these systems are sent channel-specific audio information (i.e., left and right channel or standard multichannel audio), the system must process the audio to appropriately match the capabilities of the rendering equipment. A typical example is when standard stereo (left, right) audio is sent to a soundbar that has more than two speakers. In current audio systems, where only audio for speaker channels is sent, the intent of the content creator is unknown, and the more immersive audio experience made possible by the enhanced equipment must be created by algorithms that make assumptions about how to reproduce the audio on the hardware. An example of this is the use of PLII, PLII-z, or next-generation surround to "upmix" channel-based audio to more speakers than the original number of channel feeds. With the adaptive audio system, using the metadata conveyed through the creation and distribution pipeline, the reproduction system can use this information to reproduce the content in a manner that more closely matches the content creator's original intent. For example, some soundbars have side-firing speakers to create a sense of envelopment. With adaptive audio, the spatial information and content type information (i.e., dialogue, music, ambient effects, etc.) can be used by the soundbar, when controlled by a rendering system such as a TV or A/V receiver, to send only the appropriate audio to these side-firing speakers.
The spatial information conveyed by adaptive audio allows dynamic rendering of the content with an awareness of the locations and types of the speakers present. In addition, information about the relationship of the listener or listeners to the audio reproduction equipment is now potentially available and may be used in rendering. Most game consoles include a camera accessory and intelligent image processing that can determine the position and identity of a person in the listening environment. This information may be used by an adaptive audio system to alter the rendering to more accurately convey the content creator's creative intent based on the listener's position. For example, in nearly all cases, audio rendered for playback assumes the listener is located in an ideal "sweet spot," which is often equidistant from each speaker and is the same position the sound engineer occupied during content creation. However, people are often not in this ideal position, and their experience does not match the mixer's creative intent. A typical example is a listener seated in a chair or on a bed on the left side of the listening environment. In this case, sound reproduced from the nearer speakers on the left will be perceived as louder, skewing the spatial perception of the audio mix to the left. By understanding the listener's position, the system can adjust the rendering of the audio, lowering the level of the left speakers and raising the level of the right speakers to rebalance the audio mix and make it perceptually correct. Delaying the audio to compensate for the listener's distance from the sweet spot is also possible. The listener's position may be detected through the use of a camera, or through a modified remote control with some built-in signaling that sends listener position information to the rendering system.
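The rebalancing described above can be sketched as a distance-based gain and delay correction; the speaker layout, the 1/r gain model, and the reference-distance convention below are simplifying assumptions.

```python
# Sketch of the listener-position compensation described above: re-balance
# gains and delays when the listener is off the sweet spot. The speaker
# positions, the 1/r gain model, and the farthest-speaker reference
# convention are assumptions for illustration.

SPEED_OF_SOUND = 343.0  # m/s

def compensate(speakers, listener):
    """For each speaker (name -> (x, y) in meters), return a gain (linear,
    relative to the farthest speaker) and an extra delay in milliseconds so
    that all arrivals are matched at the listener's actual position."""
    dists = {name: ((x - listener[0]) ** 2 + (y - listener[1]) ** 2) ** 0.5
             for name, (x, y) in speakers.items()}
    d_max = max(dists.values())
    out = {}
    for name, d in dists.items():
        gain = d / d_max                          # attenuate nearer speakers (1/r)
        delay_ms = (d_max - d) / SPEED_OF_SOUND * 1000.0
        out[name] = (round(gain, 3), round(delay_ms, 2))
    return out

speakers = {"L": (-1.5, 2.0), "R": (1.5, 2.0)}
# Listener seated off to the left of the sweet spot:
print(compensate(speakers, (-1.0, 0.0)))
```

With the listener shifted left, the nearer left speaker is attenuated and delayed slightly so both arrivals match, as the passage describes.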
In addition to using the standard speakers and speaker locations to address the listening position, beam steering techniques may be used to create sound field "zones" that vary with listener position and content. Audio beamforming uses an array of speakers (typically 8 to 16 horizontally spaced speakers) and applies phase manipulation and processing to create a steerable sound beam. Beamforming speaker arrays allow the creation of audio zones in which the audio is primarily audible, which can be used to direct specific sounds or objects, with selective processing, toward specific spatial locations. An obvious use case is processing the dialogue in a soundtrack with a dialogue-enhancement post-processing algorithm and beaming that audio object directly at a hearing-impaired user.
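The phase/delay manipulation underlying such an array can be illustrated with a basic delay-and-sum steering calculation; the 8-element array, 6 cm spacing, and 30-degree steering angle are assumed example values, and a practical beamformer would also shape per-element amplitudes.

```python
import math

# Minimal delay-and-sum sketch of the beam-steering idea above: per-element
# delays aim the main lobe of a horizontal line array toward a chosen angle.
# The 8-element array, 6 cm spacing, and steering angle are assumed values.

SPEED_OF_SOUND = 343.0  # m/s

def steering_delays(num_elements, spacing_m, steer_deg):
    """Per-driver delays (seconds, non-negative) that steer a line array's
    beam steer_deg away from broadside."""
    delays = [n * spacing_m * math.sin(math.radians(steer_deg)) / SPEED_OF_SOUND
              for n in range(num_elements)]
    offset = min(delays)                 # normalize so all delays are >= 0
    return [d - offset for d in delays]

for n, d in enumerate(steering_delays(8, 0.06, 30.0)):
    print(f"driver {n}: {d * 1e6:.1f} us")
```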
Matrix Encoding and Spatial Upmixing
In some cases, audio objects may be a desired component of adaptive audio content; however, based on bandwidth limitations, it may not be possible to send both channel/speaker audio and audio objects. In the past, matrix encoding has been used to convey more audio information than is possible for a given distribution system. For example, this was the case in the early days of cinema, where multichannel audio was created by the sound mixers but the film formats provided only stereo audio. Matrix encoding was used to intelligently downmix the multichannel audio to two stereo channels, which were then processed with certain algorithms to recreate a close approximation of the multichannel mix from the stereo audio. Similarly, audio objects can be intelligently downmixed to the base speaker channels and, through the use of adaptive audio metadata and sophisticated time- and frequency-sensitive next-generation surround algorithms, extracted and rendered spatially correctly with the adaptive audio rendering system.
In addition, when there are bandwidth limitations in the transmission system for the audio (e.g., 3G and 4G wireless applications), there are also benefits to transmitting spatially diverse multichannel beds that are matrix-encoded along with individual audio objects. One use case for such a transmission method would be the transmission of a sports broadcast with two distinct audio beds and multiple audio objects. The audio beds could represent multichannel audio captured in two different team sections of the stands, and the audio objects could represent different announcers who may be sympathetic to one team or the other. Using standard coding, a 5.1 representation of each bed, along with two or more objects, could exceed the bandwidth constraints of the transmission system. In this case, if each of the 5.1 beds were matrix-encoded to a stereo signal, the two beds originally captured as 5.1 channels could be transmitted as two-channel bed 1, two-channel bed 2, object 1, and object 2, or only four channels of audio rather than 5.1 + 5.1 + 2, or 12.1, channels.
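The two-channel matrix downmix mentioned above can be sketched per sample as follows; the -3 dB coefficients follow a common convention, and the 90-degree phase shift that practical matrix encoders apply to the surround channels is omitted for brevity.

```python
import math

# Simplified sketch of the matrix downmix described above: fold a 5.1 bed
# into a two-channel Lt/Rt pair. The -3 dB center/surround coefficients
# follow a common convention; real matrix encoders also apply a 90-degree
# phase shift to the surrounds, which is omitted here, and LFE handling
# is left out entirely.

G = 1.0 / math.sqrt(2.0)  # -3 dB

def matrix_encode(L, R, C, Ls, Rs):
    """Per-sample Lt/Rt downmix of a 5.0 bed (LFE handling omitted)."""
    Lt = L + G * C + G * Ls
    Rt = R + G * C + G * Rs
    return Lt, Rt

# One sample with signal only in the center channel: the center appears
# equally in Lt and Rt at -3 dB.
print(matrix_encode(0.0, 0.0, 1.0, 0.0, 0.0))
```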
Position- and Content-Dependent Processing
èªéåºé³é¢çæç³»ç»å 许å 容å建è å建åç¬çé³é¢å¯¹è±¡ï¼å¹¶æ·»å å¯ä»¥è¢«ä¼ éå°åç°ç³»ç»çå ³äºå 容çä¿¡æ¯ãè¿å 许å¨åç°ä¹åå¨å¯¹é³é¢çå¤ç䏿å¾å¤§ççµæ´»æ§ãéè¿åºäºå¯¹è±¡ä½ç½®å大å°å¯¹æ¬å£°å¨èæåè¿è¡å¨ææ§å¶ï¼å¯ä»¥ä½¿å¤çéç¨äºå¯¹è±¡çä½ç½®åç±»åãæ¬å£°å¨èæåæ¯æå¤çé³é¢ä½¿å¾å¬è æè§å°èææ¬å£°å¨çæ¹æ³ã彿ºé³é¢æ¯å æ¬ç¯ç»æ¬å£°å¨å£°éé¦éçå¤å£°éé³é¢æ¶ï¼è¯¥æ¹æ³å¸¸ç¨äºç«ä½å£°æ¬å£°å¨åç°ãèææ¬å£°å¨å¤çä»¥è¿æ ·çæ¹å¼ä¿®æ¹ç¯ç»æ¬å£°å¨å£°éé³é¢ï¼å½ç¯ç»æ¬å£°å¨å£°éé³é¢å¨ç«ä½å£°æ¬å£°å¨ä¸åæ¾æ¶ï¼ç¯ç»é³é¢å ç´ è¢«èæåå°å¬è çä¾§é¢ååé¢ï¼å°±å¥½åé£éæèææ¬å£°å¨ãç®åï¼èææ¬å£°å¨ä½ç½®çä½ç½®å±æ§æ¯éæçï¼å 为ç¯ç»æ¬å£°å¨çé¢å®ä½ç½®æ¯åºå®çãç¶èï¼éç¨èªéåºé³é¢å 容ï¼ä¸åé³é¢å¯¹è±¡ç空é´ä½ç½®æ¯å¨æä¸ä¸åç(å³ï¼å¯¹æ¯ä¸ªå¯¹è±¡å¯ä¸)ãæå¯è½ç°å¨å¯ä»¥éè¿å¨æå°æ§å¶è¯¸å¦æ¯ä¸ªå¯¹è±¡çæ¬å£°å¨ä½ç½®è§åº¦ä¹ç±»çåæ°å¹¶ç¶åç»åè¥å¹²ä¸ªèæåçå¯¹è±¡çæ¸²æçè¾åºä»¥åå»ºæ´æ¥è¿å°è¡¨ç°è°é³å¸çæå¾çæ´å æ²æµ¸å¼çé³é¢ä½éªï¼æ¥ä»¥æ´å çµéçæ¹å¼æ§å¶è¯¸å¦èææ¬å£°å¨èæåä¹ç±»çåå¤çãThe adaptive audio ecosystem allows content creators to create individual audio objects and add information about the content that can be passed to the reproduction system. This allows a great deal of flexibility in the processing of the audio prior to reproduction. Dynamic control of speaker virtualization based on object location and size enables processing to be adapted to the object's location and type. Speaker virtualization refers to the method of processing audio so that the listener perceives virtual speakers. This method is commonly used for stereo speaker reproduction when the source audio is multi-channel audio that includes surround speaker channel feeds. Virtual speaker processing modifies surround speaker channel audio in such a way that when surround speaker channel audio is played back on stereo speakers, surround audio elements are virtualized to the sides and behind the listener, as if there were virtual speakers there. Currently, the position attribute of the virtual speaker position is static because the predetermined positions of the surround speakers are fixed. 
However, with adaptive audio content, the spatial positions of different audio objects are dynamic and distinct (i.e., unique to each object). It is now possible to control post-processing such as virtual speaker virtualization in a more informed way, by dynamically controlling parameters such as the speaker position angle for each object and then combining the rendered outputs of several virtualized objects to create a more immersive audio experience that more closely represents the mixer's intent.
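The per-object control described above can be illustrated with a minimal sketch: each object carries its own azimuth, is panned to that angle with a constant-power pan law, and the virtualized outputs are summed. This is an illustrative simplification (field names such as "azimuth_deg" are assumptions, and a real virtualizer would use HRTF filtering per object rather than simple amplitude panning):

```python
import math

def virtualize_objects(objects):
    """Render each audio object at its own virtual speaker angle with
    constant-power amplitude panning, then sum the per-object outputs
    into stereo left/right feeds."""
    n = max(len(o["samples"]) for o in objects)
    left = [0.0] * n
    right = [0.0] * n
    for obj in objects:
        # Map azimuth in [-90, +90] degrees to a pan position in [0, 1].
        pan = (obj["azimuth_deg"] + 90.0) / 180.0
        gain_l = math.cos(pan * math.pi / 2)  # constant-power pan law
        gain_r = math.sin(pan * math.pi / 2)
        for i, s in enumerate(obj["samples"]):
            left[i] += gain_l * s
            right[i] += gain_r * s
    return left, right
```

Because the angle is read from each object rather than from a fixed channel layout, the same routine renders a static surround feed and a moving object identically, which is the flexibility the paragraph describes.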
In addition to the standard horizontal virtualization of audio objects, perceptual height cues can be used to process fixed-channel and dynamic-object audio, yielding the perception of height reproduction of the audio from a pair of standard stereo speakers at normal horizontal-plane positions.
Certain effects or enhancement processes can be judiciously applied to appropriate types of audio content. For example, dialogue enhancement may be applied only to dialogue objects. Dialogue enhancement refers to a method of processing audio containing dialogue such that the audibility and/or intelligibility of the dialogue is increased and/or improved. In many cases the audio processing applied to dialogue is inappropriate for non-dialogue audio content (i.e., music, ambient effects, etc.) and can result in objectionable audible artifacts. With adaptive audio, an audio object may contain only the dialogue of a piece of content and can be tagged accordingly, so that a rendering solution selectively applies dialogue enhancement only to the dialogue content. Additionally, if the audio object is dialogue only (and not a mixture of dialogue and other content, as is often the case), the dialogue enhancement processing can process the dialogue exclusively (thereby limiting any processing performed on any other content).
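The metadata-gated behavior described above can be sketched as follows. The "content_type" tag and the simple broadband gain are illustrative assumptions; an actual adaptive audio bitstream defines its own tag vocabulary, and real dialogue enhancement typically involves spectral shaping rather than a flat gain:

```python
def apply_dialog_enhancement(objects, gain_db=6.0):
    """Apply a simple broadband gain only to objects tagged as dialogue;
    all other objects pass through unmodified."""
    gain = 10 ** (gain_db / 20.0)  # convert dB to linear gain
    out = []
    for obj in objects:
        if obj.get("content_type") == "dialog":
            samples = [s * gain for s in obj["samples"]]
        else:
            samples = list(obj["samples"])  # music, effects, etc. untouched
        out.append({**obj, "samples": samples})
    return out
```

The key point is the gate on the metadata tag: the enhancement never sees the music or effects objects, so it cannot introduce artifacts into them.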
Similarly, audio response or equalization management can be tailored to specific audio characteristics. For example, bass management (filtering, attenuation, gain) can be targeted at specific objects based on their type. Bass management refers to selectively isolating and processing only the bass (or lower) frequencies within a particular piece of content. With current audio systems and delivery mechanisms this is a "blind" process applied to all of the audio. With adaptive audio, the specific audio objects for which bass management is appropriate can be identified by metadata, and the rendering processing applied appropriately.
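A minimal sketch of metadata-selective bass management: only objects flagged as bass-managed contribute to the subwoofer feed, with a one-pole low-pass standing in for a real crossover filter. The "bass_managed" flag is a hypothetical metadata field name:

```python
import math

def bass_manage(objects, crossover_hz=80.0, sample_rate=48000.0):
    """Build a subwoofer feed from the low frequencies of only those
    objects whose metadata marks them as bass-managed."""
    # One-pole low-pass coefficient for the crossover frequency.
    a = math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    sub_contributions = []
    for obj in objects:
        if not obj.get("bass_managed", False):
            continue  # unmanaged objects bypass the subwoofer entirely
        y = 0.0
        lows = []
        for s in obj["samples"]:
            y = (1.0 - a) * s + a * y  # one-pole low-pass filter
            lows.append(y)
        sub_contributions.append(lows)
    # Sum the low-passed contributions sample by sample.
    n = max((len(f) for f in sub_contributions), default=0)
    return [sum(f[i] for f in sub_contributions if i < len(f))
            for i in range(n)]
```

This contrasts with the "blind" approach the paragraph mentions, in which the same crossover would be applied to every channel regardless of content.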
The adaptive audio system also facilitates object-based dynamic range compression. Traditional audio tracks have the same duration as the content itself, whereas an audio object might occur in the content for only a limited amount of time. The metadata associated with an object may contain level-related information about its average and peak signal amplitudes, as well as its onset or attack time (particularly for transient material). This information allows a compressor to better adapt its compression and time constants (attack, release, etc.) to better suit the content.
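How such metadata might steer a compressor can be sketched as a small decision rule. The field names ("peak_db", "average_db", "transient") and the specific time constants are illustrative assumptions, not a published metadata schema:

```python
def choose_compressor_settings(obj_meta):
    """Pick compressor time constants from object-level metadata,
    as a sketch of metadata-informed dynamic range compression."""
    # Crest factor: gap between peak and average level, in dB.
    crest_db = obj_meta["peak_db"] - obj_meta["average_db"]
    if obj_meta.get("transient", False):
        # Transient material: fast attack to catch the onset,
        # short release to avoid pumping afterwards.
        return {"attack_ms": 1.0, "release_ms": 50.0}
    if crest_db > 12.0:
        # Wide dynamics: slower attack preserves punch.
        return {"attack_ms": 20.0, "release_ms": 200.0}
    return {"attack_ms": 10.0, "release_ms": 100.0}
```

A conventional compressor must infer all of this from the signal itself; here the object metadata supplies it directly, which is the advantage the paragraph describes.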
The system also facilitates automatic loudspeaker-room equalization. Loudspeaker and listening-room acoustics play a significant role in introducing audible coloration to the sound, thereby affecting the timbre of the reproduced sound. Furthermore, the acoustics are position dependent, due to variations in listening-room reflections and loudspeaker directivity, and because of these variations the perceived timbre varies significantly across different listening positions. An AutoEQ (automatic room equalization) function provided in the system helps mitigate some of these issues through automatic loudspeaker-room spectral measurement and equalization, automated delay compensation (which provides proper imaging and possibly least-squares-based relative speaker position detection) and level setting, bass redirection based on loudspeaker headroom capability, and optimal splicing of the main loudspeakers with the subwoofer.
In home theater or other listening environments, adaptive audio systems include certain additional features such as: (1) automated target curve calculation based on the acoustics of the playback room (in studies of equalization in home listening environments, this was viewed as is an open question), (2) use time-frequency analysis for the effects of modal decay control, (3) understand the parameters derived from measurements governing surroundness/spaciousness/source-width/intelligibility and control these parameters to Provides the best possible listening experience, (4) includes head-model directional filtering for matching timbre between the front speakers and the "other" speakers, and (5) detects speaker separation relative to the listener Spatial location in settings and spatial remapping (e.g. Summit wireless would be an example). Certain panned content between the front-anchor loudspeaker (eg center) and the surround/rear/wide/height speakers is particularly indicative of a mismatch in timbre between the speakers.
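The automated delay compensation mentioned above reduces to simple arithmetic once speaker-to-listener distances have been measured: each speaker is delayed so that its sound arrives at the listening position simultaneously with the farthest speaker's. A minimal sketch, assuming distances in metres and a nominal speed of sound:

```python
def delay_compensation_ms(distances_m, speed_of_sound=343.0):
    """Compute per-speaker delays (in milliseconds) that align all
    arrivals with the farthest speaker. Input: measured
    speaker-to-listener distances in metres."""
    farthest = max(distances_m)
    # A speaker that is (farthest - d) metres closer must be delayed
    # by the time sound takes to travel that extra distance.
    return [(farthest - d) / speed_of_sound * 1000.0 for d in distances_m]
```

For example, a speaker 1.715 m closer than the farthest one receives a delay of 1.715 / 343 s, i.e. 5 ms. The least-squares position detection the text mentions would supply these distances automatically from inter-speaker measurements.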
In general, the adaptive audio system also enables a compelling audio/video reproduction experience, particularly with larger screen sizes in a home environment, if the reproduced spatial positions of some audio elements match the corresponding image elements on the screen. One example is having the dialogue in a film or television program spatially coincide with the person or character who is speaking on the screen. With ordinary speaker-channel-based audio there is no easy way to determine where the dialogue should be positioned spatially to match the location of the person or character on the screen. With the audio information available in an adaptive audio system, this type of audio/visual alignment can easily be achieved, even in home theater systems featuring ever-larger screens. The visual-position and audio spatial alignment can also be used for non-character/dialogue objects such as cars, trucks, animation, and so on.
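One way such alignment could work is to map an on-screen position to an audio object azimuth, scaled by the angular width the screen subtends at the listening position. This is a hypothetical sketch; the function name, the normalized-coordinate convention, and the 60-degree default are all illustrative assumptions:

```python
def screen_to_azimuth(x_norm, screen_width_deg=60.0):
    """Map a normalized on-screen x position (0 = left edge,
    1 = right edge) to an audio object azimuth in degrees, so that
    dialogue can track the on-screen speaking character.
    screen_width_deg is the assumed angular width of the screen
    as seen from the listening position."""
    return (x_norm - 0.5) * screen_width_deg
```

A larger screen implies a larger screen_width_deg, so the same normalized character position yields a wider azimuth, which is why this alignment matters more as home screens grow.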
The adaptive audio ecosystem also allows for enhanced content management by allowing content creators to create individual audio objects and to add information about the content that can be conveyed to the reproduction system. This allows a great deal of flexibility in the content management of the audio. From a content-management standpoint, adaptive audio enables various things, such as changing the language of the audio content by replacing only the dialogue object, in order to reduce the content file size and/or shorten download times. Film, television and other entertainment programs are typically distributed internationally. This often requires that the language in a piece of content be changed according to where it will be reproduced (French for a film shown in France, German for a TV program shown in Germany, and so on). Currently this often requires a completely independent audio soundtrack to be created, packaged and distributed for each language. With the adaptive audio system and the inherent concept of audio objects, the dialogue for a piece of content can be an independent audio object. This allows the language of the content to be changed easily without updating or changing other elements of the soundtrack, such as the music, effects, and so on.
This applies not only to foreign languages, but also to language that is inappropriate for certain audiences, targeted advertising, and so on.
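The language-swapping scenario above amounts to selecting one dialogue object and keeping the shared objects. A minimal sketch, where the metadata field names ("content_type", "language") are illustrative assumptions:

```python
def assemble_soundtrack(objects, language="fr"):
    """Assemble a playback object list by keeping shared objects
    (music, effects, etc.) and selecting only the dialogue object
    that matches the requested language."""
    out = []
    for obj in objects:
        if obj.get("content_type") == "dialog":
            if obj.get("language") == language:
                out.append(obj)  # keep only the requested language
        else:
            out.append(obj)  # non-dialogue objects are shared across languages
    return out
```

Because the music and effects objects are shared, only the (typically small) dialogue objects differ per territory, which is the file-size and download-time saving the paragraph describes.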
Aspects of the audio environment described herein represent the playback of audio or audio/visual content through appropriate loudspeakers and playback devices, and may represent any environment in which a listener experiences playback of the captured content, such as a cinema, concert hall, amphitheater, home or room, listening booth, automobile, game console, headset or headphone system, public address (PA) system, or any other playback environment. Although the embodiments are described primarily with respect to examples and implementations in a home theater environment in which spatial audio content is associated with television content, it should be noted that the embodiments may also be implemented in other systems. Spatial audio content comprising object-based audio and channel-based audio may be used in conjunction with any associated content (associated audio, video, graphics, etc.), or it may constitute standalone audio content. The playback environment may be any suitable listening environment, from headphones or near-field monitors to small or large rooms, automobiles, open-air stages, concert halls, and so on.
Aspects of the systems described herein may be implemented in an appropriate computer-based sound-processing network environment for processing digital or digitized audio files. Portions of the adaptive audio system may include one or more networks comprising any desired number of individual machines, including one or more routers (not shown) that buffer and route the data transmitted among the computers. Such a network may be built on various different network protocols and may be the Internet, a wide area network (WAN), a local area network (LAN), or any combination thereof. In an embodiment in which the network comprises the Internet, one or more machines may be configured to access the Internet through web browser programs.
One or more of the components, blocks, processes or other functional components may be implemented through a computer program that controls execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed herein may be described, in terms of their behavior, register transfers, logic components, and/or other characteristics, using any number of combinations of hardware and firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, various forms of physical (non-transitory), non-volatile storage media, such as optical, magnetic or semiconductor storage media.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise," "comprising," and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of "including, but not limited to." Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words "herein," "hereunder," "above," "below," and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word "or" is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.
While one or more implementations have been described by way of example and in terms of the specific embodiments, it is to be understood that the one or more implementations are not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.