Hereinafter, the best mode for carrying out the invention (hereinafter referred to as an embodiment) will be described.
(First Embodiment)
FIG. 1 shows the internal configuration of a playback apparatus 1 that includes a signal processing apparatus as a first embodiment of the present invention.
Here, as a premise, content including video and audio is recorded on the recording medium reproduced by the media reproducing unit 2. As such content, a so-called live video is assumed, in which, for example, a live concert has been recorded.

In this case, however, the audio is recorded on the recording medium individually for each singer or performer (hereinafter also referred to as a Player) of vocals, guitar, drums, bass, keyboard (keyboard instrument), and so on, using, for example, a close microphone for each (so-called line recording), and the audio signals line-recorded for each Player in this way are recorded separately, for example divided into tracks. Together with such audio signals, video capturing the Players singing and performing in a venue such as a concert hall is recorded.
For example, when content with such a configuration is assumed, each Player constitutes an independent sound source. That is, the position of each Player shown in the video is the position of the corresponding sound source.

The playback apparatus 1 aims to reproduce the content so that the position at which the line-recorded audio signal of each Player (each sound source) is localized matches the position of that Player (that sound source) shown in the video. By realizing this, a more realistic video/sound field space is reproduced.
In this case, the position of a Player in the video is expressed two-dimensionally, with the vertical direction defined in addition to the horizontal direction, and accordingly the position at which the audio signal of each Player is localized (the virtual sound image position) is also reproduced two-dimensionally, both vertically and horizontally.
For this purpose, the speakers SP that output the audio signal generated by the playback apparatus 1 as sound comprise, as shown in FIG. 2, an Lch speaker SPL and an Rch speaker SPR arranged symmetrically to the left and right about the center point of the display or screen. The Lch speaker SPL and the Rch speaker SPR are each stacked in the vertical direction. That is, in this case, the Lch speaker SPL comprises a speaker SPL-un disposed below and a speaker SPL-up disposed above it; similarly, the Rch speaker SPR comprises a speaker SPR-un disposed below and a speaker SPR-up disposed above it.
Note that, in each of the embodiments including the first embodiment described below, for convenience of explanation, the description proceeds on the assumption that the audio signal contains sound for only one sound source (Player). That is, as the audio signal A in this case, only the audio signal line-recorded for one sound source is reproduced.
In FIG. 1, the media reproducing unit 2 reproduces the recording medium as described above, thereby obtaining video stream data V-strm including the video signal V and audio stream data A-strm including the audio signal A.

The video stream data V-strm and the audio stream data A-strm are stream data in which the video signal V or the audio signal A as actual data is multiplexed with predetermined additional information.
For confirmation, FIG. 3 shows the data structure of the video stream data V-strm. As shown in FIG. 3, the video stream data V-strm includes the video signal V and its additional data. The additional data is, for example, embedded in each predetermined data unit such as a sector, and carries supplementary information about the video signal V.

Although not shown, the audio stream data A-strm similarly has a structure in which additional data for the audio signal A is embedded in each predetermined data unit.
In FIG. 1, the video stream data V-strm is supplied to the video decoder 3, where the video signal V is obtained by decoding. Likewise, the audio stream data A-strm is supplied to the audio decoder 4, where the audio signal A is obtained by decoding.

The video signal V is supplied to the video output terminal Tv and is also branched and supplied to the sound source coordinate acquisition unit 6 shown in the figure. The video signal V from the video output terminal Tv is supplied to the display or screen (projector apparatus) shown in FIG. 2. The audio signal A, on the other hand, is supplied to the audio signal processing unit 5.
In FIG. 1, the sound source coordinate acquisition unit 6, the coordinate conversion unit 7, the localization position control unit 8, the conversion matrix calculation unit 9, and the audio signal processing unit 5 described below are enclosed by a broken line. The portion enclosed by this broken line forms the signal processing apparatus as the first embodiment.
鳿ºåº§æ¨åå¾é¨ï¼ã¯ãä¸è¨æ åä¿¡å·ï¼¶ã«åºã¥ããæ åä¸ã®é³æºã®ä½ç½®ã表ã座æ¨å¤ï¼å¾è¿°ããæ å座æ¨ç³»ã®åº§æ¨å¤ï¼ãåå¾ããã
ãã®ãããªæ åä¿¡å·ï¼¶ããã®é³æºåº§æ¨å¤ã®åå¾ã¯ãä¾ãã°ä»¥ä¸ã®ãããªææ³ã«ããå®ç¾ã§ããã
ã¤ã¾ããäºãæ 忮影æã«ããã¦ãPlayerã¨ãã¦ã®äººç©ã«å¯¾ãä¾ãã°èµ¤å¤ç·ã«ããIDæ
å ±ãçºå
ããçºå
è£
ç½®ãªã©ã®æå®ã®ãã¼ã«ã¼ãä»ãã¦æ åãæ®å½±ãã¦ããã鳿ºåº§æ¨åå¾é¨ï¼ã§ã¯ãä¾çµ¦ãããæ åä¿¡å·ï¼¶ãããã®ãã¼ã«ã¼ã®ä½ç½®ãç»åå¦çã«ããæ¤åºããããããã©ããã³ã°ãããã¨ã§Playerã®æ åä¸ã«ãããä½ç½®æ
å ±ãããªãã¡é³æºã®åº§æ¨å¤ãé æ¬¡åå¾ããããã«æ§æãããã®ã§ããã
ããã«ãã£ã¦æ åä¸ã®é³æºã®ä½ç½®æ
å ±ããæ åä¿¡å·ï¼¶ã«åºã¥ãåå¾ãããã¨ãã§ããã
ã¾ããããã¨å
±ã«é³æºåº§æ¨åå¾é¨ï¼ã¯ãå
¥åãããæ åä¿¡å·ï¼¶ã®æ°´å¹³ç·ç»ç´ æ°ã¨åç´ç·ç»ç´ æ°ã®æ
å ±ããå¾è¿°ãã夿ãããªã¯ã¹ç®åºé¨ï¼ã«ä¸ããã Based on the video signal V, the sound source coordinate acquisition unit 6 acquires coordinate values (coordinate values of a video coordinate system described later) representing the position of the sound source in the video.
Such acquisition of the sound source coordinate value from the video signal V can be realized by the following method, for example.
That is, at the time of video recording, a video is previously recorded by attaching a predetermined marker such as a light emitting device that emits ID information by infrared rays to a person as a player, and is supplied by the sound source coordinate acquisition unit 6. By detecting the position of the marker from the video signal V by image processing and tracking it, the position information in the video of the player, that is, the coordinate value of the sound source is sequentially obtained.
As a result, the position information of the sound source in the video can be acquired based on the video signal V.
At the same time, the sound source coordinate acquisition unit 6 gives information about the total number of horizontal pixels and the total number of vertical pixels of the input video signal V to the conversion matrix calculation unit 9 described later.
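The marker-based acquisition described above can be sketched as follows. This is a minimal illustration under a simplifying assumption: the marker is taken to be the single brightest pixel of a grayscale frame, whereas an actual implementation would decode the infrared ID and track the marker across frames.

```python
import numpy as np

def acquire_source_coordinates(frame):
    """Return the (x, y) pixel position of the brightest point of a
    grayscale video frame, standing in here for detection of the
    infrared ID marker attached to a Player."""
    # frame is a 2-D array indexed as [row, col]; argmax over the
    # flattened array locates the peak intensity.
    row, col = np.unravel_index(np.argmax(frame), frame.shape)
    return int(col), int(row)  # (x, y) in the video coordinate system

# A 50x100-pixel frame with a single bright "marker" at column 30, row 10.
frame = np.zeros((50, 100))
frame[10, 30] = 255.0
print(acquire_source_coordinates(frame))  # -> (30, 10)
```

Repeating this detection frame by frame yields the sequentially acquired coordinate values mentioned above.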
The coordinate conversion unit 7 converts the coordinate values acquired by the sound source coordinate acquisition unit 6 into coordinate values of the audio coordinate system, based on the conversion matrix calculated by the conversion matrix calculation unit 9 shown in the figure.
Here, the amount of movement of a Player (sound source) in the video is merely an amount of movement within the video, not an amount of movement in the real world; even if the virtual sound image position were moved by the amount of movement in the video, the position of the Player in the video and the position of the virtual sound image would not match. That is, while the position of the sound source in the video is defined in the video coordinate system, the virtual sound image position should be defined in the audio coordinate system (a real-world coordinate system).
This will be described with reference to FIGS. 4 and 5. FIG. 4 shows the relationship between the display screen (display or screen) on which the video based on the video signal V is projected and the video coordinate system, and FIG. 5 shows the relationship between the display screen, the arrangement positions of the speakers SP, and the audio coordinate system.

Note that, for convenience of illustration, FIG. 5 does not show the speakers SP stacked in the vertical direction; in practice, the speaker SPL-un and the speaker SPL-up, and the speaker SPR-un and the speaker SPR-up, are stacked as shown in FIG. 2.
First, as shown in FIG. 4, the video coordinate system can be defined, for example, with the horizontal direction of the display screen as the x axis and the vertical direction as the y axis, taking the coordinate value (x, y) of the upper left corner of the display screen as (0, 0), that is, as the origin. In this case, the point 100 pixels from the origin in the horizontal direction and 50 pixels in the vertical direction can be expressed as the coordinate value (100, 50), as illustrated. Here, it is assumed that the coordinate value of the sound source position in the video is this coordinate value (100, 50).
In the audio coordinate system of FIG. 5, on the other hand, the coordinate value (x, y) of the center of the range over which a virtual sound image can be localized by the sound output from the speakers SPL-un, SPL-up, SPR-un, and SPR-up (hereinafter referred to as the localizable range) is expressed as (0, 0). For example, when the speakers SP of FIG. 2 are arranged symmetrically left-right and up-down about the center point of the display or screen, the center of the display screen becomes (0, 0), as shown in the figure.

Here too, the horizontal direction is represented by the x axis and the vertical direction by the y axis. In the y-axis direction, positions above the center are indicated by positive values and positions below by negative values; in the x-axis direction, positions to the right are indicated by positive values and positions to the left by negative values. Accordingly, the position 100 cm to the right of and 50 cm above the center can be expressed as the coordinate value (100, 50), indicated by the black circle in the figure.
Here, even if the coordinate value (100, 50) of the sound source position in the video coordinate system shown in FIG. 4 were applied directly to such an audio coordinate system, the coordinate value (100, 50) in the audio coordinate system is, as described above, the position 100 cm to the right of and 50 cm above the center of the screen, so the two positions would not match. That is, the correct position at which the virtual sound image should be localized according to the sound source position shown in FIG. 4 is actually the position indicated by the broken-line circle in the figure, whereas in this case an incorrect position would be taken as the virtual sound image position.
Therefore, the playback apparatus 1 shown in FIG. 1 is provided with the coordinate conversion unit 7 as described above, which converts the coordinate values of the video coordinate system acquired by the sound source coordinate acquisition unit 6 into coordinate values of the audio coordinate system based on the conversion matrix calculated by the conversion matrix calculation unit 9.

In this case, the conversion matrix can be calculated by giving three coordinate values in the video coordinate system and the three corresponding coordinate values in the audio coordinate system (real-world coordinate system).

Specifically, in this case, the points whose correspondence between the video coordinate system and the audio coordinate system is obvious are the corner points of the display screen and the corner points of the localizable range. Accordingly, the conversion matrix can be calculated by giving the coordinate values of three of the four corner points on the display screen side and of the corresponding three of the four corner points on the localizable range side.
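The three-point calculation can be illustrated as follows. This is a sketch under assumptions not fixed by the description: the conversion matrix is taken to be a 2×3 affine matrix, and the screen and localizable-range dimensions (a 200×100-pixel screen mapped onto a 400 cm × 200 cm range) are hypothetical example values.

```python
import numpy as np

def conversion_matrix(video_pts, audio_pts):
    """Compute a 2x3 affine matrix mapping video-coordinate points
    (pixels, origin at the top-left corner, y downward) onto
    audio-coordinate points (cm, origin at the centre of the
    localizable range, y upward).

    video_pts, audio_pts: three corresponding (x, y) pairs, e.g. three
    of the four corner points on each side."""
    src = np.hstack([np.asarray(video_pts, float), np.ones((3, 1))])  # 3x3
    dst = np.asarray(audio_pts, float)                                # 3x2
    # Solve src @ M.T = dst for the 2x3 affine matrix M.
    return np.linalg.solve(src, dst).T

def convert(matrix, point):
    """Apply the conversion matrix to one (x, y) coordinate value."""
    x, y = point
    return matrix @ np.array([x, y, 1.0])

# Assumed example: 200x100-pixel screen, 400 cm x 200 cm localizable
# range centred on the screen centre.
video_corners = [(0, 0), (200, 0), (0, 100)]             # TL, TR, BL (pixels)
audio_corners = [(-200, 100), (200, 100), (-200, -100)]  # TL, TR, BL (cm)
M = conversion_matrix(video_corners, audio_corners)
print(convert(M, (100, 50)))  # screen centre -> [0. 0.]
```

With this matrix, the pixel coordinate (100, 50) of the earlier example correctly maps to the centre of the localizable range rather than to a point 100 cm right and 50 cm up.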
The conversion matrix calculation unit 9 receives the information on the total numbers of horizontal and vertical pixels from the sound source coordinate acquisition unit 6, and based on this pixel count information acquires the coordinate values of predetermined three of the four corner points of the display screen. In addition, the conversion matrix calculation unit 9 is given, based on a user operation via the illustrated operation unit 10, the coordinate values of the three corner points on the localizable range side that are in the same positional relationship as the predetermined three points.

The conversion matrix calculation unit 9 calculates the conversion matrix based on these coordinate values of the three corner points in the video coordinate system and the three corner points in the audio coordinate system.
Note that the user in this case may actually measure the coordinate values (for example, in cm) of the above three corner points of the localizable range and directly input the coordinate values of these three points. However, some speaker systems, for example, specify recommended arrangement dimensions. In such a case, if it is known which speaker system is being used, the dimensions of the localizable range are known, and therefore the coordinate values of the three corner points in the audio coordinate system are also known. Accordingly, the apparatus can also be configured so that the user merely performs an operation of selecting or entering product identification information for the speaker system, such as its model number or product name, and the coordinate values of the above three points in the audio coordinate system are obtained based on this product identification information.
For confirmation, the conversion matrix does not need to be recalculated as long as the correspondence between the video coordinate system and the audio coordinate system is maintained. That is, recalculation need be performed only when the video coordinate system changes, for example because a display or screen with a different number of pixels is used, or when the audio coordinate system changes, for example because a different speaker system is used.
Using the conversion matrix calculated as described above, the coordinate conversion unit 7 sequentially converts the coordinate values of the sound source position in the video coordinate system acquired by the sound source coordinate acquisition unit 6 into coordinate values of the audio coordinate system. The coordinate values of the sound source position in the audio coordinate system obtained in this way are then supplied to the localization position control unit 8.
The localization position control unit 8 determines the gain values to be given to the sounds to be output from the respective speakers SP shown in FIG. 2 so that the virtual sound image is localized at the supplied sound image position in the audio coordinate system.

That is, if both the x value and the y value of the supplied audio-coordinate-system coordinate values are positive, the gain values are determined, according to those values, so that the gain of the sound to be output from the speaker SPR-up becomes relatively large with respect to the gains of the sounds from the other speakers SP. Alternatively, if both the x value and the y value of the supplied coordinate values are negative, the gain values are determined, according to those values, so that the gain of the sound to be output from the speaker SPL-un becomes relatively large with respect to the gains of the sounds from the other speakers SP.
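One concrete gain law with this qualitative behaviour can be sketched as follows. The bilinear panning formula and the half-width/half-height values are assumptions for illustration; the description only requires that the speaker nearest the target position receive a relatively larger gain.

```python
import numpy as np

def localization_gains(x, y, half_width=200.0, half_height=100.0):
    """Bilinear panning gains for the four speakers SPL-un, SPL-up,
    SPR-un, SPR-up, given a virtual sound image position (x, y) in the
    audio coordinate system (origin at the centre of the localizable
    range, x rightward, y upward, in cm)."""
    # Normalize the position to [0, 1] across the localizable range.
    u = np.clip((x + half_width) / (2 * half_width), 0.0, 1.0)   # 0 = left,  1 = right
    v = np.clip((y + half_height) / (2 * half_height), 0.0, 1.0) # 0 = down, 1 = up
    return {
        "GL-un": (1 - u) * (1 - v),
        "GL-up": (1 - u) * v,
        "GR-un": u * (1 - v),
        "GR-up": u * v,
    }

# A position up and to the right: the gain for SPR-up dominates,
# matching the behaviour described above.
gains = localization_gains(100.0, 50.0)
```

The four gains always sum to one, so the overall loudness is independent of position under this assumed law.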
The audio signal processing unit 5 executes audio signal processing such as gain adjustment and reverberation addition on the audio signal A supplied from the audio decoder 4. In particular, in the present embodiment, the gain adjustment of the audio signal A is performed based on the gain values, one for each speaker SP, supplied from the localization position control unit 8.

Specifically, an audio signal AL-un obtained by multiplying the input audio signal A by the gain value GL-un, an audio signal AL-up obtained by multiplying it by the gain value GL-up, an audio signal AR-un obtained by multiplying it by the gain value GR-un, and an audio signal AR-up obtained by multiplying it by the gain value GR-up are generated.

The audio signal AL-un generated by the audio signal processing unit 5 is supplied to the audio output terminal TAUL-un as shown in the figure. Similarly, the audio signal AL-up is supplied to the audio output terminal TAUL-up, the audio signal AR-un to the audio output terminal TAUR-un, and the audio signal AR-up to the audio output terminal TAUR-up.
The audio output terminal TAUL-un is connected to the speaker SPL-un shown in FIG. 2. Likewise, the audio output terminal TAUL-up is connected to the speaker SPL-up, the audio output terminal TAUR-un to the speaker SPR-un, and the audio output terminal TAUR-up to the speaker SPR-up.

As a result, the audio signal AL-un can be output from the speaker SPL-un, the audio signal AL-up from the speaker SPL-up, the audio signal AR-un from the speaker SPR-un, and the audio signal AR-up from the speaker SPR-up.

That is, the content can thereby be reproduced so that the position of a Player shown in the video (the sound source position) matches the position at which the line-recorded sound of that Player is localized (the virtual sound image position), and a more realistic video/sound field space can be reproduced.
According to the playback apparatus 1 described so far, the coordinate values of the sound source are acquired based on the video signal V, and the localization position of the virtual sound image is controlled automatically based on those coordinate values. That is, when reproducing a more realistic video/sound field space by making the position of the sound source shown in the video match the virtual sound image position of that sound source, the content production side is thereby spared the work of specifying the position information of the sound source along the time axis and performing the gain adjustment, so that the labor and time required for editing the content can be effectively reduced.
Although the localization position is controlled here by adjusting the gain values of the audio signals output from the respective speakers SP, the localization position can also be controlled by adjusting the phase differences of the audio signals output from the respective speakers SP, or by both of these.
FIG. 6 is a flowchart showing the operation procedure of the signal processing apparatus as the first embodiment described above.

In FIG. 6, first, in step S101, the coordinate value of the sound source position in the video coordinate system is acquired based on the video signal. This corresponds to the operation in which the sound source coordinate acquisition unit 6 acquires the coordinate value of the sound source position based on the video signal V obtained from the video stream data V-strm by the decoding processing of the video decoder 3.

In this case, as the method of acquiring the coordinate value of the sound source position, for example as described earlier, the video is first shot with a predetermined marker, such as an infrared ID light emitting device, attached in advance to the person serving as the Player. The sound source coordinate acquisition unit 6 then detects the position of this predetermined marker in the supplied video signal V by image processing and tracks it, thereby sequentially acquiring the position information of the Player in the video, that is, the coordinate value of the sound source position.
In step S102, the acquired coordinate value is converted into a coordinate value of the audio coordinate system. That is, the coordinate conversion unit 7 converts the coordinate value acquired by the sound source coordinate acquisition unit 6 into a coordinate value of the audio coordinate system based on the conversion matrix calculated by the conversion matrix calculation unit 9.
In step S103, localization position control is performed based on the coordinate value in the audio coordinate system.
In this step S103, the localization position control unit 8 first determines the gain values (GL-un, GL-up, GR-un, GR-up) to be applied to the audio signals to be output from the respective speakers SP shown in FIG. 2, so that the virtual sound source is localized at the sound image position given by the supplied audio coordinate value. The audio signal processing unit 5 then generates an audio signal AL-un obtained by multiplying the input audio signal A by the gain value GL-un, an audio signal AL-up obtained by multiplying it by the gain value GL-up, an audio signal AR-un obtained by multiplying it by the gain value GR-un, and an audio signal AR-up obtained by multiplying it by the gain value GR-up.
In this way, an audio signal is generated that can be reproduced so that the position of the player (sound source position) displayed in the video coincides with the position at which the line-recorded audio of that player is localized.
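As a rough illustration of the kind of processing just described (the specification itself prescribes no formula), the gain determination and multiplication of step S103 might be sketched as follows. The bilinear panning law and the normalized coordinate convention (0 &lt;= ax, ay &lt;= 1, with ax = 1 meaning full right and ay = 1 meaning full up) are assumptions introduced for illustration only.

```python
# Hypothetical sketch of step S103: determine gains for the four speakers
# SPL-un, SPL-up, SPR-un, SPR-up from a normalized audio coordinate (ax, ay),
# then multiply the input audio signal A by each gain.

def localization_gains(ax, ay):
    """Return (GL-un, GL-up, GR-un, GR-up) for a virtual source at (ax, ay)."""
    gl_un = (1.0 - ax) * (1.0 - ay)   # left, lower speaker
    gl_up = (1.0 - ax) * ay           # left, upper speaker
    gr_un = ax * (1.0 - ay)           # right, lower speaker
    gr_up = ax * ay                   # right, upper speaker
    return gl_un, gl_up, gr_un, gr_up

def apply_gains(sample, gains):
    """Multiply one sample of audio signal A by each gain, as the audio
    signal processing unit 5 does, yielding AL-un, AL-up, AR-un, AR-up."""
    return tuple(sample * g for g in gains)
```

A source at the lower-left corner (ax = 0, ay = 0) yields all of the gain in GL-un, while a centered source distributes the gain equally across the four speakers.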
In the above description, the case where each part of the signal processing apparatus according to the present embodiment is configured by hardware has been exemplified, but part or all of it can also be realized by software processing. In that case, the signal processing apparatus may be configured by a microcomputer or the like that operates according to a program for executing the corresponding processes among the processes shown in FIG. 6. In this case, the signal processing apparatus is provided with a recording medium such as a ROM, and the program is recorded therein.
&lt;Second embodiment&gt;
FIG. 7 shows the internal configuration of a playback apparatus 20 that includes the signal processing apparatus as the second embodiment.
The playback apparatus 20 of the second embodiment is obtained by omitting the operation unit 10 from the configuration of the playback apparatus 1 shown in FIG. 1 and changing the portion surrounded by the broken line in the figure.
The portion surrounded by the broken line constitutes the signal processing apparatus as the second embodiment. That is, the constituent elements of the signal processing apparatus of the second embodiment are at least the illustrated metadata extraction unit 21, reverberation effect control unit 22, reverberation data table 23, and audio signal processing unit 5.
First, the metadata extraction unit 21 extracts metadata included in the video stream data V-strm in this case.
Here, in the second embodiment, in order to give the audio signal the sound reverberation corresponding to the video content as described above, location information for identifying the place shown in the video is added in advance to the video signal V on the content production side. The video signal to which the location information has been added is then recorded on the recording medium.
For confirmation, such location information for identifying the place shown in the video serves, when reproducing the sound reverberation corresponding to that place, as information for identifying the reverberation appropriate to the place. Accordingly, such location information constitutes audio attribute information related to the acoustic attributes of the audio signal.
FIG. 8 shows the structure of the video stream data V-strm in the case of the second embodiment. In this case, as illustrated, the above location information is stored as metadata within the additional data.
For example, when a live video is recorded as content as in the present embodiment, information for identifying the specific concert hall may be stored as the location information. Alternatively, when the place transitions over time in the video content, for example "outdoors → tunnel → outdoors → concert hall", location information for identifying each of these places along the time axis may be stored.
As described above, the additional data in the video stream data V-strm is added for every predetermined data unit. This makes it possible, in correspondence with the case where the place changes along the time axis in the video content, to embed the location information representing each place in association with it on the time axis.
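To illustrate the idea of time-aligned location metadata (the specification does not define a concrete data format), the per-data-unit location information could be modeled as a list of time-stamped entries consulted during playback. The representation as (start time, location) pairs and the example place names are assumptions made for this sketch.

```python
# Hypothetical representation of the location metadata embedded per data
# unit in V-strm: a list of (start_time_seconds, location) entries in
# ascending time order, mirroring the "outdoors -> tunnel -> outdoors ->
# concert hall" example in the text.
LOCATION_METADATA = [
    (0.0, "outdoors"),
    (120.0, "tunnel"),
    (180.0, "outdoors"),
    (240.0, "concert_hall"),
]

def location_at(t, metadata=LOCATION_METADATA):
    """Return the location in effect at playback time t (seconds)."""
    current = metadata[0][1]
    for start, location in metadata:
        if t >= start:
            current = location
        else:
            break
    return current
```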
Here, in this case, the audio signal A and the video signal V are synchronized signals. According to the above description, the video signal V and the metadata in the additional data are synchronized information along the same time axis. Therefore, in this case, the metadata, together with the video signal V, constitutes the audio synchronization information signal referred to in the present invention.
In FIG. 7, the metadata extraction unit 21 extracts the metadata from such video stream data V-strm and acquires the above location information. The location information is then supplied to the reverberation effect control unit 22.
Based on the illustrated reverberation data table 23, the reverberation effect control unit 22 acquires reverberation data corresponding to the location information input from the metadata extraction unit 21, and controls the reverberation addition processing applied to the audio signal A in the audio signal processing unit 5 based on that reverberation data.
The reverberation data table 23 stores location information in association with reverberation data for reproducing the sound reverberation of the place identified by that location information. By acquiring from the reverberation data table 23 the reverberation data associated with the input location information, the reverberation effect control unit 22 can obtain the corresponding reverberation data.
Then, by supplying such reverberation data to the audio signal processing unit 5, the reverberation addition processing applied to the audio signal A in the audio signal processing unit 5 is controlled.
That is, the audio signal processing unit 5 in this case applies reverberation addition processing to the audio signal A supplied from the audio decoder 4 based on the reverberation data supplied from the reverberation effect control unit 22. As a result, reverberation for reproducing the sound reverberation corresponding to the video content is added to the audio signal A.
The audio signal A to which reverberation has been added in this way is then branched into four systems, according to the number of audio output terminals TAU in this case, and output.
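As an illustrative sketch of this mechanism (the specification prescribes neither a table format nor a reverberation algorithm), the lookup performed by the reverberation effect control unit 22 and a minimal reverberation addition in the audio signal processing unit 5 might look like the following. The table contents and the single-echo reverb model are assumptions; real reverberation data would be far richer, for example measured impulse responses.

```python
# Hypothetical reverberation data table: location information -> reverberation
# parameters (echo delay in samples and decay factor).
REVERB_DATA_TABLE = {
    "concert_hall": {"delay": 4800, "decay": 0.5},
    "tunnel": {"delay": 2400, "decay": 0.7},
    "outdoors": {"delay": 0, "decay": 0.0},
}

def lookup_reverb(location_info):
    """Reverberation effect control unit 22: get reverb data for a location."""
    return REVERB_DATA_TABLE[location_info]

def add_reverb(signal, reverb):
    """Audio signal processing unit 5: add one decayed echo to the signal,
    a crude stand-in for the reverberation addition processing."""
    delay, decay = reverb["delay"], reverb["decay"]
    out = list(signal)
    for i in range(delay, len(signal)):
        out[i] += decay * signal[i - delay]
    return out
```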
Note that, since the second embodiment does not perform the virtual sound image localization control in the vertical direction as in the first embodiment, the speakers SP do not necessarily have to be stacked in the vertical direction. That is, in this case, only one audio output terminal TAU each may be provided for the Lch and the Rch.
However, when adding reverberation that emphasizes the height of the ceiling, as in a church or a concert hall, arranging speakers SP in the vertical direction as well can further enhance the sense of presence.
With the above configuration, according to the playback apparatus 20 of the second embodiment, the sound reverberation of the actually output audio can be made to coincide with the sound reverberation corresponding to the video content, whereby a more realistic video/sound field space can be reproduced.
Furthermore, in such a playback apparatus 20, reverberation data corresponding to the place shown in the video can be acquired based on the metadata serving as the audio synchronization information signal, and reverberation is automatically added to the audio signal A based on that reverberation data. In other words, in this case, simply by adding the metadata to the video signal V in advance, the content production side can make the sound reverberation of the actually output audio coincide with the sound reverberation corresponding to the video content as described above, thereby reproducing a more realistic video/sound field space.
FIG. 9 is a flowchart showing the operation procedure of the signal processing apparatus as the second embodiment.
First, in step S201, location information corresponding to the video content is acquired based on the metadata.
That is, the metadata extraction unit 21 acquires the location information stored as metadata from the video stream data V-strm.
Then, in step S202, reverberation data corresponding to the acquired location information is acquired from the reverberation data table. That is, the reverberation effect control unit 22 acquires from the reverberation data table 23 the reverberation data associated with the location information supplied from the metadata extraction unit 21.
On that basis, in step S203, reverberation addition processing based on the reverberation data is performed on the audio signal. That is, the audio signal processing unit 5 applies reverberation addition processing to the audio signal A based on the reverberation data supplied from the reverberation effect control unit 22.
Note that, although the case where each part of the signal processing apparatus is configured by hardware has also been exemplified as the second embodiment, part or all of it can be realized by software processing. In that case, the signal processing apparatus may be configured by a microcomputer or the like that operates according to a program for executing the corresponding processes among the processes shown in FIG. 9. In this case, the signal processing apparatus is provided with a recording medium such as a ROM, and the program is recorded therein.
Further, in the second embodiment, the association between location information and reverberation data can be made either by associating a simulated sound reverberation predicted from the place where the sound source is arranged in the video, or, as in the sampling-reverb method, by associating reverberation information actually measured at that place.
Further, in the second embodiment, when adding reverberation corresponding to the video content, the location information is embedded in the video signal V as metadata; however, the information need not be limited to location information as long as it is information from which the reverberation data for reproducing the reverberation corresponding to the video content can be identified. Moreover, instead of embedding information for identifying the reverberation data in this way, the reverberation data itself can be embedded directly as metadata.
The same applies to the third embodiment described next.
&lt;Third embodiment&gt;
FIG. 10 shows the internal configuration of a playback apparatus 30 that includes the signal processing apparatus as the third embodiment.
The playback apparatus 30 of the third embodiment is configured to include, as the signal processing apparatus surrounded by the broken line shown in FIG. 10, a combination of the constituent elements of the signal processing apparatus shown in FIG. 1 (the sound source coordinate acquisition unit 6, coordinate conversion unit 7, localization position control unit 8, conversion matrix calculation unit 9, and audio signal processing unit 5) and the constituent elements of the signal processing apparatus shown in FIG. 7 (the metadata extraction unit 21, reverberation effect control unit 22, reverberation data table 23, and audio signal processing unit 5).
In this case, the audio signal processing unit 5 generates, from the audio signal A supplied from the audio decoder 4, an audio signal AL-un obtained by multiplying it by the gain value GL-un supplied from the localization position control unit 8, an audio signal AL-up obtained by multiplying it by the gain value GL-up, an audio signal AR-un obtained by multiplying it by the gain value GR-un, and an audio signal AR-up obtained by multiplying it by the gain value GR-up.
On that basis, reverberation addition processing corresponding to the reverberation data supplied from the reverberation effect control unit 22 is applied to the audio signal AL-un, audio signal AL-up, audio signal AR-un, and audio signal AR-up. The audio signals AL-un, AL-up, AR-un, and AR-up thus subjected to reverberation addition processing are then output to the corresponding audio output terminals TAU.
According to such a playback apparatus 30 as the third embodiment, it is possible both to make the position of the sound source shown in the video coincide with the virtual sound image position of that sound source, and to make the sound reverberation of the actually output audio coincide with the sound reverberation corresponding to the video content; a still more realistic video/sound field space can thereby be reproduced.
Also in this case, the coordinate value indicating the sound image position and the location information for identifying the reverberation data are acquired automatically, based on the video signal V and on the audio synchronization information signal as metadata, respectively. It is therefore unnecessary to manually and sequentially designate the position of the sound source and the reverberation information corresponding to the video content along the time axis, as was done conventionally. In other words, this can greatly reduce the labor and time required for editing the content.
FIG. 11 is a flowchart showing the operation procedure of the signal processing apparatus as the third embodiment.
In this case, the signal processing apparatus performs the operation of the first embodiment shown in FIG. 6 and the operation of the second embodiment shown in FIG. 9 in parallel.
That is, in steps S301 and S302, as in steps S201 and S202 shown in FIG. 9, the operations of acquiring the location information corresponding to the video content based on the metadata, and of acquiring from the reverberation data table the reverberation data corresponding to the acquired location information, are performed.
Meanwhile, in parallel with this, in steps S303, S304, and S305, as in steps S101, S102, and S103 shown in FIG. 6, the operations of acquiring the coordinate value of the sound source position in the video coordinate system based on the video signal, of converting the acquired coordinate value into a coordinate value in the audio coordinate system, and of performing localization position control based on the coordinate value in the audio coordinate system, are performed.
On that basis, in step S306, reverberation addition processing based on the acquired reverberation data is applied to the audio signals generated by the localization position control. That is, the audio signal processing unit 5 applies reverberation addition processing corresponding to the reverberation data supplied from the reverberation effect control unit 22 to the audio signals AL-un, AL-up, AR-un, and AR-up that it has generated based on the localization position control.
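The parallel flow of the third embodiment can be sketched end to end as follows. This is only an illustration under assumptions: the bilinear gain law, the single-echo reverb model, and all parameter values are invented for the sketch and are not specified in the text.

```python
# Hypothetical sketch of the third embodiment's flow (steps S301-S306):
# localization gains and reverberation data are obtained independently,
# then reverberation is applied to each localized channel.

def bilinear_gains(ax, ay):
    # Localization position control (S305): gains for SPL-un/SPL-up/SPR-un/SPR-up.
    return ((1 - ax) * (1 - ay), (1 - ax) * ay, ax * (1 - ay), ax * ay)

def echo(signal, delay, decay):
    # Reverberation addition (S306): one decayed echo as a stand-in.
    out = list(signal)
    for i in range(delay, len(signal)):
        out[i] += decay * signal[i - delay]
    return out

def process(audio, ax, ay, reverb):
    # Multiply audio signal A by each gain, then add reverberation per channel.
    channels = [[s * g for s in audio] for g in bilinear_gains(ax, ay)]
    return [echo(ch, reverb["delay"], reverb["decay"]) for ch in channels]
```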
Note that, although the case where each part of the signal processing apparatus is configured by hardware has also been exemplified as the third embodiment, part or all of it can be realized by software processing. In that case, the signal processing apparatus may be configured by a microcomputer or the like that operates according to a program for executing the corresponding processes among the processes shown in FIG. 11. In this case, the signal processing apparatus is provided with a recording medium such as a ROM, and the program is recorded therein.
&lt;Fourth embodiment&gt;
Incidentally, in the description so far, the signal processing apparatus as an embodiment is incorporated on the side of the playback apparatus that plays back the recording medium, so that editing for reproducing a more realistic video/sound field space is performed on the end-user side. However, to cope with the case where such editing is performed on the producer side, as in the conventional editing method described earlier, the signal processing apparatus as an embodiment can also be incorporated in a recording apparatus that performs recording on a recording medium.
FIG. 12 shows the internal configuration of a recording apparatus 40 configured to include the signal processing apparatus as such an embodiment.
In this figure as well, parts already described with reference to FIGS. 1 and 7 are denoted by the same reference numerals and their description is omitted. Also in this figure, the portion surrounded by the broken line (the sound source coordinate acquisition unit 6, ratio information generation unit 45, localization position control unit 46, location information acquisition unit 47, location information database 48, reverberation effect control unit 22, reverberation data table 23, and audio signal processing unit 5) forms the signal processing apparatus.
First, in this case, an audio signal reproduction unit 42 that reproduces the audio signal A and a video signal reproduction unit 43 that reproduces the video signal V are provided, as illustrated. The audio signal A reproduced by the audio signal reproduction unit 42 is supplied to the audio signal processing unit 5. The video signal V reproduced by the video signal reproduction unit 43 is supplied to the video encoder 44 and, as illustrated, is also branched and supplied to the sound source coordinate acquisition unit 6 and the location information acquisition unit 47.
Here, the audio signal reproduction unit 42 and the video signal reproduction unit 43 are provided inside the recording apparatus 40, but the configuration may instead be such that the audio signal A and the video signal V are input from an audio signal reproduction unit 42 and a video signal reproduction unit 43 provided outside the recording apparatus 40, respectively.
In this case as well, the sound source coordinate acquisition unit 6 receives the video signal V and acquires, by image processing, the coordinate value in the video coordinate system representing the position of the sound source.
The coordinate value in the video coordinate system acquired by the sound source coordinate acquisition unit 6 is supplied to the ratio information generation unit 45, as illustrated.
Here, when the signal processing apparatus as an embodiment is incorporated on the playback apparatus side and editing is performed on the user side, as in the preceding embodiments, information on the localization range realized by the speaker system each user actually uses can be input; an appropriate conversion matrix can thereby be generated, and the sound source position and the position of the virtual sound image can be made to coincide properly. In view of this, it is conceivable that the recording apparatus 40 side could likewise generate a conversion matrix according to the localization range of the speaker system and perform coordinate conversion. However, this would require recording separate content on recording media in correspondence with each of the individual speaker systems used on the user side, which is not realistic.
Therefore, the recording apparatus 40 performs localization position control on the coordinate value (x, y) acquired by the sound source coordinate acquisition unit 6 based on the ratio of each value to the total number of horizontal pixels and to the total number of vertical pixels, respectively, so that the sound source position and the position of the virtual sound image can be made to coincide properly regardless of the individual speaker system used on the user side.
First, as a premise in this case, assume that the speakers SP and the display or screen are arranged so that, in the two-dimensional vertical and horizontal directions shown in FIG. 2, the center point of the localization range realized by the speakers SP coincides with the center point of the display screen. Under this condition, it can be seen that, for example, the audio of a sound source shown at the upper-left corner of the screen can be reproduced with the sound source position in the video coinciding with the virtual sound image of that sound source if it is localized at the upper-left corner of the localization range (that is, if the gain of the audio to be output from the speaker SPL-up is made relatively largest).
Likewise, for example, the audio of a sound source shown at the center point of the screen can be reproduced with the sound source position in the video coinciding with the virtual sound image of the sound source if it is localized at the center point of the localization range (if the gains of the audio from the respective speakers SP are made equal).
Here, according to FIG. 4, the origin (0, 0) of the video coordinate system in this case is the upper left corner of the screen. Accordingly, when the ratios of the coordinate values x and y to the total number of horizontal pixels and the total number of vertical pixels are both 0%, it suffices to maximize the gain of the sound from the speaker SPL-up arranged at the upper left.
Similarly, if the ratio of the x value to the total number of horizontal pixels is 50% and the ratio of the y value to the total number of vertical pixels is 50%, the virtual sound image should be localized at the center point of the localization range; that is, the gains of the sound from the respective speakers SP should be set equal.
Further, if, for example, the ratio of the x value to the total number of horizontal pixels is 25% and the ratio of the y value to the total number of vertical pixels is 50%, the gain of the sound from the two Lch speakers SPL should be set larger than the gain of the sound from the two Rch speakers SPR by an amount corresponding to the ratio (for example, 1.5 times).
In this way, the information on the ratio of the x value of the acquired coordinate value to the total number of horizontal pixels and the information on the ratio of the y value to the total number of vertical pixels indicate at which position in the localization range the virtual sound image should be localized. Based on this ratio information, appropriate gain values can therefore be determined for the audio signals to be output from the four speakers SP.
In FIG. 12, the ratio information generation unit 45 calculates, based on the coordinate value in the video coordinate system supplied from the sound source coordinate acquisition unit 6 and the information on the total number of horizontal pixels and the total number of vertical pixels also supplied from the sound source coordinate acquisition unit 6, the ratio of the x value of the acquired coordinate value to the total number of horizontal pixels and the ratio of the y value to the total number of vertical pixels. The ratio information is then output to the localization position control unit 46.
The localization position control unit 46 determines, based on each piece of ratio information, the gain value to be given to each sound to be output from each speaker SP.
That is, as understood from the above description, in this case an x-value ratio of 0% corresponds to the leftward maximum, an x-value ratio of 100% to the rightward maximum, a y-value ratio of 0% to the upward maximum, and a y-value ratio of 100% to the downward maximum. According to the given x-value ratio and y-value ratio, the gain value for each speaker SP (gain values GL-un, GL-up, GR-un, GR-up) is determined.
These gain values are supplied to the audio signal processing unit 5.
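As a concrete illustration of this gain determination, the sketch below maps the two ratios onto the four gain values with simple linear (bilinear) panning, so that the corner and center cases behave as described above. The exact panning law is not specified here, so the linear mapping is an assumption; a real implementation might use constant-power panning instead.

```python
# Hypothetical sketch of the gain determination: the x ratio pans between the
# Lch and Rch speaker columns, the y ratio between the upper ("up") and lower
# ("un") speaker rows. The origin of the video coordinate system is top-left,
# so x = 0%, y = 0% puts all the gain on SPL-up. Bilinear panning is an
# assumption; the text fixes only the corner and center behavior.

def speaker_gains(x_ratio: float, y_ratio: float) -> dict:
    """x_ratio, y_ratio: coordinate ratios in [0.0, 1.0] (0.0 = 0%)."""
    left, right = 1.0 - x_ratio, x_ratio   # weight of Lch / Rch columns
    up, down = 1.0 - y_ratio, y_ratio      # weight of upper / lower rows
    return {
        "GL-up": left * up,
        "GL-un": left * down,
        "GR-up": right * up,
        "GR-un": right * down,
    }
```

With (0%, 0%) all gain goes to the GL-up channel, and with (50%, 50%) all four gains are equal, matching the two cases discussed above.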
On the other hand, as the configuration for adding reverberation according to the video content, in this case the location information acquisition unit 47, the location information database 48, and the reverberation effect control unit 22 described above are provided.
The location information acquisition unit 47 and the location information database 48 are provided for specifying the location information not from metadata but by image processing on the video signal V.
That is, the location information database 48 stores image data (image samples) for a plurality of preset locations in association with the corresponding location information. The location information acquisition unit 47 performs matching between a frame image based on the video signal V and the plurality of location images stored in the location information database 48, and acquires the location information associated with the location image having the highest degree of matching.
Here, if the matching degree does not exceed a certain threshold, it can be determined that there is no matching location information. Alternatively, in such a case where no location matches, the frame image based on the video signal V can be compared with the location images to determine the location image whose environment appears most similar, and the location information associated with that location image can be acquired.
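The matching and threshold logic described above can be sketched as follows. The frame representation (flat lists of pixel values) and the similarity measure (mean absolute difference) are illustrative assumptions; no particular matching algorithm is specified here, and a real system would use more robust image features.

```python
# Hypothetical sketch of the location matching: each database entry pairs a
# reference "location image" with its location information. A frame from the
# video signal V is compared against every entry; the best score wins if it
# reaches the threshold, otherwise no location information is reported.

LOCATION_DB = []  # list of (reference_pixels, location_info) pairs

def match_location(frame, threshold=0.8):
    best_score, best_info = 0.0, None
    for ref, info in LOCATION_DB:
        # mean absolute pixel difference, converted to a similarity in [0, 1]
        diff = sum(abs(a - b) for a, b in zip(frame, ref)) / len(ref)
        score = 1.0 - diff / 255.0
        if score > best_score:
            best_score, best_info = score, info
    return best_info if best_score >= threshold else None
```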
The location information acquired by the location information acquisition unit 47 is supplied to the reverberation effect control unit 22. Also in this case, the reverberation effect control unit 22 acquires reverberation data corresponding to the supplied location information from the reverberation data table 23.
Note that, for convenience of explanation, the location information database 48 here associates location information with location images, and the reverberation effect control unit 22 acquires the corresponding reverberation data from the reverberation data table 23 according to that location information. However, a database in which reverberation data is directly associated with each location image may also be used, so that the reverberation data corresponding to the location image determined to match is acquired directly.
Also in this case, the audio signal processing unit 5 generates, based on the gain values (GL-un, GL-up, GR-un, GR-up) supplied from the localization position control unit 46, an audio signal AL-un multiplied by the gain value GL-un, an audio signal AL-up multiplied by the gain value GL-up, an audio signal AR-un multiplied by the gain value GR-un, and an audio signal AR-up multiplied by the gain value GR-up. Reverberation adding processing based on the reverberation data supplied from the reverberation effect control unit 22 is then applied to each of the audio signals AL-un, AL-up, AR-un, and AR-up thus generated, and the results are output.
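As an illustration of this processing chain, the sketch below splits a mono source signal into per-speaker signals using the supplied gain values and then adds a simple reverberation to each. The feedback-free tapped delay used for the reverberation is purely an assumption, standing in for whatever the reverberation data actually describes.

```python
# Hypothetical sketch of the audio signal processing unit: multiply the source
# signal by each speaker's gain, then add reverberation as a set of delayed,
# attenuated copies of the gain-adjusted (dry) signal.

def process(audio, gains, reverb_taps):
    """audio: list of samples; gains: dict mapping channel name -> gain;
    reverb_taps: list of (delay_in_samples, level) pairs from the reverb data."""
    outputs = {}
    for name, g in gains.items():
        dry = [g * s for s in audio]
        wet = dry[:]
        for delay, level in reverb_taps:
            for i in range(delay, len(dry)):
                wet[i] += level * dry[i - delay]  # delayed, attenuated copy
        outputs[name] = wet
    return outputs
```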
The audio encoder 49 receives the audio signals AL-un, AL-up, AR-un, and AR-up to which reverberation has been added in this way, applies predetermined encoding processing such as compression by a predetermined audio compression method, and supplies the result to the multiplexing processing unit 50.
The video signal V that has been encoded by the video encoder 44 described above is also input to the multiplexing processing unit 50.
In the video encoder 44 as well, the video signal V is subjected to predetermined encoding processing such as compression by a predetermined video compression method.
The multiplexing processing unit 50 multiplexes the audio signals AL-un, AL-up, AR-un, and AR-up supplied from the audio encoder 49 and the video signal V supplied from the video encoder 44 by a predetermined multiplexing method, and supplies the result to the recording unit 51.
The recording unit 51 records the multiplexed data, supplied as recording data from the multiplexing processing unit 50, on the recording medium 100 shown in the figure.
The recording medium 100 can be, for example, an optical disc recording medium such as a CD, DVD, or Blu-ray Disc, a magnetic recording medium such as a hard disk, a magneto-optical recording medium such as an MD (Mini Disc), or another type of recording medium.
Note that a recording medium sold as a package medium is generally a read-only ROM disc. In that case, on the production side, the multiplexed data once recorded on the recording medium 100 may be reproduced and supplied to a mastering apparatus so that data recording by pits and lands is performed on a disc master. Alternatively, the multiplexed data may be supplied directly to the mastering apparatus for recording on the disc master.
According to the recording apparatus 40 of the fourth embodiment configured as described above, an audio signal and a video signal can be recorded on a recording medium such that both the matching of the position of a sound source displayed in the video with the virtual sound image position of that sound source, and the matching of the reverberation of the actual output sound with the reverberation appropriate to the video content, are realized.
That is, when such a recording medium is reproduced by a reproducing apparatus and video and audio are output, a more realistic video and sound field space is reproduced.
Further, in the recording apparatus 40, location information can be acquired from the video signal V together with the sound source position information, and gain adjustment and reverberation addition are performed automatically on the audio signal A based on the sound source position information and the location information. As a result, when reproducing a more realistic video and sound field space as described above, the content production side is spared the labor of sequentially specifying the sound source position and the location information and performing gain adjustment and reverberation addition as in the past; consequently, the labor and time required for editing content can be greatly reduced.
Here, in each of the embodiments described so far, for convenience of explanation, it has been assumed that there is only one sound source. However, when there are a plurality of sound sources, that is, when a plurality of audio signals A are line-recorded for the respective players in the video, it suffices to perform, for each audio signal A, the same acquisition of sound source coordinate values and the same gain adjustment processing of the audio signals to be output from the respective speakers SP according to those coordinate values, and then to synthesize the gain-adjusted audio signals for each speaker and output them.
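The multi-source procedure above, that is, gain-adjusting each line-recorded signal individually and then summing the results per speaker, can be sketched as follows; the data layout (lists of samples, per-speaker gain dictionaries) is an illustrative assumption.

```python
# Hypothetical sketch of the multi-source case: each line-recorded source has
# its own per-speaker gains, and the gain-adjusted signals are summed per
# speaker before output. All sources are assumed to have equal length.

def mix_sources(sources):
    """sources: list of (samples, gains) where gains maps speaker -> gain."""
    mixed = {}
    for samples, gains in sources:
        for speaker, g in gains.items():
            out = mixed.setdefault(speaker, [0.0] * len(samples))
            for i, s in enumerate(samples):
                out[i] += g * s  # accumulate this source's contribution
    return mixed
```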
(Modified examples)
The embodiments of the present invention have been described above; however, the present invention should not be limited to the embodiments described so far.
Further, in the embodiments, the case where only the two-dimensional up-down and left-right range is set as the localization range has been exemplified; however, the localization range can also be expanded in the depth direction by adjusting the volume of each sound source. That is, for example, position information of a sound source in the depth direction is acquired based on the result of detecting the image size of that sound source in the video by image processing based on the video signal. Then, by adjusting the volume of each sound source according to its position information in the depth direction, each virtual sound image position can be reproduced in a three-dimensional range including the depth direction in addition to the up-down and left-right directions.
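A minimal sketch of this depth-dependent volume adjustment follows, assuming the volume simply scales in proportion to the detected image size relative to a reference size. That proportional law is an assumption; the text requires only that the volume follow the depth position, not any particular relationship.

```python
# Hypothetical sketch of the depth extension: a smaller apparent image size
# means a more distant sound source, so its volume factor is reduced. The
# reference size corresponds to a source at the "front" of the scene.

def depth_volume(detected_size: float, reference_size: float) -> float:
    """Return a volume factor in [0.0, 1.0] for one sound source."""
    if detected_size <= 0:
        return 0.0  # source not visible in the frame
    return min(1.0, detected_size / reference_size)
```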
In addition, the speakers SP have been only the upper and lower Lch speakers and the upper and lower Rch speakers, so that the localization range was a two-dimensional range in the up-down and left-right directions. However, when speakers SP are also arranged in the front-rear direction, as in a 5.1ch surround system for example, the localization range can be expanded to the rear side of the viewer as well.
Further, the media playback unit 2 provided in the playback apparatuses (1, 20, 30) of the embodiments has been described as performing playback from a recording medium; however, it can also be configured as a tuner device that receives and demodulates AM/FM or TV broadcasts and outputs an audio signal (and a video signal).
Alternatively, instead of being configured to include such a media playback unit 2 and thus to have a playback function for a recording medium or a reception function for broadcast signals, the playback apparatus of each embodiment can be configured, for example as an amplifier device, to receive at least an audio signal and a video signal reproduced (or received) externally and to operate as the signal processing apparatus of each embodiment based on these input signals.
In the embodiments, as methods for acquiring reverberation data according to the video content, a method based on metadata and a method based on the result of matching between the video signal V and location images have been exemplified. Besides these, a method of inserting in advance a telop indicating the name of the location or the like into the video signal V can also be used. That is, in this case the production side combines a telop representing the name of the location (that is, an image signal) with the video signal V obtained by shooting. The playback apparatus side (or recording apparatus side) is provided with a database in which a plurality of telop images are associated in advance with their location information (or with the corresponding reverberation data). Matching is performed between these telop images and a predetermined portion of a frame image of the video signal V, the location information associated with the telop determined to match the image of the predetermined portion is acquired, and reverberation data is acquired based on that location information (or the reverberation data associated with the matching telop is acquired directly).
Besides the method of inserting a telop into the video signal V in this way, location information, or directly reverberation data, can similarly be acquired by image processing based on the video signal V by combining a required symbol such as a barcode, or an image signal such as an illustration, with the video signal V.
Moreover, in the embodiments, as the method of acquiring sound source position information from the video signal V, the method of attaching a marker in advance to the object serving as the sound source and tracking that marker has been exemplified. Besides this, the position information can also be acquired by tracking image data of a specific sound source in the video by image processing. That is, in this case the video signal V is first reproduced once, and the image data of a sound source displayed there is designated by an operation. At the time of actual reproduction, the portion matching the image thus designated is detected in each frame image of the input video signal V, and that portion is tracked.
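This marker-less tracking step can be sketched as an exhaustive template search per frame. Sum-of-absolute-differences matching over 2-D pixel arrays is an assumption made for illustration; no specific tracking method is named here.

```python
# Hypothetical sketch of the tracking step: the designated sound source image
# (template) is searched for in each frame by sum of absolute differences
# (SAD), and the best-matching position becomes the sound source coordinate.
# Frames and templates are 2-D lists of pixel values; the returned (x, y) is
# in the video coordinate system with its origin at the top-left.

def track(frame, template):
    th, tw = len(template), len(template[0])
    best_pos, best_sad = (0, 0), float("inf")
    for y in range(len(frame) - th + 1):
        for x in range(len(frame[0]) - tw + 1):
            sad = sum(abs(frame[y + j][x + i] - template[j][i])
                      for j in range(th) for i in range(tw))
            if sad < best_sad:
                best_sad, best_pos = sad, (x, y)
    return best_pos
```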
In each embodiment, the audio attribute information according to the present invention comprises information for specifying the sound source position and the reverberation according to the video content. However, the audio attribute information may include other information, as long as it is information that should be specified in order to determine adjustment parameters when performing audio adjustment (audio signal processing) to enhance the sense of presence according to the video content, that is, information relating to the acoustic attributes of the audio signal according to the video content of the video signal.
1, 20, 30 playback apparatus; 2 media playback unit; 3 video decoder; 4 audio decoder; 5 audio signal processing unit; 6 sound source coordinate acquisition unit; 7 coordinate conversion unit; 8, 46 localization position control unit; 9 conversion matrix calculation unit; 10 operation unit; 21 metadata extraction unit; 22 reverberation effect control unit; 23 reverberation data table; 40 recording apparatus; 42 audio signal reproduction unit; 43 video signal reproduction unit; 44 video encoder; 45 ratio information generation unit; 47 location information acquisition unit; 48 location information database; 49 audio encoder; 50 multiplexing processing unit; 51 recording unit; 100 recording medium