
Showing content from https://patents.google.com/patent/JP2007158527A/en below:

JP2007158527A - Signal processing apparatus, signal processing method, reproducing apparatus, and recording apparatus

Hereinafter, the best mode for carrying out the invention (hereinafter referred to as an embodiment) will be described.

<First Embodiment>

FIG. 1 shows an internal configuration of a playback apparatus 1 including a signal processing apparatus as a first embodiment of the present invention.

First, the playback apparatus 1 includes the media playback unit 2 shown in the figure. It can play back, for example, optical disc recording media such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a Blu-ray Disc, an MD (Mini Disc, a magneto-optical disc), magnetic recording media such as a hard disk, and recording media with built-in semiconductor memory.

As a premise, the recording medium played back by the media playback unit 2 holds content that includes video and audio. This content is assumed to be, for example, a so-called live video that records a live concert. In this case, the sound of each singer or performer, such as the vocals, guitar, drums, bass, and keyboard (keyboard instrument) (hereinafter also referred to as a Player), is recorded individually with a close microphone or similar device (so-called line recording). The audio signals line-recorded for each Player are stored separately on the recording medium, for example on separate tracks. Video of the Players singing and performing at a venue such as a concert hall is recorded together with these audio signals.

For content with this configuration, each Player is an independent sound source. That is, the position of each Player shown in the video is the position of the corresponding sound source. The playback apparatus 1 aims to reproduce the audio so that the position at which each Player's (each sound source's) line-recorded audio signal is localized matches the position of that Player (that sound source) shown in the video. Achieving this reproduces a more realistic video and sound field space.

In this case, a Player's position in the video is defined two-dimensionally, in the vertical direction as well as the horizontal direction. Accordingly, the position at which each Player's audio signal is localized (the virtual sound image position) is also reproduced two-dimensionally, up and down as well as left and right.

For this reason, as shown in FIG. 2, the speakers SP that output the audio signals generated by the playback apparatus 1 consist of an Lch speaker SPL and an Rch speaker SPR. These are arranged left-right symmetrically about the center point of the display or screen. Each of them consists of two speakers stacked vertically. The Lch speaker SPL consists of a lower speaker SPL-un and an upper speaker SPL-up. Likewise, the Rch speaker SPR consists of a lower speaker SPR-un and an upper speaker SPR-up.

Note that, for convenience of explanation, each embodiment described below, including the first embodiment, assumes that the audio signal contains sound from only one sound source (Player). That is, the audio signal A in this case consists only of the audio signal line-recorded for that single sound source.

In FIG. 1, the media playback unit 2 plays back the recording medium as described above. This yields video stream data V-strm, which includes the video signal V, and audio stream data A-strm, which includes the audio signal A. Both are stream data in which the actual data (the video signal V or the audio signal A) is multiplexed with predetermined additional information.

For reference, FIG. 3 shows the data structure of the video stream data V-strm. As shown in FIG. 3, the video stream data V-strm consists of the video signal V and its additional data. The additional data is embedded in each predetermined data unit, such as a sector, and carries supplementary information about the video signal V. Although not shown, the audio stream data A-strm has a similar structure, in which additional data for the audio signal A is embedded in each predetermined data unit.

In FIG. 1, the video stream data V-strm is supplied to a video decoder 3, which decodes it to obtain the video signal V. The audio stream data A-strm is supplied to an audio decoder 4, which likewise decodes it to obtain the audio signal A. The video signal V is supplied to a video output terminal Tv and is also branched to the sound source coordinate acquisition unit 6 shown in the figure. From the video output terminal Tv, the video signal V goes to the display or screen (projector device) shown earlier in FIG. 2. The audio signal A, on the other hand, is supplied to the audio signal processing unit 5.

In FIG. 1, a broken line encloses the following units, which are described next: the sound source coordinate acquisition unit 6, the coordinate conversion unit 7, the localization position control unit 8, the conversion matrix calculation unit 9, and the audio signal processing unit 5. The portion enclosed by the broken line forms the signal processing apparatus of the first embodiment.

Based on the video signal V, the sound source coordinate acquisition unit 6 acquires coordinate values that represent the position of the sound source in the video (coordinate values in the video coordinate system described later). This can be realized, for example, as follows. When the video is shot, a predetermined marker is attached to each person acting as a Player. The marker may be, for example, a light-emitting device that emits ID information by infrared light. The sound source coordinate acquisition unit 6 detects the marker's position in the supplied video signal V by image processing and tracks it. In this way it sequentially obtains the Player's position in the video, that is, the coordinate values of the sound source, from the video signal V. The sound source coordinate acquisition unit 6 also passes the total numbers of horizontal and vertical pixels of the input video signal V to the conversion matrix calculation unit 9, described later.
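The marker detection and tracking step can be illustrated with a minimal sketch. The patent does not specify the image processing. This sketch assumes each frame is available as a hypothetical 2D list of infrared brightness values (0 to 255). It treats every pixel at or above a threshold as part of the marker and uses their centroid as the Player's position in video coordinates. It does not decode the marker's ID information or separate the markers of several Players. The names `detect_marker` and `track_marker` are illustrative only.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # frame[row][col] = brightness 0-255 (hypothetical IR image)
Position = Tuple[float, float]  # (x, y) in video pixels, origin at the upper-left corner


def detect_marker(frame: Frame, threshold: int = 200) -> Optional[Position]:
    """Return the centroid of all pixels at or above `threshold`, or None if none are."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value >= threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # marker not visible in this frame
    return sum_x / count, sum_y / count


def track_marker(frames: List[Frame], threshold: int = 200) -> List[Optional[Position]]:
    """Detect the marker frame by frame, holding the last position while it is hidden."""
    positions: List[Optional[Position]] = []
    last: Optional[Position] = None
    for frame in frames:
        found = detect_marker(frame, threshold)
        if found is not None:
            last = found
        positions.append(last)
    return positions
```

Holding the last known position while the marker is briefly occluded is one simple design choice. A practical tracker might instead predict the marker's motion or report the loss explicitly.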

The coordinate conversion unit 7 converts the coordinate values acquired by the sound source coordinate acquisition unit 6 into coordinate values in the audio coordinate system. It does so using the conversion matrix calculated by the conversion matrix calculation unit 9 shown in the figure.

This conversion is needed because a Player's (sound source's) movement in the video is only a movement within the video, not a movement in the real world. Even if the virtual sound image were moved by the same amount as the Player moves in the video, the Player's position in the video and the virtual sound image position might not match. In other words, the sound source position in the video is defined in the video coordinate system, whereas the virtual sound image position must be defined in the audio coordinate system (the real-world coordinate system).

This is explained with reference to FIGS. 4 and 5. FIG. 4 shows the relationship between the video coordinate system and the display screen (display or screen) on which video based on the video signal V is shown. FIG. 5 shows the relationship among the display screen, the positions of the speakers SP, and the audio coordinate system. For simplicity of illustration, FIG. 5 does not show the speakers SP stacked vertically. In practice, however, speakers SPL-un and SPL-up, and speakers SPR-un and SPR-up, are each stacked as shown in FIG. 2.

First, as shown in FIG. 4, the video coordinate system can take, for example, the horizontal direction of the display screen as the x-axis and the vertical direction as the y-axis. The upper-left corner of the display screen is the origin, (x, y) = (0, 0). In this system, the point 100 pixels horizontally and 50 pixels vertically from the origin has the coordinate values (100, 50), as illustrated. Here, the sound source in the video is assumed to be located at (100, 50).

In the audio coordinate system of FIG. 5, on the other hand, the speakers SPL-un, SPL-up, SPR-un, and SPR-up can localize a virtual sound image within a certain range (hereinafter, the localizable range). The center of this range has the coordinate values (x, y) = (0, 0). For example, when the speakers SP of FIG. 2 are arranged symmetrically left-right and up-down about the center point of the display or screen, the center of the display screen is (0, 0), as illustrated. Here too, the horizontal direction is the x-axis and the vertical direction is the y-axis. Along the y-axis, positions above the center are positive and positions below it are negative. Along the x-axis, positions to the right are positive and positions to the left are negative. Thus the position 100 cm to the right of and 50 cm above the center has the coordinate values (100, 50), shown by the black circle in the figure.

Suppose the coordinate values (100, 50) of the sound source in the video coordinate system of FIG. 4 were applied directly to this audio coordinate system. They would then denote the position 100 cm to the right of and 50 cm above the screen center, as described above, so the two positions would not match. The correct position for the virtual sound image, given the sound source position shown in FIG. 4, is actually the position marked by the broken-line circle in the figure. Applying the coordinates directly would instead treat an incorrect position as the virtual sound image position.

For this reason, the playback apparatus 1 shown in FIG. 1 is provided with the coordinate conversion unit 7 described above. It converts the video-coordinate values acquired by the sound source coordinate acquisition unit 6 into audio-coordinate values, based on the conversion matrix calculated by the conversion matrix calculation unit 9. The conversion matrix can be calculated from the coordinate values of three points in the video coordinate system and the coordinate values of the three corresponding points in the audio coordinate system (the real-world coordinate system). Specifically, the two coordinate systems clearly correspond at the four corners of the display screen and the four corners of the localizable range. The conversion matrix can therefore be calculated from three of the four display-screen corners and the corresponding three corners of the localizable range.
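Three point pairs supply exactly six equations, so one natural reading of the conversion matrix is a 2x3 affine transform. The derivation below is a sketch under that assumption. The symbols $(x_v, y_v)$ for video coordinates, $(x_a, y_a)$ for audio coordinates, and the coefficients $a$ to $f$ are introduced here for illustration and are not the patent's notation.

$$
\begin{pmatrix} x_a \\ y_a \end{pmatrix}
=
\begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}
\begin{pmatrix} x_v \\ y_v \\ 1 \end{pmatrix}
$$

For the three corresponding points $(x_{v,i}, y_{v,i}) \mapsto (x_{a,i}, y_{a,i})$, $i = 1, 2, 3$:

$$
\begin{pmatrix} x_{v,1} & y_{v,1} & 1 \\ x_{v,2} & y_{v,2} & 1 \\ x_{v,3} & y_{v,3} & 1 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} x_{a,1} \\ x_{a,2} \\ x_{a,3} \end{pmatrix},
\qquad
\begin{pmatrix} x_{v,1} & y_{v,1} & 1 \\ x_{v,2} & y_{v,2} & 1 \\ x_{v,3} & y_{v,3} & 1 \end{pmatrix}
\begin{pmatrix} d \\ e \\ f \end{pmatrix}
=
\begin{pmatrix} y_{a,1} \\ y_{a,2} \\ y_{a,3} \end{pmatrix}
$$

The 3x3 matrix is invertible whenever the three video points are not collinear, and any three corners of the display satisfy this. As an example, assume a display of $W \times H$ pixels and a localizable range symmetric about its center, with half-width $X$ and half-height $Y$ in cm. Mapping the corners $(0, 0)$, $(W, 0)$, $(0, H)$ to $(-X, Y)$, $(X, Y)$, $(-X, -Y)$ gives

$$
x_a = \frac{2X}{W}\,x_v - X,
\qquad
y_a = Y - \frac{2Y}{H}\,y_v
$$

The sign change in $y$ arises because the video $y$-axis points down while the audio $y$-axis points up. This is one reason the raw pixel values (100, 50) cannot be reused as audio coordinates.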

The conversion matrix calculation unit 9 receives the total numbers of horizontal and vertical pixels from the sound source coordinate acquisition unit 6. From these pixel counts, it obtains the coordinate values of three predetermined corners of the display screen. Through user operation of the operation unit 10 shown in the figure, it is also given the coordinate values of the three corners of the localizable range that are in the same positional relationship as those predetermined corners. From the three corners in the video coordinate system and the three corners in the audio coordinate system, the conversion matrix calculation unit 9 calculates the conversion matrix.
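The calculation performed by the conversion matrix calculation unit 9 can be sketched in Python. The sketch assumes the conversion matrix is a 2x3 affine transform, whose six coefficients are fixed exactly by three point pairs. The video-side corners are derived from the pixel counts, and the audio-side corners are supplied by the user. The corner ordering (upper-left, upper-right, lower-left), the cm units, and all function names are assumptions made for this sketch. Cramer's rule is used only to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Row = Tuple[float, float, float]
Affine = Tuple[Row, Row]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_3_points(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Solve the 2x3 affine matrix that maps each src point onto the matching dst point."""
    a = [[float(x), float(y), 1.0] for x, y in src]
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("the three source points must not be collinear")
    rows: List[Row] = []
    for k in range(2):  # k = 0 solves the audio-x row, k = 1 the audio-y row
        rhs = [float(p[k]) for p in dst]
        coeffs = []
        for col in range(3):
            # Cramer's rule: replace column `col` with the right-hand side.
            m = [row[:] for row in a]
            for i in range(3):
                m[i][col] = rhs[i]
            coeffs.append(_det3(m) / det)
        rows.append((coeffs[0], coeffs[1], coeffs[2]))
    return rows[0], rows[1]


def conversion_matrix(h_pixels: int, v_pixels: int,
                      audio_corners: Sequence[Point]) -> Affine:
    """Build the matrix from pixel counts and user-supplied localizable-range corners.

    audio_corners: upper-left, upper-right, lower-left corners in cm (assumed order).
    """
    video_corners = [(0.0, 0.0), (float(h_pixels), 0.0), (0.0, float(v_pixels))]
    return affine_from_3_points(video_corners, audio_corners)


def video_to_audio(m: Affine, p: Point) -> Point:
    """Apply the conversion matrix to one point in video coordinates."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

As a usage example with made-up numbers, take a 400x200-pixel display and a localizable range of ±200 cm by ±100 cm. Then `video_to_audio(conversion_matrix(400, 200, [(-200.0, 100.0), (200.0, 100.0), (-200.0, -100.0)]), (100, 50))` returns `(-100.0, 50.0)`. The FIG. 4 point lands 100 cm left of and 50 cm above the center, not at (100, 50).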

The user may actually measure the coordinate values (for example, in cm) of these three corners of the localizable range and enter them directly. However, some speaker systems specify recommended placement dimensions. In that case, knowing which speaker system is used gives the dimensions of the localizable range, and hence the coordinate values of the three corners in the audio coordinate system. The apparatus can therefore also be configured so that the user only selects or enters product-identifying information for the speaker system, such as its model number or product name. The coordinate values of the three points in the audio coordinate system are then obtained from that information.
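The model-number lookup could be a simple table from product-identifying information to recommended placement dimensions. In this sketch the model names and dimensions are placeholders invented for illustration, not real product data. The sketch also assumes the recommended placement yields a localizable range symmetric about the screen center. It returns the same three corners (upper-left, upper-right, lower-left) that a user would otherwise measure by hand.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical registry: model number -> (half-width, half-height) of the
# localizable range in cm when the speakers sit at the recommended positions.
RECOMMENDED_LAYOUTS: Dict[str, Tuple[float, float]] = {
    "EXAMPLE-SP-100": (150.0, 80.0),
    "EXAMPLE-SP-200": (200.0, 100.0),
}


def audio_corners_for_model(model: str) -> Tuple[Point, Point, Point]:
    """Return the upper-left, upper-right, lower-left corners in audio coordinates (cm)."""
    try:
        half_w, half_h = RECOMMENDED_LAYOUTS[model]
    except KeyError:
        # Unknown product: fall back to asking the user to measure the corners.
        raise ValueError(f"unknown speaker model {model!r}; enter measured corners") from None
    return (-half_w, half_h), (half_w, half_h), (-half_w, -half_h)
```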

Note also that the conversion matrix does not need to be recalculated as long as the correspondence between the video coordinate system and the audio coordinate system stays the same. Recalculation is needed only when the video coordinate system changes, for example because a display or screen with a different number of pixels is used, or when the audio coordinate system changes, for example because a different speaker system is used.
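The recalculate-only-on-change rule can be sketched as a small cache keyed on whatever defines the two coordinate systems. This sketch assumes, hypothetically, a key of (horizontal pixels, vertical pixels, audio-space corners). The class name and the `compute` callback are illustrative. Any function that builds the matrix from such a key can be plugged in.

```python
from typing import Any, Callable, Hashable, Optional


class ConversionMatrixCache:
    """Keep the last conversion matrix and rebuild it only when the setup key changes."""

    def __init__(self, compute: Callable[[Hashable], Any]) -> None:
        self._compute = compute
        self._key: Optional[Hashable] = None
        self._matrix: Any = None
        self.recalculations = 0  # number of times the matrix was actually rebuilt

    def get(self, setup_key: Hashable) -> Any:
        # A new display (pixel counts) or new speakers (corners) yields a new key.
        if self.recalculations == 0 or setup_key != self._key:
            self._matrix = self._compute(setup_key)
            self._key = setup_key
            self.recalculations += 1
        return self._matrix
```

Comparing against only the most recent key matches the rule described above: a changed display or speaker system triggers one recalculation, and every later frame reuses the stored matrix.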

Using the conversion matrix calculated as described above, the coordinate conversion unit 7 sequentially converts the sound source coordinates acquired by the sound source coordinate acquisition unit 6 from the video coordinate system into the audio coordinate system. The resulting audio-coordinate values of the sound source position are supplied to the localization position control unit 8.

The localization position control unit 8 must localize the virtual sound source at the sound image position given in the audio coordinate system. To do so, it determines the gain value to apply to the audio output from each speaker SP shown in FIG. 2. For example, if the supplied x and y values are both positive, the gains are set according to those values so that the speaker SPR-up receives a relatively larger gain than the other speakers SP. Conversely, if x and y are both negative, the gains are set so that the speaker SPL-un receives a relatively larger gain than the other speakers SP.
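The patent states only the qualitative rule: the speaker nearest the target position receives the relatively largest gain. One way to satisfy that rule is bilinear amplitude panning across the four speakers with constant-power normalization. It is shown here as an assumption, not as the patent's method. The range half-sizes `half_w` and `half_h` and the function name are hypothetical parameters of this sketch.

```python
import math
from typing import Dict


def _clamp01(t: float) -> float:
    return min(1.0, max(0.0, t))


def speaker_gains(x: float, y: float, half_w: float, half_h: float) -> Dict[str, float]:
    """Gains GL-un, GL-up, GR-un, GR-up for a target (x, y) in audio coordinates (cm).

    Assumes the localizable range spans [-half_w, half_w] x [-half_h, half_h];
    positions outside it are clamped to the edge.
    """
    u = _clamp01((x / half_w + 1.0) / 2.0)  # 0 = left edge, 1 = right edge
    v = _clamp01((y / half_h + 1.0) / 2.0)  # 0 = bottom edge, 1 = top edge
    weights = {
        "GL-un": (1.0 - u) * (1.0 - v),
        "GL-up": (1.0 - u) * v,
        "GR-un": u * (1.0 - v),
        "GR-up": u * v,
    }
    # The bilinear weights sum to 1, so taking square roots makes the squared
    # gains sum to 1: total power stays constant wherever the image is placed.
    return {name: math.sqrt(w) for name, w in weights.items()}
```

With a ±200 cm by ±100 cm range, the target (100, 50) gives GR-up = 0.75 and GL-un = 0.25, with GL-up and GR-un in between. This matches the rule that x > 0 and y > 0 favor SPR-up.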

The audio signal processing unit 5 performs audio signal processing, such as gain adjustment and reverberation addition, on the audio signal A supplied from the audio decoder 4. In this embodiment in particular, it adjusts the gain of the audio signal A using the per-speaker gain values supplied from the localization position control unit 8. Specifically, it multiplies the input audio signal A by the gain values GL-un, GL-up, GR-un, and GR-up to generate the audio signals AL-un, AL-up, AR-un, and AR-up, respectively. As illustrated, the audio signal AL-un generated by the audio signal processing unit 5 is supplied to the audio output terminal TAUL-un. Likewise, AL-up is supplied to TAUL-up, AR-un to TAUR-un, and AR-up to TAUR-up.
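The gain multiplication and routing described above can be sketched with plain sample lists. The signal, gain, and terminal names follow the text. The dictionaries and the function name `route_audio` are hypothetical, and the other processing the unit performs, such as reverberation, is omitted.

```python
from typing import Dict, List

# Output signal -> gain value applied to audio signal A to produce it.
GAIN_FOR_SIGNAL = {"AL-un": "GL-un", "AL-up": "GL-up", "AR-un": "GR-un", "AR-up": "GR-up"}

# Output signal -> audio output terminal it is supplied to.
OUTPUT_TERMINALS = {"AL-un": "TAUL-un", "AL-up": "TAUL-up", "AR-un": "TAUR-un", "AR-up": "TAUR-up"}


def route_audio(samples: List[float], gains: Dict[str, float]) -> Dict[str, List[float]]:
    """Multiply audio signal A by each gain and key the four results by output terminal."""
    return {
        OUTPUT_TERMINALS[signal]: [gains[gain_name] * s for s in samples]
        for signal, gain_name in GAIN_FOR_SIGNAL.items()
    }
```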

The audio output terminal TAUL-un is connected to the speaker SPL-un shown in FIG. 2. Likewise, TAUL-up is connected to the speaker SPL-up, TAUR-un to the speaker SPR-un, and TAUR-up to the speaker SPR-up. The speakers SPL-un, SPL-up, SPR-un, and SPR-up therefore output the audio signals AL-un, AL-up, AR-un, and AR-up, respectively. As a result, the Player's line-recorded audio is localized at a position (the virtual sound image position) that matches the Player's position in the video (the sound source position). This reproduces a more realistic video and sound field space.

With the playback apparatus 1 described above, the sound source coordinates are acquired from the video signal V, and the localization position of the virtual sound source is controlled automatically from those coordinates. The content producer still wants the sound source position in the video to match its virtual sound image position, for a more realistic video and sound field space. To achieve this, the producer no longer has to specify sound source positions along the time axis and adjust gains by hand. This effectively reduces the effort and time needed to edit the content.

Here, the localization position is controlled by adjusting the gain of the audio signal output from each speaker SP. It can also be controlled by adjusting the phase differences between the audio signals output from the speakers SP, or by adjusting both gain and phase difference.
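The phase-difference alternative can be illustrated with per-channel time delays. This is an assumption: the patent says only "phase difference". A whole-sample delay is one simple way to create an inter-channel phase (time) difference in a broadband signal. For small delays, the sound image generally shifts toward the speaker whose sound arrives first. Fractional delays and frequency-dependent phase shifts are not covered, and the function names are hypothetical.

```python
from typing import Dict, List


def delay_signal(samples: List[float], delay: int) -> List[float]:
    """Delay a signal by `delay` whole samples, zero-padding the start and keeping its length."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return ([0.0] * delay + list(samples))[: len(samples)]


def apply_delays(channels: Dict[str, List[float]], delays: Dict[str, int]) -> Dict[str, List[float]]:
    """Delay each channel by its own sample count; channels not listed are left undelayed."""
    return {name: delay_signal(sig, delays.get(name, 0)) for name, sig in channels.items()}
```

In practice a delay stage like this would run alongside the gain stage, matching the text's option of controlling localization with both gain and phase difference.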

FIG. 6 is a flowchart of the operating procedure of the signal processing apparatus of the first embodiment described above. In FIG. 6, first, in step S101, the coordinate values of the sound source position in the video coordinate system are acquired from the video signal. The video decoder 3 decodes the video stream data V-strm into the video signal V, and in this step the sound source coordinate acquisition unit 6 acquires the sound source coordinates from that signal. As described above, a predetermined marker, such as an infrared ID light-emitting device, is attached to each person acting as a Player when the video is shot. The sound source coordinate acquisition unit 6 then detects the marker's position in the supplied video signal V by image processing and tracks it. This sequentially yields the Player's position in the video, that is, the coordinate values of the sound source position.

In step S102, the acquired coordinate values are converted into coordinate values in the audio coordinate system. That is, the coordinate conversion unit 7 converts the coordinate values acquired by the sound source coordinate acquisition unit 6 into coordinate values in the audio coordinate system, based on the conversion matrix calculated by the conversion matrix calculation unit 9.

In step S103, localization position control is performed on the basis of the coordinate value in the audio coordinate system. In this step, the localization position control unit 8 first determines the gain values (GL-un, GL-up, GR-un, GR-up) for the audio signals output from the respective speakers SP shown in FIG. 2. These gains are chosen so that the virtual sound source is localized at the supplied sound image position in the audio coordinate system. The audio signal processing unit 5 then multiplies the input audio signal A by GL-un, GL-up, GR-un and GR-up to generate the audio signals AL-un, AL-up, AR-un and AR-up, respectively. When reproduced, these signals make two positions coincide: the position of the player (the sound source position) shown in the video, and the position at which that player's line-recorded audio is localized.
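For step S103, the text states only that four gain values are determined and multiplied into the audio signal A. The panning law below is an assumption chosen for illustration, not necessarily the law used by the localization position control unit 8:

- bilinear weights across the four speakers;
- a square root on each weight, for constant total power.

Positions are assumed to be normalized to the localization range. "un" is taken to mean the lower speaker and "up" the upper speaker.

```python
import math
from typing import Dict, List


def localization_gains(h: float, e: float) -> Dict[str, float]:
    """Gains for the four speakers that place a virtual source at the
    normalized position (h, e) in the localization range.

    h: 0.0 = left edge, 1.0 = right edge
    e: 0.0 = lower edge, 1.0 = upper edge
    Bilinear weights sum to 1, so after the square root the squared gains
    also sum to 1 (constant total power).
    """
    h = min(max(h, 0.0), 1.0)
    e = min(max(e, 0.0), 1.0)
    weights = {
        "GL-un": (1.0 - h) * (1.0 - e),
        "GL-up": (1.0 - h) * e,
        "GR-un": h * (1.0 - e),
        "GR-up": h * e,
    }
    return {name: math.sqrt(w) for name, w in weights.items()}


def render_four_channels(signal: List[float], h: float, e: float) -> Dict[str, List[float]]:
    """Multiply the audio signal A by each gain to get AL-un, AL-up, AR-un, AR-up."""
    gains = localization_gains(h, e)
    # "GL-un" -> "AL-un", and so on.
    return {"A" + name[1:]: [g * s for s in signal] for name, g in gains.items()}
```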

In the description so far, each part of the signal processing apparatus of this embodiment has been configured in hardware, but some or all of the parts can also be realized by software processing. In that case, the signal processing apparatus may be configured as a microcomputer or the like that runs a program executing the corresponding processes among those shown in FIG. 6. The signal processing apparatus is then provided with a recording medium such as a ROM, in which the program is recorded.

(Second Embodiment)

FIG. 7 shows the internal configuration of the playback apparatus 20 configured to include the signal processing apparatus as the second embodiment.

The second embodiment adds reverberation that matches the video content to the audio signal. Specifically, it adds reverberation corresponding to the place shown in the video.
In FIG. 7, parts already described with reference to FIG. 1 are given the same reference numerals, and their description is omitted.

The playback apparatus 20 of the second embodiment is derived from the playback apparatus 1 shown in FIG. 1. The operation unit 10 is omitted, and the portion enclosed by the broken line in the figure is changed. This enclosed portion constitutes the signal processing apparatus of the second embodiment. That is, the signal processing apparatus of the second embodiment comprises at least the following units shown in the figure: the metadata extraction unit 21, the reverberation effect control unit 22, the reverberation data table 23, and the audio signal processing unit 5.

First, the metadata extraction unit 21 extracts the metadata contained in the video stream data V-strm.

In the second embodiment, the goal is to give the audio signal reverberation that matches the video content, as described above. To this end, the content producer adds location information to the video signal V in advance. This information identifies the place shown in the video, and the video signal carrying it is recorded on the recording medium. To be clear, when sound matching the place shown in the video is reproduced, this location information serves to identify the acoustic character of that place. Such location information is therefore audio attribute information related to the acoustic attributes of the audio signal.

FIG. 8 shows the structure of the video stream data V-strm in the second embodiment. As shown in the figure, the location information is stored as metadata in the additional data. For example, when a live concert video is recorded as the content, as in this embodiment, information identifying a specific concert hall may be stored as the location information. Alternatively, the location in the video may change over time, for example "outdoors → tunnel → outdoors → concert hall". In that case, location information identifying each of these places along the time axis may be stored. As described earlier, the additional data in the video stream data V-strm is attached to every predetermined data unit. So even when the location changes along the time axis, location information for each place can be embedded in association with the corresponding point on the time axis.
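Because the location metadata is attached to each data unit and shares the video time axis, a player can look up the current location by time. This sketch makes the following assumptions:

- the tags have already been extracted from V-strm;
- only the points where the location changes are stored. Storing a tag in every unit would work equally well.

The names `LocationTag` and `LocationTimeline` and the location strings are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LocationTag:
    start: float    # time (seconds) at which this location begins
    location: str   # location identifier carried in the metadata


class LocationTimeline:
    """Maps a playback time to the location tag in effect at that time."""

    def __init__(self, tags: Iterable[LocationTag]) -> None:
        self._tags: List[LocationTag] = sorted(tags, key=lambda t: t.start)
        self._starts: List[float] = [t.start for t in self._tags]

    def location_at(self, time: float) -> Optional[str]:
        # Index of the last tag whose start time is <= time.
        i = bisect.bisect_right(self._starts, time) - 1
        return self._tags[i].location if i >= 0 else None
```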

Here, the audio signal A and the video signal V are synchronized with each other. As described above, the video signal V and the metadata in the additional data are also synchronized along the same time axis. Accordingly, in this case the metadata, together with the video signal V, constitutes the audio synchronization information signal referred to in the present invention.

In FIG. 7, the metadata extraction unit 21 extracts the metadata from such video stream data V-strm to obtain the location information, and supplies it to the reverberation effect control unit 22.

The reverberation effect control unit 22 uses the reverberation data table 23 shown in the figure to obtain reverberation data corresponding to the location information input from the metadata extraction unit 21. On the basis of this reverberation data, it controls the reverberation addition process applied to the audio signal A in the audio signal processing unit 5. The reverberation data table 23 stores each piece of location information together with reverberation data for reproducing the acoustics of the place it identifies. The reverberation effect control unit 22 therefore obtains the appropriate reverberation data by reading the entry associated with the input location information. It then supplies this reverberation data to the audio signal processing unit 5, thereby controlling the reverberation addition process applied to the audio signal A.
In other words, the audio signal processing unit 5 in this case adds reverberation to the audio signal A supplied from the audio decoder 4, based on the reverberation data supplied from the reverberation effect control unit 22. As a result, reverberation that reproduces the acoustics matching the video content is added to the audio signal A. The reverberated audio signal A is then split into four paths, one for each of the four audio output terminals TAU in this case, and output.
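The table lookup and reverberation addition described above can be illustrated with convolution reverb. In this sketch:

- the reverberation data is assumed to be a short impulse response per location;
- the toy values are invented for illustration and are not taken from the source;
- unknown locations fall back to a dry pass-through.

The final helper mirrors the split into four output paths.

```python
from typing import Dict, List, Sequence

# Hypothetical reverberation data table: location ID -> impulse response.
# The values are toy numbers for illustration only.
REVERB_TABLE: Dict[str, List[float]] = {
    "concert_hall": [1.0, 0.0, 0.5, 0.0, 0.25],
    "tunnel": [1.0, 0.6, 0.36],
    "outdoors": [1.0],  # direct sound only: no added reverberation
}


def convolve(signal: Sequence[float], ir: Sequence[float]) -> List[float]:
    """Direct-form linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def add_reverb(signal: Sequence[float], location: str,
               table: Dict[str, List[float]] = REVERB_TABLE) -> List[float]:
    """Look up the impulse response for `location` and convolve it with the signal."""
    ir = table.get(location, [1.0])  # unknown location: pass the signal through dry
    return convolve(signal, ir)


def branch(signal: Sequence[float], n_outputs: int = 4) -> List[List[float]]:
    """Split the reverberated signal into n identical output paths (terminals TAU)."""
    return [list(signal) for _ in range(n_outputs)]
```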

The second embodiment does not perform the vertical virtual sound image localization control of the first embodiment. The speakers SP therefore need not be stacked vertically, and the audio output terminals TAU may consist of just one for Lch and one for Rch. However, some reverberation emphasizes a high ceiling, as in a church or concert hall. When such reverberation is added, also arranging speakers SP in the vertical direction can further enhance the sense of presence.

With the above configuration, the playback apparatus 20 of the second embodiment can make the acoustics of the actual output sound match the acoustics implied by the video content. It can therefore reproduce a more realistic video and sound field space. The playback apparatus 20 obtains reverberation data corresponding to the place shown in the video from the metadata, which serves as the audio synchronization information signal. It then automatically adds reverberation to the audio signal A on the basis of that data. In other words, the content producer only needs to add metadata to the video signal V in advance. This makes the acoustics of the actual output sound match those of the video content, as described above, so that a more realistic video and sound field space is reproduced.

FIG. 9 is a flowchart showing the operating procedure of the signal processing apparatus according to the second embodiment. First, in step S201, location information corresponding to the video content is acquired on the basis of the metadata. That is, the metadata extraction unit 21 obtains the location information stored as metadata in the video stream data V-strm.

Next, in step S202, reverberation data corresponding to the acquired location information is obtained from the reverberation data table. That is, the reverberation effect control unit 22 reads from the reverberation data table 23 the reverberation data associated with the location information supplied by the metadata extraction unit 21.

Then, in step S203, reverberation is added to the audio signal on the basis of the reverberation data. That is, the audio signal processing unit 5 adds reverberation to the audio signal A on the basis of the reverberation data supplied by the reverberation effect control unit 22.
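When the location changes partway through the content, steps S201 to S203 are repeated for each data unit. The sketch below assumes each unit has already been reduced to a (location, audio block) pair. It uses overlap-add so that the reverberation tail of one unit carries over into the next instead of being cut off. The unit structure and the tail handling are assumptions, not details given in the source.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def convolve(signal: Sequence[float], ir: Sequence[float]) -> List[float]:
    """Direct-form linear convolution."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def process_units(units: Iterable[Tuple[str, Sequence[float]]],
                  table: Dict[str, List[float]]) -> List[float]:
    """Run S201-S203 for each data unit, carrying reverberation tails across units."""
    output: List[float] = []
    tail: List[float] = []  # reverberation spilling past the end of the previous unit
    for location, block in units:
        ir = table.get(location, [1.0])   # S201/S202: location -> reverberation data
        wet = convolve(block, ir)         # S203: add reverberation to this unit
        for i, v in enumerate(tail):      # overlap-add the previous unit's tail
            if i < len(wet):
                wet[i] += v
            else:
                wet.append(v)
        output.extend(wet[:len(block)])   # emit one unit's worth of samples
        tail = wet[len(block):]           # keep the rest for the next unit
    output.extend(tail)                   # let the final reverberation decay out
    return output
```

In the test below, a hall impulse in the first unit keeps ringing into a following "outdoors" unit, as a listener would expect.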

As in the first embodiment, each part of the signal processing apparatus of the second embodiment has been described as hardware, but some or all of the parts can be realized by software processing. In that case, the signal processing apparatus may be configured as a microcomputer or the like that runs a program executing the corresponding processes among those shown in FIG. 9. It is provided with a recording medium such as a ROM, in which the program is recorded.

Further, in the second embodiment, location information can be associated with reverberation data in either of two ways:

- by associating pseudo (simulated) acoustics predicted from the place where the sound source is located in the video;
- as in the sampling reverb method, by associating acoustic information actually measured at that place.
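For the pseudo approach, one common way to synthesize an impulse response is exponentially decaying noise. Its envelope falls by 60 dB over a chosen reverberation time (RT60). This is a swapped-in illustrative technique, not one named in the source. With the sampling reverb approach, the list would instead be an impulse response measured at the actual venue.

```python
import math
import random
from typing import List, Optional


def synthetic_ir(rt60: float, sample_rate: int,
                 length_s: Optional[float] = None, seed: int = 0) -> List[float]:
    """Exponentially decaying noise impulse response.

    The amplitude envelope falls by 60 dB (a factor of 1000) after `rt60` seconds.
    A fixed seed keeps the result reproducible.
    """
    if rt60 <= 0 or sample_rate <= 0:
        raise ValueError("rt60 and sample_rate must be positive")
    if length_s is None:
        length_s = rt60
    n = max(1, int(round(length_s * sample_rate)))
    decay = math.log(1000.0) / rt60       # envelope is exp(-decay * t)
    rng = random.Random(seed)
    ir = [1.0]                            # direct sound at t = 0
    for i in range(1, n):
        t = i / sample_rate
        ir.append(rng.uniform(-1.0, 1.0) * math.exp(-decay * t))
    return ir
```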

In the second embodiment, location information is embedded in the video signal V as metadata in order to add reverberation matching the video content. However, the embedded information need not be location information. Any information that identifies the reverberation data for reproducing the matching acoustics may be used. Alternatively, instead of information that identifies the reverberation data, the reverberation data itself can be embedded directly as metadata. The same applies to the third embodiment described below.
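A player that accepts either form of metadata might resolve it as shown below. The metadata keys `reverb_ir` and `location` are hypothetical. The resolution order is:

- directly embedded reverberation data, if present;
- otherwise, a table lookup by location;
- otherwise, a dry signal.

```python
from typing import Dict, List, Mapping, Sequence


def resolve_reverb(metadata: Mapping[str, object],
                   table: Dict[str, Sequence[float]]) -> List[float]:
    """Return an impulse response from inline reverb data or a location key."""
    if "reverb_ir" in metadata:                 # reverberation data embedded directly
        return [float(v) for v in metadata["reverb_ir"]]  # type: ignore[union-attr]
    location = metadata.get("location")
    if isinstance(location, str) and location in table:
        return list(table[location])            # identify reverb data via the table
    return [1.0]                                # no usable information: stay dry
```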

(Third Embodiment)

FIG. 10 shows the internal configuration of the playback apparatus 30 that includes the signal processing apparatus as the third embodiment.

The third embodiment combines the first and second embodiments and achieves two things at once:

- the virtual sound image position coincides with the sound source position shown in the video;
- the acoustics of the actual output sound match those of the video content.

Together, these aim to reproduce an even more realistic video and sound field space.
In FIG. 10, parts already described with reference to FIGS. 1 and 7 are given the same reference numerals, and their description is omitted.

In the playback apparatus 30 of the third embodiment, the signal processing apparatus enclosed by the broken line in FIG. 10 combines the components of two earlier apparatuses:

- from FIG. 1: the sound source coordinate acquisition unit 6, coordinate conversion unit 7, localization position control unit 8, conversion matrix calculation unit 9 and audio signal processing unit 5;
- from FIG. 7: the metadata extraction unit 21, reverberation effect control unit 22, reverberation data table 23 and audio signal processing unit 5.

In this case, the audio signal processing unit 5 takes the audio signal A supplied from the audio decoder 4 and multiplies it by the gain values GL-un, GL-up, GR-un and GR-up from the localization position control unit 8. This produces the audio signals AL-un, AL-up, AR-un and AR-up, respectively. The unit then adds reverberation to each of these four signals according to the reverberation data supplied by the reverberation effect control unit 22. Finally, it outputs the reverberated signals AL-un, AL-up, AR-un and AR-up to the corresponding audio output terminals TAU.
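In the third embodiment the gains from localization control are applied first, and reverberation is then added to each of the four channels. The sketch below carries over two illustrative choices from earlier, both assumptions:

- a bilinear constant-power panning law;
- convolution reverb.

It also assumes the same impulse response is used on every channel.

```python
import math
from typing import Dict, List, Sequence


def four_speaker_gains(h: float, e: float) -> Dict[str, float]:
    """Constant-power bilinear gains; h: 0 = left, 1 = right; e: 0 = lower, 1 = upper."""
    h = min(max(h, 0.0), 1.0)
    e = min(max(e, 0.0), 1.0)
    weights = {
        "GL-un": (1.0 - h) * (1.0 - e),
        "GL-up": (1.0 - h) * e,
        "GR-un": h * (1.0 - e),
        "GR-up": h * e,
    }
    return {name: math.sqrt(w) for name, w in weights.items()}


def convolve(signal: Sequence[float], ir: Sequence[float]) -> List[float]:
    """Direct-form linear convolution."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def localize_then_reverb(signal: Sequence[float], h: float, e: float,
                         ir: Sequence[float]) -> Dict[str, List[float]]:
    """Localization control first (gains), then reverberation on each channel."""
    gains = four_speaker_gains(h, e)
    return {"A" + name[1:]: convolve([g * s for s in signal], ir)
            for name, g in gains.items()}
```

Since convolution is linear, scaling before or after the reverberation gives the same result when one impulse response is shared. Applying the reverb per channel only makes a difference if different channels are given different reverberation data.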

The playback apparatus 30 of the third embodiment thus achieves two kinds of agreement:

- between the position of the sound source shown in the video and that source's virtual sound image position;
- between the acoustics of the actual output sound and those of the video content.

It can therefore reproduce an even more realistic video and sound field space. Here too, both inputs are acquired automatically. The coordinate value indicating the sound image position comes from the video signal V. The location information identifying the reverberation data comes from the metadata serving as the audio synchronization information signal. In the conventional approach, the sound source position and the acoustic information for the video content had to be specified manually, step by step along the time axis. That work is no longer needed, which greatly reduces the labor and time required to edit the content.

FIG. 11 is a flowchart showing the operating procedure of the signal processing apparatus according to the third embodiment. Here, the operation of the first embodiment shown in FIG. 6 and that of the second embodiment shown in FIG. 9 are performed in parallel.

- Steps S301 and S302 correspond to steps S201 and S202 of FIG. 9. Location information corresponding to the video content is acquired from the metadata, and the matching reverberation data is obtained from the reverberation data table.
- Steps S303, S304 and S305 run in parallel and correspond to steps S101, S102 and S103 of FIG. 6. The coordinate value of the sound source position in the video coordinate system is acquired from the video signal and converted into a coordinate value in the audio coordinate system. Localization position control is then performed on that basis.

Then, in step S306, reverberation based on the acquired reverberation data is added to the audio signals generated by the localization position control. That is, the audio signal processing unit 5 has generated the audio signals AL-un, AL-up, AR-un and AR-up on the basis of the localization position control. It now adds reverberation to each of them according to the reverberation data supplied by the reverberation effect control unit 22.

As in the earlier embodiments, each part of the signal processing apparatus of the third embodiment has been described as hardware, but some or all of the parts can be realized by software processing. In that case, the signal processing apparatus may be configured as a microcomputer or the like that runs a program executing the corresponding processes among those shown in FIG. 11. It is provided with a recording medium such as a ROM, in which the program is recorded.

(Fourth Embodiment)

In the description so far, the signal processing apparatus of the embodiments has been incorporated into a playback apparatus that plays back the recording medium. The editing that produces a more realistic video and sound field space is then performed on the end-user side. However, editing may instead be performed by the producer, as in the conventional editing method described above. To accommodate this, the signal processing apparatus of the embodiments can also be incorporated into a recording apparatus that records onto a recording medium.

FIG. 12 shows the internal configuration of a recording apparatus 40 that includes the signal processing apparatus of the embodiments, as described above. In this figure, too, parts already described with reference to FIGS. 1 and 7 are given the same reference numerals, and their description is omitted. The portion enclosed by broken lines forms the signal processing apparatus. It consists of the sound source coordinate acquisition unit 6, ratio information generation unit 45, localization position control unit 46, location information acquisition unit 47, location information database 48, reverberation effect control unit 22, reverberation data table 23 and audio signal processing unit 5.

First, as shown in the figure, the recording apparatus 40 is provided with an audio signal reproduction unit 42 that reproduces the audio signal A, and a video signal reproduction unit 43 that reproduces the video signal V. The audio signal A reproduced by the audio signal reproduction unit 42 is supplied to the audio signal processing unit 5. The video signal V reproduced by the video signal reproduction unit 43 is supplied to the video encoder 44. As shown, it is also branched to the sound source coordinate acquisition unit 6 and the location information acquisition unit 47. Here the audio signal reproduction unit 42 and the video signal reproduction unit 43 are provided inside the recording apparatus 40. Alternatively, the recording apparatus 40 may receive the audio signal A and the video signal V from audio and video signal reproduction units provided outside it.

Here, too, the sound source coordinate acquisition unit 6 receives the video signal V. By image processing, it acquires a coordinate value in the video coordinate system that represents the position of the sound source. As shown, this coordinate value is supplied to the ratio information generation unit 45.

In the preceding embodiments, the signal processing apparatus is incorporated in the playback apparatus and editing is performed on the user side. In that setting, each user can enter information on the localization range of the speaker system they actually use. An appropriate conversion matrix can then be generated, and the sound source position and the virtual sound image position can be properly matched. By the same reasoning, the recording apparatus 40 could also generate a conversion matrix for the localization range of a speaker system and perform coordinate conversion. This, however, would require recording separate content on the recording medium for every speaker system that users might have, which is not practical. The recording apparatus 40 therefore takes the coordinate value (x, y) acquired by the sound source coordinate acquisition unit 6 and computes two ratios: x to the total number of horizontal pixels, and y to the total number of vertical pixels. It performs localization position control on the basis of these ratios.
This allows the sound source position and the virtual sound image position to be properly matched regardless of the speaker system used on the user side.
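The ratio information produced by the ratio information generation unit 45 is each coordinate divided by the corresponding frame size. In this sketch, the clamping to [0, 1] and the name `ratio_info` are illustrative assumptions. The point is that the same on-screen position yields the same ratio at any picture resolution, and the ratio does not depend on any speaker system.

```python
from typing import Tuple


def ratio_info(x: float, y: float, width: int, height: int) -> Tuple[float, float]:
    """Ratio of x to the total horizontal pixels and of y to the total
    vertical pixels, clamped to [0, 1]. Origin is the upper-left corner."""
    if width <= 0 or height <= 0:
        raise ValueError("frame size must be positive")
    rx = min(max(x / width, 0.0), 1.0)
    ry = min(max(y / height, 0.0), 1.0)
    return (rx, ry)
```

For example, a player a quarter of the way across and halfway down the picture gives (0.25, 0.5) at both 1920x1080 and 1280x720.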

This relies on a premise about the arrangement. Each speaker SP and the display or screen are arranged so that, in both the vertical and horizontal directions shown in FIG. 2, the center of the localization range realized by the speakers SP coincides with the center of the display screen. Under this condition, consider a sound source shown at the upper-left corner of the screen. Its virtual sound image coincides with the source position in the video if its sound is localized at the upper-left corner of the localization range. That is, the gain of the audio output from the speaker SPL-up is made relatively the largest. Likewise, consider a source shown at the center of the screen. Its virtual sound image coincides with the source position in the video if its sound is localized at the center of the localization range. That is, the gains of the audio from all speakers SP are made equal.

According to FIG. 4, the origin (0, 0) of the video coordinate system in this case is the upper-left corner of the screen. Accordingly, when the ratios of the coordinate values x and y to the total numbers of horizontal and vertical pixels are both 0%, the gain of the audio from speaker SPL-up, located at the upper left, should be the maximum. Similarly, if the ratio of x to the total horizontal pixel count is 50% and the ratio of y to the total vertical pixel count is 50%, the virtual sound image should be localized at the center of the localization range; that is, the gains of the audio from all speakers SP should be set equal. As another example, if the ratio of x is 25% and the ratio of y is 50%, the gain of the audio from the two Lch speakers SPL should be set larger than the gain of the audio from the two Rch speakers SPR by an amount corresponding to the ratio (for example, 1.5 times).

In this way, the ratio of the x value of the acquired coordinate to the total number of horizontal pixels and the ratio of its y value to the total number of vertical pixels indicate where in the localization range the virtual sound source should be localized. Based on this ratio information, appropriate gain values can be determined for the audio signals output from each of the four speakers SP.

In FIG. 12, the ratio information generation unit 45 receives the coordinate values in the video coordinate system, together with the total numbers of horizontal and vertical pixels, from the sound source coordinate acquisition unit 6. From these it calculates the ratio of the x value to the total number of horizontal pixels and the ratio of the y value to the total number of vertical pixels, and outputs this ratio information to the localization position control unit 46.
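The ratio calculation of the ratio information generation unit 45 can be sketched as below. This is a minimal illustration; the function name `ratio_info` is hypothetical, and the origin is taken as the upper-left corner of the screen, as stated for FIG. 4. The usage lines show why ratios, unlike raw pixel coordinates, do not depend on the frame resolution.

```python
def ratio_info(x, y, total_h_pixels, total_v_pixels):
    """Return (x ratio, y ratio) of a source coordinate relative to the frame size.

    Hypothetical helper. Origin (0, 0) is the upper-left corner of the screen,
    so 0.0 means the left/top edge and 1.0 the right/bottom edge.
    """
    if total_h_pixels <= 0 or total_v_pixels <= 0:
        raise ValueError("pixel counts must be positive")
    return x / total_h_pixels, y / total_v_pixels


# The same on-screen position in two different resolutions gives the same ratios.
print(ratio_info(480, 540, 1920, 1080))  # (0.25, 0.5)
print(ratio_info(320, 360, 1280, 720))   # (0.25, 0.5)
```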

The localization position control unit 46 determines, from the ratio information, the gain value to apply to the audio output from each speaker SP. As the description above shows, an x ratio of 0% is the maximum in the leftward direction, an x ratio of 100% the maximum in the rightward direction, a y ratio of 0% the maximum in the upward direction, and a y ratio of 100% the maximum in the downward direction. The gain value for each speaker SP (gain values GL-un, GL-up, GR-un, GR-up) is determined according to the given x and y ratios. These gain values are supplied to the audio signal processing unit 5.
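The text fixes only the end points of the mapping (0% is full left or full up, 100% is full right or full down, 50%/50% gives equal gains); it does not state the gain law in between. As a minimal sketch, assuming a bilinear pan law across the four corner speakers (an assumption, not the method stated in the patent), the gains could be computed as follows. With this particular law the 0% and 50% cases behave as described above, while the 25%/50% case gives an Lch:Rch ratio of 3:1 rather than the 1.5 times cited as an example, so the curve is a design choice.

```python
def localization_gains(rx, ry):
    """Map an (x ratio, y ratio) pair in [0, 1] to gains for the four speakers.

    Assumed pan law: bilinear interpolation between the four corners.
    rx = 0 is full left, rx = 1 full right; ry = 0 is full up, ry = 1 full down.
    The four gains always sum to 1.
    """
    rx = min(max(rx, 0.0), 1.0)  # clamp ratios into the valid range
    ry = min(max(ry, 0.0), 1.0)
    left, right = 1.0 - rx, rx
    up, down = 1.0 - ry, ry
    return {
        "GL-up": left * up,
        "GL-un": left * down,
        "GR-up": right * up,
        "GR-un": right * down,
    }


print(localization_gains(0.0, 0.0))   # only GL-up is 1.0
print(localization_gains(0.5, 0.5))   # all four gains are 0.25
```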

As the configuration for adding reverberation according to the video content, this embodiment provides the location information acquisition unit 47, the location information database 48, and the reverberation effect control unit 22 described above. The location information acquisition unit 47 and the location information database 48 are provided to identify location information by image processing on the video signal V rather than from metadata. The location information database 48 stores image data (image samples) of a plurality of preset locations, each associated with its location information. The location information acquisition unit 47 matches the frame image of the video signal V against the location images stored in the location information database 48 and acquires the location information associated with the location image that has the highest degree of matching. If the degree of matching does not exceed a certain threshold, it can be determined that no location information matches.
Alternatively, when no location matches in this way, the frame image of the video signal V may be compared with the location images to identify a location image whose environment is considered similar, and the location information associated with that image may be acquired.
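The best-match-with-threshold logic of the location information acquisition unit 47 can be sketched as below. The similarity measure is an assumption (the text does not name one): here it is one minus the mean absolute pixel difference, normalized to 8-bit grayscale. The function name, the flat-list image format, and the threshold value are hypothetical.

```python
def match_location(frame, location_db, threshold=0.8):
    """Return the location info whose sample image best matches the frame,
    or None if even the best match does not exceed the threshold.

    frame: flat list of 8-bit grayscale pixel values.
    location_db: list of (sample_image, location_info) pairs; samples of a
        different size than the frame are skipped.
    """
    if not frame:
        return None
    best_info, best_score = None, -1.0
    for sample, info in location_db:
        if len(sample) != len(frame):
            continue
        mad = sum(abs(a - b) for a, b in zip(frame, sample)) / len(frame)
        score = 1.0 - mad / 255.0  # 1.0 = identical, 0.0 = maximally different
        if score > best_score:
            best_info, best_score = info, score
    # Below the threshold, treat the frame as having no matching location.
    return best_info if best_score > threshold else None


db = [([10, 10, 10, 10], "concert hall"), ([200, 200, 200, 200], "outdoor stage")]
print(match_location([12, 9, 11, 10], db))       # concert hall
print(match_location([100, 100, 100, 100], db))  # None (no match above threshold)
```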

The location information acquired by the location information acquisition unit 47 is supplied to the reverberation effect control unit 22. In this case as well, the reverberation effect control unit 22 acquires the reverberation data corresponding to the supplied location information from the reverberation data table 23.

For convenience of explanation, the location information database 48 here associates location information with location images, and the reverberation effect control unit 22 acquires the corresponding reverberation data from the reverberation data table 23 according to that location information. Alternatively, a database that associates reverberation data directly with location images may be used, so that the reverberation data corresponding to the location image judged to match is acquired directly.
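The lookup in the reverberation data table 23 can be pictured as a simple mapping from location information to reverberation parameters. This is only a sketch: the table contents, the parameter names (`rt60_s`, `pre_delay_ms`, `wet`), and their values are hypothetical and not taken from the patent. The direct alternative described above would simply key the same kind of table by location image instead of by location name.

```python
from typing import Optional

# Hypothetical reverberation data table: location info -> reverberation parameters.
REVERB_DATA_TABLE = {
    "concert hall": {"rt60_s": 2.0, "pre_delay_ms": 25.0, "wet": 0.35},
    "live house": {"rt60_s": 0.8, "pre_delay_ms": 10.0, "wet": 0.2},
    "outdoor stage": {"rt60_s": 0.3, "pre_delay_ms": 5.0, "wet": 0.05},
}


def reverb_for_location(location_info: Optional[str]) -> Optional[dict]:
    """Return reverberation data for a location, or None (add no reverb) if unknown."""
    if location_info is None:  # no matching location was found upstream
        return None
    return REVERB_DATA_TABLE.get(location_info)


print(reverb_for_location("concert hall"))
```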

Here too, based on the gain values (GL-un, GL-up, GR-un, GR-up) supplied from the localization position control unit 46, the audio signal processing unit 5 generates an audio signal AL-un multiplied by gain value GL-un, an audio signal AL-up multiplied by GL-up, an audio signal AR-un multiplied by GR-un, and an audio signal AR-up multiplied by GR-up. It then applies reverberation, based on the reverberation data supplied from the reverberation effect control unit 22, to each of the generated signals AL-un, AL-up, AR-un, and AR-up, and outputs them.
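The two steps of the audio signal processing unit 5 (multiply the source by each speaker's gain, then add reverberation) can be sketched as follows. The text does not say how reverberation is applied; this sketch assumes the reverberation data takes the form of a short impulse response applied by direct linear convolution, which is one common technique but a stand-in here. Function names and dictionary keys are hypothetical.

```python
def convolve(signal, ir):
    """Direct-form linear convolution; output length is len(signal) + len(ir) - 1."""
    if not signal or not ir:
        return []
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def process_channels(audio_a, gains, reverb_ir):
    """Produce one output per speaker: gain-scale the source, then add reverberation.

    audio_a: mono source samples (audio signal A).
    gains: output signal name -> gain value, e.g. {"AL-up": 0.375, ...}.
    reverb_ir: assumed impulse response standing in for the reverberation data;
        reverb_ir[0] is the direct (dry) path, later taps are reflections.
    """
    return {name: convolve([g * s for s in audio_a], reverb_ir)
            for name, g in gains.items()}


out = process_channels([1.0, 0.0, 0.0], {"AL-up": 0.5, "AR-up": 0.25}, [1.0, 0.0, 0.3])
print(out["AL-up"])
```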

The audio encoder 49 receives the reverberated audio signals AL-un, AL-up, AR-un, and AR-up, applies predetermined encoding such as compression by a predetermined audio compression method, and supplies the result to the multiplexing processing unit 50.

The video signal V encoded by the video encoder 44 described above is also input to the multiplexing processing unit 50. The video encoder 44 applies predetermined encoding, such as compression by a predetermined video compression method, to the video signal V. The multiplexing processing unit 50 multiplexes the audio signals AL-un, AL-up, AR-un, and AR-up supplied from the audio encoder 49 with the video signal V supplied from the video encoder 44 by a predetermined multiplexing method and supplies the result to the recording unit 51.

The recording unit 51 records the multiplexed data, supplied as recording data from the multiplexing processing unit 50, on the recording medium 100 shown in the figure. The recording medium 100 may be, for example, an optical disc such as a CD, DVD, or Blu-ray Disc, a magnetic recording medium such as a hard disk, a magneto-optical recording medium such as an MD (Mini Disc), or another recording medium.

A recording medium sold as packaged media is generally a read-only ROM disc. In that case, the production side may play back the multiplexed data once recorded on the recording medium 100 and supply it to a mastering device, which records the data as pits and lands on a disc master. Alternatively, the multiplexed data may be supplied directly to the mastering device for recording on the disc master.

With the recording device 40 of the fourth embodiment configured as described above, audio and video signals can be recorded on a recording medium so that two things are achieved together: the position of a sound source shown in the video matches the position of its virtual sound image, and the reverberation of the actual output sound matches the reverberation appropriate to the video content. When such a recording medium is played on a playback device and the video and audio are output, a more realistic video and sound field space is reproduced. Furthermore, because the recording device 40 acquires location information from the video signal V along with the sound source position information, gain adjustment and reverberation addition on the audio signal A are performed automatically based on the sound source position and the location information.
As a result, the content producer no longer has to specify the sound source position and location one by one and perform gain adjustment and reverberation addition by hand, as was previously necessary to reproduce a more realistic video and sound field space, and the labor and time required to edit content can be greatly reduced.

In the embodiments described so far, a single sound source has been assumed for convenience of explanation. When there are multiple sound sources, that is, when a separate audio signal A is line-recorded for each player in the video, the sound source coordinate values are acquired in the same way for each audio signal A, and the audio signals to be output from each speaker SP are gain-adjusted according to those coordinate values. The gain-adjusted audio signals are then summed for each speaker and output.
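The per-speaker summation for several line-recorded sources can be sketched as below. It assumes each source already has its own speaker gains (for example from its own coordinate values) and that all source signals have the same length; the function name and data layout are hypothetical.

```python
def mix_sources(sources):
    """Sum the gain-adjusted signals of several sources for each speaker.

    sources: list of (samples, gains) pairs, where gains maps a speaker name
        to that source's gain for the speaker. All signals must be equally long.
    Returns: speaker name -> mixed samples.
    """
    if not sources:
        return {}
    n = len(sources[0][0])
    mixed = {}
    for samples, gains in sources:
        if len(samples) != n:
            raise ValueError("all source signals must have the same length")
        for speaker, g in gains.items():
            bus = mixed.setdefault(speaker, [0.0] * n)  # one output bus per speaker
            for i, s in enumerate(samples):
                bus[i] += g * s
    return mixed


vocal = ([1.0, 1.0], {"SPL-up": 1.0, "SPR-up": 0.0})   # vocalist at the upper left
guitar = ([2.0, 0.0], {"SPL-up": 0.5, "SPR-up": 0.5})  # guitarist at top center
print(mix_sources([vocal, guitar]))
```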

<Modifications>

Although embodiments of the present invention have been described above, the present invention is not limited to the embodiments described so far.

For example, each embodiment describes the case where an audio signal A line-recorded separately for each sound source (player) is input. However, the sounds of all sound sources (players) may instead be recorded together, for example with a stereo microphone.
In that case, the signal processing device of each embodiment need only extract the audio signal of each sound source from the input stereo audio signal and perform, on each extracted signal, the gain adjustment corresponding to its acquired coordinate values.

The embodiments also illustrate the case where only the two-dimensional up/down and left/right range serves as the localization range, but the localization range can be extended in the depth direction as well by adjusting the volume of each sound source. For example, the position of a sound source in the depth direction is acquired from the size of its image in the video, detected by image processing on the video signal. By adjusting the volume of each sound source according to this depth position, each virtual sound image can be reproduced in a three-dimensional range that includes depth as well as up/down and left/right.
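One way to turn the detected image size into a volume factor is sketched below. The text does not specify the mapping; this sketch assumes apparent size is inversely proportional to distance and that amplitude falls off as 1/distance, so the gain is proportional to the size relative to a reference size at which the source plays at full gain. The function name and the reference-size parameter are hypothetical.

```python
def depth_gain(image_size, reference_size, max_gain=1.0):
    """Estimate a volume factor for a sound source from its on-screen size.

    Assumptions: apparent size ~ 1 / distance and amplitude ~ 1 / distance,
    so gain ~ image_size / reference_size. reference_size is the size at which
    the source plays at max_gain; larger (closer) sources are clamped to max_gain.
    """
    if image_size <= 0 or reference_size <= 0:
        raise ValueError("sizes must be positive")
    return min(max_gain, max_gain * image_size / reference_size)


print(depth_gain(200, 400))  # 0.5: half the reference size, so twice as far away
```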

In addition, the speakers SP were limited to upper and lower Lch speakers and upper and lower Rch speakers, and the localization range was a two-dimensional range in the up/down and left/right directions. When speakers SP are also arranged in the front-rear direction, as in a multichannel surround system, the localization range can be extended to behind the viewer as well.

Further, the media playback unit 2 of the playback devices (1, 20, 30) of the embodiments has been described as playing back a recording medium, but it may instead be configured as a tuner that receives and demodulates AM/FM or TV broadcasts and outputs an audio signal (and a video signal).

Alternatively, instead of providing such a media playback unit 2 with a playback function for recording media or a broadcast reception function, the playback device of each embodiment may be configured, for example as an amplifier, to accept as input at least the audio and video signals played back (or received) by external equipment, and to operate as the signal processing device of each embodiment based on those input signals.

In each embodiment, the methods illustrated for acquiring reverberation data according to the video content were acquisition from metadata and acquisition from the result of matching the video signal V against location images. Another possible method is to insert in advance a telop (on-screen caption) showing the name of the location into the video signal V. In this case, the production side composites a telop (that is, an image signal) representing the location name onto the video signal V obtained by shooting. The playback device side (or recording device side) is provided with a database that associates a plurality of telop images with their location information (or corresponding reverberation data). These telop images are matched against a predetermined portion of each frame image of the video signal V, the location information associated with the telop judged to match the image of that portion is acquired, and reverberation data is acquired based on that location information (or the reverberation data associated with the matching telop is acquired directly).
Besides inserting a telop into the video signal V, the same result can be obtained by compositing a required symbol such as a barcode, or an image signal such as an illustration, onto the video signal V; location information, or reverberation data directly, can then likewise be acquired by image processing on the video signal V.

Also, in each embodiment, the method illustrated for acquiring sound source position information from the video signal V attaches a marker to the object serving as the sound source in advance and tracks that marker. Alternatively, the position information can be obtained by tracking the image data of a specific sound source in the video through image processing. In this case, the video signal V is first played back once, and the image data of the sound source shown in it is designated by a user operation. During actual playback, the portion that matches the designated image is detected in each frame image of the input video signal V, and that portion is tracked.
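The detect-and-track step can be sketched as exhaustive template matching using the sum of squared differences (SSD), one common way to find the portion of a frame that matches a designated image. The patent does not name a matching measure, so SSD is an assumption, as are the function names and the 2-D list image format. A practical tracker would usually restrict the search to a window around the previous position.

```python
def find_template(frame, template):
    """Return (row, col) of the top-left corner where template best matches frame.

    frame, template: 2-D lists of grayscale values. The lowest sum of squared
    differences (SSD) is the best match. Returns None if the template is
    larger than the frame.
    """
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best, best_pos = None, None
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            ssd = sum((frame[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if best is None or ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos


def track(frames, template):
    """Locate the designated source image in every frame, in order."""
    return [find_template(f, template) for f in frames]


def frame_with_block(row, col, size=4):
    """Build a test frame: zeros with a 2x2 block of 9s at (row, col)."""
    f = [[0] * size for _ in range(size)]
    for i in range(2):
        for j in range(2):
            f[row + i][col + j] = 9
    return f


template = [[9, 9], [9, 9]]
print(track([frame_with_block(1, 2), frame_with_block(2, 0)], template))  # [(1, 2), (2, 0)]
```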

In each embodiment, the audio attribute information of the present invention consisted of information for specifying the position of the sound source and the reverberation appropriate to the video content. More generally, the audio attribute information includes any other information that must be specified to determine the adjustment parameters when performing audio adjustment (audio signal processing) that enhances the sense of presence according to the video content, provided it relates to the acoustic attributes of the audio signal corresponding to the video content of the video signal.

1, 20, 30 playback device, 2 media playback unit, 3 video decoder, 4 audio decoder, 5 audio signal processing unit, 6 sound source coordinate acquisition unit, 7 coordinate conversion unit, 8, 46 localization position control unit, 9 conversion matrix calculation unit, 10 operation unit, 21 metadata extraction unit, 22 reverberation effect control unit, 23 reverberation data table, 40 recording device, 42 audio signal reproduction unit, 43 video signal reproduction unit, 44 video encoder, 45 ratio information generation unit, 47 location information acquisition unit, 48 location information database, 49 audio encoder, 50 multiplexing processing unit, 51 recording unit, 100 recording medium
