Best Mode for Carrying Out the Invention
The present invention will now be described in detail with reference to the accompanying drawings.
Fig. 1 is a block diagram of an audio encoding and decoding apparatus according to a first embodiment of the present invention. The audio encoding and decoding apparatus according to the present embodiment encodes and decodes the object signals of an object-based audio signal according to a grouping concept. In other words, one or more related object signals are bundled into the same group, and encoding and decoding are performed on a per-group basis.
Referring to Fig. 1, there are shown an audio encoding apparatus 110 including an object encoder 111, and an audio decoding apparatus 120 including an object decoder 121 and a mixer/renderer 123. Although not shown in the drawing, the encoding apparatus 110 may include a multiplexer or the like for generating a bitstream in which a downmix signal and side information are combined, and the decoding apparatus 120 may include a demultiplexer or the like for extracting the downmix signal and the side information from a received bitstream. Encoding and decoding apparatuses having this structure according to other embodiments are described later.
The encoding apparatus 110 receives N object signals together with group information for each group of related object signals; the group information includes related position information, size information, time mark information, and the like. The encoding apparatus 110 encodes a signal in which the related object signals are grouped together, and generates an object-based downmix signal having one or more channels, together with side information that includes information extracted from each object signal.
In the decoding apparatus 120, the object decoder 121 generates the group-encoded signals from the downmix signal and the side information, and the mixer/renderer 123, according to control information, places the signals output from the object decoder 121 at specific positions in the multi-channel space with specific levels. That is, the decoding apparatus 120 generates a multi-channel signal without unpacking the group-encoded signals into individual objects.
With this structure, grouping and encoding together objects that share the same temporal changes in position, size, delay, and so on reduces the amount of information that must be transmitted. Furthermore, once object signals are grouped, common side information can be transmitted for a group, so the several object signals belonging to the same group can be controlled easily.
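As a rough illustration of the saving described above, the following Python sketch counts how many side-information records would be sent with and without grouping. The `GroupInfo` fields, the object names, and the values are hypothetical and chosen only for illustration; they are not a format defined by the embodiment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GroupInfo:
    """Hypothetical common side information shared by one group."""
    position_deg: float
    gain: float
    delay_ms: float

def side_info_records(groups):
    """Return (records without grouping, records with grouping)."""
    per_object = sum(len(names) for names, _ in groups)
    per_group = len(groups)
    return per_object, per_group

# Two groups of related objects; each group shares one GroupInfo record.
groups = [
    (["violin_1", "violin_2", "viola"], GroupInfo(-30.0, 0.8, 0.0)),
    (["kick", "snare"], GroupInfo(0.0, 1.0, 0.0)),
]
per_object_records, per_group_records = side_info_records(groups)
```

In this toy case five per-object records shrink to two per-group records, and changing one `GroupInfo` moves every object in that group at once.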
Fig. 2 is a block diagram of an audio encoding and decoding apparatus according to a second embodiment of the present invention. The audio signal decoding apparatus 140 according to this embodiment differs from the first embodiment in that it further includes an object extractor 143.
In other words, the encoding apparatus 130, the object decoder 141, and the mixer/renderer 145 have the same functions and structures as in the first embodiment. However, because the decoding apparatus 140 further includes the object extractor 143, the group to which an object signal belongs can be unpacked into individual objects when per-object unpacking is required. In this case, not every group is unpacked into objects; object signals are extracted only from those groups that cannot be mixed on a group basis.
Fig. 3 is a diagram showing the relationship among sound sources, groups, and object signals. As shown in Fig. 3, object signals having the same attribute are grouped together, which reduces the size of the bitstream, and every object signal belongs to an upper-level group.
Fig. 4 is a block diagram of an audio encoding and decoding apparatus according to a third embodiment of the present invention. The audio encoding and decoding apparatus according to the present embodiment uses the concept of a core downmix channel.
Referring to Fig. 4, there are shown an object encoder 151 belonging to an audio encoding apparatus, and an audio decoding apparatus 160 including an object decoder 161 and a mixer/renderer 163.
The object encoder 151 receives N object signals (N > 1) and generates a downmix signal of M channels (1 < M < N). In the decoding apparatus 160, the object decoder 161 decodes the M-channel downmix signal back into N object signals, and the mixer/renderer 163 finally outputs an L-channel signal (L >= 1).
Here, the M downmix channels generated by the object encoder 151 comprise K core downmix channels (K < M) and M-K non-core downmix channels. The downmix channels are structured in this way because their importance can vary with the object signals. In other words, general encoding and decoding methods do not have sufficient resolution for individual object signals, so each decoded object signal may contain components of other object signals. Therefore, if the downmix channels comprise core and non-core downmix channels as described above, interference between object signals can be minimized.
In this case, the core downmix channels may be processed by a method different from that used for the non-core downmix channels. For example, in Fig. 4 the side information input to the mixer/renderer 163 may be defined only for the core downmix channels. In other words, the mixer/renderer 163 may be configured to control only the object signals decoded from the core downmix channels, and not those decoded from the non-core downmix channels.
As another example, a core downmix channel may be made up of only a small number of object signals, which are grouped together and controlled by a single piece of control information. For example, an additional core downmix channel may be made up of the vocal signal alone in order to build a karaoke system. Furthermore, an additional core downmix channel may be made up by grouping only signals such as drums, so that the level of a low-frequency signal such as the drum signal can be controlled precisely.
Meanwhile, music is generally produced by mixing several audio signals in the form of tracks. For example, when music is composed of drum, guitar, piano, and vocal signals, each of these signals can be treated as an object signal. In this case, one of the object signals that is judged to be particularly important and can be controlled by the user, or several object signals that are mixed and controlled as if they were one object signal, can be defined as a main object. Furthermore, the mix of all object signals other than the main object can be defined as a background object. Under this definition, the whole object, or music object, consists of a main object and a background object.
Figs. 5 and 6 are diagrams showing main objects and background objects. As shown in Fig. 5a, assuming that the main object is the vocal and the background object is the mix of all instrument sounds other than the vocal, the music object comprises the vocal object and the background object in which the instruments other than the vocal are mixed. As shown in Fig. 5b, there may be one or more main objects.
Furthermore, a main object may itself be a mix of several object signals. For example, as shown in Fig. 6, a mix of vocal and guitar can serve as the main object, and the remaining instrument sounds as the background object.
To control the main object and the background object of a music object separately, the bitstream encoded in the encoding apparatus must have one of the formats shown in Fig. 7.
Fig. 7a shows the case where the bitstream generated in the encoding apparatus consists of a music bitstream and a main object bitstream. The music bitstream has a format in which all object signals are mixed together; it is the bitstream corresponding to the sum of all main objects and background objects. Fig. 7b shows the case where the bitstream consists of a music bitstream and a background object bitstream. Fig. 7c shows the case where the bitstream consists of a main object bitstream and a background object bitstream.
In Fig. 7, as a rule, the music bitstream, the main object bitstream, and the background object bitstream are generated with an encoder and decoder that use the same method. However, when the main object is a vocal object, the music bitstream may be encoded and decoded with MP3, while the vocal object bitstream may be encoded and decoded with a speech codec such as AMR, QCELP, EFR, or EVRC, which reduces the size of the bitstream. In other words, the encoding and decoding methods used for the music object and the main object, for the main object and the background object, and so on may differ.
In Fig. 7a, the music bitstream part is constructed in the same way as in an ordinary encoding method. Furthermore, in encoding methods such as MP3 or AAC, the latter part of the bitstream includes a part that carries side information, such as an ancillary area or auxiliary area, and the main object bitstream can be added to this part. The total bitstream therefore consists of the area in which the music object is encoded, followed by the main object area. An identifier, flag, or the like indicating that a main object has been added is placed in the first half of the side area, so that the decoding apparatus can determine whether a main object is present.
The case of Fig. 7b has basically the same format as that of Fig. 7a; the background object takes the place of the main object of Fig. 7a.
Fig. 7c shows the case where the bitstream consists of a main object bitstream and a background object bitstream. In this case, the music object is the sum, or mix, of the main object and the background object. In constructing this bitstream, the background object may be stored first and the main object then stored in the auxiliary area. Alternatively, the main object may be stored first and the background object then stored in the auxiliary area. In either case, as described above, an identifier indicating information about the side area can be added to the first half of the side area.
Fig. 8 shows methods of constructing the bitstream so that the addition of a main object can be determined. In the first example, the area from the end of the music bitstream to the start of the next frame is the auxiliary area. In this example, only an identifier indicating that a main object has been encoded needs to be included.
The second example is an encoding method that requires an identifier indicating that an auxiliary area or data area begins after the music bitstream ends. In this case, encoding the main object requires two kinds of identifier: one indicating the start of the auxiliary area and one indicating the main object. When this bitstream is decoded, the type of data is determined by reading the identifier, and the bitstream is then decoded by reading the data part.
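The second example can be pictured with the following Python sketch. The byte layout is purely an assumption for illustration: a 2-byte length before each payload, a 1-byte marker `AUX_START` for the start of the auxiliary area, and a 1-byte type identifier `TYPE_MAIN_OBJECT`. None of these values comes from MP3, AAC, or the embodiment itself.

```python
import struct
from typing import Optional, Tuple

AUX_START = b"\xA5"       # hypothetical marker: the auxiliary area begins here
TYPE_MAIN_OBJECT = 0x01   # hypothetical type identifier: main object data

def build_frame(music: bytes, main_object: Optional[bytes]) -> bytes:
    """Write the music payload, then optionally the main object in the auxiliary area."""
    frame = struct.pack(">H", len(music)) + music
    if main_object is not None:
        frame += AUX_START + bytes([TYPE_MAIN_OBJECT])
        frame += struct.pack(">H", len(main_object)) + main_object
    return frame

def parse_frame(frame: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Read the music payload, then look for the two identifiers."""
    (music_len,) = struct.unpack_from(">H", frame, 0)
    pos = 2
    music = frame[pos:pos + music_len]
    pos += music_len
    main_object = None
    if frame[pos:pos + 1] == AUX_START:          # first identifier: area start
        data_type = frame[pos + 1]               # second identifier: data type
        (length,) = struct.unpack_from(">H", frame, pos + 2)
        payload = frame[pos + 4:pos + 4 + length]
        if data_type == TYPE_MAIN_OBJECT:
            main_object = payload                # unknown types are simply skipped
    return music, main_object
```

A frame built without a main object parses to `(music, None)`, which mirrors how a decoder can fall back to plain music playback.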
Fig. 9 is a block diagram of an audio encoding and decoding apparatus according to a fourth embodiment of the present invention. The audio encoding and decoding apparatus according to this embodiment encodes and decodes a bitstream to which a vocal object has been added as a main object.
Referring to Fig. 9, the encoder 211 included in the encoding apparatus encodes a music signal comprising a vocal object and a music object. Examples of the encoding applied to the music signal by the encoder 211 include MP3, AAC, and WMA. In addition to the music signal, the encoder 211 adds the vocal object to the bitstream as a main object. In doing so, the encoder 211 adds the vocal object to the part that carries side information, such as the ancillary area or auxiliary area described above, and also adds to this part an identifier that notifies the decoding apparatus that an additional vocal object exists.
The decoding apparatus 220 includes a general codec decoder 221, a vocal decoder 223, and a mixer 225. The general codec decoder 221 decodes the music bitstream part of the received bitstream. In this case, the main object area is simply recognized as a side area or data area and is not used in the decoding process. The vocal decoder 223 decodes the vocal object part of the received bitstream. The mixer 225 mixes the signals decoded by the general codec decoder 221 and the vocal decoder 223 and outputs the result.
When the received bitstream contains a vocal object as a main object, a decoding apparatus that does not include the vocal decoder 223 decodes only the music bitstream and outputs the result. Even in this case the output is the same as ordinary audio output, because the music bitstream contains the vocal signal. Furthermore, during decoding, whether a vocal object has been added to the bitstream is determined from the identifier or the like. If the vocal object cannot be decoded, it is skipped and ignored; if it can be decoded, the vocal signal is decoded and used for mixing.
The general codec decoder 221 is an audio decoder suited to music and in general use, for example MP3, AAC, HE-AAC, WMA, or Ogg Vorbis. The vocal decoder 223 may use a codec that is the same as or different from that of the general codec decoder 221. For example, the vocal decoder 223 may use a speech codec such as EVRC, EFR, AMR, or QCELP, in which case the amount of computation needed for decoding can be reduced.
Furthermore, if the vocal object is mono, the bit rate can be reduced as far as possible. However, if the vocal cannot be represented by mono alone, because the music bitstream consists of stereo channels and the vocal signals in the left and right channels differ, the vocal object may also be stereo.
In the decoding apparatus 220 according to the present embodiment, any one of a mode that plays only the music, a mode that plays only the main object, and a mode that plays the music and the main object suitably mixed can be selected and played in response to a user control command, such as a button or menu operation on the playback device.
When the main object is ignored and only the original music is played, this corresponds to ordinary music playback. However, because mixing can be performed in response to a user control command, the level of the main object or of the background object can be controlled. When the main object is a vocal object, this means that the vocal alone can be raised or lowered relative to the background music.
Examples of playing only the main object include the case where a vocal object or a specific instrument sound is the main object. In other words, only the vocal is heard without the background music, or only the instrument sound is heard without the background music, and so on.
When the music and the main object are suitably mixed and played, this means that the vocal alone is raised or lowered relative to the background music. In particular, when the vocal component is removed completely from the music, the music can be used for karaoke because the vocal component has disappeared. If the vocal object is encoded in the encoding apparatus with its phase inverted, the decoding apparatus can play karaoke by adding the vocal object to the music object.
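The phase-inversion idea can be checked with a few sample values. The Python sketch below assumes ideal, lossless decoding of both signals, which a real codec would only approximate; the sample values and the helper name `mix` are hypothetical.

```python
def mix(music, main_object, gain):
    """Mixer sketch: add the decoded main object to the decoded music with a gain."""
    return [m + gain * v for m, v in zip(music, main_object)]

background = [3, -1, 4, 1]
vocal = [2, 7, -1, 8]
music = [b + v for b, v in zip(background, vocal)]   # music object = background + vocal

# The encoder stores the vocal object with its phase inverted.
inverted_vocal = [-v for v in vocal]

karaoke = mix(music, inverted_vocal, 1)   # adding the inverted vocal cancels the vocal
unchanged = mix(music, inverted_vocal, 0) # gain 0 leaves the original music untouched
```

Intermediate gains between 0 and 1 would lower the vocal without removing it, again under the assumption of ideal decoding.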
The above description assumes that the music object and the main object are each decoded and then mixed. However, the mixing may also be performed during the decoding process. For example, in MDCT (modified discrete cosine transform) based transform coding such as MP3 and AAC, the mixing can be performed on the MDCT coefficients and the inverse MDCT performed last to generate the PCM output. In this case, the amount of computation can be reduced considerably. In addition, the present invention is not limited to the MDCT; it covers any transform in which the coefficients of a general transform-coding decoder are mixed in the transform domain and decoding is then performed.
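Mixing in the transform domain works because the MDCT is linear: mixing coefficients gives the same result as transforming the mixed signal, so only one inverse transform is needed instead of two. The Python sketch below checks this with a direct, unwindowed MDCT of a single block; the block length, sample values, and gain are arbitrary assumptions, and a real decoder would add windowing and overlap-add.

```python
import math

def mdct(block):
    """Direct (unwindowed) MDCT of one block of 2N samples, giving N coefficients."""
    n = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t, x in enumerate(block))
        for k in range(n)
    ]

music = [0.5, -0.25, 0.75, 1.0, -1.0, 0.25, 0.0, -0.5]
vocal = [0.1, 0.2, -0.3, 0.4, 0.0, -0.6, 0.7, -0.8]
gain = 0.5

# Mix in the time domain, then transform.
mixed_in_time = mdct([a + gain * b for a, b in zip(music, vocal)])
# Transform each signal, then mix the coefficients.
mixed_in_mdct = [a + gain * b for a, b in zip(mdct(music), mdct(vocal))]
```

The two coefficient lists agree up to floating-point rounding.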
In addition, the above examples use one main object, but several main objects may also be used. For example, as shown in Fig. 10, the vocal can be main object 1 and the guitar main object 2. This structure is particularly useful when only the background object, excluding the vocal and the guitar, is played and the user sings and plays the guitar in person. Furthermore, this bitstream can be played in several combinations: the music without the vocal, the music without the guitar, the music without both the vocal and the guitar, and so on.
Meanwhile, in the present invention, the channel indicated by the vocal bitstream can be extended. For example, a drum bitstream can be used to play the whole music, only the drum part of the music, or the music excluding the drum part. Furthermore, two or more additional bitstreams, such as a vocal bitstream and a drum bitstream, can be used to control the mixing part by part.
In addition, the present embodiment has mainly been described for stereo and mono. However, it can also be extended to multi-channel. For example, a bitstream can be constructed by adding a vocal object, a main object bitstream, or the like to a 5.1-channel bitstream, and on playback any one of the original sound, the sound with the vocal removed, and the sound containing only the vocal can be played.
The present embodiment may also be configured to support only the music and a mode in which the vocal is removed from the music, and not to support a mode that plays only the vocal (main object). This can be applied when the singer does not want the vocal alone to be played. It can be extended to a decoder configuration in which an identifier indicating whether the vocal-only function is supported is placed in the bitstream, and the range of playback is decided according to this identifier.
Fig. 11 is a block diagram of an audio encoding and decoding apparatus according to a fifth embodiment of the present invention. The audio encoding and decoding apparatus according to this embodiment can implement a karaoke system using a residual signal. When specialized for karaoke, the music object can be divided into a background object and a main object as described above. The main object is an object signal that is to be controlled separately from the background object; in particular, it may be a vocal object signal. The background object is the sum of all object signals other than the main object.
Referring to Fig. 11, the encoder 251 included in the encoding apparatus encodes the background object and the main object together. A general audio codec such as AAC or MP3 can be used for the encoding. When this signal is decoded in the decoding apparatus 260, the decoded signal contains the background object signal and the main object signal. Calling this decoded signal the original decoded signal, the following methods can be used to apply karaoke to it.
The main object is included in the total bitstream in the form of a residual signal. The main object is decoded and then subtracted from the original decoded signal. In this case, the first decoder 261 decodes the total signal and the second decoder 263 decodes the residual signal, where g = 1. Alternatively, the main object signal with inverted phase may be included in the total bitstream in the form of a residual signal; the main object signal is then decoded and added to the original decoded signal. In this case, g = -1. With either method, a scalable karaoke system can be realized by controlling the value of g.
For example, when g = -0.5 or g = 0.5, the main object, that is, the vocal object, is not removed completely; only its level is controlled. Furthermore, setting g to a positive or negative value has the effect of controlling the level of the vocal object. If the original decoded signal is not used and only the residual signal is output, a solo mode with the vocal alone can also be supported.
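The role of g can be illustrated with a minimal Python sketch. It assumes the decoder output is formed as (original decoded signal) minus g times (decoded residual); this sign convention is an assumption chosen so that g = 1 with a plain residual and g = -1 with a phase-inverted residual both remove the vocal, as described above. The sample values are hypothetical.

```python
def karaoke_mix(decoded, residual, g):
    """Assumed combination rule: output = decoded - g * residual."""
    return [d - g * r for d, r in zip(decoded, residual)]

background = [4, -2, 6]
vocal = [2, 2, -4]
decoded = [b + v for b, v in zip(background, vocal)]   # original decoded signal

# Residual carries the vocal as-is: g = 1 removes it completely.
removed = karaoke_mix(decoded, vocal, 1)

# Residual carries the phase-inverted vocal: g = -1 removes it completely.
inverted = [-v for v in vocal]
removed_inv = karaoke_mix(decoded, inverted, -1)

# g = 0.5 only lowers the vocal level (half of the vocal remains).
half = karaoke_mix(decoded, vocal, 0.5)

# Solo mode: output the decoded residual alone, without the original decoded signal.
solo = list(vocal)
```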
Fig. 12 is a block diagram of an audio encoding and decoding apparatus according to a sixth embodiment of the present invention. The audio encoding and decoding apparatus according to this embodiment uses two residual signals, distinguishing the residual signal for karaoke output from the residual signal for vocal-mode output.
Referring to Fig. 12, the original decoded signal decoded in the first decoder 291 is divided into a background object signal and a main object signal in the object separation unit 295, and these are output. In practice, the background object contains some main object component in addition to the original background object, and the main object contains some background object component in addition to the original main object. This is because the process of dividing the original decoded signal into background object and main object signals is not perfect.
In particular, for the background object, the main object component contained in the background object can be included beforehand in the total bitstream in the form of a residual signal; the total bitstream is decoded, and the main object component is then subtracted from the background object. In this case, g = -1 in Fig. 12. As described in the fifth embodiment, a scalable karaoke system can be realized by controlling the value of g.
In the same way, a solo mode can be supported by applying a residual signal to the main object signal and controlling the value g1. The value g1 can be applied as described above, taking into account the phase relationship between the residual signal and the original object and the desired degree of the vocal mode.
Fig. 13 is a block diagram of an audio encoding and decoding apparatus according to a seventh embodiment of the present invention. In this embodiment, the following method is used to further reduce the bit rate of the residual signal of the above embodiments.
When the main object signal is a mono signal, the stereo-to-three-channel conversion unit 305 performs a stereo-to-three-channel conversion on the original stereo signal decoded in the first decoder 301. Because the stereo-to-three-channel conversion is not perfect, the background object (one output) contains some main object component as well as the background object component, and the main object (the other output) contains some background object component as well as the main object component.
Then the second decoder 303 decodes the residual part of the total bitstream (or performs a QMF conversion or MDCT-to-QMF conversion after decoding), and the result is weighted and added to the background object signal and the main object signal. In this way, signals consisting of the background object component alone and of the main object component alone can be obtained.
The advantage of this method is that, because the background object signal and the main object signal have already been separated once by the stereo-to-three-channel conversion, the residual signal for removing the other component contained in each signal (that is, the main object component remaining in the background object signal and the background object component remaining in the main object signal) can be constructed with a lower bit rate.
Referring to Fig. 13, let the background object component in the background object signal BS be B and the main object component in it be m, and let the main object component in the main object signal MS be M and the background object component in it be b. Then the following formulas hold.
Formula 1
BS = B + m
MS = M + b
For example, when the residual signal R is formed as b - m, the final karaoke output KO is:
Formula 2
KO = BS + R = B + b
The final solo-mode output SO is:
Formula 3
SO = MS - R = M + m
In the above formulas, the sign of the residual signal may be reversed, that is, R = m - b, with g = -1 and g1 = 1.
When BS and MS are constructed, the values of g and g1 that make the final KO consist of B and b, and the final SO consist of M and m, can be calculated easily from the signs chosen for B, m, M, and/or b. In the above cases the karaoke and solo signals differ slightly from the original signals, but output of high quality for practical use is still possible, because the karaoke output contains no solo component and the solo output contains no karaoke component.
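The relations among BS, MS, R, g, and g1 can be verified numerically. The Python sketch below assumes the outputs are formed as KO = BS + g * R and SO = MS + g1 * R, which is the convention under which R = m - b pairs with g = -1 and g1 = 1; the sample values are hypothetical.

```python
def add(x, y, gain=1):
    """Element-wise x + gain * y."""
    return [a + gain * b for a, b in zip(x, y)]

# Hypothetical component signals (three samples each).
B = [4, -1, 3]    # background component in BS
m = [1, 2, -1]    # leaked main component in BS
M = [6, 0, -2]    # main component in MS
b = [-1, 1, 2]    # leaked background component in MS

BS = add(B, m)    # BS = B + m
MS = add(M, b)    # MS = M + b

# Case 1: R = b - m, so g = 1 and g1 = -1.
R = add(b, m, gain=-1)
KO = add(BS, R, gain=1)        # B + m + (b - m) = B + b
SO = add(MS, R, gain=-1)       # M + b - (b - m) = M + m

# Case 2: reversed sign, R = m - b, so g = -1 and g1 = 1.
R_rev = add(m, b, gain=-1)
KO_rev = add(BS, R_rev, gain=-1)
SO_rev = add(MS, R_rev, gain=1)
```

Both sign choices give the same outputs: the karaoke output contains only B and b, and the solo output only M and m.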
Furthermore, when there are two or more main objects, the two-to-three-channel conversion and the addition or subtraction of the residual signal can be applied step by step.
Fig. 14 is a block diagram of an audio encoding and decoding apparatus according to an eighth embodiment of the present invention. The audio signal decoding apparatus 330 according to this embodiment differs from the seventh embodiment in that, when the main object signal is a stereo signal, a mono-to-stereo conversion is performed twice, once on each original stereo channel.
Because the mono-to-stereo conversion is not perfect, the background object signal (one output) contains some main object component as well as the background object component, and the main object signal (the other output) contains some background object component as well as the main object component. Thereafter, the residual part of the total bitstream is decoded (or a QMF conversion or MDCT-to-QMF conversion is performed after decoding), and its left and right channel components are multiplied by weights and added to the left and right channels of the background object signal and the main object signal, respectively, so that signals consisting of the background object component (stereo) and of the main object component (stereo) can be obtained.
When the stereo residual signal is formed from the difference between the stereo background object and the stereo main object, g = g2 = -1 and g1 = g3 = 1 in Fig. 14. In addition, as described above, the values of g, g1, g2, and g3 can be calculated easily from the signs of the background object signal, the main object signal, and the residual signal.
In general, the main object signal may be mono or stereo. For this reason, a flag indicating whether the main object is mono or stereo is placed in the total bitstream. By reading the flag, the main object signal can be decoded with the method of the seventh embodiment shown in Fig. 13 when it is mono, and with the method of the eighth embodiment shown in Fig. 14 when it is stereo.
In addition, when one or more main objects are included, the above methods can be applied in succession according to whether each main object is mono or stereo. Each method is applied as many times as there are mono or stereo main objects, respectively. For example, when there are three main objects, of which two are mono and one is stereo, the karaoke signal can be output by applying the method of the seventh embodiment twice and the method of the eighth embodiment shown in Fig. 14 once. The order in which the methods of the seventh and eighth embodiments are applied may be determined in advance. For example, the method of the seventh embodiment may always be performed first, for the mono main objects, followed by the method of the eighth embodiment for the stereo main objects. As another way of deciding the order, a descriptor describing the order of the methods of the seventh and eighth embodiments may be placed in the total bitstream, and the methods performed selectively according to the descriptor.
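One possible reading of the predetermined order (mono main objects first, then stereo) can be sketched in Python as follows. The object names, the tuple layout with an `is_stereo` flag, and the labels "seventh" and "eighth" are hypothetical stand-ins for the flag in the bitstream and the two decoding methods.

```python
def plan_methods(main_objects):
    """Order the per-object processing: mono objects first, then stereo objects."""
    mono = [name for name, is_stereo in main_objects if not is_stereo]
    stereo = [name for name, is_stereo in main_objects if is_stereo]
    plan = [(name, "seventh") for name in mono]     # two-to-three conversion path
    plan += [(name, "eighth") for name in stereo]   # twice mono-to-stereo path
    return plan

# Three main objects: two mono, one stereo (flags as read from the bitstream).
main_objects = [("vocal", False), ("guitar", True), ("flute", False)]
plan = plan_methods(main_objects)
```

With a descriptor in the bitstream instead, the list order would be taken from the descriptor rather than from this fixed rule.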
Fig. 15 is a block diagram of an audio encoding and decoding apparatus according to a ninth embodiment of the present invention. The audio encoding and decoding apparatus according to this embodiment generates the music object or background object using a multi-channel encoder.
Referring to Fig. 15, there are shown an audio encoding apparatus 350, which includes a multi-channel encoder 351, an object encoder 353, and a multiplexer 355, and an audio decoding apparatus 360, which includes a demultiplexer 361, an object decoder 363, and a multi-channel decoder 369. The object decoder 363 may include a channel converter 365 and a mixer 367.
The multi-channel encoder 351 generates a downmix signal from a channel-based music object, and generates channel-based first audio parameter information by extracting information about the music object. The object encoder 353 generates an object-based downmix signal, encoded from the vocal object and the downmix signal produced by the multi-channel encoder 351, together with object-based second audio parameter information and a residual signal corresponding to the vocal object. The multiplexer 355 generates a bitstream in which the downmix signal generated by the object encoder 353 and the side information are combined. Here the side information includes the first audio parameter information generated by the multi-channel encoder 351, and the residual signal and second audio parameter information generated by the object encoder 353.
In the audio decoding apparatus 360, the demultiplexer 361 demultiplexes the downmix signal and the side information from the received bitstream. The object decoder 363 generates an audio signal with a controlled vocal component by using at least one of an audio signal in which the music object is encoded on a channel basis and an audio signal in which the vocal object is encoded. The object decoder 363 includes the channel converter 365, which can perform a mono-to-stereo conversion or a two-to-three conversion during decoding. The mixer 367 can control the level, position, and so on of a specific object signal using mixing parameters or the like included in the control information. The multi-channel decoder 369 generates a multi-channel signal using the audio signal decoded in the object decoder 363, the side information, and so on.
According to input control information, the object decoder 363 can generate an audio signal for any one of three modes: a karaoke mode, in which an audio signal without the vocal component is generated; a solo mode, in which an audio signal containing only the vocal component is generated; and a normal mode, in which an audio signal containing the vocal component is generated.
Fig. 16 is a diagram showing the case where vocal objects are encoded step by step. Referring to Fig. 16, the encoding apparatus 380 according to the present embodiment includes a multi-channel encoder 381, first to third object encoders 383, 385, and 387, and a multiplexer 389.
The multi-channel encoder 381 has the same structure and function as the multi-channel encoder shown in Fig. 15. The present embodiment differs from the ninth embodiment shown in Fig. 15 in that the first to third object encoders 383, 385, and 387 are configured to group the vocal objects step by step, and a residual signal is generated at each grouping step and included in the bitstream generated by the multiplexer 389.
When the bitstream generated by this process is decoded, a signal with a controlled vocal component, or with another desired object component, can be generated by applying the residual signal extracted from the bitstream to the audio signal encoded by grouping the music objects in stages or to the audio signal encoded by grouping the vocal objects in stages.
Meanwhile, in the embodiments described above, the domain in which the sum or difference of the original coded signal and the residual signal, or of the background object or main object and the residual signal, is computed is not restricted to any particular domain. For example, this processing can be carried out in the time domain or in a frequency domain such as the MDCT domain. Alternatively, it can be carried out in a subband domain such as the QMF subband domain or a hybrid subband domain. In particular, when the processing is carried out in the frequency domain or in a subband domain, a scalable karaoke signal can be generated by controlling how many bands carry the residual component. For example, when the original decoded signal has 20 subbands, a perfect karaoke signal can be output if the number of residual-signal bands is set to 20. When the residual covers only the 10 low-frequency bands, the vocal component is removed only from the low-frequency part and remains in the high-frequency part. In the latter case the sound quality is lower than in the former, but the bit rate is also lower.
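The 20-band example above can be sketched as follows. This is a simplified illustration, not the embodiment's implementation: it assumes one value per subband, assumes the residual equals the vocal component in each covered band, and assumes the karaoke signal is obtained by a difference. The function name `scalable_karaoke` is hypothetical.

```python
def scalable_karaoke(decoded_bands, residual_bands):
    """Subtract the residual from the lowest len(residual_bands) subbands.

    decoded_bands:  one value per subband of the original decoded signal.
    residual_bands: residual for the lowest subbands only (may be shorter).
    Bands without residual data are passed through unchanged, so the
    vocal component remains in them.
    """
    if len(residual_bands) > len(decoded_bands):
        raise ValueError("residual covers more bands than the decoded signal")
    covered = len(residual_bands)
    out = [d - r for d, r in zip(decoded_bands, residual_bands)]
    out.extend(decoded_bands[covered:])
    return out


# 20 subbands, each holding music (1.0) plus vocal (0.25).
decoded = [1.25] * 20
full_quality = scalable_karaoke(decoded, [0.25] * 20)  # residual in all 20 bands
low_bitrate = scalable_karaoke(decoded, [0.25] * 10)   # residual in 10 low bands
```

With the 10-band residual, the upper ten bands still contain the vocal contribution, which corresponds to the lower-quality, lower-bit-rate case.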
Further, when there is more than one main object, a plurality of residual signals can be included in the overall bitstream, and the sum or difference with the residual signals can be performed multiple times. For example, when the two main objects are a vocal and a guitar and their residual signals are included in the overall bitstream, a karaoke signal from which both the vocal and the guitar signals are removed can be generated by first removing the vocal signal from the overall signal and then removing the guitar signal. In this case, a karaoke signal from which only the vocal signal is removed and a karaoke signal from which only the guitar signal is removed can also be generated. Alternatively, only the vocal signal or only the guitar signal can be output.
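The repeated removal described above can be sketched as a loop over the main objects to be removed. The sketch assumes time-domain sample lists and a plain difference per object; the function name `remove_objects` and the dictionary layout are illustrative assumptions.

```python
def remove_objects(full_signal, residuals, to_remove):
    """Subtract the residual of each named main object, one after another.

    full_signal: samples of the overall signal.
    residuals:   hypothetical mapping from object name to its residual samples.
    to_remove:   names of the main objects to remove, in order.
    """
    out = list(full_signal)
    for name in to_remove:
        residual = residuals[name]
        if len(residual) != len(out):
            raise ValueError(f"residual for {name!r} has the wrong length")
        out = [s - r for s, r in zip(out, residual)]
    return out


residuals = {"vocal": [0.5, 0.25], "guitar": [0.25, 0.5]}
full = [1.75, 1.75]  # music [1.0, 1.0] plus vocal plus guitar

no_vocal_no_guitar = remove_objects(full, residuals, ["vocal", "guitar"])
no_vocal = remove_objects(full, residuals, ["vocal"])
no_guitar = remove_objects(full, residuals, ["guitar"])
```

Outputting only the vocal or only the guitar corresponds, under the same assumptions, to using the respective residual directly.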
In addition, in order to generate a karaoke signal by completely removing only the vocal signal from the overall signal, the overall signal and the vocal signal are encoded separately. Depending on the type of codec used for encoding, one of the following two approaches is needed. In the first approach, the same codec is always used for the overall signal and the vocal signal. In this case, an identifier from which the codec type of the overall signal and the vocal signal can be determined has to be placed in the bitstream, and the decoder identifies the codec type from this identifier, decodes the signals, and then removes the vocal component. As described above, a sum or difference is used in this process. The identifier information can include whether the residual signal uses the same codec as the original coded signal, the type of codec used to encode the residual signal, and so on.
In the second approach, different codecs can be used for the overall signal and the vocal signal. For example, the vocal signal (that is, the residual signal) always uses a fixed codec. In this case, an identifier for the residual signal is unnecessary, and only a predetermined codec can be used to decode the overall signal. However, in this case the process of removing the residual signal from the overall signal is limited to a domain in which the two signals can be processed together directly, such as the time domain or a subband domain. For example, in the MDCT domain the two signals cannot be processed together directly.
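The two approaches can be contrasted in a small sketch that reports the domains in which the residual may be removed. The class `ResidualCodecInfo`, its field names, and the domain labels are hypothetical and chosen only for illustration; the actual bitstream syntax of the identifier is not specified here.

```python
from dataclasses import dataclass
from typing import Optional

# Domains named in the description; the string labels are illustrative.
ALL_DOMAINS = ("time", "mdct", "qmf_subband", "hybrid_subband")
DIRECT_DOMAINS = ("time", "qmf_subband", "hybrid_subband")


@dataclass
class ResidualCodecInfo:
    """Hypothetical view of the codec information for the residual signal."""
    same_codec_as_full_signal: bool
    residual_codec_type: Optional[str] = None  # absent when a fixed codec is used


def removal_domains(info):
    """Return the domains in which the residual can be removed.

    Same codec (first approach): no restriction is stated, so all domains
    listed in the description are returned.
    Different codecs (second approach): only domains where the two signals
    can be combined directly, which excludes the MDCT domain.
    """
    if info.same_codec_as_full_signal:
        return ALL_DOMAINS
    return DIRECT_DOMAINS
```

The design point is that the restriction follows from the codec mismatch, not from the removal operation itself.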
In addition, according to the present invention, a karaoke signal consisting only of the background object signal can be output. A multi-channel signal can be generated by applying an additional upmixing process to the karaoke signal. For example, if MPEG Surround is additionally applied to the karaoke signal generated by the present invention, a 5.1-channel karaoke signal can be generated.
In addition, the embodiments described above deal with the case where the number of music objects and the number of main objects in a frame, or the number of background objects and the number of main objects, are identical. However, the number of music objects and the number of main objects in a frame, or the number of background objects and the number of main objects, may differ. For example, music may be present in every frame while one main object is present every two frames. In that case, the main object is decoded and the decoded result is applied to the two frames.
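One possible reading of the two-frame example is that each decoded main object is reused for two consecutive music frames. The sketch below pairs frames under that assumption; the function name `pair_frames` and the parameter `frames_per_main` are hypothetical.

```python
def pair_frames(music_frames, decoded_main_objects, frames_per_main=2):
    """Pair each music frame with the decoded main object that applies to it.

    Assumption: one decoded main object covers `frames_per_main`
    consecutive music frames, so music frame i uses main object
    i // frames_per_main.
    """
    needed = -(-len(music_frames) // frames_per_main)  # ceiling division
    if len(decoded_main_objects) < needed:
        raise ValueError("not enough decoded main objects for the music frames")
    return [
        (frame, decoded_main_objects[i // frames_per_main])
        for i, frame in enumerate(music_frames)
    ]


pairs = pair_frames(["m0", "m1", "m2", "m3"], ["v0", "v1"])
```

Each pair can then be passed to whatever sum or difference processing the selected play mode requires.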
The music and the main object may have different sampling frequencies. For example, when the sampling frequency of the music is 44.1 kHz and that of the main object is 22.05 kHz, the MDCT coefficients of the main object can be computed and then mixed only with the corresponding region of the MDCT coefficients of the music. This exploits the fact that, in a karaoke system, the vocal occupies a lower frequency band than the instrument sounds, and it has the advantage of reducing the amount of data.
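The mixing into the corresponding MDCT region can be sketched as follows. The sketch assumes that both frames span the same time interval, so the 22.05 kHz main object has half as many coefficients as the 44.1 kHz music and coefficient k covers the same frequency in both spectra. Any scaling between the two transform sizes is folded into a hypothetical `gain` parameter; the function name `mix_mdct` is likewise an assumption.

```python
def mix_mdct(music_coeffs, main_coeffs, gain=1.0):
    """Add main-object MDCT coefficients to the low part of the music spectrum.

    music_coeffs: MDCT coefficients of the music frame (full bandwidth).
    main_coeffs:  MDCT coefficients of the main object (lower bandwidth),
                  aligned so that index k is the same frequency in both.
    gain:         assumed scale factor compensating the transform sizes.
    """
    if len(main_coeffs) > len(music_coeffs):
        raise ValueError("main object has more coefficients than the music")
    out = list(music_coeffs)
    for k, c in enumerate(main_coeffs):
        out[k] += gain * c
    return out


# 8 music coefficients (0 to 22.05 kHz), 4 main-object coefficients (0 to 11.025 kHz).
mixed = mix_mdct([1.0] * 8, [0.5] * 4)
```

Only the lower half of the music spectrum is touched, which is why the main object needs roughly half the coefficient data.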
In addition, the present invention can be implemented as processor-readable code on a processor-readable recording medium. The processor-readable recording medium includes all kinds of recording devices in which data readable by a processor is stored. Examples of the processor-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, optical data storage devices, etc., and also include carrier waves such as transmission over the Internet. In addition, the processor-readable recording medium can be distributed over systems connected via a network, so that the processor-readable code is stored and executed in a distributed manner.
Although the present invention has been described with reference to its preferred embodiments, it is to be understood that the invention is not limited to these specific embodiments and that those skilled in the art can make various modifications. Such modifications should not be understood separately from the technical spirit and prospect of the present invention.
Industrial applicability
The present invention can be used for encoding and decoding object-based audio signals. It processes related object signals on a group basis and can provide play modes such as a karaoke mode, a solo mode and a normal mode.