CN101361117B

CN101361117B - Method and apparatus for processing a media signal

Info

Publication number: CN101361117B
Application number: CN2007800015359A
Authority: CN
Inventors: 吴贤午; 房熙锡; 金东秀; 林宰显; 郑亮源
Original assignee: LG Electronics Inc
Current assignee: LG Electronics Inc
Priority date: 2006-01-19
Filing date: 2007-01-19
Publication date: 2011-06-15
Anticipated expiration: 2027-01-19
Also published as: CN101361118A; CN101361118B; CN101361116B; CN101361116A; CN101361115A; CN101361119B; CN101361121A; CN101371298A; CN101361120A; CN101361120B; CN101361119A; CN101361121B; CN101361117A

Abstract

An apparatus for processing a media signal and a method thereof are disclosed, by which the media signal can be converted to a surround signal by using spatial information of the media signal. The present invention provides a method of processing a signal, comprising: extracting a downmix signal from a bit stream; generating a decorrelated downmix signal by applying a decorrelator to the downmix signal; and generating a surround signal by applying rendering information for generating the surround signal to the downmix signal and the decorrelated downmix signal.

Description

Method and apparatus for processing a media signal

Technical field

The present invention relates to an apparatus for processing a media signal and a method thereof, and more particularly to an apparatus and method for generating a surround signal by using spatial information of the media signal.

Background art

Generally, various apparatuses and methods have been widely used to generate a multi-channel media signal by using spatial information of the multi-channel media signal and a downmix signal, where the downmix signal is generated by downmixing the multi-channel media signal into a mono or stereo signal.

However, the above methods and apparatuses cannot be used in environments unsuitable for generating a multi-channel signal. For example, they cannot be used in a device that can generate only a stereo signal. In other words, there is no existing method or apparatus that, in an environment incapable of generating the multi-channel signal, uses the spatial information of the multi-channel signal to generate a surround signal having the features of the multi-channel signal.

Because no existing method or apparatus can generate a surround signal in a device capable of generating only a mono or stereo signal, it is difficult to process the media signal efficiently.

Disclosure of the Invention

Technical problem

Accordingly, the present invention is directed to an apparatus for processing a media signal and a method thereof that substantially obviate one or more of the problems due to limitations and disadvantages of the related art.

An object of the present invention is to provide an apparatus for processing a signal and a method thereof, by which a media signal can be converted to a surround signal by using spatial information of the media signal.

Additional features and advantages of the invention will be set forth in the description that follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims as well as the accompanying drawings.

Technical solution

To achieve these and other advantages and in accordance with the purpose of the present invention, a method of processing a signal according to the present invention comprises: generating source mapping information corresponding to each of a plurality of sources by using spatial information indicating features between the plurality of sources; generating sub-rendering information by applying filter information providing a surround effect to the source mapping information per source; generating rendering information for generating a surround signal by integrating at least one item of the sub-rendering information; and generating the surround signal by applying the rendering information to a downmix signal generated by downmixing the plurality of sources.

To further achieve these and other advantages and in accordance with the purpose of the present invention, an apparatus for processing a signal comprises: a source mapping unit that generates source mapping information corresponding to each of a plurality of sources by using spatial information indicating features between the plurality of sources; a sub-rendering information generating unit that generates sub-rendering information by applying filter information having a surround effect to the source mapping information per source; an integrating unit that generates rendering information for generating a surround signal by integrating at least one item of the sub-rendering information; and a rendering unit that generates the surround signal by applying the rendering information to a downmix signal generated by downmixing the plurality of sources.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.

Beneficial effect

A signal processing apparatus and method according to the present invention enable a decoder, which receives a bit stream containing a downmix signal generated by downmixing a multi-channel signal and spatial information of the multi-channel signal, to generate a signal having a surround effect in an environment where the multi-channel signal cannot be recovered.

Brief description of the drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.

In the drawings:

Fig. 1 is a block diagram of an audio signal encoding apparatus and an audio signal decoding apparatus according to one embodiment of the present invention;

Fig. 2 is a structural diagram of a bit stream of an audio signal according to one embodiment of the present invention;

Fig. 3 is a detailed block diagram of a spatial information converting unit according to one embodiment of the present invention;

Figs. 4 and 5 are block diagrams of channel configurations used for a source mapping process according to one embodiment of the present invention;

Figs. 6 and 7 are detailed block diagrams of a rendering unit for a stereo downmix signal according to one embodiment of the present invention;

Figs. 8 and 9 are detailed block diagrams of a rendering unit for a mono downmix signal according to one embodiment of the present invention;

Figs. 10 and 11 are block diagrams of a smoothing unit and an expanding unit according to one embodiment of the present invention;

Fig. 12 is a graph for explaining a first smoothing method according to one embodiment of the present invention;

Fig. 13 is a graph for explaining a second smoothing method according to one embodiment of the present invention;

Fig. 14 is a graph for explaining a third smoothing method according to one embodiment of the present invention;

Fig. 15 is a graph for explaining a fourth smoothing method according to one embodiment of the present invention;

Fig. 16 is a graph for explaining a fifth smoothing method according to one embodiment of the present invention;

Fig. 17 is a diagram for explaining prototype filter information corresponding to each channel;

Fig. 18 is a block diagram of a first method of generating rendering filter information in a spatial information converting unit according to one embodiment of the present invention;

Fig. 19 is a block diagram of a second method of generating rendering filter information in a spatial information converting unit according to one embodiment of the present invention;

Fig. 20 is a block diagram of a third method of generating rendering filter information in a spatial information converting unit according to one embodiment of the present invention;

Fig. 21 is a diagram for explaining a method of generating a surround signal in a rendering unit according to one embodiment of the present invention;

Fig. 22 is a diagram of a first interpolation method according to one embodiment of the present invention;

Fig. 23 is a diagram of a second interpolation method according to one embodiment of the present invention;

Fig. 24 is a diagram of a switching method according to one embodiment of the present invention;

Fig. 25 is a block diagram of a position to which a window length determined by a window length deciding unit is applied according to one embodiment of the present invention;

Fig. 26 is a diagram of filters of various lengths used in processing an audio signal according to one embodiment of the present invention;

Fig. 27 is a diagram of a method of processing an audio signal separately by using a plurality of subfilters according to one embodiment of the present invention;

Fig. 28 is a block diagram of a method of rendering partitioned rendering information generated by a plurality of subfilters to a mono downmix signal according to one embodiment of the present invention;

Fig. 29 is a block diagram of a method of rendering partitioned rendering information generated by a plurality of subfilters to a stereo downmix signal according to one embodiment of the present invention;

Fig. 30 is a block diagram of a first domain converting method of a downmix signal according to one embodiment of the present invention; and

Fig. 31 is a block diagram of a second domain converting method of a downmix signal according to one embodiment of the present invention.

Best mode for carrying out the invention

Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings.

Fig. 1 is a block diagram of an audio signal encoding apparatus and an audio signal decoding apparatus according to one embodiment of the present invention.

Referring to Fig. 1, an encoding apparatus 10 includes a downmixing unit 100, a spatial information generating unit 200, a downmix signal encoding unit 300, a spatial information encoding unit 400 and a multiplexing unit 500.

If a multi-source (X1, X2, ..., Xn) audio signal is input to the downmixing unit 100, the downmixing unit 100 downmixes the input signal into a downmix signal. In this case, the downmix signal includes a mono, stereo or multi-source audio signal.

A source includes a channel and, for convenience, is represented as a channel in the following description. In this specification, a mono or stereo downmix signal is taken as a reference. However, the present invention is not limited to a mono or stereo downmix signal.

The encoding apparatus 10 can optionally use an arbitrary downmix signal provided directly from an external environment.

The spatial information generating unit 200 generates spatial information from the multi-channel audio signal. The spatial information can be generated in the course of the downmixing process. The generated downmix signal and spatial information are encoded by the downmix signal encoding unit 300 and the spatial information encoding unit 400, respectively, and are then transferred to the multiplexing unit 500.

In the present invention, 'spatial information' means the information a decoding apparatus needs in order to generate a multi-channel signal by upmixing a downmix signal, where the downmix signal is generated by an encoding apparatus by downmixing the multi-channel signal and is transferred to the decoding apparatus. The spatial information includes spatial parameters. The spatial parameters include CLD (channel level difference), indicating an energy difference between channels; ICC (inter-channel coherence), indicating a correlation between channels; CPC (channel prediction coefficients), used in generating three channels from two channels; and the like.

In the present invention, a 'downmix signal encoding unit' or 'downmix signal decoding unit' means a codec that encodes or decodes an audio signal rather than spatial information. In this specification, a downmix signal is taken as an example of an audio signal other than the spatial information. The downmix signal encoding or decoding unit can include MP3, AC-3, DTS or AAC. Moreover, the downmix signal encoding or decoding unit can include codecs to be developed in the future as well as previously developed codecs.

The multiplexing unit 500 generates a bit stream by multiplexing the downmix signal and the spatial information, and then transfers the generated bit stream to the decoding apparatus 20. The structure of the bit stream will be explained later with reference to Fig. 2.

The decoding apparatus 20 includes a demultiplexing unit 600, a downmix signal decoding unit 700, a spatial information decoding unit 800, a rendering unit 900 and a spatial information converting unit 1000.

The demultiplexing unit 600 receives the bit stream and then separates the encoded downmix signal and the encoded spatial information from it. Subsequently, the downmix signal decoding unit 700 decodes the encoded downmix signal, and the spatial information decoding unit 800 decodes the encoded spatial information.

The spatial information converting unit 1000 generates rendering information applicable to the downmix signal by using the decoded spatial information and filter information. The rendering information is then applied to the downmix signal to generate a surround signal.

For example, the surround signal is generated in the following manner. First, the process by which the encoding apparatus 10 generates a downmix signal from a multi-channel audio signal can include several steps using an OTT (one to two) box or a TTT (three to three) box. In this case, spatial information can be generated from each of these steps. The spatial information is transferred to the decoding apparatus 20. The decoding apparatus 20 then converts the spatial information and generates a surround signal by rendering the downmix signal with the converted spatial information. Instead of generating a multi-channel signal by upmixing the downmix signal, the present invention relates to a rendering method comprising the steps of extracting the spatial information for each upmixing step and performing rendering by using the extracted spatial information. For example, HRTF (head-related transfer function) filtering can be used in this rendering method.

In this case, the spatial information is a value that is also applicable to a hybrid domain. The rendering can therefore be classified into the following types according to domain.

The first type performs the rendering on the hybrid domain by passing the downmix signal through a hybrid filter bank. In this case, domain conversion of the spatial information is unnecessary.

The second type performs the rendering on the time domain. The second type relies on the fact that an HRTF filter is modeled as an FIR (finite impulse response) filter or an IIR (infinite impulse response) filter on the time domain. A process of converting the spatial information into time-domain filter coefficients is therefore needed.

The third type performs the rendering on a different frequency domain. For example, the rendering is performed on the DFT (discrete Fourier transform) domain. In this case, a process of converting the spatial information into the corresponding domain is necessary. In particular, the third type enables fast computation by replacing filtering on the time domain with an operation on the frequency domain.
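The equivalence that the third type relies on can be illustrated with a minimal sketch: multiplying two spectra bin by bin on the DFT domain gives the same result as circular convolution on the time domain. The function names below are hypothetical and are not taken from the specification, and the naive O(n²) DFT is used only for clarity; the speed advantage in practice comes from using a fast transform, which this sketch does not implement.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a sequence (illustrative only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circular_convolve(x, h):
    """Time-domain filtering: circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[j] * h[(t - j) % n] for j in range(n)) for t in range(n)]


def filter_via_dft(x, h):
    """Frequency-domain filtering: per-bin product followed by the inverse DFT."""
    product = [a * b for a, b in zip(dft(x), dft(h))]
    return [value.real for value in idft(product)]


# Zero-padding both sequences to at least len(signal) + len(filter) - 1 samples
# makes the circular result coincide with ordinary (linear) convolution.
signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
taps = [0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0, 0.0]
time_domain = circular_convolve(signal, taps)
frequency_domain = filter_via_dft(signal, taps)
```

Comparing `time_domain` and `frequency_domain` element by element shows agreement up to floating-point rounding.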

In the present invention, filter information is information about a filter needed to process an audio signal, and includes the filter coefficients provided to a specific filter. Examples of filter information are as follows. Prototype filter information is the original filter information of a specific filter and can be expressed as GL_L and the like. Converted filter information indicates the filter coefficients after the prototype filter information has been converted and can be expressed as GL_L′ and the like. Sub-rendering information means the filter information obtained by spatializing the prototype filter information to generate a surround signal and can be expressed as FL_L1 and the like. Rendering information means the filter information needed to perform rendering and can be expressed as HL_L and the like. Interpolated/smoothed rendering information means the filter information obtained by interpolating/smoothing the rendering information and can be expressed as HL_L′ and the like. This specification refers to the above filter information; however, the present invention is not restricted by the names of the filter information. In particular, HRTF is taken as an example of the filter information; however, the present invention is not limited to HRTF.

The rendering unit 900 receives the decoded downmix signal and the rendering information, and then generates a surround signal by using them. The surround signal can be a signal that provides a surround effect to an audio system capable of generating only a stereo signal. The present invention can also be applied to various systems other than audio systems capable of generating only a stereo signal.

Fig. 2 is a structural diagram of a bit stream of an audio signal according to one embodiment of the present invention, in which the bit stream includes an encoded downmix signal and encoded spatial information.

Referring to Fig. 2, a 1-frame audio payload includes a downmix signal field and an ancillary data field. The encoded spatial information can be stored in the ancillary data field. For example, if the audio payload is 48 to 128 kbps (kilobits per second), the spatial information can have a range of 5 to 32 kbps. However, no limitation is placed on the ranges of the audio payload and the spatial information.

Fig. 3 is the detailed diagram of spatial information converting unit according to an embodiment of the invention.

Referring to Fig. 3, the spatial information converting unit 1000 includes a source mapping unit 1010, a sub-rendering information generating unit 1020, an integrating unit 1030, a processing unit 1040 and a domain converting unit 1050.

The source mapping unit 1010 generates source mapping information corresponding to each source of an audio signal by performing source mapping using the spatial information. In this case, source mapping information means per-source information generated by using the spatial information and the like so as to correspond to each source of the audio signal. A source includes a channel; in that case, source mapping information corresponding to each channel is generated. The source mapping information can be expressed as coefficients. The source mapping process will be explained in detail later with reference to Figs. 4 and 5.

The sub-rendering information generating unit 1020 generates sub-rendering information corresponding to each source by using the source mapping information and the filter information. For example, if the rendering unit 900 is an HRTF filter, the sub-rendering information generating unit 1020 can generate the sub-rendering information by using HRTF filter information.

The integrating unit 1030 generates rendering information by integrating the sub-rendering information so that it corresponds to each source of the downmix signal. Rendering information, which is generated by using the spatial information and the filter information, means information that generates a surround signal when applied to the downmix signal. The rendering information includes a filter-coefficient type. The integration can be omitted to reduce the amount of computation in the rendering process. Subsequently, the rendering information is transferred to the processing unit 1040.

The processing unit 1040 includes an interpolating unit 1041 and/or a smoothing unit 1042. The rendering information is interpolated by the interpolating unit 1041 and/or smoothed by the smoothing unit 1042.

The domain converting unit 1050 converts the domain of the rendering information into the domain of the downmix signal used by the rendering unit 900. The domain converting unit 1050 can be placed at any of various positions, including the position shown in Fig. 3. If the rendering information is generated on the same domain as that of the rendering unit 900, the domain converting unit 1050 can be omitted. The domain-converted rendering information is then transferred to the rendering unit 900.

The spatial information converting unit 1000 can include a filter information converting unit 1060. In Fig. 3, the filter information converting unit 1060 is provided within the spatial information converting unit 1000. Alternatively, the filter information converting unit 1060 can be provided outside the spatial information converting unit 1000. The filter information converting unit 1060 converts arbitrary filter information, such as an HRTF, into a form suitable for generating sub-rendering information or rendering information. The conversion process of the filter information can include the following steps.

First, a step of matching the domain is included. If the domain of the filter information does not match the domain in which rendering is performed, this domain matching step is needed. For example, a step of converting a time-domain HRTF to the DFT, QMF or hybrid domain used for generating the rendering information is necessary.

Second, a coefficient reducing step can be included. In this case, it is easy to save the domain-converted HRTF and to apply the domain-converted HRTF to the spatial information. For example, if the prototype filter coefficients have a response with a large number of taps (a long length), the corresponding coefficients must be stored in memory space corresponding to that length for a total of 10 responses in the case of 5.1 channels. This increases the memory load and the amount of computation. To prevent this problem, a method can be used that reduces the filter coefficients to be stored while maintaining the filter characteristics in the domain conversion process. For example, the HRTF response can be converted into a small number of parameter values. In this case, the parameter generating process and the parameter values can differ according to the applied domain.

The downmix signal passes through a domain converting unit 1100 and/or a decorrelating unit 1200 before being rendered with the rendering information. If the domain of the rendering information differs from that of the downmix signal, the domain converting unit 1100 converts the domain of the downmix signal so that the two domains match.

The decorrelating unit 1200 is applied to the domain-converted downmix signal. Compared with a method that applies a decorrelator to the rendering information, this may require a relatively higher amount of computation. However, it can prevent distortion from occurring in the process of generating the rendering information. If the amount of computation permits, the decorrelating unit 1200 can include a plurality of decorrelators that differ from each other in characteristics. If the downmix signal is a stereo signal, the decorrelating unit 1200 may not be used. Fig. 3 shows the case in which a domain-converted mono downmix signal, that is, a mono downmix signal on the frequency, hybrid, QMF or DFT domain, is used in the rendering process, with a decorrelator used on the corresponding domain. The present invention also includes a decorrelator used on the time domain. In this case, the mono downmix signal prior to the domain converting unit 1100 is input directly to the decorrelating unit 1200. A first-order or higher-order IIR filter (or FIR filter) can be used as the decorrelator.
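As a minimal sketch of a first-order IIR filter applied on the time domain, the example below uses a first-order all-pass section. The all-pass form, the coefficient value 0.5 and the function names are assumptions chosen for illustration; the specification does not prescribe a particular filter design, and a single first-order section decorrelates only weakly.

```python
import cmath


def allpass_first_order(samples, a=0.5):
    """Apply y[n] = -a*x[n] + x[n-1] + a*y[n-1] to a mono sample sequence.

    'a' is a hypothetical coefficient with |a| < 1 so that the filter is stable.
    """
    output = []
    x_prev = 0.0
    y_prev = 0.0
    for x in samples:
        y = -a * x + x_prev + a * y_prev
        output.append(y)
        x_prev, y_prev = x, y
    return output


def allpass_response(omega, a=0.5):
    """Frequency response H(e^{j*omega}) = (-a + z^-1) / (1 - a*z^-1)."""
    z_inv = cmath.exp(-1j * omega)
    return (-a + z_inv) / (1 - a * z_inv)


# The magnitude response stays at 1 for every frequency; only the phase changes.
magnitudes = [abs(allpass_response(w)) for w in (0.1, 1.0, 2.0, 3.0)]
impulse_response = allpass_first_order([1.0, 0.0, 0.0, 0.0])
```

Because the magnitude is unchanged, the filtered signal keeps the energy of the mono downmix while its phase differs from the original, which is the property a decorrelated path needs.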

Subsequently, the rendering unit 900 generates a surround signal by using the downmix signal, the decorrelated downmix signal and the rendering information. If the downmix signal is a stereo signal, the decorrelated downmix signal may not be used. Details of the rendering process will be described later with reference to Figs. 6 to 9.

The surround signal is converted to the time domain by an inverse domain converting unit 1300 and is then output. A user can thereby hear sound having a multi-channel effect through stereo earphones or the like.

Figs. 4 and 5 are block diagrams of channel configurations used for a source mapping process according to one embodiment of the present invention. The source mapping process generates source mapping information corresponding to each source of an audio signal by using spatial information. As mentioned in the foregoing description, a source includes a channel, and source mapping information can be generated to correspond to the channels shown in Figs. 4 and 5. The source mapping information is generated in a form suitable for the rendering process.

For example, if the downmix signal is a mono signal, source mapping information can be generated by using spatial information such as CLD1 to CLD5 and ICC1 to ICC5.

The source mapping information can be expressed as values such as D_L (= D_{L}), D_R (= D_{R}), D_C (= D_{C}), D_LFE (= D_{LFE}), D_Ls (= D_{Ls}) and D_Rs (= D_{Rs}). In this case, the process of generating the source mapping information varies according to the tree structure corresponding to the spatial information, the range of spatial information to be used, and the like. In this specification, the downmix signal is a mono signal by way of example, which does not limit the present invention.

The right and left channel outputs from the rendering unit 900 can be expressed as mathematics calculation 1.

Mathematics calculation 1

Lo = L*GL_L′ + C*GC_L′ + R*GR_L′ + Ls*GLs_L′ + Rs*GRs_L′

Ro = L*GL_R′ + C*GC_R′ + R*GR_R′ + Ls*GLs_R′ + Rs*GRs_R′

In this case, the operator '*' indicates a product on the DFT domain and can be replaced by a convolution on the QMF or time domain.
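The two output sums can be sketched for a single DFT bin as follows. The dictionary layout, the channel key names and the function name are hypothetical, and each value is assumed to be one complex spectral coefficient, so that '*' is an ordinary product.

```python
# Channels that appear in the left and right output sums.
CHANNELS = ("L", "C", "R", "Ls", "Rs")


def render_bin(sources, g_left, g_right):
    """Compute (Lo, Ro) for one DFT bin.

    sources : per-channel spectral values (L, C, R, Ls, Rs)
    g_left  : per-channel converted filter values toward the left output
    g_right : per-channel converted filter values toward the right output
    """
    lo = sum(sources[ch] * g_left[ch] for ch in CHANNELS)
    ro = sum(sources[ch] * g_right[ch] for ch in CHANNELS)
    return lo, ro


# Usage with made-up values: every channel carries the same spectral value.
sources = {ch: 1.0 + 0.0j for ch in CHANNELS}
g_left = {"L": 1.0, "C": 0.5, "R": 0.0, "Ls": 0.25, "Rs": 0.0}
g_right = {"L": 0.0, "C": 0.5, "R": 1.0, "Ls": 0.0, "Rs": 0.25}
lo, ro = render_bin(sources, g_left, g_right)
```

On the QMF or time domain the per-channel product would be replaced by a convolution, which this sketch does not cover.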

The present invention includes a method of generating L, C, R, Ls and Rs from source mapping information that uses the spatial information, or from source mapping information that uses the spatial information and the filter information. For example, the source mapping information can be generated by using only the CLD of the spatial information, or by using the CLD and ICC of the spatial information. The method of generating the source mapping information by using only the CLD is explained as follows.

If the tree structure has the structure shown in Fig. 4, a first method of obtaining the source mapping information by using only the CLD can be expressed as mathematics calculation 2.

Mathematics calculation 2

\[
\begin{bmatrix} L \\ R \\ C \\ LFE \\ Ls \\ Rs \end{bmatrix}
=
\begin{bmatrix} D_{L} \\ D_{R} \\ D_{C} \\ D_{LFE} \\ D_{Ls} \\ D_{Rs} \end{bmatrix} m
=
\begin{bmatrix}
c_{1,OTT3}\, c_{1,OTT1}\, c_{1,OTT0} \\
c_{2,OTT3}\, c_{1,OTT1}\, c_{1,OTT0} \\
c_{1,OTT4}\, c_{2,OTT1}\, c_{1,OTT0} \\
c_{2,OTT4}\, c_{2,OTT1}\, c_{1,OTT0} \\
c_{1,OTT2}\, c_{2,OTT0} \\
c_{2,OTT2}\, c_{2,OTT0}
\end{bmatrix} m
\]

In this case,

\[
c_{1,OTT_X}^{l,m} = \sqrt{\frac{10^{CLD_X^{l,m}/10}}{1 + 10^{CLD_X^{l,m}/10}}}, \qquad
c_{2,OTT_X}^{l,m} = \sqrt{\frac{1}{1 + 10^{CLD_X^{l,m}/10}}},
\]

and 'm' indicates the mono downmix signal.
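The CLD-only source mapping for the Fig. 4 tree can be sketched as follows. The dictionary keyed by OTT box index (0 to 4), the function names and the example CLD values are assumptions made for illustration; the gain formulas and the products are those of mathematics calculation 2.

```python
import math


def ott_gains(cld_db):
    """Return (c1, c2) for one OTT box from its CLD value in dB."""
    ratio = 10.0 ** (cld_db / 10.0)
    c1 = math.sqrt(ratio / (1.0 + ratio))
    c2 = math.sqrt(1.0 / (1.0 + ratio))
    return c1, c2


def source_mapping_fig4(cld_db_by_ott):
    """Compute D_L, D_R, D_C, D_LFE, D_Ls, D_Rs from the CLDs of OTT0..OTT4.

    cld_db_by_ott is a hypothetical mapping {0: CLD of OTT0, ..., 4: CLD of OTT4}.
    """
    c1 = {}
    c2 = {}
    for index in range(5):
        c1[index], c2[index] = ott_gains(cld_db_by_ott[index])
    return {
        "L": c1[3] * c1[1] * c1[0],
        "R": c2[3] * c1[1] * c1[0],
        "C": c1[4] * c2[1] * c1[0],
        "LFE": c2[4] * c2[1] * c1[0],
        "Ls": c1[2] * c2[0],
        "Rs": c2[2] * c2[0],
    }


# Usage with made-up CLD values in dB.
mapping = source_mapping_fig4({0: 3.0, 1: 6.0, 2: 0.0, 3: -2.0, 4: 10.0})
total_energy = sum(value ** 2 for value in mapping.values())
```

Since c1² + c2² = 1 for every OTT box, the squared mapping coefficients sum to 1, so the mapping distributes the energy of 'm' across the channels without changing the total.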

If the tree structure has the structure shown in Fig. 5, a second method of obtaining the source mapping information by using only the CLD can be expressed as mathematics calculation 3.

Mathematics calculation 3

\[
\begin{bmatrix} L \\ Ls \\ R \\ Rs \\ C \\ LFE \end{bmatrix}
=
\begin{bmatrix} D_{L} \\ D_{Ls} \\ D_{R} \\ D_{Rs} \\ D_{C} \\ D_{LFE} \end{bmatrix} m
=
\begin{bmatrix}
c_{1,OTT3}\, c_{1,OTT1}\, c_{1,OTT0} \\
c_{2,OTT3}\, c_{1,OTT1}\, c_{1,OTT0} \\
c_{1,OTT4}\, c_{2,OTT1}\, c_{1,OTT0} \\
c_{2,OTT4}\, c_{2,OTT1}\, c_{1,OTT0} \\
c_{1,OTT2}\, c_{2,OTT0} \\
c_{2,OTT2}\, c_{2,OTT0}
\end{bmatrix} m
\]

If the source mapping information is generated by using only the CLD, the three-dimensional effect may be reduced. The source mapping information can therefore be generated by using the ICC and/or a decorrelator. A multi-channel signal generated by using a decorrelator output signal dx(m) can be expressed as mathematics calculation 4.

Mathematics calculation 4

\[
\begin{bmatrix} L \\ R \\ C \\ LFE \\ Ls \\ Rs \end{bmatrix}
=
\begin{bmatrix}
A_{L1} m + B_{L0} d_{0}(m) + B_{L1} d_{1}(C_{L1} m) + B_{L3} d_{3}(C_{L3} m) \\
A_{R1} m + B_{R0} d_{0}(m) + B_{R1} d_{1}(C_{R1} m) + B_{R3} d_{3}(C_{R3} m) \\
A_{C1} m + B_{C0} d_{0}(m) + B_{C1} d_{1}(C_{C1} m) \\
c_{2,OTT4}\, c_{2,OTT1}\, c_{1,OTT0}\, m \\
A_{LS1} m + B_{LS0} d_{0}(m) + B_{LS2} d_{2}(C_{LS2} m) \\
A_{RS1} m + B_{RS0} d_{0}(m) + B_{RS2} d_{2}(C_{RS2} m)
\end{bmatrix}
\]

In this case, 'A', 'B' and 'C' are values that can be expressed by using the CLD and ICC, 'd_0' to 'd_3' indicate decorrelators, and 'm' indicates the mono downmix signal. However, this method cannot generate source mapping information such as D_L, D_R and the like.

Therefore, a first method of generating the source mapping information by using the CLD, the ICC and/or decorrelators for the downmix signal regards dx(m) (x = 0, 1, 2) as an independent input. In this case, 'dx' can be used in the process of generating sub-rendering filter information according to mathematics calculation 5.

Mathematics calculation 5

FL_L_M = d_L_M * GL_L′ (mono input → left output)

FL_R_M = d_L_M * GL_R′ (mono input → right output)

FL_L_Dx = d_L_Dx * GL_L′ (Dx output → left output)

FL_R_Dx = d_L_Dx * GL_R′ (Dx output → right output)

The rendering information can then be generated from the result of mathematics calculation 5 according to mathematics calculation 6.

Mathematics calculation 6

HM_L = FL_L_M + FR_L_M + FC_L_M + FLS_L_M + FRS_L_M + FLFE_L_M

HM_R = FL_R_M + FR_R_M + FC_R_M + FLS_R_M + FRS_R_M + FLFE_R_M

HDx_L = FL_L_Dx + FR_L_Dx + FC_L_Dx + FLS_L_Dx + FRS_L_Dx + FLFE_L_Dx

HDx_R = FL_R_Dx + FR_R_Dx + FC_R_Dx + FLS_R_Dx + FRS_R_Dx + FLFE_R_Dx
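The two stages above, per-channel sub-rendering followed by integration into rendering information, can be sketched for the mono input path and a single frequency bin. The dictionary layout and function names are hypothetical, and each coefficient is assumed to be a single number so that '*' is an ordinary product.

```python
# Channels summed by the integration step.
CHANNELS = ("L", "R", "C", "Ls", "Rs", "LFE")


def sub_rendering(d, g_left, g_right):
    """Per channel, return (F toward left output, F toward right output).

    d       : source mapping coefficient per channel for the mono input
    g_left  : converted filter value per channel toward the left output
    g_right : converted filter value per channel toward the right output
    """
    return {ch: (d[ch] * g_left[ch], d[ch] * g_right[ch]) for ch in CHANNELS}


def integrate(sub):
    """Sum the sub-rendering information over all channels: (HM_L, HM_R)."""
    hm_left = sum(sub[ch][0] for ch in CHANNELS)
    hm_right = sum(sub[ch][1] for ch in CHANNELS)
    return hm_left, hm_right


# Usage with made-up coefficients.
d = {ch: 0.4 for ch in CHANNELS}
g_left = {"L": 1.0, "R": 0.2, "C": 0.7, "Ls": 0.8, "Rs": 0.1, "LFE": 0.5}
g_right = {"L": 0.2, "R": 1.0, "C": 0.7, "Ls": 0.1, "Rs": 0.8, "LFE": 0.5}
hm_left, hm_right = integrate(sub_rendering(d, g_left, g_right))
```

The same two functions would apply unchanged to each decorrelator path Dx by passing the corresponding coefficients, yielding HDx_L and HDx_R.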

Details of the rendering information generating process are explained later. The first method of generating the source mapping information by using the CLD, the ICC and/or decorrelators handles the dx output value, 'dx(m)', as an independent input, which may increase the amount of computation.

A second method of generating the source mapping information by using the CLD, the ICC and/or decorrelators employs decorrelators applied on the frequency domain. In this case, the source mapping information can be expressed as mathematics calculation 7.

Mathematics calculation 7

[\begin{matrix} L \\ R \\ C \\ LFE \\ Ls \\ Rs \end{matrix}] = [\begin{matrix} A_{L 1} m + B_{L 0} d_{0} m + B_{L 1} d_{1} C_{L 1} m + B_{L 3} d_{3} C_{L 3} m \\ A_{R 1} m + B_{R 0} d_{0} m + B_{R 1} d_{1} C_{R 1} m + B_{R 3} d_{3} C_{R 3} m \\ A_{C 1} m + B_{C 0} d_{0} m + B_{C 1} d_{1} C_{C 1} m \\ c_{2, OTT 4} c_{2, OTT 1} c_{1, OTT 0} m \\ A_{LS 1} m + B_{LS 0} d_{0} m + B_{LS 2} d_{2} C_{LS 2} m \\ A_{RS 1} m + B_{RS 0} d_{0} m + B_{RS 2} d_{2} C_{RS 2} m \end{matrix}] = [\begin{matrix} A_{L 1} + B_{L 0} d_{0} + B_{L 1} d_{1} C_{L 1} + B_{L 3} d_{3} C_{L 3} \\ A_{R 1} + B_{R 0} d_{0} + B_{R 1} d_{1} C_{R 1} + B_{R 3} d_{3} C_{R 3} \\ A_{C 1} + B_{C 0} d_{0} + B_{C 1} d_{1} C_{C 1} \\ c_{2, OTT 4} c_{2, OTT 1} c_{1, OTT 0} \\ A_{LS 1} + B_{LS 0} d_{0} + B_{LS 2} d_{2} C_{LS 2} \\ A_{RS 1} + B_{RS 0} D_{0} + B_{RS 2} D_{2} C_{RS 2} \end{matrix}] m

In this case, by applying the decorrelators in the frequency domain, source mapping information such as D_L and D_R can be generated in the same form as before the decorrelators were applied, so the method can be implemented in a simple manner.

The third method of generating source mapping information using CLD, ICC and/or decorrelators employs a decorrelator having an all-pass characteristic as the decorrelator of the second method. Here, the all-pass characteristic means that the magnitude is fixed and only the phase varies. The present invention can also use a decorrelator having the all-pass characteristic as the decorrelator of the first method.

The fourth method of generating source mapping information using CLD, ICC and/or decorrelators carries out decorrelation by using decorrelators for the respective channels (e.g., L, R, C, Ls, Rs, etc.) instead of using 'd_0' to 'd_3' of the second method. In this case, the source mapping information can be expressed as Math Figure 8.

Math Figure 8

[\begin{matrix} L \\ R \\ C \\ LFE \\ Ls \\ Rs \end{matrix}] = [\begin{matrix} A_{L 1} + K_{L} d_{L} \\ A_{R 1} + K_{R} d_{R} \\ A_{C 1} + K_{C} d_{C} \\ c_{2, OTT 4} c_{2, OTT 1} c_{1, OTT 0} \\ A_{LS 1} + K_{LS} d_{Ls} \\ A_{RS 1} + K_{RS} d_{Rs} \end{matrix}] m

In this case, 'K' is the energy value of the decorrelated signal, determined from the CLD and ICC values, and 'd_L', 'd_R', 'd_C', 'd_Ls' and 'd_Rs' denote the decorrelators applied to the respective channels.

The fifth method of generating source mapping information using CLD, ICC and/or decorrelators maximizes the decorrelation effect by configuring 'd_L' and 'd_R' of the fourth method to be symmetric to each other and configuring 'd_Ls' and 'd_Rs' of the fourth method to be symmetric to each other. In particular, assuming d_R = f(d_L) and d_Rs = f(d_Ls), only 'd_L', 'd_C' and 'd_Ls' need to be designed.

The sixth method of generating source mapping information using CLD, ICC and/or decorrelators configures 'd_L' and 'd_Ls' of the fifth method to be correlated. 'd_L' and 'd_C' can be configured to be correlated as well.

The seventh method of generating source mapping information using CLD, ICC and/or decorrelators uses the decorrelators of the third method as a serial or nested structure of all-pass filters. The seventh method relies on the fact that the all-pass characteristic is preserved even when all-pass filters are arranged in a serial or nested structure. With such a structure, a greater variety of phase responses can be obtained, so the decorrelation effect can be maximized.
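As an illustrative sketch only, the following Python fragment checks numerically that a serial connection of first-order all-pass sections keeps unit magnitude while changing the phase response. The section form H(z) = (a + z^-1) / (1 + a z^-1), the function names and the coefficient values are assumptions chosen for illustration; the description does not specify a particular all-pass design.

```python
import cmath
import math


def allpass_response(a, omega):
    """Response of H(z) = (a + z^-1) / (1 + a z^-1) at angular frequency omega (real a, |a| < 1)."""
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)


def cascade_response(coeffs, omega):
    """Response of a serial connection: the product of the individual section responses."""
    h = 1 + 0j
    for a in coeffs:
        h *= allpass_response(a, omega)
    return h


# Hypothetical coefficients, evaluated on nine frequencies between 0 and pi.
omegas = [math.pi * k / 8 for k in range(9)]
single = [allpass_response(0.5, w) for w in omegas]
cascade = [cascade_response([0.5, -0.3, 0.7], w) for w in omegas]

# The magnitude stays at 1 for both structures; only the phase differs.
max_dev_single = max(abs(abs(h) - 1.0) for h in single)
max_dev_cascade = max(abs(abs(h) - 1.0) for h in cascade)
```

Because each section has unit magnitude, the product of any number of sections does as well, which is why the serial structure only adds freedom in the phase response.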

The eighth method of generating source mapping information using CLD, ICC and/or decorrelators uses the related-art decorrelator together with the frequency-domain decorrelator of the second method. In this case, the multi-channel signal can be expressed as Math Figure 9.

Math Figure 9

[\begin{matrix} L \\ R \\ C \\ LFE \\ Ls \\ Rs \end{matrix}] = [\begin{matrix} A_{L 1} + K_{L} d_{L} \\ A_{R 1} + K_{R} d_{R} \\ A_{C 1} + K_{C} d_{C} \\ c_{2, OTT 4} c_{2, OTT 1} c_{1, OTT 0} \\ A_{LS 1} + K_{LS} d_{Ls} \\ A_{RS 1} + K_{RS} d_{Rs} \end{matrix}] m + [\begin{matrix} P_{L 0} d_{new 0} (m) + P_{L 1} d_{new 1} (m) + \cdots \\ P_{R 0} d_{new 0} (m) + P_{R 1} d_{new 1} (m) + \cdots \\ P_{C 0} d_{new 0} (m) + P_{C 1} d_{new 1} (m) + \cdots \\ 0 \\ P_{Ls 0} d_{new 0} (m) + P_{Ls 1} d_{new 1} (m) + \cdots \\ P_{Rs 0} d_{new 0} (m) + P_{Rs 1} d_{new 1} (m) + \cdots \end{matrix}]

In this case, the filter coefficient generation process is the same as the process explained for the first method, except that 'A' is replaced by 'A+Kd'.

The ninth method of generating source mapping information using CLD, ICC and/or decorrelators, when the related-art decorrelator is used, generates a further decorrelated value by applying the frequency-domain decorrelator to the output of the related-art decorrelator. Hence, source mapping information can be generated with a small amount of computation while overcoming the limitation of the frequency-domain decorrelator.

The tenth method of generating source mapping information using CLD, ICC and/or decorrelators is expressed as Math Figure 10.

Math Figure 10

[\begin{matrix} L \\ R \\ C \\ LFE \\ Ls \\ Rs \end{matrix}] = [\begin{matrix} A_{L 1} m + K_{L} d_{L} (m) \\ A_{R 1} m + K_{R} d_{R} (m) \\ A_{C 1} m + K_{C} d_{C} (m) \\ c_{2, OTT 4} c_{2, OTT 1} c_{1, OTT 0} m \\ A_{LS 1} m + K_{Ls} d_{Ls} (m) \\ A_{RS 1} m + K_{Rs} d_{Rs} (m) \end{matrix}]

In this case, 'd_i(m)' (i = L, R, C, Ls, Rs) is the output value of the decorrelator applied to channel i. This output value can be processed in the time domain, the frequency domain, the QMF domain, a hybrid domain, etc. If the output value is processed in a domain different from the domain currently being processed, it can be converted by domain conversion. The same 'd' can be used for d_L, d_R, d_C, d_Ls and d_Rs, in which case Math Figure 10 can be expressed in a very simple form.

If Math Figure 10 is applied to Math Figure 1, Math Figure 1 can be expressed as Math Figure 11.

Math Figure 11

Lo = HM_L*m + HMD_L*d(m)

Ro = HM_R*m + HMD_R*d(m)

In this case, the rendering information HM_L is a value obtained by combining spatial information and filter information in order to generate the surround signal Lo from the input m. Likewise, the rendering information HM_R is a value obtained by combining spatial information and filter information in order to generate the surround signal Ro from the input m. In addition, 'd(m)' is either a decorrelator output value generated by converting a decorrelator output value in an arbitrary domain into a value in the current domain, or a decorrelator output value generated by processing in the current domain. The rendering information HMD_L is a value indicating the extent to which the decorrelator output value d(m) is added to 'Lo' when d(m) is rendered, and is also a value obtained by combining spatial information and filter information. The rendering information HMD_R is a value indicating the extent to which the decorrelator output value d(m) is added to 'Ro' when d(m) is rendered.

Thus, to render a mono downmix signal, the present invention proposes a method of generating a surround signal by applying rendering information, generated by combining spatial information and filter information (e.g., HRTF filter coefficients), to the downmix signal and the decorrelated downmix signal. This rendering process can be carried out regardless of domain. If 'd(m)' is expressed as 'd*m' (a product operator) carried out in the frequency domain, Math Figure 11 can be expressed as Math Figure 12.

Math Figure 12

Lo = HM_L*m + HMD_L*d*m = HMoverall_L*m

Ro = HM_R*m + HMD_R*d*m = HMoverall_R*m

Thus, when the rendering process is performed on the downmix signal in the frequency domain, the amount of computation can be minimized by expressing the value obtained by combining the spatial information, the filter information and the decorrelators as a single product.
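The saving described above can be sketched as follows, under the assumption that every quantity is a complex value per frequency bin and that '*' is a per-bin product. The function names and all numeric values are hypothetical; the fragment only shows that folding the decorrelator into HMoverall_L gives the same output as rendering the downmix signal and the decorrelated signal separately.

```python
def fold_decorrelator(hm, hmd, d):
    """Per-bin HMoverall = HM + HMD * d."""
    return [h + hd * dk for h, hd, dk in zip(hm, hmd, d)]


def render(h, m):
    """Per-bin product of rendering information and signal."""
    return [hk * mk for hk, mk in zip(h, m)]


# Hypothetical values for three frequency bins.
m = [1 + 2j, 0.5 - 1j, -2 + 0j]              # mono downmix signal
hm_l = [0.8 + 0j, 0.6 + 0.1j, 0.4 - 0.2j]    # HM_L
hmd_l = [0.3 + 0j, 0.2 + 0j, 0.1 + 0j]       # HMD_L
d = [1j, -1 + 0j, -1j]                       # unit-magnitude decorrelator response

# Two-step form: Lo = HM_L*m + HMD_L*(d*m)
two_step = [a + b for a, b in zip(render(hm_l, m), render(hmd_l, render(d, m)))]

# One-step form: Lo = HMoverall_L*m
one_step = render(fold_decorrelator(hm_l, hmd_l, d), m)
```

The one-step form needs a single multiplication per bin at rendering time, since HMoverall_L can be computed once when the rendering information is generated.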

Fig. 6 and Fig. 7 are detailed block diagrams of a rendering unit for a stereo downmix signal according to one embodiment of the present invention.

Referring to Fig. 6, the rendering unit 900 includes a rendering unit-A 910 and a rendering unit-B 920.

If the downmix signal is a stereo signal, the spatial information converting unit 1000 generates rendering information for the left and right channels of the downmix signal. The rendering unit-A 910 generates a surround signal by applying the rendering information for the left channel of the downmix signal to the left channel of the downmix signal. The rendering unit-B 920 generates a surround signal by applying the rendering information for the right channel of the downmix signal to the right channel of the downmix signal. The channel names are merely exemplary and do not limit the present invention.

The rendering information can include rendering information delivered to the same channel and rendering information delivered to another channel.

For example, the spatial information converting unit 1000 can generate rendering information HL_L and HL_R that is input to the rendering unit for the left channel of the downmix signal, where HL_L is delivered to the left output corresponding to the same channel and HL_R is delivered to the right output corresponding to the other channel. Likewise, the spatial information converting unit 1000 can generate rendering information HR_R and HR_L that is input to the rendering unit for the right channel of the downmix signal, where HR_R is delivered to the right output corresponding to the same channel and HR_L is delivered to the left output corresponding to the other channel.

Referring to Fig. 7, the rendering unit 900 includes a rendering unit-1A 911, a rendering unit-2A 912, a rendering unit-1B 921 and a rendering unit-2B 922.

The rendering unit 900 receives the stereo downmix signal and the rendering information from the spatial information converting unit 1000. Subsequently, the rendering unit 900 generates a surround signal by applying the rendering information to the stereo downmix signal.

In particular, the rendering unit-1A 911 performs rendering using HL_L, the rendering information delivered to the same channel, among the rendering information for the left channel of the downmix signal. The rendering unit-2A 912 performs rendering using HL_R, the rendering information delivered to the other channel, among the rendering information for the left channel of the downmix signal. The rendering unit-1B 921 performs rendering using HR_R, the rendering information delivered to the same channel, among the rendering information for the right channel of the downmix signal. The rendering unit-2B 922 performs rendering using HR_L, the rendering information delivered to the other channel, among the rendering information for the right channel of the downmix signal.

In the following description, rendering information delivered to another channel is called 'cross-rendering information'. The cross-rendering information HL_R or HR_L is applied to its own channel, and the result is then added to the other channel by an adder. The cross-rendering information HL_R and/or HR_L may be zero, which means that the corresponding path makes no contribution.
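The four rendering paths and the two adders can be sketched as below for a single frequency-domain sample, assuming that applying rendering information is a simple product. The function name render_stereo and the numeric values are hypothetical and serve only as an illustration.

```python
def render_stereo(li, ri, hl_l, hl_r, hr_l, hr_r):
    """Return (Lo, Ro) for one sample of a stereo downmix (Li, Ri).

    hl_l and hr_r are the same-channel paths (units 1A and 1B);
    hl_r and hr_l are the cross-rendering paths (units 2A and 2B).
    """
    lo = hl_l * li + hr_l * ri   # left adder: same-channel path plus cross path from Ri
    ro = hl_r * li + hr_r * ri   # right adder: cross path from Li plus same-channel path
    return lo, ro


# Hypothetical values with non-zero cross-rendering information.
lo, ro = render_stereo(1.0, 2.0, hl_l=0.5, hl_r=0.25, hr_l=0.25, hr_r=0.5)

# With zero cross-rendering information, each output depends on one input only.
lo_direct, ro_direct = render_stereo(1.0, 2.0, hl_l=0.5, hl_r=0.0, hr_l=0.0, hr_r=0.5)
```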

An example of the surround signal generating method shown in Fig. 6 or Fig. 7 is explained as follows.

First, if the downmix signal is a stereo signal, the downmix signal defined as 'x', the source mapping information generated using spatial information and defined as 'D', the prototype filter information defined as 'G', the multi-channel signal defined as 'p' and the surround signal defined as 'y' can be represented by the matrices shown in Math Figure 13.

Math Figure 13

x = [\begin{matrix} Li \\ Ri \end{matrix}],

p = [\begin{matrix} L \\ Ls \\ R \\ Rs \\ C \\ LFE \end{matrix}],

D = [\begin{matrix} D_{L1} & D_{L2} \\ D_{Ls1} & D_{Ls2} \\ D_{R1} & D_{R2} \\ D_{Rs1} & D_{Rs2} \\ D_{C1} & D_{C2} \\ D_{LFE1} & D_{LFE2} \end{matrix}],

G = [\begin{matrix} GL_L & GLs_L & GR_L & GRs_L & GC_L & GLFE_L \\ GL_R & GLs_R & GR_R & GRs_R & GC_R & GLFE_R \end{matrix}],

y = [\begin{matrix} Lo \\ Ro \end{matrix}]

In this case, if the above values are in the frequency domain, they can be developed as follows.

First, as shown in Math Figure 14, the multi-channel signal p can be expressed as the product of the source mapping information D, generated using spatial information, and the downmix signal x.

Math Figure 14

p = D \cdot x, \quad [\begin{matrix} L \\ Ls \\ R \\ Rs \\ C \\ LFE \end{matrix}] = [\begin{matrix} D_{L1} & D_{L2} \\ D_{Ls1} & D_{Ls2} \\ D_{R1} & D_{R2} \\ D_{Rs1} & D_{Rs2} \\ D_{C1} & D_{C2} \\ D_{LFE1} & D_{LFE2} \end{matrix}] [\begin{matrix} Li \\ Ri \end{matrix}]

The surround signal y can be generated by applying the prototype filter information G to the multi-channel signal p, as shown in Math Figure 15.

Math Figure 15

y = G·p

In this case, substituting Math Figure 14 for p yields Math Figure 16.

Math Figure 16

y = GDx

In this case, if the rendering information H is defined as H = GD, the surround signal y and the downmix signal x have the relation of Math Figure 17.

Math Figure 17

H = [\begin{matrix} HL_L & HR_L \\ HL_R & HR_R \end{matrix}],

y = Hx

Therefore, after the rendering information H has been generated as the product of the filter information and the source mapping information, the downmix signal x is multiplied by the rendering information H to generate the surround signal y.

According to the definition of the rendering information H, H can be expressed as Math Figure 18.

Math Figure 18

H = GD = [\begin{matrix} GL_L & GLs_L & GR_L & GRs_L & GC_L & GLFE_L \\ GL_R & GLs_R & GR_R & GRs_R & GC_R & GLFE_R \end{matrix}] [\begin{matrix} D_{L1} & D_{L2} \\ D_{Ls1} & D_{Ls2} \\ D_{R1} & D_{R2} \\ D_{Rs1} & D_{Rs2} \\ D_{C1} & D_{C2} \\ D_{LFE1} & D_{LFE2} \end{matrix}]
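The relation y = G(Dx) = (GD)x = Hx can be checked with a small numerical sketch for one frequency bin. The helper matmul and every matrix entry below are hypothetical values chosen for illustration; rows of D and columns of G follow the channel order L, Ls, R, Rs, C, LFE used above.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical prototype filter information G (2 x 6) for one frequency bin.
G = [[1.0, 0.7, 0.3, 0.2, 0.5, 0.1],
     [0.3, 0.2, 1.0, 0.7, 0.5, 0.1]]

# Hypothetical source mapping information D (6 x 2).
D = [[0.6, 0.0], [0.4, 0.0], [0.0, 0.6], [0.0, 0.4], [0.3, 0.3], [0.1, 0.1]]

x = [[1.0], [-0.5]]               # stereo downmix (Li, Ri)

p = matmul(D, x)                  # multi-channel signal (6 x 1)
y_via_p = matmul(G, p)            # y = G(Dx): passes through the 6-channel signal

H = matmul(G, D)                  # rendering information (2 x 2)
y_direct = matmul(H, x)           # y = Hx: the multi-channel signal is never formed
```

Since H is only 2 x 2, rendering with H avoids generating the six-channel signal p for every sample, which is the motivation for combining the spatial information and the filter information beforehand.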

Fig. 8 and Fig. 9 are detailed block diagrams of a rendering unit for a mono downmix signal according to one embodiment of the present invention.

Referring to Fig. 8, the rendering unit 900 includes a rendering unit-A 930 and a rendering unit-B 940.

If the downmix signal is a mono signal, the spatial information converting unit 1000 generates rendering information HM_L and HM_R, where HM_L is used in rendering the mono signal to the left channel and HM_R is used in rendering the mono signal to the right channel.

The rendering unit-A 930 applies the rendering information HM_L to the mono downmix signal to generate the left-channel surround signal. The rendering unit-B 940 applies the rendering information HM_R to the mono downmix signal to generate the right-channel surround signal.

The rendering unit 900 in the figure does not use a decorrelator. However, if the rendering unit-A 930 and the rendering unit-B 940 perform rendering using the rendering information HMoverall_L and HMoverall_R defined in Math Figure 12, respectively, outputs to which the decorrelator has been applied can be obtained.

Meanwhile, if a stereo output rather than a surround output is desired after the rendering of the mono downmix signal has been completed, the following two methods are possible.

The first method uses values for stereo output instead of the rendering information for the surround effect. In this case, a stereo signal can be obtained by modifying only the rendering information in the structure shown in Fig. 3.

The second method obtains a stereo signal, within the decoding process that generates a multi-channel signal using the downmix signal and the spatial information, by carrying out the decoding process only up to the step at which the specific number of channels is obtained.

Referring to Fig. 9, the rendering unit 900 corresponds to the case in which the decorrelated signal is represented as a single signal, i.e., Math Figure 11. The rendering unit 900 includes a rendering unit-1A 931, a rendering unit-2A 932, a rendering unit-1B 941 and a rendering unit-2B 942. The rendering unit 900 is similar to the rendering unit for the stereo downmix signal, except that it includes the rendering units 941 and 942 for the decorrelated signal.

In the case of the stereo downmix signal, one of the two channels can be regarded as a decorrelated signal. So, without employing an additional decorrelator, the rendering process can be carried out using the four kinds of rendering information defined previously, such as HL_L and HL_R. In particular, the rendering unit-1A 931 generates a signal to be delivered to the same channel by applying the rendering information HM_L to the mono downmix signal. The rendering unit-2A 932 generates a signal to be delivered to the other channel by applying the rendering information HM_R to the mono downmix signal. The rendering unit-1B 941 generates a signal to be delivered to the same channel by applying the rendering information HMD_R to the decorrelated signal. The rendering unit-2B 942 generates a signal to be delivered to the other channel by applying the rendering information HMD_L to the decorrelated signal.

If the downmix signal is a mono signal, the downmix signal defined as x, the source channel information defined as D, the prototype filter information defined as G, the multi-channel signal defined as p and the surround signal defined as y can be represented by the matrices shown in Math Figure 19.

Math Figure 19

x = [Mi],

p = [\begin{matrix} L \\ Ls \\ R \\ Rs \\ C \\ LFE \end{matrix}],

D = [\begin{matrix} D_L \\ D_Ls \\ D_R \\ D_Rs \\ D_C \\ D_LFE \end{matrix}],

G = [\begin{matrix} GL_L & GLs_L & GR_L & GRs_L & GC_L & GLFE_L \\ GL_R & GLs_R & GR_R & GRs_R & GC_R & GLFE_R \end{matrix}],

y = [\begin{matrix} Lo \\ Ro \end{matrix}]

In this case, the relation between these matrices is similar to that in the case where the downmix signal is a stereo signal, so the details are omitted.

Meanwhile, the source mapping information described with reference to Fig. 4 and Fig. 5, and the rendering information generated from it, have values that differ per frequency band, parameter band and/or transmitted time slot. If the value of the source mapping information and/or the rendering information differs considerably between neighboring bands or between boundary time slots, distortion may occur in the rendering process. To prevent this distortion, a smoothing process in the frequency domain and/or the time domain is needed. Besides frequency-domain smoothing and/or time-domain smoothing, other smoothing methods suitable for rendering can be used. A value obtained by multiplying the source mapping information or the rendering information by a specific gain can also be used.

Fig. 10 and Fig. 11 are block diagrams of a smoothing unit and an expanding unit according to one embodiment of the present invention.

As shown in Fig. 10 and Fig. 11, the smoothing method according to the present invention can be applied to rendering information and/or source mapping information, but it can also be applied to other types of information. The following description covers smoothing in the frequency domain; the present invention, however, includes time-domain smoothing as well.

Referring to Fig. 10 and Fig. 11, the smoothing unit 1042 can perform smoothing on rendering information and/or source mapping information. Specific examples of the position at which smoothing takes place will be described later with reference to Fig. 18 to Fig. 20.

The smoothing unit 1042 can be configured together with an expanding unit 1043, in which the rendering information and/or source mapping information can be expanded to a range wider than a parameter band, e.g., to filter bands. In particular, the source mapping information can be expanded to the frequency resolution corresponding to the filter information (e.g., filter bands) so that it can be multiplied by the filter information (e.g., HRTF filter coefficients). The smoothing according to the present invention is performed before or together with the expansion. The smoothing performed together with the expansion can employ one of the methods shown in Fig. 12 to Fig. 16.
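A minimal sketch of the expansion step is given below, assuming that each parameter band covers a contiguous run of filter bands and that the expanded value is simply repeated across that run before it is multiplied by the filter information. The function name, the band layout and all values are hypothetical.

```python
def expand_to_filter_bands(param_values, band_starts, num_filter_bands):
    """Repeat each parameter-band value over the filter bands it covers.

    band_starts[i] is the index of the first filter band of parameter band i
    (ascending, with band_starts[0] == 0).
    """
    expanded = []
    for fb in range(num_filter_bands):
        # Index of the last parameter band that starts at or before this filter band.
        pb = max(i for i, start in enumerate(band_starts) if start <= fb)
        expanded.append(param_values[pb])
    return expanded


# Hypothetical layout: 3 parameter bands spread over 8 filter bands.
d_l = [0.9, 0.5, 0.2]                       # source mapping information per parameter band
expanded = expand_to_filter_bands(d_l, band_starts=[0, 2, 5], num_filter_bands=8)

# After expansion the values line up with the filter coefficients one to one.
gl_l = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]   # hypothetical filter information
fl_l = [d * g for d, g in zip(expanded, gl_l)]
```

The step-shaped result has discontinuities at the parameter band borders, which is what the smoothing methods described next are intended to reduce.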

Fig. 12 is a graph for explaining a first smoothing method according to one embodiment of the present invention.

Referring to Fig. 12, the first smoothing method uses, in each parameter band, a value of the same size as the spatial information. In this case, a smoothing effect can be achieved by using a suitable smoothing function.

Fig. 13 is a graph for explaining a second smoothing method according to one embodiment of the present invention.

Referring to Fig. 13, the second smoothing method obtains a smoothing effect by connecting representative positions of the parameter bands. The representative position is the exact center of each parameter band, a center position proportional to a log scale, Bark scale or the like, the lowest frequency value, or a position determined in advance by a different method.
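The idea of connecting representative positions can be sketched as follows, assuming the representative position is the exact center of each parameter band and that neighboring positions are joined by straight lines. The function name, the band edges and the values are hypothetical; other representative positions or connecting curves are equally possible.

```python
def smooth_by_connecting_centres(param_values, band_edges):
    """Linearly connect the band centres and sample the result at every filter band.

    band_edges has len(param_values) + 1 entries and starts at 0; parameter band i
    covers filter bands band_edges[i] .. band_edges[i + 1] - 1.
    """
    centres = [(band_edges[i] + band_edges[i + 1] - 1) / 2
               for i in range(len(param_values))]
    out = []
    for fb in range(band_edges[-1]):
        if fb <= centres[0]:
            out.append(param_values[0])        # hold the first value below the first centre
        elif fb >= centres[-1]:
            out.append(param_values[-1])       # hold the last value above the last centre
        else:
            i = max(k for k, c in enumerate(centres) if c <= fb)
            t = (fb - centres[i]) / (centres[i + 1] - centres[i])
            out.append(param_values[i] * (1 - t) + param_values[i + 1] * t)
    return out


# Two hypothetical parameter bands of four filter bands each.
smoothed = smooth_by_connecting_centres([1.0, 0.0], band_edges=[0, 4, 8])
```

In this example the step from 1.0 to 0.0 at the band border is replaced by a gradual ramp between the two band centers.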

Fig. 14 is a graph for explaining a third smoothing method according to one embodiment of the present invention.

Referring to Fig. 14, the third smoothing method performs smoothing in the form of a curve or straight line that smoothly connects the boundaries of the parameter bands. In this case, the third smoothing method uses a preset boundary smoothing curve, or low-pass filtering by a first-order or higher-order IIR filter or by an FIR filter.

Fig. 15 is a graph for explaining a fourth smoothing method according to one embodiment of the present invention.

Referring to Fig. 15, the fourth smoothing method achieves a smoothing effect by adding a signal such as random noise to the spatial information contour. In this case, a value that differs per channel or per band can be used as the random noise. When random noise is added in the frequency domain, it can be added to the magnitude only, leaving the phase value unchanged. In addition to the smoothing effect in the frequency domain, the fourth smoothing method can achieve an inter-channel decorrelation effect.
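Adding noise to the magnitude while keeping the phase can be sketched as below for complex frequency-domain values. The uniform noise distribution, the noise scale, the clamping at zero and the function name are assumptions made for illustration only.

```python
import cmath
import random


def add_magnitude_noise(coeffs, scale, rng):
    """Perturb only the magnitude of each complex value; the phase is kept."""
    out = []
    for c in coeffs:
        mag, phase = cmath.polar(c)
        noisy_mag = max(0.0, mag + rng.uniform(-scale, scale))  # magnitude cannot go negative
        out.append(cmath.rect(noisy_mag, phase))
    return out


# Hypothetical frequency-domain values for three bands of one channel.
coeffs = [1 + 1j, -2 + 0.5j, 0.5 - 1.5j]
rng = random.Random(0)                  # a different seed could be used for each channel
noisy = add_magnitude_noise(coeffs, scale=0.1, rng=rng)
```

Using independent noise per channel is what would give the inter-channel decorrelation effect mentioned above.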

Fig. 16 is a graph for explaining a fifth smoothing method according to one embodiment of the present invention.

Referring to Fig. 16, the fifth smoothing method uses a combination of the second to fourth smoothing methods. For example, after the representative positions of the parameter bands have been connected, random noise is added and low-pass filtering is then applied. The order of these steps can be modified. The fifth smoothing method minimizes discontinuities in the frequency domain and can enhance the inter-channel decorrelation effect.

In the first to fifth smoothing methods, the total power of the spatial information values (e.g., CLD values) of the respective channels in each frequency band should be kept constant. To this end, power normalization should be performed after the smoothing method has been carried out per channel. For example, if the downmix signal is a mono signal, the level values of the respective channels should satisfy the relation of Math Figure 20.

Math Figure 20

D_L(pb) + D_R(pb) + D_C(pb) + D_Ls(pb) + D_Rs(pb) + D_Lfe(pb) = C

In this case, pb = 0 to (total number of parameter bands − 1), and 'C' is an arbitrary constant.
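One possible way to restore the relation of Math Figure 20 after smoothing is to scale all channels of a parameter band by a common factor, as sketched below. This particular normalization, the function name and the level values are assumptions; the description only requires that the sum over the channels equals the constant C.

```python
def normalise_levels(levels, target):
    """Scale the per-channel levels so that they sum to target in every parameter band.

    levels maps a channel name to a list with one value per parameter band.
    Assumes the sum over channels is non-zero in every band.
    """
    num_bands = len(next(iter(levels.values())))
    out = {ch: [] for ch in levels}
    for pb in range(num_bands):
        total = sum(values[pb] for values in levels.values())
        for ch, values in levels.items():
            out[ch].append(values[pb] * target / total)
    return out


# Hypothetical smoothed levels for two parameter bands.
levels = {
    "L": [0.2, 0.4], "R": [0.2, 0.1], "C": [0.3, 0.1],
    "Ls": [0.1, 0.1], "Rs": [0.1, 0.2], "Lfe": [0.05, 0.1],
}
normalised = normalise_levels(levels, target=1.0)
band_sums = [sum(normalised[ch][pb] for ch in normalised) for pb in range(2)]
```

A common scale factor leaves the ratios between channels, and therefore the smoothed contour of each channel relative to the others, unchanged.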

Fig. 17 is a diagram for explaining the prototype filter information per channel.

Referring to Fig. 17, for rendering, a signal passing through the GL_L filter for the left channel source is sent to the left output, whereas a signal passing through the GL_R filter is sent to the right output.

Subsequently, the left final output (e.g., Lo) and the right final output (e.g., Ro) are generated by adding all the signals received from the respective channels. In particular, the rendered left/right channel outputs can be expressed as Math Figure 21.

Math Figure 21

Lo = L*GL_L + C*GC_L + R*GR_L + Ls*GLs_L + Rs*GRs_L

Ro = L*GL_R + C*GC_R + R*GR_R + Ls*GLs_R + Rs*GRs_R

In the present invention, the rendered left/right channel outputs can be generated using the L, R, C, Ls and Rs obtained by decoding the downmix signal into a multi-channel signal using spatial information. Alternatively, the present invention can generate the rendered left/right channel outputs using rendering information, without generating L, R, C, Ls and Rs, where the rendering information is generated using spatial information and filter information.

A process for generating rendering information using spatial information is explained with reference to Fig. 18 to Fig. 20 as follows.

Fig. 18 is a block diagram of a first method of generating rendering information in the spatial information converting unit 1000 according to one embodiment of the present invention.

Referring to Fig. 18, as mentioned in the foregoing description, the spatial information converting unit 1000 includes the source mapping unit 1010, the sub-rendering information generating unit 1020, the integrating unit 1030, the processing unit 1040 and the domain converting unit 1050. The spatial information converting unit 1000 has the same configuration as shown in Fig. 3.

The sub-rendering information generating unit 1020 includes at least one sub-rendering information generating unit (the 1st to the Nth sub-rendering information generating units).

The sub-rendering information generating unit 1020 generates sub-rendering information by using filter information and source mapping information.

For example, if the downmix signal is a mono signal, the first sub-rendering information generating unit can generate sub-rendering information corresponding to the left channel of the multi-channel. This sub-rendering information can be expressed as Math Figure 22 using the source mapping information D_L and the converted filter information GL_L′ and GL_R′.

Math Figure 22

FL_L = D_L * GL_L′

(mono input → filter coefficient to the left output channel)

FL_R = D_L * GL_R′

(mono input → filter coefficient to the right output channel)

In this case, D_L is a value generated using spatial information in the source mapping unit 1010. The process of generating D_L can follow the tree structure.

The second sub-rendering information generating unit can generate sub-rendering information FR_L and FR_R corresponding to the right channel of the multi-channel, and the Nth sub-rendering information generating unit can generate sub-rendering information FRs_L and FRs_R corresponding to the right surround channel of the multi-channel.

If the downmix signal is a stereo signal, the first sub-rendering information generating unit can generate sub-rendering information corresponding to the left channel of the multi-channel. This sub-rendering information can be expressed as Math Figure 23 using the source mapping information D_L1 and D_L2.

Math Figure 23

FL_L1 = D_L1 * GL_L′

(left input → filter coefficient to the left output channel)

FL_L2 = D_L2 * GL_L′

(right input → filter coefficient to the left output channel)

FL_R1 = D_L1 * GL_R′

(left input → filter coefficient to the right output channel)

FL_R2 = D_L2 * GL_R′

(right input → filter coefficient to the right output channel)

In Math Figure 23, FL_R1, for example, is interpreted as follows.

First, in FL_R1, 'L' indicates the position in the multi-channel, 'R' indicates the output channel of the surround signal, and '1' indicates the channel of the downmix signal. That is, FL_R1 indicates the sub-rendering information used in generating the right output channel of the surround signal from the left channel of the downmix signal.

Second, D_L1 and D_L2 are values generated using spatial information in the source mapping unit 1010.

If the downmix signal is a stereo signal, a plurality of sub-rendering information can be generated by at least one sub-rendering information generating unit in the same manner as in the case where the downmix signal is a mono signal. The types of sub-rendering information generated by the plurality of sub-rendering information generating units are exemplary and do not limit the present invention.

The sub-rendering information generated by the sub-rendering information generating unit 1020 is delivered to the rendering unit 900 via the integrating unit 1030, the processing unit 1040 and the domain converting unit 1050.

The integrating unit 1030 integrates the sub-rendering information generated per channel into rendering information (e.g., HL_L, HL_R, HR_L, HR_R) for the rendering process. The integrating process in the integrating unit 1030 is explained below for the case of a mono signal and the case of a stereo signal.

First, if the downmix signal is a mono signal, the rendering information can be expressed as Math Figure 24.

Math Figure 24

HM_L = FL_L + FR_L + FC_L + FLs_L + FRs_L + FLFE_L

HM_R = FL_R + FR_R + FC_R + FLs_R + FRs_R + FLFE_R

Second, if the downmix signal is a stereo signal, the rendering information can be expressed as Math Figure 25.

Math Figure 25

HL_L = FL_L1 + FR_L1 + FC_L1 + FLs_L1 + FRs_L1 + FLFE_L1

HR_L = FL_L2 + FR_L2 + FC_L2 + FLs_L2 + FRs_L2 + FLFE_L2

HL_R = FL_R1 + FR_R1 + FC_R1 + FLs_R1 + FRs_R1 + FLFE_R1

HR_R = FL_R2 + FR_R2 + FC_R2 + FLs_R2 + FRs_R2 + FLFE_R2
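The generation of sub-rendering information and its integration for a stereo downmix signal can be sketched together as follows. The sketch assumes scalar values for a single frequency band, so that '*' is an ordinary product; the function name, the dictionary layout and all numeric values are hypothetical.

```python
CHANNELS = ["L", "R", "C", "Ls", "Rs", "LFE"]


def integrate_stereo(d1, d2, g_left, g_right):
    """Return (HL_L, HR_L, HL_R, HR_R) for one band.

    d1[ch], d2[ch]: source mapping information from the left / right downmix channel.
    g_left[ch], g_right[ch]: converted filter information to the left / right output.
    Each product is one sub-rendering value, e.g. d1["L"] * g_right["L"] is FL_R1.
    """
    hl_l = sum(d1[ch] * g_left[ch] for ch in CHANNELS)    # sum of F<ch>_L1
    hr_l = sum(d2[ch] * g_left[ch] for ch in CHANNELS)    # sum of F<ch>_L2
    hl_r = sum(d1[ch] * g_right[ch] for ch in CHANNELS)   # sum of F<ch>_R1
    hr_r = sum(d2[ch] * g_right[ch] for ch in CHANNELS)   # sum of F<ch>_R2
    return hl_l, hr_l, hl_r, hr_r


# Hypothetical, left/right mirrored values.
d1 = {"L": 0.5, "R": 0.0, "C": 0.25, "Ls": 0.5, "Rs": 0.0, "LFE": 0.0}
d2 = {"L": 0.0, "R": 0.5, "C": 0.25, "Ls": 0.0, "Rs": 0.5, "LFE": 0.0}
g_left = {"L": 1.0, "R": 0.5, "C": 0.5, "Ls": 1.0, "Rs": 0.5, "LFE": 0.0}
g_right = {"L": 0.5, "R": 1.0, "C": 0.5, "Ls": 0.5, "Rs": 1.0, "LFE": 0.0}

hl_l, hr_l, hl_r, hr_r = integrate_stereo(d1, d2, g_left, g_right)
```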

Subsequently, processing unit 1040 comprises interpolation unit 1041 and/or smooth unit 1042, and carries out at the interpolation of playing up information and/or level and smooth.Interpolation and/or smoothly can on time domain, frequency domain or QMF territory, carry out.In this manual, be example with the time domain, this is not construed as limiting the present invention.

If the transmitted rendering information has a wide interval in the time domain, interpolation is performed to obtain the rendering information that does not exist between the transmitted rendering information. For example, assuming that rendering information exists in the n-th time slot and the (n+k)-th time slot, linear interpolation can be performed on the non-transmitted time slots by using the generated rendering information (e.g., HL_L, HR_L, HL_R, HR_R).

The rendering information generated by the interpolation is explained for the case where the downmix signal is a mono signal and for the case where it is a stereo signal.

If the downmix signal is a mono signal, the interpolated rendering information can be expressed as Math Figure 26.

Math Figure 26

HM_L(n+j) = HM_L(n) * (1-a) + HM_L(n+k) * a

HM_R(n+j) = HM_R(n) * (1-a) + HM_R(n+k) * a

If the downmix signal is a stereo signal, the interpolated rendering information can be expressed as Math Figure 27.

Math Figure 27

HL_L(n+j) = HL_L(n) * (1-a) + HL_L(n+k) * a

HR_L(n+j) = HR_L(n) * (1-a) + HR_L(n+k) * a

HL_R(n+j) = HL_R(n) * (1-a) + HL_R(n+k) * a

HR_R(n+j) = HR_R(n) * (1-a) + HR_R(n+k) * a

In this case, 0 < j < k, where 'j' and 'k' are integers, and 'a' is a real number satisfying 0 < a < 1 that is expressed as Math Figure 28.

Math Figure 28

a = j/k

Accordingly, the value for a non-transmitted time slot can be obtained, according to Math Figure 27 and Math Figure 28, on the straight line connecting the values in the two transmitted time slots. Details of the interpolation are explained later with reference to Figure 22 and Figure 23.
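The linear interpolation described above can be sketched in a few lines of Python. This is a minimal sketch that assumes the rendering information of one path is a list of coefficients; the function name `interpolate` is illustrative and not taken from the specification.

```python
def interpolate(h_n, h_nk, k):
    """Return rendering information for the slots n+1 .. n+k-1.

    h_n and h_nk are the coefficient lists transmitted at slots n and n+k.
    Each intermediate slot n+j uses the weight a = j / k, with 0 < j < k.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    result = []
    for j in range(1, k):
        a = j / k  # 0 < a < 1
        result.append([x * (1 - a) + y * a for x, y in zip(h_n, h_nk)])
    return result
```

The same function can be applied to each of HM_L and HM_R in the mono case, or to each of HL_L, HR_L, HL_R and HR_R in the stereo case.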

When the filter coefficient values change abruptly between two adjacent time slots in the time domain, the smoothing unit 1042 performs smoothing to prevent the distortion caused by the resulting discontinuity. Smoothing in the time domain can be carried out with the smoothing method described with reference to Figures 12 to 16. The smoothing can be performed together with expansion, and it may differ according to the position at which it is applied. If the downmix signal is a mono signal, the time-domain smoothing can be expressed as Math Figure 29.

Math Figure 29

HM_L(n)′ = HM_L(n) * b + HM_L(n-1)′ * (1-b)

HM_R(n)′ = HM_R(n) * b + HM_R(n-1)′ * (1-b)

That is, the smoothing can be performed as a 1-pole IIR filter: the rendering information HM_L(n-1)′ or HM_R(n-1)′ smoothed in the previous time slot n-1 is multiplied by (1-b), the rendering information HM_L(n) or HM_R(n) generated in the current time slot n is multiplied by b, and the two products are added. Here, 'b' is a constant satisfying 0 < b < 1. The smaller 'b' is, the stronger the smoothing effect; the larger 'b' is, the weaker the smoothing effect. The remaining filters can be treated in the same manner.
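The 1-pole IIR recursion described above can be sketched as follows. The sketch assumes that the rendering information of one path is given as one coefficient list per time slot. The text does not state how the recursion is started, so the initial state (the first slot's own value unless `initial` is given) is an assumption, as is the function name `smooth`.

```python
def smooth(h_slots, b, initial=None):
    """Apply H'(n) = H(n) * b + H'(n-1) * (1 - b) along the slot axis.

    h_slots: list of coefficient lists, one per time slot.
    b: smoothing constant, 0 < b < 1 (smaller b gives stronger smoothing).
    initial: assumed previous smoothed state; defaults to the first slot.
    """
    if not 0 < b < 1:
        raise ValueError("b must satisfy 0 < b < 1")
    if not h_slots:
        return []
    prev = list(h_slots[0]) if initial is None else list(initial)
    out = []
    for h in h_slots:
        prev = [x * b + p * (1 - b) for x, p in zip(h, prev)]
        out.append(prev)
    return out
```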

Interpolation and smoothing can be combined into the single expression shown in Math Figure 30 by applying the time-domain smoothing to the interpolated rendering information.

Math Figure 30

HM_L(n+j)′ = (HM_L(n) * (1-a) + HM_L(n+k) * a) * b + HM_L(n+j-1)′ * (1-b)

HM_R(n+j)′ = (HM_R(n) * (1-a) + HM_R(n+k) * a) * b + HM_R(n+j-1)′ * (1-b)

If interpolation is performed by the interpolating unit 1041 and/or smoothing is performed by the smoothing unit 1042, the resulting rendering information may have an energy value different from that of the prototype rendering information. To prevent this problem, energy normalization can additionally be performed.
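The text does not say how the energy normalization is carried out. One plausible reading, given here only as a hedged sketch, is to scale the processed coefficients so that their energy (sum of squares) matches that of a reference set of coefficients. The function name `normalize_energy`, the choice of reference and the sum-of-squares energy measure are all assumptions.

```python
import math


def normalize_energy(processed, reference):
    """Scale `processed` so that its energy equals that of `reference`.

    Energy is taken here as the sum of squared coefficients (an assumption).
    """
    e_proc = sum(x * x for x in processed)
    e_ref = sum(x * x for x in reference)
    if e_proc == 0.0:
        # Nothing to scale; avoid dividing by zero.
        return list(processed)
    gain = math.sqrt(e_ref / e_proc)
    return [x * gain for x in processed]
```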

Finally, the domain converting unit 1050 converts the rendering information into the domain in which rendering is performed. If the domain in which rendering is performed is identical to the domain of the rendering information, the domain conversion need not be performed. The domain-converted rendering information is then transferred to the rendering unit 900.

Figure 19 is a block diagram of a second method of generating rendering information in the spatial information converting unit according to an embodiment of the present invention.

The second method is similar to the first method in that the spatial information converting unit 1000 includes the source mapping unit 1010, the sub-rendering information generating unit 1020, the integrating unit 1030, the processing unit 1040 and the domain converting unit 1050, and in that the sub-rendering information generating unit 1020 includes at least one sub-rendering information generating unit.

Referring to Figure 19, the second method of generating rendering information differs from the first method in the position of the processing unit 1040. Interpolation and/or smoothing can thus be performed per channel on the sub-rendering information (e.g., FL_L and FL_R in the mono case, or FL_L1, FL_L2, FL_R1 and FL_R2 in the stereo case) generated per channel in the sub-rendering information generating unit 1020.

Subsequently, the integrating unit 1030 integrates the interpolated and/or smoothed sub-rendering information into rendering information.

The generated rendering information is transferred to the rendering unit 900 via the domain converting unit 1050.

Figure 20 is a block diagram of a third method of generating rendering filter information in the spatial information converting unit according to an embodiment of the present invention.

The third method is similar to the first and second methods in that the spatial information converting unit 1000 includes the source mapping unit 1010, the sub-rendering information generating unit 1020, the integrating unit 1030, the processing unit 1040 and the domain converting unit 1050, and in that the sub-rendering information generating unit 1020 includes at least one sub-rendering information generating unit.

Referring to Figure 20, the third method of generating rendering information differs from the first and second methods in that the processing unit 1040 is located next to the source mapping unit 1010. Interpolation and/or smoothing can thus be performed per channel on the source mapping information generated from the spatial information in the source mapping unit 1010.

Subsequently, the sub-rendering information generating unit 1020 generates sub-rendering information by using the interpolated and/or smoothed source mapping information and the filter information.

The sub-rendering information is integrated into rendering information in the integrating unit 1030, and the generated rendering information is transferred to the rendering unit 900 via the domain converting unit 1050.

Figure 21 is a diagram for explaining a method of generating a surround signal in the rendering unit according to an embodiment of the present invention. Figure 21 shows a rendering process performed in the DFT domain. However, the rendering process can also be implemented in other domains in a similar manner. Figure 21 shows the case where the input signal is a mono downmix signal. However, Figure 21 is applicable in a similar manner to other input channels, including a stereo downmix signal.

Referring to Figure 21, the mono downmix signal in the time domain preferably undergoes windowing with an overlap interval OL in the domain converting unit. Figure 21 shows the case where a 50% overlap is used. However, the present invention includes the use of other overlaps.

The window function used for the windowing can be a function that has good frequency selectivity in the DFT domain while connecting seamlessly, without discontinuity, in the time domain. For example, a sine-squared window function can be used.

Subsequently, using the rendering information converted in the domain converting unit, the mono downmix signal of length OL*2 obtained from the windowing is zero-padded with ZL, which is the tap length of the rendering filter [precisely, (tap length) - 1]. Domain conversion into the DFT domain is then performed. Figure 21 shows the block-k downmix signal being converted into the DFT domain.

The domain-converted downmix signal is rendered by a rendering filter that uses the rendering information. The rendering process can be expressed as a product of the downmix signal and the rendering information. The rendered downmix signal undergoes an IDFT (inverse discrete Fourier transform) in the inverse domain converting unit and is then overlapped with the previously processed downmix signal (block k-1 in Figure 21), delayed by the length OL, to generate a surround signal.
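The windowing, zero padding, DFT-domain multiplication, IDFT and overlap steps described above can be sketched for a single rendering path as follows. This is a simplified sketch and not the implementation of the specification: it assumes a real-valued FIR rendering filter `taps`, a 50% overlap, a sine-squared window, and a naive O(N^2) DFT written out with the standard library so that the block stays self-contained. The names `dft` and `render_overlap_add` are illustrative.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT (or inverse DFT); adequate for a short sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    return [v / n for v in out] if inverse else out


def render_overlap_add(signal, taps, ol):
    """Render `signal` with the FIR filter `taps` using windows of length 2*ol."""
    win_len = 2 * ol
    # Sine-squared window: two windows shifted by ol add up to exactly 1.
    window = [math.sin(math.pi * (i + 0.5) / win_len) ** 2 for i in range(win_len)]
    n_fft = win_len + len(taps) - 1  # zero padding of (tap length - 1)
    h_freq = dft(list(taps) + [0.0] * (n_fft - len(taps)))
    # Pad with zeros so that every input sample is covered by two windows.
    padded = [0.0] * ol + list(signal) + [0.0] * (ol + win_len)
    out = [0.0] * (len(padded) + n_fft)
    for start in range(0, len(padded) - win_len + 1, ol):
        block = [padded[start + i] * window[i] for i in range(win_len)]
        spectrum = dft(block + [0.0] * (n_fft - win_len))
        # Rendering in the DFT domain is a bin-by-bin product.
        rendered = dft([s * h for s, h in zip(spectrum, h_freq)], inverse=True)
        # Overlap with the previous block, which started ol samples earlier.
        for i, v in enumerate(rendered):
            out[start + i] += v.real
    return out[ol:ol + len(signal) + len(taps) - 1]
```

Because the zero padding makes each DFT at least as long as the windowed block plus the filter tail, the product in the DFT domain corresponds to a linear rather than a circular convolution, and the overlapped blocks add up to the filtered signal.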

Interpolation can be performed on each block that undergoes this rendering process. The interpolation method is explained as follows.

Figure 22 is a diagram of a first interpolation method according to an embodiment of the present invention. Interpolation according to the present invention can be performed at various positions. For example, interpolation can be performed at various positions in the spatial information converting unit shown in Figures 18 to 20, or it can be performed in the rendering unit. Spatial information, source mapping information, filter information and the like can be used as the values to be interpolated. In this specification, spatial information is used as an example for the description. However, the present invention is not limited to spatial information. The interpolation can be performed before, or together with, expansion to a wider band.

Referring to Figure 22, the spatial information transmitted from the encoding apparatus can be transmitted at a random position rather than at every time slot. One spatial frame can carry a plurality of spatial information sets (e.g., parameter sets n and n+1 in Figure 22). At a low bit rate, one spatial frame can carry a single new spatial information set. Interpolation for the non-transmitted time slots is therefore performed by using the values of the neighboring spatial information sets that have been transmitted. The interval between the windows used for rendering does not always match the time slots. Thus, as shown in Figure 22, the interpolated value at the center of each rendering window (K-1, K, K+1, K+2, etc.) is found and used. Although Figure 22 shows linear interpolation between the time slots that carry a spatial information set, the present invention is not limited to this interpolation method. For example, interpolation may be omitted for time slots that have no spatial information set, and a previous value or a preset value may be used instead.

Figure 23 is a diagram of a second interpolation method according to an embodiment of the present invention.

Referring to Figure 23, the second interpolation method according to an embodiment of the present invention has a structure that combines an interval in which a previous value is used, an interval in which a preset default value is used, and the like. For example, interpolation can be performed within the interval of one spatial frame by using at least one of a method of keeping the previous value, a method of using a preset default value, and a method of linear interpolation. If at least two new spatial information sets exist within one window, distortion may occur. Block switching for preventing this distortion is explained in the following description.
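The three ways of filling time slots that carry no spatial information set (linear interpolation, keeping the previous value, using a preset default value) can be compared with a small Python sketch. It assumes a scalar parameter per time slot and a dictionary that maps the transmitted slot indices to their values; the function name `fill_slots`, the mode names and the handling of slots before the first or after the last transmitted set are assumptions for illustration.

```python
def fill_slots(param_sets, num_slots, mode="linear", default=0.0):
    """Return one value per time slot.

    param_sets: dict {slot_index: value} for the slots that were transmitted.
    mode: "linear" (interpolate), "hold" (keep previous) or "default" (preset).
    """
    if mode not in ("linear", "hold", "default"):
        raise ValueError("unknown mode")
    known = sorted(param_sets)
    values = []
    for slot in range(num_slots):
        before = [s for s in known if s <= slot]
        after = [s for s in known if s >= slot]
        if slot in param_sets:
            values.append(param_sets[slot])
        elif mode == "default" or not before:
            # No earlier set is available, or the preset value was requested.
            values.append(default)
        elif mode == "hold" or not after:
            # Keep the most recent transmitted value.
            values.append(param_sets[before[-1]])
        else:
            # Linear interpolation between the neighboring transmitted sets.
            lo, hi = before[-1], after[0]
            a = (slot - lo) / (hi - lo)
            values.append(param_sets[lo] * (1 - a) + param_sets[hi] * a)
    return values
```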

Figure 24 is a diagram of a block switching method according to an embodiment of the present invention.

Referring to Figure 24(a), since the window length is greater than the time slot length, at least two spatial information sets (e.g., parameter sets n and n+1 in Figure 24) may exist within one window interval. In this case, each of the spatial information sets should be applied to a different time slot. However, if a single value obtained by interpolating the at least two spatial information sets is applied, distortion may occur. That is, distortion may arise from insufficient time resolution relative to the window length.

To solve this problem, a switching method that varies the window size to fit the time slot resolution can be used. For example, as shown in Figure 24(b), the window can be switched to a shorter one for an interval that requires high resolution. In this case, connecting windows are used at the beginning and at the end of the switched windows to prevent seams from appearing in the time domain.

The window length can be decided in the decoding apparatus by using the spatial information instead of being transmitted as separate additional information. For example, the window length can be determined from the interval between the time slots at which the spatial information is updated. That is, if the update interval of the spatial information is narrow, a short window function is used; if the update interval is wide, a long window function is used. Using variable-length windows in rendering in this way has the advantage that no bits are spent on sending window length information separately. Two window lengths are shown in Figure 24(b). However, windows of various lengths can be used according to the transmission frequency and relations of the spatial information. The decided window length information is applicable to the various steps of generating a surround signal, which are explained in the following description.
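The decision rule described above (a short window for a narrow update interval, a long window for a wide one) can be sketched as follows. The threshold `min_long_interval`, the two window lengths and the function name `decide_window_lengths` are hypothetical; the text does not give concrete values or a concrete rule.

```python
def decide_window_lengths(update_slots, long_len, short_len, min_long_interval):
    """Pick a window length for each interval between spatial-information updates.

    update_slots: increasing time slot indices at which spatial information
    is updated. An interval shorter than min_long_interval (in time slots)
    gets the short window; otherwise the long window is used.
    """
    lengths = []
    for prev, cur in zip(update_slots, update_slots[1:]):
        interval = cur - prev
        lengths.append(short_len if interval < min_long_interval else long_len)
    return lengths
```

Because the rule depends only on the spatial information that the decoder already receives, no window length needs to be signaled in this sketch.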

Figure 25 is a block diagram of the positions to which the window length decided by a window length deciding unit is applied, according to an embodiment of the present invention.

Referring to Figure 25, a window length deciding unit 1400 can decide the window length by using spatial information. Information on the decided window length is applicable to the source mapping unit 1010, the integrating unit 1030, the processing unit 1040, the domain converting units 1050 and 1100, and the inverse domain converting unit 1300. Figure 25 shows the case where a stereo downmix signal is used. However, the present invention is not limited to the stereo downmix signal. As mentioned in the foregoing description, even if the window length is shortened, the zero-padding length decided according to the filter tap number is not adjustable. A solution to this problem is explained in the following description.

Figure 26 is a diagram of filters having various lengths used in processing an audio signal according to an embodiment of the present invention. As mentioned in the foregoing description, if the zero-padding length decided according to the filter tap number is not adjusted, an overlap amounting to the corresponding length effectively occurs in the time domain, which results in insufficient time resolution. A solution to this problem is to shorten the zero-padding length by restricting the length of the filter taps. The zero-padding length can be shortened by truncating the tail of the response (e.g., the diffuse interval corresponding to reverberation). In this case, the rendering process may be less accurate than when the tail of the filter response is not truncated. However, the filter coefficient values in this part of the time domain are very small and mainly affect reverberation. The sound quality is therefore not significantly affected by the truncation.

Referring to Figure 26, four kinds of filters can be used. These four kinds of filters can be used in the DFT domain, which does not limit the present invention.

Filter-N1 indicates a filter that has a long filter length FL and a long zero-padding length 2*OL because its filter tap number is not restricted. Filter-N2 indicates a filter that has the same filter length FL but, because its tap number is restricted, a zero-padding length 2*OL shorter than that of Filter-N1. Filter-N3 indicates a filter that has a long zero-padding length 2*OL because its tap number is not restricted, with a filter length FL shorter than that of Filter-N1. And Filter-N4 indicates a filter that has a window length FL shorter than that of Filter-N1 and, because its tap number is restricted, a short zero-padding length 2*OL.

As mentioned in the foregoing description, the time resolution problem can be solved by using the above four exemplary filters. For the tail of the filter response, a different filter coefficient can be used for each domain.

Figure 27 is a diagram of a method of processing an audio signal dividedly by using a plurality of subfilters according to an embodiment of the present invention. One filter can be divided into subfilters having filter coefficients different from each other. After the audio signal has been processed with the subfilters, the processing results can be added together. When spatial information is applied to the tail of a filter response that has little energy, i.e., when rendering is performed with a filter having many taps, this method provides a function of processing the audio signal dividedly in units of a predetermined length. For example, since the tail of the filter does not vary significantly between the HRTFs corresponding to the respective channels, rendering can be performed by extracting coefficients common to a plurality of windows. In this specification, the case of processing in the DFT domain is described. However, the present invention is not limited to the DFT domain.

Referring to Figure 27, after one filter FL is divided into a plurality of sub-areas, the sub-areas can be processed by a plurality of subfilters (Filter-A and Filter-B) having filter coefficients different from each other.

Subsequently, the output processed by Filter-A and the output processed by Filter-B are combined. For example, an IDFT (inverse discrete Fourier transform) is performed on each of the output processed by Filter-A and the output processed by Filter-B to generate time-domain signals, and the generated signals are added together. In this case, the position at which the output processed by Filter-B is added is delayed by FL relative to the position of the output processed by Filter-A. In this way, the signal processed by the plurality of subfilters gives the same effect as the signal processed by a single filter.
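The equivalence stated above, namely that the delayed sum of the subfilter outputs equals the output of the undivided filter, can be checked with a short Python sketch. For brevity the sketch uses direct time-domain convolution instead of the DFT-domain processing described in the text, and it assumes that the delay applied to the Filter-B output equals the number of taps assigned to Filter-A. The names `convolve` and `render_split` are illustrative.

```python
def convolve(x, h):
    """Full linear convolution of two non-empty sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def render_split(x, h, split):
    """Render x with h divided into Filter-A = h[:split] and Filter-B = h[split:]."""
    if not 0 < split < len(h):
        raise ValueError("split must leave both subfilters non-empty")
    out_a = convolve(x, h[:split])
    out_b = convolve(x, h[split:])
    y = [0.0] * (len(x) + len(h) - 1)
    for i, v in enumerate(out_a):
        y[i] += v
    for i, v in enumerate(out_b):
        # The Filter-B output is added with a delay equal to the length of Filter-A.
        y[i + split] += v
    return y
```

With any valid split point, `render_split(x, h, split)` matches `convolve(x, h)` up to rounding, which is the single-filter result.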

The present invention also includes a method of rendering the output processed by Filter-B directly to the downmix signal. In this case, the output can be rendered to the downmix signal by using coefficients extracted from the spatial information, by using the spatial information in part, or without using the spatial information.

The method is characterized in that a filter having a large number of taps can be applied dividedly, and in that the tail of the filter, which has little energy, can be applied without conversion using the spatial information. In this case, if conversion using the spatial information is not applied, a different filter is not applied to each processed window. It is therefore unnecessary to apply the same scheme as block switching. Figure 27 shows a filter divided into two areas. However, the present invention can divide a filter into a plurality of areas.

Figure 28 is a block diagram of a method of rendering partitioned rendering information, generated by a plurality of subfilters, to a mono downmix signal according to an embodiment of the present invention. Figure 28 relates to one rendering coefficient. The method can be performed for each rendering coefficient.

Referring to Figure 28, the Filter-A information of Figure 27 corresponds to first partitioned rendering information HM_L_A, and the Filter-B information of Figure 27 corresponds to second partitioned rendering information HM_L_B. Figure 28 shows an embodiment with a division into two subfilters. However, the present invention is not limited to two subfilters. The two subfilters can be obtained via a splitter 1500 from the rendering information HM_L generated in the spatial information converting unit 1000. Alternatively, the two subfilters can be obtained by using prototype HRTF information or information decided according to a user's selection. The information decided according to the user's selection can include, for example, spatial information selected according to the user's taste. In this case, HM_L_A is rendering information based on the received spatial information, and HM_L_B can be rendering information for providing a 3-dimensional effect commonly applied to signals.

As mentioned in the foregoing description, processing with a plurality of subfilters is applicable not only to the DFT domain but also to the time domain and the QMF domain. Specifically, the coefficient values split into Filter-A and Filter-B are applied to the downmix signal by time-domain or QMF-domain rendering and are then added together to generate the final signal.

The rendering unit 900 includes a first partitioned rendering unit 950 and a second partitioned rendering unit 960. The first partitioned rendering unit 950 performs a rendering process by using HM_L_A, and the second partitioned rendering unit 960 performs a rendering process by using HM_L_B.

If Filter-A and Filter-B, as shown in Figure 27, are splits of the same filter according to time, an appropriate delay corresponding to the time interval can be taken into account. Figure 28 shows an example of a mono downmix signal. When a mono downmix signal and a decorrelator are used, the part corresponding to Filter-B is not applied to the decorrelator but is applied directly to the mono downmix signal.

Figure 29 is a block diagram of a method of rendering partitioned rendering information, generated by using a plurality of subfilters, to a stereo downmix signal according to an embodiment of the present invention.

The partitioned rendering process shown in Figure 29 is similar to that of Figure 28 in that two subfilters are obtained in the splitter 1500 by using the rendering information generated by the spatial information converting unit 1000, prototype HRTF filter information, or user-decided information. The difference from Figure 28 is that the partitioned rendering process corresponding to Filter-B is applied commonly to the L/R signals.

Specifically, the splitter 1500 generates first partitioned rendering information and second partitioned rendering information, which correspond to the Filter-A information, and third partitioned rendering information, which corresponds to the Filter-B information. In this case, the third partitioned rendering information can be generated by using filter information or spatial information that is commonly applicable to the L/R signals.

Referring to Figure 29, the rendering unit 900 includes a first partitioned rendering unit 970, a second partitioned rendering unit 980 and a third partitioned rendering unit 990.

The generated third partitioned rendering information is applied to the sum signal of the L/R signals in the third partitioned rendering unit 990 to generate one output signal. This output signal is added to the L/R output signals, which are rendered independently by Filter-A1 and Filter-A2 in the first and second partitioned rendering units 970 and 980, respectively, to generate surround signals. In this case, the output signal of the third partitioned rendering unit 990 can be added after an appropriate delay. In Figure 29, the cross rendering information applied from the L/R inputs to the other channel is omitted for convenience of explanation.

Figure 30 is a block diagram of a first domain converting method for a downmix signal according to an embodiment of the present invention. The rendering process performed in the DFT domain has been described so far. As mentioned in the foregoing description, the rendering process can also be performed in domains other than the DFT domain. Figure 30, however, shows the rendering process performed in the DFT domain. The domain converting unit 1100 includes a QMF filter and a DFT filter. The inverse domain converting unit 1300 includes an IDFT filter and an IQMF filter. Figure 30 relates to a mono downmix signal, which does not limit the present invention.

Referring to Figure 30, a time-domain downmix signal of p samples passes through the QMF filter to generate P sub-band samples. W samples are collected again per band. Windowing is performed on the collected samples, followed by zero padding. An M-point DFT (FFT) is then performed. In this case, the DFT enables processing with the type of windowing described above. The value obtained by connecting the M/2 frequency-domain values per band, produced by the M-point DFT, across the P bands can be regarded as an approximation of the frequency spectrum obtained by an M/2*P-point DFT. Thus, multiplying this frequency spectrum by a filter coefficient represented in the M/2*P-point DFT domain gives the same effect as the rendering process in the DFT domain.

In this case, the signal that has passed through the QMF filter has leakage, e.g., aliasing between neighboring bands. Specifically, values corresponding to a neighboring band leak into the current band, and values in the current band shift into the neighboring band. If QMF synthesis is performed, the original signal can be recovered owing to the QMF characteristics. However, if filtering is performed on the signal of each band, as in the present invention, the signal is distorted by the leakage. To minimize this problem, a process for recovering the original signal can be added as follows: the signal is passed through a leakage-minimizing butterfly filter after the QMF in the domain converting unit 1100 and before the DFT is performed per band, and the reverse process is performed after the IDFT in the inverse domain converting unit 1300.

Meanwhile, to match the process of generating the rendering information in the spatial information converting unit 1000 with the process of generating the downmix signal, the DFT can be performed on a signal that has passed through the QMF in order to obtain the prototype filter information, instead of performing the M/2*P-point DFT at the beginning. In this case, delay and data spreading caused by the QMF filter may exist.

Figure 31 is a block diagram of a second domain converting method for a downmix signal according to an embodiment of the present invention. Figure 31 shows a rendering process performed in the QMF domain.

Referring to Figure 31, the domain converting unit 1100 includes a QMF domain converting unit, and the inverse domain converting unit 1300 includes an IQMF domain converting unit. The configuration shown in Figure 31 is the same as in the case of using the DFT only, except that the domain converting unit is a QMF filter. In the following description, QMF refers to both a QMF and a hybrid QMF having the same bandwidth. The difference from the case of using the DFT only is that the rendering information is generated in the QMF domain and that the rendering process is expressed as a convolution rather than as a product in the DFT domain, because the rendering process performed by the renderer-M 3012 is carried out in the QMF domain.

Assuming that the QMF filter has B bands, the filter coefficients can be expressed as a set of filter coefficients having different characteristics (coefficients) for the B bands. In some cases, if the filter tap number becomes first order (i.e., multiplication by a constant), the rendering process matches the operation in a DFT domain having B frequency spectra. Math Figure 31 represents the rendering process performed in one QMF band (b) for one path that performs rendering with the rendering information HM_L.

Math Figure 31

Lo_m_{b}(k) = HM_L_{b} * m_{b} = Σ_{i = 0}^{filter_order - 1} hm_l_{b}(i) m_{b}(k - i)

Here, k indicates the time order within the QMF band, i.e., the time slot unit. Rendering in the QMF domain has the advantage that, if the transmitted spatial information is a value applicable in the QMF domain, the corresponding data can be applied most conveniently and distortion during application is minimized. However, when prototype filter information (e.g., prototype filter coefficients) is converted into the QMF domain, applying the converted values requires a considerable amount of computation. In this case, the amount of computation can be minimized by parameterizing the HRTF coefficients during the filter information conversion.
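The per-band convolution above can be sketched in Python for one QMF band and one rendering path. The sketch assumes that the sub-band samples are complex values indexed by time slot and that samples before the first time slot are zero; the function name `render_qmf_band` is illustrative and not taken from the specification.

```python
def render_qmf_band(hm_l_b, m_b):
    """Compute Lo_m_b(k) = sum over i of hm_l_b(i) * m_b(k - i) for one band b.

    hm_l_b: filter coefficients of the rendering information in band b.
    m_b: sub-band samples of the mono downmix signal in band b, one per time slot.
    Samples with a negative time index are treated as zero (an assumption).
    """
    out = []
    for k in range(len(m_b)):
        acc = 0j
        for i, coeff in enumerate(hm_l_b):
            if k - i >= 0:
                acc += coeff * m_b[k - i]
        out.append(acc)
    return out
```

With a single coefficient the loop reduces to multiplying every sub-band sample by a constant, which is the first-order case mentioned in the text.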

Industrial applicability

Accordingly, the signal processing method and apparatus of the present invention use spatial information provided by an encoder to generate a surround signal by using HRTF filter information or user-specific filter information in a decoding apparatus capable of generating multiple channels. The present invention is also well suited to various decoders that can reproduce only stereo signals.

While the present invention has been described and illustrated with reference to its preferred embodiments, it will be apparent to those skilled in the art that various modifications and variations can be made without departing from the spirit or scope of the invention. Thus, the present invention is intended to cover the modifications and variations of this invention that come within the scope of the appended claims and their equivalents.

Claims

1. the method for a processing signals, described method comprises:

From bit stream, extract down-mix audio signal;

Generate down-mix audio signal by decorrelator being applied to described down-mix audio signal through decorrelation; And

By will play up information be applied to described down-mix audio signal and described down-mix audio signal through decorrelation generate have surrounding effect around signal.

2. the method for claim 1 is characterized in that, described application of playing up information is carried out on one of time domain, frequency domain, QMF territory and hybrid domain.

3. the method for claim 1 is characterized in that, the described information of playing up is to be used for the filter information of described surrounding effect and the spatial information that extracts from described bit stream generates by use.

4. The method of claim 1, further comprising converting the domain of the downmix signal to the domain in which the surround signal is generated.

5. The method of claim 4, wherein the domain of the rendering information is the same as the domain in which the surround signal is generated.

6. The method of claim 1, wherein the decorrelator has an all-pass characteristic.

7. The method of claim 1, wherein the downmix signal is a mono signal.

8. An apparatus for processing a signal, the apparatus comprising:

a decorrelating unit that generates a decorrelated downmix signal by applying a decorrelator to a downmix signal extracted from a bit stream; and

a rendering unit that generates a surround signal having a surround effect by applying rendering information to the downmix signal and the decorrelated downmix signal.

9. The apparatus of claim 8, wherein the rendering unit generates the surround signal in one of a time domain, a frequency domain, a QMF domain, and a hybrid domain.

10. The apparatus of claim 8, wherein the rendering information is generated by using filter information for the surround effect and spatial information extracted from the bit stream.

11. The apparatus of claim 8, wherein the rendering information comprises first rendering information, which is applied to one of a downmix signal channel and a decorrelated downmix signal channel and then delivered on the same channel, and second rendering information, which is applied to one of the downmix signal channel and the decorrelated downmix signal channel and then delivered on another channel.

12. The apparatus of claim 8, further comprising a domain converting unit that converts the domain of the downmix signal to the domain in which the surround signal is generated.

13. The apparatus of claim 12, wherein the domain of the rendering information is the same as the domain in which the surround signal is generated.

14. The apparatus of claim 8, wherein the decorrelator has an all-pass characteristic.

15. The apparatus of claim 8, wherein the downmix signal is a mono signal.