This application is a divisional application of application No. 02130245.6, filed on August 21, 2002.
Embodiment
The encoding and decoding methods of the present invention process every channel identically and almost completely independently, so the following description refers to a single channel unless otherwise stated.
Coding
[Embodiment 1]
As shown in Figure 2, the encoder of the present invention consists of a frame length selector 1, a subband analysis filter bank 2, a transient detector 3, a bit allocator 4, a subband quantizer 5, and a multiplexer 6, and is used to compress and encode the input audio signal. This audio signal may be the audio signal of digital television. The operating principle of the encoder is explained below through a detailed description of each component.
Frame length selector 1:
The role of frame length selector 1 is to divide the input audio signal into frames of a chosen length. As for the choice of frame length, it is generally accepted that the longer the frame, the higher the compression efficiency, but also the greater the coding delay and the greater the encoder's memory requirement. A shorter frame length is therefore generally chosen at high bit rates and a somewhat longer one at low bit rates. Because the coding delay is proportional to the frame length divided by the sample rate (that is, to the product of the frame length in samples and the sampling period), the sample rate must be taken into account when selecting the frame length.
When the audio signal accompanies a video signal, the choice of frame length must also take into account some special requirements of video soundtracks. The most prominent of these, and one of the problems the invention sets out to solve, is that the compressed audio bitstream can be spliced synchronously with the video signal. In the present invention, "synchronous splicing" means that the compressed audio bitstream is fully synchronized in time with the valid splice points of the video bitstream, so that when the audio and video bitstreams are spliced simultaneously at any of these points, neither bitstream is corrupted and the decoded audio and video signals both transition smoothly across the splice point.
The basic requirement for synchronous splicing is that the duration of one audio frame and the duration of one video frame be related by a simple integer multiple. Otherwise, splicing the video bitstream at a video frame boundary would cut into the interior of an audio frame, destroy its structure, and cause decoder errors. We consider two cases. In the first case, the video frame length is equal to or an integer multiple of the audio frame length. Splicing can then be done at the boundary of any video frame, and the splice point always falls on an audio frame boundary.
For a video signal at 25 frames/second (PAL), each frame lasts 0.04 second. For an audio signal sampled at 48 kHz, this corresponds to 1920 samples. Because $1920 = 3 \times 5 \times 2^{7}$, any audio frame length formed as a product of these factors can be spliced synchronously with a 25 frames/second video signal.
For a video signal at 24 frames/second (film), the corresponding number of audio samples per frame (48 kHz sample rate) is $2000 = 2^{4} \times 5^{3}$. Likewise, any audio frame length formed as a product of these factors can be spliced synchronously with the video signal.
A further constraint on the audio frame length is that it must be an integer multiple of the number of subbands of the subband analysis filter bank that follows. If the number of subbands is chosen as a product of factors of $2^{4} \times 5 = 80$, the greatest common factor of 1920 and 2000, then synchronous splicing with both 24 frames/second and 25 frames/second video can be achieved without changing the number of subbands. This is discussed further in the description of the subband analysis filter bank.
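The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are hypothetical and not part of the invention.

```python
from math import gcd

def samples_per_video_frame(sample_rate_hz: int, video_fps: int) -> int:
    """Number of audio samples spanned by one video frame."""
    if sample_rate_hz % video_fps != 0:
        raise ValueError("sample rate is not an integer multiple of the frame rate")
    return sample_rate_hz // video_fps

def divisors(n: int) -> list:
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

pal = samples_per_video_frame(48_000, 25)    # 25 frames/second video
film = samples_per_video_frame(48_000, 24)   # 24 frames/second video
common = gcd(pal, film)                      # greatest common factor of the two
candidate_subband_counts = divisors(common)  # subband counts usable for both rates
```

Any entry of `candidate_subband_counts` divides both 1920 and 2000, so it satisfies the subband-count constraint for both video frame rates.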
Now consider the second case, in which the audio frame length is an integer multiple (say V times) of the video frame length. To preserve the integrity of the audio frames, an audio splice point can occur only once every V video frames. A compressed video bitstream often has a splice point only once every several frames, and this interval may itself vary dynamically. If the maximum of this dynamic interval is W, the largest splice interval that satisfies both the video and the audio constraints is WV video frames. This clearly requires a large amount of memory, but the method described above still applies.
Subband analysis filter bank 2
Subband analysis filter bank 2 decomposes the audio signal passed on by frame length selector 1 into M subband signals. Various filter banks [reference 8] can be used here, including but not limited to perfect-reconstruction and non-perfect-reconstruction filter banks, orthogonal and non-orthogonal filter banks, tree-structured filter banks, cosine-modulated filter banks, and wavelet packets. To keep the audio signal smooth after a bitstream encoded by the invention has been spliced, filter banks with long overlap are particularly recommended.
Because of its low computational cost and simple design, the cosine-modulated filter bank is the preferred filter bank of the invention. Its k-th subband filter is [reference 8]:
where n is the sample index, k is the subband index, M is the number of subbands, $p_0(\cdot)$ is the prototype filter, and N is the length of the prototype filter.
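As an illustration of the symbols n, k, M, N and the prototype filter, the sketch below builds a cosine-modulated analysis bank in one widely used textbook form, h_k(n) = 2·p0(n)·cos[(π/M)(k + 0.5)(n − (N − 1)/2) + θ_k] with θ_k = (−1)^k·π/4. The factor 2, the phase term θ_k and the sine-window prototype are assumptions made for this example and are not necessarily those of formula (1).

```python
import math

def sine_prototype(length):
    """Hypothetical low-pass prototype p0(n): a sine window used only as a placeholder."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def cosine_modulated_bank(prototype, num_subbands):
    """Build M analysis filters by cosine modulation of one prototype (assumed form)."""
    N = len(prototype)
    M = num_subbands
    bank = []
    for k in range(M):
        theta = (-1) ** k * math.pi / 4  # assumed phase term
        bank.append([
            2.0 * prototype[n]
            * math.cos(math.pi / M * (k + 0.5) * (n - (N - 1) / 2) + theta)
            for n in range(N)
        ])
    return bank

def analyze(signal, bank):
    """Filter the signal with each subband filter and keep every M-th output sample."""
    M = len(bank)
    subbands = []
    for h in bank:
        filtered = [
            sum(h[i] * signal[n - i] for i in range(len(h)) if 0 <= n - i < len(signal))
            for n in range(len(signal))
        ]
        subbands.append(filtered[::M])
    return subbands

bank = cosine_modulated_bank(sine_prototype(32), num_subbands=8)
subbands = analyze([1.0] * 80, bank)  # one 80-sample frame -> 8 subbands of 10 samples
```

A frame of Fsize samples yields Fsize/M samples per subband, which is why the frame length must be a multiple of M.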
The invention places a restriction on the number of subbands M of subband analysis filter bank 2: M must be a common factor of the audio frame lengths that frame length selector 1 selects for the different video frame rates. That is, every available audio frame length can be expressed as:
Fsize=k*M (2)
where Fsize is the audio frame length and k is a positive integer. When the video frame rate changes, frame length selector 1 then only needs to choose another k to select an audio frame length matching the new video frame rate while keeping the number of subbands M unchanged. Keeping the number of subbands unchanged means that the encoder's delay lines remain unchanged. When the frame rate of the video signal changes, the encoder does not need to be reset; it only needs to adjust k in the formula above to adapt smoothly to the change.
For example, as noted in the description of frame length selector 1, for an audio signal sampled at 48 kHz, 2000 can serve as the audio frame length for a 24 frames/second video signal and 1920 for a 25 frames/second video signal. If the number of subbands M is chosen as 80, the greatest common factor of 2000 and 1920, then choosing k = 25 and k = 24 yields the audio frame lengths 2000 and 1920 respectively without changing the number of subbands. Other common factors, such as 40 and 20, achieve the same effect. Table 1 lists the preferred audio frame lengths and filter bank subband counts of the invention:
Table 1
The short frame lengths in the table above, such as 200, 240, 400 and 480, clearly provide a low-delay mode for this audio compression method. The restrictions that the invention places on the audio frame length and the number of subbands therefore do not prevent it from offering a low-delay mode.
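The relation Fsize = k*M and the delay of a frame can be sketched as follows. The helper names are hypothetical, and the duration computed here is only the time spanned by one frame, not the total codec delay.

```python
def multiplier_for_video(sample_rate_hz, video_fps, num_subbands):
    """Return k such that k * M equals the audio samples in one video frame."""
    samples = sample_rate_hz // video_fps
    if sample_rate_hz % video_fps or samples % num_subbands:
        raise ValueError("frame length is not an integer multiple of the subband count")
    return samples // num_subbands

def frame_duration_ms(frame_length, sample_rate_hz):
    """Time spanned by one audio frame, in milliseconds."""
    return 1000.0 * frame_length / sample_rate_hz

k_film = multiplier_for_video(48_000, 24, num_subbands=80)  # Fsize = 2000
k_pal = multiplier_for_video(48_000, 25, num_subbands=80)   # Fsize = 1920
short_frame_ms = frame_duration_ms(240, 48_000)             # a short frame length
full_frame_ms = frame_duration_ms(1920, 48_000)             # one PAL video frame
```

Switching between the two video frame rates changes only k, while M, and with it the filter bank, stays fixed.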
Transient detector 3
The subband signals output by subband analysis filter bank 2 go directly to transient detector 3. Using a chosen detection measure, transient detector 3 analyzes the transient behavior of each frame of subband signal, divides the frame into transient segments and steady-state segments, and outputs the position of each segment. It should be emphasized that these segment positions adapt to the transient behavior of the signal. All subsequent processing of the subband signals takes the segment (hereinafter called a subband segment) as its basic unit.
Taking the subband signal shown in Figure 3 as an example, detector 3 may determine that A is a steady-state segment, B is a transient segment, and C is a steady-state segment, and output the position information of each segment, such as the segment lengths 30, 20 and 50.
Measures that can be used for transient detection include, but are not limited to, the energy of the subband signal, the logarithm of the energy, and the energy entropy. The detection technique may be simple thresholding or something far more elaborate, including but not limited to the well-known K-means algorithm [reference 8].
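A minimal sketch of the simplest option named above, energy thresholding, is given below. The block size, the threshold ratio and the use of the median block energy as a reference are assumptions made for this example; the synthetic signal is built to mimic the 30/20/50 segmentation of the Figure 3 example.

```python
from itertools import groupby
from statistics import median

def segment_subband(samples, block=10, ratio=4.0):
    """Label fixed-size blocks by energy and merge equal neighbours into segments."""
    energies = [
        sum(x * x for x in samples[i:i + block])
        for i in range(0, len(samples), block)
    ]
    threshold = ratio * median(energies)  # assumed reference level
    labels = []
    for i, energy in enumerate(energies):
        size = min(block, len(samples) - i * block)
        labels.extend(["transient" if energy > threshold else "steady"] * size)
    # Each run of identical labels becomes one segment: (type, length in samples).
    return [(label, len(list(run))) for label, run in groupby(labels)]

signal = [0.1] * 30 + [1.0] * 20 + [0.1] * 50  # quiet, loud burst, quiet
segments = segment_subband(signal)
```

The segment lengths returned here play the role of the position information that the detector passes to the later stages.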
Bit allocator 4:
Using methods common in the art, bit allocator 4 may allocate bits to each subband segment of each channel according to the signal-to-noise ratio (SNR) or the signal-to-mask ratio (SMR) [references 1-9]. The result of the bit allocation is the number of bits used to quantize each subband sample of each subband segment of each channel. This result is passed to subband quantizer 5 and multiplexer 6.
Subband quantizer 5
Subband quantizer 5 comprises a set of quantizers [reference 9] with different quantization schemes and different numbers of quantization levels. The selection and configuration of this set has a large effect on compression efficiency. Because the probability distribution of the quantization indices is non-uniform (especially when there are few quantization levels), compression techniques generally combine scalar quantization with entropy coding (such as Huffman coding). Entropy decoding, however, is complex, and its computational load is high and uneven, which makes commercial decoder implementations considerably harder.
For this reason, the invention preferably uses vector quantization when there are few quantization levels and scalar quantization when there are many.
Based on the bit allocation result, subband quantizer 5 selects a specific quantizer from this set for each subband segment and uses it to quantize every subband sample in that segment.
The quantization of the subband samples in each subband segment proceeds in the following four steps (Figure 2):
1) Estimate the scale factor: scale factor estimator 51 may use the maximum absolute value of all samples in the subband segment, the variance of all samples in the segment, or another quantity as the scale factor.
2) Quantize the scale factor: the scale factor itself must also be quantized so that it can be sent to the decoder. Because the ear's sensitivity to loudness decreases as loudness increases, the scale factor should be quantized nonlinearly, for example logarithmically. Taking the variance as an example, let the variance of the d-th segment of the k-th subband of channel c be σ(c, k, d); scale factor quantizer 52 then selects the following quantization index for this scale factor:
where α is the quantization step size.
3) Normalize the subband samples: subband sample normalizer 53 normalizes all samples in the subband segment by the scale factor reconstructed after quantization.
4) Quantize the subband samples: quantizer selector 54 chooses a specific quantizer according to the number of bits supplied by bit allocator 4, and subband sample quantizer 55 then uses it to quantize the normalized subband samples, yielding the quantization index of each sample.
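The four steps can be sketched as below. The sketch assumes a peak-value scale factor, a base-2 logarithmic scale factor index with step α, and a uniform mid-tread scalar quantizer; these are illustrative choices, not the specific quantizers or the index formula of the invention.

```python
import math

def quantize_segment(samples, bits, alpha=0.5):
    """Quantize one subband segment; returns (scale factor index, sample indices)."""
    # Step 1: estimate the scale factor (peak value; floor avoids log of zero).
    scale = max(max(abs(x) for x in samples), 1e-9)
    # Step 2: quantize the scale factor logarithmically (assumed base-2 form).
    scale_index = round(math.log2(scale) / alpha)
    rebuilt_scale = 2.0 ** (scale_index * alpha)
    # Step 3: normalize by the rebuilt scale factor, as the decoder will see it.
    normalized = [x / rebuilt_scale for x in samples]
    # Step 4: uniform mid-tread quantizer; clip because rounding the scale
    # factor down can push normalized samples slightly above 1.
    half = (2 ** bits - 1) // 2
    indices = [max(-half, min(half, round(x * half))) for x in normalized]
    return scale_index, indices

def dequantize_segment(scale_index, indices, bits, alpha=0.5):
    """Inverse of quantize_segment, as a decoder would apply it."""
    half = (2 ** bits - 1) // 2
    rebuilt_scale = 2.0 ** (scale_index * alpha)
    return [rebuilt_scale * i / half if half else 0.0 for i in indices]

scale_index, indices = quantize_segment([0.5, -0.3, 0.125, 0.0], bits=4)
rebuilt = dequantize_segment(scale_index, indices, bits=4)
```

Normalizing with the rebuilt scale factor rather than the original one keeps the encoder and decoder consistent, since the decoder only ever sees the quantized index.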
Multiplexer (MUX) 6
Multiplexer 6 packs the following information, produced by the encoder components described above, into a complete bitstream:
The audio frame length selected by frame length selector 1.
The segment position information output by transient detector 3.
The number of bits allocated to each subband segment by bit allocator 4.
The scale factor quantization indices and the subband sample quantization indices produced by subband quantizer 5.
Multiplexer 6 may also pack other side information, including but not limited to the sample frequency of the input audio signal, the speaker configuration, error-correcting codes, and time codes.
[Embodiment 2]
The encoder of the second embodiment is shown in Figure 4. Most of its components are the same as in Embodiment 1; the difference lies in its distinctive bit allocation method. Specifically, instead of allocating bits to each subband segment according to the signal-to-noise ratio (SNR) or the signal-to-mask ratio (SMR) as in Embodiment 1, this embodiment allocates bits to each subband segment from the scale factor quantization index output by scale factor quantizer 52, using the following formula:
b(c,k,d)=f(α·s(c,k,d))-θ(k)-β (3)
where:
1) b(c, k, d) is the number of bits allocated to each sample of the current subband segment.
2) f(α·s(c, k, d)) is a strictly monotonically increasing function of the scale factor quantization index s(c, k, d). It may preferably be set to $f(\alpha \cdot s(c,k,d)) = [\alpha \cdot s(c,k,d)]^{q}$, where $0 < q \le 2$.
It may more preferably be set to f(α·s(c, k, d)) = α·s(c, k, d).
3) θ(k) may preferably be set to an approximation of the threshold of hearing of the human ear in a quiet environment (threshold in quiet), as shown in Figure 5 [references 1, 2 and 10], or more preferably to zero to simplify computation.
4) β is the bit adjustment factor.
Formula (3) shows that, once the bit adjustment factor β is fixed, the number of bits assigned to each sample of a subband segment depends entirely on the quantization index of its scale factor.
Clearly, the smaller β is, the more bits formula (3) allocates to each subband segment; the larger β is, the fewer. The task of bit allocation is to find the minimum β such that the total number of bits used by all subband segments does not exceed the total number of bits that the given target bit rate allows for each frame of audio.
Bit allocation can be globally optimal, that is, all channels share one bit allocation adjustment factor. Suppose that, at a given target bit rate, the total number of bits available for each audio frame is Total_Bits (this is the total after subtracting the bits needed to transmit the various side information; this is assumed throughout and not restated). Bit adjustment factor searcher 41 must search over β values to find the minimum β that satisfies the following condition:
where n(c, k, d) is the number of samples in the subband segment.
The β obtained in this way can then be used with formula (3) to allocate bits to every subband segment of every channel. The encoder only needs to send this bit allocation adjustment factor to the decoder; from it and the scale factor quantization indices, the decoder can use formula (3) to reconstruct the bit allocation that the encoder applied to every subband segment of every channel.
Bit allocation can also be locally optimal, for example with a separate bit allocation adjustment factor for each channel. Suppose that, at a given target bit rate, the total number of bits assigned by some predetermined scheme to each audio frame of channel c is Total_Bits(c). Bit adjustment factor searcher 41 must search over β values to find the minimum β that satisfies the following condition:
In this case the encoder must transmit one bit allocation adjustment factor per channel to the decoder. This approach extends directly to other ways of sharing bit allocation adjustment factors.
In summary, the bit allocation procedure of this embodiment is as follows:
1) Scale factor quantizer 52 of each channel passes the scale factor quantization index of every subband segment to bit adjustment factor searcher 41.
2) Using formulas (3) and (4), bit adjustment factor searcher 41 finds the globally optimal bit allocation adjustment factor for the given bit rate and passes it to multiplexer 6 and bit allocator 42. Alternatively, using formulas (3) and (5), bit adjustment factor searcher 41 finds for each channel a locally optimal bit allocation adjustment factor for the given bit rate and passes each one to multiplexer 6 and the corresponding bit allocator 42.
3) Bit allocator 42 allocates bits to each subband segment by formula (3) and passes the result to the corresponding quantizer selector 54.
The bit allocation of the invention thus makes only very limited use of a model of the human auditory system in order to simplify bit allocation. This greatly reduces the computational complexity of bit allocation and also allows the allocation result to be expressed by a single bit allocation adjustment factor. Only this factor needs to be included in the encoded bitstream. Having received it, the decoder can reconstruct exactly the bit allocation used by the encoder simply by applying formula (3). This saves the bits that other techniques spend on transmitting the number of bits allocated to each subband segment. The bits saved can be used to transmit quantization indices of subband samples, further improving sound quality.
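The search for β can be sketched as follows. The sketch assumes f(x) = x and θ(k) = 0 by default (the simplest preferred choices named in the text), integer candidate values of β, rounding and clamping of the result of formula (3) to a whole number of bits, and a budget test of the form "sum of n·b over all segments ≤ Total_Bits". The clamping, the candidate range and the function names are assumptions for illustration.

```python
def bits_per_sample(scale_index, subband, beta, alpha=1.0, theta=None, max_bits=16):
    """Formula (3) with f(x) = x, clamped to a whole number of bits (assumption)."""
    threshold = theta(subband) if theta else 0.0
    raw = alpha * scale_index - threshold - beta
    return max(0, min(max_bits, round(raw)))

def frame_bits(segments, beta):
    """Total bits used by all segments; each segment is (subband, scale index, samples)."""
    return sum(n * bits_per_sample(s, k, beta) for k, s, n in segments)

def find_min_beta(segments, budget, candidates=range(-64, 65)):
    """Scan beta upward; the first value that fits the budget is the minimum."""
    for beta in candidates:
        if frame_bits(segments, beta) <= budget:
            return beta
    raise ValueError("no candidate beta meets the bit budget")

# Three segments of 32 samples each with scale factor indices 10, 6 and 3.
segments = [(0, 10, 32), (1, 6, 32), (2, 3, 32)]
beta = find_min_beta(segments, budget=256)
allocation = [bits_per_sample(s, k, beta) for k, s, _ in segments]
```

A decoder that receives `beta` and the scale factor indices can call `bits_per_sample` in the same way and obtain the same `allocation`, which is the point made in the paragraph above. For the per-channel variant, the same search would simply be run once per channel with that channel's segments and budget.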
[Embodiment 3]
The encoder of the third embodiment is shown in Figure 6. Most of its components are the same as in the other embodiments; the difference is that this embodiment adds a joint intensity encoder 7. Joint intensity coding rests on the fact that at high frequencies (for example above 7 kHz) the ear localizes sound mainly by its intensity. As shown in Figure 6, during encoding joint intensity encoder 7 can sum the high-frequency subbands output by transient detector 3 for the left and right channels (or other channels that can be joined) and transmit only the quantization indices of the samples of this sum subband (called the jointly coded subband of the source channel), together with the intensity indices of the joined subbands of the joined channels, thereby saving bits.
When joint intensity coding is used, the bit allocation for a jointly coded subband segment of the source channel must take into account the bit demand of the same subband segment in the other joined channels. Let the source channel be c and let J be the set of the other jointly coded channels; the scale factor to be used when computing the bit allocation for the jointly coded subband segment of channel c is then:
If the frequency at which joint intensity coding is enabled is too low, the spatial image may narrow. In this embodiment, joint intensity coding is therefore used only at low bit rates.
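The idea of transmitting one summed subband plus an intensity value for the joined channel can be sketched as below. The gain definition (square root of an energy ratio) is a common intensity-stereo convention assumed here for illustration; in a real encoder the gain would additionally be quantized to an intensity index, and the handling of the source channel itself is outside this sketch.

```python
import math

def energy(samples):
    return sum(v * v for v in samples)

def intensity_encode(source, joined):
    """Sum two high-frequency subband segments and derive a gain for the joined one."""
    summed = [a + b for a, b in zip(source, joined)]
    e_sum = energy(summed)
    gain = math.sqrt(energy(joined) / e_sum) if e_sum > 0 else 0.0
    return summed, gain

def intensity_decode_joined(summed, gain):
    """Rebuild the joined channel's segment from the summed segment and its gain."""
    return [gain * v for v in summed]

left = [1.0, 2.0, -1.0, 0.5]
right = [0.5 * v for v in left]  # same waveform at half the level
summed, gain = intensity_encode(left, right)
rebuilt_right = intensity_decode_joined(summed, gain)
```

Only `summed` and one gain per joined segment are transmitted, so the joined channel's samples cost no sample bits; the rebuilt segment preserves the energy of the original but not its waveform in general.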
[Embodiment 4]
The encoder of the fourth embodiment is shown in Figure 7. Most of its components are the same as in the other embodiments; the difference is that this embodiment adds a cross-channel long-term and short-term linear predictor 8. For each subband segment, the invention searches its short-term and long-term autocorrelation within the same subband, as well as its cross-correlation with the signal of the same subband in other channels, to find the cross-channel long-term and short-term linear predictor 8 that minimizes the prediction error. Let x(c, k, n) be the n-th sample of the k-th subband of channel c; the linear prediction of channel c based on channel s is then
where a(m) and b(m) are the coefficients of the short-term and long-term prediction filters respectively, and τ is the delay of the long-term prediction filter. When s = c, the prediction is based entirely on the same channel and
$m_0 = 1$; when s ≠ c, the prediction is cross-channel and
$m_0 = 0$.
After each subband sample x(c, k, n) has been predicted with formula (7), the corresponding prediction error is obtained:
The task of the encoder is to find the optimal set of prediction coefficients a(m), b(m) and delay τ that minimizes the total prediction error over the subband segment, for example the following mean squared error:
If the subband signal x(c, k, n) is strongly periodic, the variance of the prediction error can be much smaller than the variance of x(c, k, n) itself. This means that the prediction gain of the linear prediction is high and the prediction is successful. In that case the prediction error signal e(c, k, n) can be sent to subband quantizer 5 in place of x(c, k, n). Otherwise, the subband signal x(c, k, n) is sent directly to subband quantizer 5. The operation of cross-channel long-term and short-term linear predictor 8 can therefore be summarized as follows:
1) Estimate the prediction coefficients and the prediction gain.
2) If the prediction gain is high, send the decision to use linear prediction for this subband segment, together with the prediction coefficients and the delay, to multiplexer 6. At the same time, generate the prediction error signal by formula (8) and pass it to quantizer 5.
3) If the prediction gain is not high, send the decision not to use linear prediction for this subband segment to multiplexer 6. At the same time, pass the samples x(c, k, n) of this subband segment to quantizer 5.
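The three steps above can be sketched for the simplest possible predictor, a single tap at delay τ within the same channel, x̂(n) = b·x(n − τ). The single-tap form, the least-squares coefficient estimate, the energy-ratio definition of prediction gain and the decision threshold are assumptions for illustration and do not reproduce formulas (7) to (9).

```python
def best_single_tap_predictor(x, max_lag):
    """Search delays 1..max_lag for the tap that minimizes the squared error."""
    target = x[max_lag:]
    signal_energy = sum(v * v for v in target)
    best = None
    for tau in range(1, max_lag + 1):
        past = x[max_lag - tau:len(x) - tau]
        denom = sum(p * p for p in past)
        coeff = sum(t * p for t, p in zip(target, past)) / denom if denom > 0 else 0.0
        error = sum((t - coeff * p) ** 2 for t, p in zip(target, past))
        if best is None or error < best[2]:
            best = (tau, coeff, error)
    tau, coeff, error = best
    gain = float("inf") if error == 0 else signal_energy / error
    return tau, coeff, gain

GAIN_THRESHOLD = 4.0  # hypothetical decision threshold

segment = [1.0, 3.0, -2.0, 0.5, -1.0] * 8  # strongly periodic, period 5
tau, coeff, gain = best_single_tap_predictor(segment, max_lag=8)
use_prediction = gain > GAIN_THRESHOLD
```

For this periodic segment the search settles on the period as the delay, the prediction gain is very large, and the decision is to send the prediction error instead of the samples.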
When intra-channel prediction is used, the prediction error signal must be quantized in a closed loop [references 8, 9 and 11], as shown in Figure 8, to prevent quantization errors from propagating during decoding. Because the decoder obtains only the quantization indices of the prediction error output by quantizer 5, it can only reconstruct the prediction error with inverse quantizer 9 and then add it to the predicted value to reconstruct the subband samples. Intra-channel predictor 81 can therefore use only these reconstructed subband samples to predict subsequent subband samples. In other words, the subband samples substituted into formula (7) are in fact the reconstructed subband samples.
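Closed-loop quantization can be sketched with a first-order predictor and a uniform quantizer, both chosen only for illustration. The key point is that the encoder predicts from the samples it has reconstructed, exactly as the decoder will.

```python
def encode_closed_loop(x, coeff, step):
    """Quantize the prediction error while predicting from reconstructed samples."""
    indices, rebuilt = [], []
    prev = 0.0  # last reconstructed sample, shared convention with the decoder
    for sample in x:
        predicted = coeff * prev
        index = round((sample - predicted) / step)  # quantized prediction error
        prev = predicted + index * step             # what the decoder will rebuild
        indices.append(index)
        rebuilt.append(prev)
    return indices, rebuilt

def decode_closed_loop(indices, coeff, step):
    """Rebuild samples from error indices using the same predictor."""
    rebuilt, prev = [], 0.0
    for index in indices:
        prev = coeff * prev + index * step
        rebuilt.append(prev)
    return rebuilt

x = [0.0, 0.4, 0.9, 1.0, 0.7, 0.1, -0.5]
indices, encoder_view = encode_closed_loop(x, coeff=0.9, step=0.1)
decoder_view = decode_closed_loop(indices, coeff=0.9, step=0.1)
max_error = max(abs(a - b) for a, b in zip(x, decoder_view))
```

Because both sides run the predictor on identical reconstructed values, the reconstruction error of every sample stays within half a quantization step instead of accumulating.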
Similarly, when cross-channel prediction is used, x(s, k, n) must also be the reconstructed subband samples obtained after decoding, as shown in Figure 9, to prevent quantization errors from propagating during decoding. Note that predictor 82 in Figure 9 is a cross-channel predictor, because its input subband samples come from another channel.
Cross-channel predictor 82 and intra-channel predictor 81 can also be used together, as shown in Figure 10. First, the error of the cross-channel prediction is computed:
Then intra-channel prediction is applied to this error:
When cross-channel prediction is used, the encoder must ensure that the source channel has already been decoded at decoding time. That is, the encoder must ensure that the channels are decoded in a realizable order: the first channel decoded can use only intra-channel prediction, the second can use only intra-channel prediction or cross-channel prediction from the previously decoded channel, and so on.
The encoder can search all realizable decoding orders to find the one with the largest prediction gain. It can also perform only a limited search to obtain a suboptimal solution.
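An exhaustive search over decoding orders can be sketched as follows. It assumes that the prediction gains are already known as a table `gains[c][s]` (gain of channel c when predicted from channel s, with s = c meaning intra-channel prediction), expressed in dB so that summing them is meaningful. The table values and function name are hypothetical.

```python
from itertools import permutations

def best_decoding_order(gains):
    """Try every channel order; each channel may use itself or an earlier channel."""
    best_total, best_order, best_sources = None, None, None
    for order in permutations(range(len(gains))):
        decoded, sources, total = [], {}, 0
        for c in order:
            allowed = decoded + [c]  # already decoded channels, or intra-channel
            source = max(allowed, key=lambda s: gains[c][s])
            sources[c] = source
            total += gains[c][source]
            decoded.append(c)
        if best_total is None or total > best_total:
            best_total, best_order, best_sources = total, order, sources
    return best_order, best_sources, best_total

# Hypothetical gains in dB for three channels.
gains = [
    [1, 6, 2],
    [1.5, 2, 1],
    [8, 7, 3],
]
order, sources, total = best_decoding_order(gains)
```

The number of orders grows factorially with the channel count, which is why a limited search may be preferable for many channels.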
Prediction example 1: long-term or short-term prediction
Because estimating the prediction coefficients is difficult when long-term and short-term prediction are used together, either long-term or short-term prediction alone can be used to reduce complexity:
Here the prediction is short-term when τ is small and long-term when τ is large.
When the formula above is applied with cross-channel and intra-channel prediction used together, the residual of the cross-channel prediction is computed first:
where the minimum value that the delay
$\tau_1$ can take is zero. Intra-channel prediction is then applied to this residual:
where the minimum value that the delay
$\tau_2$ can take is 1.
Prediction example 2: cross-channel prediction with limited search
To reduce the complexity of searching the decoding order for cross-channel prediction, this example first finds the optimal order of two channels. Each subsequent channel then searches only intra-channel prediction and cross-channel prediction with those two channels as source channels.
To reduce complexity further, the order search can be skipped entirely and the first channel always used as the source channel for the cross-channel prediction of all other channels.
Preferably, the encoder of the invention can also apply cross-channel and/or intra-channel prediction to an audio frame before it is input to subband analysis filter bank 2; that is, a cross-channel and/or intra-channel predictor is further included between frame length selector 1 and subband analysis filter bank 2. The prediction error is then input to subband analysis filter bank 2 and encoded by processing similar to that described above.
Encoding flow:
Each of the four embodiments above can form a complete encoder on its own. Combining all their functions into a single encoder, however, achieves the best compression efficiency. Figure 11 shows the encoding flow when all functions of the four embodiments are combined. In it, joint intensity coding, cross-channel long-term and short-term prediction, and global bit allocation are independent options. When any of them is not selected, it simply passes its input data through unchanged in Figure 11. Of course, if the bit allocator in Figure 11 is not selected, another general bit allocator as discussed in Embodiment 1 must be introduced to perform the same function. The encoding steps of the invention (see Figure 11) are described below with reference to Figures 2, 4, 6, 7, 8, 9 and 10.
E1) Frame length selection: frame length selector 1 receives the input audio samples of each channel, selects the audio frame length according to the sample rate of the audio signal, the target bit rate, and (when the multichannel audio signal accompanies a video signal) the video frame rate, and then sends the frame length information to the other encoder components. Because the encoder and method of the invention take the frame as their basic unit, every encoder component uses the frame length, directly or indirectly, at every step of the encoding flow. For clarity, however, Figure 11 does not show the paths along which the frame length information is passed. Once the frame length is determined, frame length selector 1 divides the samples of the input audio signal into frames of that length and feeds them frame by frame to subband analysis filter bank 2. This step does not process the input audio signal itself.
E2) Subband analysis: subband analysis filter bank 2 decomposes the audio signal of each channel into M subband signals.
E3) Transient detection: transient detector 3 analyzes the transient behavior of each subband signal and divides it accordingly into transient and steady-state segments. The position information of each segment is then sent to the other encoder components. For the same reason as in step E1, Figure 11 does not show the paths along which the positions of the transient/steady-state segments are passed. This step does not process the subband signals themselves.
E4) Joint intensity coding: after completing joint intensity coding, joint intensity encoder 7 discards the samples of the joined subbands and sends only the intensity index of each of their subband segments to bit allocation (E7) unit 4 and multiplexing (E9) unit 6. This step is preferred by the invention at low bit rates. Omitting it does not affect the completeness of the encoder or method; it only lowers coding efficiency.
E5) Cross-channel long-term and short-term prediction: after completing linear prediction, cross-channel long-term and short-term linear predictor 8 passes the decision on whether to use prediction to multiplexing (E9) unit 6. If prediction is used, it also passes the delay and coefficients of the prediction filter to multiplexing (E9) unit 6 and passes the prediction error signal, in place of the subband signal, to subband quantizer 5. This step is preferred by the invention. Omitting it does not affect the completeness of the encoder or method; it only lowers coding efficiency.
For convenience of description, Figure 11 was divided into for two steps to the function of subband quantizer 5: scale factor (E6) and vector/scalar quantization (E8).
E6) scale factor: quantized subband device 5 from subband signal (having adopted linear prediction as decision, then is predictive error signal, below is referred to as subband signal) with temporarily/the stable state section is estimated by unit and quantization scaling factor.Then, the quantification index of scale factor is sent to Bit Allocation in Discrete (E7) device 4 and multiplexed (E9) device 6.This step is not done any processing to subband signal itself.
E7) Bit Allocation in Discrete: bit is adjusted the quantification index of factor searcher 41 according to the scale factor of being imported by step e 6, and by the intensity index (if the combined strength coding is selected) of E4 input, search out optimum Bit Allocation in Discrete and adjust the factor, and it is passed to multiplexed (E9) device 6.Then, bit distributor 42 arrives each sub band to Bit Allocation in Discrete according to (3) formula again, and the assigned bit number of each sub band is passed to quantizer Chooser 54 use for vector/scalar quantization (step e 8).Above Bit distribution method is that the present invention is preferred.As preferred the method, must adopt other a Bit distribution method, to keep the integrality of encoder.This step is not done any processing to subband signal itself.
E8) Vector/scalar quantization: the quantizer selector 54 selects a quantizer for each sub band according to the number of bits allocated to it by the bit allocator (E7) 4, and sends the selection to the subband sample quantizer 55. The subband sample quantizer 55 then quantizes the subband samples in units of sub bands and passes their quantization indices to the multiplexer (E9) 6.
E9) Multiplexing (MUX): the multiplexer 6 packs (multiplexes) the quantization indices of the subband samples and the following side information into a complete audio frame and outputs it: the audio frame length, the positions of the transient/steady-state segments, the intensity indices (if joint intensity coding is selected), the decision on whether prediction is used, the delay and coefficients of the prediction filter, the quantization indices of the scale factors, and the bit allocation adjustment factor. The multiplexer 6 may also pack (multiplex) and output other auxiliary data, such as the sampling frequency, speaker configuration, error correction codes, and time codes.
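As a rough illustration of what step E9 packs into one frame, the following Python sketch collects the fields listed above in a container and flattens them into an ordered list. The class `AudioFrame`, the function `multiplex`, the field names, and the field order are all hypothetical; the actual bitstream syntax and field widths are not given in this passage.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AudioFrame:
    """Hypothetical container for the data that step E9 packs into one frame."""
    frame_length: int
    segment_positions: List[int]      # positions of the transient/steady-state segments
    scale_factor_indices: List[int]   # quantization indices of the scale factors
    bit_alloc_adjust: int             # bit allocation adjustment factor
    sample_indices: List[int]         # quantization indices of the subband samples
    prediction_used: bool = False     # decision on whether prediction is used
    predictor_delay: Optional[int] = None
    predictor_coeffs: Optional[List[float]] = None
    intensity_indices: Optional[List[int]] = None  # only with joint intensity coding


def multiplex(frame: AudioFrame) -> List[Tuple[str, object]]:
    """Flatten a frame into an ordered (name, value) list (assumed field order)."""
    fields: List[Tuple[str, object]] = [
        ("frame_length", frame.frame_length),
        ("segment_positions", frame.segment_positions),
        ("prediction_used", frame.prediction_used),
    ]
    if frame.prediction_used:
        # Delay and coefficients are carried only when prediction was adopted.
        fields.append(("predictor_delay", frame.predictor_delay))
        fields.append(("predictor_coeffs", frame.predictor_coeffs))
    if frame.intensity_indices is not None:
        # Intensity indices are carried only when joint intensity coding is selected.
        fields.append(("intensity_indices", frame.intensity_indices))
    fields.append(("scale_factor_indices", frame.scale_factor_indices))
    fields.append(("bit_alloc_adjust", frame.bit_alloc_adjust))
    fields.append(("sample_indices", frame.sample_indices))
    return fields
```

The optional fields mirror the conditions in the text: the predictor parameters and intensity indices appear in the frame only when the corresponding tool was used.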
Decoding
The decoder of the present invention and its decoding method are essentially the inverse of the encoder and its method. The decoding process and each component of the decoder are described here with reference to Figures 12 and 13.
Decoding process:
A bitstream generated by the coding method and encoder of the present invention is decoded to reconstruct the multichannel audio signal through the following main steps (Figure 12):
D1) Unpack (DEMUX) the side information: the demultiplexer 110 unpacks the following side information:
The frame length.
The positions of all sub bands (transient/steady-state segments).
The bit allocation adjustment factor.
The scale factor quantization index of each sub band.
The decision on whether each sub band uses cross-channel long-term and short-term prediction; if it does, the delay and coefficients of the prediction filter are also unpacked.
The intensity indices of the sub bands coded with joint intensity coding.
D2) Bit allocation: the bit allocator 42 allocates bits to each sub band according to the input bit allocation adjustment factor and the scale factor quantization index of each sub band. This bit allocator is identical to the bit allocator 42 used in the encoder, so the reference numeral 42 is retained. This bit allocation method is preferred by the present invention; if the encoder uses another bit allocation method, this step must be skipped. In that case, an additional unpacking item must usually be added to step D1: unpacking the bit allocation from the input bitstream.
D3) Unpack the quantization indices of the subband samples: the demultiplexer 110 unpacks the quantization index of each subband sample from the input bitstream according to the number of bits allocated in step D2.
D4) Inverse quantization to reconstruct the subband samples: the subband sample inverse quantizer 120 reconstructs each subband sample from the scale factor quantization indices unpacked in step D1 and the subband sample quantization indices unpacked in step D3.
D5) Cross-channel long-term and short-term prediction: for each sub band, if the prediction decision unpacked in step D1 is affirmative, the cross-channel long-term and short-term predictor 130 applies prediction to all samples of that sub band. Otherwise, the predictor 130 leaves all samples of that sub band unprocessed. This step is preferred by the present invention; if the encoder does not use cross-channel long-term and short-term prediction, this step must be skipped.
D6) Joint intensity decoding: for each sub band coded with joint intensity coding, the joint intensity decoder 140 first copies the subband samples from the source sub band to that sub band. It then reconstructs the intensity factor from the intensity index unpacked in step D1 and uses it to modify the sample values copied to that sub band. This step is preferred by the present invention at low bit rates; if the encoder does not use joint intensity coding, this step must be skipped.
D7) Synthesis filter bank: the synthesis filter bank 150 synthesizes the subband samples into the reconstructed audio signal.
If the encoder of the present invention applied cross-channel and/or intra-channel prediction to the audio frame before it was input to the subband analysis filter bank 2, the corresponding prediction must now be applied to the signal reconstructed by the synthesis filter bank 150 in order to reconstruct the audio signal.
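The order of steps D1 to D7, and the conditions under which some of them are skipped, can be summarized in a short Python sketch. The function `decoding_steps` and its three flags are hypothetical names introduced only for illustration; the function lists which steps a decoder would run for a given encoder configuration and performs no actual decoding.

```python
from typing import List


def decoding_steps(preferred_bit_allocation: bool = True,
                   prediction_used: bool = True,
                   joint_intensity_used: bool = True) -> List[str]:
    """List the decoding steps that apply for a given encoder configuration."""
    steps: List[str] = []
    if preferred_bit_allocation:
        steps.append("D1 unpack side information")
        steps.append("D2 bit allocation")
    else:
        # D2 is skipped; the bit allocation itself is unpacked in D1 instead.
        steps.append("D1 unpack side information and bit allocation")
    steps.append("D3 unpack subband sample quantization indices")
    steps.append("D4 inverse quantization")
    if prediction_used:
        steps.append("D5 cross-channel long-term and short-term prediction")
    if joint_intensity_used:
        steps.append("D6 joint intensity decoding")
    steps.append("D7 synthesis filter bank")
    return steps
```

For example, `decoding_steps(prediction_used=False)` omits step D5 and leaves the remaining six steps in order.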
Decoder:
The decoder of the present invention is shown in Figure 13. Its components are described below:
Demultiplexer (DEMUX) 110:
The demultiplexer 110 unpacks from the compressed bitstream the data listed in decoding steps D1 and D3. It also unpacks other auxiliary data from the compressed bitstream, such as the sampling frequency, speaker configuration, error correction codes, and time codes.
Bit allocator 42:
This bit allocator is identical to the bit allocator 42 used in the encoder, so the reference numeral 42 is retained. Its function is to substitute the bit allocation adjustment factor and the scale factor quantization index of each sub band into formula (3) to obtain the number of bits allocated to each sample of each sub band.
Subband sample inverse quantizer 120:
The subband sample inverse quantizer 120 selects a quantizer according to the bit allocation of each sub band. It then reconstructs the subband samples from their quantization indices, using this quantizer and the scale factor of the sub band.
Note that when a sub band uses cross-channel long-term and short-term prediction, the subband sample inverse quantizer 120 reconstructs the prediction error signal of that sub band rather than the subband samples themselves.
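To make the reconstruction concrete, here is a minimal Python sketch of inverse quantization under stated assumptions. It assumes a uniform midrise scalar quantizer whose number of levels is set by the allocated bit count, and a logarithmic scale factor table with a 2 dB step. Both are hypothetical choices: the quantizers and scale factor table actually used are not specified in this passage, and the function names `scale_factor_from_index` and `dequantize` are introduced only for illustration.

```python
from typing import List


def scale_factor_from_index(index: int, step_db: float = 2.0) -> float:
    """Map a scale factor quantization index to a linear gain (assumed log table)."""
    return 10.0 ** (index * step_db / 20.0)


def dequantize(indices: List[int], bits: int, scale_factor: float) -> List[float]:
    """Rebuild samples from quantization indices with a uniform midrise quantizer."""
    if bits == 0:
        # No bits allocated: nothing was transmitted, so reconstruct zeros.
        return [0.0] * len(indices)
    levels = 1 << bits  # the allocated bit count selects the quantizer
    # Index k in [0, levels - 1] maps to a normalized value in (-1, 1).
    return [scale_factor * (2 * k + 1 - levels) / levels for k in indices]
```

When prediction is in use for a sub band, the values returned here would be prediction error samples rather than the subband samples themselves.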
Stride the long-term and short-term forecast device 130 of sound channel:
To each sub band, be certainly if step D1 separates the prediction decision that contracts out, then stride sound channel all samples long-term and 130 pairs of these sub bands of short-term forecast device and predict.Otherwise, stride sound channel all samples long-term and 130 pairs of these sub bands of short-term forecast device and do not do any processing.
The predictor for intra-channel prediction is shown in Figure 14. Its intra-channel predictor 81 is identical to the intra-channel predictor 81 used in the encoder, so the reference numeral 81 is retained.
The predictor for cross-channel prediction is shown in Figure 15. Its cross-channel predictor 82 is identical to the cross-channel predictor 82 used in the encoder, so the reference numeral 82 is retained.
When cross-channel and intra-channel prediction are used together, intra-channel prediction is performed first, followed by cross-channel prediction, as shown in Figure 16.
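The following Python sketch shows one possible reading of how a decoder adds a prediction back onto a reconstructed error signal, first within the channel itself and then across channels. It assumes single-tap predictors and a reference channel that has already been decoded and time-aligned. These simplifications, and the function names, are hypothetical; the actual predictor structures are those of Figures 14 to 16, which are not reproduced here.

```python
from typing import List


def undo_intra_channel_prediction(error: List[float], delay: int, coeff: float,
                                  history: List[float]) -> List[float]:
    """Rebuild x[n] = e[n] + coeff * x[n - delay] within one channel.

    `history` holds previously decoded samples and needs at least `delay` entries.
    """
    out = list(history)
    start = len(out)
    for e in error:
        # out[len(out) - delay] is the sample `delay` positions before the new one.
        out.append(e + coeff * out[len(out) - delay])
    return out[start:]


def undo_cross_channel_prediction(error: List[float], coeff: float,
                                  reference: List[float]) -> List[float]:
    """Rebuild y[n] = e[n] + coeff * r[n] from an already decoded reference channel."""
    return [e + coeff * r for e, r in zip(error, reference)]


def undo_both(error: List[float], delay: int, intra_coeff: float,
              history: List[float], cross_coeff: float,
              reference: List[float]) -> List[float]:
    """Apply the intra-channel step first, then the cross-channel step."""
    partial = undo_intra_channel_prediction(error, delay, intra_coeff, history)
    return undo_cross_channel_prediction(partial, cross_coeff, reference)
```

The delay and coefficients passed in correspond to the predictor parameters unpacked from the bitstream in step D1.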
Joint intensity decoder 140:
For each sub band coded with joint intensity coding, the joint intensity decoder 140 first copies the subband samples from the source sub band to that sub band. It then reconstructs the intensity factor from the intensity index and uses it to modify the sample values copied to that sub band.
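The copy-and-scale operation described above can be sketched in Python as follows. The sketch assumes that "modifying" the copied samples means multiplying them by the intensity factor, and that the intensity index maps to a linear factor through a logarithmic table with a 2 dB step. Both assumptions, and the function names, are hypothetical and serve only as illustration.

```python
from typing import List


def intensity_factor_from_index(index: int, step_db: float = 2.0) -> float:
    """Map an intensity index to a linear factor (assumed logarithmic table)."""
    return 10.0 ** (index * step_db / 20.0)


def joint_intensity_decode(source_samples: List[float], intensity_index: int,
                           step_db: float = 2.0) -> List[float]:
    """Copy the source sub band's samples and scale them by the intensity factor."""
    factor = intensity_factor_from_index(intensity_index, step_db)
    # A new list is built, so the source sub band itself is left unchanged.
    return [factor * s for s in source_samples]
```

With an index of 0 the factor is 1, so the target sub band is an exact copy of the source.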
Subband synthesis filter bank 150:
The subband synthesis filter bank 150 synthesizes the subband samples into the reconstructed audio signal. It is the inverse of the subband analysis filter bank 2 in the encoder, and the two are designed together. That is, once the subband analysis filter bank 2 in the encoder is determined, the subband synthesis filter bank 150 is also completely determined [reference 8].
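The statement that the synthesis bank follows from the analysis bank can be illustrated with the simplest perfect-reconstruction pair, a two-band Haar filter bank. This is only a toy example chosen for brevity and is not the filter bank of the invention; the function names `analyze` and `synthesize` are hypothetical.

```python
import math
from typing import List, Tuple

_S = 1.0 / math.sqrt(2.0)  # normalization that makes the transform orthonormal


def analyze(x: List[float]) -> Tuple[List[float], List[float]]:
    """Split an even-length signal into low and high subbands (critically sampled)."""
    low = [_S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [_S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high


def synthesize(low: List[float], high: List[float]) -> List[float]:
    """Invert `analyze`; the synthesis equations are fixed by the analysis ones."""
    x: List[float] = []
    for l, h in zip(low, high):
        x.append(_S * (l + h))
        x.append(_S * (l - h))
    return x
```

Passing the output of `analyze` to `synthesize` returns the input signal up to floating-point rounding, which is the perfect-reconstruction property in miniature.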
Application scheme
The present invention relates to multichannel digital audio compression coding/decoding technology. Its advantages include high compression efficiency, a simple decoder, and high fidelity of the decoded audio signal, and it is suitable for high, medium, and low bit rate applications. It is therefore fully applicable to audio-only applications such as digital audio broadcasting; its decoder can also be installed on its own in audio-only equipment such as power amplifiers and Walkman players.
Audio signals mostly occur as the sound accompanying a video signal. Because the coding/decoding method of the present invention has high compression efficiency, and because audio coded by the present invention can be edited synchronously with the video signal and can withstand more than ten rounds of encoding and decoding, the invention has the following advantages in practical applications:
1) It can satisfy the requirements of both program distribution and transmission.
2) The coding technique of the present invention greatly simplifies the stages and equipment of program distribution. Taking digital television as an example, Figure 17 shows the program distribution and transmission process using this technique. It is clearly much simpler than the Dolby scheme (Figure 1).
3) Because repeated transcoding stages are eliminated, this technique greatly improves fidelity in the program distribution process.
4) Because several of the encoders in Figure 1 are eliminated, this technique also greatly reduces the cost of program distribution.
[1] ISO/IEC 13818-7, 1997.
[2] ISO/IEC 14496-3, 1998.
[3] S. Smyth, M. Smyth, and W. P. Smith, "Multi-channel Predictive Subband Audio Coder using Psychoacoustic Adaptive Bit Allocation In Frequency, Time, And Over The Multiple Channels," US Patent 5956674.
[5] S. Smyth, W. P. Smith, M. Smyth, M. Yan, and T. Jung, "DTS Coherent Acoustics Delivering High Quality Multichannel Sound to the Consumer," AES 100th Convention, 1996.
[6] L. Fielder and C. Todd, "The design of a video friendly audio coding system for distribution applications," AES 17th International Conference, pp. 86-92, 1999.
[7] C. Todd, G. Davidson, M. Davis, L. Fielder, B. Link, and S. Vernon, "AC-3, Flexible perceptual coding for audio transmission and storage," 96th AES Convention, Amsterdam, 1994.
[8] P. P. Vaidyanathan, "Multirate systems and filter banks," Prentice Hall, 1993.
[9] A. Gersho and R. M. Gray, "Vector quantization and signal compression," Kluwer, 1992.
[10] B. C. J. Moore, "An introduction to the psychology of hearing," Academic Press, 1997.
[11] A. M. Kondoz, "Digital Speech," John Wiley & Sons, 1994.