WO2013061584A1 - Sound signal hybrid decoder, sound signal hybrid encoder, sound signal decoding method, and sound signal encoding method
- Publication number
- WO2013061584A1 (application PCT/JP2012/006802)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- frame
- audio
- decoding
- encoding
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/002—Dynamic bit allocation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/20—Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
Definitions
- The present invention relates to a sound signal hybrid decoder and a sound signal hybrid encoder capable of switching between a speech codec and an acoustic codec.
- A hybrid codec (for example, see Patent Document 1) is a codec that combines the advantages of an acoustic codec and a speech codec (for example, see Non-Patent Document 1). A hybrid codec can encode a sound signal in which speech-based content and acoustic-signal-based content are mixed by switching between the speech codec and the acoustic codec, so that each kind of content is encoded by the method suited to it. A hybrid codec therefore realizes stable encoding of a sound signal at a low bit rate.
- The sound quality can be improved by using, for example, the AAC-ELD (Advanced Audio Coding-Enhanced Low Delay) mode as the acoustic codec.
- Patent Document 1 discloses signal processing at the point where the coding mode is switched, but that processing does not support coding schemes that, like the AAC-ELD mode, require overlap processing over a plurality of preceding frames. The method of Patent Document 1 therefore cannot reduce the aliasing.
- An object of the present invention is to reduce aliasing that occurs in a switching portion between a speech codec and an acoustic codec when an encoding method that requires overlap processing using a plurality of preceding frames, such as an AAC-ELD mode, is used as an acoustic codec.
- A sound signal hybrid decoder decodes a bitstream that includes an acoustic frame encoded by an acoustic encoding process using a low-delay filter bank and an audio frame encoded by an audio encoding process using a linear prediction coefficient.
- The sound signal hybrid decoder includes: a low-delay transform decoder that decodes the acoustic frame using a low-delay inverse filter bank process; an audio signal decoder that decodes the audio frame; and a block switching unit that performs control so that the decoding target frame of the bitstream is decoded by the low-delay transform decoder when it is the acoustic frame, and by the audio signal decoder when it is the audio frame.
- When the decoding target frame is the i-th frame, which is the first audio frame after switching from the acoustic frame to the audio frame, the i-th frame includes, in an encoded state, a first signal generated using the signal before encoding of the (i-1)-th frame, which is the frame that precedes the i-th frame by one frame.
- the block switching unit is (1) a frame that precedes the i-th frame by 2 frames.
- a window-processed signal obtained by decoding a certain i-2 frame by the low-delay transform decoder and a reconstructed signal of the i-3 frame, which is a frame three frames ahead of the i frame.
- a signal obtained by convolving the signal corresponding to the second half of the second signal frame with the signal corresponding to the first half of the second signal frame is added.
- a signal subjected to window processing a signal obtained by performing window processing on the first signal obtained by decoding the i-th frame by the audio signal decoder, and the i-1 frame from the low delay
- the signal of the first half of the frame of the third signal which is the portion corresponding to the i-3th frame of the signal subjected to the inverse filter bank processing and the window processing, is added to the i-1th before encoding.
- a signal corresponding to the first half of the frame is generated, and a signal obtained by convolving the signal corresponding to the first half of the second signal frame is added to the signal corresponding to the second half of the second signal frame.
- a signal that has been processed, a signal that has undergone convolution processing and window processing on the first signal, and a signal that corresponds to the second half of the frame of the third signal are added to perform the processing before encoding.
- I-1th A signal corresponding to the second half of the frame is generated, or (2) a signal obtained by convolving a signal corresponding to the second half of the second signal frame with a signal corresponding to the first half of the second signal frame.
- a signal that has been subjected to window processing by addition, a signal that has undergone convolution processing and window processing on the first signal, and a signal that corresponds to the first half of the frame of the third signal are added to perform coding.
- a signal corresponding to the first half of the frame of the second signal is generated and a signal corresponding to the first half of the frame of the second signal is convolved with a signal corresponding to the second half of the frame of the second signal.
- Aliasing that occurs at the switching portion between the speech codec and the acoustic codec can be reduced.
- FIG. 1 is a diagram showing an analysis window in an AAC-ELD encoder.
- FIG. 2 is a diagram illustrating a decoding process in the AAC-ELD decoder.
- FIG. 3 is a diagram showing a synthesis window in the AAC-ELD decoder.
- FIG. 4 is a diagram illustrating a delay amount of AAC-ELD encoding / decoding processing.
- FIG. 5 is a diagram for explaining the transition frame.
- FIG. 6 is a block diagram showing a configuration of the sound signal hybrid encoder according to the first embodiment.
- FIG. 7 is a diagram illustrating an encoded frame when the encoding mode is switched from the FD encoding mode to the ACELP encoding mode.
- FIG. 8A is a diagram illustrating an example of a component X generation method.
- FIG. 8B is a flowchart of a method for generating component X.
- FIG. 9 is a block diagram illustrating a configuration of a sound signal hybrid encoder including a TCX encoder.
- FIG. 10 is a block diagram showing a configuration of the sound signal hybrid decoder according to the first embodiment.
- FIG. 11 is a schematic diagram illustrating switching control of the block switching unit when a signal encoded in the FD encoding mode is switched to a signal encoded in the ACELP encoding mode.
- FIG. 12A is a diagram illustrating a method of reconstructing the signal of frame i-1.
- FIG. 12B is a flowchart of a method for reconstructing the signal of frame i-1.
- FIG. 13 is a diagram showing a delay amount of the encoding / decoding process according to the first embodiment.
- FIG. 14 is a block diagram illustrating a configuration of a sound signal hybrid decoder including a TCX decoder.
- FIG. 15 is a diagram illustrating a method for reconstructing the signal of frame i-1 using the synthesis error compensation device.
- FIG. 16 is a diagram illustrating the decoding process of the synthesis error information.
- FIG. 17 is a diagram illustrating an encoded frame when the encoding mode is switched from the ACELP encoding mode to the FD encoding mode.
- FIG. 18 is a schematic diagram illustrating switching control of the block switching unit when a signal encoded in the ACELP encoding mode is switched to a signal encoded in the FD encoding mode.
- FIG. 19 is a flowchart of a method for reconstructing the signal of frame i-1 according to the second embodiment.
- FIG. 20A is a diagram illustrating an example of a method of reconstructing the signal of frame i-1 according to the second embodiment.
- FIG. 20B is another diagram illustrating an example of a method of reconstructing the signal of frame i-1 according to the second embodiment.
- FIG. 21 is a diagram illustrating an example of a method for reconstructing a signal of frame i according to the second embodiment.
- FIG. 22 is a diagram illustrating an example of a method of reconfiguring the signal of frame i + 1 according to the second embodiment.
- FIG. 23 is a diagram showing a delay amount of the encoding / decoding process according to the second embodiment.
- FIG. 24 is a diagram illustrating a method of reconstructing the signal of frame i-1 using the SEC device.
- FIG. 25 is a diagram illustrating a method of reconstructing the signal of frame i using the SEC device.
- FIG. 26 is a diagram illustrating a method of reconstructing the signal of frame i-1 using the SEC device.
- FIG. 27 is a diagram illustrating an encoded frame when the encoding mode is switched from the FD encoding mode to the TCX encoding mode.
- FIG. 28 is a schematic diagram illustrating switching control of the block switching unit when a signal encoded in the FD encoding mode is switched to a signal encoded in the TCX encoding mode.
- FIG. 29 is a diagram illustrating a delay amount of the encoding / decoding process according to Embodiment 3.
- FIG. 30 is a diagram illustrating an encoded frame when the encoding mode is switched from the TCX encoding mode to the FD encoding mode.
- FIG. 31 is a diagram illustrating an encoded frame when the encoding mode is switched from the TCX encoding mode to the FD encoding mode.
- FIG. 32 is a diagram illustrating an example of a method of reconstructing the signal of frame i-1 according to the fourth embodiment.
- FIG. 33 is a diagram illustrating a delay amount of the encoding / decoding process according to the fourth embodiment.
- The speech codec is a codec for encoding a speech signal according to the characteristics of the speech signal (see Non-Patent Document 1).
- The speech codec realizes good sound quality with low delay when a speech signal is encoded at a low bit rate.
- However, speech codecs are not suitable for encoding acoustic signals. When an acoustic signal is encoded by a speech codec, the sound quality is degraded compared with encoding by an acoustic codec such as AAC.
- ACELP coding mode: Algebraic Code Excited Linear Prediction
- TCX coding mode: Transform Coded Excitation
- In the ACELP coding mode, after linear prediction analysis, an algebraic codebook is applied to the coding of the excitation signal.
- In the TCX coding mode, after linear prediction analysis, transform coding is used for the excitation signal.
- The acoustic codec is a codec suitable for encoding an acoustic signal.
- However, an acoustic codec usually requires a high bit rate to achieve sound quality as stable as that of the speech codec.
- A hybrid codec combines the advantages of the acoustic codec and the speech codec.
- In a hybrid codec, the encoding mode is divided into two systems. One is a frequency domain (FD) encoding mode such as AAC, which corresponds to the acoustic codec. The other is a linear prediction domain (LPD) encoding mode, which corresponds to the speech codec.
- In the FD encoding mode, orthogonal transform encoding such as the AAC-LD encoding mode and the AAC encoding mode is used.
- In the LPD encoding mode, a TCX encoding mode, which is a frequency-domain representation of the LPC (Linear Prediction Coefficient) residual, and an ACELP encoding mode, which is a time-domain representation of the LPC residual, are generally used.
- In a hybrid codec, the encoding mode is switched depending on whether the signal to be encoded is a speech signal or an acoustic signal (see Patent Document 1). Whether the ACELP encoding mode or the TCX encoding mode is used is decided based on, for example, a closed-loop analysis/synthesis technique.
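- As a minimal sketch of a closed-loop decision of this kind (not the selection rule of this document), each candidate mode can be run through an encode and decode round trip, keeping the mode whose reconstruction has the higher signal-to-noise ratio. The function names, the SNR criterion, and the toy quantizers that stand in for ACELP and TCX are all assumptions made for illustration.

```python
import math

def snr_db(original, decoded):
    """Signal-to-noise ratio in dB between a frame and its reconstruction."""
    signal = sum(s * s for s in original)
    noise = sum((s - d) ** 2 for s, d in zip(original, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)

def select_mode_closed_loop(frame, candidates):
    """Return the name of the candidate whose round trip gives the highest SNR.

    candidates maps a mode name to a hypothetical function that encodes and
    decodes the frame and returns the reconstruction.
    """
    best_name, best_snr = None, -math.inf
    for name, round_trip in candidates.items():
        score = snr_db(frame, round_trip(frame))
        if best_name is None or score > best_snr:
            best_name, best_snr = name, score
    return best_name

# Toy stand-ins (assumptions): a coarse and a fine uniform quantizer.
toy_candidates = {
    "ACELP": lambda f: [float(round(s)) for s in f],
    "TCX": lambda f: [round(s * 4) / 4 for s in f],
}
```

- With these toy stand-ins the finer quantizer wins for any frame that the coarse one distorts more; a real encoder would compare actual ACELP and TCX reconstructions, possibly with a perceptually weighted error.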
- AAC-ELD AAC-ELD encoding method
- AAC and AAC-LD AAC-ELD encoding method
- the AAC-ELD encoding scheme has the following characteristics in order to realize a sufficiently low delay.
- The number of samples in one frame of AAC-ELD (frame size N; the same applies hereinafter in this specification) is small: 512 or 480 time-domain samples.
- The analysis and synthesis filter banks are modified to employ a low-delay filter bank. Specifically, a long window of length 4N is used, with more overlap with the past and less overlap with the future (the last N/4 values are actually zero).
- The bit reservoir is minimized, or no bit reservoir is used.
- The temporal noise shaping and long-term prediction functions are adapted to the low-delay frame size.
- low delay analysis and synthesis filter banks are used in AAC-ELD.
- The low-delay filter bank is defined as follows.
- x_n is the windowed input signal (the encoding target).
- The low-delay inverse filter bank of AAC-ELD is defined as follows.
- X_k is a decoded transform coefficient.
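- The defining equations are not reproduced in this text. The sketch below assumes the form commonly given for the AAC-ELD low-delay filter bank: X_k = -2 * sum over n = -2N .. 2N-1 of x_n * cos[(pi/N)(n + n0)(k + 1/2)] for 0 <= k < N, and y_n = -(1/N) * sum over k = 0 .. N-1 of X_k * cos[(pi/N)(n + n0)(k + 1/2)] for 0 <= n < 4N, with n0 = 1/2 - N/2. These formulas and the function names are assumptions and should be checked against the specification.

```python
import math

def ld_analysis(x, n):
    """Assumed low-delay analysis: 4n windowed samples -> n coefficients.

    x[0] corresponds to time index -2n and x[4n - 1] to time index 2n - 1.
    """
    if len(x) != 4 * n:
        raise ValueError("expected 4n windowed samples")
    n0 = 0.5 - n / 2.0
    coeffs = []
    for k in range(n):
        acc = 0.0
        for t in range(-2 * n, 2 * n):
            acc += x[t + 2 * n] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
        coeffs.append(-2.0 * acc)
    return coeffs

def ld_synthesis(coeffs, n):
    """Assumed low-delay synthesis: n coefficients -> 4n samples (before windowing)."""
    if len(coeffs) != n:
        raise ValueError("expected n coefficients")
    n0 = 0.5 - n / 2.0
    out = []
    for t in range(4 * n):
        acc = sum(coeffs[k] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                  for k in range(n))
        out.append(-acc / n)
    return out
```

- Under these assumed formulas the inverse output satisfies y_{n+2N} = -y_n, so the 4N output samples are redundant and still contain time-domain aliasing; the aliasing is removed only by the subsequent windowing and overlap addition.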
- In AAC-ELD, four frames are encoded for each frame. Specifically, when frame i-1 is encoded, an extended frame of length 4N is formed by concatenating frame i-1 with the three preceding frames i-4, i-3, and i-2, and this extended frame is encoded. If the size of one frame is N, the size of the encoded frame is 4N.
- FIG. 1 shows the analysis window (encoder window) in the AAC-ELD encoder, which is denoted w_enc.
- The length of the analysis window is 4N.
- One frame is divided into two subframes.
- For example, frame i-1 is divided and expressed as a vector [a_{i-1}, b_{i-1}].
- The lengths of a_{i-1} and b_{i-1} are each N/2 samples.
- Accordingly, the encoder window of length 4N is divided into eight parts, denoted [w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8] as shown in FIG. 1.
- The extended frame is likewise denoted [a_{i-4}, b_{i-4}, a_{i-3}, b_{i-3}, a_{i-2}, b_{i-2}, a_{i-1}, b_{i-1}].
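- The construction of the extended frame and its division into half-frame vectors can be sketched as follows. The function names and the list-of-lists frame buffer are assumptions made for illustration.

```python
def build_extended_frame(frames, idx, n):
    """Concatenate frames idx-3 .. idx (each n samples) into one 4n-sample block.

    When frame i-1 is being encoded, idx is i-1 and the block covers
    frames i-4, i-3, i-2 and i-1.
    """
    if idx < 3:
        raise ValueError("three preceding frames are required")
    extended = []
    for j in range(idx - 3, idx + 1):
        if len(frames[j]) != n:
            raise ValueError("every frame must hold n samples")
        extended.extend(frames[j])
    return extended

def split_half_frames(extended, n):
    """Split a 4n-sample block into eight vectors of n/2 samples: [a, b, a, b, ...]."""
    if n % 2 != 0 or len(extended) != 4 * n:
        raise ValueError("expected 4n samples with even n")
    half = n // 2
    return [extended[k:k + half] for k in range(0, 4 * n, half)]
```

- The eight half-frame vectors line up one to one with the eight window parts w_1 to w_8, so windowing the extended frame is an element-wise product of matching parts.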
- The low-delay filter bank defined by equation (1) is used to transform the windowed signal x_n.
- Transformed spectral coefficients of frame size N are generated from the windowed signal x_n of frame size 4N.
- The basic algorithm of the low-delay filter bank is the same as that of the MDCT (Modified Discrete Cosine Transform).
- The MDCT is a transform based on the DCT-IV (see the non-patent literature).
- DCT-IV has the following even / odd boundary conditions.
- the signal of frame i-1 converted by the low delay filter bank using these boundary conditions is expressed as follows in DCT-IV.
- (a_{i-4} w_1)_R, (a_{i-2} w_5)_R, (b_{i-3} w_4)_R, and (b_{i-1} w_8)_R denote the vectors a_{i-4} w_1, a_{i-2} w_5, b_{i-3} w_4, and b_{i-1} w_8 in reverse order, respectively.
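- The reverse-order notation can be illustrated with a small helper, assuming that a product such as a_{i-4} w_1 means the element-wise product of a half-frame vector and a window part. The function name is hypothetical.

```python
def windowed_reversed(segment, window_part):
    """Return (segment * window_part)_R: the element-wise product in reverse order."""
    if len(segment) != len(window_part):
        raise ValueError("segment and window part must have equal length")
    product = [s * w for s, w in zip(segment, window_part)]
    return product[::-1]
```

- For example, with segment [1.0, 2.0, 3.0] and window part [0.5, 1.0, 2.0], the product is [0.5, 2.0, 6.0] and the reversed vector is [6.0, 2.0, 0.5].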
- FIG. 2 is a diagram showing a decoding process in the AAC-ELD decoder.
- the length (frame size) of the output signal after decoding is 4N.
- the inverse transform signal for frame i-1 is as follows.
- FIG. 3 shows the synthesis window in the AAC-ELD decoder, which is denoted w_dec.
- The synthesis window is the analysis window of the AAC-ELD encoder reversed as it is. As with the analysis window in the AAC-ELD encoder, the synthesis window is divided into eight parts for convenience, as shown in FIG.
- The synthesis window is expressed in the form of a vector as follows.
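- The relation between the two windows can be sketched as below: if the synthesis window is the time-reversed analysis window, its j-th part equals the reversed (9-j)-th part of the analysis window, which matches the w_{R,8} notation used for the first part of the synthesis window. The function names are assumptions, and the integer ramp in the usage note is only a placeholder, not an actual AAC-ELD window.

```python
def split_into_eight(window):
    """Split a window of length 4N into eight equal parts."""
    if len(window) % 8 != 0:
        raise ValueError("window length must be a multiple of 8")
    seg = len(window) // 8
    return [window[j * seg:(j + 1) * seg] for j in range(8)]

def synthesis_window(analysis_window):
    """Assumed relation: the synthesis window is the analysis window reversed in time."""
    return analysis_window[::-1]
```

- For a placeholder window list(range(16)), the first synthesis part is [15, 14], which is the last analysis part [14, 15] in reverse order.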
- The decoding target frame i is decoded in order to reconstruct the signal [a_{i-1}, b_{i-1}] of frame i-1. That is, the overlap addition process is performed using the windowed inverse-transformed signals of frame i and of the three frames preceding frame i. Therefore, the overlap addition process shown in FIG. 2 is expressed by the following equation.
- the length of the reconstructed signal is N.
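- The four-frame overlap addition can be sketched as follows. The equation itself is not reproduced in this text, so the segment alignment used here, out_{i,n} = z_{i,n} + z_{i-1,n+N} + z_{i-2,n+2N} + z_{i-3,n+3N} for 0 <= n < N, where z denotes a windowed inverse-transformed signal of length 4N, is an assumption based on the usual AAC-ELD formulation. The function name is hypothetical.

```python
def overlap_add_four(z_frames, n):
    """Overlap-add four windowed inverse-transform outputs into n samples.

    z_frames is [z_{i-3}, z_{i-2}, z_{i-1}, z_i], each of length 4n.
    The newest frame contributes its first n samples, the previous one
    its second n samples, and so on (assumed alignment).
    """
    if len(z_frames) != 4 or any(len(z) != 4 * n for z in z_frames):
        raise ValueError("expected four signals of length 4n")
    out = [0.0] * n
    for age, z in enumerate(reversed(z_frames)):  # age 0 is frame i
        for m in range(n):
            out[m] += z[m + age * n]
    return out
```

- Each output sample is therefore the sum of four contributions, one from each of the four overlapping extended frames.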
- The signal [a_{i-1}, b_{i-1}] of frame i-1 is reconstructed by the overlap addition process owing to the following window characteristics.
- FIG. 4 is a diagram showing the amount of delay in AAC-ELD encoding / decoding processing. In FIG. 4, it is assumed that the encoding process for frame i-1 is started at time t.
- The portion corresponding to the last N/4 samples of the analysis window part w_8 in the AAC-ELD encoder is zero. Therefore, as shown in FIG. 4, at time t + 3N/4 samples, x_{i-1} is ready for the MDCT, and the IMDCT-transformed signal y_{i-1} is obtained.
- Window processing and overlap addition processing are applied to y_{i-1} and y_i to obtain out_{i,n}.
- The portion corresponding to the first N/4 samples of the synthesis window part w_{R,8} in the AAC-ELD decoder is zero.
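- As a small worked example of the timing above: because the last N/4 values of the analysis window are zero, only the first 3N/4 samples of the newest frame are needed before the transform can start. The helper name is hypothetical.

```python
def samples_until_transform_ready(frame_size):
    """Samples of the newest frame needed before the transform can start (3N/4)."""
    if frame_size % 4 != 0:
        raise ValueError("frame size must be a multiple of 4")
    return frame_size - frame_size // 4
```

- For the two AAC-ELD frame sizes this gives 384 samples for N = 512 and 360 samples for N = 480.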
- In AAC-ELD, the MDCT is performed on four consecutive frames, and the four frames are subjected to overlap addition processing as shown in FIG.
- The MDCT is also used in the TCX encoding mode. However, in the TCX encoding mode, one or more blocks exist in one frame, and the MDCT is performed on consecutive blocks. Consecutive blocks overlap so that the second half of one block coincides with the first half of the next block.
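- The half-block overlap described above is what cancels the aliasing of the MDCT. The sketch below demonstrates this with a plain MDCT and a sine window; it is a textbook illustration, not the TCX or AAC-ELD implementation, and the window choice, normalization, and function names are assumptions.

```python
import math

def sine_window(n):
    """Sine window of length 2n; satisfies w[j]**2 + w[j + n]**2 == 1."""
    return [math.sin(math.pi * (j + 0.5) / (2 * n)) for j in range(2 * n)]

def mdct(block, n):
    """MDCT of a 2n-sample block -> n coefficients."""
    n0 = 0.5 + n / 2.0
    return [sum(block[j] * math.cos(math.pi / n * (j + n0) * (k + 0.5))
                for j in range(2 * n)) for k in range(n)]

def imdct(coeffs, n):
    """Inverse MDCT: n coefficients -> 2n samples that still contain aliasing."""
    n0 = 0.5 + n / 2.0
    return [(2.0 / n) * sum(coeffs[k] * math.cos(math.pi / n * (j + n0) * (k + 0.5))
                            for k in range(n)) for j in range(2 * n)]

def overlap_add_roundtrip(signal, n):
    """Window, transform, inverse-transform and overlap-add blocks with hop size n."""
    w = sine_window(n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + j] * w[j] for j in range(2 * n)]
        y = imdct(mdct(block, n), n)
        for j in range(2 * n):
            out[start + j] += y[j] * w[j]
    return out
```

- Samples covered by two overlapping blocks are reconstructed exactly, while the first and last n samples, which have only one block, keep their aliasing. A transition frame is in the same position: its overlap partner was coded in a different mode, so the aliasing term is not cancelled.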
- Because decoding uses the preceding and subsequent frames in the overlap addition process as described above, aliasing occurs when the transition frame is decoded. The transition frame is the first frame after the coding mode is switched from the LPD encoding mode to AAC-ELD, or from AAC-ELD to the LPD encoding mode.
- FIG. 5 is a diagram for explaining the transition frame.
- Frame i in FIG. 5 is a transition frame.
- When mode 1 is AAC-ELD and mode 2 is an LPD encoding mode, aliasing occurs when frame i is decoded.
- Likewise, when mode 1 is an LPD encoding mode and mode 2 is AAC-ELD, aliasing occurs when frame i is decoded.
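- Locating transition frames in a sequence of per-frame coding modes can be sketched as follows. The mode labels and the function name are assumptions made for illustration.

```python
def transition_frames(modes):
    """Return the indices of frames whose coding mode differs from the previous frame."""
    return [i for i in range(1, len(modes)) if modes[i] != modes[i - 1]]
```

- For the mode sequence ["FD", "FD", "LPD", "LPD", "FD"], frames 2 and 4 are transition frames; these are the frames that need aliasing handling when decoded.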
- Since the method described in Patent Document 1 does not support an encoding method that requires overlap processing using a plurality of preceding frames, such as AAC-ELD, it cannot reduce the aliasing that occurs.
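- A sound signal hybrid decoder decodes a bitstream that includes an acoustic frame encoded by an acoustic encoding process using a low-delay filter bank and an audio frame encoded by an audio encoding process using a linear prediction coefficient.
- The sound signal hybrid decoder includes: a low-delay transform decoder that decodes the acoustic frame using a low-delay inverse filter bank process; an audio signal decoder that decodes the audio frame; and a block switching unit that performs control so that the decoding target frame is decoded by the low-delay transform decoder when it is the acoustic frame, and by the audio signal decoder when it is the audio frame.
- When the decoding target frame is the i-th frame, which is the first audio frame after switching from the acoustic frame to the audio frame, the i-th frame includes, in an encoded state, the first signal generated using the signal before encoding of the (i-1)-th frame, which is the frame preceding the i-th frame by one frame.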
- the block switching unit includes (1) the i th Reconstruction of the i-3 frame, which is a frame 3 frames ahead of the i frame, obtained by decoding the i-2 frame, which is 2 frames ahead of the frame, by the low delay transform decoder
- the signal corresponding to the first half of the second signal frame, which is a windowed signal of the processed signal corresponds to the second half of the second signal frame.
- the signal obtained by adding the signal of the first half of the third signal frame, which is the portion corresponding to the i-3 frame of the signal obtained by subjecting the i-1 frame to the low delay inverse filter bank processing and the window processing, is encoded.
- a signal corresponding to the first half of the frame of the second signal is generated and a signal corresponding to the first half of the frame of the second signal is convolved with a signal corresponding to the second half of the frame of the second signal.
- a process of adding a signal obtained by performing window processing by adding the processed signals, a signal obtained by performing convolution processing and window processing on the first signal, and a signal corresponding to the second half of the frame of the third signal To generate a signal corresponding to the second half of the i-1 frame before encoding, or (2) a signal corresponding to the first half of the second signal frame is converted into a signal corresponding to the first half of the frame of the second signal.
- the signal obtained by performing the window processing by adding the signal obtained by convolution processing the signal corresponding to the second half portion, the signal obtained by performing the convolution processing and the window processing on the first signal
- a signal corresponding to the first half of the frame of the i-1th frame before encoding to generate a signal corresponding to the second half of the frame of the second signal.
- a signal obtained by performing window processing by adding a signal obtained by convolving a signal corresponding to the first half portion of the signal frame, a signal obtained by performing window processing on the first signal, and a second half portion of the frame of the third signal. Add the corresponding signal Processes performed and generates a signal corresponding to the second half of the first i-1 frame before encoding that.
- the block switching unit performs the process shown in FIG. 12A. This can reduce aliasing that occurs when the first frame whose coding mode is switched from the FD coding mode to the LPD coding mode is decoded. Therefore, seamless switching between the FD decoding technique and the LPD decoding technique is realized.
- an acoustic frame encoded by an acoustic encoding process using a low delay filter bank and an audio frame encoded by an audio encoding process using a linear prediction coefficient are included.
- the target frame is the acoustic frame
- the decoding target frame is decoded by the low-delay transform decoder, and when the decoding target frame is the audio frame, the decoding target frame is controlled by the audio signal decoder.
- a block switching unit for performing the block switching unit When the decoding target frame is the i-th frame that is the first acoustic frame that is switched from the audio frame to the acoustic frame, the i-th frame that is one frame preceding the i-th frame is used as the audio signal.
- a signal obtained by performing window processing on the signal obtained by decoding by the decoder is added to a signal obtained by convolution processing of the fourth signal, and the fifth signal subjected to window processing is preceded by 3 frames before the i-th frame.
- a signal obtained by convolution processing of the sixth signal is added to a sixth signal obtained by performing window processing on a signal obtained by decoding the i-3th frame which is a frame to be decoded by the audio signal decoder, and the window processing is performed.
- the eighth signal may generate the reconstructed signal which is the signal corresponding to the first i-1 frame before encoding by performing a process of adding.
- the block switching unit performs the processing shown in FIGS. 20A and 20B. Thereby, it is possible to reduce aliasing that occurs when the first frame whose coding mode is switched from the LPD coding mode to the FD coding mode is decoded. Therefore, seamless switching between the FD decoding technique and the LPD decoding technique is realized.
- the block switching unit may convert the i + 1 frame to the low-delay inverse filter when the decoding target frame is an i + 1 frame that is a frame after the i-th frame.
- a ninth signal that is a portion corresponding to the i-2 frame, which is a frame that precedes the i frame by 2 frames, and the low delay inverse filter
- the tenth signal corresponding to the i-2th frame of the banked and windowed signals and the eleventh signal obtained by decoding the i-2 frame by the audio signal decoder are the first signal
- a signal corresponding to the first half of the frame of the signal subjected to the window processing is added to the second half of the frame of the signal obtained by performing the first window processing on the eleventh signal.
- the signal obtained by adding the signal subjected to the convolution process to the signal corresponding to is connected to the signal obtained by performing the convolution process on the twelfth signal, and the thirteenth signal subjected to the window process and the eleventh signal.
- the signal corresponding to the first half of the frame of the signal subjected to the second window processing different from the first window processing corresponds to the second half of the frame of the signal subjected to the second window processing on the eleventh signal.
- the signal obtained by adding the convolution processed signal to the signal to be added is connected to the signal obtained by convolution processing of the 14th signal and the signal inverted in sign, and the 15th signal subjected to window processing is added.
- the signal corresponding to the i-th frame before encoding may be generated by performing the above process.
- the block switching unit performs the processing shown in FIG. As a result, it is possible to reduce aliasing that occurs when a frame one frame after the first frame whose coding mode is switched from the LPD coding mode to the FD coding mode is decoded.
- the block switching unit may convert the i + 2 frame to the low-delay inverse filter bank when the decoding target frame is an i + 2 frame that is a frame after the i-th frame.
- 16th signal corresponding to the i-1th frame of the processed and windowed signal and the i-1th frame of the low delay inverse filter bank processing and windowed signal of the i + 1th frame A signal corresponding to the i-1th frame of the signal obtained by subjecting the i-th frame to the low delay inverse filter bank processing and the window processing, and the i-3th frame.
- the 20th signal obtained by adding the signal subjected to the convolution processing to the signal corresponding to the latter half of the frame of the signal subjected to the window processing on the 19th signal is connected to the signal obtained by convolving the 20th signal
- the 21st signal subjected to window processing and the signal corresponding to the first half of the frame of the signal subjected to window processing on the reconstructed signal correspond to the second half of the frame of the signal subjected to window processing on the reconstructed signal
- the signal obtained by adding the convolution-processed signal to the signal to be added is connected to the signal obtained by convolution-processing the 22nd signal and inverted in sign, and the window-processed 23rd signal is added.
- the signal corresponding to the (i + 1) th frame before encoding may be generated by performing the above process.
- the block switching unit performs the process shown in FIG. As a result, it is possible to reduce aliasing that occurs when a frame two frames after the first frame whose coding mode is switched from the LPD coding mode to the FD coding mode is decoded.
- an acoustic frame encoded by an acoustic encoding process using a low delay filter bank and an audio frame encoded by an audio encoding process using a linear prediction coefficient are included.
- TCX: Transform Coded Excitation
- the decoding target frame of the TCX decoder and the bitstream is the acoustic frame
- the decoding target frame is decoded by the low-delay transform decoder
- the decoding target frame is the audio signal
- the decoding target frame is the first audio frame that is switched from the acoustic frame to the audio frame, and is a frame in which a transient signal is encoded.
- the i-th frame is encoded with the first signal generated by using the signal before the encoding of the (i-1) -th frame, which is a frame preceding the i-th frame.
- the block switching unit is (1) obtained by decoding the i-2 frame that is two frames ahead of the i frame by the low delay transform decoder.
- the frame of the second signal which is a signal obtained by performing window processing on the reconstructed signal of the i-3th frame which is a frame three frames ahead of the i frame
- a signal obtained by performing window processing by adding a signal corresponding to the second half of the frame of the second signal to a signal corresponding to the first half is subjected to window processing, and the i-th frame is decoded by the audio signal decoder.
- a signal corresponding to the first half of the i-1 frame before encoding is generated by adding the signal of the first half of the frame of the third signal to the second half of the frame of the second signal.
- a signal obtained by performing window processing by adding a signal obtained by convolving a signal corresponding to the first half of the frame of the second signal to a corresponding signal, and a signal obtained by performing convolution processing and window processing on the first signal.
- a signal corresponding to the second half of the frame of the third signal to generate a signal corresponding to the second half of the i-1 frame before encoding, or
- the signal corresponding to the first half of the second signal frame is added to the signal corresponding to the second half of the second signal frame and the signal is subjected to window processing, and the signal is convolved with the first signal.
- a signal corresponding to the first half of the i-1 frame before encoding by performing a process of adding the signal subjected to the processing and window processing and the signal corresponding to the first half of the frame of the third signal.
- the signal corresponding to the latter half of the third signal frame may generate a signal corresponding to the second half of the first i-1 frame before processing performed encoding for adding.
- the block switching unit performs the process shown in FIG. 12A when decoding an encoded signal in which a transient signal (transient frame) occurs in the FD encoding mode. This improves the sound quality obtained when the transient frame is decoded.
- the low-delay transform decoder performs low-delay inverse filter bank processing and window processing for each of the acoustic frame and three frames that precede the acoustic frame in time.
- An AAC-ELD (Advanced Audio Coding-Enhanced Low Delay) decoder that decodes the sound frame by performing overlap addition processing on each of the signals may be used.
- the audio signal decoder may be an ACELP decoder that decodes the audio frame encoded using an ACELP (Algebraic Code Excited Linear Prediction) coefficient.
- the audio signal decoder may be a TCX decoder that decodes the audio frame encoded by the TCX method.
- the apparatus may further include a synthesis error compensation device that decodes synthesis error information encoded together with the decoding target frame. The synthesis error information represents the difference between the signal before it was encoded into the bitstream and the signal obtained by decoding the bitstream. The synthesis error compensation device may use the decoded synthesis error information to correct the signal of the (i-1)-th frame before encoding, the signal of the i-th frame before encoding, or the signal of the (i+1)-th frame before encoding, each generated by the block switching unit.
- the sound signal hybrid encoder analyzes a sound characteristic of the sound signal, and determines whether a frame included in the sound signal is an acoustic signal or an audio signal;
- a low-delay transform encoder that encodes the frame using a low-delay filter bank, an audio signal encoder that encodes the frame by calculating a linear prediction coefficient of the frame, and the signal classification unit is the acoustic signal
- Block switching for performing control for encoding the encoding target frame determined to be present by the low-delay transform encoder and encoding the encoding target frame determined by the signal classification unit as the speech signal by the speech signal encoder
- the block switching unit includes: (1) the frame to be encoded, and the signal classification unit is the sound.
- a signal obtained by adding the convolution-processed signal and the i-th frame are encoded by the audio signal encoder.
- the block switching unit performs the processing shown in FIG. 7 and FIG. 8A. This can reduce aliasing that occurs when the first frame whose coding mode is switched from the FD coding mode to the LPD coding mode is decoded. Therefore, seamless switching between the FD decoding technique and the LPD decoding technique is realized.
- a signal classification unit that analyzes an acoustic characteristic of a sound signal and determines whether a frame included in the sound signal is an acoustic signal or an audio signal, and a low delay filter bank are used.
- a low-delay transform encoder that encodes the frame
- a TCX encoder that encodes the frame using a TCX method in which a residual of a linear prediction coefficient of the frame is processed by MDCT (Modified Discrete Cosine Transform)
- when the i-th frame, which is the encoding target frame, is a frame that the signal classification unit determines to be the acoustic signal and is a transient signal whose energy changes abruptly, the block switching unit may cause the audio signal encoder to encode, together with the i-th frame, either (1) a signal obtained by adding, to a windowed signal corresponding to the first half of the (i-1)-th frame (the frame one frame before the i-th frame), a signal obtained by applying window processing and then convolution processing to the signal corresponding to the second half of the (i-1)-th frame, or (2) a signal obtained by adding, to a windowed signal corresponding to the second half of the (i-1)-th frame, a signal obtained by applying window processing and then convolution processing to the signal corresponding to the first half of the (i-1)-th frame.
- the block switching unit performs the processing shown in FIGS. 7 and 8A during encoding when a transient signal (transient frame) occurs in the FD encoding mode. This improves the sound quality obtained when the transient frame is decoded.
- the low-delay transform encoder performs window processing and low-delay filter bank processing on an extended frame obtained by connecting the frame and three frames that precede the frame in time.
- it may be an AAC-ELD encoder that encodes the frame.
- the audio signal encoder may be an ACELP encoder that encodes the frame by generating ACELP coefficients.
- the speech signal encoder may be a TCX encoder that encodes the frame by performing MDCT processing on the residual of the linear prediction coefficient.
- a local decoder that decodes the encoded sound signal, and a local encoder that encodes synthesis error information, which is the difference between the sound signal and the sound signal decoded by the local decoder, may be provided.
- Transition from FD encoding mode to ACELP encoding mode (Embodiment 1)
- Transition from ACELP encoding mode to FD encoding mode (Embodiment 2)
- Transition from FD encoding mode to TCX encoding mode (Embodiment 3)
- Transition from TCX encoding mode to FD encoding mode (Embodiment 4)
- Transition from FD encoding mode to transient signal encoding mode (Embodiment 5)
- FIG. 6 is a block diagram showing a configuration of the sound signal hybrid encoder according to the first embodiment.
- the sound signal hybrid encoder 500 includes a high-frequency encoder 501, a block switching unit 502, a signal classification unit 503, an ACELP encoder 504, an FD encoder 505, and a bit multiplexer 506.
- the input signal is transmitted to the high frequency encoder 501 and the signal classification unit 503, respectively.
- the high-frequency encoder 501 generates a high-frequency parameter that is a signal obtained by extracting a high-frequency band from an input signal and a low-frequency signal that is a signal obtained by extracting a low-frequency band from the input signal.
- the high frequency parameter is transmitted to the bit multiplexer 506.
- the low frequency signal is transmitted to the block switching unit 502.
- the signal classification unit 503 analyzes the acoustic characteristics of the low-frequency signal and determines, every N samples (for each frame), whether the low-frequency signal is an acoustic signal or an audio (speech) signal. Specifically, the signal classification unit 503 calculates the spectrum intensity of the band of 3 kHz or more of the frame and the spectrum intensity of the band of 3 kHz or less of the frame. When the spectrum intensity of the band of 3 kHz or less is larger than that of the other band, the signal classification unit 503 determines that the frame is a signal mainly composed of an audio signal, that is, an audio signal, and transmits a mode index indicating the determination result to the block switching unit 502 and the bit multiplexer 506.
- otherwise, the signal classification unit 503 determines that the frame is a signal mainly composed of an acoustic signal, that is, an acoustic signal, sets the mode index accordingly, and transmits it to the block switching unit 502 and the bit multiplexer 506.
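- the band comparison performed by the signal classification unit can be sketched as follows. This is a minimal illustration under stated assumptions, not the classifier of the embodiment: "spectrum intensity" is taken to be the summed squared DFT magnitude, the 3 kHz bin is counted in the lower band, and the function name `classify_frame`, the `sample_rate` parameter, and the string labels used as the mode index are all hypothetical.

```python
import cmath
import math


def classify_frame(frame, sample_rate, split_hz=3000.0):
    """Return "audio" (speech-like) or "acoustic" for one frame of samples.

    Assumption: spectrum intensity = sum of squared DFT magnitudes per band.
    """
    n = len(frame)
    low = 0.0   # intensity of the band at or below split_hz
    high = 0.0  # intensity of the band above split_hz
    for k in range(1, n // 2):  # positive-frequency bins, DC skipped
        coeff = sum(x * cmath.exp(-2j * math.pi * k * m / n)
                    for m, x in enumerate(frame))
        power = abs(coeff) ** 2
        if k * sample_rate / n <= split_hz:
            low += power
        else:
            high += power
    # The lower band dominating is treated as an audio (speech) frame.
    return "audio" if low > high else "acoustic"
```

A naive DFT is used only to keep the sketch dependency-free; a practical classifier would use an FFT and most likely additional features.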
- the block switching unit 502 performs switching control so that the FD encoder 505 encodes frames whose mode index indicates an acoustic signal and the ACELP encoder 504 encodes frames whose mode index indicates an audio signal. That is, the block switching unit 502 transmits the low-frequency signal received from the high-frequency encoder 501 to the FD encoder 505 or the ACELP encoder 504 frame by frame according to the mode index.
- the FD encoder 505 encodes the frame in the AAC-ELD encoding mode based on the control of the block switching unit 502, and transmits the FD transform coefficient generated by the encoding to the bit multiplexer 506.
- the ACELP encoder 504 encodes the frame in the ACELP encoding mode based on the control of the block switching unit 502, and transmits the ACELP coefficient generated by the encoding to the bit multiplexer 506.
- the bit multiplexer 506 generates a bit stream that combines the coding mode index, the high-band parameter, the FD conversion coefficient, and the ACELP coefficient.
- the sound signal hybrid encoder 500 may include a storage unit that temporarily stores frames (signals).
- FIG. 7 is a diagram illustrating an encoded frame when the encoding mode is switched from the FD encoding mode to the ACELP encoding mode.
- when the frame i is encoded, the block switching unit 502 encodes a signal to which the component X, generated from the signal [a_{i-1}, b_{i-1}] of the preceding frame i-1, has been added. Specifically, the block switching unit 502 generates an extended frame that combines the component X and the signal [a_i, b_i] of the frame i.
- the extended frame has a length of (N + N/2).
- the extended frame is transmitted to the ACELP encoder 504 by the block switching unit 502 and encoded in the ACELP encoding mode.
- the component X is specifically generated as follows.
- FIG. 8A is a diagram illustrating an example of a method for generating the component X.
- FIG. 8B is a flowchart of a method for generating component X.
- the component a_{i-1}w_5 is obtained by applying the window w_5 to the input part a_{i-1}, which is the first half of the signal of the frame i-1 (S101 in FIG. 8B).
- b_{i-1}w_6 is obtained by applying the window w_6 to the input part b_{i-1}, which is the latter half of the signal of the frame i-1 (S102 in FIG. 8B).
- convolution processing is further applied to b_{i-1}w_6 (S103 in FIG. 8B).
- here, convolution processing (folding) of a signal means that the samples constituting a signal vector are rearranged in reverse time order for each signal vector.
- the obtained component X is used for decoding together with a plurality of preceding frames in the decoder. Thereby, the signal [a_{i-1}, b_{i-1}] of the frame i-1 is appropriately reconstructed.
- in the above, the convolution processing is applied to b_{i-1}w_6, but it may instead be applied to a_{i-1}w_5. That is, the component X may be (a_{i-1}w_5)^R + b_{i-1}w_6.
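- the generation of the component X and of the extended frame can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the windows w_5 and w_6 are passed in as plain lists because their values are not given here, and placing the component X in front of [a_i, b_i] in the extended frame is an assumption.

```python
def fold(vector):
    """Convolution processing in the sense used here: reverse the sample order."""
    return vector[::-1]


def component_x(a_prev, b_prev, w5, w6, fold_first_half=False):
    """Build component X from the two halves of frame i-1.

    Default:          X = a_{i-1} w_5 + (b_{i-1} w_6)^R
    fold_first_half:  X = (a_{i-1} w_5)^R + b_{i-1} w_6
    """
    aw = [a * w for a, w in zip(a_prev, w5)]  # windowing of the first half (S101)
    bw = [b * w for b, w in zip(b_prev, w6)]  # windowing of the second half (S102)
    if fold_first_half:
        aw = fold(aw)
    else:
        bw = fold(bw)                         # folding (S103)
    return [p + q for p, q in zip(aw, bw)]


def extended_frame(x, frame_i):
    """Concatenate X (N/2 samples) and frame i (N samples): N + N/2 samples."""
    return list(x) + list(frame_i)
```

With half-frames of N/2 samples each, `extended_frame` yields N + N/2 samples, matching the extended-frame length stated above.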
- the sound signal hybrid encoder 500 may further include a TCX encoder 507.
- the TCX encoder 507 encodes the frame in the TCX encoding mode based on the control of the block switching unit 502, and transmits the TCX coefficient generated by the encoding to the bit multiplexer 506.
- FIG. 10 is a block diagram showing a configuration of the sound signal hybrid decoder according to the first embodiment.
- the sound signal hybrid decoder 900 includes a demultiplexer 901, an FD decoder 902, an ACELP decoder 903, a block switching unit 904, and a high frequency decoder 905.
- the demultiplexer 901 demultiplexes the bit stream. Specifically, the demultiplexer 901 divides the bit stream into a mode indicator, a high band parameter, and an encoded signal.
- the mode index is transmitted to the block switching unit 904, the high-frequency parameter is transmitted to the high-frequency decoder 905, and the encoded signals (FD conversion coefficient and ACELP coefficient) are transmitted to the FD decoder 902 or the ACELP decoder 903, whichever corresponds to each frame.
- the FD decoder 902 generates an FD inverse conversion signal from the FD conversion coefficient by the AAC-ELD decoding process described with reference to FIG. That is, the FD decoder 902 decodes a frame encoded by the FD encoding mode.
- the ACELP decoder 903 generates an ACELP composite signal from the ACELP coefficient by ACELP decoding processing. That is, the ACELP decoder 903 decodes a frame encoded by the ACELP encoding mode.
- the FD inverse conversion signal and the ACELP composite signal are transmitted to the block switching unit 904.
- the block switching unit 904 has the FD decoder 902 decode frames whose mode index indicates an acoustic signal and receives the FD inverse conversion signal, and has the ACELP decoder 903 decode frames whose mode index indicates an audio signal and receives the ACELP composite signal.
- the high frequency decoder 905 reconstructs the input signal using the high frequency parameter transmitted from the demultiplexer and the low frequency band time domain signal transmitted from the block switching unit 904.
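- the per-frame routing by mode index described above can be sketched as follows. This is only a hedged illustration: the decoders are passed in as hypothetical callables, the string values of the mode index are assumptions, and the aliasing handling that the block switching unit performs at mode transitions is not modeled.

```python
def decode_frames(frames, fd_decode, acelp_decode):
    """Route each (mode_index, payload) pair to the matching decoder.

    fd_decode and acelp_decode are hypothetical stand-ins for the
    FD decoder 902 and the ACELP decoder 903.
    """
    outputs = []
    for mode_index, payload in frames:
        if mode_index == "acoustic":
            outputs.append(fd_decode(payload))      # FD inverse conversion signal
        elif mode_index == "audio":
            outputs.append(acelp_decode(payload))   # ACELP composite signal
        else:
            raise ValueError(f"unknown mode index: {mode_index!r}")
    return outputs
```

For example, `decode_frames([("acoustic", c0), ("audio", c1)], fd, acelp)` returns the FD output for the first frame followed by the ACELP output for the second.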
- the sound signal hybrid decoder 900 may include a storage unit that temporarily stores frames (signals).
- switching control (decoding method) of the block switching unit 904 when a signal encoded in the FD encoding mode is switched to a signal encoded in the ACELP encoding mode will be described.
- FIG. 11 is a schematic diagram showing switching control (decoding method) of the block switching unit 904 when a signal encoded in the FD encoding mode is switched to a signal encoded in the ACELP encoding mode.
- the frame i-1 is a frame encoded by the FD encoding mode
- the frame i that is a decoding target frame is a frame encoded by the ACELP encoding mode.
- when frames encoded in the FD encoding mode are consecutive, decoding the decoding target frame i reconstructs the signal of the frame i-1. That is, in the case shown in FIG. 11, the signal of frame i-2 can be reconstructed by a normal FD decoding process.
- however, since the decoding target frame i is encoded in the ACELP encoding mode, an unnatural sound due to an aliasing component is generated in the signal of the frame i-1. That is, the signal of frame i-1 becomes an aliasing portion as shown in FIG.
- the block switching unit 904 performs a decoding process using the following three signals.
- the component X signal (first signal) of the ACELP composite signal obtained by performing the ACELP decoding process on the decoding target frame i is used to reconstruct the signal of the frame i-1 with the aliasing component reduced.
- This signal is indicated as subframe 1001 in FIG. 11, and is the component X described with reference to FIG. 8A.
- the decoding target frame i is a frame having a length of 3N/2 encoded in the ACELP encoding mode. The ACELP composite signal obtained by performing the ACELP decoding process on the frame i is represented as y^{acelp}_{i,n}.
- the extended portion corresponding to the component X is as follows.
- the component X is specifically a_{i-1}w_5 + (b_{i-1}w_6)^R.
- the signal (third signal) of the portion corresponding to the frame i-3 in the windowed signal is used to reconstruct the signal of the frame i-1 with the aliasing component reduced. This signal is shown as subframe 1002 and subframe 1003 in FIG.
- this signal is obtained by inversely transforming the frame i-1 as a normal frame with a length of 4N by the AAC-ELD low delay filter bank, and further performing window processing.
- the inverse transform signal is
- the signal corresponding to the frame i-3 (two aliasing portions indicated as subframe 1002 and subframe 1003 in FIG. 11) is extracted from the inversely transformed signal as follows. That is,
- the signal of frame i-3 is shown as subframe 1004 and subframe 1005 in FIG.
- the signal a_{i-1}w_5 + (b_{i-1}w_6)^R indicated as subframe 1001 in FIG. 11, the signal [c_{-3}]_{i-1} indicated as subframe 1002, the signal [d_{-3}]_{i-1} indicated as subframe 1003, and the signals [a_{i-3}, b_{i-3}] indicated as subframes 1004 and 1005 are used to reconstruct the signal of the frame i-1 with reduced aliasing components.
- FIG. 12A is a diagram illustrating a method of reconstructing a_{i-1}, which is the first half sample portion of the signal of frame i-1.
- FIG. 12B is a flowchart of a method for reconstructing a_{i-1}, which is the first half sample portion of the signal of frame i-1.
- a_{i-3}w_3 is obtained by applying the window w_3 to a_{i-3}, which is the subframe 1004 (the first half of the frame of the second signal) (S201 in FIG. 12B).
- b_{i-3}w_4 is obtained, and convolution processing is further applied to it, so that (b_{i-3}w_4)^R, which is b_{i-3}w_4 in reverse order, is obtained (S202 in FIG. 12B).
- window processing is applied to the signal obtained by adding a_{i-3}w_3 and (b_{i-3}w_4)^R, so that a_{i-3}w_3w_{R,6} - (b_{i-3}w_4)^R w_{R,6} is obtained (S203 in FIG. 12B).
- the composite window w_{R,8} is applied to a_{i-1}w_5 + (b_{i-1}w_6)^R, which is the subframe 1001 (component X, first signal), and a_{i-1}w_5w_{R,8} + (b_{i-1}w_6)^R w_{R,8} is obtained (S204 in FIG. 12B).
- the subframe 1002 (the first half of the third signal frame) which is an inversely converted signal is
- subframe 1101, which is the first half of the signal of frame i-1 with reduced aliasing components, is obtained.
- FIG. 12A is a diagram illustrating a method of reconstructing b_{i-1}, which is the second half sample portion of the signal of frame i-1.
- b_{i-1}, which is the second half sample portion of the signal of frame i-1.
- subframe 1102, which is the latter half of the signal of frame i-1 with reduced aliasing components, is obtained.
- the signal [a_{i-1}, b_{i-1}] of frame i-1 is obtained by concatenating the subframe 1101 and the subframe 1102.
- window processing is applied to the subframe 1001 shown in FIG. 12A (a), and convolution processing and window processing are applied to the subframe 1001 shown in FIG. 12A (b).
- This is the processing used when the component X is expressed as a_{i-1}w_5 + (b_{i-1}w_6)^R, as described above.
- the component X is (a_{i-1}w_5)^R + b_{i-1}w_6
- convolution processing and window processing are applied to the subframe 1001 shown in FIG.
- Window processing is applied to the subframe 1001 shown in FIG.
- FIG. 13 is a diagram showing a delay amount of the encoding / decoding process according to the first embodiment. In FIG. 13, it is assumed that the encoding process for frame i-1 is started at time t.
- the IMDCT-transformed output of frame i-1 is obtained at time t + 3N/4 samples due to the characteristics of the window of the low delay filter bank in AAC-ELD. That is, subframes 1002 and 1003 are obtained at time t + 3N/4 samples.
- since the subframe 1004 and the subframe 1005 are signals reconstructed by decoding the preceding frame, they have already been acquired.
- the ACELP composite signal of frame i is obtained at time t + 2N samples. That is, subframe 1001 (component X) is obtained at time t + 2N samples.
- since the synthesis window w_{R,8}, in which the portion corresponding to the first N/4 samples is zero, is applied to the subframe 1001, sound output can be started N/4 samples before the subframe 1001 is completely acquired.
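- the timing figures above can be worked through with a small calculation. This is a sketch under stated assumptions: the function name and dictionary keys are hypothetical, and the frame length N = 512 in the usage note is only an example value, not taken from this description.

```python
def embodiment1_timeline(n, t=0):
    """Sample times at which each decoder input becomes available.

    n is the frame length N; t is the time at which encoding of
    frame i-1 starts.
    """
    if n % 4 != 0:
        raise ValueError("n must be a multiple of 4")
    imdct_ready = t + 3 * n // 4         # subframes 1002 and 1003
    acelp_ready = t + 2 * n              # subframe 1001 (component X)
    # The first N/4 samples of w_{R,8} are zero, so output can begin
    # N/4 samples before subframe 1001 is completely acquired.
    output_start = acelp_ready - n // 4
    return {
        "subframes_1002_1003": imdct_ready,
        "subframe_1001": acelp_ready,
        "output_start": output_start,
    }
```

With N = 512 and t = 0, subframes 1002 and 1003 are available at sample 384, subframe 1001 at sample 1024, and output can begin at sample 896, that is, at t + 7N/4.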
- the sound signal hybrid decoder 900 may further include a TCX decoder 906.
- the TCX decoder 906 shown in FIG. 14 generates a TCX composite signal from the TCX coefficient by TCX decoding processing. That is, the TCX decoder 906 decodes a frame encoded by the TCX encoding mode.
- the sound signal hybrid decoder 900 may further include a synthesis error compensation (SEC) device.
- SEC processing is performed at the time of decoding the decoding target frame i in order to generate a final synthesized signal.
- the purpose of adding the SEC device is to reduce (eliminate) synthesis errors caused by switching the encoding mode in the sound signal hybrid decoder 900 in order to improve sound quality.
- FIG. 15 is a diagram illustrating a method for reconstructing the signal of frame i-1 using the synthesis error compensation device.
- SEC processing is performed on the reconstructed signal [a_{i-1}, b_{i-1}] in order to efficiently compensate for the influence of time domain aliasing.
- the SEC device decodes the synthesis error information calculated by conversion using the DCT-IV, AVQ method, or the like during the encoding process in the decoding target frame.
- the decoded synthesis error information is added to the reconstructed signal [a_{i-1}, b_{i-1}] by the SEC process, and the reconstructed signal is corrected.
- the subframe 1101 is modified to a subframe 2901 as shown in FIG. 15A
- the subframe 1102 is modified to a subframe 2902 as shown in FIG.
- FIG. 16 is a diagram showing encoding and decoding methods of synthesis error information.
- to encode synthesis error information, the sound signal hybrid encoder 500 includes a local decoder 508 and a local encoder 509.
- the local decoder 508 decodes the original signal (the signal before encoding) encoded by the encoder (ACELP encoder 504, FD encoder 505, or TCX encoder 507).
- the difference between the reconstructed signal (decoded original signal) and the original signal is synthesis error information.
- the local encoder 509 encodes (converts) synthesis error information using DCT-IV, AVQ (Adaptive Vector Quantization), or the like.
- the encoded synthesis error information is decoded (inversely transformed) by the SEC device 907 provided in the sound signal hybrid decoder 900, and is used for correcting the signal after reconstruction by the SEC processing as described with reference to FIG.
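- the round trip of the synthesis error information can be sketched as follows. This is a minimal illustration under stated assumptions, not the codec's actual SEC: only an unquantized DCT-IV is shown (the AVQ stage and any bit allocation are omitted), and all function names are hypothetical. The sketch relies on the fact that the unnormalized DCT-IV is its own inverse up to a factor of 2/N.

```python
import math


def dct_iv(x):
    """Unnormalized DCT-IV: X_k = sum_m x_m cos(pi/N (m + 1/2)(k + 1/2))."""
    n = len(x)
    return [
        sum(x[m] * math.cos(math.pi / n * (m + 0.5) * (k + 0.5)) for m in range(n))
        for k in range(n)
    ]


def inverse_dct_iv(coeffs):
    """Applying the DCT-IV twice scales the input by N/2, so rescale by 2/N."""
    n = len(coeffs)
    return [2.0 / n * v for v in dct_iv(coeffs)]


def encode_synthesis_error(original, locally_decoded):
    """Encoder side: transform the difference original - locally decoded."""
    error = [o - d for o, d in zip(original, locally_decoded)]
    return dct_iv(error)


def apply_sec(reconstructed, coded_error):
    """Decoder side: inverse-transform the error and add it to the signal."""
    error = inverse_dct_iv(coded_error)
    return [r + e for r, e in zip(reconstructed, error)]
```

Without quantization the corrected signal equals the original up to floating-point error; in a real codec the quantized error only partially restores it.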
- FIG. 17 is a diagram illustrating an encoded frame when the encoding mode is switched from the ACELP encoding mode to the FD encoding mode.
- Frame i-1 is encoded by the ACELP encoding mode.
- the frame i is encoded by being concatenated with the preceding three frames i-3, i-2, and i-1 according to the FD encoding mode.
- the signal of the frame i-1 is obtained by performing the overlap addition process with the preceding three frames i-3, i-2, i-1 as described above.
- the overlap addition process is a process on the premise that all consecutive frames are encoded by the FD encoding mode.
- the frame i is a transition frame when the encoding mode is switched from the ACELP encoding mode to the FD encoding mode
- the preceding three frames i-3, i-2, and i-1 are encoded in the ACELP encoding mode.
- aliasing occurs when the decoding target frame i is subjected to normal FD decoding processing.
- the preceding three frames include a frame encoded in the ACELP encoding mode, and therefore aliasing occurs.
- FIG. 18 is a schematic diagram illustrating switching control (decoding method) of the block switching unit 904 when a signal encoded in the ACELP encoding mode is switched to a signal encoded in the FD encoding mode.
- when decoding the decoding target frame i and reconstructing the signal [a_{i-1}, b_{i-1}] of the frame i-1, the block switching unit 904 performs the decoding process using the following three signals in order to reduce the aliasing component.
- the signal corresponding to the frame i-3 in the windowed signal is used. This signal is shown as subframe 1401 and subframe 1402 in FIG.
- the ACELP composite signal [a_{i-1}, b_{i-1}] obtained by performing the ACELP decoding process on the decoding target frame i-1 is used. This signal is shown as subframes 1403 and 1404 in FIG.
- the signal [a i-3 , b i-3 ] of the frame i-3 obtained by performing the ACELP decoding process on the decoding target frame i-3 is used.
- the signal of frame i-3 is shown as subframe 1407 and subframe 1408 in FIG.
- FIG. 19 is a flowchart of a method for reconstructing the signal [a_{i-1}, b_{i-1}] of the frame i-1.
- a windowed signal (eighth signal) is generated (S301 in FIG. 19).
- the eighth signal is expressed by the following equation.
- the signals corresponding to the frame i-3 are respectively expressed by the following equations.
- FIG. 20A is a diagram illustrating an example of a method for reconstructing the signal [a_{i-1}, b_{i-1}] of the frame i-1.
- a fifth signal is generated by adding, to the windowed signal (fourth signal) obtained by decoding the (i-1)-th frame by the ACELP decoding process, a signal obtained by convolving the fourth signal (S302 in FIG. 19).
- the fifth signal is shown as subframe 1501 and subframe 1502 in FIG. 20A.
- FIG. 20B is another diagram illustrating an example of a method for reconstructing the signal [a_{i-1}, b_{i-1}] of the frame i-1.
- the seventh signal, the sixth signal (subframe 1501 and subframe 1502), and the eighth signal (subframe 1401 and subframe 1402), which is an aliasing component extended from frame i, are added to generate the reconstructed signal [a_{i-1}, b_{i-1}] of the frame i-1 (S304 in FIG. 19).
- the signal corresponding to the frame i-2 (the ninth signal) in the windowed signal is used.
- the decoding target frame i + 1 is inversely transformed by the AAC-ELD low delay filter bank, and the windowed signal is
- a portion (aliasing portion) corresponding to frame i-2 extracted from is as follows.
- a signal (tenth signal) corresponding to the frame i-2 in the windowed signal after the decoding target frame i is inversely transformed by the AAC-ELD low delay filter bank is used.
- the decoding target frame i is inversely converted by the AAC-ELD low delay filter bank, and the windowed signal is
- the signal [a_{i-2}, b_{i-2}] of the frame i-2 obtained by performing the ACELP decoding process on the decoding target frame i-2 is used. This signal is shown as subframe 1405 and subframe 1406 in FIG.
- FIG. 21 is a diagram illustrating an example of a method for reconstructing the signal of frame i.
- a signal corresponding to the first half of the frame among signals obtained by performing window processing [w_1, w_2] (first window processing) on the signal [a_{i-2}, b_{i-2}] (11th signal) of the frame i-2 is denoted a_{i-2}w_1.
- a signal (b_{i-2}w_2)^R, obtained by convolving b_{i-2}w_2, which is the signal corresponding to the latter half of the frame among signals obtained by performing window processing on the signal of frame i-2, is added to this signal.
- as a result, the twelfth signal is generated.
- a signal corresponding to the first half of the frame among signals obtained by performing window processing [w_3, w_4] (second window processing) on the signal of frame i-2 is denoted a_{i-2}w_3.
- a signal (b_{i-2}w_4)^R, obtained by convolving b_{i-2}w_4, which is the signal corresponding to the latter half of the frame among signals obtained by performing window processing on the signal of frame i-2, is added to this signal. As a result, the fourteenth signal is generated.
- the fifteenth signal is added to the ninth signal and the tenth signal extracted from.
- the signals [a i, b i ] (subframes 1701 and 1702) of the frame i from the decoding target frame i + 1 are reconstructed.
- the signal (16th signal) of the portion (aliasing portion) corresponding to the frame i-1 in the windowed signal is used.
- the frame i + 2 is inverse transformed by the AAC-ELD low delay filter bank, and the windowed signal is
- the portion (aliasing portion) corresponding to frame i-1 extracted from is as follows.
- the signal (18th signal) of the portion corresponding to the frame i-1 (aliasing portion) of the windowed signal after the frame i is inversely transformed by the AAC-ELD low delay filter bank is used.
- the signal obtained by inversely transforming the frame i by the AAC-ELD low delay filter bank and windowing is given by
- the signal (17th signal) of the portion (aliasing portion) corresponding to the frame i-1 in the windowed signal is used.
- Frame i + 1 is inverse transformed by the AAC-ELD low delay filter bank and the windowed signal is
- the eighteenth signal is as follows.
- the 17th signal is as follows.
- signals (19th signal) shown as subframe 1407 and subframe 1408 in FIG. 18 are used.
- the subframe 1407 and the subframe 1408 are signals [a i-3 , b i-3 ] obtained by decoding the frame i-3 by the ACELP decoding process.
- the reconstructed signal [a_{i-1}, b_{i-1}] of the frame i-1, shown as the subframe 1601 and the subframe 1602 in FIG. 20B, is used.
- FIG. 22 is a diagram illustrating an example of a method for reconstructing the signal of frame i + 1.
- the signal corresponding to the first half of the frame is denoted a_{i-3}w_1.
- a signal (b_{i-3}w_2)^R, obtained by convolving b_{i-3}w_2, which is the signal corresponding to the latter half of the frame among signals obtained by performing window processing on the signal of frame i-3, is added to this signal.
- as a result, the twentieth signal is generated.
- the signal corresponding to the first half of the frame is denoted a_{i-1}w_7.
- a signal (b_{i-1}w_8)^R, obtained by convolving b_{i-1}w_8, which is the signal corresponding to the latter half of the frame among signals obtained by performing window processing on the reconstructed signal of frame i-1, is added to this signal. Thus, the 22nd signal is generated.
- furthermore, a signal is obtained by combining (concatenating) this 22nd signal with a signal obtained by convolving the 22nd signal and inverting its sign (multiplying by -1).
- the 16th signal, the 17th signal, and the 18th signal extracted from the above, the 21st signal, and the 23rd signal are added.
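- the step in which the 22nd signal is concatenated with its convolved, sign-inverted copy can be sketched as follows. This is only an illustration: the function name is hypothetical, and the window processing applied before and after this step is not shown.

```python
def concat_with_inverted_fold(signal):
    """Return [s, -(s)^R]: the signal followed by its reversed, negated copy."""
    folded = signal[::-1]                 # convolution processing (time reversal)
    inverted = [-v for v in folded]       # sign inversion (multiply by -1)
    return list(signal) + inverted
```

The output is twice as long as the input and antisymmetric about its midpoint.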
- FIG. 23 is a diagram showing a delay amount of the encoding / decoding process according to the second embodiment. In FIG. 23, it is assumed that the encoding process for frame i-1 is started at time t.
- the ACELP composite signal of frame i-1 is obtained at time t + N samples. That is, subframes 1501 and 1502 (subframes 1403 and 1404) are obtained at time t + N samples.
- since subframe 1407 and subframe 1408 are signals reconstructed by decoding the preceding frame, they have already been acquired.
- the IMDCT transformed output of frame i is obtained at time t + 7 * N / 4 samples due to the window characteristics of the low delay filter bank in AAC-ELD. That is, subframes 1401 and 1402 are obtained at time t + 7 * N / 4 samples.
- since the synthesis window $w_{R,8}$, in which the portion corresponding to the first N / 4 samples is zero, is applied to subframe 1401, sound output can be started N / 4 samples before subframe 1401 is completely acquired.
- the sound signal hybrid decoder 900 may further include a TCX decoder 906 as shown in FIG.
- the sound signal hybrid decoder 900 may further include a synthesis error compensation (SEC) device.
- SEC synthesis error compensation
- FIG. 24 is a diagram illustrating a method of reconstructing the signal $[a_{i-1}, b_{i-1}]$ of frame i-1 using the SEC device.
- the configuration shown in FIG. 24 is obtained by adding an SEC device to the configuration shown in FIG. 20B.
- the subframes 1601 and 1602 are corrected to subframes 3101 and 3102 by SEC processing, respectively.
- FIG. 25 is a diagram illustrating a method of reconstructing the signal $[a_i, b_i]$ of frame i using the SEC device.
- the configuration shown in FIG. 25 is obtained by adding an SEC device to the configuration shown in FIG.
- subframes 1701 and 1702 are modified into subframes 3201 and 3202, respectively, by SEC processing.
- FIG. 26 is a diagram illustrating a method of reconstructing the signal $[a_{i+1}, b_{i+1}]$ of frame i+1 using the SEC device.
- the configuration shown in FIG. 26 is obtained by adding an SEC device to the configuration shown in FIG.
- subframes 1801 and 1802 are modified into subframes 3301 and 3302, respectively, by SEC processing.
- the sound quality can be further improved by compensating the synthesis error included in the reconstructed signal by the SEC device provided in the decoder.
- the configuration of the sound signal hybrid encoder 500 is the same as the configuration shown in FIG. 9, but the ACELP encoder 504 in FIG. 9 can be omitted.
- the configuration of the sound signal hybrid decoder 900 is the same as that shown in FIG. 14, but the ACELP decoder 903 in FIG. 14 can be omitted.
- FIG. 27 is a diagram illustrating an encoded frame when the encoding mode is switched from the FD encoding mode to the TCX encoding mode.
- when frame i is encoded, a signal to which the component X, generated from the signal $[a_{i-1}, b_{i-1}]$ of the preceding frame i-1, has been added is encoded. Specifically, the block switching unit 502 generates an extended frame that combines the component X and the signal $[a_i, b_i]$ of frame i.
- the extended frame has a length of (N + N / 2).
- the extended frame is transmitted to the TCX encoder 507 by the block switching unit 502 and encoded in the TCX encoding mode.
- the component X is generated by the same method as described with reference to FIGS. 8A and 8B.
- FIG. 28 is a schematic diagram showing switching control (decoding method) of the block switching unit 904 when a signal encoded in the FD encoding mode is switched to a signal encoded in the TCX encoding mode.
- a frame i-1 is a frame encoded by the FD encoding mode
- a frame i that is a decoding target frame is a frame encoded by the TCX encoding mode.
- when signals encoded in the FD encoding mode are consecutive, decoding the decoding target frame i reconstructs the signal of frame i-1. That is, in the case shown in FIG. 11, the signal of frame i-2 can be reconstructed by a normal FD decoding process.
- since the decoding target frame i is encoded in the ACELP encoding mode, an unnatural sound due to an aliasing component is generated in the signal of frame i-1. That is, the signal of frame i-1 becomes an aliasing portion as shown in FIG.
- the block switching unit 904 performs a decoding process using the following three signals.
- the component X signal of the TCX composite signal, obtained by performing the TCX decoding process on the decoding target frame i, is used to reconstruct the signal of frame i-1 with the aliasing component reduced.
- This signal is indicated as subframe 2001 in FIG. 11, and is the component X described with reference to FIG. 8A.
- specifically, the component X is $a_{i-1}w_5 + (b_{i-1}w_6)_R$.
- the signal corresponding to frame i-3 in the windowed signal is used to reconstruct the signal of frame i-1 with the aliasing component reduced.
- This signal is shown as a subframe 2002 and a subframe 2003 in FIG.
- this signal is obtained by inversely transforming the frame i-1 as a normal frame with a length of 4N by the AAC-ELD low delay filter bank, and further performing window processing.
- the inverse transform signal is
- the signal corresponding to the frame i-3 (the aliasing portion indicated as subframe 2002 and subframe 2003 in FIG. 28) is extracted from the inversely transformed signal as follows. That is,
- the signal $[a_{i-3}, b_{i-3}]$ of frame i-3, obtained by performing the FD decoding process on the decoding target frame i-2, is used to reconstruct the signal of frame i-1 with the aliasing component reduced.
- the signal of frame i-3 is shown as subframe 2004 and subframe 2005 in FIG.
- the method for reconstructing the signal of frame i-1 with the aliasing component reduced using the above signals is the same as the method described with reference to FIGS. 12A and 12B. Specifically, subframes 1001, 1002, 1003, 1004, and 1005 in FIG. 12A may be regarded as replaced with subframes 2001, 2002, 2003, 2004, and 2005 in FIG., respectively. As a result, the signal $[a_{i-1}, b_{i-1}]$ of frame i-1 is reconstructed.
- FIG. 29 is a diagram showing a delay amount of the encoding / decoding process according to the third embodiment. In FIG. 29, it is assumed that the encoding process for frame i-1 is started at time t.
- IMDCT-transformed output of frame i-1 due to the characteristics of the window of the low delay filter bank in AAC-ELD
- a TCX composite signal for frame i is obtained at time t + 2N samples. That is, subframe 2001 (component X) is obtained at time t + 2N samples.
- since the synthesis window $w_{R,8}$, in which the portion corresponding to the first N / 4 samples is zero, is applied to subframe 2001, sound output can be started N / 4 samples before subframe 2001 is completely acquired.
- the sound signal hybrid decoder 900 may further include a synthesis error compensation (SEC) device.
- the configuration of the sound signal hybrid encoder 500 is the same as the configuration shown in FIG. 9, but the ACELP encoder 504 in FIG. 9 can be omitted.
- the configuration of the sound signal hybrid decoder 900 is the same as that shown in FIG. 14, but the ACELP decoder 903 in FIG. 14 can be omitted.
- FIG. 30 is a diagram illustrating an encoded frame when the encoding mode is switched from the TCX encoding mode to the FD encoding mode.
- Frame i-1 is encoded by the TCX encoding mode.
- the frame i is encoded by being concatenated with the preceding three frames i-3, i-2, and i-1 according to the FD encoding mode.
- the signal corresponding to the frame i-3 in the windowed signal is used. This signal is shown as subframe 2301 and subframe 2302 in FIG.
- a TCX composite signal $[a_{i-1}, b_{i-1}]$ obtained by performing TCX decoding on the decoding target frame i-1 is used.
- This signal is a signal indicated by subframes 2303 and 2304 in FIG.
- the signal $[a_{i-3}, b_{i-3}]$ of frame i-3 obtained by performing the TCX decoding process on the decoding target frame i-3 is used.
- the signal of frame i-3 is shown as subframe 2307 and subframe 2308 in FIG.
- the signal corresponding to frame i-3 in the windowed signal (the eighth signal, shown as subframe 2301 and subframe 2302 in FIG. 31) is represented by the following equation.
- the TCX composite signal $[a_{i-1}, b_{i-1}]$ obtained by performing the TCX decoding process on the decoding target frame i-1 is as follows.
- the TCX composite signal indicated as subframes 2303 and 2304 includes an aliasing component because subsequent frames are not encoded in the TCX encoding mode
- the method for generating the subframes 2401 and 2402 shown in FIG. 32 is the same as the method shown in FIG. 20A.
- subframes 1401, 1402, 1407, 1408, 1501, and 1502 are replaced with subframes 2301, 2302, 2307, 2308, 2401, and 2402, respectively.
- the signal corresponding to the frame i-2 (the ninth signal) in the windowed signal is used.
- a signal (tenth signal) corresponding to the frame i-2 in the windowed signal after the decoding target frame i is inversely transformed by the AAC-ELD low delay filter bank is used.
- the signal $[a_{i-2}, b_{i-2}]$ of frame i-2 obtained by performing the TCX decoding process on the decoding target frame i-2 is used. This signal is shown as subframe 2305 and subframe 2306 in FIG.
- the decoding method of the decoding target frame i + 1 using the above three signals is the same as the method described with reference to FIG. Specifically, in FIG. 21, it can be considered that subframes 1405 and 1406 are replaced with subframes 2305 and 2306, respectively.
- the signal (16th signal) of the portion (aliasing portion) corresponding to the frame i-1 in the windowed signal is used.
- the signal (18th signal) of the portion corresponding to the frame i-1 (aliasing portion) of the windowed signal after the frame i is inversely transformed by the AAC-ELD low delay filter bank is used.
- the signal (17th signal) of the portion (aliasing portion) corresponding to the frame i-1 in the windowed signal is used.
- the signals $[a_{i-3}, b_{i-3}]$ obtained by decoding frame i-3 with the TCX decoding process are used.
- the decoding method of the decoding target frame i + 2 using the above five signals is the same as the method described with reference to FIG. Specifically, in FIG. 22, subframes 1407 and 1408 can be regarded as replaced with subframes 2307 and 2308, respectively. Likewise, subframes 1601 and 1602 shown in FIG. 22 can be regarded as replaced with the frames generated by the method described for decoding the decoding target frame i (the method of FIG. 20B with its frames replaced by frames of the TCX encoding mode).
- FIG. 33 is a diagram showing a delay amount of the encoding / decoding process according to the fourth embodiment. In FIG. 33, it is assumed that the encoding process for frame i-1 is started at time t.
- the TCX composite signal of frame i-1 is obtained at time t + N samples. That is, subframes 2401 and 2402 (subframes 2303 and 2304) are obtained at time t + N samples.
- since subframe 2307 and subframe 2308 are signals reconstructed by decoding the preceding frame, they have already been acquired.
- the IMDCT transformed output of frame i is obtained at time t + 7 * N / 4 samples due to the window characteristics of the low delay filter bank in AAC-ELD. That is, subframe 2301 and subframe 2302 are obtained at time t + 7 * N / 4 samples. However, since the synthesis window $w_{R,8}$, in which the portion corresponding to the first N / 4 samples is zero, is applied to subframe 2301, sound output can be started N / 4 samples before subframe 2301 is completely acquired.
- the sound signal hybrid decoder 900 may further include a synthesis error compensation (SEC) device.
- a sound signal hybrid encoder encoding method when a transient signal is encoded and a sound signal hybrid decoder decoding method when a transient signal is decoded will be described.
- the configuration of the sound signal hybrid encoder 500 is the same as the configuration shown in FIG. 9, but the ACELP encoder 504 in FIG. 9 can be omitted.
- the configuration of the sound signal hybrid decoder 900 is the same as that shown in FIG. 14, but the ACELP decoder 903 in FIG. 14 can be omitted.
- the encoding target frame i is a transient signal (transient frame)
- when the encoding target frame i is encoded, a signal to which the component X, generated from the signal $[a_{i-1}, b_{i-1}]$ of the preceding frame i-1, has been added is encoded.
- the block switching unit 502 generates an extended frame that combines the component X and the signal $[a_i, b_i]$ of frame i.
- the extended frame has a length of (N + N / 2).
- the extended frame is transmitted to the TCX encoder 507 by the block switching unit 502 and encoded in the TCX encoding mode.
- the TCX encoder 507 performs TCX encoding using the short window mode of the MDCT filter bank.
- the encoded frame is the same as that described with reference to FIG.
- the component X is generated by the same method as described with reference to FIGS. 8A and 8B.
- Whether or not the encoding target frame i is a transient signal is determined based on, for example, whether or not the energy in the encoding target frame exceeds a predetermined threshold, but the determination is not limited to such a method.
- the delay amount of the encoding / decoding process of the fifth embodiment is the same as that of the first and third embodiments and is 7 * N / 4 samples.
- the sound signal hybrid decoder 900 may further include a synthesis error compensation (SEC) device.
- a CELP system other than ACELP such as a VSELP (Vector Sum Excited Linear Prediction) encoding mode
- VSELP Vector Sum Excited Linear Prediction
- CELP methods other than ACELP may be used for the decoding process.
- the AAC-ELD mode has been mainly described as an example of the FD encoding mode.
- the present invention is not limited to the AAC-ELD mode, and is applicable to any encoding scheme that requires overlap processing with a plurality of preceding frames.
- each of the above devices can be realized by a computer system including a microprocessor, a ROM, a RAM, a hard disk unit, a display unit, a keyboard, a mouse, and the like.
- a computer program is stored in the RAM or the hard disk unit.
- Each device achieves its functions by the microprocessor operating according to the computer program.
- the computer program is configured by combining a plurality of instruction codes indicating instructions for the computer in order to achieve a predetermined function.
- a part or all of the components constituting each of the above devices may be configured by one system LSI (Large Scale Integration).
- the system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip; specifically, it is a computer system including a microprocessor, ROM, RAM, and the like.
- a computer program is stored in the ROM.
- the system LSI achieves its functions by the microprocessor loading a computer program from the ROM to the RAM and performing computations and other operations in accordance with the loaded computer program.
- Part or all of the constituent elements constituting each of the above devices may be configured from an IC card or a single module that can be attached to and detached from each device.
- the IC card or module is a computer system that includes a microprocessor, ROM, RAM, and the like.
- the IC card or the module may include the ultra-multifunctional LSI described above.
- the IC card or the module achieves its functions by the microprocessor operating according to the computer program. This IC card or this module may have tamper resistance.
- the present invention may be realized as the methods described above. These methods may also be implemented as a computer program executed by a computer, or as a digital signal comprising the computer program.
- the present invention may also be implemented as a computer-readable recording medium on which the computer program or the digital signal is recorded, such as a flexible disk, hard disk, CD-ROM, MO, DVD, DVD-ROM, DVD-RAM, or BD (Blu-ray Disc).
- a computer program or a digital signal may be transmitted via an electric communication line, a wireless or wired communication line, a network represented by the Internet, a data broadcast, or the like.
- the present invention is also a computer system including a microprocessor and a memory.
- the memory stores a computer program, and the microprocessor may operate according to the computer program.
- the program or digital signal may be recorded on a recording medium and transferred, or transferred via a network or the like, so that it is executed by another independent computer system.
- the present invention is not limited to these embodiments or their modifications. Unless they depart from the gist of the present invention, forms obtained by applying various modifications conceived by those skilled in the art to the embodiments or their modifications, and forms constructed by combining components of different embodiments or modifications, are also included within the scope of the present invention.
- the sound signal hybrid decoder and sound signal hybrid encoder of the present invention are capable of encoding and decoding sound signals with high sound quality and low delay, and can be used for broadcasting systems, portable televisions, mobile phone communications, video conferences, and the like.
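The synthesis error compensation (SEC) mentioned in the excerpts above can be illustrated with a minimal Python sketch. It assumes that the synthesis error is simply the sample-wise difference between the original signal and the locally decoded signal, and that the decoder adds the decoded error back to its reconstructed signal. The function names `synthesis_error` and `apply_sec` are hypothetical, and the sketch ignores how the error information is itself quantized and encoded.

```python
def synthesis_error(original, locally_decoded):
    """Encoder side: difference between the input and the locally decoded signal."""
    return [o - d for o, d in zip(original, locally_decoded)]


def apply_sec(reconstructed, error):
    """Decoder side: correct the reconstructed sub-frame with the decoded error."""
    return [r + e for r, e in zip(reconstructed, error)]


# Illustrative values only (not taken from the specification).
original = [1.0, 2.0, 3.0]
locally_decoded = [0.5, 2.25, 3.0]
error = synthesis_error(original, locally_decoded)
corrected = apply_sec(locally_decoded, error)
```

In this idealized setting the corrected signal equals the original; with a lossy encoding of the error, the correction would only be approximate.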
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
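A speech codec is a codec designed specifically to encode speech signals according to the characteristics of speech signals (see Non-Patent Literature 1). When a speech codec encodes a speech signal at a low bit rate, good sound quality is achieved with low delay. However, speech codecs are not suited to encoding audio signals. Consequently, when an audio signal is encoded with a speech codec, the sound quality is lower than when it is encoded with an audio codec such as AAC.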
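・Transition from the ACELP encoding mode to the FD encoding mode (Embodiment 2)
・Transition from the FD encoding mode to the TCX encoding mode (Embodiment 3)
・Transition from the TCX encoding mode to the FD encoding mode (Embodiment 4)
・Transition from the FD encoding mode to the transient signal encoding mode (Embodiment 5)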
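Embodiment 1 describes the encoding method of the sound signal hybrid encoder and the decoding method of the sound signal hybrid decoder when the encoding mode is switched from the FD encoding mode to the ACELP encoding mode. In the descriptions of the embodiments below, the FD encoding mode means AAC-ELD unless otherwise noted.
FIG. 6 is a block diagram showing the configuration of the sound signal hybrid encoder according to Embodiment 1.
The following describes a sound signal hybrid decoder that decodes the encoded signal produced by the sound signal hybrid encoder 500 as shown in FIG. 8A.
Next, the delay amount of the encoding and decoding processes according to Embodiment 1 described above is explained.
As described above, the sound signal hybrid encoder 500 and the sound signal hybrid decoder 900 can reduce the aliasing that occurs when decoding the transition frame, that is, the first frame after the encoding mode is switched from the FD encoding mode to the ACELP encoding mode, and thereby achieve seamless switching between the FD decoding technique and the ACELP decoding technique.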
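Embodiment 2 describes the encoding method of the sound signal hybrid encoder 500 and the decoding method of the sound signal hybrid decoder 900 when the encoding mode is switched from the ACELP encoding mode to the FD encoding mode. The configurations of the sound signal hybrid encoder 500 and the sound signal hybrid decoder 900 are the same as in Embodiment 1.
FIG. 17 is a diagram showing the encoded frames when the encoding mode is switched from the ACELP encoding mode to the FD encoding mode.
The following describes the decoding method of the sound signal hybrid decoder 900, which decodes the encoded signal produced by the sound signal hybrid encoder 500 as shown in FIG. 17.
FIG. 18 is a schematic diagram showing the switching control (decoding method) of the block switching unit 904 when a signal encoded in the ACELP encoding mode is followed by a signal encoded in the FD encoding mode.
When the decoding target frame i+1 is decoded to reconstruct the signal $[a_i, b_i]$ of frame i, the block switching unit 904 performs the decoding process using the following three signals in order to reduce the aliasing component.
When the decoding target frame i+2 is decoded to reconstruct the signal $[a_{i+1}, b_{i+1}]$ of frame i+1, the block switching unit 904 performs the decoding process using the following five signals in order to reduce the aliasing component.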
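The fold-and-add and fold-and-concatenate operations that recur in these reconstruction steps (for example, forming the 22nd signal and then concatenating it with its folded, sign-inverted copy) can be sketched in Python. This is a minimal sketch, assuming that "folding" means time reversal of a sub-frame (the operation written with the subscript R in the excerpts above). The helper names `fold`, `fold_and_add`, and `concat_with_fold` are hypothetical, and the inputs are assumed to be already windowed.

```python
def fold(x):
    """Time-reverse a sub-frame (assumed meaning of the subscript R)."""
    return list(reversed(x))


def fold_and_add(a_win, b_win):
    """Return a_win + (b_win)_R for windowed first-half and second-half sub-frames."""
    return [p + q for p, q in zip(a_win, fold(b_win))]


def concat_with_fold(s, invert_sign=False):
    """Concatenate s with its folded copy, optionally multiplied by -1."""
    sign = -1.0 if invert_sign else 1.0
    return list(s) + [sign * v for v in fold(s)]


# Illustrative values only.
s22 = fold_and_add([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 1.5])
s23_before_window = concat_with_fold(s22, invert_sign=True)
```

The subsequent windowing of the concatenated signal is omitted here because the window definitions are not part of this excerpt.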
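Next, the delay amount of the encoding and decoding processes according to Embodiment 2 described above is explained.
As described in Embodiment 2, the sound signal hybrid encoder 500 and the sound signal hybrid decoder 900 can reduce the aliasing that occurs when decoding the transition frame, that is, the first frame after the encoding mode is switched from the ACELP encoding mode to the FD encoding mode, and thereby achieve seamless switching between the ACELP decoding process and the FD decoding process.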
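Embodiment 3 describes the encoding method of the sound signal hybrid encoder 500 and the decoding method of the sound signal hybrid decoder 900 when the encoding mode is switched from the FD encoding mode to the TCX encoding mode.
First, the control of the block switching unit 502 when the encoding mode is switched from the FD encoding mode to the TCX encoding mode is described.
Next, the switching control (decoding method) of the block switching unit 904 when a signal encoded in the FD encoding mode is followed by a signal encoded in the TCX encoding mode is described.
Next, the delay amount of the encoding and decoding processes according to Embodiment 3 described above is explained.
As described above, the sound signal hybrid encoder 500 and the sound signal hybrid decoder 900 can reduce the aliasing that occurs when decoding the transition frame, that is, the first frame after the encoding mode is switched from the FD encoding mode to the TCX encoding mode, and thereby achieve seamless switching between the FD decoding technique and the TCX decoding technique.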
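Embodiment 4 describes the encoding method of the sound signal hybrid encoder 500 and the decoding method of the sound signal hybrid decoder 900 when the encoding mode is switched from the TCX encoding mode to the FD encoding mode.
FIG. 30 is a diagram showing the encoded frames when the encoding mode is switched from the TCX encoding mode to the FD encoding mode.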
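In the FD encoding mode described here, frame i is encoded after being concatenated with the three preceding frames i-3, i-2, and i-1, giving a block of length 4N. The Python sketch below shows only that concatenation step, under the assumption that each frame is a list of N samples. The function name `build_fd_block` is hypothetical, and the windowing and low-delay filter bank that follow are omitted.

```python
def build_fd_block(frames, i):
    """Concatenate frames i-3, i-2, i-1 and i into one block of length 4N."""
    if i < 3:
        raise ValueError("three preceding frames are required")
    block = []
    for k in range(i - 3, i + 1):
        block.extend(frames[k])
    return block


# Illustrative frames of N = 4 samples each; frame k is filled with the value k.
frames = [[float(k)] * 4 for k in range(5)]
block = build_fd_block(frames, 3)
```

The dependence on three preceding frames is what makes the first FD frames after a TCX frame need special handling, since the preceding frames were not encoded in the FD mode.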
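The following describes the decoding method of the sound signal hybrid decoder 900, which decodes the encoded signal produced by the sound signal hybrid encoder 500 as shown in FIG. 31.
When the decoding target frame i is decoded, the block switching unit 904 performs the decoding process using the following three signals in order to reduce the aliasing component.
When the decoding target frame i+1 is decoded, the block switching unit 904 performs the decoding process using the following three signals in order to reduce the aliasing component.
When the decoding target frame i+2 is decoded, the block switching unit 904 performs the decoding process using the following five signals in order to reduce the aliasing component.
Next, the delay amount of the encoding and decoding processes according to Embodiment 4 described above is explained.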
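The timing figures given in the excerpts above (IMDCT output available at t + 7N/4 samples, with output able to start N/4 samples earlier because the first N/4 samples of the synthesis window are zero) can be turned into concrete numbers. The Python sketch below is a hedged illustration: the frame length N = 512 and the 48 kHz sampling rate are assumptions chosen for the example, not values stated in this excerpt, and reading the earliest start as 7N/4 - N/4 is one interpretation of the text.

```python
from fractions import Fraction


def imdct_ready_offset(n):
    """Offset (in samples, relative to t) at which the IMDCT output is available."""
    return Fraction(7 * n, 4)


def earliest_output_offset(n):
    """Offset at which output could start, N/4 samples before the sub-frame is complete."""
    return imdct_ready_offset(n) - Fraction(n, 4)


def samples_to_ms(samples, sample_rate_hz):
    """Convert a sample count to milliseconds."""
    return float(samples) * 1000.0 / sample_rate_hz


N = 512          # assumed frame length
FS = 48000       # assumed sampling rate in Hz
ready = imdct_ready_offset(N)
start = earliest_output_offset(N)
start_ms = samples_to_ms(start, FS)
```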
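As described above, the sound signal hybrid encoder 500 and the sound signal hybrid decoder 900 can reduce the aliasing that occurs when decoding the transition frame, that is, the first frame after the encoding mode is switched from the TCX encoding mode to the FD encoding mode, and thereby achieve seamless switching between the TCX decoding technique and the FD decoding technique.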
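Embodiment 5 describes the encoding method of the sound signal hybrid encoder when a transient signal is encoded, and the decoding method of the sound signal hybrid decoder when a transient signal is decoded. In Embodiment 5, the configuration of the sound signal hybrid encoder 500 is the same as that shown in FIG. 9, but the ACELP encoder 504 in FIG. 9 can be omitted. Likewise, the configuration of the sound signal hybrid decoder 900 is the same as that shown in FIG. 14, but the ACELP decoder 903 in FIG. 14 can be omitted.
First, when the encoding target frame i is a transient signal (transient frame), the signal that is encoded for frame i is one to which the component X, generated from the signal $[a_{i-1}, b_{i-1}]$ of the preceding frame i-1, has been added. Specifically, the block switching unit 502 generates an extended frame that combines the component X and the signal $[a_i, b_i]$ of frame i. The extended frame has a length of (N + N/2). The block switching unit 502 sends the extended frame to the TCX encoder 507, where it is encoded in the TCX encoding mode. At this time, the TCX encoder 507 performs TCX encoding using the short window mode of the MDCT filter bank. The resulting encoded frame is the same as that described with reference to FIG. 27. The component X is generated by the same method as described with reference to FIGS. 8A and 8B.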
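The construction of the extended frame can be sketched in Python. This is a minimal sketch under stated assumptions: the component X is taken to be $a_{i-1}w_5 + (b_{i-1}w_6)_R$ as given in the excerpts for Embodiment 3, the subscript R is assumed to mean time reversal, X is assumed to be placed before frame i, and the window values used below are placeholders rather than the windows $w_5$ and $w_6$ of the specification. The function names are hypothetical.

```python
def component_x(a_prev, b_prev, w5, w6):
    """Compute X = a_prev * w5 + (b_prev * w6)_R, sample by sample."""
    folded = [b * w for b, w in zip(b_prev, w6)][::-1]
    return [a * w + f for a, w, f in zip(a_prev, w5, folded)]


def build_extended_frame(a_prev, b_prev, a_cur, b_cur, w5, w6):
    """Concatenate X (length N/2) with frame i (length N): total N + N/2."""
    return component_x(a_prev, b_prev, w5, w6) + list(a_cur) + list(b_cur)


# Illustrative values with N = 8, so each half-frame has 4 samples.
half = 4
a_prev, b_prev = [1.0] * half, [2.0] * half
a_cur, b_cur = [3.0] * half, [4.0] * half
w5, w6 = [1.0] * half, [0.5] * half   # placeholder windows
extended = build_extended_frame(a_prev, b_prev, a_cur, b_cur, w5, w6)
```

With N = 8 the extended frame has 12 samples, matching the stated length N + N/2.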
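The decoding method for a transient frame encoded as described above is the same as the decoding method used when a signal encoded in the FD encoding mode is followed by a signal encoded in the TCX encoding mode. That is, it is the same as the method described with reference to FIG. 12A or FIG. 28.
As described above, with the sound signal hybrid decoder 900, sound quality can be further improved by encoding and decoding a transient frame in the TCX encoding mode while encoding is otherwise being performed in the FD encoding mode.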
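The transient decision that triggers this behaviour is described in the excerpts above only as, for example, checking whether the energy in the encoding target frame exceeds a predetermined threshold. A minimal Python sketch of such a check follows. The energy definition (sum of squared samples), the threshold value, and the names `frame_energy`, `is_transient`, and `select_mode` are assumptions made for illustration, and the text explicitly allows other detection methods.

```python
def frame_energy(frame):
    """Sum of squared samples, used here as the frame energy."""
    return sum(x * x for x in frame)


def is_transient(frame, threshold):
    """Treat the frame as transient when its energy exceeds the threshold."""
    return frame_energy(frame) > threshold


def select_mode(frame, threshold):
    """While encoding in the FD mode, route transient frames to TCX (Embodiment 5)."""
    return "TCX" if is_transient(frame, threshold) else "FD"


# Illustrative frames and threshold.
quiet = [0.1, -0.1, 0.1, -0.1]
burst = [1.0, -1.0, 1.0, -1.0]
modes = [select_mode(f, threshold=0.5) for f in (quiet, burst)]
```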
The present invention has been described above based on the embodiments, but the present invention is of course not limited to those embodiments.
501 High-frequency encoder
502 Block switching unit
503 Signal classification unit
504 ACELP encoder
505 FD encoder
506 Bit multiplexer
507 TCX encoder
508 Local decoder
509 Local encoder
900 Sound signal hybrid decoder
901 Demultiplexer
902 FD decoder
903 ACELP decoder
904 Block switching unit
905 High-frequency decoder
906 TCX decoder
907 SEC device
1001 to 1005, 1101, 1102 Subframes
1401 to 1408, 1501, 1502, 1601, 1602 Subframes
1701, 1702, 1801, 1802 Subframes
2001 to 2005, 2301 to 2308, 2401, 2402 Subframes
2901, 2902, 3101, 3102, 3201, 3202 Subframes
3301, 3302 Subframes
Claims (20)
- 低遅延フィルタバンクを用いた音響符号化処理で符号化された音響フレームと、線形予測係数を用いた音声符号化処理で符号化された音声フレームとが含まれるビットストリームを復号する音信号ハイブリッドデコーダであって、
前記音響フレームを低遅延逆フィルタバンク処理を用いて復号する低遅延変換デコーダと、
前記音声フレームを復号する音声信号デコーダと、
前記ビットストリームのうちの復号対象フレームが前記音響フレームである場合、当該復号対象フレームを前記低遅延変換デコーダによって復号し、前記復号対象フレームが前記音声フレームである場合、当該復号対象フレームを前記音声信号デコーダによって復号する制御を行うブロック切替部とを備え、
前記復号対象フレームが、前記音響フレームから前記音声フレームに切り替わった最初の前記音声フレームである第iフレームであるとき、
前記第iフレームには、前記第iフレームよりも1フレーム先行するフレームである第i-1フレームの符号化前の信号を用いて生成された第1信号が符号化された状態で含まれ、
前記ブロック切替部は、
(1)
前記第iフレームよりも2フレーム先行するフレームである第i-2フレームを前記低遅延変換デコーダによって復号することで得られる、前記第iフレームよりも3フレーム先行するフレームである第i-3フレームの再構成された信号を窓処理した信号である第2信号のフレームの前半部分に相当する信号に、前記第2信号のフレームの後半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第iフレームを前記音声信号デコーダによって復号することで得られる、前記第1信号に窓処理を行った信号と、前記第i-1フレームを前記低遅延逆フィルタバンク処理及び窓処理した信号の前記第i-3フレームに対応する部分である第3信号のフレームの前半部分の信号と、を加算する処理を行って符号化前の前記第i-1フレームの前半部分に対応する信号を生成し、
前記第2信号のフレームの後半部分に相当する信号に、前記第2信号のフレームの前半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第1信号に畳み込み処理及び窓処理を行った信号と、前記第3信号のフレームの後半部分に相当する信号と、を加算する処理を行って符号化前の前記第i-1フレームの後半部分に対応する信号を生成する、または
(2)
前記第2信号のフレームの前半部分に相当する信号に、前記第2信号のフレームの後半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第1信号に畳み込み処理及び窓処理を行った信号と、前記第3信号のフレームの前半部分に相当する信号と、を加算する処理を行って符号化前の前記第i-1フレームの前半部分に対応する信号を生成し、
前記第2信号のフレームの後半部分に相当する信号に、前記第2信号のフレームの前半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第1信号に窓処理を行った信号と、前記第3信号のフレームの後半部分に相当する信号と、を加算する処理を行って符号化前の前記第i-1フレームの後半部分に対応する信号を生成する
音信号ハイブリッドデコーダ。 - 低遅延フィルタバンクを用いた音響符号化処理で符号化された音響フレームと、線形予測係数を用いた音声符号化処理で符号化された音声フレームとが含まれるビットストリームを復号する音信号ハイブリッドデコーダであって、
前記音響フレームを低遅延逆フィルタバンク処理によって復号する低遅延変換デコーダと、
前記音声フレームを復号する音声信号デコーダと、
前記ビットストリームのうちの復号対象フレームが前記音響フレームである場合、当該復号対象フレームを前記低遅延変換デコーダによって復号し、前記復号対象フレームが前記音声フレームである場合、当該復号対象フレームを前記音声信号デコーダによって復号する制御を行うブロック切替部とを備え、
前記ブロック切替部は、
前記復号対象フレームが、前記音声フレームから前記音響フレームに切り替わった最初の音響フレームである第iフレームであるとき、
前記第iフレームよりも1フレーム先行するフレームである第i-1フレームを前記音声信号デコーダによって復号することで得られる信号を窓処理した第4信号に、当該第4信号を畳み込み処理した信号を加算し、窓処理を行った第5信号と、前記第iフレームよりも3フレーム先行するフレームである第i-3フレームを前記音声信号デコーダによって復号することで得られる信号を窓処理した第6信号に、当該第6信号を畳み込み処理した信号を加算し、窓処理を行った第7信号と、前記第iフレームを前記低遅延逆フィルタバンク処理及び窓処理した信号の前記第i-3フレームに対応する部分である第8信号と、を加算する処理を行って符号化前の前記第i-1フレームに対応する信号である再構成信号を生成する
音信号ハイブリッドデコーダ。 - 前記ブロック切替部は、
前記復号対象フレームが、前記第iフレームの1フレーム後のフレームである第i+1フレームであるとき、
前記第i+1フレームを前記低遅延逆フィルタバンク処理及び窓処理した信号のうちの、前記第iフレームよりも2フレーム先行するフレームである第i-2フレームに対応する部分である第9信号と、前記第iフレームを前記低遅延逆フィルタバンク処理及び窓処理した信号の前記第i-2フレームに対応する部分である第10信号と、前記第i-2フレームを前記音声信号デコーダによって復号することで得られる第11信号に第1の窓処理を行なった信号のフレームの前半部分に相当する信号に、前記第11信号に前記第1の窓処理を行った信号のフレームの後半部分に相当する信号に畳み込み処理した信号を加算することで得られる第12信号に、当該第12信号を畳み込み処理した信号を連結し、窓処理を行った第13信号と、前記第11信号に前記第1の窓処理とは異なる第2の窓処理を行った信号のフレームの前半部分に相当する信号に、前記第11信号に前記第2の窓処理を行った信号のフレームの後半部分に相当する信号に畳み込み処理した信号を加算することで得られる第14信号に、当該第14信号を畳み込み処理して符号を反転させた信号を連結し、窓処理を行った第15信号と、を加算する処理を行って、符号化前の前記第iフレームに対応する信号を生成する
請求項2に記載の音信号ハイブリッドデコーダ。 - 前記ブロック切替部は、
前記復号対象フレームが、前記第iフレームの2フレーム後のフレームである第i+2フレームであるとき、
前記i+2フレームを前記低遅延逆フィルタバンク処理及び窓処理した信号の前記第i-1フレームに対応する部分である第16信号と、前記第i+1フレームを前記低遅延逆フィルタバンク処理及び窓処理した信号の前記第i-1フレームに対応する部分である第17信号と、前記第iフレームを前記低遅延逆フィルタバンク処理及び窓処理した信号の前記第i-1フレームに対応する部分である第18信号と、前記第i-3フレームを前記音声信号デコーダによって復号することで得られる第19信号に窓処理を行なった信号のフレームの前半部分に相当する信号に、前記第19信号に窓処理を行った信号のフレームの後半部分に相当する信号に畳み込み処理した信号を加算することで得られる第20信号に、当該第20信号を畳み込み処理した信号を連結し、窓処理を行った第21信号と、前記再構成信号に窓処理を行った信号のフレームの前半部分に相当する信号に、前記再構成信号に窓処理を行った信号のフレームの後半部分に相当する信号に畳み込み処理した信号を加算することで得られる第22信号に、当該第22信号を畳み込み処理して符号を反転させた信号を連結し、窓処理を行った第23信号と、を加算する処理を行って、符号化前の前記第i+1フレームに対応する信号を生成する
請求項3に記載の音信号ハイブリッドデコーダ。 - 低遅延フィルタバンクを用いた音響符号化処理で符号化された音響フレームと、線形予測係数を用いた音声符号化処理で符号化された音声フレームとが含まれるビットストリームを復号する音信号ハイブリッドデコーダであって、
前記音響フレームを低遅延逆フィルタバンク処理を用いて復号する低遅延変換デコーダと、
TCX(Transform Coded Excitation)方式によって符号化された前記音声フレームを復号するTCXデコーダと、
前記ビットストリームのうちの復号対象フレームが前記音響フレームである場合、当該復号対象フレームを前記低遅延変換デコーダによって復号し、前記復号対象フレームが前記音声フレームである場合、当該復号対象フレームを前記音声信号デコーダによって復号する制御を行うブロック切替部とを備え、
前記復号対象フレームが、前記音響フレームから前記音声フレームに切り替わった最初の前記音声フレームであって、過渡信号が符号化されたフレームである第iフレームであるとき、
前記第iフレームには、前記第iフレームよりも1フレーム先行するフレームである第i-1フレームの符号化前の信号を用いて生成された第1信号が符号化された状態で含まれ、
前記ブロック切替部は、
(1)
前記第iフレームよりも2フレーム先行するフレームである第i-2フレームを前記低遅延変換デコーダによって復号することで得られる、前記第iフレームよりも3フレーム先行するフレームである第i-3フレームの再構成された信号を窓処理した信号である第2信号のフレームの前半部分に相当する信号に、前記第2信号のフレームの後半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第iフレームを前記音声信号デコーダによって復号することで得られる、前記第1信号に窓処理を行った信号と、前記第i-1フレームを前記低遅延逆フィルタバンク処理及び窓処理した信号の前記第i-3フレームに対応する部分である第3信号のフレームの前半部分の信号と、を加算する処理を行って符号化前の前記第i-1フレームの前半部分に対応する信号を生成し、
前記第2信号のフレームの後半部分に相当する信号に、前記第2信号のフレームの前半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第1信号に畳み込み処理及び窓処理を行った信号と、前記第3信号のフレームの後半部分に相当する信号と、を加算する処理を行って符号化前の前記第i-1フレームの後半部分に対応する信号を生成する、または
(2)
前記第2信号のフレームの前半部分に相当する信号に、前記第2信号のフレームの後半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第1信号に畳み込み処理及び窓処理を行った信号と、前記第3信号のフレームの前半部分に相当する信号と、を加算する処理を行って符号化前の前記第i-1フレームの前半部分に対応する信号を生成し、
前記第2信号のフレームの後半部分に相当する信号に、前記第2信号のフレームの前半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第1信号に窓処理を行った信号と、前記第3信号のフレームの後半部分に相当する信号と、を加算する処理を行って符号化前の前記第i-1フレームの後半部分に対応する信号を生成する
音信号ハイブリッドデコーダ。 - 前記低遅延変換デコーダは、前記音響フレーム及び当該音響フレームに時間的に連続して先行する3つのフレームのそれぞれについて低遅延逆フィルタバンク処理及び窓処理を行った信号のそれぞれを重複加算処理することによって、当該音響フレームを復号するAAC-ELD(Advanced Audio Coding - Enhanced Low Delay)デコーダである
請求項1~5のいずれか1項に記載の音信号ハイブリッドデコーダ。 - 前記音声信号デコーダは、ACELP(Algebraic Code Excited Linear Prediction)係数を用いて符号化された前記音声フレームを復号するACELPデコーダである
請求項1~4のいずれか1項に記載の音信号ハイブリッドデコーダ。 - 前記音声信号デコーダは、TCX方式によって符号化された前記音声フレームを復号するTCXデコーダである
請求項1~4のいずれか1項に記載の音信号ハイブリッドデコーダ。 - さらに、前記復号対象フレームとともに符号化された合成エラー情報を復号する合成エラー補償装置を備え、
前記合成エラー情報は、前記ビットストリームが符号化される前の信号と、前記ビットストリームを復号した信号との差分を表す情報であり、
前記合成エラー補償装置は、前記ブロック切替部が生成した前記符号化前の前記第i-1フレームの信号、前記ブロック切替部が生成した前記符号化前の前記第iフレームの信号、または前記ブロック切替部が生成した前記符号化前の前記第i+1フレームの信号を、復号した前記合成エラー情報を用いて修正する
請求項1~8のいずれか1項に記載の音信号ハイブリッドデコーダ。 - 音信号の音響特性を分析し、前記音信号に含まれるフレームが音響信号であるか音声信号であるかを判断する信号分類部と、
低遅延フィルタバンクを用いて前記フレームを符号化する低遅延変換エンコーダと、
前記フレームの線形予測係数を算出することによって当該フレームを符号化する音声信号エンコーダと、
前記信号分類部が前記音響信号であると判断した符号化対象フレームを前記低遅延変換エンコーダによって符号化し、前記信号分類部が前記音声信号であると判断した前記符号化対象フレームを前記音声信号エンコーダによって符号化する制御を行うブロック切替部とを備え、
前記ブロック切替部は、
(1)前記符号化対象フレームが、前記信号分類部が前記音声信号であると判断したフレームである第i-1フレームの1フレーム後のフレームであって、前記信号分類部が前記音響信号であると判断したフレームである第iフレームであるとき、
前記第i-1フレームの前半部分に相当する信号を窓処理した信号に前記第i-1フレームの後半部分に相当する信号を窓処理して畳み込み処理した信号を加算した信号と、前記第iフレームとを前記音声信号エンコーダによって符号化する、または
(2)前記第i-1フレームの後半部分に相当する信号を窓処理した信号に前記第i-1フレームの前半部分に相当する信号を窓処理して畳み込み処理した信号を加算した信号と、前記第iフレームとを前記音声信号エンコーダによって符号化する
音信号ハイブリッドエンコーダ。 - 音信号の音響特性を分析し、前記音信号に含まれるフレームが音響信号であるか音声信号であるかを判断する信号分類部と、
低遅延フィルタバンクを用いて前記フレームを符号化する低遅延変換エンコーダと、
前記フレームの線形予測係数の残差をMDCT(Modified Discrete Cosine Transform)処理したTCX方式によって前記フレームを符号化するTCXエンコーダと、
前記信号分類部が前記音響信号であると判断した符号化対象フレームを前記低遅延変換エンコーダによって符号化し、前記信号分類部が前記音声信号であると判断した前記符号化対象フレームを前記音声信号エンコーダによって符号化する制御を行うブロック切替部とを備え、
前記ブロック切替部は、
前記符号化対象フレームである第iフレームが、前記信号分類部が前記音響信号であり、なおかつエネルギーが急激に変化する過渡信号であると判断したフレームであるとき、
(1)前記第iフレームの1フレーム前のフレームである第i-1フレームの前半部分に相当する信号を窓処理した信号に前記第i-1フレームの後半部分に相当する信号を窓処理して畳み込み処理した信号を加算した信号と、前記第iフレームとを前記音声信号エンコーダによって符号化する、または
(2)前記第i-1フレームの後半部分に相当する信号を窓処理した信号に前記第i-1フレームの前半部分に相当する信号を窓処理して畳み込み処理した信号を加算した信号と、前記第iフレームとを前記音声信号エンコーダによって符号化する
音信号ハイブリッドエンコーダ。 - 前記低遅延変換エンコーダは、前記フレームと、当該フレームに時間的に連続して先行する3つのフレームとを連結した拡張フレームについて窓処理及び低遅延フィルタバンク処理をすることによって、前記フレームを符号化するAAC-ELDエンコーダである
請求項10または11に記載の音信号ハイブリッドエンコーダ。 - 前記音声信号エンコーダは、ACELP係数を生成することによって前記フレームを符号化するACELPエンコーダである
請求項10~12のいずれか1項に記載の音信号ハイブリッドエンコーダ。 - 前記音声信号エンコーダは、前記線形予測係数の残差をMDCT処理して前記フレームを符号化するTCXエンコーダである
請求項10~12のいずれか1項に記載の音信号ハイブリッドエンコーダ。 - さらに、
符号化した前記音信号を復号するローカルデコーダと、
前記音信号と、前記ローカルデコーダが復号した前記音信号との差分である合成エラー情報を符号化するローカルエンコーダとを備える
請求項10~14のいずれか1項に記載の音信号ハイブリッドエンコーダ。 - 低遅延フィルタバンクを用いた音響符号化処理で符号化された音響フレームと、線形予測係数を用いた音声符号化処理で符号化された音声フレームとが含まれるビットストリームを復号する音信号復号方法であって、
前記音響フレームを低遅延逆フィルタバンク処理を用いて復号する低遅延変換デコードステップと、
前記音声フレームを復号する音声信号デコードステップと、
前記ビットストリームのうちの復号対象フレームが前記音響フレームである場合、当該復号対象フレームを前記低遅延変換デコードステップによって復号し、前記復号対象フレームが前記音声フレームである場合、当該復号対象フレームを前記音声信号デコードステップによって復号する制御を行う制御ステップとを含み、
前記復号対象フレームが、前記音響フレームから前記音声フレームに切り替わった最初の前記音声フレームである第iフレームであるとき、
前記第iフレームには、前記第iフレームよりも1フレーム先行するフレームである第i-1フレームの符号化前の信号を用いて生成された第1信号が符号化された状態で含まれ、
前記制御ステップでは、
(1)
前記第iフレームよりも2フレーム先行するフレームである第i-2フレームを前記低遅延変換デコードステップによって復号することで得られる、前記第iフレームよりも3フレーム先行するフレームである第i-3フレームの再構成された信号を窓処理した信号である第2信号のフレームの前半部分に相当する信号に、前記第2信号のフレームの後半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第iフレームを前記音声信号デコードステップによって復号することで得られる、前記第1信号に窓処理を行った信号と、前記第i-1フレームを前記低遅延逆フィルタバンク処理及び窓処理した信号の前記第i-3フレームに対応する部分である第3信号のフレームの前半部分の信号と、を加算する処理を行って符号化前の前記第i-1フレームの前半部分に対応する信号を生成し、
前記第2信号のフレームの後半部分に相当する信号に、前記第2信号のフレームの前半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第1信号に畳み込み処理及び窓処理を行った信号と、前記第3信号のフレームの後半部分に相当する信号と、を加算する処理を行って符号化前の前記第i-1フレームの後半部分に対応する信号を生成する、または
(2)
前記第2信号のフレームの前半部分に相当する信号に、前記第2信号のフレームの後半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第1信号に畳み込み処理及び窓処理を行った信号と、前記第3信号のフレームの前半部分に相当する信号と、を加算する処理を行って符号化前の前記第i-1フレームの前半部分に対応する信号を生成し、
前記第2信号のフレームの後半部分に相当する信号に、前記第2信号のフレームの前半部分に相当する信号を畳み込み処理した信号を加算して窓処理を行った信号と、前記第1信号に窓処理を行った信号と、前記第3信号のフレームの後半部分に相当する信号と、を加算する処理を行って符号化前の前記第i-1フレームの後半部分に対応する信号を生成する
音信号復号方法。 - 低遅延フィルタバンクを用いた音響符号化処理で符号化された音響フレームと、線形予測係数を用いた音声符号化処理で符号化された音声フレームとが含まれるビットストリームを復号する音信号復号方法であって、
前記音響フレームを低遅延逆フィルタバンク処理によって復号する低遅延変換デコードステップと、
前記音声フレームを復号する音声信号デコードステップと、
前記ビットストリームのうちの復号対象フレームが前記音響フレームである場合、当該復号対象フレームを前記低遅延変換デコードステップによって復号し、前記復号対象フレームが前記音声フレームである場合、当該復号対象フレームを前記音声信号デコードステップによって復号する制御を行う制御ステップとを含み、
前記制御ステップは、
前記復号対象フレームが、前記音声フレームから前記音響フレームに切り替わった最初の音響フレームである第iフレームであるとき、
前記第iフレームよりも1フレーム先行するフレームである第i-1フレームを前記音声信号デコードステップによって復号することで得られる信号を窓処理した第4信号に、当該第4信号を畳み込み処理した信号を加算し、窓処理を行った第5信号と、前記第iフレームよりも3フレーム先行するフレームである第i-3フレームを前記音声信号デコードステップによって復号することで得られる信号を窓処理した第6信号に、当該第6信号を畳み込み処理した信号を加算し、窓処理を行った第7信号と、前記第iフレームを前記低遅延逆フィルタバンク処理及び窓処理した信号の前記第i-3フレームに対応する部分である第8信号と、を加算する処理を行って符号化前の前記第i-1フレームに対応する信号である再構成信号を生成する
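Sound signal decoding method. - A sound signal decoding method for decoding a bitstream that includes audio frames encoded by audio encoding processing using a low-delay filter bank and speech frames encoded by speech encoding processing using linear prediction coefficients, the method comprising:
a low-delay transform decoding step of decoding the audio frames using low-delay inverse filter bank processing;
a TCX decoding step of decoding the speech frames encoded by a TCX scheme; and
a control step of performing control such that, when a decoding target frame in the bitstream is one of the audio frames, the decoding target frame is decoded by the low-delay transform decoding step, and when the decoding target frame is one of the speech frames, the decoding target frame is decoded by the speech signal decoding step,
wherein, when the decoding target frame is an i-th frame, which is the first speech frame after a switch from the audio frames to the speech frames and is a frame in which a transient signal whose energy changes abruptly is encoded,
the i-th frame includes, in encoded form, a first signal generated using the pre-encoding signal of the (i-1)-th frame, which is the frame one frame before the i-th frame, and
in the control step,
(1)
a signal corresponding to the first half of the (i-1)-th frame before encoding is generated by adding together (a) a signal obtained by adding, to the signal corresponding to the first half of the frame of a second signal, a signal obtained by folding the signal corresponding to the latter half of the frame of the second signal, and windowing the sum, the second signal being obtained by windowing the reconstructed signal of the (i-3)-th frame, which is the frame three frames before the i-th frame, the reconstructed signal being obtained by decoding, by the low-delay transform decoding step, the (i-2)-th frame, which is the frame two frames before the i-th frame, (b) a signal obtained by windowing the first signal, the first signal being obtained by decoding the i-th frame by the speech signal decoding step, and (c) the signal of the first half of the frame of a third signal, which is the portion corresponding to the (i-3)-th frame of the signal obtained by applying the low-delay inverse filter bank processing and windowing to the (i-1)-th frame, and
a signal corresponding to the latter half of the (i-1)-th frame before encoding is generated by adding together (a) a signal obtained by adding, to the signal corresponding to the latter half of the frame of the second signal, a signal obtained by folding the signal corresponding to the first half of the frame of the second signal, and windowing the sum, (b) a signal obtained by folding and windowing the first signal, and (c) the signal corresponding to the latter half of the frame of the third signal; or
(2)
a signal corresponding to the first half of the (i-1)-th frame before encoding is generated by adding together (a) a signal obtained by adding, to the signal corresponding to the first half of the frame of the second signal, a signal obtained by folding the signal corresponding to the latter half of the frame of the second signal, and windowing the sum, (b) a signal obtained by folding and windowing the first signal, and (c) the signal corresponding to the first half of the frame of the third signal, and
a signal corresponding to the latter half of the (i-1)-th frame before encoding is generated by adding together (a) a signal obtained by adding, to the signal corresponding to the latter half of the frame of the second signal, a signal obtained by folding the signal corresponding to the first half of the frame of the second signal, and windowing the sum, (b) a signal obtained by windowing the first signal, and (c) the signal corresponding to the latter half of the frame of the third signal.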
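Sound signal decoding method. - A sound signal encoding method comprising: a determination step of analyzing the acoustic characteristics of a sound signal and determining whether a frame included in the sound signal is an audio signal or a speech signal;
a low-delay transform encoding step of encoding the frame using a low-delay filter bank;
a speech signal encoding step of encoding the frame by calculating linear prediction coefficients of the frame; and
a control step of performing control such that an encoding target frame determined in the determination step to be the audio signal is encoded by the low-delay transform encoding step, and the encoding target frame determined in the determination step to be the speech signal is encoded by the speech signal encoding step,
wherein, in the control step,
(1) when the encoding target frame is an i-th frame, which is the frame one frame after an (i-1)-th frame determined in the determination step to be the speech signal and is itself a frame determined in the determination step to be the audio signal,
a signal obtained by adding, to a signal obtained by windowing the signal corresponding to the first half of the (i-1)-th frame, a signal obtained by windowing and folding the signal corresponding to the latter half of the (i-1)-th frame, is encoded together with the i-th frame by the speech signal encoding step, or
(2) a signal obtained by adding, to a signal obtained by windowing the signal corresponding to the latter half of the (i-1)-th frame, a signal obtained by windowing and folding the signal corresponding to the first half of the (i-1)-th frame, is encoded together with the i-th frame by the speech signal encoding step.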
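Sound signal encoding method. - A sound signal encoding method comprising: a determination step of analyzing the acoustic characteristics of a sound signal and determining whether a frame included in the sound signal is an audio signal or a speech signal;
a low-delay transform encoding step of encoding the frame using a low-delay filter bank;
a TCX encoding step of encoding the frame by a TCX scheme in which the residual of the linear prediction coefficients of the frame is subjected to MDCT processing; and
a control step of performing control such that an encoding target frame determined in the determination step to be the audio signal is encoded by the low-delay transform encoding step, and the encoding target frame determined in the determination step to be the speech signal is encoded by the speech signal encoding step,
wherein, in the control step,
when the i-th frame, which is the encoding target frame, is a frame determined in the determination step to be the audio signal and also to be a transient signal whose energy changes abruptly,
(1) a signal obtained by adding, to a signal obtained by windowing the signal corresponding to the first half of the (i-1)-th frame, which is the frame one frame before the i-th frame, a signal obtained by windowing and folding the signal corresponding to the latter half of the (i-1)-th frame, is encoded together with the i-th frame by the speech signal encoding step, or
(2) a signal obtained by adding, to a signal obtained by windowing the signal corresponding to the latter half of the (i-1)-th frame, a signal obtained by windowing and folding the signal corresponding to the first half of the (i-1)-th frame, is encoded together with the i-th frame by the speech signal encoding step.
Sound signal encoding method.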
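The signal composition recited in options (1) and (2) of the encoding claims above (window one half of the (i-1)-th frame, window and fold the other half, then add the two) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the sine window, the modelling of folding as plain time reversal without a sign change, and the names `sine_window`, `fold`, and `component_signal` are all hypothetical choices that do not come from the claims.

```python
import math


def sine_window(n):
    """Assumed window shape: a symmetric sine window of length n."""
    return [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]


def fold(x):
    """Folding modelled as time reversal (assumption; sign conventions vary)."""
    return x[::-1]


def component_signal(prev_frame, variant=1):
    """Build the extra signal that is encoded together with frame i.

    prev_frame is frame i-1 before encoding.
    variant 1: window(first half)  + fold(window(latter half))
    variant 2: window(latter half) + fold(window(first half))
    """
    n = len(prev_frame)
    if n % 2 != 0:
        raise ValueError("frame length must be even")
    half = n // 2
    w = sine_window(n)

    # Window each half with the matching part of the full-frame window.
    first = [prev_frame[k] * w[k] for k in range(half)]
    latter = [prev_frame[half + k] * w[half + k] for k in range(half)]

    if variant == 1:
        kept, folded = first, fold(latter)
    elif variant == 2:
        kept, folded = latter, fold(first)
    else:
        raise ValueError("variant must be 1 or 2")

    # The result is half a frame long, which is what keeps the overhead small.
    return [a + b for a, b in zip(kept, folded)]


if __name__ == "__main__":
    frame_prev = [float(k) for k in range(8)]  # toy stand-in for frame i-1
    print(component_signal(frame_prev, variant=1))
    print(component_signal(frame_prev, variant=2))
```

In either variant the output has half the length of the input frame, so the additional signal carried alongside the i-th frame is half a frame of samples in this sketch.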
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012800043379A CN103477388A (zh) | 2011-10-28 | 2012-10-24 | Hybrid sound signal decoder, hybrid sound signal encoder, sound signal decoding method, and sound signal encoding method |
US13/996,644 US20140058737A1 (en) | 2011-10-28 | 2012-10-24 | Hybrid sound signal decoder, hybrid sound signal encoder, sound signal decoding method, and sound signal encoding method |
EP12844467.6A EP2772914A4 (en) | 2011-10-28 | 2012-10-24 | Hybrid sound signal decoder, hybrid sound signal encoder, sound signal decoding method, and sound signal encoding method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011-236912 | 2011-10-28 | | |
JP2011236912 | 2011-10-28 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013061584A1 (ja) | 2013-05-02 |
Family
ID=48167435
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/006802 WO2013061584A1 (ja) | Hybrid sound signal decoder, hybrid sound signal encoder, sound signal decoding method, and sound signal encoding method | 2011-10-28 | 2012-10-24 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20140058737A1 (ja) |
EP (1) | EP2772914A4 (ja) |
JP (1) | JPWO2013061584A1 (ja) |
CN (1) | CN103477388A (ja) |
WO (1) | WO2013061584A1 (ja) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10141004B2 (en) * | 2013-08-28 | 2018-11-27 | Dolby Laboratories Licensing Corporation | Hybrid waveform-coded and parametric-coded speech enhancement |
CN106448688B (zh) | 2014-07-28 | 2019-11-05 | Huawei Technologies Co., Ltd. | Audio encoding method and related apparatus |
US9555308B2 (en) | 2014-08-18 | 2017-01-31 | Nike, Inc. | Bag with multiple storage compartments |
CN104967755A (zh) * | 2015-05-28 | 2015-10-07 | 魏佳 | Remote interaction method based on embedded coding |
US10504530B2 (en) | 2015-11-03 | 2019-12-10 | Dolby Laboratories Licensing Corporation | Switching between transforms |
US11488613B2 (en) * | 2019-11-13 | 2022-11-01 | Electronics And Telecommunications Research Institute | Residual coding method of linear prediction coding coefficient based on collaborative quantization, and computing device for performing the method |
CN115223579A (zh) * | 2021-04-20 | 2022-10-21 | Huawei Technologies Co., Ltd. | Codec negotiation and switching method |
CN117356092A (zh) * | 2021-04-22 | 2024-01-05 | OP Solutions, LLC | Systems, methods, and bitstream structures for a hybrid feature video bitstream and decoder |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08263098A (ja) * | 1995-03-28 | 1996-10-11 | Nippon Telegr & Teleph Corp <Ntt> | Acoustic signal encoding method and acoustic signal decoding method |
WO2010003532A1 (en) | 2008-07-11 | 2010-01-14 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for encoding/decoding an audio signal using an aliasing switch scheme |
WO2010003663A1 (en) * | 2008-07-11 | 2010-01-14 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder and decoder for encoding frames of sampled audio signals |
JP2010210680A (ja) * | 2009-03-06 | 2010-09-24 | Ntt Docomo Inc | Sound signal encoding method, sound signal decoding method, encoding device, decoding device, sound signal processing system, sound signal encoding program, and sound signal decoding program |
WO2011158485A2 (ja) * | 2010-06-14 | 2011-12-22 | Panasonic Corporation | Audio hybrid encoding device and audio hybrid decoding device |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2415105A1 (en) * | 2002-12-24 | 2004-06-24 | Voiceage Corporation | A method and device for robust predictive vector quantization of linear prediction parameters in variable bit rate speech coding |
CA2457988A1 (en) * | 2004-02-18 | 2005-08-18 | Voiceage Corporation | Methods and devices for audio compression based on acelp/tcx coding and multi-rate lattice vector quantization |
PT2165328T (pt) * | 2007-06-11 | 2018-04-24 | Fraunhofer Ges Forschung | Encoding and decoding of an audio signal having an impulse-like portion and a stationary portion |
PT2301011T (pt) * | 2008-07-11 | 2018-10-26 | Fraunhofer Ges Forschung | Method and discriminator for classifying different segments of an audio signal comprising speech and music segments |
JP5243661B2 (ja) * | 2009-10-20 | 2013-07-24 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio signal encoder, audio signal decoder, method for providing an encoded representation of audio content, method for providing a decoded representation of audio content, and computer program for use in low-delay applications |
TR201900663T4 (tr) * | 2010-01-13 | 2019-02-21 | Voiceage Corp | Audio decoding with forward time-domain aliasing cancellation using linear-predictive filtering |
2012
- 2012-10-24 JP JP2013512289A patent/JPWO2013061584A1/ja not_active Withdrawn
- 2012-10-24 US US13/996,644 patent/US20140058737A1/en not_active Abandoned
- 2012-10-24 CN CN2012800043379A patent/CN103477388A/zh active Pending
- 2012-10-24 EP EP12844467.6A patent/EP2772914A4/en not_active Withdrawn
- 2012-10-24 WO PCT/JP2012/006802 patent/WO2013061584A1/ja active Application Filing
Non-Patent Citations (3)
Title |
---|
CHI-MIN LIU; WEN-CHIEH LEE: "A unified fast algorithm for cosine modulated filterbanks in current audio standards", J. AUDIO ENGINEERING, vol. 47, no. 12, 1999, pages 1061 - 1075 |
MILAN JELINEK: "Wideband Speech Coding Advances in VMR-WB Standard", IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 15, no. 4, 2007, pages 1167 - 1179, XP011177208, DOI: 10.1109/TASL.2007.894514 |
See also references of EP2772914A4 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10770084B2 (en) | 2015-09-25 | 2020-09-08 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Encoder, decoder and methods for signal-adaptive switching of the overlap ratio in audio transform coding |
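CN113272896A (zh) * | 2018-11-05 | 2021-08-17 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and processor for providing a processed audio signal representation, audio decoder, audio encoder, methods, and computer programs |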
US11948590B2 (en) | 2018-11-05 | 2024-04-02 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and audio signal processor, for providing processed audio signal representation, audio decoder, audio encoder, methods and computer programs |
US11990146B2 (en) | 2018-11-05 | 2024-05-21 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and audio signal processor, for providing processed audio signal representation, audio decoder, methods and computer programs |
Also Published As
Publication number | Publication date |
---|---|
EP2772914A1 (en) | 2014-09-03 |
JPWO2013061584A1 (ja) | 2015-04-02 |
CN103477388A (zh) | 2013-12-25 |
US20140058737A1 (en) | 2014-02-27 |
EP2772914A4 (en) | 2015-07-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2013061584A1 (ja) | Hybrid sound signal decoder, hybrid sound signal encoder, sound signal decoding method, and sound signal encoding method | |
JP5171842B2 (ja) | Encoder, decoder, and methods for encoding and decoding representing a time-domain data stream | |
EP3958257B1 (en) | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal | |
JP5208901B2 (ja) | Method for encoding speech signals and music signals | |
KR101508819B1 (ko) | Multi-mode audio codec and CELP coding adapted therefor | |
KR101699898B1 (ko) | Method and apparatus for processing a decoded audio signal in the spectral domain | |
JP5882895B2 (ja) | Decoding device | |
JP6067601B2 (ja) | Encoding/decoding apparatus for integrated speech/music signals | |
RU2584463C2 (ru) | Low-delay sound coding comprising alternating predictive coding and transform coding | |
TWI479478B (zh) | Apparatus and method for decoding an audio signal using an aligned look-ahead portion | |
JP6126006B2 (ja) | Hybrid sound signal encoder, hybrid sound signal decoder, sound signal encoding method, and sound signal decoding method | |
US20200035253A1 (en) | Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates | |
KR20130069821A (ko) | Apparatus and method for processing an audio signal and providing a higher temporal granularity for a combined unified speech and audio codec (USAC) | |
US9984696B2 (en) | Transition from a transform coding/decoding to a predictive coding/decoding | |
JP3748083B2 (ja) | Wideband speech restoration method and wideband speech restoration device | |
JP3598112B2 (ja) | Wideband speech restoration method and wideband speech restoration device | |
JP2006065362A (ja) | Wideband speech restoration device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | ENP | Entry into the national phase | Ref document number: 2013512289; Country of ref document: JP; Kind code of ref document: A |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12844467; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 13996644; Country of ref document: US; Ref document number: 2012844467; Country of ref document: EP |
 | NENP | Non-entry into the national phase | Ref country code: DE |