WO2012006770A1 - Audio signal generator - Google Patents

Audio signal generator

Info

Publication number
WO2012006770A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
audio
audio channel
phase shift
channel signal
Application number
PCT/CN2010/075107
Other languages
French (fr)
Inventor
Faller Christof
Yue Lang
Jianfeng Xu
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201080067974.1A (CN102986254B)
Priority to PCT/CN2010/075107 (WO2012006770A1)
Publication of WO2012006770A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/03 Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The invention relates to an audio signal generator for generating a downmix audio signal from a multi-channel audio signal comprising a first audio channel signal and a second audio channel signal. The audio signal generator comprises a processor (103) for amending a phase of the first audio channel signal using a first phase shift coefficient, and/or for amending a phase of the second audio channel signal using a second phase shift coefficient to reduce signal cancellations when combining the resulting first and second audio channel signal, and a combiner (109) for combining the resulting first and second audio channel signal to obtain the downmix audio signal.

Description

Audio signal generator

BACKGROUND OF THE INVENTION
The present invention relates to mobile communications over communication networks. In order to code a multi-channel audio signal, parametric stereo or multi-channel audio coding may be applied, as described in C. Faller and F. Baumgarte, "Efficient representation of spatial audio using perceptual parametrization," in Proc. IEEE Workshop on Appl. of Sig. Proc. to Audio and Acoust., Oct. 2001, pp. 199-202; C. Faller and F. Baumgarte, "Binaural Cue Coding: A novel and efficient representation of spatial audio," in Proc. ICASSP, May 2002, vol. 2, pp. 1841-1844; E. Schuijers, W. Oomen, B. den Brinker, and J. Breebaart, "Advances in parametric coding for high-quality audio," in Preprint 114th Conv. Aud. Eng. Soc., Mar. 2003; F. Baumgarte and C. Faller, "Binaural Cue Coding - Part I: Psychoacoustic fundamentals and design principles," IEEE Trans. on Speech and Audio Proc., vol. 11, no. 6, pp. 509-519, Nov. 2003; and C. Faller and F. Baumgarte, "Binaural Cue Coding - Part II: Schemes and applications," IEEE Trans. on Speech and Audio Proc., vol. 11, no. 6, pp. 520-531, Nov. 2003. Conventional parametric stereo or multi-channel audio coding approaches apply downmixing to generate a downmix audio signal comprising fewer channels than the original multi-channel audio signal. These fewer channels may be waveform coded, and side information describing the relations between the original signal channels may be added to the coded audio channels. The decoder may use this side information to regenerate the original number of audio channels from the decoded waveform-coded audio channels. When the audio channels are independent, the downmix audio signal can be generated by summing the input audio channels. When, however, the audio channels are not independent, as is commonly the case for stereo and multichannel audio signals, the summing operation may result in coloration of the sound due to time-varying inter-channel signal statistics. To mitigate this problem, magnitude equalization may be deployed, as described in F. Baumgarte, C. Faller, and P. Kroon, "Audio coder enhancement using scalable binaural cue coding with equalized mixing," in Preprint 116th Conv. Aud. Eng. Soc., May 2004. However, when there are delays between the original audio channels, magnitude equalization may not always sufficiently correct the undesired effects of signal cancellation, which occurs when out-of-phase signals are added for downmix generation. This problem may occur when a sound engineer mixes music using delays between channels or phase inversion, or when spaced microphones are used for recording. When parametric stereo or multi-channel audio coding is used for speech applications, i.e. telephony or voice-over-IP, the mentioned problems may occur when several microphones are used to pick up voice in a teleconferencing scenario.

SUMMARY OF THE INVENTION
A goal to be achieved by the present invention is to provide a concept for more efficiently generating a downmix signal from a plurality of audio channels. The invention is based on the finding that a downmix audio signal may be generated more efficiently when a time-adaptive phase alignment is applied to the audio channel signals embodying the input audio channels prior to their summation. The phase alignment may reduce signal cancellations when the resulting audio channel signals are combined to obtain a downmix signal, and it may be performed frame by frame and/or on the basis of an averaging process performed over a multiplicity of frames. In addition to the averaging process, magnitude equalization may be applied.
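The cancellation problem and the effect of phase alignment can be illustrated on a single subband value. The following minimal Python sketch is an illustration only, not the patented method: it treats one complex subband coefficient per channel, assumes channel 2 is a copy of channel 1 shifted by half a period (as a delay or phase inversion would produce), and derives the phase shift coefficient from the current frame alone rather than from an averaged product. The function names `plain_sum` and `aligned_sum` are hypothetical.

```python
import cmath


def plain_sum(x1: complex, x2: complex) -> complex:
    """Downmix one subband value by plain summation."""
    return x1 + x2


def aligned_sum(x1: complex, x2: complex) -> complex:
    """Rotate x2 onto the phase of x1, then sum (single frame, no averaging)."""
    cross = x1 * x2.conjugate()      # its phase is phase(x1) - phase(x2)
    if cross == 0:
        return x1 + x2               # one channel is silent: nothing to align
    p2 = cross / abs(cross)          # unit-magnitude phase shift coefficient
    return x1 + p2 * x2


# One subband component and a copy shifted by half a period (phase offset of pi).
x1 = cmath.rect(1.0, 0.3)
x2 = cmath.rect(1.0, 0.3 + cmath.pi)

print(abs(plain_sum(x1, x2)))    # close to 0: the channels cancel
print(abs(aligned_sum(x1, x2)))  # close to 2: the channels add in phase
```

Plain summation loses the component entirely, whereas the aligned sum keeps its full amplitude; the time-adaptive averaging described in the text replaces the single-frame cross product used here.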
According to a first aspect, the invention relates to an audio signal generator for generating a downmix audio signal from a multi-channel audio signal comprising a first audio channel signal and a second audio channel signal, the audio signal generator comprising a processor for amending a phase of the first audio channel signal using a first phase shift coefficient, and/or for amending a phase of the second audio channel signal using a second phase shift coefficient to reduce signal cancellations when combining the resulting first and second audio channel signal, and a combiner for combining the resulting first and second audio channel signal to obtain the downmix audio signal.
According to an implementation form of the first aspect, the processor is configured to amend the phase of the first audio channel signal and/or the phase of the second audio channel signal to match a phase of a reference signal. The reference signal may be, e.g., a predetermined reference signal, or it may be generated from the first and second audio channel signals.

According to an implementation form of the first aspect, the processor is configured to determine a mean value of a product of the first audio channel signal and the second audio channel signal to obtain the first phase shift coefficient and/or the second phase shift coefficient. The mean value may be determined by an averaging process, e.g. by summing such products over a plurality of frames.
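A mean-product estimate of a phase shift coefficient can be sketched as follows. In this minimal Python sketch, assumed rather than taken from the patent, the averaged product is the plain mean of X1 multiplied by the complex conjugate of X2 over a finite block of frames (the conjugate product matches the formulas of the following implementation forms), and the result is normalized to magnitude one. The function name `phase_shift_from_mean_product` and the example data are hypothetical.

```python
import cmath
from typing import Sequence


def phase_shift_from_mean_product(x1: Sequence[complex], x2: Sequence[complex]) -> complex:
    """Unit-magnitude phase shift coefficient for the second channel of one bin.

    x1 and x2 hold the complex subband values of that bin over several frames.
    The mean of x1 * conj(x2) is the averaged product; its phase is the
    average phase lead of channel 1 over channel 2.
    """
    if len(x1) != len(x2) or len(x1) == 0:
        raise ValueError("need two non-empty frame sequences of equal length")
    mean = sum(a * b.conjugate() for a, b in zip(x1, x2)) / len(x1)
    if mean == 0:
        return 1 + 0j                # no phase information: leave the channel unchanged
    return mean / abs(mean)


# Hypothetical example: channel 2 lags channel 1 by 0.5 rad in every frame.
frames1 = [cmath.rect(1.0 + 0.1 * k, 0.2 * k) for k in range(8)]
frames2 = [v * cmath.rect(0.7, -0.5) for v in frames1]
p2 = phase_shift_from_mean_product(frames1, frames2)
print(cmath.phase(p2))  # about 0.5
```

Multiplying the second channel by the returned coefficient advances it by the estimated 0.5 rad, so both channels share the same phase before summation.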
According to an implementation form of the first aspect, the processor is configured to set the first phase shift coefficient or the second phase shift coefficient to one. Thus, the phase of only one audio channel may be amended. According to an implementation form of the first aspect, the first phase shift coefficient is the complex conjugate of the second phase shift coefficient. To obtain the complex conjugate of either phase shift coefficient, the sign of its imaginary part may be inverted.
According to an implementation form of the first aspect, the processor is configured to determine the first phase shift coefficient P1(k, i) and the second phase shift coefficient P2(k, i), k denoting a time index and i denoting a frequency index, on the basis of the following formulas:

$$P_1(k,i) = 1, \qquad P_2(k,i) = \frac{E\{X_1(k,i)\,X_2^*(k,i)\}}{\left|E\{X_1(k,i)\,X_2^*(k,i)\}\right|},$$

wherein X1(k, i) and X2(k, i) respectively denote the first audio channel signal and the second audio channel signal, and wherein E{.} denotes an averaging operation.

According to an implementation form of the first aspect, the processor is configured to determine the first phase shift coefficient P1(k, i) and the second phase shift coefficient P2(k, i), k denoting a time index and i denoting a frequency index, on the basis of the following formulas:
$$P_1(k,i) = P^*(k,i), \qquad P_2(k,i) = P(k,i),$$

where P(k, i) is defined by an equation that is not reproduced in this text (image imgf000005_0001), wherein X1(k, i) and X2(k, i) respectively denote the first audio channel signal and the second audio channel signal, and wherein E{.} denotes an averaging operation.
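The conjugate pair P1(k, i) = P*(k, i), P2(k, i) = P(k, i) rotates the two channels in opposite directions. Because the defining equation for P(k, i) is not reproduced here, the following Python sketch assumes, as a labeled assumption only, that P(k, i) is the unit phasor carrying half the phase of the averaged product E{X1 X2*}, so that each channel receives half of the phase correction. The function name `symmetric_coefficients` is hypothetical.

```python
import cmath


def symmetric_coefficients(mean_cross: complex) -> tuple[complex, complex]:
    """Return (P1, P2) = (conj(P), P) for one frequency bin.

    mean_cross is the averaged product E{X1 * conj(X2)}. ASSUMPTION: P is the
    unit phasor with half the phase of mean_cross, so each channel is rotated
    by half of the required correction, in opposite directions.
    """
    if mean_cross == 0:
        return 1 + 0j, 1 + 0j        # no phase information: leave both channels unchanged
    p = cmath.rect(1.0, cmath.phase(mean_cross) / 2.0)
    return p.conjugate(), p


x1 = cmath.rect(1.0, 1.0)
x2 = cmath.rect(0.5, 0.2)
p1, p2 = symmetric_coefficients(x1 * x2.conjugate())
print(cmath.phase(p1 * x1), cmath.phase(p2 * x2))  # both about 0.6
```

Under this assumption both channels meet at the mean of their phases (0.6 rad here), so neither waveform is rotated by more than half of the phase difference.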
According to an implementation form of the first aspect, the processor is configured to determine the first phase shift coefficient P1(k, i) and the second phase shift coefficient P2(k, i), k denoting a time index and i denoting a frequency index, on the basis of the following formulas:

$$P_1(k,i) = \frac{E\{S(k,i)\,X_1^*(k,i)\}}{\left|E\{S(k,i)\,X_1^*(k,i)\}\right|}, \qquad P_2(k,i) = \frac{E\{S(k,i)\,X_2^*(k,i)\}}{\left|E\{S(k,i)\,X_2^*(k,i)\}\right|},$$

$$S(k,i) = X_1(k,i) + X_2(k,i),$$

or, alternatively, S(k, i) as defined by an equation that is not reproduced in this text (image imgf000006_0002), with a further definition that is likewise not reproduced (image imgf000006_0003), wherein X1(k, i) and X2(k, i) respectively denote the first audio channel signal and the second audio channel signal, and wherein E{.} denotes an averaging operation.

According to an implementation form of the first aspect, the processor is configured to weight the downmix signal by a power factor, in particular by a power factor which depends on the sum of the powers of the first audio channel signal and the second audio channel signal. Thus, the power factor scales the downmix signal in order to adjust its power with regard to the first and second audio channels.
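Aligning both channels to the sum signal S(k, i) = X1(k, i) + X2(k, i) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the plain mean over a block of frames stands in for E{.}, only the sum-signal variant of S(k, i) is implemented, and the helper names `_unit` and `sum_reference_coefficients` are hypothetical.

```python
import cmath
from typing import Sequence


def _unit(z: complex) -> complex:
    """Normalize to magnitude one; fall back to 1 when there is no phase information."""
    return z / abs(z) if z != 0 else 1 + 0j


def sum_reference_coefficients(x1: Sequence[complex],
                               x2: Sequence[complex]) -> tuple[complex, complex]:
    """Phase shift coefficients aligning both channels of one bin to S = X1 + X2.

    x1 and x2 hold the bin's complex values over the averaging window; the
    plain frame mean stands in for E{.} (an assumption).
    """
    if len(x1) != len(x2) or len(x1) == 0:
        raise ValueError("need two non-empty frame sequences of equal length")
    s = [a + b for a, b in zip(x1, x2)]
    c1 = sum(r * a.conjugate() for r, a in zip(s, x1)) / len(s)   # E{S X1*}
    c2 = sum(r * b.conjugate() for r, b in zip(s, x2)) / len(s)   # E{S X2*}
    return _unit(c1), _unit(c2)


x1 = [cmath.rect(1.0, 0.0)]
x2 = [cmath.rect(1.0, 1.0)]
p1, p2 = sum_reference_coefficients(x1, x2)
print(cmath.phase(p1 * x1[0]), cmath.phase(p2 * x2[0]))  # both about 0.5, the phase of S
```

Both channels end up with the phase of S. When the two channels have equal magnitude and are exactly out of phase, S vanishes and the sketch falls back to leaving both channels unchanged.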
According to an implementation form of the first aspect, the combiner is configured to superimpose the first auxiliary signal and the second auxiliary signal to obtain the downmix signal. In order to superimpose the auxiliary signals, the combiner may be configured to sum up the auxiliary signals.
According to an implementation form of the first aspect, the processor is configured to multiply the first audio channel signal by the first phase shift coefficient, or to multiply the second audio channel signal by the second phase shift coefficient, for phase amendment. The processor may comprise at least one multiplier to multiply the respective audio channel signal.
According to an implementation form of the first aspect, the audio signal generator further comprises a transformer for transforming a first time-domain signal into frequency domain to obtain the first audio channel signal, and for transforming a second time-domain signal into frequency domain to obtain the second audio channel signal. The transformer may be a Fourier transformer.
According to an implementation form of the first aspect, the downmix audio signal is a frequency-domain signal, and the audio signal generator further comprises a transformer for transforming the downmix audio signal into the time domain. The transformer may be, e.g., an inverse Fourier transformer.
Furthermore, each implementation form of the first aspect may be combined with any other implementation form of the first aspect to obtain further implementation forms of the first aspect of the invention.

According to a second aspect, the invention relates to a method for generating a downmix audio signal from a multi-channel audio signal comprising a first audio channel signal and a second audio channel signal, the method comprising amending a phase of the first audio channel signal using a first phase shift coefficient and/or amending a phase of the second audio channel signal using a second phase shift coefficient to reduce signal cancellations when combining the resulting first and second audio channel signals, and combining the resulting first and second audio channel signals to obtain the downmix audio signal.
According to some implementation forms of the second aspect or according to another aspect, a method is provided for generating a downmix signal of multiple input audio channels. The method may comprise the steps of receiving a plurality of input audio channels, converting the input audio channels to a plurality of subbands, estimating the phase difference between the input audio channels and a reference audio channel, modifying the phase of at least one input audio channel subband to match the phase of the corresponding reference audio channel subband, generating a sum of the modified input audio channel subbands to generate the downmix signal subbands, and converting the downmix signal subbands to the time-domain to generate the downmix output signal.
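The method steps listed above can be sketched end to end. The following Python code is a minimal, hypothetical illustration rather than the patented implementation: it assumes non-overlapping rectangular frames, a naive DFT as the subband transform, channel 1 as the reference audio channel, and a single-pole running average with an arbitrary smoothing factor `alpha`. The names `dft`, `idft`, and `downmix` are illustrative.

```python
import cmath
import math
from typing import List, Sequence


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse of dft(); the real part is kept because the inputs are real signals."""
    n = len(spectrum)
    return [(sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n).real
            for t in range(n)]


def downmix(x1: Sequence[float], x2: Sequence[float], frame_len: int = 16,
            alpha: float = 0.8) -> List[float]:
    """Phase-aligned stereo downmix with channel 1 as the reference channel.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if len(x1) != len(x2):
        raise ValueError("channels must have equal length")
    avg = [0j] * frame_len                       # running E{X1 X2*} for every subband
    out: List[float] = []
    for start in range(0, len(x1) - frame_len + 1, frame_len):
        X1 = dft(x1[start:start + frame_len])    # convert both channels to subbands
        X2 = dft(x2[start:start + frame_len])
        mixed = []
        for i in range(frame_len):
            # estimate the phase difference between channel 2 and the reference
            avg[i] = alpha * avg[i] + (1 - alpha) * X1[i] * X2[i].conjugate()
            p2 = avg[i] / abs(avg[i]) if abs(avg[i]) > 1e-12 else 1.0
            # rotate channel 2 onto the reference phase and sum the subbands
            mixed.append(X1[i] + p2 * X2[i])
        out.extend(idft(mixed))                  # back to the time domain
    return out


n = 64
left = [math.cos(2 * math.pi * 2 * t / 16) for t in range(n)]
right = [-v for v in left]                       # phase-inverted copy of the left channel
mix = downmix(left, right)
```

Summing `left` and `right` directly gives silence, while `mix` reproduces the tone at twice the amplitude. A practical system would use overlapping windowed frames so that coefficient changes between frames do not cause discontinuities.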
According to a third aspect, the invention relates to a computer program for performing the method for generating a downmix audio signal when run on a computer.
BRIEF DESCRIPTION OF THE DRAWINGS
Further embodiments of the invention will be described with respect to the following figures, in which:
Fig. 1 shows a block diagram of an audio signal generator; and
Fig. 2 shows a diagram of a method for generating a downmix signal.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
Fig. 1 shows a block diagram of an audio signal generator according to an implementation form. For brevity, the following descriptions may refer to a stereo signal forming an embodiment of a multi-channel signal. Thus, the left and right channels of the stereo signal may form embodiments of the first and second audio channel signal of a multi-channel audio signal.
As shown in Fig. 1, the audio signal generator may comprise a transformer 101 for transforming a left time-domain channel x1(n) and a right time-domain channel x2(n) of a stereo signal into the frequency domain to obtain a first audio channel signal X1(k, i) and a second audio channel signal X2(k, i). The first and second audio channel signals are provided to a processor 103, which is configured to amend the phase of the first audio channel signal using a first phase shift coefficient P1(k, i) and/or to amend the phase of the second audio channel signal using a second phase shift coefficient P2(k, i), in order to reduce signal cancellations when the resulting first and second audio channel signals are combined. To amend the phase of the respective audio channel signal, the processor may comprise a first multiplier 105 for multiplying the first audio channel signal by the first phase shift coefficient, and a second multiplier 107 for multiplying the second audio channel signal by the second phase shift coefficient.
The outputs of the multipliers 105 and 107 may be provided to a combiner 109 for combining, e.g. superimposing, the resulting first and second audio channel signals to obtain the downmix audio signal. To determine the first and second phase shift coefficients, the processor 103 may comprise a downmix parameter computer 110 receiving the outputs of the transformer 101. The downmix parameter computer 110 may be configured to determine the first and second phase shift coefficients according to the principles and/or formulas described herein.
Optionally, the audio signal generator may comprise a further multiplier 111 for weighting the output of the combiner 109 with a power factor M(k, i). Optionally, the processor 103 may itself be configured to weight the output of the combiner 109 with the power factor. At the output of the combiner 109 or of the multiplier 111, a downmix audio signal X(k, i) in the frequency domain results. This downmix audio signal may be transformed into the time domain using, e.g., an inverse filter bank 113, which may, by way of example, be implemented as an inverse Fourier transform.
Correspondingly, the transformer 101 may comprise a first filter bank 115 for transforming the left channel to obtain the first audio channel signal, and a second filter bank 117 for transforming the right channel to obtain the second audio channel signal in the frequency domain. The filter banks 115, 117 may be implemented as Fourier transformers.
Fig. 2 shows a diagram of a method for generating a downmix audio signal from a multi-channel audio signal which comprises a first audio channel signal and a second audio channel signal. The method comprises amending 201 a phase of the first audio channel signal using a first phase shift coefficient and/or amending 203 a phase of the second audio channel signal using a second phase shift coefficient, and combining 205 the resulting first and second audio channel signals to obtain the downmix audio signal.

With respect to Fig. 1, the left and right time-domain channels of a stereo signal are denoted x1(n) and x2(n), where n is the discrete time index. For downmix processing, the signals are converted to a time-frequency representation. The left and right stereo signal channels in the time-frequency representation are denoted X1(k, i) and X2(k, i), where k is, e.g., a downsampled time index (also referred to as frame index) and i is a frequency index. Without loss of generality, a complex-valued time-frequency representation is assumed in the following.
The downmix signal is computed as

$$X(k,i) = M(k,i)\,\big(P_1(k,i)\,X_1(k,i) + P_2(k,i)\,X_2(k,i)\big),$$

where $M(k,i)$ is an optional real-valued gain factor and $P_1(k,i)$ and $P_2(k,i)$ are complex left and right "phase alignment" factors with magnitude one. Figure 1 shows the processing scheme applied to generate the downmix signal. The left and right signals, $x_1(n)$ and $x_2(n)$, are converted to a time-frequency domain by a transform or filterbank (FB). The downmix processing parameters are computed and applied before the left and right subband signals are added to generate the subband downmix signal. The subband downmix signal is converted back to the time domain using an inverse filterbank/transform (IFB).
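The downmix formula can be illustrated for a single time-frequency bin. This minimal sketch (the function name `downmix_bin` is hypothetical) shows why unit-magnitude phase factors matter: two subband values of equal magnitude and opposite phase cancel in a plain sum, but add constructively once one of them is rotated.

```python
def downmix_bin(x1, x2, p1=1 + 0j, p2=1 + 0j, m=1.0):
    """X(k,i) = M(k,i) * (P1(k,i) X1(k,i) + P2(k,i) X2(k,i)) for one bin."""
    return m * (p1 * x1 + p2 * x2)


# Left and right bins with equal magnitude but opposite phase.
x1, x2 = 1 + 1j, -1 - 1j

plain = downmix_bin(x1, x2)               # P1 = P2 = 1: the bins cancel
aligned = downmix_bin(x1, x2, p2=-1 + 0j)  # rotate X2 by 180 degrees (|P2| = 1)
print(abs(plain), abs(aligned))
```

The plain sum is zero, whereas the aligned sum has twice the magnitude of each input; the gain $M$ would then be chosen to bring its power back to the sum of the input powers.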
The goal is to determine $P_1(k,i)$ and $P_2(k,i)$ such that the left and right channels add in phase, which prevents potentially time-dependent signal cancellations.
Additionally, the real-valued factor $M(k,i)$ is determined such that the power of $X(k,i)$ equals or approximates the sum of the powers of $X_1(k,i)$ and $X_2(k,i)$.
One strategy is to align one channel, e.g. $X_2(k,i)$, relative to the other channel, e.g. $X_1(k,i)$. This may be achieved by choosing

$$P_1(k,i) = 1, \qquad P_2(k,i) = \frac{E\{X_1(k,i)\,X_2^*(k,i)\}}{\left|E\{X_1(k,i)\,X_2^*(k,i)\}\right|},$$

where $E\{\cdot\}$ is a short-time averaging operation, $|\cdot|$ is the absolute value of a complex number, and $^*$ denotes complex conjugation. For the averaging operation, single-pole averaging with an 80 ms time constant may be chosen. As mentioned above, $M(k,i)$ may be computed such that the power of the downmix signal equals or approximates the sum of the powers of the left and right channels. This may be achieved by using
$$M(k,i) = \sqrt{\frac{E\{X_1(k,i)\,X_1^*(k,i)\} + E\{X_2(k,i)\,X_2^*(k,i)\}}{E\{\left|P_1(k,i)\,X_1(k,i) + P_2(k,i)\,X_2(k,i)\right|^2\}}}.$$
To reduce artifacts when $M(k,i)$ becomes too large or too small, the range of $M(k,i)$ may be limited to [0.5, 2], corresponding to ±6 dB.
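The one-sided alignment and the power-preserving gain can be sketched for a single frequency bin over successive frames. This is a minimal sketch under stated assumptions: it uses $P_1 = 1$, $P_2 = E\{X_1X_2^*\}/|E\{X_1X_2^*\}|$, and $M = \sqrt{(E\{|X_1|^2\} + E\{|X_2|^2\}) / E\{|P_1X_1 + P_2X_2|^2\}}$ clamped to [0.5, 2]. The mapping of the 80 ms time constant to a per-frame coefficient assumes one update per hop of 256 samples at 48 kHz; the hop size, sample rate, the small `eps` guard, and the names `smoothing_coefficient` and `align_and_downmix` are hypothetical.

```python
import math


def smoothing_coefficient(tau_s=0.080, hop=256, fs=48000):
    """Single-pole coefficient for time constant tau_s with one update per hop (assumed values)."""
    return math.exp(-hop / (tau_s * fs))


def align_and_downmix(x1_frames, x2_frames, a=smoothing_coefficient(), eps=1e-12):
    """Downmix one frequency bin over frames k, aligning X2 to X1 (P1 = 1)."""
    e12 = 0j                 # running E{X1 X2*}
    e11 = e22 = ey = 0.0     # running E{|X1|^2}, E{|X2|^2}, E{|P1 X1 + P2 X2|^2}
    out = []
    for x1, x2 in zip(x1_frames, x2_frames):
        # Single-pole (recursive) averaging: E_k = a * E_{k-1} + (1 - a) * value_k
        e12 = a * e12 + (1 - a) * x1 * x2.conjugate()
        e11 = a * e11 + (1 - a) * abs(x1) ** 2
        e22 = a * e22 + (1 - a) * abs(x2) ** 2
        # Unit-magnitude phase factor; fall back to 1 while the cross term is still ~0.
        p2 = e12 / abs(e12) if abs(e12) > eps else 1 + 0j
        y = x1 + p2 * x2
        ey = a * ey + (1 - a) * abs(y) ** 2
        m = math.sqrt((e11 + e22) / ey) if ey > eps else 1.0
        m = min(max(m, 0.5), 2.0)  # limit M to [0.5, 2], i.e. about +/-6 dB
        out.append(m * y)
    return out


# Usage: a bin where the right channel is in anti-phase with the left.
left = [1 + 0j] * 50
right = [-1 + 0j] * 50
y = align_and_downmix(left, right)
print(abs(left[0] + right[0]), abs(y[-1]))  # plain sum cancels; aligned downmix keeps power
```

In this example the aligned sum has power 4 while the inputs sum to power 2, so the gain settles at $\sqrt{0.5}$ and the output power matches $|X_1|^2 + |X_2|^2$. Because all averages are recursive, each frame costs a constant amount of work per bin.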
According to some embodiments, the following formulas may be used to obtain the phase shift coefficients:
$$P_1(k,i) = P^*(k,i), \qquad P_2(k,i) = P(k,i),$$

with

$$P(k,i) = \sqrt{\frac{E\{X_1(k,i)\,X_2^*(k,i)\}}{\left|E\{X_1(k,i)\,X_2^*(k,i)\}\right|}}.$$
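The symmetric variant, in which each channel receives half of the phase correction, can be sketched for one bin. This sketch assumes $P_1 = P^*$, $P_2 = P$ with $P$ taken as the principal square root of the unit phasor $E\{X_1X_2^*\}/|E\{X_1X_2^*\}|$; that form is inferred from the "half of the phase correction" description and from $P_1$ being the complex conjugate of $P_2$. The function name `symmetric_phase_factors` is hypothetical, and the instantaneous product stands in for the short-time average.

```python
import cmath


def symmetric_phase_factors(e12):
    """Given E{X1 X2*} for one bin, return (P1, P2) = (conj(P), P), P = sqrt(E{X1 X2*}/|E{X1 X2*}|)."""
    if abs(e12) == 0:
        return 1 + 0j, 1 + 0j  # no phase information yet: leave both channels untouched
    p = cmath.sqrt(e12 / abs(e12))  # principal root: half of the inter-channel phase difference
    return p.conjugate(), p


x1 = cmath.rect(1.0, 0.9)   # left bin, phase 0.9 rad
x2 = cmath.rect(0.5, 0.1)   # right bin, phase 0.1 rad
p1, p2 = symmetric_phase_factors(x1 * x2.conjugate())  # instantaneous estimate, no averaging
print(cmath.phase(p1 * x1), cmath.phase(p2 * x2))       # both rotated to the mean phase 0.5 rad
```

Since the principal root has a phase in (-π/2, π/2], neither channel is rotated by more than 90 degrees, whereas the one-sided alignment may rotate a single channel by up to 180 degrees. This is one way to read the claim that the maximum waveform modification is smaller.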
According to the above formulas, both audio channel signals, e.g. a right channel and a left channel, may be phase-modified. Instead of applying the whole phase correction to one channel, half of the phase correction may be applied to each channel, which may have the advantage that the maximum modification of the audio waveform is smaller. Alternatively, both audio channel signals, e.g. the left and right channels of a stereo signal, may be phase-aligned relative to the sum signal, i.e.
$$P_1(k,i) = \frac{E\{S(k,i)\,X_1^*(k,i)\}}{\left|E\{S(k,i)\,X_1^*(k,i)\}\right|}, \qquad P_2(k,i) = \frac{E\{S(k,i)\,X_2^*(k,i)\}}{\left|E\{S(k,i)\,X_2^*(k,i)\}\right|},$$

with $S(k,i) = X_1(k,i) + X_2(k,i)$ forming an embodiment of a reference audio signal. According to some embodiments, instead of the sum signal, a reference signal may be used whose phase is a weighted sum of the phases of both channels and whose magnitude is the sum or the norm of the magnitudes of both channels. That is, the phase shift coefficients may be used with a reference signal ("sum signal") which may be equal to:
[Equations defining this reference signal S(k,i) appear only as images in the original and are not reproduced here.]
Such a signal may have the following properties:
• Its power spectrum is the sum of the left and right power spectra, such that during time-averaging operations the phase is weighted by signal power.
• Its phase is a weighted average of the phases of the left and right, i.e. first and second, channels. The weights may be chosen such that the phase of the stronger channel dominates.
According to some implementation forms, the reference signal may be one of the first or second audio channel signals.
According to some implementation forms, the reference signal may be the sum of the first and second audio channel signal.
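Alignment to the sum reference $S = X_1 + X_2$ can be sketched for one bin with $P_j = E\{S X_j^*\}/|E\{S X_j^*\}|$. In this minimal sketch, $E\{\cdot\}$ is approximated by a plain mean over the supplied frames rather than a running short-time average, and the names `unit` and `sum_reference_factors` are hypothetical.

```python
import cmath


def unit(z, eps=1e-12):
    """Return z / |z| (a unit-magnitude phase factor), or 1 if z is numerically zero."""
    return z / abs(z) if abs(z) > eps else 1 + 0j


def sum_reference_factors(x1_frames, x2_frames):
    """P1, P2 that align X1 and X2 to S = X1 + X2 for one bin.

    E{.} is approximated here by a plain mean over the supplied frames.
    """
    n = len(x1_frames)
    s = [a + b for a, b in zip(x1_frames, x2_frames)]
    e_s1 = sum(si * a.conjugate() for si, a in zip(s, x1_frames)) / n  # E{S X1*}
    e_s2 = sum(si * b.conjugate() for si, b in zip(s, x2_frames)) / n  # E{S X2*}
    return unit(e_s1), unit(e_s2)


x1 = cmath.rect(1.0, 0.0)   # strong channel
x2 = cmath.rect(0.2, 2.0)   # weak channel, 2 rad out of phase
p1, p2 = sum_reference_factors([x1], [x2])
print(cmath.phase(p1 * x1), cmath.phase(p2 * x2), cmath.phase(x1 + x2))
```

Both rotated bins end up with the phase of $S$. Because $S$ is dominated by the stronger channel, that channel is rotated only slightly, while the weaker channel absorbs most of the correction.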
According to some implementation forms, the reference signal may be a signal with a magnitude which is a combination of the input signal subband magnitudes, and a phase which is a combination of the input signal subband phases.
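The exact reference-signal formula is not reproduced in the text, so the following is only one hypothetical construction with the stated properties. Its magnitude is the norm $\sqrt{|X_1|^2 + |X_2|^2}$, so its power is the sum of the channel powers. Its phase is a magnitude-weighted average of the two channel phases, so the stronger channel dominates. Both choices and the name `weighted_reference` are assumptions for illustration.

```python
import cmath
import math


def weighted_reference(x1, x2):
    """Hypothetical reference S for one bin.

    |S| = sqrt(|X1|^2 + |X2|^2); phase(S) = magnitude-weighted mean of the two phases.
    """
    m1, m2 = abs(x1), abs(x2)
    if m1 + m2 == 0:
        return 0j
    phase = (m1 * cmath.phase(x1) + m2 * cmath.phase(x2)) / (m1 + m2)
    return cmath.rect(math.hypot(m1, m2), phase)


s = weighted_reference(cmath.rect(3.0, 0.2), cmath.rect(1.0, 1.0))
print(abs(s), cmath.phase(s))  # |S| = sqrt(10); phase = (3*0.2 + 1*1.0) / 4 = 0.4
```

A linear average of wrapped phases misbehaves when the two phases straddle ±π (e.g. 3.0 and -3.0 rad average to 0). A practical version would therefore unwrap the phase difference first or average unit phasors instead.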
According to some implementation forms, a phase difference may be estimated using an averaging process over multiple frames.
According to some implementation forms, a gain factor may be applied to the downmix subband signals after summation, for magnitude equalization.

Claims

1. Audio signal generator for generating a downmix audio signal from a multichannel audio signal comprising a first audio channel signal and a second audio channel signal, the audio signal generator comprising: a processor (103) for amending a phase of the first audio channel signal using a first phase shift coefficient, and/or for amending a phase of the second audio channel signal using a second phase shift coefficient to reduce signal cancellations when combining the resulting first and second audio channel signal; and a combiner (109) for combining the resulting first and second audio channel signal to obtain the downmix audio signal.
2. The audio signal generator of claim 1 , wherein the processor (103) is configured to amend the phase of the first audio channel signal or the phase of the second audio channel signal to match a phase of a reference signal.
3. The audio signal generator of claim 1 or 2, wherein the processor (103) is configured to determine a mean value of a product of the first audio channel signal and the second audio channel signal to obtain the first phase shift factor or the second phase shift factor.
4. The audio signal generator of any of the preceding claims, wherein the processor (103) is configured to set the first phase shift coefficient or the second phase shift coefficient to one.
5. The audio signal generator of any of the preceding claims, wherein the first phase shift coefficient is a complex-conjugated version of the second phase shift coefficient.
6. The audio signal generator of any of the preceding claims, wherein the processor (103) is configured to determine the first phase shift coefficient $P_1(k,i)$ and the second phase shift coefficient $P_2(k,i)$, $k$ denoting a time index, $i$ denoting a frequency index, upon the basis of the following formulas:

$$P_1(k,i) = 1, \qquad P_2(k,i) = \frac{E\{X_1(k,i)\,X_2^*(k,i)\}}{\left|E\{X_1(k,i)\,X_2^*(k,i)\}\right|},$$

wherein $X_1(k,i)$ and $X_2(k,i)$ respectively denote the first audio channel signal and the second audio channel signal, and wherein $E\{\cdot\}$ denotes an averaging operation.
7. The audio signal generator of any of the preceding claims, wherein the processor (103) is configured to determine the first phase shift coefficient $P_1(k,i)$ and the second phase shift coefficient $P_2(k,i)$, $k$ denoting a time index, $i$ denoting a frequency index, upon the basis of the following formulas:

$$P_1(k,i) = P^*(k,i), \qquad P_2(k,i) = P(k,i),$$

with

$$P(k,i) = \sqrt{\frac{E\{X_1(k,i)\,X_2^*(k,i)\}}{\left|E\{X_1(k,i)\,X_2^*(k,i)\}\right|}},$$

wherein $X_1(k,i)$ and $X_2(k,i)$ respectively denote the first audio channel signal and the second audio channel signal, and wherein $E\{\cdot\}$ denotes an averaging operation.
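8. The audio signal generator of any of the preceding claims, wherein the processor (103) is configured to determine the first phase shift coefficient $P_1(k,i)$ and the second phase shift coefficient $P_2(k,i)$, $k$ denoting a time index, $i$ denoting a frequency index, upon the basis of the following formulas:

$$P_1(k,i) = \frac{E\{S(k,i)\,X_1^*(k,i)\}}{\left|E\{S(k,i)\,X_1^*(k,i)\}\right|}, \qquad P_2(k,i) = \frac{E\{S(k,i)\,X_2^*(k,i)\}}{\left|E\{S(k,i)\,X_2^*(k,i)\}\right|},$$

with

$$S(k,i) = X_1(k,i) + X_2(k,i)$$

or an alternative reference signal $S(k,i)$ [given as equation images in the original and not reproduced here; the recoverable part contains the term $|X_1(k,i)| + |X_2(k,i)|$], wherein $X_1(k,i)$ and $X_2(k,i)$ respectively denote the first audio channel signal and the second audio channel signal, and wherein $E\{\cdot\}$ denotes an averaging operation.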
9. The audio signal generator of any of the preceding claims, wherein the processor (103) is configured to weight the downmix signal by a power factor, in particular by a power factor which depends on a sum of powers of the first channel audio signal and the second channel audio signal.
10. The audio signal generator of any of the preceding claims, wherein the combiner (109) is configured to superimpose the first auxiliary signal and the second auxiliary signal to obtain the downmix signal.
11. The audio signal generator of any of the preceding claims, wherein the processor (103) is configured to multiply the first audio channel signal by the first phase shift coefficient, or to multiply the second audio channel signal by the second phase shift coefficient for phase amendment.
12. The audio signal generator of any of the preceding claims, further comprising a transformer (101) for transforming a first time-domain signal into frequency domain to obtain the first audio channel signal, and for transforming a second time-domain signal into frequency domain to obtain the second audio channel signal.
13. The audio signal generator of any of the preceding claims, wherein the downmix audio signal is a frequency domain signal, and wherein the audio signal generator further comprises a transformer (113) for transforming the downmix audio signal into time-domain.
14. A method for generating a downmix audio signal from a multi-channel audio signal comprising a first audio channel signal and a second audio channel signal, the method comprising: amending (201) a phase of the first audio channel signal using a first phase shift coefficient to reduce signal cancellations when combining the resulting first and second audio channel signal; and/or amending (203) a phase of the second audio channel signal using a second phase shift coefficient to reduce signal cancellations when combining the resulting first and second audio channel signal; and combining (205) the resulting first and second audio channel signal to obtain the downmix audio signal.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201080067974.1A CN102986254B (en) 2010-07-12 2010-07-12 Audio signal generator
PCT/CN2010/075107 WO2012006770A1 (en) 2010-07-12 2010-07-12 Audio signal generator

Publications (1)

Publication Number Publication Date
WO2012006770A1 true WO2012006770A1 (en) 2012-01-19

Family

ID=45468869

Country Status (2)

Country Link
CN (1) CN102986254B (en)
WO (1) WO2012006770A1 (en)

Also Published As

Publication number Publication date
CN102986254A (en) 2013-03-20
CN102986254B (en) 2015-06-17

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201080067974.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10854565

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10854565

Country of ref document: EP

Kind code of ref document: A1