WO2008084436A1 - An object-oriented audio decoder - Google Patents
An object-oriented audio decoder
- Publication number
- WO2008084436A1 (PCT application PCT/IB2008/050041)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/01—Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
- H04S3/004—For headphones
Definitions
- the down-mix audio signal 102 is further fed into decoding means 120, said decoding means 120 being in accordance with the binaural MPEG Surround standard.
- decoding means 120 perform both decoding and rendering of the audio objects from the down-mix audio signal 102 based on the spatial parameters 106 and dynamic head-related transfer function parameters 107.
- the dynamic head-related transfer function (HRTF) parameters 107 are provided from outside the decoding means 120.
- the conversion means 130 perform conversion of the parametric data 103 and HRTF parameters 105 into the spatial parameters 106 and the dynamic HRTF parameters 107.
- the HRTF parameters 105 are provided from the outside of the object-oriented decoder by the HRTF parameter database 200.
- decoding means 120 essentially comprise an MPEG Surround decoder.
- the spatial parameters 106 are preferably provided in the MPEG Surround format; furthermore, the dynamic head-related transfer function parameters 107 are preferably provided in the corresponding MPEG Surround format for HRTF parameters.
- the conversion means 130 can use user-control data 104 in order to generate the spatial parameters 106 and the dynamic head-related transfer function parameters 107.
- Said user-control data 104 comprises data concerned with rendering of the audio objects.
- the user-control data 104 may indicate the desired spatial position (for example in terms of elevation and azimuth) of one or more audio objects.
- conversion means 130 select the HRTF parameters 105 that correspond to the desired position from the HRTF database 200. If the desired position is not directly available in the HRTF parameter database 200, interpolation between the parameters corresponding to the HRTF database positions surrounding the desired position may be required.
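The interpolation step described above can be sketched as follows. This is a hypothetical illustration: the database layout (a per-azimuth grid of (left magnitude, right magnitude, phase difference) tuples) and the linear-interpolation rule are assumptions for the sake of example, not details taken from the patent text.

```python
# Hypothetical sketch of HRTF parameter interpolation between database
# positions. Database layout and parameter tuples are assumptions.

def interpolate_hrtf(database, azimuth):
    """Linearly interpolate (p_l, p_r, phi) between the two database
    azimuths surrounding the desired azimuth (assumed within the grid)."""
    azimuths = sorted(database)
    lo = max(a for a in azimuths if a <= azimuth)  # grid position below
    hi = min(a for a in azimuths if a >= azimuth)  # grid position above
    if lo == hi:
        return database[lo]  # desired position directly available
    w = (azimuth - lo) / (hi - lo)  # interpolation weight
    p_lo, p_hi = database[lo], database[hi]
    return tuple((1 - w) * a + w * b for a, b in zip(p_lo, p_hi))

# Database on a 30-degree azimuth grid: (left magnitude, right magnitude,
# phase difference in radians) per position -- illustrative values only.
hrtf_db = {0: (1.0, 1.0, 0.0), 30: (1.2, 0.8, 0.4), 60: (1.4, 0.6, 0.8)}
params_45 = interpolate_hrtf(hrtf_db, 45)  # halfway between 30 and 60
```

A real database would also index elevation and distance; the one-dimensional azimuth grid keeps the sketch short.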
- the decoding means 120 generate a spatial output audio signal 108 from the audio objects to be played back over e.g. headphones.
- the MPEG Surround standard comprises a dedicated binaural decoding mode that generates a three-dimensional (3D) sound scene over conventional stereo headphones.
- the conventional 3D synthesis as known from MPEG Surround requires audio configured for up to five loudspeakers (as currently standardized), which is convolved with HRTFs, followed by summation of the convolved signals to produce a binaural output signal pair.
- the advantage of MPEG Surround is that it provides means to perform multi-channel HRTF processing in the down-mix domain, without multi-channel audio as intermediate step. This is realized by converting HRTF parameters to the parameter domain, and subsequently computing binaural parameters from the combined spatial parameters and HRTF parameters.
- Breebaart, J. et al. (2006), "Multi-channel goes mobile: MPEG Surround binaural rendering", Proc. 29th AES Conference, Seoul, Korea.
- the HRTF parameters are provided to the MPEG Surround decoder, however the end user can change the actual parameter values. This is the case when the user wants to use his/her personalization options.
- the HRTF parameters comprise: a level of the left-ear channel output, a level of the right-ear channel output, the phase difference between the left and right-ear binaural output, and optionally a coherence between the left and right-ear channel output.
- the level parameters are defined as a change in level with respect to the original input signal level.
- the HRTF parameters are defined as a function of spatial position of a sound source, either by a mathematical model or by using a database 200.
- the resulting binaural parameters are: a level of a left-channel binaural output, a level of a right-channel binaural output, the phase difference between the left and right-channel outputs, and the coherence between the left and right-channel outputs.
- the level parameters are expressed relative to the level(s) of the down-mix signal(s).
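The parametric representation described above (a left-ear level, a right-ear level, a phase difference) could, for example, be derived per subband from complex-valued HRTF samples. The sketch below uses a single complex sample per band for brevity; that simplification and the function name are assumptions, not details from the patent.

```python
import cmath

# Hypothetical sketch: deriving the parametric HRTF representation from
# one complex-valued HRTF sample per subband. A real system would
# aggregate over all frequency bins in the band.

def hrtf_parameters(H_left, H_right):
    """Return (p_l, p_r, phi): left/right magnitudes and the
    left-right phase difference for one subband."""
    p_l = abs(H_left)
    p_r = abs(H_right)
    # Phase difference between the left- and right-ear responses.
    phi = cmath.phase(H_left * H_right.conjugate())
    return p_l, p_r, phi

# Example: the left ear is louder and leads in phase.
p_l, p_r, phi = hrtf_parameters(1.0 + 1.0j, 0.5 + 0.0j)
```

A coherence parameter would additionally require averaging over several bins within the band, which is why it is omitted from this one-sample sketch.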
- the proposed object-oriented decoder, however, overcomes the limitation to audio for up to five loudspeakers only.
- HRTF parameters represent the binaural properties of each virtual object separately, i.e., one HRTF parameter set for each virtual loudspeaker.
- the MPEG Surround decoder combines the binaural properties of each virtual loudspeaker into binaural parameters of the (up to) five virtual loudspeakers simultaneously, with the help of spatial parameters 106 that describe the spatial relations between the virtual loudspeaker signals. This process is only defined for (up to) five virtual loudspeakers.
- the HRTF parameters and binaural parameters have very similar representations.
- Fig 2 shows an example set-up of virtual loudspeakers.
- the object-oriented audio decoder is adapted to use virtual loudspeakers at the center 320, left-front 310, right-front 330, left-surround 340, and right-surround 350 positions, with the user positioned at location 400.
- in the case of a mono down-mix signal, this down-mix signal is mapped e.g. to the center 320 loudspeaker of the virtual loudspeaker set-up.
- the signal corresponding to the left channel can be mapped to the left-front 310 loudspeaker, while the signal corresponding to the right channel can be mapped to the right-front 330 loudspeaker.
- the objects can be virtually placed at any position in a space by manipulating the head-related transfer function parameters.
- the HRTF parameters corresponding to e.g. the center loudspeaker are set to represent a combination of multiple objects placed at different positions in the space.
- the HRTF parameters are taken out of the control of the decoding means comprising the MPEG Surround decoder, therefore allowing arbitrary modification of the HRTF parameters to place objects arbitrarily in the space.
- the head-related transfer function parameters comprise at least a left-ear magnitude, a right-ear magnitude, and a phase difference, respectively p_{l,i}, p_{r,i}, and φ_i. Said parameters correspond to the perceived position of an object.
- the coherence parameter ρ_i can also be comprised in the head-related transfer function parameters. If this parameter is not specified, it can be assumed to be equal to +1, or derived from the default-parameter value table as specified in the MPEG Surround standard.
- Fig 3 shows a method of decoding in accordance with some embodiments of the invention.
- the step 510 comprises receiving at least one down-mix audio signal and parametric data.
- Each down-mix audio signal comprises a down-mix of a plurality of audio objects.
- the parametric data comprises a plurality of object parameters for each of the plurality of audio objects.
- the step 520 comprises decoding and rendering the audio objects from the down-mix audio signals based on the spatial parameters and dynamic head-related transfer function parameters.
- the dynamic head-related transfer function parameters are provided from outside the decoding means.
- the decoding means perform decoding in accordance with the binaural MPEG Surround standard.
- the step 530 comprises converting the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters.
- the steps 520 and 530 can be performed in the reverse order or simultaneously. The synchronization in time of said parameters with the down-mix signal should be taken care of.
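The step ordering and the synchronization concern above can be sketched as a frame-synchronous loop: conversion (step 530) produces the parameters consumed by decoding/rendering (step 520), and both are kept time-aligned with the down-mix frames. All function bodies here are illustrative stand-ins, not operations defined by the MPEG Surround standard.

```python
# Illustrative sketch of the method steps; the arithmetic is a stand-in.

def convert(parametric_frame, hrtf_params):
    # Step 530: parametric data + HRTF parameters -> spatial parameters
    # and dynamic HRTF parameters (placeholder mapping).
    return {"spatial": parametric_frame["objects"],
            "dynamic_hrtf": hrtf_params}

def decode_and_render(downmix_frame, params):
    # Step 520: binaural decoding + rendering (placeholder scaling).
    return [s * params["dynamic_hrtf"][0] for s in downmix_frame]

def process(downmix_frames, parametric_frames, hrtf_params):
    out = []
    # Iterate frame-synchronously so that each parameter frame is
    # applied to the down-mix frame it belongs to in time.
    for dm, pd in zip(downmix_frames, parametric_frames):
        out.append(decode_and_render(dm, convert(pd, hrtf_params)))
    return out

rendered = process([[1.0, 2.0]], [{"objects": [1]}], [0.5])
```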
- converting the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters is combined with the generation of binaural parameters as used by the decoding means in one step.
- Said decoding means perform decoding in accordance with the binaural MPEG Surround standard. Further, said decoding means use the binaural parameters for decoding, where said binaural parameters are based on the spatial parameters and dynamic head-related transfer function parameters.
- the binaural parameters as used by the decoding means have the same format as the head-related transfer function parameters.
- the binaural parameters comprise the left-ear magnitude, the right-ear magnitude, the phase difference, and the coherence parameter, respectively p_{l,b}, p_{r,b}, φ_b, and ρ_b.
- the multiple objects are comprised as object subband signals x_i (i.e., sets of band-limited and possibly down-sampled signals) in a down-mix signal s: s = Σ_i x_i.
- the binaural parameters are derived from the parametric data and the head-related transfer function parameters, where p_{l,i}, p_{r,i}, φ_i, and ρ_i are the head-related transfer function parameters corresponding respectively to a left-ear magnitude, a right-ear magnitude, a phase difference, and a coherence parameter for the perceived position of object i; p_{l,b}, p_{r,b}, φ_b, and ρ_b are the binaural parameters corresponding respectively to a left-ear magnitude, a right-ear magnitude, a phase difference, and a coherence parameter, said binaural parameters being representative for an object or a plurality of objects; and ∠(.) represents the complex phase angle operator.
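The derivation formulas did not survive in this extraction. As a hedged illustration only, the sketch below shows one plausible power-weighted combination of per-object HRTF parameters into binaural parameters that is consistent with the symbol definitions above; the per-object power weights sigma2 (assumed to come from the parametric data) and the exact weighting rule are assumptions, not the patent's formulas.

```python
import cmath
import math

# Hypothetical power-weighted combination of per-object HRTF parameters
# (p_l, p_r, phi) into binaural parameters (p_lb, p_rb, phi_b). The
# sigma2 weights stand in for object powers from the parametric data.

def combine_binaural(objects):
    """objects: list of (sigma2, p_l, p_r, phi) tuples, one per object.
    Returns (p_lb, p_rb, phi_b) for the combined binaural output."""
    total = sum(s2 for s2, _, _, _ in objects)
    # Left/right magnitudes: square root of the power-weighted mean power.
    p_lb = math.sqrt(sum(s2 * pl ** 2 for s2, pl, _, _ in objects) / total)
    p_rb = math.sqrt(sum(s2 * pr ** 2 for s2, _, pr, _ in objects) / total)
    # Phase difference: angle of the power-weighted complex sum,
    # using the complex phase angle operator mentioned above.
    z = sum(s2 * pl * pr * cmath.exp(1j * phi)
            for s2, pl, pr, phi in objects)
    phi_b = cmath.phase(z)
    return p_lb, p_rb, phi_b

p_lb, p_rb, phi_b = combine_binaural([(1.0, 1.0, 0.5, 0.3)])
```

A coherence parameter ρ_b would be combined analogously; it is left out of this sketch because its exact weighting in the patent is not reproduced here.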
- in the case of a stereo down-mix, each down-mix signal, corresponding to one of the two channels, is mapped to a single multi-channel output, e.g. to the left-front and right-front virtual loudspeakers, respectively. Subsequently, the binaural parameters for both the left-front and right-front virtual loudspeakers are derived as indicated above.
- the difference with the mono down-mix signal case is that the parametric data is different for the two down-mix channels, because one or more objects may be present predominantly in only one of the down-mix signals.
- a receiver for receiving audio signals comprises: a receiver element, decoding means, and conversion means.
- the receiver element is receiving from a transmitter at least one down-mix audio signal and parametric data.
- Each down-mix audio signal comprises a down-mix of a plurality of audio objects.
- Said parametric data comprises a plurality of object parameters for each of the plurality of audio objects.
- the decoding means are in accordance with the binaural MPEG Surround standard and are decoding and rendering the audio objects from the down-mix audio signals based on the spatial parameters and dynamic head-related transfer function parameters. Said dynamic head-related transfer function parameters are provided from an outside of the decoding means.
- the conversion means are converting the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters.
- Fig 4 shows a transmission system for communication of an audio signal in accordance with some embodiments of the invention.
- the transmission system comprises a transmitter 600, which is coupled with a receiver 800 through a network 700.
- the network 700 could be e.g. the Internet.
- the transmitter 600 is for example a signal recording device and the receiver 800 is for example a signal player device.
- the transmitter 600 comprises means 610 for receiving a plurality of audio objects. Subsequently, these objects are encoded by encoding means 620 for encoding the plurality of audio objects into at least one down-mix audio signal and parametric data.
- An example of encoding means 620 is given in Faller, C., "Parametric joint-coding of audio sources", Proc. 120th AES Convention, Paris, France, May 2006.
- Each down-mix audio signal comprises a down-mix of a plurality of audio objects.
- Said parametric data comprises a plurality of object parameters for each of the plurality of audio objects.
- the encoded audio objects are transmitted to the receiver 800 by means 630 for transmitting down-mix audio signals and the parametric data.
- Said means 630 have an interface with the network 700, and may transmit the down-mix signals through the network 700.
- the receiver comprises a receiver element 810 for receiving from the transmitter 600 at least one down-mix audio signal and parametric data.
- Each down-mix audio signal comprises a down-mix of a plurality of audio objects.
- Said parametric data comprises a plurality of object parameters for each of the plurality of audio objects.
- the received down-mix audio signal is decoded by decoding means 830 in accordance with the binaural MPEG Surround standard.
- the decoding means perform decoding and rendering of the audio objects from the down-mix audio signals based on the spatial parameters and dynamic head-related transfer function parameters. Said dynamic head-related transfer function parameters are provided from an outside of the decoding means 830.
- the conversion means 820 convert the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters.
- the head-related transfer function parameters are set in response to user input.
- the user can, by means of e.g. a button, slider, knob, or graphical user interface, set the HRTF parameters according to his or her own preferences.
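Such a user control could be wired up as in the following sketch: a UI callback maps a per-object position choice to HRTF parameters. The callback name, the database keyed by (azimuth, elevation), and the parameter tuples are all illustrative assumptions, not interfaces from the patent.

```python
# Hypothetical UI callback: the user repositions an object, and the
# conversion side looks up HRTF parameters for the new position.

positions = {}  # object id -> (azimuth, elevation) chosen by the user

def on_user_moves_object(obj_id, azimuth, elevation, hrtf_db):
    """Record the user's choice and return HRTF parameters for it.
    Falls back to the frontal position if the exact position is not
    in the database (a real system would interpolate instead)."""
    positions[obj_id] = (azimuth, elevation)
    return hrtf_db.get((azimuth, elevation), hrtf_db[(0, 0)])

# Tiny illustrative database: (left magnitude, right magnitude, phase).
hrtf_db = {(0, 0): (1.0, 1.0, 0.0), (90, 0): (1.5, 0.5, 0.7)}
params = on_user_moves_object("violin", 90, 0, hrtf_db)
```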
Abstract
An object-oriented audio decoder (100) comprising conversion means and decoding means. The conversion means (130) convert received parametric data and received head-related transfer function parameters into spatial parameters and dynamic head-related transfer function parameters. Said parametric data comprises a plurality of object parameters for each of the plurality of audio objects. The decoding means (120), in accordance with the binaural MPEG Surround standard, decode and render the audio objects from the received down-mix audio signals based on the spatial parameters and the dynamic head-related transfer function parameters. Said spatial parameters and said dynamic head-related transfer function parameters are provided from the conversion means. Said received down-mix signals comprise a down-mix of a plurality of audio objects.
Description
An object-oriented audio decoder
TECHNICAL FIELD
The invention relates to an object-oriented audio decoder comprising a binaural MPEG Surround decoder.
TECHNICAL BACKGROUND
In (parametric) spatial audio (en)coders, parameters are extracted from the original audio signals so as to produce a reduced number of down-mix audio signals (for example a single down-mix signal for mono, or two down-mix signals for stereo), and a corresponding set of parameters describing the spatial properties of the original multi-channel audio signal. In (parametric) spatial audio decoders, the spatial properties described by the transmitted spatial parameters are used to recreate a spatial multi-channel signal which closely resembles the original audio signal. Recently, techniques for processing and manipulation of individual audio objects at the decoding side have attracted significant interest.

For example, within the MPEG framework, a workgroup has been started on object-based spatial audio coding. The aim of this workgroup is to "explore new technology and reuse of current MPEG Surround components and technologies for the bit rate efficient coding of multiple sound sources or objects into a number of down-mix channels and corresponding spatial parameters". In other words, the aim is to encode multiple audio objects in a limited set of down-mix channels with corresponding parameters. At the decoder side, users interact with the content, for example by repositioning the individual objects.
Such interaction with the content is easily realized in object-oriented decoders. It is realized by including a rendering step that follows the decoding process. Said rendering can also be combined with the decoding in a single processing step, which avoids the need to determine the individual objects explicitly. For loudspeaker playback, such a combination is described in Faller, C., "Parametric joint-coding of audio sources", Proc. 120th AES Convention, Paris, France, May 2006. For headphone playback, an efficient combination of decoding and head-related transfer function processing is described in Breebaart, J., Herre, J., Villemoes, L., Jin, C., Kjörling, K., Plogsties, J., Koppens, J. (2006), "Multi-channel goes mobile: MPEG Surround binaural rendering", Proc. 29th AES Conference, Seoul, Korea. From the point of view of reuse, it is preferable to use an existing MPEG Surround decoder as the rendering engine for the object-oriented audio decoder. For headphone playback, MPEG Surround features a dedicated binaural decoding mode that generates a three-dimensional sound scene over conventional headphones. However, the binaural MPEG Surround decoder has the disadvantage that the decoding process is defined for only five virtual loudspeakers. This means that all objects from the down-mix audio signal can be mapped only to the positions where those five virtual loudspeakers are located.
SUMMARY OF THE INVENTION
It is an object of the invention to provide an enhanced object-oriented decoder for headphone playback that allows arbitrary virtual positioning of objects in a space. This object is achieved by an object-oriented audio decoder according to the invention. It is assumed that a set of objects, each with its corresponding waveform, has previously been encoded in an object-oriented encoder, which generates a down-mix audio signal (a single signal in the case of a single channel), said down-mix audio signal being a down-mix of a plurality of audio objects characterized by corresponding parametric data. The parametric data comprises a set of object parameters for each of the different audio objects. The receiver receives said down-mix audio signal and said parametric data. This down-mix audio signal is further fed into decoding means, said decoding means being in accordance with the binaural MPEG Surround standard. These decoding means perform both decoding and rendering of the audio objects from the down-mix audio signals based on the spatial parameters and dynamic head-related transfer function parameters. The dynamic head-related transfer function parameters are provided from outside the decoding means. The conversion means convert the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters. The head-related transfer function parameters are provided from outside the object-oriented decoder. The decoding means generate a spatial output audio signal from the audio objects to be played back over e.g. headphones.
The advantage of the object-oriented audio decoder according to the invention is that the head-related transfer function parameters are moved out of the decoding means in accordance with the binaural MPEG Surround standard, therefore freeing the object-oriented audio decoder from the inherent limitation of a predetermined maximum number of virtual object locations that applies when the head-related transfer function parameters are hard-wired in the binaural MPEG Surround decoder. The corresponding conversion of the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters results in freedom in placing the decoded objects in the space. Furthermore, the advantage of the object-oriented audio decoder according to the invention is that no explicit object decoding is required, and the rendering merged into the MPEG Surround decoder comprised in the object-oriented audio decoder is preserved.
In an embodiment, the objects can be virtually placed at any position in a space by manipulating the head-related transfer function parameters.

In an embodiment, converting the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters is combined with the generation of binaural parameters as used by the decoding means in one step. Said decoding means perform decoding in accordance with the binaural MPEG Surround standard and use the binaural parameters for decoding. Said binaural parameters are derived from the spatial parameters and dynamic head-related transfer function parameters. Making such a combination simplifies the implementation of the object-oriented audio decoder, as no intermediate spatial parameters or dynamic head-related transfer function parameters are needed. Instead, the binaural parameters are directly derived from the parametric data and the head-related transfer function parameters.
The invention further provides a receiver and a communication system, as well as corresponding methods.
In an embodiment, the head-related transfer function parameters are set in response to user input. This allows a user to position the objects at any position in the virtual space according to user preferences.
The invention further provides a computer program product enabling a programmable device to perform the method according to the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments shown in the drawings, in which:
Fig 1 schematically shows an object-oriented decoder according to the invention;
Fig 2 shows an example set-up of virtual loudspeakers;
Fig 3 shows a method of decoding in accordance with some embodiments of the invention;
Fig 4 shows a transmission system for communication of an audio signal in accordance with some embodiments of the invention.
Throughout the figures, same reference numerals indicate similar or corresponding features. Some of the features indicated in the drawings are typically implemented in software, and as such represent software entities, such as software modules or objects.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Fig 1 schematically shows an object-oriented decoder 100 according to the invention. It is assumed that a set of objects, each with its corresponding waveform, has previously been encoded in an object-oriented encoder, which generates a down-mix audio signal 102 (a single signal in the case of a single channel, or two signals in the case of two channels (= stereo)), said down-mix audio signal 102 being a down-mix of a plurality of audio objects characterized by corresponding parametric data 103. The parametric data 103 comprises a set of object parameters for each of the different audio objects. The receiver 110 receives said down-mix audio signal 102 and said parametric data 103. Although the down-mix audio signal and parametric data are indicated as separate signal/data paths, they could be multiplexed into one signal/data stream comprising concatenated down-mix audio data that corresponds to the down-mix audio signal and the parametric data. The function of the receiver is then to demultiplex the two data streams. If the down-mix audio signal 102 is provided in a compressed form (such as MPEG-1 Layer 3), the receiver 110 also performs decompression or decoding of the compressed audio signal into a time-domain audio down-mix signal.
The down-mix audio signal 102 is further fed into decoding means 120, said decoding means 120 being in accordance with the binaural MPEG Surround standard. These decoding means 120 perform both decoding and rendering of the audio objects from the down-mix audio signals 102, based on the spatial parameters 106 and the dynamic head-related transfer function parameters 107. The dynamic head-related transfer function (HRTF) parameters 107 are provided from outside the decoding means 120. The conversion means 130 perform conversion of the parametric data 103 and HRTF parameters 105 into the spatial parameters 106 and the dynamic HRTF parameters 107. The HRTF parameters 105 are provided from outside the object-oriented decoder by the HRTF parameter database 200.
Since decoding means 120 comprise essentially an MPEG Surround decoder, the spatial parameters 106 are preferably provided in MPEG Surround format; furthermore, the dynamic head-related transfer function parameters 107 are preferably provided in the corresponding MPEG Surround format for HRTF parameters.
Optionally, the conversion means can use user-control data 104 in order to generate the spatial parameters 106 and the dynamic head-related transfer function parameters 107. Said user-control data 104 comprises data concerned with rendering of the audio objects. For example, the user-control data 104 may indicate the desired spatial position (for example in terms of elevation and azimuth) of one or more audio objects. In that case, conversion means 130 select the HRTF parameters 105 that correspond to the desired position from HRTF database 200. If the desired position is not directly available in HRTF parameter database 200, interpolation between the parameters corresponding to the HRTF database positions surrounding the desired position may be required.
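This look-up-and-interpolate step can be sketched as follows. The database layout, grid values, and function name are illustrative assumptions only; a real HRTF parameter database 200 would also be indexed by elevation and frequency band.

```python
import bisect

# Hypothetical HRTF parameter database: azimuth (degrees) mapped to
# (left-ear magnitude, right-ear magnitude, phase difference) for one
# frequency band at zero elevation -- values are made up for illustration.
HRTF_DB = {
    0:  (1.00, 1.00, 0.0),
    30: (1.20, 0.80, 0.6),
    60: (1.35, 0.65, 1.0),
    90: (1.40, 0.60, 1.2),
}

def hrtf_params(azimuth_deg):
    """Return (p_l, p_r, phi) for the desired azimuth, linearly
    interpolating between the two surrounding database positions
    when the position is not directly available."""
    if azimuth_deg in HRTF_DB:
        return HRTF_DB[azimuth_deg]
    grid = sorted(HRTF_DB)
    hi = bisect.bisect_left(grid, azimuth_deg)
    a0, a1 = grid[hi - 1], grid[hi]
    w = (azimuth_deg - a0) / (a1 - a0)  # interpolation weight
    return tuple((1 - w) * v0 + w * v1
                 for v0, v1 in zip(HRTF_DB[a0], HRTF_DB[a1]))
```

Interpolating the phase difference linearly is itself a simplification; near a phase wrap-around a real implementation would interpolate more carefully.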
The decoding means 120 generate a spatial output audio signal 108 from the audio objects, to be played back over e.g. headphones.
It is known to a person skilled in the art that for headphone playback the MPEG Surround standard comprises a dedicated binaural decoding mode that generates a three-dimensional (3D) sound scene over conventional stereo headphones. Conventional 3D synthesis requires a multi-channel audio signal configured for up to five loudspeakers, as currently standardized, which is convolved with HRTFs, followed by summation of the convolved signals to yield a binaural output signal pair. The advantage of MPEG Surround is that it provides means to perform multi-channel HRTF processing in the down-mix domain, without multi-channel audio as an intermediate step. This is realized by converting HRTF parameters to the parameter domain and subsequently computing binaural parameters from the combined spatial parameters and HRTF parameters. Detailed information about said parameter conversion is provided in Breebaart, J., Herre, J., Villemoes, L., Jin, C., Kjörling, K., Plogsties, J., Koppens, J. (2006), "Multi-channel goes mobile: MPEG Surround binaural rendering", Proc. 29th AES Conference, Seoul, Korea. The HRTF parameters are provided to the MPEG Surround decoder; however, the end user can change the actual parameter values. This is the case when the user wants to use his/her personalization options.
The HRTF parameters comprise: a level of the left-ear channel output, a level of the right-ear channel output, the phase difference between the left- and right-ear binaural output, and optionally a coherence between the left- and right-ear channel output. The level parameters are defined as a change in level with respect to the original input signal level. The HRTF parameters are defined as a function of the spatial position of a sound source, either by a mathematical model or by using a database 200.
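As an illustration of the mathematical-model option, the sketch below derives HRTF-style parameters from a simple spherical-head approximation (Woodworth's formula for the interaural time difference plus a sinusoidal level shading). The model, constants, and function name are assumptions for illustration only, not the model used in the patent or the MPEG Surround standard.

```python
import math

def toy_hrtf_params(azimuth_rad, freq_hz, head_radius=0.0875, c=343.0):
    """Toy spherical-head model: Woodworth's approximation gives the
    interaural time difference, converted to a per-band phase
    difference; a sinusoidal shading gives the ear levels."""
    itd = (head_radius / c) * (math.sin(azimuth_rad) + azimuth_rad)
    phi = 2.0 * math.pi * freq_hz * itd      # interaural phase difference
    p_l = 1.0 + 0.5 * math.sin(azimuth_rad)  # left-ear level
    p_r = 1.0 - 0.5 * math.sin(azimuth_rad)  # right-ear level
    return p_l, p_r, phi
```

A source straight ahead (azimuth 0) yields equal ear levels and zero phase difference; a source to the left raises the left-ear level and introduces a positive phase difference.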
The resulting binaural parameters are: a level of a left-channel binaural output, a level of a right-channel binaural output, the phase difference between the left- and right-channel outputs, and the coherence between the left- and right-channel outputs. Preferably, the level parameters are expressed relative to the level(s) of the down-mix signal(s).
The proposed object-oriented decoder, however, overcomes the limitation to audio for up to five loudspeakers only. In the MPEG Surround decoder, HRTF parameters represent the binaural properties of each virtual loudspeaker separately, i.e., one HRTF parameter set for each virtual loudspeaker. The MPEG Surround decoder combines the binaural properties of the virtual loudspeakers into binaural parameters of the (up to) five virtual loudspeakers simultaneously, with the help of spatial parameters 106 that describe the spatial relations between the virtual loudspeaker signals. This process is only defined for (up to) five virtual loudspeakers. The HRTF parameters and binaural parameters, however, have very similar representations. It is therefore possible to estimate, by conversion means 130, binaural parameters of a complex auditory scene comprising more than five virtual loudspeakers, and to provide the resulting binaural parameters as dynamic HRTF parameters to MPEG Surround decoder 120. In the proposed object-oriented decoder, the parametric data 103 and user-control data 104 are combined in the process of computing the binaural parameters, which are fed into the HRTF parameter input of decoding means 120 in a dynamic fashion. At the same time, the spatial parameters are computed and fed into the decoding means, to be used to map each down-mix channel to one of the possible configurations of five virtual loudspeakers.
Fig 2 shows an example set-up of virtual loudspeakers. The object-oriented audio decoder is adapted to use virtual loudspeakers at center 320, left-front 310, right-front 330, left-surround 340, and right-surround 350 positions, with the user positioned at location 400. In such a set-up, in the case of a mono down-mix signal, the down-mix signal is mapped e.g. to the center 320 loudspeaker of the virtual loudspeaker set-up. For a stereo down-mix signal, the signal corresponding to the left channel can be mapped to the left-front 310 loudspeaker, while the signal corresponding to the right channel can be mapped to the right-front 330 loudspeaker.
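The channel-to-loudspeaker mapping just described can be sketched as follows; the function and speaker names are illustrative assumptions, not identifiers from the MPEG Surround standard.

```python
def map_downmix(num_channels):
    """Map each down-mix channel to one virtual loudspeaker of the
    five-speaker set-up: a mono down-mix goes to the center speaker,
    a stereo pair to the left-front and right-front speakers."""
    if num_channels == 1:
        return {0: "center"}
    if num_channels == 2:
        return {0: "left-front", 1: "right-front"}
    raise ValueError("expected a mono or stereo down-mix")
```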
In an embodiment, the objects can be virtually placed at any position in a space by manipulating the head-related transfer function parameters. The HRTF parameters corresponding to e.g. the center loudspeaker are set to represent a combination of multiple objects placed differently in the space. In other words, despite using the five virtual loudspeakers, sound can be generated that is perceived to originate from objects placed arbitrarily in space rather than at the loudspeaker locations. This advantage of the proposed solution is realized by taking the HRTF parameters out of the control of the decoding means comprising the MPEG Surround decoder, therefore allowing arbitrary modification of the HRTF parameters to influence the arbitrary placement of objects in the space.
In an embodiment, the head-related transfer function parameters comprise at least a left-ear magnitude, a right-ear magnitude, and a phase difference, respectively p_{l,i}, p_{r,i}, and φ_i. Said parameters correspond to the perceived position of an object. Optionally, the coherence parameter ρ_i can also be comprised in the head-related transfer function parameters. If this parameter is not specified, it can be assumed to be equal to +1, or derived from the default-parameter value table as specified in the MPEG Surround standard.
Fig 3 shows a method of decoding in accordance with some embodiments of the invention. The step 510 comprises receiving at least one down-mix audio signal and parametric data. Each down-mix audio signal comprises a down-mix of a plurality of audio objects. The parametric data comprises a plurality of object parameters for each of the plurality of audio objects.
The step 520 comprises decoding and rendering the audio objects from the down-mix audio signals based on the spatial parameters and dynamic head-related transfer function parameters. The dynamic head-related transfer function parameters are provided from outside the decoding means. The decoding means perform decoding in accordance with the binaural MPEG Surround standard.
The step 530 comprises converting the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters. The steps 520 and 530 can be performed in the reverse sequence or simultaneously. Care should be taken to synchronize said parameters in time with the down-mix signal.
In an embodiment, the converting of the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters is combined in one step with the generation of binaural parameters as used by decoding means. Said decoding means perform decoding in accordance with the binaural MPEG Surround standard and use the binaural parameters for decoding, where said binaural parameters are based on the spatial parameters and dynamic head-related transfer function parameters.
In an embodiment, the binaural parameters as used by the decoding means have the same format as the head-related transfer function parameters. The binaural parameters comprise the left-ear magnitude, the right-ear magnitude, the phase difference, and the coherence parameter, respectively p_{l,b}, p_{r,b}, φ_b, and ρ_b.
The multiple objects are comprised as object (subband, i.e., band-limited and possibly down-sampled) signals x_i in a down-mix signal s:

s = \sum_i x_i

The parametric data represents the power \sigma_i^2 of each object signal x_i within the down-mix signal s: \sigma_i^2 = \langle x_i x_i^* \rangle, where \langle . \rangle is the expected value operator and x^* is the complex conjugate of x. In an embodiment, the binaural parameters are derived from the parametric data and head-related transfer function parameters according to:

p_{l,b} = \sqrt{ \sum_i \sigma_i^2 p_{l,i}^2 \, / \, \sum_i \sigma_i^2 }

p_{r,b} = \sqrt{ \sum_i \sigma_i^2 p_{r,i}^2 \, / \, \sum_i \sigma_i^2 }

\phi_b = \angle \left( \sum_i \sigma_i^2 p_{l,i} p_{r,i} \rho_i e^{j \phi_i} \right)

\rho_b = \left| \sum_i \sigma_i^2 p_{l,i} p_{r,i} \rho_i e^{j \phi_i} \right| / \left( p_{l,b} \, p_{r,b} \sum_i \sigma_i^2 \right)

wherein p_{l,i}, p_{r,i}, \phi_i, and \rho_i are the head-related transfer function parameters corresponding respectively to a left-ear magnitude, a right-ear magnitude, a phase difference and a coherence parameter corresponding to a perceived position of object i, and p_{l,b}, p_{r,b}, \phi_b and \rho_b are the binaural parameters corresponding respectively to a left-ear magnitude, a right-ear magnitude, a phase difference and a coherence parameter, said binaural parameters being representative for an object or a plurality of objects, and \angle(.) represents the complex phase angle operator, \angle(a e^{j\phi}) = \phi.
The above formulas comprise summation over i, where i is the index of the objects comprised in the down-mix signal. In the case of a mono down-mix signal, these binaural parameters, representative for a multitude of objects, are fed into the decoding means comprising the MPEG Surround decoder as parameters corresponding to a single audio channel.
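A minimal sketch of this power-weighted combination in pure Python follows. The function and variable names are assumptions, and the normalization by the total down-mix power reflects the statement that the level parameters are expressed relative to the down-mix level.

```python
import cmath
import math

def combine_binaural(sigma2, p_l, p_r, phi, rho):
    """Combine per-object HRTF parameters into one binaural parameter
    set for a down-mix channel. All inputs are equal-length lists, one
    entry per object: sigma2 holds the object powers, (p_l, p_r, phi,
    rho) the per-object HRTF parameters."""
    total = sum(sigma2)  # down-mix power (objects assumed independent)
    p_lb = math.sqrt(sum(s * pl * pl for s, pl in zip(sigma2, p_l)) / total)
    p_rb = math.sqrt(sum(s * pr * pr for s, pr in zip(sigma2, p_r)) / total)
    # Complex sum of per-object inter-channel cross terms
    cross = sum(s * pl * pr * r * cmath.exp(1j * f)
                for s, pl, pr, r, f in zip(sigma2, p_l, p_r, rho, phi))
    phi_b = cmath.phase(cross)                  # binaural phase difference
    rho_b = abs(cross) / (p_lb * p_rb * total)  # binaural coherence
    return p_lb, p_rb, phi_b, rho_b
```

For a stereo down-mix, the same function would be applied once per down-mix channel with that channel's object powers.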
In the case of a stereo down-mix signal, each down-mix signal, corresponding to one of the two channels, is mapped to a single multi-channel output, e.g. to the left-front and right-front virtual loudspeakers. Subsequently, the binaural parameters for both the left-front and right-front virtual loudspeakers are computed as indicated by the above formulas. The difference with the mono down-mix case is that the parametric data \sigma_i^2 is different for the two down-mix channels, because one or more objects may be present predominantly in only one of the down-mix signals.
In an embodiment, a receiver for receiving audio signals comprises: a receiver element, decoding means, and conversion means. The receiver element receives from a transmitter at least one down-mix audio signal and parametric data. Each down-mix audio signal comprises a down-mix of a plurality of audio objects. Said parametric data comprises a plurality of object parameters for each of the plurality of audio objects.
The decoding means are in accordance with the binaural MPEG Surround standard and decode and render the audio objects from the down-mix audio signals based on the spatial parameters and dynamic head-related transfer function parameters. Said dynamic head-related transfer function parameters are provided from outside the decoding means.
The conversion means convert the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters.
Fig 4 shows a transmission system for communication of an audio signal in accordance with some embodiments of the invention. The transmission system comprises a transmitter 600, which is coupled with a receiver 800 through a network 700. The network 700 could be e.g. the Internet.
The transmitter 600 is for example a signal recording device and the receiver 800 is for example a signal player device. In the specific example where a signal recording function is supported, the transmitter 600 comprises means 610 for receiving a plurality of audio objects. These objects are then encoded by encoding means 620 for encoding the plurality of audio objects in at least one down-mix audio signal and parametric data. An example of such encoding means 620 is given in Faller, C., "Parametric joint-coding of audio sources", Proc. 120th AES Convention, Paris, France, May 2006. Each down-mix audio signal comprises a down-mix of a plurality of audio objects. Said parametric data comprises a plurality of object parameters for each of the plurality of audio objects. The encoded audio objects are transmitted to the receiver 800 by means 630 for transmitting down-mix audio signals and the parametric data. Said means 630 have an interface with the network 700, and may transmit the down-mix signals through the network 700.
The receiver comprises a receiver element 810 for receiving from the transmitter 600 at least one down-mix audio signal and parametric data. Each down-mix audio signal comprises a down-mix of a plurality of audio objects. Said parametric data comprises a plurality of object parameters for each of the plurality of audio objects. The received down-mix audio signal is decoded by decoding means 830 in accordance with the binaural MPEG Surround standard. The decoding means perform decoding and rendering of the audio objects from the down-mix audio signals based on the spatial parameters and dynamic head-related transfer function parameters. Said dynamic head-related transfer function parameters are provided from outside the decoding means 830.
The conversion means 820 perform conversion of the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters.
In an embodiment, the head-related transfer function parameters are set in response to user input. The user can, by means of e.g. a button, slider, knob, or graphical user interface, set the HRTF parameters according to his or her own preferences.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims.
In the accompanying claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer.
Claims
1. An object-oriented audio decoder (100) comprising: conversion means (130) for converting received parametric data and received head-related transfer function parameters into spatial parameters and dynamic head-related transfer function parameters, said parametric data comprising a plurality of object parameters for each of the plurality of audio objects; decoding means (120) in accordance with the binaural MPEG Surround standard for decoding and rendering the audio objects from the received down-mix audio signals based on the spatial parameters and the dynamic head-related transfer function parameters, said spatial parameters and said dynamic head-related transfer function parameters being provided from the conversion means, said received down-mix signals comprising a down-mix of a plurality of audio objects.
2. An object-oriented audio decoder as claimed in claim 1, wherein said object-oriented audio decoder is adapted to use virtual loudspeakers at center (320), left-front (310), right-front (330), left-surround (340), and right-surround (350) positions.
3. An object-oriented audio decoder as claimed in claim 1, wherein the objects can be virtually placed at any position in a space by manipulating the head-related transfer function parameters.
4. An object-oriented audio decoder as claimed in claim 1, wherein the head- related transfer function parameters comprise at least left-ear magnitude, right-ear magnitude, and phase difference, said parameters corresponding to the perceived position of an object.
5. A method of decoding audio signals comprising: receiving (510) at least one down-mix audio signal and parametric data, each down-mix audio signal comprising a down-mix of a plurality of audio objects, said parametric data comprising a plurality of object parameters for each of the plurality of audio objects; decoding and rendering (520) the audio objects from the down-mix audio signals based on the spatial parameters and dynamic head-related transfer function parameters, said dynamic head-related transfer function parameters being provided from an outside of decoding means, said decoding in accordance with the binaural MPEG Surround standard; converting (530) the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters.
6. A method as claimed in claim 5, wherein the converting the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters is combined with generation of binaural parameters as used by decoding means in one step; said decoding means performing decoding in accordance with the binaural MPEG Surround standard; said decoding means using the binaural parameters for decoding; said binaural parameters being based on the spatial parameters and dynamic head-related transfer function parameters.
7. A method as claimed in claim 6, wherein the binaural parameters as used by the decoding means have the same format as the head-related transfer function parameters.
8. A method as claimed in claim 7, wherein the binaural parameters are derived from the parametric data and head-related transfer function parameters according to:

p_{l,b} = \sqrt{ \sum_i \sigma_i^2 p_{l,i}^2 \, / \, \sum_i \sigma_i^2 }

p_{r,b} = \sqrt{ \sum_i \sigma_i^2 p_{r,i}^2 \, / \, \sum_i \sigma_i^2 }

\phi_b = \angle \left( \sum_i \sigma_i^2 p_{l,i} p_{r,i} \rho_i e^{j \phi_i} \right)

\rho_b = \left| \sum_i \sigma_i^2 p_{l,i} p_{r,i} \rho_i e^{j \phi_i} \right| / \left( p_{l,b} \, p_{r,b} \sum_i \sigma_i^2 \right)

wherein \sigma_i^2 is the power of object i within the down-mix signal, p_{l,i}, p_{r,i}, \phi_i and \rho_i are the head-related transfer function parameters corresponding respectively to a left-ear magnitude, a right-ear magnitude, a phase difference and a coherence parameter corresponding to a perceived position of object i, and p_{l,b}, p_{r,b}, \phi_b and \rho_b are the binaural parameters corresponding respectively to a left-ear magnitude, a right-ear magnitude, a phase difference and a coherence parameter, said binaural parameters being representative for an object or a plurality of objects.
9. A receiver for receiving audio signals, the receiver comprising the object- oriented audio decoder of claim 1 and a receiver element (110) for receiving from a transmitter at least one down-mix audio signal and parametric data, each down-mix audio signal comprising a down-mix of a plurality of audio objects, said parametric data comprising a plurality of object parameters for each of the plurality of audio objects, the receiver element being coupled to the conversion means (130) and the decoding means (120) of the object- oriented decoder (100).
10. A communication system for communicating audio signals, the communication system comprising: a transmitter (600) comprising: means (610) for receiving a plurality of audio objects, encoding means (620) for encoding the plurality of audio objects in at least one down-mix audio signal and parametric data, each down-mix audio signal comprising a down-mix of a plurality of audio objects, said parametric data comprising a plurality of object parameters for each of the plurality of audio objects, and means (630) for transmitting down-mix audio signals and the parametric data to the receiver as claimed in claim 9.
11. A method of receiving audio signals, the method comprising: receiving from a transmitter at least one down-mix audio signal and parametric data, each down-mix audio signal comprising a down-mix of a plurality of audio objects, said parametric data comprising a plurality of object parameters for each of the plurality of audio objects; decoding and rendering the audio objects from the down-mix audio signals based on the spatial parameters and dynamic head-related transfer function parameters, said dynamic head-related transfer function parameters being provided from an outside of the decoding means, said decoding in accordance with the binaural MPEG Surround standard; converting the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters.
12. A method of transmitting and receiving audio signals, the method comprising: at a transmitter performing the steps of: receiving a plurality of audio objects, encoding the plurality of audio objects in at least one down-mix audio signal and parametric data, each down-mix audio signal comprising a down-mix of a plurality of audio objects, said parametric data comprising a plurality of object parameters for each of the plurality of audio objects, and transmitting down-mix audio signals and the parametric data to a receiver; and at the receiver performing the steps of: receiving from a transmitter at least one down-mix audio signal and parametric data, each down-mix audio signal comprising a down-mix of a plurality of audio objects, said parametric data comprising a plurality of object parameters for each of the plurality of audio objects; decoding and rendering the audio objects from the down-mix audio signals based on the spatial parameters and dynamic head-related transfer function parameters, said dynamic head-related transfer function parameters being provided from an outside of the decoding means, said decoding in accordance with the binaural MPEG Surround standard; converting the parametric data and head-related transfer function parameters into the spatial parameters and the dynamic head-related transfer function parameters.
13. A method as claimed in any of claims 6-8, 11, and 12, wherein the head-related transfer function parameters are set in response to user input.
14. A computer program product for executing the method of any of claims 6-8, 11, and 12.
15. An audio playing device comprising an object-oriented audio decoder according to claim 1.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07100343 | 2007-01-10 | ||
EP07100343.8 | 2007-01-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008084436A1 true WO2008084436A1 (en) | 2008-07-17 |
Family
ID=39276374
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2008/050041 WO2008084436A1 (en) | 2007-01-10 | 2008-01-08 | An object-oriented audio decoder |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2008084436A1 (en) |
Non-Patent Citations (4)
Title |
---|
"SAOC use cases, draft requirements and architecture", VIDEO STANDARDS AND DRAFTS, XX, XX, no. W8638, 27 October 2006 (2006-10-27), XP030015132 * |
BREEBAART J ET AL: "Multi-channel goes mobile: MPEG surround binaural rendering", AES INTERNATIONAL CONFERENCE. AUDIO FOR MOBILE AND HANDHELD DEVICES, XX, XX, 2 September 2006 (2006-09-02), pages 1 - 13, XP007902577 * |
PASI OJALA ET AL: "Description of Binaural Audio Image Control to FCD", VIDEO STANDARDS AND DRAFTS, XX, XX, no. M13545, 12 July 2006 (2006-07-12), XP030042214 * |
PASI OJALA ET AL: "Further information on binaural decoder functionality", VIDEO STANDARDS AND DRAFTS, XX, XX, no. M13233, 29 March 2006 (2006-03-29), XP030041902 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9232319B2 (en) | 2005-09-13 | 2016-01-05 | Dts Llc | Systems and methods for audio processing |
US8027477B2 (en) | 2005-09-13 | 2011-09-27 | Srs Labs, Inc. | Systems and methods for audio processing |
US8831254B2 (en) | 2006-04-03 | 2014-09-09 | Dts Llc | Audio signal processing |
US9167346B2 (en) | 2009-08-14 | 2015-10-20 | Dts Llc | Object-oriented audio streaming system |
US8396577B2 (en) | 2009-08-14 | 2013-03-12 | Dts Llc | System for creating audio objects for streaming |
US8396576B2 (en) | 2009-08-14 | 2013-03-12 | Dts Llc | System for adaptively streaming audio objects |
US8396575B2 (en) | 2009-08-14 | 2013-03-12 | Dts Llc | Object-oriented audio streaming system |
US9026450B2 (en) | 2011-03-09 | 2015-05-05 | Dts Llc | System for dynamically creating and rendering audio objects |
US9165558B2 (en) | 2011-03-09 | 2015-10-20 | Dts Llc | System for dynamically creating and rendering audio objects |
US9721575B2 (en) | 2011-03-09 | 2017-08-01 | Dts Llc | System for dynamically creating and rendering audio objects |
US9558785B2 (en) | 2013-04-05 | 2017-01-31 | Dts, Inc. | Layered audio coding and transmission |
US9613660B2 (en) | 2013-04-05 | 2017-04-04 | Dts, Inc. | Layered audio reconstruction system |
US9837123B2 (en) | 2013-04-05 | 2017-12-05 | Dts, Inc. | Layered audio reconstruction system |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08700216; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 08700216; Country of ref document: EP; Kind code of ref document: A1