US20110040395A1 - Object-oriented audio streaming system - Google Patents

Object-oriented audio streaming system

Info

Publication number
US20110040395A1
Authority
US
United States
Prior art keywords
audio
objects
stream
oriented
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/856,442
Other versions
US8396575B2 (en)
Inventor
Alan D. Kraemer
James Tracey
Themis Katsianos
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DTS Inc
Original Assignee
SRS Labs Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SRS Labs Inc
Priority to US12/856,442 (US8396575B2)
Assigned to SRS LABS, INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KATSIANOS, THEMIS; KRAEMER, ALAN D.; TRACEY, JAMES
Publication of US20110040395A1
Assigned to DTS LLC: MERGER (SEE DOCUMENT FOR DETAILS). Assignors: SRS LABS, INC.
Priority to US13/791,488 (US9167346B2)
Application granted
Publication of US8396575B2
Assigned to ROYAL BANK OF CANADA, AS COLLATERAL AGENT: SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DIGITALOPTICS CORPORATION, DigitalOptics Corporation MEMS, DTS, INC., DTS, LLC, IBIQUITY DIGITAL CORPORATION, INVENSAS CORPORATION, PHORUS, INC., TESSERA ADVANCED TECHNOLOGIES, INC., TESSERA, INC., ZIPTRONIX, INC.
Assigned to DTS, INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DTS LLC
Assigned to BANK OF AMERICA, N.A.: SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DTS, INC., IBIQUITY DIGITAL CORPORATION, INVENSAS BONDING TECHNOLOGIES, INC., INVENSAS CORPORATION, PHORUS, INC., ROVI GUIDES, INC., ROVI SOLUTIONS CORPORATION, ROVI TECHNOLOGIES CORPORATION, TESSERA ADVANCED TECHNOLOGIES, INC., TESSERA, INC., TIVO SOLUTIONS INC., VEVEO, INC.
Assigned to TESSERA, INC., DTS, INC., INVENSAS CORPORATION, IBIQUITY DIGITAL CORPORATION, INVENSAS BONDING TECHNOLOGIES, INC. (F/K/A ZIPTRONIX, INC.), TESSERA ADVANCED TECHNOLOGIES, INC, FOTONATION CORPORATION (F/K/A DIGITALOPTICS CORPORATION AND F/K/A DIGITALOPTICS CORPORATION MEMS), PHORUS, INC., DTS LLC: RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: ROYAL BANK OF CANADA
Assigned to IBIQUITY DIGITAL CORPORATION, DTS, INC., PHORUS, INC., VEVEO LLC (F.K.A. VEVEO, INC.): PARTIAL RELEASE OF SECURITY INTEREST IN PATENTS. Assignors: BANK OF AMERICA, N.A., AS COLLATERAL AGENT
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/12 Circuits for transducers, loudspeakers or microphones for distributing signals to two or more loudspeakers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008 Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/308 Electronic adaptation dependent on speaker or headphone connection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/03 Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/40 Visual indication of stereophonic sound image

Definitions

  • Audio distribution systems are also unsuited for 3D video applications because they are incapable of rendering sound accurately in three-dimensional space. These systems are limited by the number and position of speakers and by the fact that psychoacoustic principles are generally ignored. As a result, even the most elaborate sound systems create merely a rough simulation of an acoustic space, which does not approximate a true 3D or multi-dimensional presentation.
  • audio objects are created by associating sound sources with attributes of those sound sources, such as location, velocity, directivity, and the like. Audio objects can be used in place of or in addition to channels to distribute sound, for example, by streaming the audio objects over a network to a client device.
  • the objects can define their locations in space with associated two or three dimensional coordinates.
  • the objects can be adaptively streamed to the client device based on available network or client device resources.
  • a renderer on the client device can use the attributes of the objects to determine how to render the objects.
  • the renderer can further adapt the playback of the objects based on information about a rendering environment of the client device.
  • audio object creation techniques are also described.
  • FIGS. 1A and 1B illustrate embodiments of object-oriented audio systems
  • FIG. 2 illustrates another embodiment of an object-oriented audio system
  • FIG. 3 illustrates an embodiment of a streaming module for use in any of the object-oriented audio systems described herein
  • FIG. 4 illustrates an embodiment of an object-oriented audio streaming format
  • FIG. 5A illustrates an embodiment of an audio stream assembly process
  • FIG. 5B illustrates an embodiment of an audio stream rendering process
  • FIG. 6 illustrates an embodiment of an adaptive audio object streaming system
  • FIG. 7 illustrates an embodiment of an adaptive audio object streaming process
  • FIG. 8 illustrates an embodiment of an adaptive audio object rendering process
  • FIG. 9 illustrates an example scene for object-oriented audio capture
  • FIG. 10 illustrates an embodiment of a system for object-oriented audio capture
  • FIG. 11 illustrates an embodiment of a process for object-oriented audio capture.
  • audio distribution systems do not adequately take into account the playback environment of the listener. Instead, audio systems are designed to deliver the specified number of channels to the final listening environment without any compensation for the environment, listener preferences, or the implementation of psychoacoustic principles. These functions and capabilities are traditionally left to the system integrator.
  • audio objects are created by associating sound sources with attributes of those sound sources, such as location, velocity, directivity, and the like. Audio objects can be used in place of or in addition to channels to distribute sound, for example, by streaming the audio objects over a network to a client device. In certain embodiments, these objects are not related to channels or panned positions between channels, but rather define their locations in space with associated two or three dimensional coordinates. A renderer on the client device can use the attributes of the objects to determine how to render the objects.
  • the renderer can also account for the renderer's environment in certain embodiments by adapting the rendering and/or streaming based on available computing resources.
  • streaming of the audio objects can be adapted based on network conditions, such as available bandwidth.
  • audio object creation techniques are also described.
  • the systems and methods described herein can reduce or overcome the drawbacks associated with the rigid audio channel distribution model.
  • FIGS. 1A and 1B introduce embodiments of object-oriented audio systems. Later Figures describe techniques that can be implemented by these object-oriented audio systems.
  • FIGS. 2 through 5B describe various example techniques for streaming object-oriented audio.
  • FIGS. 6 through 8 describe example techniques for adaptively streaming and rendering object-oriented audio based on environment and network conditions.
  • FIGS. 9 through 11 describe example audio object creation techniques.
  • streaming and its derivatives, in addition to having their ordinary meaning, can mean distribution of content from one computing system (such as a server) to another computing system (such as a client).
  • the term “streaming” and its derivatives can also refer to distributing content through peer-to-peer networks using any of a variety of protocols, including BitTorrent and related protocols.
  • FIGS. 1A and 1B illustrate embodiments of object-oriented audio systems 100 A, 100 B.
  • the object-oriented audio systems 100 A, 100 B can be implemented in computer hardware and/or software.
  • the object-oriented audio systems 100 A, 100 B can enable content creators to create audio objects, stream such objects, and render the objects without being bound to the fixed channel model.
  • the object-oriented audio system 100 A includes an audio object creation system 110 A, a streaming module 122 A implemented in a content server 120 A, and a renderer 142 A implemented in a user system 140 .
  • the audio object creation system 110 A can provide functionality for users to create and modify audio objects.
  • the streaming module 122 A, shown installed on a content server 120 A, can be used to stream audio objects to a user system 140 over a network 130 .
  • the network 130 can include a LAN, a WAN, the Internet, or combinations of the same.
  • the renderer 142 A on the user system 140 can render the audio objects for output to one or more loudspeakers.
  • the audio object creation system 110 A includes an object creation module 114 and an object-oriented encoder 112 A.
  • the object creation module 114 can provide functionality for creating objects, for example, by associating audio data with attributes of the audio data. Any type of audio can be used to generate an audio object. Examples of audio that can be turned into objects and streamed include audio associated with movies, television, movie trailers, music, music videos, other online videos, video games, and the like.
  • audio data can be recorded or otherwise obtained.
  • the object creation module 114 can provide a user interface that enables a user to access, edit, or otherwise manipulate the audio data.
  • the audio data can represent a sound source or a collection of sound sources. Some examples of sound sources include dialog, background music, and sounds generated by any item (such as a car, an airplane, or any prop). More generally, a sound source can be any audio clip.
  • Sound sources can have one or more attributes that the object creation module 114 can associate with the audio data to create an object.
  • attributes include a location of the sound source, a velocity of a sound source, directivity of a sound source, and the like. Some attributes may be obtained directly from the audio data, such as a time attribute reflecting a time when the audio data was recorded. Other attributes can be supplied by a user to the object creation module 114 , such as the type of sound source that generated the audio (e.g., a car versus an actor). Still other attributes can be automatically imported by the object creation module 114 from other devices. As an example, the location of a sound source can be retrieved from a Global Positioning System (GPS) device or the like and imported into the object creation module 114 . Additional examples of attributes and techniques for identifying attributes are described in greater detail below.
  • the object creation module 114 can store the audio objects in an object data repository 116 , which can include a database or other data storage.
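The association of audio data with attributes described above can be sketched as a small data structure. This is only an illustrative sketch: the class names `AudioObject` and `ObjectRepository`, the helper `create_object`, and the specific field names are assumptions made for this example and do not come from the patent, which does not prescribe a concrete layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AudioObject:
    """A sound source bundled with attributes describing it (hypothetical layout)."""
    name: str
    samples: bytes                                          # audio data for the sound source
    position: Optional[Tuple[float, float, float]] = None   # location, e.g. imported from a GPS device
    velocity: Optional[Tuple[float, float, float]] = None
    source_type: Optional[str] = None                       # e.g. "car" or "actor", supplied by a user
    recorded_at: Optional[float] = None                     # time attribute taken from the recording


class ObjectRepository:
    """In-memory stand-in for an object data repository (assumed, not the patent's design)."""

    def __init__(self) -> None:
        self._objects: Dict[str, AudioObject] = {}

    def store(self, obj: AudioObject) -> None:
        self._objects[obj.name] = obj

    def get(self, name: str) -> AudioObject:
        return self._objects[name]


def create_object(name: str, samples: bytes, **attributes) -> AudioObject:
    """Associate audio data with whichever attributes are known for it."""
    return AudioObject(name=name, samples=samples, **attributes)


repo = ObjectRepository()
car = create_object("car_pass", b"\x00\x01" * 4,
                    position=(10.0, 0.0, 2.0), source_type="car")
repo.store(car)
```

Attributes that are not known at creation time simply stay unset, which mirrors the idea that different attributes can arrive from the audio data, from a user, or from another device.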
  • the object-oriented encoder 112 A can encode one or more audio objects into an audio stream suitable for transmission over a network.
  • the object-oriented encoder 112 A encodes the audio objects as uncompressed PCM (pulse code modulated) audio together with associated attribute metadata.
  • the object-oriented encoder 112 A also applies compression to the objects when creating the stream.
  • the audio stream generated by the object-oriented encoder can include at least one object represented by a metadata header and an audio payload.
  • the audio stream can be composed of frames, which can each include object metadata headers and audio payloads. Some objects may include metadata only and no audio payload. Other objects may include an audio payload but little or no metadata. Examples of such objects are described in detail below.
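One way to picture a frame that carries object metadata headers and audio payloads is a length-prefixed layout. The sketch below is an assumption for illustration only: the JSON header encoding, the 4-byte length fields, and the function names `pack_object`, `pack_frame`, and `unpack_frame` are invented here and are not the bit stream format of the patent.

```python
import json
import struct


def pack_object(metadata: dict, payload: bytes = b"") -> bytes:
    """Serialize one object as: header length, payload length, header, payload."""
    header = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return struct.pack(">II", len(header), len(payload)) + header + payload


def pack_frame(objects) -> bytes:
    """Concatenate packed objects, prefixed by a 2-byte object count."""
    body = b"".join(pack_object(meta, audio) for meta, audio in objects)
    return struct.pack(">H", len(objects)) + body


def unpack_frame(frame: bytes):
    """Recover the (metadata, payload) pairs from a packed frame."""
    (count,) = struct.unpack_from(">H", frame, 0)
    offset = 2
    objects = []
    for _ in range(count):
        header_len, payload_len = struct.unpack_from(">II", frame, offset)
        offset += 8
        metadata = json.loads(frame[offset:offset + header_len].decode("utf-8"))
        offset += header_len
        payload = frame[offset:offset + payload_len]
        offset += payload_len
        objects.append((metadata, payload))
    return objects


frame = pack_frame([
    ({"id": 1, "position": [0.0, 1.0, 0.0]}, b"\x10\x20\x30\x40"),  # metadata and audio
    ({"id": 2, "environment": "cathedral"}, b""),                    # metadata only
    ({}, b"\x01\x02"),                                               # audio with no metadata
])
decoded = unpack_frame(frame)
```

Because each object records its own header and payload lengths, a metadata-only object and a payload-only object fit the same layout without special cases.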
  • the audio object creation system 110 A can supply the encoded audio objects to the content server 120 A over a network (not shown).
  • the content server 120 A can host the encoded audio objects for later transmission.
  • the content server 120 A can include one or more machines, such as physical computing devices.
  • the content server 120 A can be accessible to user systems over the network 130 .
  • the content server 120 A can be a web server, an edge node in a content delivery network (CDN), or the like.
  • the user system 140 can access the content server 120 A to request audio content.
  • the content server 120 A can stream, upload, or otherwise transmit the audio content to the user system 140 .
  • Any form of computing device can access the audio content.
  • the user system 140 can be a desktop, laptop, tablet, personal digital assistant (PDA), television, wireless handheld device (such as a phone), or the like.
  • the renderer 142 A on the user system 140 can decode the encoded audio objects and render the audio objects for output to one or more loudspeakers.
  • the renderer 142 A can include a variety of different rendering features, audio enhancements, psychoacoustic enhancements, and the like for rendering the audio objects.
  • the renderer 142 A can use the object attributes of the audio objects as cues on how to render the audio objects.
  • the object-oriented audio system 100 B includes many of the features of the system 100 A, such as an audio object creation system 110 B, a content server 120 B, and a user system 140 .
  • the functionality of the components shown can be the same as that described above, with certain differences noted herein.
  • the content server 120 B includes an adaptive streaming module 122 B that can dynamically adapt the amount of object data streamed to the user system 140 .
  • the user system 140 includes an adaptive renderer 142 B that can adapt audio streaming and/or the way objects are rendered by the user system 140 .
  • the object-oriented encoder 112 B has been moved from the audio object creation system 110 B to the content server 120 B.
  • the audio object creation system 110 B uploads audio objects instead of audio streams to the content server 120 B.
  • An adaptive streaming module 122 B on the content server 120 B includes the object-oriented encoder 112 B. Encoding of audio objects is therefore performed on the content server 120 B in the depicted embodiment.
  • the audio object creation system 110 B can stream encoded objects to the adaptive streaming module 122 B, which decodes the audio objects for further manipulation and later re-encoding.
  • the adaptive streaming module 122 B can dynamically adapt the way objects are encoded prior to streaming.
  • the adaptive streaming module 122 B can monitor available network 130 resources, such as network bandwidth, latency, and so forth. Based on the available network resources, the adaptive streaming module 122 B can encode more or fewer audio objects into the audio stream. For instance, as network resources become more available, the adaptive streaming module 122 B can encode relatively more audio objects into the audio stream, and vice versa.
  • the adaptive streaming module 122 B can also adjust the types of objects encoded into the audio stream, rather than (or in addition to) the number. For example, the adaptive streaming module 122 B can encode higher priority objects (such as dialog) but not lower priority objects (such as certain background sounds) when network resources are constrained. The concept of adapting streaming based on object priority is described in greater detail below.
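The priority-based adaptation described above can be illustrated with a simple greedy selection. This is a sketch under stated assumptions: the numeric priority scale (lower means more important), the per-object bitrate estimate, and the names `StreamObject` and `select_objects` are hypothetical, and a greedy fit is just one possible policy rather than the method the patent specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamObject:
    name: str
    priority: int      # hypothetical scale: lower number means more important
    bitrate_kbps: int  # hypothetical cost of streaming this object


def select_objects(objects: List[StreamObject], available_kbps: int) -> List[StreamObject]:
    """Greedily keep the highest-priority objects that fit the bandwidth budget."""
    selected = []
    remaining = available_kbps
    for obj in sorted(objects, key=lambda o: o.priority):
        if obj.bitrate_kbps <= remaining:
            selected.append(obj)
            remaining -= obj.bitrate_kbps
    return selected


scene = [
    StreamObject("dialog", priority=0, bitrate_kbps=96),
    StreamObject("music", priority=1, bitrate_kbps=128),
    StreamObject("crowd_ambience", priority=2, bitrate_kbps=64),
]

constrained = [o.name for o in select_objects(scene, available_kbps=100)]
plentiful = [o.name for o in select_objects(scene, available_kbps=512)]
```

With a 100 kbps budget only the dialog object is kept, while a 512 kbps budget admits all three, matching the idea that more objects are encoded as network resources become more available.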
  • the adaptive renderer 142 B can also affect how audio objects are streamed to the user system 140 .
  • the adaptive renderer 142 B can communicate with the adaptive streaming module 122 B to control the amount and/or type of audio objects streamed to the user system 140 .
  • the adaptive renderer 142 B can also adjust the way audio streams are rendered based on the playback environment. For example, a large theater may specify the location and capabilities of many tens or hundreds of amplifiers and speakers while a self-contained TV may specify that only two amplifier channels and speakers are available. Based on this information, the systems 100 A, 100 B can optimize the acoustic field presentation.
  • the adaptive features described herein can be implemented even if an object-oriented encoder (such as the encoder 112 A) sends an encoded stream to the adaptive streaming module 122 B.
  • the adaptive streaming module 122 B can remove objects from or otherwise filter the audio stream when computing resources or network resources become less available.
  • the adaptive streaming module 122 B can remove packets from the stream corresponding to objects that are relatively less important to render. Techniques for assigning importance to objects for streaming and/or rendering are described in greater detail below.
  • the disclosed systems 100 A, 100 B for audio distribution and playback can encompass the entire chain from initial production of audio content to the perceptual system of the listener(s).
  • the systems 100 A, 100 B can be scalable and future-proof in that conceptual improvements in the transmission/storage or multi-dimensional rendering system can easily be incorporated.
  • the systems 100 A, 100 B can also easily scale from large-format, theater-based presentations to home theater configurations and self-contained TV audio systems.
  • the systems 100 A, 100 B can abstract the production of audio content to a series of audio objects that provide information about the structure of a scene as well as individual components within a scene.
  • the information associated with each object can be used by the systems 100 A, 100 B to create the most accurate representation of the information provided, given the resources available. These resources can be specified as an additional input to the systems 100 A, 100 B.
  • the systems 100 A, 100 B may also incorporate psychoacoustic processing to enhance listener immersion in the acoustic environment as well as to implement positioning of 3D objects that correspond accurately to their position in the visual field.
  • This processing can also be defined to the systems 100 A, 100 B (e.g., to the renderer 142 ) as a resource available to enhance or otherwise optimize the presentation of the audio object information contained in the transmission stream.
  • the stream is designed to be extensible so that additional information could be added at any time.
  • the renderer 142 A, 142 B could be generic or designed to support a particular environment and resource mix. Future improvements and new concepts in audio reproduction could be incorporated at will and the same descriptive information contained in the transmission/storage stream utilized with potentially more accurate rendering.
  • the systems 100 A, 100 B are abstracted to the level that any future physical or conceptual improvements can easily be incorporated at any point within the systems 100 A, 100 B while maintaining compatibility with previous content and rendering systems. Unlike current systems, the systems 100 A, 100 B are flexible and adaptable.
  • object-oriented audio techniques can also be implemented in non-network environments.
  • an object-oriented audio stream can be stored on a computer-readable storage medium, such as a DVD disc, Blu-ray Disc, or the like.
  • a media player (such as a Blu-ray player) can play back the object-oriented audio stream stored on the disc.
  • An object-oriented audio package can also be downloaded to local storage on a user system and then played back from the local storage. Many other variations are possible.
  • the functionality of certain components described with respect to FIGS. 1A and 1B can be combined, modified, or omitted.
  • the audio object creation system 110 can be implemented on the content server 120 . Audio streams could be streamed directly from the audio object creation system 110 to the user system 140 . Many other configurations are possible.
  • Referring to FIG. 2 , another embodiment of an object-oriented audio system 200 is shown.
  • the system 200 can implement any of the features of the systems 100 A, 100 B described above.
  • the system 200 can generate an object-oriented audio stream that can be decoded, rendered, and output by one or more speakers.
  • audio objects 202 are provided to an object-oriented encoder 212 .
  • the object-oriented encoder 212 can be implemented by an audio content creation system or a streaming module on a content server, as described above.
  • the object-oriented encoder 212 can encode and/or compress the audio objects into a bit stream 214 .
  • the object-oriented encoder 212 can use any codec or compression technique to encode the objects, including compression techniques based on any of the Moving Picture Experts Group (MPEG) standards (e.g., to create MP3 files).
  • the object-oriented encoder 212 creates a single bit stream 214 having metadata headers and audio payloads for different audio objects.
  • the object-oriented encoder 212 can transmit the bit stream 214 over a network (see, e.g., FIG. 1B ).
  • a decoder 220 implemented on a user system can receive the bit stream 214 .
  • the decoder 220 can decode the bit stream 214 into its constituent audio objects 202 .
  • the decoder 220 provides the audio objects 202 to a renderer 242 .
  • the renderer 242 can directly implement the functionality of the decoder 220 .
  • the renderer 242 can render the audio objects into audio signals 244 suitable for playback on one or more speakers 250 .
  • the renderer 142 A can use the object attributes of the audio objects as cues on how to render the audio objects.
  • the functionality of the renderer 142 A can be changed without changing the format of the audio objects.
  • one type of renderer 142 A might use a position attribute of an audio object to pan the audio from one speaker to another.
  • a second renderer 142 A might use the same position attribute to perform 3D psychoacoustic filtering to the audio object in response to determining that a psychoacoustic enhancement is available to the renderer 142 A.
  • the renderer 142 A can take into account some or all resources available to create the best possible presentation. As rendering technology improves, additional renderers 142 A or rendering resources can be added to the user system 140 that take advantage of the preexisting format of the audio objects.
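As an illustration of a renderer that uses a position attribute to pan audio between speakers, the sketch below applies constant-power stereo panning. Constant-power panning is a common technique chosen here for the example; the patent does not name a panning law. The dictionary-based object, the convention that the x coordinate runs from -1.0 (left) to 1.0 (right), and the function names are all assumptions.

```python
import math
from typing import Dict, List, Tuple


def pan_gains(x: float) -> Tuple[float, float]:
    """Constant-power stereo gains for x in [-1.0 (left), 1.0 (right)]."""
    x = max(-1.0, min(1.0, x))
    angle = (x + 1.0) * math.pi / 4.0   # 0 at hard left, pi/2 at hard right
    return math.cos(angle), math.sin(angle)


def render_stereo(obj: Dict, samples: List[float]) -> List[Tuple[float, float]]:
    """Render a mono object to (left, right) sample pairs using its position."""
    x = obj.get("position", (0.0, 0.0, 0.0))[0]
    left_gain, right_gain = pan_gains(x)
    return [(s * left_gain, s * right_gain) for s in samples]


left, right = pan_gains(0.0)
hard_left = render_stereo({"position": (-1.0, 0.0, 0.0)}, [1.0, 0.5])
```

A different renderer could read the same position attribute and apply 3D psychoacoustic filtering instead, without any change to the object itself, which is the point of keeping rendering decisions out of the object format.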
  • the object-oriented encoder 212 and/or the renderer 242 can also have adaptive features.
  • FIG. 3 illustrates an embodiment of a streaming module 322 for use with any of the object-oriented audio systems described herein.
  • the streaming module 322 includes an object-oriented encoder 312 .
  • the streaming module 322 and encoder 312 can be implemented in hardware and/or software.
  • the depicted embodiment illustrates how different types of audio objects can be encoded into a single bit stream 314 .
  • the example streaming module 322 shown receives two different types of objects—static objects 302 and dynamic objects 304 .
  • Static objects 302 can represent channels of audio, such as 5.1 channel surround sound. Each channel can be represented as a static object 302 .
  • Some content creators may wish to use channels instead of or in addition to the object-based functionality of the systems 100 A, 100 B.
  • Static objects 302 provide a way for these content creators to use channels, facilitating backwards compatibility with existing fixed channel systems and promoting ease of adoption.
  • Dynamic objects 304 can include any objects that can be used instead of or in addition to the static objects 302 . Dynamic objects 304 can include enhancements that, when rendered together with static objects 302 , enhance the audio associated with the static objects 302 .
  • the dynamic objects 304 can include psychoacoustic information that a renderer can use to enhance the static objects 302 .
  • the dynamic objects 304 can also include background objects (such as a passing airplane) that a renderer can use to enhance an audio scene. Dynamic objects 304 need not be background objects, however.
  • the dynamic objects 304 can include dialog or any other audio data.
  • the metadata associated with static objects 302 can be little or nonexistent. In one embodiment, this metadata simply includes the object attribute of “channel,” indicating to which channel the static objects 302 correspond. As this metadata does not change in some implementations, the static objects 302 are therefore static in their object attributes. In contrast, the dynamic objects 304 can include changing object attributes, such as changing position, velocity, and so forth. Thus, the metadata associated with these objects 304 can be dynamic. In some circumstances, however, the metadata associated with static objects 302 can change over time, while the metadata associated with dynamic objects 304 can stay the same.
  • some dynamic objects 304 can contain little or no audio payload.
  • Environment objects 304 can specify the desired characteristics of the acoustic environment in which a scene takes place. These dynamic objects 304 can include information on the type of building or outdoor area where the audio scene occurs, such as a room, office, cathedral, stadium, or the like. A renderer can use this information to adjust playback of the audio in the static objects 302 , for example, by applying an appropriate amount of reverberation or delay corresponding to the indicated environment.
  • Environmental dynamic objects 304 can also include an audio payload in some implementations. Some examples of environment objects are described below with respect to FIG. 4 .
  • a user system can include a library of audio clips or sounds that can be rendered by the renderer upon receipt of audio definition objects.
  • An audio definition object can include a reference to an audio clip or sound stored on the user system, along with instructions for how long to play the clip, whether to loop the clip, and so forth.
  • An audio stream can be constructed partly or even solely from audio definition objects, with some or all of the actual audio data being stored on the user system (or accessible from another server).
  • the streaming module 322 can send a plurality of audio definition objects to a user system, followed by a plurality of audio payload objects, separating the metadata and the actual audio. Many other configurations are possible.
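The audio definition objects described above can be sketched as references that a user system resolves against its local clip library. The names `AudioDefinition` and `resolve`, the sample-count duration field, and the in-memory dictionary standing in for the library are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AudioDefinition:
    clip_id: str           # reference to a clip already stored on the user system
    duration_samples: int  # how long to play the clip
    loop: bool = False     # whether to repeat the clip to fill the duration


def resolve(definition: AudioDefinition, library: Dict[str, List[float]]) -> List[float]:
    """Build the samples to play from a local clip and a definition object."""
    clip = library[definition.clip_id]
    if not definition.loop:
        return clip[:definition.duration_samples]
    if not clip:
        return []
    repeats = -(-definition.duration_samples // len(clip))  # ceiling division
    return (clip * repeats)[:definition.duration_samples]


library = {"footstep": [0.1, 0.2, 0.3]}
looped = resolve(AudioDefinition("footstep", duration_samples=7, loop=True), library)
single = resolve(AudioDefinition("footstep", duration_samples=2), library)
```

Only the small definition object needs to travel in the stream; the audio itself is looked up locally, which is why a stream built from such objects can carry little or no audio payload.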
  • Content creators can declare static objects 302 or dynamic objects 304 using a descriptive computer language (using, e.g., the audio object creation system 110 ).
  • a content creator can declare a desired number of static objects 302 .
  • a content creator can request that a dialog static object 302 (e.g., corresponding to a center channel) or any other number of static objects 302 be always on. This “always on” property can also make the static objects 302 static.
  • the dynamic objects 304 may come and go and not always be present in the audio stream. Of course, these features may be reversed. It may be desirable to gate or otherwise toggle static objects 302 , for instance.
  • FIG. 4 illustrates an embodiment of an object-oriented audio streaming format 400 .
  • the audio streaming format includes a bit stream 414 , which can correspond to any of the bit streams described above.
  • the format 400 of the bit stream 414 is broken down into successively more detailed views ( 420 , 430 ).
  • the bit stream format 400 shown is merely an example embodiment and can be varied depending on the implementation.
  • the bit stream 414 includes a stream header 412 and macro frames 420 .
  • the stream header 412 can occur at the beginning or end of the bit stream 414 .
  • Some examples of information that can be included in the stream header 412 include an author of the stream, an origin of the stream, copyright information, a timestamp related to creation and/or delivery of the stream, length of the stream, information regarding which codec was used to encode the stream, and the like.
  • the stream header 412 can be used by a decoder and/or renderer to properly decode the stream 414 .
  • the macro frames 420 divide the bit stream 414 into sections of data.
  • Each macro frame 420 can correspond to an audio scene or a time slice of audio.
  • Each macro frame 420 further includes a macro frame header 422 and individual frames 430 .
  • the macro frame header 422 can define a number of audio objects included in the macro frame, a time stamp corresponding to the macro frame 420 , and so on.
  • the macro frame header 422 can be placed after the frames 430 in the macro frame 420 .
  • the individual frames 430 can each represent a single audio object. However, the frames 430 can also represent multiple audio objects in some implementations.
  • a renderer receives an entire macro frame 420 before rendering the audio objects associated with the macro frame 420 .
  • Each frame 430 includes a frame header 432 containing object metadata and an audio payload 434 .
  • the frame header 432 can be placed after the audio payload 434 .
  • some audio objects may have either only metadata 432 or only an audio payload 434 .
  • some frames 430 may include a frame header 432 with little or no object metadata (or no header at all), and some frames 430 may include little or no audio payload 434 .
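  • The nesting of macro frames 420 and frames 430 can be sketched as follows. This is a minimal illustration, not the format 400 itself: the length-prefixed layout, the field widths, the JSON encoding of the object metadata, and all function names are assumptions. The stream header 412 is omitted for brevity.

```python
import json
import struct

# Hypothetical fixed-size prefixes (big-endian, no padding).
FRAME_PREFIX = struct.Struct(">HI")   # metadata length, payload length
MACRO_PREFIX = struct.Struct(">HI")   # number of frames, time stamp in ms


def pack_frame(metadata: dict, payload: bytes) -> bytes:
    """One frame: a frame header carrying object metadata, then the payload."""
    header = json.dumps(metadata, sort_keys=True).encode("utf-8") if metadata else b""
    return FRAME_PREFIX.pack(len(header), len(payload)) + header + payload


def pack_macro_frame(timestamp_ms: int, frames) -> bytes:
    """One macro frame: a macro frame header followed by its frames."""
    body = b"".join(pack_frame(meta, data) for meta, data in frames)
    return MACRO_PREFIX.pack(len(frames), timestamp_ms) + body


def unpack_macro_frame(data: bytes):
    """Inverse of pack_macro_frame; returns (timestamp_ms, [(metadata, payload)])."""
    count, timestamp_ms = MACRO_PREFIX.unpack_from(data, 0)
    offset = MACRO_PREFIX.size
    frames = []
    for _ in range(count):
        header_len, payload_len = FRAME_PREFIX.unpack_from(data, offset)
        offset += FRAME_PREFIX.size
        header = data[offset:offset + header_len]
        offset += header_len
        payload = data[offset:offset + payload_len]
        offset += payload_len
        metadata = json.loads(header.decode("utf-8")) if header else {}
        frames.append((metadata, payload))
    return timestamp_ms, frames
```

  • Because both lengths may be zero, the same layout covers frames that carry only metadata and frames that carry only an audio payload.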
  • the object metadata in the frame header 432 can include information on object attributes.
  • Tables illustrate examples of metadata that can be used to define object attributes.
  • Table 1 illustrates various object attributes, organized by an attribute name and attribute description. Fewer or more than the attributes shown may be implemented in some designs.
  • DOPPLER_FACT Permits scaling/exaggerating the Doppler pitch effect.
  • SRC_VEL_X Modify the sound source's velocity in the X axis direction.
  • SRC_VEL_Y Modify the sound source's velocity in the Y axis direction.
  • SRC_VEL_Z Modify the sound source's velocity in the Z axis direction.
  • ENABLE_DISTANCE Enable/Disable the Distance Attenuation process.
  • MINIMUM_DIST The distance from the listener at which distance attenuation begins to attenuate the signal.
  • MAXIMUM_DIST The distance from the listener at which distance attenuation no longer attenuates the signal.
  • SILENCE_AFT_MAX Silence the signal after reaching the maximum distance.
  • ROLLOFF_FACT The rate at which the source signal level decays as a function of distance from the listener.
  • LISTENER_RELATIVE Sets whether or not the source position is relative to the listener, rather than absolute or relative to the camera.
  • LISTENER_X The position of the listener along the X-axis.
  • LISTENER_Y The position of the listener along the Y-axis.
  • LISTENER_Z The position of the listener along the Z-axis.
  • LISTENER_VEL_X The velocity of the listener along the X-axis.
  • LISTENER_VEL_Y The velocity of the listener along the Y-axis.
  • LISTENER_VEL_Z The velocity of the listener along the Z-axis.
  • LISTENER_ABOVE_X The X-axis orientation vector above the listener.
  • LISTENER_ABOVE_Y The Y-axis orientation vector above the listener.
  • LISTENER_ABOVE_Z The Z-axis orientation vector above the listener.
  • LISTENER_FRONT_X The X-axis orientation vector in front of the listener.
  • LISTENER_FRONT_Y The Y-axis orientation vector in front of the listener.
  • LISTENER_FRONT_Z The Z-axis orientation vector in front of the listener.
  • ENABLE_MACROSCOPIC Enables or disables use of the Macroscopic specification of an object.
  • MACROSCOPIC_X Specifies the x dimension size of sound emission.
  • MACROSCOPIC_Y Specifies the y dimension size of sound emission.
  • MACROSCOPIC_Z Specifies the z dimension size of sound emission.
  • ENABLE_SRC_ORIENT Enables or disables the use of orientation on a source.
  • SRC_FRONT_X The X-axis orientation vector in front of the sound object.
  • SRC_FRONT_Y The Y-axis orientation vector in front of the sound object.
  • SRC_FRONT_Z The Z-axis orientation vector in front of the sound object.
  • SRC_ABOVE_X The X-axis orientation vector above the sound object.
  • SRC_ABOVE_Y The Y-axis orientation vector above the sound object.
  • SRC_ABOVE_Z The Z-axis orientation vector above the sound object.
  • ENABLE_DIRECTIVITY Enables or disables the directivity process.
  • DIRECTIVITY_MIN_ANGLE Sets the minimum angle, normalized to 360°, for directivity attenuation. The angle is centered about the source's front orientation, creating a cone.
  • DIRECTIVITY_MAX_ANGLE Sets the maximum angle, normalized to 360°, for directivity attenuation.
  • DIRECTIVITY_REAR_LEVEL Attenuates the signal by the specified fractional amount of full-scale.
  • ENABLE_OBSTRUCTION Enables or disables the obstruction process.
  • OBSTRUCT_PRESET A preset HF Level/Level setting (see Table 2 below).
  • REVERB_ENABLE_PROCSS Enables/disables the reverb process (affects all sources).
  • REVERB_DECAY Selects the time for the reverberant signal to decay by 60 dB (overall process).
  • REVERB_MIX Specifies the amount of original signal to processed signal to use.
  • REVERB_PRESET Selects a predefined reverb configuration based on an environment. This may modify the decay time when changed. Several predefined presets are available (see Table 3 below).
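  • To illustrate how a renderer might combine the distance attributes in Table 1, the sketch below computes a gain from ENABLE_DISTANCE, MINIMUM_DIST, MAXIMUM_DIST, ROLLOFF_FACT, and SILENCE_AFT_MAX. The source does not give an attenuation formula; the clamped inverse-distance curve used here is an assumption borrowed from common 3D audio practice, and the function name is hypothetical. It requires MINIMUM_DIST to be greater than zero.

```python
def distance_gain(distance: float,
                  minimum_dist: float,
                  maximum_dist: float,
                  rolloff_fact: float,
                  enable_distance: bool = True,
                  silence_aft_max: bool = False) -> float:
    """Return a linear gain in [0, 1] for a source at the given distance.

    Assumed model (not from the source): clamped inverse-distance rolloff.
    """
    if not enable_distance:
        return 1.0  # attenuation process disabled
    if silence_aft_max and distance > maximum_dist:
        return 0.0  # silence the signal beyond the maximum distance
    # No attenuation before MINIMUM_DIST, no further attenuation after MAXIMUM_DIST.
    clamped = max(minimum_dist, min(distance, maximum_dist))
    return minimum_dist / (minimum_dist + rolloff_fact * (clamped - minimum_dist))
```

  • With MINIMUM_DIST of 1 and ROLLOFF_FACT of 1, doubling the distance from 1 to 2 halves the gain under this assumed curve; a larger ROLLOFF_FACT makes the level decay faster.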
  • Example values for the OBSTRUCT_PRESET (obstruction preset) listed in Table 1 are shown below in Table 2.
  • the obstruction preset value can affect a degree to which a sound source is occluded or blocked from the camera or listener's point of view.
  • a sound source emanating from behind a thick door can be rendered differently than a sound source emanating from behind a curtain.
  • a renderer can perform any desired rendering technique (or none at all) based on the values of these and other object attributes.
  • the REVERB_PRESET can include example values as shown in Table 3. These reverberation values correspond to types of environments in which a sound source may be located. Thus, a sound source emanating in an auditorium might be rendered differently than a sound source emanating in a living room.
  • an environment object includes a reverberation attribute that includes preset values such as those described below.
  • environment objects are not merely described using the reverberation presets described above. Instead, environment objects can be described with one or more attributes such as an amount of reverberation (that need not be a preset), an amount of echo, a degree of background noise, and so forth. Many other configurations are possible.
  • attributes of audio objects can generally have forms other than values. For example, an attribute can contain a snippet of code or instructions that define a behavior or characteristic of a sound source.
  • FIG. 5A illustrates an embodiment of an audio stream assembly process 500 A.
  • the audio stream assembly process 500 A can be implemented by any of the systems described herein.
  • the stream assembly process 500 A can be implemented by any of the object-oriented encoders or streaming modules described above.
  • the stream assembly process 500 A assembles an audio stream from at least one audio object.
  • an audio object is selected to stream.
  • the audio object may have been created by the audio object creation module 110 described above.
  • selecting the audio object can include accessing the audio object in the object data repository 116 .
  • the streaming module 122 can access the audio object from computer storage.
  • this example figure describes streaming a single object, but it should be understood that multiple objects can be streamed in an audio stream.
  • the object selected can be a static or dynamic object.
  • the selected object has metadata and an audio payload.
  • An object header having metadata of the object is assembled at block 504 .
  • This metadata can include any description of object attributes, some examples of which are described above.
  • an audio payload having the audio signal data of the object is provided.
  • the object header and the audio payload are combined to form the audio stream at block 508 .
  • Forming the audio stream can include encoding the audio stream, compressing the audio stream, and the like.
  • the audio stream is transmitted over a network. While the audio stream can be streamed using any streaming technique, the audio stream can also be uploaded to a user system (or conversely, downloaded by the user system). Thereafter, the audio stream can be rendered by the user system, as described below with respect to FIG. 5B .
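  • The assembly steps of process 500 A (assemble an object header, provide the audio payload, combine, and compress) can be sketched for a single object as follows. The JSON header, the 4-byte length prefix, and the use of zlib for compression are assumptions chosen for a self-contained example; the source leaves the codec open.

```python
import json
import struct
import zlib


def assemble_stream(metadata: dict, audio_payload: bytes) -> bytes:
    """Combine an object header and an audio payload into one compressed stream."""
    header = json.dumps(metadata).encode("utf-8")           # assemble the object header
    combined = struct.pack(">I", len(header)) + header + audio_payload  # combine
    return zlib.compress(combined)                          # stand-in for encoding


def read_stream(stream: bytes):
    """Recover (metadata, audio_payload) from a stream built by assemble_stream."""
    combined = zlib.decompress(stream)
    (header_len,) = struct.unpack_from(">I", combined, 0)
    header = combined[4:4 + header_len]
    return json.loads(header.decode("utf-8")), combined[4 + header_len:]
```

  • The bytes returned by assemble_stream are what would then be transmitted over the network; read_stream shows that the metadata and payload can be separated again on the user system.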
  • FIG. 5B illustrates an embodiment of an audio stream rendering process 500 B.
  • the audio stream rendering process 500 B can be implemented by any of the systems described herein.
  • the stream rendering process 500 B can be implemented by any of the renderers described herein.
  • an object-oriented audio stream is received.
  • This audio stream may have been created using the techniques of the process 500 A or with other techniques described above.
  • Object metadata in the audio stream is accessed at block 524 .
  • This metadata may be obtained by decoding the stream using, for example, the same codec used to encode the stream.
  • One or more object attributes in the metadata are identified at block 526 . Values of these object attributes can be identified by the renderer as cues for rendering the audio objects in the stream.
  • An audio signal in the audio stream is rendered at block 528 .
  • the audio stream is rendered according to the one or more object attributes to produce output audio.
  • the output audio is supplied to one or more loudspeakers at block 530 .
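  • As one hedged example of using an object attribute as a rendering cue, the sketch below pans a mono signal between two loudspeakers according to a position attribute. The attribute key position_x, its range of -1 (left) to 1 (right), and the constant-power pan law are assumptions; the source does not prescribe any particular rendering technique.

```python
import math


def render_stereo(samples, attributes):
    """Return (left, right) sample pairs using a constant-power pan.

    attributes["position_x"] is a hypothetical attribute in [-1, 1];
    a missing attribute is treated as centered.
    """
    x = max(-1.0, min(1.0, attributes.get("position_x", 0.0)))
    angle = (x + 1.0) * math.pi / 4.0        # 0 = hard left, pi/2 = hard right
    left_gain, right_gain = math.cos(angle), math.sin(angle)
    return [(s * left_gain, s * right_gain) for s in samples]
```

  • The pairs returned here correspond to the output audio that would be supplied to two loudspeakers.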
  • An adaptive streaming module 122 B and adaptive renderer 142 B were described above with respect to FIG. 1B . More detailed embodiments of an adaptive streaming module 622 and an adaptive renderer 642 are shown in the system 600 of FIG. 6 .
  • the adaptive streaming module 622 has several components, including a priority module 624 , a network resource monitor 626 , an object-oriented encoder 612 , and an audio communications module 628 .
  • the adaptive renderer 642 includes a computing resource monitor 644 and a rendering module 646 . Some of the components shown may be omitted in different implementations.
  • the object-oriented encoder 612 can include any of the encoding features described above.
  • the audio communications module 628 can transmit the bit stream 614 to the adaptive renderer 642 over a network (not shown).
  • the priority module 624 can apply priority values or other priority information to audio objects.
  • each object can have a priority value, which may be a numeric value or the like.
  • Priority values can indicate the relative importance of objects from a rendering standpoint. Objects with higher priority can be more important to render than objects of lower priority. Thus, if resources are constrained, objects with relatively lower priority can be ignored. Priority can initially be established by a content creator, using the audio object creation systems 110 described above.
  • a dialog object that includes dialog for a video might have a relatively higher priority than a background sound object. If the priority values are on a scale from 1 to 5, for instance, the dialog object might have a priority value of 1 (meaning the highest priority), while a background sound object might have a lower priority (e.g., somewhere from 2 to 5).
  • the priority module 624 can establish thresholds for transmitting objects that satisfy certain priority levels. For instance, the priority module 624 can establish a threshold of 3, such that objects having priority of 1, 2, and 3 are transmitted to a user system while objects with a priority of 4 or 5 are not.
  • the priority module 624 can dynamically set this threshold based on changing network conditions, as determined by the network resource monitor 626 .
  • the network resource monitor 626 can monitor available network resources or other quality of service measures, such as bandwidth, latency, and so forth.
  • the network resource monitor 626 can provide this information to the priority module 624 . Using this information, the priority module 624 can adjust the threshold to allow lower priority objects to be transmitted to the user system if network resources are high. Similarly, the priority module 624 can adjust the threshold to prevent lower priority objects from being transmitted when network resources are low.
  • the priority module 624 can also adjust the priority threshold based on information received from the adaptive renderer 642 .
  • the computing resource monitor 644 of the adaptive renderer 642 can identify characteristics of the playback environment of a user system, such as the number of speakers connected to the user system, the processing capability of the user system, and so forth.
  • the computing resource monitor 644 can communicate the computing resource information to the priority module 624 over a control channel 650 . Based on this information, the priority module 624 can adjust the threshold to send both higher and lower priority objects if the computing resources are high and solely higher priority objects if the computing resources are low.
  • the computing resource monitor 644 of the adaptive renderer 642 can therefore control the amount and/or type of audio objects that are streamed to the user system.
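  • A priority module along these lines might map resource measurements to a threshold as sketched below. The sketch follows the 1-to-5 example scale (1 being the highest priority) and returns the largest priority number that is still transmitted. The function name, the bandwidth and processing cut-off values, and the normalized cpu_score input are all assumptions for illustration.

```python
def choose_threshold(bandwidth_kbps: float, cpu_score: float,
                     lowest_priority: int = 5) -> int:
    """Return the largest priority number that is still transmitted.

    bandwidth_kbps: network measure from a network resource monitor.
    cpu_score: hypothetical 0..1 measure reported by the renderer.
    """
    def level(value, low, high):
        # 0 = scarce, 1 = moderate, 2 = plentiful
        if value < low:
            return 0
        if value < high:
            return 1
        return 2

    network = level(bandwidth_kbps, 256, 1024)   # assumed cut-offs
    computing = level(cpu_score, 0.3, 0.7)       # assumed cut-offs
    constrained = min(network, computing)        # the scarcer resource decides
    return {0: 1, 1: 3, 2: lowest_priority}[constrained]
```

  • Taking the minimum of the two levels reflects the idea that lower priority objects are worth sending only when both the network and the user system can handle them.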
  • the adaptive renderer 642 can also adjust the way audio streams are rendered based on the playback environment. If the user system is connected to two speakers, for instance, the adaptive renderer 642 can render the audio objects on the two speakers. If additional speakers are connected to the user system, the adaptive renderer 642 can render the audio objects on the additional channels as well.
  • the adaptive renderer 642 may also apply psychoacoustic techniques when rendering the audio objects on one or two (or sometimes more) speakers.
  • the priority module 624 can change the priority of audio objects dynamically. For instance, the priority module 624 can set objects to have relative priority to one another.
  • a dialog object for example, can be assigned a highest priority value by the priority module 624 .
  • Other objects' priority values can be relative to the priority of the dialog object. Thus, if the dialog object is not present for a period of time in the audio stream, the other objects can have relatively higher priority.
  • FIG. 7 illustrates an embodiment of an adaptive streaming process 700 .
  • the adaptive streaming process 700 can be implemented by any of the systems described above, such as the system 600 .
  • the adaptive streaming process 700 facilitates efficient use of streaming resources.
  • Blocks 702 through 708 can be performed by the priority module 624 described above.
  • a request is received from a remote computer for audio content.
  • a user system can send the request to a content server, for instance.
  • computing resource information regarding resources of the remote computer system is received. This computing resource information can describe various available resources of the user system and can be provided together with the audio content request.
  • Network resource information regarding available network resources is also received at block 706 . This network resource information can be obtained by the network resource monitor 626 .
  • a priority threshold is set at block 708 based at least partly on the computer and/or network resource information.
  • the priority module 624 establishes a lower threshold (e.g., to allow lower priority objects in the stream) when both the computing and network resources are relatively high.
  • the priority module 624 can establish a higher threshold (e.g., to allow only higher priority objects in the stream) when either computing or network resources are relatively low.
  • Blocks 710 through 714 can be performed by the object-oriented encoder 612 .
  • At decision block 710 , for a given object in the requested audio content, it is determined whether the priority value for that object satisfies the previously established threshold. If so, at block 712 , the object is added to the audio stream. Otherwise, the object is not added to the audio stream, thereby advantageously saving network and/or computing resources in certain embodiments.
  • the process 700 can be modified in some implementations to remove objects from a pre-encoded audio stream instead of assembling an audio stream on the fly. For instance, in block 710 , if a given object has a priority that does not satisfy a threshold, at block 712 , the object can be removed from the audio stream.
  • content creators can provide an audio stream to a content server with a variety of objects, and the adaptive streaming module at the content server can dynamically remove some of the objects based on the objects' priorities. Selecting audio objects for streaming can therefore include adding objects to a stream, removing objects from a stream, or both.
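  • Both variants of the selection step, adding objects to a new stream and removing objects from a pre-encoded stream, can be sketched as follows. Objects are represented as plain dictionaries with a priority key (1 being the highest priority), which is an assumption made to keep the example short.

```python
def build_stream(objects, threshold):
    """Assemble a stream on the fly from objects that satisfy the threshold."""
    stream = []
    for obj in objects:
        if obj["priority"] <= threshold:   # decision: does the priority satisfy it?
            stream.append(obj)             # add the object to the audio stream
    return stream


def prune_stream(stream, threshold):
    """Remove objects that fail the threshold from an existing stream, in place."""
    stream[:] = [obj for obj in stream if obj["priority"] <= threshold]
    return stream
```

  • For the same input and threshold, both functions leave the same objects in the stream; they differ only in whether the stream is assembled or edited.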
  • FIG. 8 illustrates an embodiment of an adaptive rendering process 800 .
  • the adaptive rendering process 800 can be implemented by any of the systems described above, such as the system 600 .
  • the adaptive rendering process 800 also facilitates efficient use of streaming resources.
  • an audio stream having a plurality of audio objects is received by a renderer of a user system.
  • the adaptive renderer 642 can receive the audio objects.
  • Playback environment information is accessed at block 804 .
  • the playback environment information can be accessed by the computing resource monitor 644 of the adaptive renderer 642 .
  • This resource information can include information on speaker configurations, computing power, and so forth.
  • Blocks 806 through 810 can be implemented by the rendering module 646 of the adaptive renderer 642 .
  • one or more audio objects are selected based at least partly on the environment information.
  • the rendering module 646 can use the priority values of the objects to select the objects to render. In another embodiment, the rendering module 646 does not select objects based on priority values, but instead down-mixes objects into fewer speaker channels or otherwise uses less processing resources to render the audio.
  • the audio objects are rendered to produce output audio at block 808 .
  • the rendered audio is output to one or more speakers at block 810 .
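  • The two renderer-side strategies mentioned above, selecting objects by priority and down-mixing into fewer speaker channels, can be sketched as follows. The object budget, the dictionary representation of objects, and the deliberately crude round-robin averaging used for the down-mix are assumptions; a real renderer would use properly weighted mixing.

```python
def select_for_rendering(objects, max_objects):
    """Keep the highest priority objects (1 = highest) within a rendering budget."""
    ranked = sorted(objects, key=lambda obj: obj["priority"])  # stable sort
    return ranked[:max_objects]


def downmix(channels, speaker_count):
    """Fold equal-length channels into speaker_count outputs by averaging.

    Channel i is assigned to speaker i % speaker_count (an assumed, crude rule).
    """
    length = len(channels[0]) if channels else 0
    outputs = []
    for speaker in range(speaker_count):
        group = channels[speaker::speaker_count]
        if not group:
            outputs.append([0.0] * length)   # no source channel: silence
            continue
        outputs.append([sum(frame) / len(group) for frame in zip(*group)])
    return outputs
```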
  • FIGS. 9 through 11 describe example audio object creation techniques in the context of audio-visual reproductions, such as movies, television, podcasting, and the like. However, some or all of the features described with respect to FIGS. 9 through 11 can also be implemented in the pure audio context (e.g., without accompanying video).
  • FIG. 9 illustrates an example scene 900 for object-oriented audio capture.
  • the scene 900 represents a simplified view of an audio-visual scene such as may be constructed for a movie, television, or other video.
  • two actors 910 are performing, and their sounds and actions are recorded by a microphone 920 and camera 930 respectively.
  • a microphone 920 is illustrated, although in some cases the actors 910 may wear individual microphones.
  • individual microphones can also be supplied for props (not shown).
  • location-tracking devices 912 are provided. These location-tracking devices 912 can include GPS devices, motion capture suits, laser range finders, and the like. Data from the location-tracking devices 912 can be transmitted to the audio object creation system 110 together with data from the microphone 920 (or microphones). Time stamps included in the data from the location-tracking devices 912 can be correlated with time stamps obtained from the microphone 920 and/or camera 930 so as to provide position data for each instance of audio. This position data can be used to create audio objects having a position attribute. Similarly, velocity data can be obtained from the location-tracking devices 912 or can be derived from the position data.
  • the location data from the location-tracking devices 912 can be used directly as the position data or can be translated to a coordinate system.
  • Cartesian coordinates 940 in three dimensions (x, y, and z) can be used to track audio object position. Coordinate systems other than Cartesian coordinates may be used as well, such as spherical or cylindrical coordinates.
  • the origin for the coordinate system 940 can be the camera 930 in one embodiment. To facilitate this arrangement, the camera 930 can also include a location-tracking device 912 so as to determine its location relative to the audio objects. Thus, even if the camera's 930 position changes, the position of the audio objects in the scene 900 can still be relative to the camera's 930 position.
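  • The time stamp correlation and camera-relative coordinates described above can be sketched as follows. The sketch assumes that a location-tracking device yields a non-empty list of (time, (x, y, z)) samples with strictly increasing times, and that positions between samples can be linearly interpolated; the function names are hypothetical.

```python
from bisect import bisect_left


def position_at(track, timestamp):
    """Interpolate a tracked position at the time stamp of an audio sample."""
    times = [t for t, _ in track]
    i = bisect_left(times, timestamp)
    if i == 0:
        return track[0][1]            # before the first sample: hold first value
    if i == len(track):
        return track[-1][1]           # after the last sample: hold last value
    (t0, p0), (t1, p1) = track[i - 1], track[i]
    w = (timestamp - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


def relative_to_camera(position, camera_position):
    """Express a position in a coordinate system whose origin is the camera."""
    return tuple(p - c for p, c in zip(position, camera_position))


def velocity(sample0, sample1):
    """Derive a velocity vector from two consecutive (time, position) samples."""
    (t0, p0), (t1, p1) = sample0, sample1
    return tuple((b - a) / (t1 - t0) for a, b in zip(p0, p1))
```

  • Applying relative_to_camera to both the interpolated source position and the camera's own tracked position keeps the audio object's position relative to the camera even when the camera moves.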
  • Position data can also be applied to audio objects during post-production of an audio-visual production.
  • the coordinates of animated objects (such as characters) can be known to the content creators. These coordinates can be automatically associated with the audio produced by each animated object to create audio objects.
  • FIG. 10 schematically illustrates a system 1000 for object-oriented audio capture that can implement the features described above with respect to FIG. 9 .
  • sound source location data 1002 and microphone data 1006 are provided to an object creation module 1014 .
  • the object creation module 1014 can include all the features of the object creation modules 114 A, 114 B described above.
  • the object creation module 1014 can correlate the sound source location data 1002 for a given sound source with the microphone data 1006 based on timestamps 1004 , 1008 , as described above with respect to FIG. 9 .
  • the object creation module 1014 includes an object linker 1020 that can link or otherwise associate objects together. Certain audio objects may be inherently related to one another and can therefore be automatically linked together by the object linker 1020 . Linked objects can be rendered together in ways that will be described below.
  • Objects may be inherently related to each other because the objects are related to a same higher class of object.
  • the object creation module 1014 can form hierarchies of objects that include parent objects and child objects that are related to and inherit properties of the parent objects. In this manner, audio objects can borrow certain object-oriented principles from computer programming languages.
  • An example of a parent object that may have child objects is a marching band.
  • a marching band can have several sections corresponding to different groups of instruments, such as trombones, flutes, clarinets, and so forth.
  • a content creator using the object creation module 1014 can assign the band to be a parent object and each section to be a child object. Further, the content creator can also assign the individual band members to be child objects of the section objects.
  • the complexity of the object hierarchy, including the number of levels in the hierarchy, can be established by the content creator.
  • child objects can inherit properties of their parent objects.
  • child objects can inherit some or all of the metadata of their parent objects.
  • child objects can also inherit some or all of the audio signal data associated with their parent objects.
  • the child objects can modify some or all of this metadata and/or audio signal data. For example, a child object can modify a position attribute inherited from the parent so that the child and parent have differing positions but other similar metadata.
  • the child object's position can also be represented as an offset from the parent object's position or can otherwise be derived from the parent object's position.
  • a section of the band can have a position that is offset from the band's position.
  • the child object representing the band section can automatically update its position based on the offset and the parent band's position. In this manner, different sections of the band having different position offsets can move together.
  • an object-oriented encoder can remove redundant metadata from the child object, replacing the redundant metadata with a reference to the parent's metadata. Likewise, if redundant audio signal data is common to the child and parent objects, the object-oriented encoder can reduce or eliminate the redundant audio signal data.
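  • The parent and child behavior described above (inherited metadata, position offsets, and removal of redundant metadata) can be sketched with a small class. The class name, its methods, and the choice to store only local overrides in each object are assumptions for illustration; the attribute names are taken from Table 1.

```python
_MISSING = object()  # sentinel for "no ancestor defines this attribute"


class AudioObject:
    """Hypothetical audio object node in a parent/child hierarchy."""

    def __init__(self, name, parent=None, offset=(0.0, 0.0, 0.0), **metadata):
        self.name = name
        self.parent = parent
        self.offset = offset            # offset from the parent (or from the origin)
        self.metadata = dict(metadata)  # local values only; the rest is inherited

    def attribute(self, key, default=None):
        """Look up an attribute locally, then up the chain of parents."""
        node = self
        while node is not None:
            if key in node.metadata:
                return node.metadata[key]
            node = node.parent
        return default

    def position(self):
        """Derive the position from the parent's position plus this offset."""
        base = self.parent.position() if self.parent else (0.0, 0.0, 0.0)
        return tuple(b + o for b, o in zip(base, self.offset))

    def strip_redundant(self):
        """Drop local metadata that merely repeats what the parent provides."""
        if self.parent is None:
            return
        for key in list(self.metadata):
            if self.parent.attribute(key, _MISSING) == self.metadata[key]:
                del self.metadata[key]
```

  • For a marching band parent with a trombone section child, moving the band by changing its offset moves the section as well, and strip_redundant plays the role of the encoder step that replaces duplicated metadata with a reference to the parent.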
  • the object linker 1020 of the object creation module 1014 can link child and parent objects together.
  • the object linker 1020 can perform this linking by creating an association between the two objects, which may be reflected in the metadata of the two objects.
  • the object linker 1020 can store this association in an object data repository 1016 .
  • content creators can manually link objects together, for example, even when the objects do not have parent-child relationships.
  • When a renderer receives two linked objects, the renderer can choose to render the two objects separately or together.
  • a renderer can render the marching band as a sound field of audio objects together on a variety of speakers. As the band moves in a video, for instance, the renderer can move the sound field across the speakers.
  • the renderer can interpret the linking information in a variety of ways.
  • the renderer may, for instance, render linked objects on the same speaker at different times, delayed from one another, or on different speakers at the same time, or the like.
  • the renderer may also render the linked objects at different points in space determined psychoacoustically, so as to provide the impression to the listener that the linked objects are at different points around the listener's head.
  • a renderer can cause the trombone section to appear to be marching to the left of a listener while the clarinet section is marching to the right of the listener.
  • FIG. 11 illustrates an embodiment of a process 1100 for object-oriented audio capture.
  • the process 1100 can be implemented by any of the systems described herein, such as the system 1000 .
  • the process 1100 can be implemented by the object linker 1020 of the object creation module 1014 .
  • audio and location data are received for first and second sound sources.
  • the audio data can be obtained using a microphone, while the location data can be obtained using any of the techniques described above with respect to FIG. 9 .
  • a first audio object is created for the first sound source at block 1104 .
  • a second audio object is created for the second sound source at block 1106 .
  • An association is created between the first and second sound sources at block 1108 .
  • This association can be created automatically by the object linker 1020 based on whether the two objects are related in an object hierarchy. Further, the object linker 1020 can create the association automatically based on other metadata associated with the objects, such as any two similar attributes.
  • the association is stored in computer storage at block 1110 .
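  • The automatic linking decision of process 1100 can be sketched as follows. Objects are represented as dictionaries with hypothetical id, parent, and attributes keys, and the repository is a plain set standing in for computer storage; the rule of linking on a shared parent or on at least two matching attributes is one reading of the description above, not a definitive implementation.

```python
def should_link(first, second):
    """Decide whether two audio objects should be associated automatically."""
    same_parent = (first.get("parent") is not None
                   and first.get("parent") == second.get("parent"))
    shared = [key for key, value in first["attributes"].items()
              if key in second["attributes"] and second["attributes"][key] == value]
    return same_parent or len(shared) >= 2   # "any two similar attributes"


def link(first, second, repository):
    """Store an unordered association between two objects if they qualify."""
    if should_link(first, second):
        repository.add(frozenset((first["id"], second["id"])))
        return True
    return False
```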
  • acts, events, or functions of any of the algorithms described herein can be performed in a different sequence, can be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the algorithm).
  • acts or events can be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors or processor cores or on other parallel architectures, rather than sequentially.
  • a machine such as a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.
  • a general purpose processor can be a microprocessor, but in the alternative, the processor can be a controller, microcontroller, or state machine, combinations of the same, or the like.
  • a processor can also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
  • a software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of computer-readable storage medium known in the art.
  • An exemplary storage medium can be coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium can be integral to the processor.
  • the processor and the storage medium can reside in an ASIC.
  • the ASIC can reside in a user terminal.
  • the processor and the storage medium can reside as discrete components in a user terminal.

Abstract

Systems and methods for providing object-oriented audio are described. Audio objects can be created by associating sound sources with attributes of those sound sources, such as location, velocity, directivity, and the like. Audio objects can be used in place of or in addition to channels to distribute sound, for example, by streaming the audio objects over a network to a client device. The objects can define their locations in space with associated two or three dimensional coordinates. The objects can be adaptively streamed to the client device based on available network or client device resources. A renderer on the client device can use the attributes of the objects to determine how to render the objects. The renderer can further adapt the playback of the objects based on information about a rendering environment of the client device. Various examples of audio object creation techniques are also described.

Description

    RELATED APPLICATION
  • This application claims the benefit of priority under 35 U.S.C. §119(e) of U.S. Provisional Patent Application No. 61/233,931, filed on Aug. 14, 2009, and entitled “Production, Transmission, Storage and Rendering System for Multi-Dimensional Audio,” the disclosure of which is hereby incorporated by reference in its entirety.
  • BACKGROUND
  • Existing audio distribution systems, such as stereo and surround sound, are based on an inflexible paradigm implementing a fixed number of channels from the point of production to the playback environment. Throughout the entire audio chain, there has traditionally been a one-to-one correspondence between the number of channels created and the number of channels physically transmitted or recorded. In some cases, the number of available channels is reduced through a process known as mix-down to accommodate playback configurations with fewer reproduction channels than the number provided in the transmission stream. Common examples of mix-down are mixing stereo to mono for reproduction over a single speaker and mixing multi-channel surround sound to stereo for two-speaker playback.
  • Audio distribution systems are also unsuited for 3D video applications because they are incapable of rendering sound accurately in three-dimensional space. These systems are limited by the number and position of speakers and by the fact that psychoacoustic principles are generally ignored. As a result, even the most elaborate sound systems create merely a rough simulation of an acoustic space, which does not approximate a true 3D or multi-dimensional presentation.
  • SUMMARY
  • Systems and methods for providing object-oriented audio are described. In certain embodiments, audio objects are created by associating sound sources with attributes of those sound sources, such as location, velocity, directivity, and the like. Audio objects can be used in place of or in addition to channels to distribute sound, for example, by streaming the audio objects over a network to a client device. The objects can define their locations in space with associated two or three dimensional coordinates. The objects can be adaptively streamed to the client device based on available network or client device resources. A renderer on the client device can use the attributes of the objects to determine how to render the objects. The renderer can further adapt the playback of the objects based on information about a rendering environment of the client device. Various examples of audio object creation techniques are also described.
  • For purposes of summarizing the disclosure, certain aspects, advantages and novel features of the inventions have been described herein. It is to be understood that not necessarily all such advantages can be achieved in accordance with any particular embodiment of the inventions disclosed herein. Thus, the inventions disclosed herein can be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other advantages as can be taught or suggested herein.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Throughout the drawings, reference numbers are re-used to indicate correspondence between referenced elements. The drawings are provided to illustrate embodiments of the inventions described herein and not to limit the scope thereof.
  • FIGS. 1A and 1B illustrate embodiments of object-oriented audio systems;
  • FIG. 2 illustrates another embodiment of an object-oriented audio system;
  • FIG. 3 illustrates an embodiment of a streaming module for use in any of the object-oriented audio systems described herein;
  • FIG. 4 illustrates an embodiment of an object-oriented audio streaming format;
  • FIG. 5A illustrates an embodiment of an audio stream assembly process;
  • FIG. 5B illustrates an embodiment of an audio stream rendering process;
  • FIG. 6 illustrates an embodiment of an adaptive audio object streaming system;
  • FIG. 7 illustrates an embodiment of an adaptive audio object streaming process;
  • FIG. 8 illustrates an embodiment of an adaptive audio object rendering process;
  • FIG. 9 illustrates an example scene for object-oriented audio capture;
  • FIG. 10 illustrates an embodiment of a system for object-oriented audio capture; and
  • FIG. 11 illustrates an embodiment of a process for object-oriented audio capture.
  • DETAILED DESCRIPTION
  • I. Introduction
  • In addition to the problems with existing systems described above, audio distribution systems do not adequately take into account the playback environment of the listener. Instead, audio systems are designed to deliver the specified number of channels to the final listening environment without any compensation for the environment, listener preferences, or the implementation of psychoacoustic principles. These functions and capabilities are traditionally left to the system integrator.
  • This disclosure describes systems and methods for streaming object-oriented audio that address at least some of these problems. In certain embodiments, audio objects are created by associating sound sources with attributes of those sound sources, such as location, velocity, directivity, and the like. Audio objects can be used in place of or in addition to channels to distribute sound, for example, by streaming the audio objects over a network to a client device. In certain embodiments, these objects are not related to channels or panned positions between channels, but rather define their locations in space with associated two or three dimensional coordinates. A renderer on the client device can use the attributes of the objects to determine how to render the objects.
  • The renderer can also account for the renderer's environment in certain embodiments by adapting the rendering and/or streaming based on available computing resources. Similarly, streaming of the audio objects can be adapted based on network conditions, such as available bandwidth. Various examples of audio object creation techniques are also described. Advantageously, the systems and methods described herein can reduce or overcome the drawbacks associated with the rigid audio channel distribution model.
  • By way of overview, FIGS. 1A and 1B introduce embodiments of object-oriented audio systems. Later Figures describe techniques that can be implemented by these object-oriented audio systems. For example, FIGS. 2 through 5B describe various example techniques for streaming object-oriented audio. FIGS. 6 through 8 describe example techniques for adaptively streaming and rendering object-oriented audio based on environment and network conditions. FIGS. 9 through 11 describe example audio object creation techniques.
  • As used herein, the term “streaming” and its derivatives, in addition to having their ordinary meaning, can mean distribution of content from one computing system (such as a server) to another computing system (such as a client). The term “streaming” and its derivatives can also refer to distributing content through peer-to-peer networks using any of a variety of protocols, including BitTorrent and related protocols.
  • II. Object-Oriented Audio System Overview
  • FIGS. 1A and 1B illustrate embodiments of object-oriented audio systems 100A, 100B. The object-oriented audio systems 100A, 100B can be implemented in computer hardware and/or software. Advantageously, in certain embodiments, the object-oriented audio systems 100A, 100B can enable content creators to create audio objects, stream such objects, and render the objects without being bound to the fixed channel model.
  • Referring specifically to FIG. 1A, the object-oriented audio system 100A includes an audio object creation system 110A, a streaming module 122A implemented in a content server 120A, and a renderer 142A implemented in a user system 140. The audio object creation system 110A can provide functionality for users to create and modify audio objects. The streaming module 122A, shown installed on a content server 120A, can be used to stream audio objects to a user system 140 over a network 130. The network 130 can include a LAN, a WAN, the Internet, or combinations of the same. The renderer 142A on the user system 140 can render the audio objects for output to one or more loudspeakers.
  • In the depicted embodiment, the audio object creation system 110A includes an object creation module 114 and an object-oriented encoder 112A. The object creation module 114 can provide functionality for creating objects, for example, by associating audio data with attributes of the audio data. Any type of audio can be used to generate an audio object. Some examples of audio that can be generated into objects and streamed can include audio associated with movies, television, movie trailers, music, music videos, other online videos, video games, and the like.
  • Initially, audio data can be recorded or otherwise obtained. The object creation module 114 can provide a user interface that enables a user to access, edit, or otherwise manipulate the audio data. The audio data can represent a sound source or a collection of sound sources. Some examples of sound sources include dialog, background music, and sounds generated by any item (such as a car, an airplane, or any prop). More generally, a sound source can be any audio clip.
  • Sound sources can have one or more attributes that the object creation module 114 can associate with the audio data to create an object. Examples of attributes include a location of the sound source, a velocity of a sound source, directivity of a sound source, and the like. Some attributes may be obtained directly from the audio data, such as a time attribute reflecting a time when the audio data was recorded. Other attributes can be supplied by a user to the object creation module 114, such as the type of sound source that generated the audio (e.g., a car versus an actor). Still other attributes can be automatically imported by the object creation module 114 from other devices. As an example, the location of a sound source can be retrieved from a Global Positioning System (GPS) device or the like and imported into the object creation module 114. Additional examples of attributes and techniques for identifying attributes are described in greater detail below. The object creation module 114 can store the audio objects in an object data repository 116, which can include a database or other data storage.
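  • As a minimal sketch of the association described above, the following Python pairs audio data with source attributes and stores the result. The names AudioObject, create_audio_object, and repository are hypothetical and are not taken from the disclosure; the attribute keys and values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AudioObject:
    """Hypothetical audio object: audio data plus attributes of its source."""
    samples: List[float]
    attributes: Dict[str, Any] = field(default_factory=dict)


def create_audio_object(samples: List[float], **attributes: Any) -> AudioObject:
    """Associate audio data with source attributes to form an object."""
    return AudioObject(samples=list(samples), attributes=dict(attributes))


# A toy stand-in for the object data repository 116, keyed by object name.
repository: Dict[str, AudioObject] = {}

car = create_audio_object(
    [0.0, 0.25, -0.25],
    location=(12.0, 0.0, -3.5),   # e.g. imported from a GPS device (assumed values)
    velocity=(8.0, 0.0, 0.0),     # assumed units and values
    source_type="car",            # supplied by the user
)
repository["car"] = car
```

  • Keeping the attributes in an open-ended mapping, rather than fixed fields, is one way to let new attribute types be added without changing the object structure.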
  • The object-oriented encoder 112A can encode one or more audio objects into an audio stream suitable for transmission over a network. In one embodiment, the object-oriented encoder 112A encodes the audio objects as uncompressed PCM (pulse code modulated) audio together with associated attribute metadata. In another embodiment, the object-oriented encoder 112A also applies compression to the objects when creating the stream.
  • Advantageously, in certain embodiments, the audio stream generated by the object-oriented encoder can include at least one object represented by a metadata header and an audio payload. The audio stream can be composed of frames, which can each include object metadata headers and audio payloads. Some objects may include metadata only and no audio payload. Other objects may include an audio payload but little or no metadata. Examples of such objects are described in detail below.
  • The audio object creation system 110A can supply the encoded audio objects to the content server 120A over a network (not shown). The content server 120A can host the encoded audio objects for later transmission. The content server 120A can include one or more machines, such as physical computing devices. The content server 120A can be accessible to user systems over the network 130. For instance, the content server 120A can be a web server, an edge node in a content delivery network (CDN), or the like.
  • The user system 140 can access the content server 120A to request audio content. In response to receiving such a request, the content server 120A can stream, upload, or otherwise transmit the audio content to the user system 140. Any form of computing device can access the audio content. For example, the user system 140 can be a desktop, laptop, tablet, personal digital assistant (PDA), television, wireless handheld device (such as a phone), or the like.
  • The renderer 142A on the user system 140 can decode the encoded audio objects and render the audio objects for output to one or more loudspeakers. The renderer 142A can include a variety of different rendering features, audio enhancements, psychoacoustic enhancements, and the like for rendering the audio objects. The renderer 142A can use the object attributes of the audio objects as cues on how to render the audio objects.
  • Referring to FIG. 1B, the object-oriented audio system 100B includes many of the features of the system 100A, such as an audio object creation system 110B, a content server 120B, and a user system 140. The functionality of the components shown can be the same as that described above, with certain differences noted herein. For instance, in the depicted embodiment, the content server 120B includes an adaptive streaming module 122B that can dynamically adapt the amount of object data streamed to the user system 140. Likewise, the user system 140 includes an adaptive renderer 142B that can adapt audio streaming and/or the way objects are rendered by the user system 140.
  • As can be seen from FIG. 1B, the object-oriented encoder 112B has been moved from the audio object creation system 110B to the content server 120B. In the depicted embodiment, the audio object creation system 110B uploads audio objects instead of audio streams to the content server 120B. An adaptive streaming module 122B on the content server 120B includes the object-oriented encoder 112B. Encoding of audio objects is therefore performed on the content server 120B in the depicted embodiment. Alternatively, the audio object creation system 110B can stream encoded objects to the adaptive streaming module 122B, which decodes the audio objects for further manipulation and later re-encoding.
  • By encoding objects on the content server 120B, the adaptive streaming module 122B can dynamically adapt the way objects are encoded prior to streaming. The adaptive streaming module 122B can monitor available network 130 resources, such as network bandwidth, latency, and so forth. Based on the available network resources, the adaptive streaming module 122B can encode more or fewer audio objects into the audio stream. For instance, as network resources become more available, the adaptive streaming module 122B can encode relatively more audio objects into the audio stream, and vice versa.
  • The adaptive streaming module 122B can also adjust the types of objects encoded into the audio stream, rather than (or in addition to) the number. For example, the adaptive streaming module 122B can encode higher priority objects (such as dialog) but not lower priority objects (such as certain background sounds) when network resources are constrained. The concept of adapting streaming based on object priority is described in greater detail below.
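  • One possible way to picture this adaptation is a greedy selection that admits objects in priority order until an estimated bandwidth budget is used up. The sketch below is an assumption for illustration only: the field names "priority" and "kbps", the budget figure, and the greedy policy itself are hypothetical and are not specified by the disclosure.

```python
from typing import Dict, List, Union

ObjectInfo = Dict[str, Union[str, int]]


def select_objects(objects: List[ObjectInfo], budget_kbps: int) -> List[str]:
    """Pick the names of the objects to encode, highest priority first,
    skipping any object whose bit rate no longer fits the remaining budget."""
    chosen: List[str] = []
    remaining = budget_kbps
    for obj in sorted(objects, key=lambda o: o["priority"], reverse=True):
        if obj["kbps"] <= remaining:
            chosen.append(obj["name"])
            remaining -= obj["kbps"]
    return chosen


scene = [
    {"name": "airplane", "priority": 1, "kbps": 96},
    {"name": "dialog", "priority": 10, "kbps": 64},
    {"name": "music", "priority": 5, "kbps": 128},
]

constrained = select_objects(scene, budget_kbps=100)   # only dialog fits
plentiful = select_objects(scene, budget_kbps=400)     # every object fits
```

  • As the budget grows, relatively more objects are admitted, and as it shrinks only the higher priority objects remain, which mirrors the behavior described above.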
  • The adaptive renderer 142B can also affect how audio objects are streamed to the user system 140. For example, the adaptive renderer 142B can communicate with the adaptive streaming module 122B to control the amount and/or type of audio objects streamed to the user system 140. The adaptive renderer 142B can also adjust the way audio streams are rendered based on the playback environment. For example, a large theater may specify the location and capabilities of many tens or hundreds of amplifiers and speakers while a self-contained TV may specify that only two amplifier channels and speakers are available. Based on this information, the systems 100A, 100B can optimize the acoustic field presentation. Many different types of rendering features in the systems 100A, 100B can be applied depending on the reproducing resources and environment, as the incoming audio stream can be descriptive and not dependent on the physical characteristics of the playback environment. These and other features of the adaptive renderer 142B are described in greater detail below.
  • In some embodiments, the adaptive features described herein can be implemented even if an object-oriented encoder (such as the encoder 112A) sends an encoded stream to the adaptive streaming module 122B. Instead of assembling a new audio stream on the fly, the adaptive streaming module 122B can remove objects from or otherwise filter the audio stream when computing resources or network resources become less available. For example, the adaptive streaming module 122B can remove packets from the stream corresponding to objects that are relatively less important to render. Techniques for assigning importance to objects for streaming and/or rendering are described in greater detail below.
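  • The filtering of an already encoded stream can be sketched as a pass over its packets that forwards only those belonging to sufficiently important objects, without re-encoding anything. The packet layout (an object identifier paired with opaque bytes), the importance table, and the threshold below are hypothetical assumptions made for illustration.

```python
from typing import Iterable, Iterator, Mapping, Tuple

Packet = Tuple[str, bytes]  # (object id, encoded bytes); assumed layout


def filter_stream(
    packets: Iterable[Packet],
    importance: Mapping[str, int],
    min_importance: int,
) -> Iterator[Packet]:
    """Yield packets unchanged, dropping those of less important objects."""
    for object_id, data in packets:
        # Objects with no known importance are treated as least important.
        if importance.get(object_id, 0) >= min_importance:
            yield object_id, data


stream = [("dialog", b"\x01"), ("crowd", b"\x02"), ("dialog", b"\x03")]
importance = {"dialog": 3, "crowd": 1}

# When resources become scarce, raise the threshold so the crowd is dropped.
reduced = list(filter_stream(stream, importance, min_importance=2))
```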
  • As can be seen from the above embodiments, the disclosed systems 100A, 100B for audio distribution and playback can encompass the entire chain from initial production of audio content to the perceptual system of the listener(s). The systems 100A, 100B can be scalable and future proof in that conceptual improvements in the transmission/storage or multi-dimensional rendering system can easily be incorporated. The systems 100A, 100B can also easily scale from large format theater based presentations to home theater configurations and self contained TV audio systems.
  • In contrast with existing physical channel based systems, the systems 100A, 100B can abstract the production of audio content to a series of audio objects that provide information about the structure of a scene as well as individual components within a scene. The information associated with each object can be used by the systems 100A, 100B to create the most accurate representation of the information provided, given the resources available. These resources can be specified as an additional input to the systems 100A, 100B.
  • In addition to using physical speakers and amplifiers, the systems 100A, 100B may also incorporate psychoacoustic processing to enhance listener immersion in the acoustic environment as well as to implement positioning of 3D objects that correspond accurately to their position in the visual field. This processing can also be defined to the systems 100A, 100B (e.g., to the renderer 142) as a resource available to enhance or otherwise optimize the presentation of the audio object information contained in the transmission stream.
  • The stream is designed to be extensible so that additional information could be added at any time. The renderer 142A, 142B could be generic or designed to support a particular environment and resource mix. Future improvements and new concepts in audio reproduction could be incorporated at will and the same descriptive information contained in the transmission/storage stream utilized with potentially more accurate rendering. The systems 100A, 100B are abstracted to the level that any future physical or conceptual improvements can easily be incorporated at any point within the systems 100A, 100B while maintaining compatibility with previous content and rendering systems. Unlike current systems, the systems 100A, 100B are flexible and adaptable.
  • For ease of illustration, this specification primarily describes object-oriented audio techniques in the context of streaming audio over a network. However, object-oriented audio techniques can also be implemented in non-network environments. For instance, an object-oriented audio stream can be stored on a computer-readable storage medium, such as a DVD, Blu-ray Disc, or the like. A media player (such as a Blu-ray player) can play back the object-oriented audio stream stored on the disc. An object-oriented audio package can also be downloaded to local storage on a user system and then played back from the local storage. Many other variations are possible.
  • It should be appreciated that the functionality of certain components described with respect to FIGS. 1A and 1B can be combined, modified, or omitted. For example, in one implementation, the audio object creation system 110 can be implemented on the content server 120. Audio streams could be streamed directly from the audio object creation system 110 to the user system 140. Many other configurations are possible.
  • III. Audio Object Streaming Embodiments
  • More detailed embodiments of audio object streams will now be described with respect to FIGS. 2 through 5B. Referring to FIG. 2, another embodiment of an object-oriented audio system 200 is shown. The system 200 can implement any of the features of the systems 100A, 100B described above. The system 200 can generate an object-oriented audio stream that can be decoded, rendered, and output by one or more speakers.
  • In the system 200, audio objects 202 are provided to an object-oriented encoder 212. The object-oriented encoder 212 can be implemented by an audio content creation system or a streaming module on a content server, as described above. The object-oriented encoder 212 can encode and/or compress the audio objects into a bit stream 214. The object-oriented encoder 212 can use any codec or compression technique to encode the objects, including compression techniques based on any of the Moving Picture Experts Group (MPEG) standards (e.g., to create MP3 files).
  • In certain embodiments, the object-oriented encoder 212 creates a single bit stream 214 having metadata headers and audio payloads for different audio objects. The object-oriented encoder 212 can transmit the bit stream 214 over a network (see, e.g., FIG. 1B). A decoder 220 implemented on a user system can receive the bit stream 214. The decoder 220 can decode the bit stream 214 into its constituent audio objects 202. The decoder 220 provides the audio objects 202 to a renderer 242. In some embodiments, the renderer 242 can directly implement the functionality of the decoder 220.
  • The renderer 242 can render the audio objects into audio signals 244 suitable for playback on one or more speakers 250. As described above, the renderer 142A can use the object attributes of the audio objects as cues on how to render the audio objects. Advantageously, in certain embodiments, because the audio objects include such attributes, the functionality of the renderer 142A can be changed without changing the format of the audio objects. For example, one type of renderer 142A might use a position attribute of an audio object to pan the audio from one speaker to another. A second renderer 142A might use the same position attribute to perform 3D psychoacoustic filtering on the audio object in response to determining that a psychoacoustic enhancement is available to the renderer 142A. In general, the renderer 142A can take into account some or all resources available to create the best possible presentation. As rendering technology improves, additional renderers 142A or rendering resources can be added to the user system 140 that take advantage of the preexisting format of the audio objects.
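  • To illustrate the first kind of renderer mentioned above, the sketch below maps a position attribute to left and right speaker gains using a constant-power pan law. The pan law, the assumed position range of -1 (full left) to +1 (full right), and the function names are illustrative assumptions; the disclosure does not prescribe a particular panning method, and a different renderer could feed the same attribute to a psychoacoustic filter instead.

```python
import math
from typing import List, Tuple


def pan_gains(x: float) -> Tuple[float, float]:
    """Return (left, right) gains for a position x in [-1, 1].

    Constant-power law: left**2 + right**2 stays equal to 1.
    """
    x = max(-1.0, min(1.0, x))            # clamp out-of-range positions
    angle = (x + 1.0) * math.pi / 4.0     # maps [-1, 1] to [0, pi/2]
    return math.cos(angle), math.sin(angle)


def render_stereo(samples: List[float], src_x: float) -> List[Tuple[float, float]]:
    """Render a mono object to stereo using only its position attribute."""
    left, right = pan_gains(src_x)
    return [(left * s, right * s) for s in samples]


center_left, center_right = pan_gains(0.0)   # equal gain in both speakers
hard_left = render_stereo([1.0, 0.5], -1.0)  # all signal in the left speaker
```

  • Because the object carries a position rather than per-speaker levels, the same object could be handed to a renderer with a different speaker layout without changing the object format.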
  • As described above, the object-oriented encoder 212 and/or the renderer 242 can also have adaptive features.
  • FIG. 3 illustrates an embodiment of a streaming module 322 for use with any of the object-oriented audio systems described herein. The streaming module 322 includes an object-oriented encoder 312. The streaming module 322 and encoder 312 can be implemented in hardware and/or software. The depicted embodiment illustrates how different types of audio objects can be encoded into a single bit stream 314.
  • The example streaming module 322 shown receives two different types of objects—static objects 302 and dynamic objects 304. Static objects 302 can represent channels of audio, such as 5.1 channel surround sound. Each channel can be represented as a static object 302. Some content creators may wish to use channels instead of or in addition to the object-based functionality of the systems 100A, 100B. Static objects 302 provide a way for these content creators to use channels, facilitating backwards compatibility with existing fixed channel systems and promoting ease of adoption.
  • Dynamic objects 304 can include any objects that can be used instead of or in addition to the static objects 302. Dynamic objects 304 can include enhancements that, when rendered together with static objects 302, enhance the audio associated with the static objects 302. For example, the dynamic objects 304 can include psychoacoustic information that a renderer can use to enhance the static objects 302. The dynamic objects 304 can also include background objects (such as a passing airplane) that a renderer can use to enhance an audio scene. Dynamic objects 304 need not be background objects, however. The dynamic objects 304 can include dialog or any other audio data.
  • The metadata associated with static objects 302 can be little or nonexistent. In one embodiment, this metadata simply includes the object attribute of “channel,” indicating to which channel the static objects 302 correspond. As this metadata does not change in some implementations, the static objects 302 are therefore static in their object attributes. In contrast, the dynamic objects 304 can include changing object attributes, such as changing position, velocity, and so forth. Thus, the metadata associated with these objects 304 can be dynamic. In some circumstances, however, the metadata associated with static objects 302 can change over time, while the metadata associated with dynamic objects 304 can stay the same.
  • Further, as mentioned above, some dynamic objects 304 can contain little or no audio payload. Environment objects 304, for example, can specify the desired characteristics of the acoustic environment in which a scene takes place. These dynamic objects 304 can include information on the type of building or outdoor area where the audio scene occurs, such as a room, office, cathedral, stadium, or the like. A renderer can use this information to adjust playback of the audio in the static objects 302, for example, by applying an appropriate amount of reverberation or delay corresponding to the indicated environment. Environmental dynamic objects 304 can also include an audio payload in some implementations. Some examples of environment objects are described below with respect to FIG. 4.
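  • A renderer's use of an environment object can be sketched as a lookup from the indicated environment to processing parameters, followed by a simple effect. In the Python below, the preset table, its delay and gain values, and the single-echo effect are all hypothetical stand-ins chosen for brevity; a practical renderer would likely use a proper reverberation algorithm.

```python
from typing import Dict, List, Tuple

# Assumed presets: environment name -> (echo delay in seconds, echo gain).
ENVIRONMENT_PRESETS: Dict[str, Tuple[float, float]] = {
    "room": (0.02, 0.2),
    "office": (0.03, 0.25),
    "cathedral": (0.12, 0.6),
    "stadium": (0.25, 0.4),
}


def apply_environment(
    samples: List[float], environment: str, sample_rate: int = 48000
) -> List[float]:
    """Add one delayed, attenuated copy of the signal for the environment.

    Unknown environments leave the audio unchanged.
    """
    delay_s, gain = ENVIRONMENT_PRESETS.get(environment, (0.0, 0.0))
    delay = int(round(delay_s * sample_rate))
    out = list(samples) + [0.0] * delay
    if delay:
        for i, s in enumerate(samples):
            out[i + delay] += gain * s
    return out


# A tiny sample rate keeps the example readable: 0.02 s is 2 samples here.
impulse = [1.0, 0.0, 0.0]
in_room = apply_environment(impulse, "room", sample_rate=100)
```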
  • Another type of object that can include metadata but little or no payload is an audio definition object. In one embodiment, a user system can include a library of audio clips or sounds that can be rendered by the renderer upon receipt of audio definition objects. An audio definition object can include a reference to an audio clip or sound stored on the user system, along with instructions for how long to play the clip, whether to loop the clip, and so forth. An audio stream can be constructed partly or even solely from audio definition objects, with some or all of the actual audio data being stored on the user system (or accessible from another server). In another embodiment, the streaming module 322 can send a plurality of audio definition objects to a user system, followed by a plurality of audio payload objects, separating the metadata and the actual audio. Many other configurations are possible.
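  • The following sketch shows how a renderer might resolve an audio definition object against a local clip library. The dictionary keys "clip_id", "duration_s", and "loop", and the in-memory library, are hypothetical names introduced only for this example.

```python
from typing import Dict, List


def render_definition(
    definition: Dict, library: Dict[str, List[float]], sample_rate: int = 48000
) -> List[float]:
    """Produce samples for a definition object that references a stored clip.

    The clip is truncated to the requested duration, or repeated to fill it
    when looping is requested.
    """
    clip = library[definition["clip_id"]]
    wanted = int(definition["duration_s"] * sample_rate)
    if definition.get("loop", False) and clip:
        repeats = -(-wanted // len(clip))   # ceiling division
        return (clip * repeats)[:wanted]
    return clip[:wanted]


# Clip stored on the user system; the stream carries only the reference.
library = {"door_knock": [1.0, 2.0, 3.0]}

looped = render_definition(
    {"clip_id": "door_knock", "duration_s": 7, "loop": True},
    library,
    sample_rate=1,   # one sample per second keeps the example small
)
```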
  • Content creators can declare static objects 302 or dynamic objects 304 using a descriptive computer language (using, e.g., the audio object creation system 110). When creating audio content to be later streamed, a content creator can declare a desired number of static objects 302. For example, a content creator can request that a dialog static object 302 (e.g., corresponding to a center channel) or any other number of static objects 302 be always on. This “always on” property can also make the static objects 302 static. In contrast, the dynamic objects 304 may come and go and not always be present in the audio stream. Of course, these features may be reversed. It may be desirable to gate or otherwise toggle static objects 302, for instance. When dialog is not present in a given static object 302, for example, not including that static object 302 in an audio stream can save computing and network resources.
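  • Gating a static object when it carries no dialog can be sketched as a check on the peak level of its audio for the current time slice. The "kind" and "samples" keys and the silence threshold below are assumptions made for illustration; the disclosure does not state how silence would be detected.

```python
from typing import Dict, List


def gate_static_objects(objects: List[Dict], threshold: float = 1e-4) -> List[Dict]:
    """Drop static objects whose audio in this time slice is effectively silent.

    Dynamic objects are passed through untouched.
    """
    kept = []
    for obj in objects:
        peak = max((abs(s) for s in obj["samples"]), default=0.0)
        if obj["kind"] == "static" and peak < threshold:
            continue   # omit the silent channel to save stream bandwidth
        kept.append(obj)
    return kept


time_slice = [
    {"name": "center", "kind": "static", "samples": [0.0, 0.0, 0.0]},
    {"name": "left", "kind": "static", "samples": [0.1, -0.2, 0.05]},
    {"name": "airplane", "kind": "dynamic", "samples": [0.0, 0.0, 0.0]},
]

names = [obj["name"] for obj in gate_static_objects(time_slice)]
```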
  • FIG. 4 illustrates an embodiment of an object-oriented audio streaming format 400. The audio streaming format includes a bit stream 414, which can correspond to any of the bit streams described above. The format 400 of the bit stream 414 is broken down into successively more detailed views (420, 430). The bit stream format 400 shown is merely an example embodiment and can be varied depending on the implementation.
  • In the depicted embodiment, the bit stream 414 includes a stream header 412 and macro frames 420. The stream header 412 can occur at the beginning or end of the bit stream 414. Some examples of information that can be included in the stream header 412 include an author of the stream, an origin of the stream, copyright information, a timestamp related to creation and/or delivery of the stream, length of the stream, information regarding which codec was used to encode the stream, and the like. The stream header 412 can be used by a decoder and/or renderer to properly decode the stream 414.
  • The macro frames 420 divide the bit stream 414 into sections of data. Each macro frame 420 can correspond to an audio scene or a time slice of audio. Each macro frame 420 further includes a macro frame header 422 and individual frames 430. The macro frame header 422 can define a number of audio objects included in the macro frame, a time stamp corresponding to the macro frame 420, and so on. In some implementations, the macro frame header 422 can be placed after the frames 430 in the macro frame 420. The individual frames 430 can each represent a single audio object. However, the frames 430 can also represent multiple audio objects in some implementations. In one embodiment, a renderer receives an entire macro frame 420 before rendering the audio objects associated with the macro frame 420.
  • Each frame 430 includes a frame header 432 containing object metadata and an audio payload 434. In some implementations, the frame header 432 can be placed after the audio payload 434. However, as discussed above, some audio objects may have either only metadata 432 or only an audio payload 434. Thus, some frames 430 may include a frame header 432 with little or no object metadata (or no header at all), and some frames 430 may include little or no audio payload 434.
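  • The nesting of frames inside a macro frame can be made concrete with a small serialization sketch. The byte layout below is purely an assumption for illustration: length-prefixed JSON metadata followed by the payload for each frame, and a macro frame header holding a time stamp and an object count. The disclosure does not define field sizes or encodings, and the function names are hypothetical.

```python
import json
import struct
from typing import Dict, List, Tuple

FRAME_PREFIX = struct.Struct(">II")   # metadata length, payload length
MACRO_PREFIX = struct.Struct(">dI")   # time stamp (seconds), object count


def pack_frame(metadata: Dict, payload: bytes = b"") -> bytes:
    """Serialize one object as header metadata followed by its audio payload."""
    meta = json.dumps(metadata, sort_keys=True).encode("utf-8") if metadata else b""
    return FRAME_PREFIX.pack(len(meta), len(payload)) + meta + payload


def unpack_frame(buf: bytes, offset: int = 0) -> Tuple[Dict, bytes, int]:
    """Return (metadata, payload, offset of the next frame)."""
    meta_len, payload_len = FRAME_PREFIX.unpack_from(buf, offset)
    start = offset + FRAME_PREFIX.size
    meta = json.loads(buf[start:start + meta_len].decode("utf-8")) if meta_len else {}
    end = start + meta_len + payload_len
    return meta, buf[start + meta_len:end], end


def pack_macro_frame(timestamp: float, frames: List[bytes]) -> bytes:
    """Prefix a group of frames with a macro frame header."""
    return MACRO_PREFIX.pack(timestamp, len(frames)) + b"".join(frames)


def unpack_macro_frame(buf: bytes) -> Tuple[float, List[Tuple[Dict, bytes]]]:
    """Read the header, then the number of frames it announces."""
    timestamp, count = MACRO_PREFIX.unpack_from(buf, 0)
    offset = MACRO_PREFIX.size
    objects = []
    for _ in range(count):
        meta, payload, offset = unpack_frame(buf, offset)
        objects.append((meta, payload))
    return timestamp, objects


# One object with metadata and audio, one metadata-only environment object.
with_audio = pack_frame({"SRC_X": 1.0}, b"\x01\x02")
metadata_only = pack_frame({"environment": "cathedral"})
macro = pack_macro_frame(1.5, [with_audio, metadata_only])
```

  • Because both lengths are stored explicitly, a frame with no metadata or no payload is simply one whose corresponding length is zero, which matches the metadata-only and payload-only objects described above.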
  • The object metadata in the frame header 432 can include information on object attributes. The following Tables illustrate examples of metadata that can be used to define object attributes. In particular, Table 1 illustrates various object attributes, organized by an attribute name and attribute description. Fewer or more than the attributes shown may be implemented in some designs.
  • TABLE 1
    Example Object Attributes
    ATTRIBUTE NAME | ATTRIBUTE DESCRIPTION
    ENABLE_PROCESS | Enable/Disable all processes, applies to all sources.
    ENABLE_3D_POSITION | Enable/Disable the 3D Position process.
    SRC_X | Modify the sound source's X axis position. This is relative to the listener and/or the camera.
    SRC_Y | Modify the sound source's Y axis position. This is relative to the listener and/or the camera.
    SRC_Z | Modify the sound source's Z axis position. This is relative to the listener and/or the camera.
    ENABLE_DOPPLER | Enable/Disable the Doppler process.
    DOPPLER_FACT | Permits scaling/exaggerating the Doppler pitch effect.
    SRC_VEL_X | Modify the sound source's velocity in the X axis direction.
    SRC_VEL_Y | Modify the sound source's velocity in the Y axis direction.
    SRC_VEL_Z | Modify the sound source's velocity in the Z axis direction.
    ENABLE_DISTANCE | Enable/Disable the Distance Attenuation process.
    MINIMUM_DIST | The distance from the listener at which distance attenuation begins to attenuate the signal.
    MAXIMUM_DIST | The distance from the listener at which distance attenuation no longer attenuates the signal.
    SILENCE_AFT_MAX | Silence the signal after reaching the maximum distance.
    ROLLOFF_FACT | The rate at which the source signal level decays as a function of distance from the listener.
    LISTENER_RELATIVE | Sets whether or not the source position is relative to listener, rather than absolute or to the camera.
    LISTENER_X | The position of the listener along the X-axis.
    LISTENER_Y | The position of the listener along the Y-axis.
    LISTENER_Z | The position of the listener along the Z-axis.
    LISTENER_VEL_X | The velocity of the listener along the X-axis.
    LISTENER_VEL_Y | The velocity of the listener along the Y-axis.
    LISTENER_VEL_Z | The velocity of the listener along the Z-axis.
    ENABLE_ORIENTATION | Enable/Disable the listener orientation manager (this applies to all sources).
    LISTENER_ABOVE_X | The X-axis orientation vector above the listener.
    LISTENER_ABOVE_Y | The Y-axis orientation vector above the listener.
    LISTENER_ABOVE_Z | The Z-axis orientation vector above the listener.
    LISTENER_FRONT_X | The X-axis orientation vector in front of the listener.
    LISTENER_FRONT_Y | The Y-axis orientation vector in front of the listener.
    LISTENER_FRONT_Z | The Z-axis orientation vector in front of the listener.
    ENABLE_MACROSCOPIC | Enables or disables use of the Macroscopic specification of an object.
    MACROSCOPIC_X | Specifies the x dimension size of sound emission.
    MACROSCOPIC_Y | Specifies the y dimension size of sound emission.
    MACROSCOPIC_Z | Specifies the z dimension size of sound emission.
    ENABLE_SRC_ORIENT | Enables or disables the use of orientation on a source.
    SRC_FRONT_X | The X-axis orientation vector in front of the sound object.
    SRC_FRONT_Y | The Y-axis orientation vector in front of the sound object.
    SRC_FRONT_Z | The Z-axis orientation vector in front of the sound object.
    SRC_ABOVE_X | The X-axis orientation vector above the sound object.
    SRC_ABOVE_Y | The Y-axis orientation vector above the sound object.
    SRC_ABOVE_Z | The Z-axis orientation vector above the sound object.
    ENABLE_DIRECTIVITY | Enables or disables the directivity process.
    DIRECTIVITY_MIN_ANGLE | Sets the minimum angle, normalized to 360°, for directivity attenuation. The angle is centered at about the source's front orientation creating a cone.
    DIRECTIVITY_MAX_ANGLE | Sets the maximum angle, normalized to 360°, for directivity attenuation.
    DIRECTIVITY_REAR_LEVEL | Attenuates the signal by the specified fractional amount of full-scale.
    ENABLE_OBSTRUCTION | Enables or disables the obstruction process.
    OBSTRUCT_PRESET | A preset HF Level/Level setting (see Table 2 below).
    REVERB_ENABLE_PROCSS | Enable/Disable the reverb process (affects all sources).
    REVERB_DECAY | Selects the time for the reverberant signal to decay by 60 dB (overall process).
    REVERB_MIX | Specifies the amount of original signal to processed signal to use.
    REVERB_PRESET | Selects a predefined reverb configuration based on an environment. This may modify the decay time when changed. Several predefined presets are available (see Table 3 below).
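  • As one illustration of how a renderer might consume the distance attributes of Table 1, the following sketch computes a gain from MINIMUM_DIST, MAXIMUM_DIST, ROLLOFF_FACT, and SILENCE_AFT_MAX. The disclosure does not specify an attenuation formula; the clamped inverse-distance model used here is an assumption chosen for the example, and the function name is hypothetical.

```python
def distance_gain(distance, minimum_dist, maximum_dist, rolloff_fact,
                  silence_aft_max=False):
    """Return a linear gain in [0, 1] for a source at the given distance.

    Assumed model (not from the disclosure): clamped inverse-distance
    attenuation. minimum_dist must be greater than zero.
    """
    if silence_aft_max and distance > maximum_dist:
        return 0.0  # SILENCE_AFT_MAX: mute beyond the maximum distance
    # No attenuation inside MINIMUM_DIST, no further attenuation past MAXIMUM_DIST.
    d = min(max(distance, minimum_dist), maximum_dist)
    return minimum_dist / (minimum_dist + rolloff_fact * (d - minimum_dist))
```

    With minimum_dist = 1, maximum_dist = 10, and rolloff_fact = 1, a source at distance 2 is rendered at half amplitude under this assumed model, and a source beyond distance 10 keeps the gain it had at distance 10 unless SILENCE_AFT_MAX is set.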
  • Example values for the OBSTRUCT_PRESET (obstruction preset) listed in Table 1 are shown below in Table 2. The obstruction preset value can affect a degree to which a sound source is occluded or blocked from the camera or listener's point of view. Thus, for example, a sound source emanating from behind a thick door can be rendered differently than a sound source emanating from behind a curtain. As discussed above, a renderer can perform any desired rendering technique (or none at all) based on the values of these and other object attributes.
  • TABLE 2
    Example Obstruction Presets
    Obstruction Preset | Type
    1 | Single Door
    2 | Double Door
    3 | Thin Door
    4 | Thick Door
    5 | Wood Wall
    6 | Brick Wall
    7 | Stone Wall
    8 | Curtain
  • Like the obstruction preset (sometimes referred to as occlusion), the REVERB_PRESET (reverberation preset) can include example values as shown in Table 3. These reverberation values correspond to types of environments in which a sound source may be located. Thus, a sound source emanating in an auditorium might be rendered differently than a sound source emanating in a living room. In one embodiment, an environment object includes a reverberation attribute that includes preset values such as those described below.
  • TABLE 3
    Example Reverberation Presets
    Reverb Preset | Type
    1 | Alley
    2 | Arena
    3 | Auditorium
    4 | Bathroom
    5 | Cave
    6 | Chamber
    7 | City
    8 | Concert Hall
    9 | Forest
    10 | Hallway
    11 | Hangar
    12 | Large Room
    13 | Living Room
    14 | Medium Room
    15 | Mountains
    16 | Parking Garage
    17 | Plate
    18 | Room
    19 | Under Water
  • In some embodiments, environment objects are not merely described using the reverberation presets described above. Instead, environment objects can be described with one or more attributes such as an amount of reverberation (that need not be a preset), an amount of echo, a degree of background noise, and so forth. Many other configurations are possible. Similarly, attributes of audio objects can generally have forms other than values. For example, an attribute can contain a snippet of code or instructions that define a behavior or characteristic of a sound source.
  • FIG. 5A illustrates an embodiment of an audio stream assembly process 500A. The audio stream assembly process 500A can be implemented by any of the systems described herein. For example, the stream assembly process 500A can be implemented by any of the object-oriented encoders or streaming modules described above. The stream assembly process 500A assembles an audio stream from at least one audio object.
  • At block 502, an audio object is selected to stream. The audio object may have been created by the audio object creation module 110 described above. As such, selecting the audio object can include accessing the audio object in the object data repository 116. Alternatively, the streaming module 122 can access the audio object from computer storage. For ease of illustration, this example figure describes streaming a single object, but it should be understood that multiple objects can be streamed in an audio stream. The object selected can be a static or dynamic object. In this particular example, the selected object has metadata and an audio payload.
  • An object header having metadata of the object is assembled at block 504. This metadata can include any description of object attributes, some examples of which are described above. At block 506, an audio payload having the audio signal data of the object is provided.
  • The object header and the audio payload are combined to form the audio stream at block 508. Forming the audio stream can include encoding the audio stream, compressing the audio stream, and the like. At block 510, the audio stream is transmitted over a network. While the audio stream can be streamed using any streaming technique, the audio stream can also be uploaded to a user system (or conversely, downloaded by the user system). Thereafter, the audio stream can be rendered by the user system, as described below with respect to FIG. 5B.
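  • Blocks 504 through 508 of the process 500A can be sketched as follows. The byte layout (two big-endian 32-bit lengths followed by a JSON header and a zlib-compressed payload) and the function name are assumptions made purely for illustration; the disclosure does not prescribe a serialization or codec. Transmission at block 510 is omitted.

```python
import json
import struct
import zlib


def assemble_frame(metadata: dict, audio: bytes) -> bytes:
    """Combine an object header and an audio payload into one stream frame."""
    # Block 504: assemble an object header from the object metadata.
    header = json.dumps(metadata, sort_keys=True).encode("utf-8")
    # Block 506: provide the audio payload (compressed here as an example).
    payload = zlib.compress(audio)
    # Block 508: combine header and payload. Hypothetical layout:
    # [header length][payload length][header bytes][payload bytes]
    return struct.pack(">II", len(header), len(payload)) + header + payload


frame = assemble_frame({"SRC_X": 1.5, "ENABLE_DOPPLER": 0}, b"\x00" * 64)
```

    Because the header length is stored explicitly, frames built this way can vary in length with the amount of object metadata they carry.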
  • FIG. 5B illustrates an embodiment of an audio stream rendering process 500B. The audio stream rendering process 500B can be implemented by any of the systems described herein. For example, the stream rendering process 500B can be implemented by any of the renderers described herein.
  • At block 522, an object-oriented audio stream is received. This audio stream may have been created using the techniques of the process 500A or with other techniques described above. Object metadata in the audio stream is accessed at block 524. This metadata may be obtained by decoding the stream using, for example, the same codec used to encode the stream.
  • One or more object attributes in the metadata are identified at block 526. Values of these object attributes can be identified by the renderer as cues for rendering the audio objects in the stream.
  • An audio signal in the audio stream is rendered at block 528. In the depicted embodiment, the audio stream is rendered according to the one or more object attributes to produce output audio. The output audio is supplied to one or more loudspeakers at block 530.
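  • Blocks 526 and 528 of the process 500B can be illustrated with a deliberately simple renderer for two loudspeakers. The sketch assumes that metadata has already been decoded into a dictionary, that SRC_X is expressed in the range -1 (left) to 1 (right), and that a constant-power pan is an acceptable rendering technique; none of these choices comes from the disclosure, which leaves the rendering technique to the renderer.

```python
import math


def render_stereo(metadata, samples):
    """Render mono samples to (left, right) pairs using the SRC_X attribute."""
    # Block 526: identify the attribute used as a rendering cue (default: centre).
    x = max(-1.0, min(1.0, metadata.get("SRC_X", 0.0)))
    # Constant-power pan: angle 0 is hard left, pi/2 is hard right.
    angle = (x + 1.0) * math.pi / 4.0
    left_gain, right_gain = math.cos(angle), math.sin(angle)
    # Block 528: render the audio signal according to the attribute.
    return [(s * left_gain, s * right_gain) for s in samples]


centre = render_stereo({"SRC_X": 0.0}, [1.0])
hard_right = render_stereo({"SRC_X": 1.0}, [1.0])
```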
  • IV. Adaptive Streaming and Rendering Embodiments
  • An adaptive streaming module 122B and adaptive renderer 142B were described above with respect to FIG. 1B. More detailed embodiments of an adaptive streaming module 622 and an adaptive renderer 642 are shown in the system 600 of FIG. 6.
  • In FIG. 6, the adaptive streaming module 622 has several components, including a priority module 624, a network resource monitor 626, an object-oriented encoder 612, and an audio communications module 628. The adaptive renderer 642 includes a computing resource monitor 644 and a rendering module 646. Some of the components shown may be omitted in different implementations. The object-oriented encoder 612 can include any of the encoding features described above. The audio communications module 628 can transmit the bit stream 614 to the adaptive renderer 642 over a network (not shown).
  • The priority module 624 can apply priority values or other priority information to audio objects. In one embodiment, each object can have a priority value, which may be a numeric value or the like. Priority values can indicate the relative importance of objects from a rendering standpoint. Objects with higher priority can be more important to render than objects of lower priority. Thus, if resources are constrained, objects with relatively lower priority can be ignored. Priority can initially be established by a content creator, using the audio object creation system 110 described above.
  • As an example, a dialog object that includes dialog for a video might have a relatively higher priority than a background sound object. If the priority values are on a scale from 1 to 5, for instance, the dialog object might have a priority value of 1 (meaning the highest priority), while a background sound object might have a lower priority (e.g., somewhere from 2 to 5). The priority module 624 can establish thresholds for transmitting objects that satisfy certain priority levels. For instance, the priority module 624 can establish a threshold of 3, such that objects having priority of 1, 2, and 3 are transmitted to a user system while objects with a priority of 4 or 5 are not.
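  • The threshold example above can be expressed in a few lines. The class and function names are hypothetical; the 1-to-5 scale, with 1 as the highest priority, follows the example in the text.

```python
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    priority: int  # 1 = highest priority, 5 = lowest, as in the example above


def select_for_transmission(objects, threshold):
    """Keep the objects whose priority value satisfies the threshold."""
    return [obj for obj in objects if obj.priority <= threshold]


scene = [
    AudioObject("dialog", 1),
    AudioObject("footsteps", 3),
    AudioObject("crowd murmur", 4),
    AudioObject("distant traffic", 5),
]
kept = select_for_transmission(scene, threshold=3)
```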
  • The priority module 624 can dynamically set this threshold based on changing network conditions, as determined by the network resource monitor 626. The network resource monitor 626 can monitor available network resources or other quality of service measures, such as bandwidth, latency, and so forth. The network resource monitor 626 can provide this information to the priority module 624. Using this information, the priority module 624 can adjust the threshold to allow lower priority objects to be transmitted to the user system if network resources are high. Similarly, the priority module 624 can adjust the threshold to prevent lower priority objects from being transmitted when network resources are low.
  • The priority module 624 can also adjust the priority threshold based on information received from the adaptive renderer 642. The computing resource monitor 644 of the adaptive renderer 642 can identify characteristics of the playback environment of a user system, such as the number of speakers connected to the user system, the processing capability of the user system, and so forth. The computing resource monitor 644 can communicate the computing resource information to the priority module 624 over a control channel 650. Based on this information, the priority module 624 can adjust the threshold to send both higher and lower priority objects if the computing resources are high and solely higher priority objects if the computing resources are low. The computing resource monitor 644 of the adaptive renderer 642 can therefore control the amount and/or type of audio objects that are streamed to the user system.
  • The adaptive renderer 642 can also adjust the way audio streams are rendered based on the playback environment. If the user system is connected to two speakers, for instance, the adaptive renderer 642 can render the audio objects on the two speakers. If additional speakers are connected to the user system, the adaptive renderer 642 can render the audio objects on the additional channels as well. The adaptive renderer 642 may also apply psychoacoustic techniques when rendering the audio objects on one or two (or sometimes more) speakers.
  • The priority module 624 can change the priority of audio objects dynamically. For instance, the priority module 624 can set objects to have relative priority to one another. A dialog object, for example, can be assigned a highest priority value by the priority module 624. Other objects' priority values can be relative to the priority of the dialog object. Thus, if the dialog object is not present for a period of time in the audio stream, the other objects can have relatively higher priority.
  • FIG. 7 illustrates an embodiment of an adaptive streaming process 700. The adaptive streaming process 700 can be implemented by any of the systems described above, such as the system 600. The adaptive streaming process 700 facilitates efficient use of streaming resources.
  • Blocks 702 through 708 can be performed by the priority module 624 described above. At block 702, a request is received from a remote computer for audio content. A user system can send the request to a content server, for instance. At block 704, computing resource information regarding resources of the remote computer system is received. This computing resource information can describe various available resources of the user system and can be provided together with the audio content request. Network resource information regarding available network resources is also received at block 706. This network resource information can be obtained by the network resource monitor 626.
  • A priority threshold is set at block 708 based at least partly on the computing and/or network resource information. In one embodiment, the priority module 624 establishes a lower threshold (e.g., to allow lower priority objects in the stream) when both the computing and network resources are relatively high. The priority module 624 can establish a higher threshold (e.g., to allow only higher priority objects in the stream) when either computing or network resources are relatively low.
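  • Block 708 can be sketched as a function of the resource information. The sketch expresses the threshold as a numeric cutoff on the 1-to-5 priority scale used earlier, so a more permissive threshold is a numerically larger cutoff. The bandwidth and speaker-count cutoffs, the two threshold values, and all names are hypothetical values chosen for the example.

```python
MIN_BANDWIDTH_KBPS = 1500   # hypothetical cutoff for "relatively high" network resources
MIN_SPEAKERS = 5            # hypothetical cutoff for "relatively high" computing resources


def set_priority_threshold(bandwidth_kbps, speaker_count):
    """Return the numeric priority cutoff (1 = highest priority, 5 = lowest)."""
    network_high = bandwidth_kbps >= MIN_BANDWIDTH_KBPS
    computing_high = speaker_count >= MIN_SPEAKERS
    if network_high and computing_high:
        return 5  # permissive: lower priority objects are allowed in the stream
    return 2      # restrictive: only higher priority objects are allowed
```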
  • Blocks 710 through 714 can be performed by the object-oriented encoder 612. At decision block 710, for a given object in the requested audio content, it is determined whether the priority value for that object satisfies the previously established threshold. If so, at block 712, the object is added to the audio stream. Otherwise, the object is not added to the audio stream, thereby advantageously saving network and/or computing resources in certain embodiments.
  • It is further determined at block 714 whether additional objects remain to be considered for adding to the stream. If so, the process 700 loops back to block 710. Otherwise, the audio stream is transmitted to the remote computing system at block 716, for example, by the audio communications module 628.
  • The process 700 can be modified in some implementations to remove objects from a pre-encoded audio stream instead of assembling an audio stream on the fly. For instance, in block 710, if a given object has a priority that does not satisfy a threshold, at block 712, the object can be removed from the audio stream. Thus, content creators can provide an audio stream to a content server with a variety of objects, and the adaptive streaming module at the content server can dynamically remove some of the objects based on the objects' priorities. Selecting audio objects for streaming can therefore include adding objects to a stream, removing objects from a stream, or both.
  • FIG. 8 illustrates an embodiment of an adaptive rendering process 800. The adaptive rendering process 800 can be implemented by any of the systems described above, such as the system 600. The adaptive rendering process 800 also facilitates efficient use of streaming resources.
  • At block 802, an audio stream having a plurality of audio objects is received by a renderer of a user system. For example, the adaptive renderer 642 can receive the audio objects. Playback environment information is accessed at block 804. The playback environment information can be accessed by the computing resource monitor 644 of the adaptive renderer 642. This resource information can include information on speaker configurations, computing power, and so forth.
  • Blocks 806 through 810 can be implemented by the rendering module 646 of the adaptive renderer 642. At block 806, one or more audio objects are selected based at least partly on the environment information. The rendering module 646 can use the priority values of the objects to select the objects to render. In another embodiment, the rendering module 646 does not select objects based on priority values, but instead down-mixes objects into fewer speaker channels or otherwise uses less processing resources to render the audio. The audio objects are rendered to produce output audio at block 808. The rendered audio is output to one or more speakers at block 810.
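  • One way block 806 might use both the playback environment and the priority values is to cap the number of rendered objects according to the speaker count and keep the highest priority objects. The per-speaker budget, the dictionary layout, and the function name are assumptions made for the example.

```python
def select_objects_to_render(objects, speaker_count, objects_per_speaker=2):
    """Pick the highest priority objects that fit a hypothetical rendering budget.

    objects: list of dicts with "name" and "priority" keys (1 = highest priority).
    """
    budget = speaker_count * objects_per_speaker
    # sorted() is stable, so objects of equal priority keep their stream order.
    ranked = sorted(objects, key=lambda obj: obj["priority"])
    return ranked[:budget]


stream = [
    {"name": "wind", "priority": 4},
    {"name": "dialog", "priority": 1},
    {"name": "music", "priority": 2},
    {"name": "birds", "priority": 5},
    {"name": "door", "priority": 3},
]
chosen = select_objects_to_render(stream, speaker_count=2, objects_per_speaker=1)
```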
  • V. Audio Object Creation Embodiments
  • FIGS. 9 through 11 describe example audio object creation techniques in the context of audio-visual reproductions, such as movies, television, podcasting, and the like. However, some or all of the features described with respect to FIGS. 9 through 11 can also be implemented in the pure audio context (e.g., without accompanying video).
  • FIG. 9 illustrates an example scene 900 for object-oriented audio capture. The scene 900 represents a simplified view of an audio-visual scene such as may be constructed for a movie, television, or other video. In the scene 900, two actors 910 are performing, and their sounds and actions are recorded by a microphone 920 and camera 930 respectively. For simplicity, a single microphone 920 is illustrated, although in some cases the actors 910 may wear individual microphones. Similarly, individual microphones can also be supplied for props (not shown).
  • In order to determine the location, velocity, and other attributes of the sound sources (e.g., the actors) in the present scene 900, location-tracking devices 912 are provided. These location-tracking devices 912 can include GPS devices, motion capture suits, laser range finders, and the like. Data from the location-tracking devices 912 can be transmitted to the audio object creation system 110 together with data from the microphone 920 (or microphones). Time stamps included in the data from the location-tracking devices 912 can be correlated with time stamps obtained from the microphone 920 and/or camera 930 so as to provide position data for each instance of audio. This position data can be used to create audio objects having a position attribute. Similarly, velocity data can be obtained from the location-tracking devices 912 or can be derived from the position data.
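  • The time stamp correlation and the derivation of velocity from position data can be sketched as follows. The nearest-sample matching rule, the finite-difference velocity estimate, and the function names are assumptions made for illustration; the disclosure does not specify how the correlation is performed.

```python
from bisect import bisect_left


def nearest_position(track, t):
    """Return the tracked position whose time stamp is closest to audio time t.

    track: list of (timestamp, (x, y, z)) tuples sorted by timestamp.
    """
    times = [ts for ts, _ in track]
    i = bisect_left(times, t)
    if i == 0:
        return track[0][1]
    if i == len(track):
        return track[-1][1]
    before, after = track[i - 1], track[i]
    # Ties go to the earlier sample.
    return after[1] if after[0] - t < t - before[0] else before[1]


def velocity(track, i):
    """Estimate velocity at sample i (i >= 1) by a backward finite difference."""
    (t0, p0), (t1, p1) = track[i - 1], track[i]
    dt = t1 - t0
    return tuple((b - a) / dt for a, b in zip(p0, p1))


track = [(0.0, (0.0, 0.0, 0.0)), (1.0, (2.0, 0.0, 0.0)), (2.0, (4.0, 0.0, 1.0))]
```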
  • The location data from the location-tracking devices 912 (such as GPS-derived latitude and longitude) can be used directly as the position data or can be translated to a coordinate system. For instance, Cartesian coordinates 940 in three dimensions (x, y, and z) can be used to track audio object position. Coordinate systems other than Cartesian coordinates may be used as well, such as spherical or cylindrical coordinates. The origin for the coordinate system 940 can be the camera 930 in one embodiment. To facilitate this arrangement, the camera 930 can also include a location-tracking device 912 so as to determine its location relative to the audio objects. Thus, even if the camera's 930 position changes, the position of the audio objects in the scene 900 can still be relative to the camera's 930 position.
  • Position data can also be applied to audio objects during post-production of an audio-visual production. For animation productions, the coordinates of animated objects (such as characters) can be known to the content creators. These coordinates can be automatically associated with the audio produced by each animated object to create audio objects.
  • FIG. 10 schematically illustrates a system 1000 for object-oriented audio capture that can implement the features described above with respect to FIG. 9. In the system 1000, sound source location data 1002 and microphone data 1006 are provided to an object creation module 1014. The object creation module 1014 can include all the features of the object creation modules 114A, 114B described above. The object creation module 1014 can correlate the sound source location data 1002 for a given sound source with the microphone data 1006 based on timestamps 1004, 1008, as described above with respect to FIG. 9.
  • Additionally, the object creation module 1014 includes an object linker 1020 that can link or otherwise associate objects together. Certain audio objects may be inherently related to one another and can therefore be automatically linked together by the object linker 1020. Linked objects can be rendered together in ways that will be described below.
  • Objects may be inherently related to each other because the objects are related to a same higher class of object. In other words, the object creation module 1014 can form hierarchies of objects that include parent objects and child objects that are related to and inherit properties of the parent objects. In this manner, audio objects can borrow certain object-oriented principles from computer programming languages. An example of a parent object that may have child objects is a marching band. A marching band can have several sections corresponding to different groups of instruments, such as trombones, flutes, clarinets, and so forth. A content creator using the object creation module 1014 can assign the band to be a parent object and each section to be a child object. Further, the content creator can also assign the individual band members to be child objects of the section objects. The complexity of the object hierarchy, including the number of levels in the hierarchy, can be established by the content creator.
  • As mentioned above, child objects can inherit properties of their parent objects. Thus, child objects can inherit some or all of the metadata of their parent objects. In some cases, child objects can also inherit some or all of the audio signal data associated with their parent objects. The child objects can modify some or all of this metadata and/or audio signal data. For example, a child object can modify a position attribute inherited from the parent so that the child and parent have differing positions but other similar metadata.
  • The child object's position can also be represented as an offset from the parent object's position or can otherwise be derived from the parent object's position. Referring to the marching band example, a section of the band can have a position that is offset from the band's position. As the band changes position, the child object representing the band section can automatically update its position based on the offset and the parent band's position. In this manner, different sections of the band having different position offsets can move together.
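  • The offset-based positioning of child objects can be sketched with a small class in which a child's position is computed from its parent's position each time it is read. The class and field names are hypothetical, and a root object is assumed to store its absolute position in the same field that a child uses for its offset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SceneObject:
    name: str
    offset: Vec3 = (0.0, 0.0, 0.0)          # absolute position when there is no parent
    parent: Optional["SceneObject"] = None

    @property
    def position(self) -> Vec3:
        if self.parent is None:
            return self.offset
        base = self.parent.position          # walks up the hierarchy recursively
        return tuple(b + o for b, o in zip(base, self.offset))


band = SceneObject("band", (10.0, 0.0, 0.0))
trombones = SceneObject("trombones", (-2.0, 0.0, 1.0), parent=band)
first_trombone = SceneObject("first trombone", (0.5, 0.0, 0.0), parent=trombones)
```

    Moving the band (for example, setting band.offset to a new value) moves every section and band member with it, because their positions are derived rather than stored.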
  • Inheritance between child and parent objects can result in common metadata between child and parent objects. This overlap in metadata can be exploited by any of the object-oriented encoders described above to optimize or reduce data in the audio stream. In one embodiment, an object-oriented encoder can remove redundant metadata from the child object, replacing the redundant metadata with a reference to the parent's metadata. Likewise, if redundant audio signal data is common to the child and parent objects, the object-oriented encoder can reduce or eliminate the redundant audio signal data. These techniques are merely examples of many optimization techniques that the object-oriented encoder can implement to reduce or eliminate redundant data in the audio stream.
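  • The replacement of redundant child metadata with a reference to the parent's metadata can be sketched as follows. The "parent_ref" and "overrides" keys and both function names are hypothetical, and the sketch assumes that a child carries every attribute of its parent (possibly with a different value), so that nothing needs to be marked as deleted.

```python
def encode_child(child_meta, parent_meta, parent_id):
    """Keep only the attributes that differ from the parent, plus a parent reference."""
    overrides = {
        key: value
        for key, value in child_meta.items()
        if key not in parent_meta or parent_meta[key] != value
    }
    return {"parent_ref": parent_id, "overrides": overrides}


def decode_child(encoded, parents):
    """Rebuild the full child metadata from the parent and the overrides."""
    meta = dict(parents[encoded["parent_ref"]])
    meta.update(encoded["overrides"])
    return meta


band_meta = {"SRC_X": 10.0, "SRC_Y": 0.0, "REVERB_PRESET": 2, "ENABLE_DOPPLER": 1}
trombone_meta = {"SRC_X": 8.0, "SRC_Y": 0.0, "REVERB_PRESET": 2, "ENABLE_DOPPLER": 1}
encoded = encode_child(trombone_meta, band_meta, parent_id="band")
```

    In this example only the differing SRC_X attribute remains in the encoded child, and a renderer holding the parent's metadata can reconstruct the rest.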
  • Moreover, the object linker 1020 of the object creation module 1014 can link child and parent objects together. The object linker 1020 can perform this linking by creating an association between the two objects, which may be reflected in the metadata of the two objects. The object linker 1020 can store this association in an object data repository 1016. Also, in some embodiments, content creators can manually link objects together, for example, even when the objects do not have parent-child relationships.
  • When a renderer receives two linked objects, the renderer can choose to render the two objects separately or together. Thus, instead of rendering a marching band as a single point source on one speaker, for instance, a renderer can render the marching band as a sound field of audio objects together on a variety of speakers. As the band moves in a video, for instance, the renderer can move the sound field across the speakers.
  • More generally, the renderer can interpret the linking information in a variety of ways. The renderer may, for instance, render linked objects on the same speaker at different times, delayed from one another, or on different speakers at the same time, or the like. The renderer may also render the linked objects at different points in space determined psychoacoustically, so as to provide the impression to the listener that the linked objects are at different points around the listener's head. Thus, for example, a renderer can cause the trombone section to appear to be marching to the left of a listener while the clarinet section is marching to the right of the listener.
  • FIG. 11 illustrates an embodiment of a process 1100 for object-oriented audio capture. The process 1100 can be implemented by any of the systems described herein, such as the system 1000. For example, the process 1100 can be implemented by the object linker 1020 of the object creation module 1014.
  • At block 1102, audio and location data are received for first and second sound sources. The audio data can be obtained using a microphone, while the location data can be obtained using any of the techniques described above with respect to FIG. 9.
  • A first audio object is created for the first sound source at block 1104. Similarly, a second audio object is created for the second sound source at block 1106. An association is created between the first and second sound sources at block 1108. This association can be created automatically by the object linker 1020 based on whether the two objects are related in an object hierarchy. Further, the object linker 1020 can create the association automatically based on other metadata associated with the objects, such as any two similar attributes. The association is stored in computer storage at block 1110.
  • VI. Terminology
  • Depending on the embodiment, certain acts, events, or functions of any of the algorithms described herein can be performed in a different sequence, can be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the algorithm). Moreover, in certain embodiments, acts or events can be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors or processor cores or on other parallel architectures, rather than sequentially.
  • The various illustrative logical blocks, modules, and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. The described functionality can be implemented in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosure.
  • The various illustrative logical blocks and modules described in connection with the embodiments disclosed herein can be implemented or performed by a machine, such as a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor can be a microprocessor, but in the alternative, the processor can be a controller, microcontroller, or state machine, combinations of the same, or the like. A processor can also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
  • The steps of a method, process, or algorithm described in connection with the embodiments disclosed herein can be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of computer-readable storage medium known in the art. An exemplary storage medium can be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor. The processor and the storage medium can reside in an ASIC. The ASIC can reside in a user terminal. In the alternative, the processor and the storage medium can reside as discrete components in a user terminal.
  • Conditional language used herein, such as, among others, “can,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or states. Thus, such conditional language is not generally intended to imply that features, elements and/or states are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or states are included or are to be performed in any particular embodiment.
  • While the above detailed description has shown, described, and pointed out novel features as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the devices or algorithms illustrated can be made without departing from the spirit of the disclosure. As will be recognized, certain embodiments of the inventions described herein can be embodied within a form that does not provide all of the features and benefits set forth herein, as some features can be used or practiced separately from others. The scope of certain inventions disclosed herein is indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (17)

1. A method of generating an object-oriented audio stream, the method comprising:
selecting an audio object for transmission in an audio stream, the audio object comprising audio signal data and object metadata, the object metadata comprising one or more object attributes;
assembling an object header comprising the object metadata;
providing an audio payload comprising the audio signal data;
combining, with one or more processors, the object header and the audio payload to form at least a portion of the audio stream; and
transmitting the audio stream over a network.
2. The method of claim 1, wherein said transmitting comprises transmitting the audio stream as a single stream over the network.
3. The method of claim 1, wherein the one or more object attributes comprise at least one or more of the following: location of the audio object, velocity of the audio object, occlusion of the audio object, and an environment associated with the audio object.
4. The method of claim 1, wherein said combining comprises forming the audio stream from a plurality of variable-length frames, wherein a length of each frame depends at least partly on an amount of the object metadata associated with each frame.
5. The method of claim 1, further comprising compressing the audio stream prior to transmitting the audio stream over the network.
6. The method of claim 1, wherein the audio object comprises a static object.
7. The method of claim 6, wherein the static object represents a channel of audio.
8. The method of claim 6, further comprising placing a dynamic audio object in the audio stream, the dynamic audio object comprising enhancement data configured to enhance the static object.
9. The method of claim 1, further comprising reducing redundant object metadata in the audio stream.
10. A system for generating an object-oriented audio stream, the system comprising:
an object-oriented streaming module implemented in one or more processors, the object-oriented streaming module configured to:
select an audio object representative of a sound source, the audio object comprising audio signal data and object metadata, the object metadata comprising one or more attributes of the sound source;
encode the object metadata together with the audio signal data to form at least a portion of a single object-oriented audio stream; and
transmit the object-oriented audio stream over a network.
11. The system of claim 10, wherein the object-oriented streaming module is further configured to insert a second audio object into the object-oriented audio stream, the second audio object comprising solely second object metadata without an audio payload.
12. The system of claim 11, wherein the second object metadata of the second audio object comprises environmental definition data.
13. The system of claim 10, wherein the object-oriented streaming module is further configured to encode the object metadata together with the audio signal data by at least compressing one or both of the object metadata and the audio signal data.
14. The system of claim 10, wherein the one or more attributes of the sound source comprise a location of the sound source.
15. The system of claim 14, wherein the location of the sound source is determined with respect to a camera view of video associated with the audio object.
16. The system of claim 10, wherein the one or more attributes of the sound source comprise two or more of the following:
a location of the sound source represented by the audio object;
a velocity of the sound source;
directivity of the sound source;
occlusion of the sound source; and
an environment associated with the sound source.
17. The system of claim 10, wherein the object-oriented streaming module is further configured to reduce redundant object metadata in the audio stream.
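The framing recited in claims 1, 4, and 9 can be illustrated with a minimal sketch. It is not the implementation disclosed in this application: the byte layout (two length fields followed by a JSON-encoded header), the names AudioObject, build_frame, parse_frame, and build_stream, and the "send only changed attributes" rule for reducing redundant metadata are all assumptions chosen for illustration. Network transmission is left out.

```python
import json
import struct
from dataclasses import dataclass, field


@dataclass
class AudioObject:
    """Hypothetical container: audio signal data plus object metadata."""
    audio: bytes = b""  # empty for a metadata-only object
    metadata: dict = field(default_factory=dict)


def build_frame(obj: AudioObject) -> bytes:
    """Combine an object header (the metadata) with an audio payload."""
    header = json.dumps(obj.metadata, sort_keys=True).encode("utf-8")
    # Assumed layout: two big-endian uint32 lengths, then header, then payload.
    # The frame length therefore varies with the amount of metadata.
    return struct.pack(">II", len(header), len(obj.audio)) + header + obj.audio


def parse_frame(buf: bytes, offset: int = 0):
    """Read one frame at `offset`; return the object and the next offset."""
    header_len, payload_len = struct.unpack_from(">II", buf, offset)
    start = offset + 8
    metadata = json.loads(buf[start:start + header_len].decode("utf-8"))
    end = start + header_len + payload_len
    return AudioObject(buf[start + header_len:end], metadata), end


def build_stream(objects) -> bytes:
    """Concatenate frames for successive blocks of a single sound source,
    writing only the attributes that changed since the previous frame
    (one simple, assumed way to reduce redundant metadata)."""
    previous = {}
    frames = []
    for obj in objects:
        delta = {k: v for k, v in obj.metadata.items()
                 if k not in previous or previous[k] != v}
        frames.append(build_frame(AudioObject(obj.audio, delta)))
        previous.update(obj.metadata)
    return b"".join(frames)
```

In this sketch a receiver would walk the stream with parse_frame and merge each header into the attributes it already holds, so an unchanged attribute such as velocity is carried once rather than in every frame.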
US12/856,442 2009-08-14 2010-08-13 Object-oriented audio streaming system Active US8396575B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US12/856,442 US8396575B2 (en) 2009-08-14 2010-08-13 Object-oriented audio streaming system
US13/791,488 US9167346B2 (en) 2009-08-14 2013-03-08 Object-oriented audio streaming system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US23393109P 2009-08-14 2009-08-14
US12/856,442 US8396575B2 (en) 2009-08-14 2010-08-13 Object-oriented audio streaming system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/791,488 Continuation US9167346B2 (en) 2009-08-14 2013-03-08 Object-oriented audio streaming system

Publications (2)

Publication Number Publication Date
US20110040395A1 (en) 2011-02-17
US8396575B2 (en) 2013-03-12

Family

ID=43586534

Family Applications (4)

Application Number Title Priority Date Filing Date
US12/856,449 Active US8396576B2 (en) 2009-08-14 2010-08-13 System for adaptively streaming audio objects
US12/856,450 Active US8396577B2 (en) 2009-08-14 2010-08-13 System for creating audio objects for streaming
US12/856,442 Active US8396575B2 (en) 2009-08-14 2010-08-13 Object-oriented audio streaming system
US13/791,488 Active 2031-07-02 US9167346B2 (en) 2009-08-14 2013-03-08 Object-oriented audio streaming system

Country Status (8)

Country Link
US (4) US8396576B2 (en)
EP (3) EP2465259A4 (en)
JP (2) JP5726874B2 (en)
KR (3) KR20120062758A (en)
CN (2) CN102549655B (en)
ES (1) ES2793958T3 (en)
PL (1) PL2465114T3 (en)
WO (2) WO2011020065A1 (en)

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110040396A1 (en) * 2009-08-14 2011-02-17 Srs Labs, Inc. System for adaptively streaming audio objects
WO2012054750A1 (en) 2010-10-20 2012-04-26 Srs Labs, Inc. Stereo image widening system
WO2012122397A1 (en) * 2011-03-09 2012-09-13 Srs Labs, Inc. System for dynamically creating and rendering audio objects
WO2013032822A2 (en) 2011-08-26 2013-03-07 Dts Llc Audio adjustment system
KR20130127344A (en) * 2012-05-14 2013-11-22 한국전자통신연구원 Method and apparatus for providing audio data, method and apparatus for providing audio metadata, method and apparatus for playing audio data
WO2014025752A1 (en) * 2012-08-07 2014-02-13 Dolby Laboratories Licensing Corporation Encoding and rendering of object based audio indicative of game audio content
CN103650539A (en) * 2011-07-01 2014-03-19 杜比实验室特许公司 System and method for adaptive audio signal generation, coding and rendering
US20140112480A1 (en) * 2011-06-15 2014-04-24 Dolby Laboratories Licensing Corporation Method for capturing and playback of sound originating from a plurality of sound sources
US20140126758A1 (en) * 2011-06-24 2014-05-08 Bright Minds Holding B.V. Method and device for processing sound data
US20150025664A1 (en) * 2013-07-22 2015-01-22 Dolby Laboratories Licensing Corporation Interactive Audio Content Generation, Delivery, Playback and Sharing
US20150221319A1 (en) * 2012-09-21 2015-08-06 Dolby International Ab Methods and systems for selecting layers of encoded audio signals for teleconferencing
US20150244869A1 (en) * 2012-09-27 2015-08-27 Dolby Laboratories Licensing Corporation Spatial Multiplexing in a Soundfield Teleconferencing System
US9190065B2 (en) 2012-07-15 2015-11-17 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for three-dimensional audio coding using basis function coefficients
US20160029138A1 (en) * 2013-04-03 2016-01-28 Dolby Laboratories Licensing Corporation Methods and Systems for Interactive Rendering of Object Based Audio
US9258664B2 (en) 2013-05-23 2016-02-09 Comhear, Inc. Headphone audio enhancement system
CN105578380A (en) * 2011-07-01 2016-05-11 杜比实验室特许公司 System and Method for Adaptive Audio Signal Generation, Coding and Rendering
US9367283B2 (en) * 2014-07-22 2016-06-14 Sonos, Inc. Audio settings
US20160300577A1 (en) * 2015-04-08 2016-10-13 Dolby International Ab Rendering of Audio Content
US9479886B2 (en) 2012-07-20 2016-10-25 Qualcomm Incorporated Scalable downmix design with feedback for object-based surround codec
US9489954B2 (en) 2012-08-07 2016-11-08 Dolby Laboratories Licensing Corporation Encoding and rendering of object based audio indicative of game audio content
US20160357501A1 (en) * 2015-06-03 2016-12-08 Skullcandy, Inc. Audio devices and related methods for acquiring audio device use information
US20170013387A1 (en) * 2014-04-02 2017-01-12 Dolby International Ab Exploiting metadata redundancy in immersive audio metadata
US9558785B2 (en) 2013-04-05 2017-01-31 Dts, Inc. Layered audio coding and transmission
US9622014B2 (en) 2012-06-19 2017-04-11 Dolby Laboratories Licensing Corporation Rendering and playback of spatial audio using channel-based audio systems
US20170223476A1 (en) * 2013-07-31 2017-08-03 Dolby International Ab Processing Spatially Diffuse or Large Audio Objects
US9761229B2 (en) 2012-07-20 2017-09-12 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for audio object clustering
WO2017173155A1 (en) * 2016-03-30 2017-10-05 Microsoft Technology Licensing, Llc Spatial audio resource management and mixing for applications
TWI607654B (en) * 2011-07-01 2017-12-01 杜比實驗室特許公司 Apparatus, method and non-transitory medium for enhanced 3d audio authoring and rendering
CN107454511A (en) * 2012-08-31 2017-12-08 杜比实验室特许公司 Loudspeakers for reflecting sound from viewing screens or display surfaces
US9886234B2 (en) 2016-01-28 2018-02-06 Sonos, Inc. Systems and methods of distributing audio to one or more playback devices
US20180167755A1 (en) * 2016-12-14 2018-06-14 Nokia Technologies Oy Distributed Audio Mixing
US20180275955A1 (en) * 2015-12-01 2018-09-27 Fraunhofer-Gesellschaft Zur Foerderung De Angewandten Forschung E.V. System for outputting audio signals and respective method and setting device
US20180332424A1 (en) * 2017-05-12 2018-11-15 Microsoft Technology Licensing, Llc Spatializing audio data based on analysis of incoming audio data
US10136240B2 (en) * 2015-04-20 2018-11-20 Dolby Laboratories Licensing Corporation Processing audio data to compensate for partial hearing loss or an adverse hearing environment
US10171971B2 (en) 2015-12-21 2019-01-01 Skullcandy, Inc. Electrical systems and related methods for providing smart mobile electronic device features to a user of a wearable device
US20190191258A1 (en) * 2015-02-06 2019-06-20 Dolby Laboratories Licensing Corporation Methods and systems for rendering audio based on priority
US10599382B2 (en) * 2013-11-05 2020-03-24 Sony Corporation Information processing device and information processing method for indicating a position outside a display region
EP3566456A4 (en) * 2017-01-06 2020-08-19 Nokia Technologies Oy Discovery, announcement and assignment of position tracks
US10848894B2 (en) * 2018-04-09 2020-11-24 Nokia Technologies Oy Controlling audio in multi-viewpoint omnidirectional content
US11178503B2 (en) * 2012-08-31 2021-11-16 Dolby Laboratories Licensing Corporation System for rendering and playback of object based audio in various listening environments
US11386913B2 (en) * 2017-08-01 2022-07-12 Dolby Laboratories Licensing Corporation Audio object classification based on location metadata
US11528576B2 (en) 2016-12-05 2022-12-13 Magic Leap, Inc. Distributed audio capturing techniques for virtual reality (VR), augmented reality (AR), and mixed reality (MR) systems

Families Citing this family (126)

Publication number Priority date Publication date Assignee Title
US10296561B2 (en) 2006-11-16 2019-05-21 James Andrews Apparatus, method and graphical user interface for providing a sound link for combining, publishing and accessing websites and audio files on the internet
US9361295B1 (en) 2006-11-16 2016-06-07 Christopher C. Andrews Apparatus, method and graphical user interface for providing a sound link for combining, publishing and accessing websites and audio files on the internet
US20120244863A1 (en) * 2011-03-23 2012-09-27 Opanga Networks Inc. System and method for dynamic service offering based on available resources
US20120253493A1 (en) 2011-04-04 2012-10-04 Andrews Christopher C Automatic audio recording and publishing system
WO2012145709A2 (en) * 2011-04-20 2012-10-26 Aurenta Inc. A method for encoding multiple microphone signals into a source-separable audio signal for network transmission and an apparatus for directed source separation
US9084068B2 (en) * 2011-05-30 2015-07-14 Sony Corporation Sensor-based placement of sound in video recording
US20130007218A1 (en) * 2011-06-28 2013-01-03 Cisco Technology, Inc. Network Assisted Tracker for Better P2P Traffic Management
RU2564681C2 (en) * 2011-07-01 2015-10-10 Долби Лабораторис Лайсэнзин Корпорейшн Methods and systems of synchronisation and changeover for adaptive sound system
US9247182B2 (en) 2011-10-10 2016-01-26 Eyeview, Inc. Using cluster computing for generating personalized dynamic videos
US8832226B2 (en) * 2011-10-10 2014-09-09 Eyeview, Inc. Using cloud computing for generating personalized dynamic and broadcast quality videos
US9654821B2 (en) 2011-12-30 2017-05-16 Sonos, Inc. Systems and methods for networked music playback
US8856272B2 (en) * 2012-01-08 2014-10-07 Harman International Industries, Incorporated Cloud hosted audio rendering based upon device and environment profiles
US9578438B2 (en) 2012-03-30 2017-02-21 Barco Nv Apparatus and method for driving loudspeakers of a sound system in a vehicle
KR101915258B1 (en) * 2012-04-13 2018-11-05 한국전자통신연구원 Apparatus and method for providing the audio metadata, apparatus and method for providing the audio data, apparatus and method for playing the audio data
US9674587B2 (en) 2012-06-26 2017-06-06 Sonos, Inc. Systems and methods for networked music playback including remote add to queue
WO2014021588A1 (en) 2012-07-31 2014-02-06 인텔렉추얼디스커버리 주식회사 Method and device for processing audio signal
EP2717262A1 (en) * 2012-10-05 2014-04-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Encoder, decoder and methods for signal-dependent zoom-transform in spatial audio object coding
KR20140046980A (en) * 2012-10-11 2014-04-21 한국전자통신연구원 Apparatus and method for generating audio data, apparatus and method for playing audio data
KR20140047509A (en) 2012-10-12 2014-04-22 한국전자통신연구원 Audio coding/decoding apparatus using reverberation signal of object audio signal
WO2014058138A1 (en) * 2012-10-12 2014-04-17 한국전자통신연구원 Audio encoding/decoding device using reverberation signal of object audio signal
KR20230011500A (en) * 2013-01-21 2023-01-20 돌비 레버러토리즈 라이쎈싱 코오포레이션 Decoding of encoded audio bitstream with metadata container located in reserved data space
EP2757559A1 (en) * 2013-01-22 2014-07-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for spatial audio object coding employing hidden objects for signal mixture manipulation
US9191742B1 (en) * 2013-01-29 2015-11-17 Rawles Llc Enhancing audio at a network-accessible computing platform
US9357215B2 (en) 2013-02-12 2016-05-31 Michael Boden Audio output distribution
US10038957B2 (en) * 2013-03-19 2018-07-31 Nokia Technologies Oy Audio mixing based upon playing device location
WO2014159898A1 (en) 2013-03-29 2014-10-02 Dolby Laboratories Licensing Corporation Methods and apparatuses for generating and using low-resolution preview tracks with high-quality encoded object and multichannel audio signals
US20160066118A1 (en) * 2013-04-15 2016-03-03 Intellectual Discovery Co., Ltd. Audio signal processing method using generating virtual object
US9501533B2 (en) 2013-04-16 2016-11-22 Sonos, Inc. Private queue for a media playback system
US9247363B2 (en) 2013-04-16 2016-01-26 Sonos, Inc. Playback queue transfer in a media playback system
US9361371B2 (en) 2013-04-16 2016-06-07 Sonos, Inc. Playlist update in a media playback system
WO2014184618A1 (en) 2013-05-17 2014-11-20 Nokia Corporation Spatial object oriented audio apparatus
US9666198B2 (en) 2013-05-24 2017-05-30 Dolby International Ab Reconstruction of audio scenes from a downmix
CN105229732B (en) 2013-05-24 2018-09-04 杜比国际公司 The high efficient coding of audio scene including audio object
WO2014187991A1 (en) 2013-05-24 2014-11-27 Dolby International Ab Efficient coding of audio scenes comprising audio objects
CN117012210A (en) 2013-05-24 2023-11-07 杜比国际公司 Method, apparatus and computer readable medium for decoding audio scene
CN104240711B (en) * 2013-06-18 2019-10-11 杜比实验室特许公司 For generating the mthods, systems and devices of adaptive audio content
GB2516056B (en) 2013-07-09 2021-06-30 Nokia Technologies Oy Audio processing apparatus
EP2830045A1 (en) 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Concept for audio encoding and decoding for audio channels and audio objects
EP2830049A1 (en) 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for efficient object metadata coding
EP2830050A1 (en) 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for enhanced spatial audio object coding
WO2015056383A1 (en) 2013-10-17 2015-04-23 パナソニック株式会社 Audio encoding device and audio decoding device
CN108712711B (en) 2013-10-31 2021-06-15 杜比实验室特许公司 Binaural rendering of headphones using metadata processing
US9596280B2 (en) 2013-11-11 2017-03-14 Amazon Technologies, Inc. Multiple stream content presentation
US9805479B2 (en) 2013-11-11 2017-10-31 Amazon Technologies, Inc. Session idle optimization for streaming server
US9641592B2 (en) 2013-11-11 2017-05-02 Amazon Technologies, Inc. Location of actor resources
US9634942B2 (en) 2013-11-11 2017-04-25 Amazon Technologies, Inc. Adaptive scene complexity based on service quality
US9604139B2 (en) 2013-11-11 2017-03-28 Amazon Technologies, Inc. Service for generating graphics object data
US9582904B2 (en) 2013-11-11 2017-02-28 Amazon Technologies, Inc. Image composition based on remote object data
WO2015080967A1 (en) 2013-11-28 2015-06-04 Dolby Laboratories Licensing Corporation Position-based gain adjustment of object-based audio and ring-based channel audio
CN104882145B (en) * 2014-02-28 2019-10-29 杜比实验室特许公司 It is clustered using the audio object of the time change of audio object
US9564136B2 (en) * 2014-03-06 2017-02-07 Dts, Inc. Post-encoding bitrate reduction of multiple object audio
JP6439296B2 (en) * 2014-03-24 2018-12-19 ソニー株式会社 Decoding apparatus and method, and program
JP6863359B2 (en) * 2014-03-24 2021-04-21 ソニーグループ株式会社 Decoding device and method, and program
EP2928216A1 (en) 2014-03-26 2015-10-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for screen related audio object remapping
WO2015150384A1 (en) 2014-04-01 2015-10-08 Dolby International Ab Efficient coding of audio scenes comprising audio objects
WO2015152661A1 (en) * 2014-04-02 2015-10-08 삼성전자 주식회사 Method and apparatus for rendering audio object
US9959876B2 (en) * 2014-05-16 2018-05-01 Qualcomm Incorporated Closed loop quantization of higher order ambisonic coefficients
JP6432180B2 (en) * 2014-06-26 2018-12-05 ソニー株式会社 Decoding apparatus and method, and program
EP3171601A4 (en) 2014-07-14 2018-05-16 SK TechX Co., Ltd. Cloud streaming service system, data compressing method for preventing memory bottlenecking, and device for same
KR102199276B1 (en) 2014-08-20 2021-01-06 에스케이플래닛 주식회사 System for cloud streaming service, method for processing service based on type of cloud streaming service and apparatus for the same
EP3002960A1 (en) * 2014-10-04 2016-04-06 Patents Factory Ltd. Sp. z o.o. System and method for generating surround sound
CN105895086B (en) * 2014-12-11 2021-01-12 杜比实验室特许公司 Metadata-preserving audio object clustering
EP3254435B1 (en) 2015-02-03 2020-08-26 Dolby Laboratories Licensing Corporation Post-conference playback system having higher perceived quality than originally heard in the conference
WO2016126819A1 (en) 2015-02-03 2016-08-11 Dolby Laboratories Licensing Corporation Optimized virtual scene layout for spatial meeting playback
EP3254477A1 (en) 2015-02-03 2017-12-13 Dolby Laboratories Licensing Corporation Adaptive audio construction
US9560393B2 (en) * 2015-02-20 2017-01-31 Disney Enterprises, Inc. Media processing node
CN105989845B (en) * 2015-02-25 2020-12-08 杜比实验室特许公司 Video content assisted audio object extraction
WO2016148553A2 (en) * 2015-03-19 2016-09-22 (주)소닉티어랩 Method and device for editing and providing three-dimensional sound
WO2016148552A2 (en) * 2015-03-19 2016-09-22 (주)소닉티어랩 Device and method for reproducing three-dimensional sound image in sound image externalization
US20160315722A1 (en) * 2015-04-22 2016-10-27 Apple Inc. Audio stem delivery and control
CN105070304B (en) 2015-08-11 2018-09-04 小米科技有限责任公司 Realize method and device, the electronic equipment of multi-object audio recording
CN108141692B (en) 2015-08-14 2020-09-29 Dts(英属维尔京群岛)有限公司 Bass management system and method for object-based audio
US20170098452A1 (en) * 2015-10-02 2017-04-06 Dts, Inc. Method and system for audio processing of dialog, music, effect and height objects
US9877137B2 (en) 2015-10-06 2018-01-23 Disney Enterprises, Inc. Systems and methods for playing a venue-specific object-based audio
CN106935251B (en) * 2015-12-30 2019-09-17 瑞轩科技股份有限公司 Audio playing apparatus and method
WO2017130210A1 (en) * 2016-01-27 2017-08-03 Indian Institute Of Technology Bombay Method and system for rendering audio streams
WO2017208820A1 (en) * 2016-05-30 2017-12-07 ソニー株式会社 Video sound processing device, video sound processing method, and program
EP3255905A1 (en) * 2016-06-07 2017-12-13 Nokia Technologies Oy Distributed audio mixing
EP3255904A1 (en) * 2016-06-07 2017-12-13 Nokia Technologies Oy Distributed audio mixing
US9980078B2 (en) 2016-10-14 2018-05-22 Nokia Technologies Oy Audio object modification in free-viewpoint rendering
US10555107B2 (en) * 2016-10-28 2020-02-04 Panasonic Intellectual Property Corporation Of America Binaural rendering apparatus and method for playing back of multiple audio sources
EP3470976A1 (en) * 2017-10-12 2019-04-17 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and apparatus for efficient delivery and usage of audio messages for high quality of experience
US11064453B2 (en) * 2016-11-18 2021-07-13 Nokia Technologies Oy Position stream session negotiation for spatial audio applications
US10424307B2 (en) * 2017-01-03 2019-09-24 Nokia Technologies Oy Adapting a distributed audio recording for end user free viewpoint monitoring
US11096004B2 (en) 2017-01-23 2021-08-17 Nokia Technologies Oy Spatial audio rendering point extension
WO2018144367A1 (en) * 2017-02-03 2018-08-09 iZotope, Inc. Audio control system and related methods
US10531219B2 (en) 2017-03-20 2020-01-07 Nokia Technologies Oy Smooth rendering of overlapping audio-object interactions
US20180315437A1 (en) * 2017-04-28 2018-11-01 Microsoft Technology Licensing, Llc Progressive Streaming of Spatial Audio
US11074036B2 (en) 2017-05-05 2021-07-27 Nokia Technologies Oy Metadata-free audio-object interactions
US10165386B2 (en) * 2017-05-16 2018-12-25 Nokia Technologies Oy VR audio superzoom
GB2562488A (en) 2017-05-16 2018-11-21 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
US11303689B2 (en) 2017-06-06 2022-04-12 Nokia Technologies Oy Method and apparatus for updating streamed content
US11395087B2 (en) 2017-09-29 2022-07-19 Nokia Technologies Oy Level-based audio-object interactions
US10854209B2 (en) * 2017-10-03 2020-12-01 Qualcomm Incorporated Multi-stream audio coding
US10531222B2 (en) 2017-10-18 2020-01-07 Dolby Laboratories Licensing Corporation Active acoustics control for near- and far-field sounds
RU2020120328A (en) * 2017-12-28 2021-12-20 Сони Корпорейшн INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND PROGRAM
US11393483B2 (en) 2018-01-26 2022-07-19 Lg Electronics Inc. Method for transmitting and receiving audio data and apparatus therefor
US10542368B2 (en) 2018-03-27 2020-01-21 Nokia Technologies Oy Audio content modification for playback audio
CN108600911B (en) 2018-03-30 2021-05-18 联想(北京)有限公司 Output method and electronic equipment
CN108777832B (en) * 2018-06-13 2021-02-09 上海艺瓣文化传播有限公司 Real-time 3D sound field construction and sound mixing system based on video object tracking
GB2578715A (en) * 2018-07-20 2020-05-27 Nokia Technologies Oy Controlling audio focus for spatial audio processing
EP3860156A4 (en) * 2018-09-28 2021-12-01 Sony Group Corporation Information processing device, method, and program
US11019449B2 (en) 2018-10-06 2021-05-25 Qualcomm Incorporated Six degrees of freedom and three degrees of freedom backward compatibility
JP7504091B2 (en) * 2018-11-02 2024-06-21 ドルビー・インターナショナル・アーベー Audio Encoders and Decoders
US11304021B2 (en) * 2018-11-29 2022-04-12 Sony Interactive Entertainment Inc. Deferred audio rendering
CN111282271B (en) * 2018-12-06 2023-04-07 网易(杭州)网络有限公司 Sound rendering method and device in mobile terminal game and electronic equipment
US11617051B2 (en) 2019-01-28 2023-03-28 EmbodyVR, Inc. Streaming binaural audio from a cloud spatial audio processing system to a mobile station for playback on a personal audio delivery device
US11049509B2 (en) 2019-03-06 2021-06-29 Plantronics, Inc. Voice signal enhancement for head-worn audio devices
KR20220004825A (en) 2019-06-03 2022-01-11 인텔렉추얼디스커버리 주식회사 Method, apparatus, computer program and recording medium for controlling audio data in a wireless communication system
US11076257B1 (en) 2019-06-14 2021-07-27 EmbodyVR, Inc. Converting ambisonic audio to binaural audio
US11416208B2 (en) * 2019-09-23 2022-08-16 Netflix, Inc. Audio metadata smoothing
US11430451B2 (en) * 2019-09-26 2022-08-30 Apple Inc. Layered coding of audio with discrete objects
US11967329B2 (en) * 2020-02-20 2024-04-23 Qualcomm Incorporated Signaling for rendering tools
JP2023517709A (en) * 2020-03-16 2023-04-26 ノキア テクノロジーズ オサケユイチア Rendering and deferred updating of encoded 6DOF audio bitstreams
US11080011B1 (en) 2020-03-20 2021-08-03 Tap Sound System Audio rendering device and audio configurator device for audio stream selection, and related methods
US11102606B1 (en) 2020-04-16 2021-08-24 Sony Corporation Video component in 3D audio
US11930349B2 (en) 2020-11-24 2024-03-12 Naver Corporation Computer system for producing audio content for realizing customized being-there and method thereof
KR102508815B1 (en) * 2020-11-24 2023-03-14 네이버 주식회사 Computer system for realizing customized being-there in assocation with audio and method thereof
JP7536733B2 (en) 2020-11-24 2024-08-20 ネイバー コーポレーション Computer system and method for achieving user-customized realism in connection with audio - Patents.com
EP4037339A1 (en) * 2021-02-02 2022-08-03 Nokia Technologies Oy Selecton of audio channels based on prioritization
US20220391167A1 (en) * 2021-06-02 2022-12-08 Tencent America LLC Adaptive audio delivery and rendering
CN117730368A (en) * 2021-07-29 2024-03-19 杜比国际公司 Method and apparatus for processing object-based audio and channel-based audio
WO2024012665A1 (en) * 2022-07-12 2024-01-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for encoding or decoding of precomputed data for rendering early reflections in ar/vr systems
WO2024074282A1 (en) * 2022-10-05 2024-04-11 Dolby International Ab Method, apparatus, and medium for encoding and decoding of audio bitstreams
WO2024074284A1 (en) * 2022-10-05 2024-04-11 Dolby International Ab Method, apparatus, and medium for efficient encoding and decoding of audio bitstreams
WO2024074283A1 (en) * 2022-10-05 2024-04-11 Dolby International Ab Method, apparatus, and medium for decoding of audio signals with skippable blocks

Citations (26)

Publication number Priority date Publication date Assignee Title
US4332979A (en) * 1978-12-19 1982-06-01 Fischer Mark L Electronic environmental acoustic simulator
US5592588A (en) * 1994-05-10 1997-01-07 Apple Computer, Inc. Method and apparatus for object-oriented digital audio signal processing using a chain of sound objects
US6108626A (en) * 1995-10-27 2000-08-22 Cselt-Centro Studi E Laboratori Telecomunicazioni S.P.A. Object oriented audio coding
US6160907A (en) * 1997-04-07 2000-12-12 Synapix, Inc. Iterative three-dimensional process for creating finished media content
US20030219130A1 (en) * 2002-05-24 2003-11-27 Frank Baumgarte Coherence-based audio coding and synthesis
US20050105442A1 (en) * 2003-08-04 2005-05-19 Frank Melchior Apparatus and method for generating, storing, or editing an audio representation of an audio scene
US20050147257A1 (en) * 2003-02-12 2005-07-07 Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. Device and method for determining a reproduction position
US20060206221A1 (en) * 2005-02-22 2006-09-14 Metcalf Randall B System and method for formatting multimode sound content and metadata
US7116787B2 (en) * 2001-05-04 2006-10-03 Agere Systems Inc. Perceptual synthesis of auditory scenes
US7164769B2 (en) * 1996-09-19 2007-01-16 Terry D. Beard Trust Multichannel spectral mapping audio apparatus and method with dynamically varying mapping coefficients
US7292901B2 (en) * 2002-06-24 2007-11-06 Agere Systems Inc. Hybrid multi-channel/cue coding/decoding of audio signals
US7295994B2 (en) * 2000-06-23 2007-11-13 Sony Corporation Information distribution system, terminal apparatus, information center, recording medium, and information distribution method
US20080005347A1 (en) * 2006-06-29 2008-01-03 Yahoo! Inc. Messenger system for publishing podcasts
US20080140426A1 (en) * 2006-09-29 2008-06-12 Dong Soo Kim Methods and apparatuses for encoding and decoding object-based audio signals
US7394903B2 (en) * 2004-01-20 2008-07-01 Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. Apparatus and method for constructing a multi-channel output signal or for generating a downmix signal
US20080310640A1 (en) * 2006-01-19 2008-12-18 Lg Electronics Inc. Method and Apparatus for Processing a Media Signal
US20090034613A1 (en) * 2007-07-31 2009-02-05 Samsung Electronics Co., Ltd. Method and apparatus for generating multimedia data having decoding level, and method and apparatus for reconstructing multimedia data by using the decoding level
US20090060236A1 (en) * 2007-08-29 2009-03-05 Microsoft Corporation Loudspeaker array providing direct and indirect radiation from same set of drivers
US20090082888A1 (en) * 2006-01-31 2009-03-26 Niels Thybo Johansen Audio-visual system control using a mesh network
US7583805B2 (en) * 2004-02-12 2009-09-01 Agere Systems Inc. Late reverberation-based synthesis of auditory scenes
US20090225993A1 (en) * 2005-11-24 2009-09-10 Zoran Cvetkovic Audio signal processing method and system
US20090237564A1 (en) * 2008-03-18 2009-09-24 Invism, Inc. Interactive immersive virtual reality and simulation
US20090326960A1 (en) * 2006-09-18 2009-12-31 Koninklijke Philips Electronics N.V. Encoding and decoding of audio objects
US20100135510A1 (en) * 2008-12-02 2010-06-03 Electronics And Telecommunications Research Institute Apparatus for generating and playing object based audio contents
US20110013790A1 (en) * 2006-10-16 2011-01-20 Johannes Hilpert Apparatus and Method for Multi-Channel Parameter Transformation
US20120057715A1 (en) * 2010-09-08 2012-03-08 Johnston James D Spatial audio encoding and reproduction

Family Cites Families (23)

Publication number Priority date Publication date Assignee Title
JP2001359067A (en) * 2000-06-09 2001-12-26 Canon Inc Communication system and its communication method
JP2002204437A (en) * 2000-12-28 2002-07-19 Canon Inc Communication unit, communication system, communication method, and storage medium
JP2005086537A (en) * 2003-09-09 2005-03-31 Nippon Hoso Kyokai <Nhk> High presence sound field reproduction information transmitter, high presence sound field reproduction information transmitting program, high presence sound field reproduction information transmitting method and high presence sound field reproduction information receiver, high presence sound field reproduction information receiving program, high presence sound field reproduction information receiving method
JP4497885B2 (en) * 2003-10-16 2010-07-07 三洋電機株式会社 Signal processing device
JP4433287B2 (en) * 2004-03-25 2010-03-17 ソニー株式会社 Receiving apparatus and method, and program
EP1650973A1 (en) * 2004-10-25 2006-04-26 Alcatel USA Sourcing, L.P. Method for encoding a multimedia content
DE102005008366A1 (en) * 2005-02-23 2006-08-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device for driving wave-field synthesis rendering device with audio objects, has unit for supplying scene description defining time sequence of audio objects
JP2007018646A (en) * 2005-07-11 2007-01-25 Hitachi Ltd Recording and reproducing device
JP2007028432A (en) * 2005-07-20 2007-02-01 Mitsubishi Electric Corp Packet relay transmission apparatus
US8705747B2 (en) * 2005-12-08 2014-04-22 Electronics And Telecommunications Research Institute Object-based 3-dimensional audio service system using preset audio scenes
CN100527704C (en) * 2006-01-05 2009-08-12 华为软件技术有限公司 Stream medium server and stream medium transmitting and storaging method
JP4687538B2 (en) * 2006-04-04 2011-05-25 パナソニック株式会社 Receiving device, transmitting device, and communication method therefor
EP2022263B1 (en) * 2006-05-19 2012-08-01 Electronics and Telecommunications Research Institute Object-based 3-dimensional audio service system using preset audio scenes
CN101490744B (en) * 2006-11-24 2013-07-17 Lg电子株式会社 Method and apparatus for encoding and decoding an audio signal
WO2008084436A1 (en) 2007-01-10 2008-07-17 Koninklijke Philips Electronics N.V. An object-oriented audio decoder
WO2008100067A1 (en) * 2007-02-13 2008-08-21 Lg Electronics Inc. A method and an apparatus for processing an audio signal
KR101069268B1 (en) * 2007-02-14 2011-10-04 엘지전자 주식회사 methods and apparatuses for encoding and decoding object-based audio signals
EP2137726B1 (en) * 2007-03-09 2011-09-28 LG Electronics Inc. A method and an apparatus for processing an audio signal
EP2158752B1 (en) 2007-05-22 2019-07-10 Telefonaktiebolaget LM Ericsson (publ) Methods and arrangements for group sound telecommunication
KR101431253B1 (en) * 2007-06-26 2014-08-21 코닌클리케 필립스 엔.브이. A binaural object-oriented audio decoder
TW200921643A (en) 2007-06-27 2009-05-16 Koninkl Philips Electronics Nv A method of merging at least two input object-oriented audio parameter streams into an output object-oriented audio parameter stream
WO2009093866A2 (en) * 2008-01-23 2009-07-30 Lg Electronics Inc. A method and an apparatus for processing an audio signal
KR20120062758A (en) 2009-08-14 2012-06-14 에스알에스 랩스, 인크. System for adaptively streaming audio objects

Patent Citations (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4332979A (en) * 1978-12-19 1982-06-01 Fischer Mark L Electronic environmental acoustic simulator
US5592588A (en) * 1994-05-10 1997-01-07 Apple Computer, Inc. Method and apparatus for object-oriented digital audio signal processing using a chain of sound objects
US6108626A (en) * 1995-10-27 2000-08-22 Cselt-Centro Studi E Laboratori Telecomunicazioni S.P.A. Object oriented audio coding
US7164769B2 (en) * 1996-09-19 2007-01-16 Terry D. Beard Trust Multichannel spectral mapping audio apparatus and method with dynamically varying mapping coefficients
US6160907A (en) * 1997-04-07 2000-12-12 Synapix, Inc. Iterative three-dimensional process for creating finished media content
US7295994B2 (en) * 2000-06-23 2007-11-13 Sony Corporation Information distribution system, terminal apparatus, information center, recording medium, and information distribution method
US7116787B2 (en) * 2001-05-04 2006-10-03 Agere Systems Inc. Perceptual synthesis of auditory scenes
US7006636B2 (en) * 2002-05-24 2006-02-28 Agere Systems Inc. Coherence-based audio coding and synthesis
US20030219130A1 (en) * 2002-05-24 2003-11-27 Frank Baumgarte Coherence-based audio coding and synthesis
US7292901B2 (en) * 2002-06-24 2007-11-06 Agere Systems Inc. Hybrid multi-channel/cue coding/decoding of audio signals
US20050147257A1 (en) * 2003-02-12 2005-07-07 Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. Device and method for determining a reproduction position
US7680288B2 (en) * 2003-08-04 2010-03-16 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating, storing, or editing an audio representation of an audio scene
US20050105442A1 (en) * 2003-08-04 2005-05-19 Frank Melchior Apparatus and method for generating, storing, or editing an audio representation of an audio scene
US7394903B2 (en) * 2004-01-20 2008-07-01 Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. Apparatus and method for constructing a multi-channel output signal or for generating a downmix signal
US7583805B2 (en) * 2004-02-12 2009-09-01 Agere Systems Inc. Late reverberation-based synthesis of auditory scenes
US20060206221A1 (en) * 2005-02-22 2006-09-14 Metcalf Randall B System and method for formatting multimode sound content and metadata
US20090225993A1 (en) * 2005-11-24 2009-09-10 Zoran Cvetkovic Audio signal processing method and system
US20080310640A1 (en) * 2006-01-19 2008-12-18 Lg Electronics Inc. Method and Apparatus for Processing a Media Signal
US20090082888A1 (en) * 2006-01-31 2009-03-26 Niels Thybo Johansen Audio-visual system control using a mesh network
US20080005347A1 (en) * 2006-06-29 2008-01-03 Yahoo! Inc. Messenger system for publishing podcasts
US20090326960A1 (en) * 2006-09-18 2009-12-31 Koninklijke Philips Electronics N.V. Encoding and decoding of audio objects
US20090164222A1 (en) * 2006-09-29 2009-06-25 Dong Soo Kim Methods and apparatuses for encoding and decoding object-based audio signals
US20080140426A1 (en) * 2006-09-29 2008-06-12 Dong Soo Kim Methods and apparatuses for encoding and decoding object-based audio signals
US20110013790A1 (en) * 2006-10-16 2011-01-20 Johannes Hilpert Apparatus and Method for Multi-Channel Parameter Transformation
US20090034613A1 (en) * 2007-07-31 2009-02-05 Samsung Electronics Co., Ltd. Method and apparatus for generating multimedia data having decoding level, and method and apparatus for reconstructing multimedia data by using the decoding level
US20090060236A1 (en) * 2007-08-29 2009-03-05 Microsoft Corporation Loudspeaker array providing direct and indirect radiation from same set of drivers
US20090237564A1 (en) * 2008-03-18 2009-09-24 Invism, Inc. Interactive immersive virtual reality and simulation
US20100135510A1 (en) * 2008-12-02 2010-06-03 Electronics And Telecommunications Research Institute Apparatus for generating and playing object based audio contents
US20120057715A1 (en) * 2010-09-08 2012-03-08 Johnston James D Spatial audio encoding and reproduction
US20120082319A1 (en) * 2010-09-08 2012-04-05 Jean-Marc Jot Spatial audio encoding and reproduction of diffuse sound

Cited By (119)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8396577B2 (en) 2009-08-14 2013-03-12 Dts Llc System for creating audio objects for streaming
US20110040397A1 (en) * 2009-08-14 2011-02-17 Srs Labs, Inc. System for creating audio objects for streaming
US9167346B2 (en) 2009-08-14 2015-10-20 Dts Llc Object-oriented audio streaming system
US8396575B2 (en) 2009-08-14 2013-03-12 Dts Llc Object-oriented audio streaming system
US20110040396A1 (en) * 2009-08-14 2011-02-17 Srs Labs, Inc. System for adaptively streaming audio objects
US8396576B2 (en) 2009-08-14 2013-03-12 Dts Llc System for adaptively streaming audio objects
WO2012054750A1 (en) 2010-10-20 2012-04-26 Srs Labs, Inc. Stereo image widening system
WO2012122397A1 (en) * 2011-03-09 2012-09-13 Srs Labs, Inc. System for dynamically creating and rendering audio objects
US20120232910A1 (en) * 2011-03-09 2012-09-13 Srs Labs, Inc. System for dynamically creating and rendering audio objects
US9721575B2 (en) 2011-03-09 2017-08-01 Dts Llc System for dynamically creating and rendering audio objects
US9165558B2 (en) 2011-03-09 2015-10-20 Dts Llc System for dynamically creating and rendering audio objects
US9026450B2 (en) * 2011-03-09 2015-05-05 Dts Llc System for dynamically creating and rendering audio objects
US20140112480A1 (en) * 2011-06-15 2014-04-24 Dolby Laboratories Licensing Corporation Method for capturing and playback of sound originating from a plurality of sound sources
US9756449B2 (en) * 2011-06-24 2017-09-05 Bright Minds Holding B.V. Method and device for processing sound data for spatial sound reproduction
US20140126758A1 (en) * 2011-06-24 2014-05-08 Bright Minds Holding B.V. Method and device for processing sound data
US9467791B2 (en) 2011-07-01 2016-10-11 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
TWI816597B (en) * 2011-07-01 2023-09-21 美商杜比實驗室特許公司 Apparatus, method and non-transitory medium for enhanced 3d audio authoring and rendering
US9942688B2 (en) 2011-07-01 2018-04-10 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
US11412342B2 (en) 2011-07-01 2022-08-09 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
CN103650539A (en) * 2011-07-01 2014-03-19 杜比实验室特许公司 System and method for adaptive audio signal generation, coding and rendering
US9179236B2 (en) 2011-07-01 2015-11-03 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
US10477339B2 (en) 2011-07-01 2019-11-12 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
US10244343B2 (en) 2011-07-01 2019-03-26 Dolby Laboratories Licensing Corporation System and tools for enhanced 3D audio authoring and rendering
US10609506B2 (en) 2011-07-01 2020-03-31 Dolby Laboratories Licensing Corporation System and tools for enhanced 3D audio authoring and rendering
CN105578380A (en) * 2011-07-01 2016-05-11 杜比实验室特许公司 System and Method for Adaptive Audio Signal Generation, Coding and Rendering
US12047768B2 (en) 2011-07-01 2024-07-23 Dolby Laboratories Licensing Corporation System and tools for enhanced 3D audio authoring and rendering
US10057708B2 (en) 2011-07-01 2018-08-21 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
US11057731B2 (en) 2011-07-01 2021-07-06 Dolby Laboratories Licensing Corporation System and tools for enhanced 3D audio authoring and rendering
US11962997B2 (en) 2011-07-01 2024-04-16 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
US9838826B2 (en) 2011-07-01 2017-12-05 Dolby Laboratories Licensing Corporation System and tools for enhanced 3D audio authoring and rendering
TWI607654B (en) * 2011-07-01 2017-12-01 杜比實驗室特許公司 Apparatus, method and non-transitory medium for enhanced 3d audio authoring and rendering
US9800991B2 (en) 2011-07-01 2017-10-24 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
TWI785394B (en) * 2011-07-01 2022-12-01 美商杜比實驗室特許公司 Apparatus, method and non-transitory medium for enhanced 3d audio authoring and rendering
US11641562B2 (en) 2011-07-01 2023-05-02 Dolby Laboratories Licensing Corporation System and tools for enhanced 3D audio authoring and rendering
US10327092B2 (en) 2011-07-01 2019-06-18 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
US10904692B2 (en) 2011-07-01 2021-01-26 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
US10165387B2 (en) 2011-07-01 2018-12-25 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
US9622009B2 (en) 2011-07-01 2017-04-11 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
WO2013032822A2 (en) 2011-08-26 2013-03-07 Dts Llc Audio adjustment system
KR20190004248A (en) * 2012-05-14 2019-01-11 한국전자통신연구원 Method and apparatus for providing audio data, method and apparatus for providing audio metadata, method and apparatus for playing audio data
KR102220527B1 (en) * 2012-05-14 2021-02-25 한국전자통신연구원 Method and apparatus for providing audio data, method and apparatus for providing audio metadata, method and apparatus for playing audio data
KR102071431B1 (en) * 2012-05-14 2020-03-02 한국전자통신연구원 Method and apparatus for providing audio data, method and apparatus for providing audio metadata, method and apparatus for playing audio data
KR20210022600A (en) * 2012-05-14 2021-03-03 한국전자통신연구원 Method and apparatus for providing audio data, method and apparatus for providing audio metadata, method and apparatus for playing audio data
KR20130127344A (en) * 2012-05-14 2013-11-22 한국전자통신연구원 Method and apparatus for providing audio data, method and apparatus for providing audio metadata, method and apparatus for playing audio data
KR20200011522A (en) * 2012-05-14 2020-02-03 한국전자통신연구원 Method and apparatus for providing audio data, method and apparatus for providing audio metadata, method and apparatus for playing audio data
KR101935020B1 (en) * 2012-05-14 2019-01-03 한국전자통신연구원 Method and apparatus for providing audio data, method and apparatus for providing audio metadata, method and apparatus for playing audio data
KR102370672B1 (en) * 2012-05-14 2022-03-07 한국전자통신연구원 Method and apparatus for providing audio data, method and apparatus for providing audio metadata, method and apparatus for playing audio data
US9622014B2 (en) 2012-06-19 2017-04-11 Dolby Laboratories Licensing Corporation Rendering and playback of spatial audio using channel-based audio systems
US9190065B2 (en) 2012-07-15 2015-11-17 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for three-dimensional audio coding using basis function coefficients
US9478225B2 (en) 2012-07-15 2016-10-25 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for three-dimensional audio coding using basis function coefficients
US9516446B2 (en) 2012-07-20 2016-12-06 Qualcomm Incorporated Scalable downmix design for object-based surround codec with cluster analysis by synthesis
US9479886B2 (en) 2012-07-20 2016-10-25 Qualcomm Incorporated Scalable downmix design with feedback for object-based surround codec
US9761229B2 (en) 2012-07-20 2017-09-12 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for audio object clustering
US9489954B2 (en) 2012-08-07 2016-11-08 Dolby Laboratories Licensing Corporation Encoding and rendering of object based audio indicative of game audio content
WO2014025752A1 (en) * 2012-08-07 2014-02-13 Dolby Laboratories Licensing Corporation Encoding and rendering of object based audio indicative of game audio content
US11178503B2 (en) * 2012-08-31 2021-11-16 Dolby Laboratories Licensing Corporation System for rendering and playback of object based audio in various listening environments
CN107454511A (en) * 2012-08-31 2017-12-08 杜比实验室特许公司 Loudspeakers for reflecting sound from viewing screens or display surfaces
US11277703B2 (en) 2012-08-31 2022-03-15 Dolby Laboratories Licensing Corporation Speaker for reflecting sound off viewing screen or display surface
US20150221319A1 (en) * 2012-09-21 2015-08-06 Dolby International Ab Methods and systems for selecting layers of encoded audio signals for teleconferencing
US9858936B2 (en) * 2012-09-21 2018-01-02 Dolby Laboratories Licensing Corporation Methods and systems for selecting layers of encoded audio signals for teleconferencing
US9565314B2 (en) * 2012-09-27 2017-02-07 Dolby Laboratories Licensing Corporation Spatial multiplexing in a soundfield teleconferencing system
US20150244869A1 (en) * 2012-09-27 2015-08-27 Dolby Laboratories Licensing Corporation Spatial Multiplexing in a Soundfield Teleconferencing System
US11727945B2 (en) 2013-04-03 2023-08-15 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US9997164B2 (en) * 2013-04-03 2018-06-12 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US11081118B2 (en) * 2013-04-03 2021-08-03 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US20230419973A1 (en) * 2013-04-03 2023-12-28 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US10515644B2 (en) 2013-04-03 2019-12-24 Dolby Laboratories Licensing Corporation Methods and systems for interactive rendering of object based audio
US20160029138A1 (en) * 2013-04-03 2016-01-28 Dolby Laboratories Licensing Corporation Methods and Systems for Interactive Rendering of Object Based Audio
US9837123B2 (en) 2013-04-05 2017-12-05 Dts, Inc. Layered audio reconstruction system
US9558785B2 (en) 2013-04-05 2017-01-31 Dts, Inc. Layered audio coding and transmission
US9613660B2 (en) 2013-04-05 2017-04-04 Dts, Inc. Layered audio reconstruction system
US9258664B2 (en) 2013-05-23 2016-02-09 Comhear, Inc. Headphone audio enhancement system
US9866963B2 (en) 2013-05-23 2018-01-09 Comhear, Inc. Headphone audio enhancement system
US10284955B2 (en) 2013-05-23 2019-05-07 Comhear, Inc. Headphone audio enhancement system
US20150025664A1 (en) * 2013-07-22 2015-01-22 Dolby Laboratories Licensing Corporation Interactive Audio Content Generation, Delivery, Playback and Sharing
US9411882B2 (en) * 2013-07-22 2016-08-09 Dolby Laboratories Licensing Corporation Interactive audio content generation, delivery, playback and sharing
US10003907B2 (en) * 2013-07-31 2018-06-19 Dolby Laboratories Licensing Corporation Processing spatially diffuse or large audio objects
US11736890B2 (en) 2013-07-31 2023-08-22 Dolby Laboratories Licensing Corporation Method, apparatus or systems for processing audio objects
US20170223476A1 (en) * 2013-07-31 2017-08-03 Dolby International Ab Processing Spatially Diffuse or Large Audio Objects
US10595152B2 (en) 2013-07-31 2020-03-17 Dolby Laboratories Licensing Corporation Processing spatially diffuse or large audio objects
US11064310B2 (en) 2013-07-31 2021-07-13 Dolby Laboratories Licensing Corporation Method, apparatus or systems for processing audio objects
US11068227B2 (en) 2013-11-05 2021-07-20 Sony Corporation Information processing device and information processing method for indicating a position outside a display region
US10599382B2 (en) * 2013-11-05 2020-03-24 Sony Corporation Information processing device and information processing method for indicating a position outside a display region
US9955278B2 (en) * 2014-04-02 2018-04-24 Dolby International Ab Exploiting metadata redundancy in immersive audio metadata
US20170013387A1 (en) * 2014-04-02 2017-01-12 Dolby International Ab Exploiting metadata redundancy in immersive audio metadata
US9367283B2 (en) * 2014-07-22 2016-06-14 Sonos, Inc. Audio settings
US11803349B2 (en) 2014-07-22 2023-10-31 Sonos, Inc. Audio settings
US10061556B2 (en) 2014-07-22 2018-08-28 Sonos, Inc. Audio settings
US11765535B2 (en) 2015-02-06 2023-09-19 Dolby Laboratories Licensing Corporation Methods and systems for rendering audio based on priority
US11190893B2 (en) 2015-02-06 2021-11-30 Dolby Laboratories Licensing Corporation Methods and systems for rendering audio based on priority
US10659899B2 (en) * 2015-02-06 2020-05-19 Dolby Laboratories Licensing Corporation Methods and systems for rendering audio based on priority
CN114374925A (en) * 2015-02-06 2022-04-19 杜比实验室特许公司 Hybrid priority-based rendering system and method for adaptive audio
US20190191258A1 (en) * 2015-02-06 2019-06-20 Dolby Laboratories Licensing Corporation Methods and systems for rendering audio based on priority
US20160300577A1 (en) * 2015-04-08 2016-10-13 Dolby International Ab Rendering of Audio Content
CN106162500A (en) * 2015-04-08 2016-11-23 杜比实验室特许公司 Presenting of audio content
CN111586533A (en) * 2015-04-08 2020-08-25 杜比实验室特许公司 Presentation of audio content
US9967666B2 (en) * 2015-04-08 2018-05-08 Dolby Laboratories Licensing Corporation Rendering of audio content
US10136240B2 (en) * 2015-04-20 2018-11-20 Dolby Laboratories Licensing Corporation Processing audio data to compensate for partial hearing loss or an adverse hearing environment
US20160357501A1 (en) * 2015-06-03 2016-12-08 Skullcandy, Inc. Audio devices and related methods for acquiring audio device use information
US10338880B2 (en) * 2015-06-03 2019-07-02 Skullcandy, Inc. Audio devices and related methods for acquiring audio device use information
US20180275955A1 (en) * 2015-12-01 2018-09-27 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. System for outputting audio signals and respective method and setting device
US11249718B2 (en) * 2015-12-01 2022-02-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. System for outputting audio signals and respective method and setting device
US10171971B2 (en) 2015-12-21 2019-01-01 Skullcandy, Inc. Electrical systems and related methods for providing smart mobile electronic device features to a user of a wearable device
US11194541B2 (en) 2016-01-28 2021-12-07 Sonos, Inc. Systems and methods of distributing audio to one or more playback devices
US9886234B2 (en) 2016-01-28 2018-02-06 Sonos, Inc. Systems and methods of distributing audio to one or more playback devices
US11526326B2 (en) 2016-01-28 2022-12-13 Sonos, Inc. Systems and methods of distributing audio to one or more playback devices
US10592200B2 (en) 2016-01-28 2020-03-17 Sonos, Inc. Systems and methods of distributing audio to one or more playback devices
US10296288B2 (en) 2016-01-28 2019-05-21 Sonos, Inc. Systems and methods of distributing audio to one or more playback devices
US10229695B2 (en) 2016-03-30 2019-03-12 Microsoft Technology Licensing, Llc Application programing interface for adaptive audio rendering
WO2017173155A1 (en) * 2016-03-30 2017-10-05 Microsoft Technology Licensing, Llc Spatial audio resource management and mixing for applications
US10325610B2 (en) 2016-03-30 2019-06-18 Microsoft Technology Licensing, Llc Adaptive audio rendering
US11528576B2 (en) 2016-12-05 2022-12-13 Magic Leap, Inc. Distributed audio capturing techniques for virtual reality (VR), augmented reality (AR), and mixed reality (MR) systems
US20180167755A1 (en) * 2016-12-14 2018-06-14 Nokia Technologies Oy Distributed Audio Mixing
US10448186B2 (en) * 2016-12-14 2019-10-15 Nokia Technologies Oy Distributed audio mixing
EP3566456A4 (en) * 2017-01-06 2020-08-19 Nokia Technologies Oy Discovery, announcement and assignment of position tracks
US11595774B2 (en) * 2017-05-12 2023-02-28 Microsoft Technology Licensing, Llc Spatializing audio data based on analysis of incoming audio data
US20180332424A1 (en) * 2017-05-12 2018-11-15 Microsoft Technology Licensing, Llc Spatializing audio data based on analysis of incoming audio data
US11386913B2 (en) * 2017-08-01 2022-07-12 Dolby Laboratories Licensing Corporation Audio object classification based on location metadata
US10848894B2 (en) * 2018-04-09 2020-11-24 Nokia Technologies Oy Controlling audio in multi-viewpoint omnidirectional content

Also Published As

Publication number Publication date
US9167346B2 (en) 2015-10-20
WO2011020065A1 (en) 2011-02-17
KR20170052696A (en) 2017-05-12
EP2465259A4 (en) 2015-10-28
WO2011020067A1 (en) 2011-02-17
EP3697083A1 (en) 2020-08-19
US20130202129A1 (en) 2013-08-08
EP2465114B1 (en) 2020-04-08
KR101842411B1 (en) 2018-03-26
KR101805212B1 (en) 2017-12-05
JP5726874B2 (en) 2015-06-03
CN102549655A (en) 2012-07-04
US8396576B2 (en) 2013-03-12
US20110040396A1 (en) 2011-02-17
US8396577B2 (en) 2013-03-12
US20110040397A1 (en) 2011-02-17
PL2465114T3 (en) 2020-09-07
JP2013502183A (en) 2013-01-17
JP5635097B2 (en) 2014-12-03
US8396575B2 (en) 2013-03-12
CN102576533A (en) 2012-07-11
KR20120061869A (en) 2012-06-13
CN102576533B (en) 2014-09-17
EP2465259A1 (en) 2012-06-20
ES2793958T3 (en) 2020-11-17
CN102549655B (en) 2014-09-24
KR20120062758A (en) 2012-06-14
EP2465114A4 (en) 2015-11-11
JP2013502184A (en) 2013-01-17
EP3697083B1 (en) 2023-04-19
EP2465114A1 (en) 2012-06-20

Similar Documents

Publication Publication Date Title
US9167346B2 (en) Object-oriented audio streaming system
JP7009664B2 (en) Audio signal processing system and method
RU2820838C2 (en) System, method and persistent machine-readable data medium for generating, encoding and presenting adaptive audio signal data

Legal Events

Date Code Title Description
AS Assignment

Owner name: SRS LABS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KRAEMER, ALAN D.;TRACEY, JAMES;KATSIANOS, THEMIS;REEL/FRAME:025244/0293

Effective date: 20101019

AS Assignment

Owner name: DTS LLC, CALIFORNIA

Free format text: MERGER;ASSIGNOR:SRS LABS, INC.;REEL/FRAME:028691/0552

Effective date: 20120720

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: ROYAL BANK OF CANADA, AS COLLATERAL AGENT, CANADA

Free format text: SECURITY INTEREST;ASSIGNORS:INVENSAS CORPORATION;TESSERA, INC.;TESSERA ADVANCED TECHNOLOGIES, INC.;AND OTHERS;REEL/FRAME:040797/0001

Effective date: 20161201

AS Assignment

Owner name: DTS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:DTS LLC;REEL/FRAME:047119/0508

Effective date: 20180912

AS Assignment

Owner name: BANK OF AMERICA, N.A., NORTH CAROLINA

Free format text: SECURITY INTEREST;ASSIGNORS:ROVI SOLUTIONS CORPORATION;ROVI TECHNOLOGIES CORPORATION;ROVI GUIDES, INC.;AND OTHERS;REEL/FRAME:053468/0001

Effective date: 20200601

AS Assignment

Owner name: INVENSAS CORPORATION, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:052920/0001

Effective date: 20200601

Owner name: DTS LLC, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:052920/0001

Effective date: 20200601

Owner name: IBIQUITY DIGITAL CORPORATION, MARYLAND

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:052920/0001

Effective date: 20200601

Owner name: INVENSAS BONDING TECHNOLOGIES, INC. (F/K/A ZIPTRONIX, INC.), CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:052920/0001

Effective date: 20200601

Owner name: DTS, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:052920/0001

Effective date: 20200601

Owner name: FOTONATION CORPORATION (F/K/A DIGITALOPTICS CORPORATION AND F/K/A DIGITALOPTICS CORPORATION MEMS), CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:052920/0001

Effective date: 20200601

Owner name: TESSERA, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:052920/0001

Effective date: 20200601

Owner name: PHORUS, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:052920/0001

Effective date: 20200601

Owner name: TESSERA ADVANCED TECHNOLOGIES, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROYAL BANK OF CANADA;REEL/FRAME:052920/0001

Effective date: 20200601

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

AS Assignment

Owner name: IBIQUITY DIGITAL CORPORATION, CALIFORNIA

Free format text: PARTIAL RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:061786/0675

Effective date: 20221025

Owner name: PHORUS, INC., CALIFORNIA

Free format text: PARTIAL RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:061786/0675

Effective date: 20221025

Owner name: DTS, INC., CALIFORNIA

Free format text: PARTIAL RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:061786/0675

Effective date: 20221025

Owner name: VEVEO LLC (F.K.A. VEVEO, INC.), CALIFORNIA

Free format text: PARTIAL RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:061786/0675

Effective date: 20221025

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY