US20150302086A1 - Audio identification during performance - Google Patents
- Publication number
- US20150302086A1 (application US14/258,263)
- Authority
- US
- United States
- Prior art keywords
- fingerprint
- audio piece
- live
- audio
- identifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06F16/683—Information retrieval of audio data; retrieval characterised by using metadata automatically derived from the content
- G06F17/30743; G06F17/30241; G06F17/30867
- G06F18/231—Pattern recognition; clustering; hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
- G06Q50/01—ICT specially adapted for social networking
- G10L25/18—Speech or voice analysis; the extracted parameters being spectral information of each sub-band
- G10L25/51—Speech or voice analysis specially adapted for comparison or discrimination
Definitions
- the subject matter disclosed herein generally relates to the processing of data. Specifically, the present disclosure addresses systems and methods to facilitate audio identification.
- a performer may give a live performance (e.g., a concert or other live show) before an audience that includes one or more individuals (e.g., audience members, fans, or concertgoers). Examples of such a performer include a musical soloist (e.g., a singer-songwriter), a musical group (e.g., a rock band), and a theater ensemble (e.g., including actors, dancers, and a choir).
- One or more audio pieces may be performed during a live performance.
- one or more songs may be performed, and a song may be performed with or without visual accompaniment (e.g., a video, a laser show, or a dance routine).
- the performer of an audio piece is an artist that recorded the audio piece (e.g., as a studio recording or as a live recording).
- a performer may perform a song that she herself wrote and recorded.
- the performer of an audio piece is different from the artist that recorded the audio piece (e.g., as a studio recording or as a live recording).
- a performer may perform a cover of a song that was written and recorded by someone else.
- FIG. 1 is a network diagram illustrating a network environment suitable for audio identification, according to some example embodiments.
- FIG. 2 is a block diagram illustrating components of an identification machine suitable for audio identification, according to some example embodiments.
- FIGS. 3-9 are flowcharts illustrating operations in a method of audio identification, according to some example embodiments.
- FIG. 10 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium and perform any one or more of the methodologies discussed herein.
- Example methods and systems are directed to audio identification. Examples merely typify possible variations. Unless explicitly stated otherwise, components and functions are optional and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.
- one or more audience members may use a network-based system to identify an audio piece during its performance (e.g., while the audio piece is being performed).
- the network-based system may provide its users (e.g., the audience members) with one or more audio identification services.
- a machine may form all or part of the network-based system and may be configured (e.g., by software) to provide such identification services to one or more users (e.g., concertgoers).
- the machine may be configured to obtain an identifier (e.g., a song name) of an audio piece during a performance of an audio piece (e.g., at a first time, such as five seconds into a song).
- the identifier may be obtained in any one or more of various ways, including, for example, receiving the identifier as a user submission (e.g., from an audience member, from a venue manager, or from the performer herself), inferring the identifier based on some received metadata of the audio piece (e.g., a partial name of the song, an album on which the song appears, or a release year of the song), inferring the identifier based on a detected geolocation of a device whose user is at the performance, tallying votes for the identifier (e.g., from several audience members), and accessing the identifier directly from a device of the performer (e.g., a mixer, a drum machine, a media player, a smartphone, or a tablet computer).
- a user's device may record a segment of the audio piece during its performance, generate a fingerprint of the segment, and upload the fingerprint to the machine.
- the machine may receive the fingerprint during the performance (e.g., at a second time, such as 15 seconds into the song) and assign the identifier to the fingerprint. This identifier may be provided to the user's device to identify the audio piece.
- the machine may receive additional information (e.g., one or more additional fingerprints or classifications of additional segments of the audio piece or other audio) from additional users' devices, and the machine may determine from this additional information that the audio piece has not ended (e.g., by failing to detect silence, applause, booing, or any suitable combination thereof).
- the machine may provide the identifier to any one or more of these additional users' devices.
- another user's device may record another segment of the audio piece during its performance, generate another fingerprint of the segment, and submit this fingerprint to the machine as a query for identification of the audio piece. While the performance continues, the machine may receive this fingerprint during the performance (e.g., at a third time, such as 30 seconds into the song) and respond during the performance by providing the identifier, which may be based on its determination that additional information (e.g., one or more additional fingerprints or classifications of additional segments of the audio piece or other audio) from additional users' devices fails to indicate an end of the audio piece.
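- a minimal, hypothetical sketch of this server-side flow is shown below; the class, method, and variable names are illustrative assumptions and not the patent's implementation:

```python
# Hypothetical sketch: an identifier obtained early in a performance is
# assigned to live fingerprints received while the performance is still
# judged to be ongoing, and returned to querying devices.
class AudioIdentificationService:
    def __init__(self):
        self.current_identifier = None   # e.g., a song name obtained early on
        self.fingerprint_index = {}      # live fingerprint -> assigned identifier

    def obtain_identifier(self, identifier):
        # Obtained at a first time (e.g., from a user submission, metadata,
        # geolocation, tallied votes, or the performer's own device).
        self.current_identifier = identifier

    def handle_live_fingerprint(self, live_fingerprint, performance_ongoing):
        # Received at a later time, while the performance continues.
        if performance_ongoing and self.current_identifier is not None:
            self.fingerprint_index[live_fingerprint] = self.current_identifier
            return self.current_identifier   # response to the querying device
        return None                          # no identification offered
```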
- the machine may be configured to identify an audio piece, even when a live version (e.g., a live cover version) of the audio piece is being performed differently from a reference version (e.g., a studio version or radio version) of the audio piece as recorded by an artist (e.g., same or different from the performer of the live version).
- the machine may receive a live fingerprint of the segment of the live version (e.g., within a query for identification of the audio piece during its performance).
- the fingerprinting technique used here may instead identify one or more core characteristics of the audio piece (e.g., the notes and rhythms present) and be robust to differences between the live version and a reference version of the audio piece (e.g., differences in tempo, vocal timbre, vocal strength, vibrato, instrument tuning, ambient noise, reverberation, or distortions).
- the fingerprinting technique may be based on a chromagram that represents the harmonic structure of the live version (e.g., mapped to one octave).
- Such a fingerprinting technique may also be used later to identify and retrieve user-uploaded recordings from the performance (e.g., for copyright clearance purposes, to automatically tag or index such recordings, or any suitable combination thereof).
- the machine may identify the performer of the live version (e.g., by detecting a venue at which the live version is being performed and accessing information that correlates the detected venue with the performer).
- the machine may then access a set of reference fingerprints that correspond to the artist that recorded the audio piece (e.g., based on the identified performer of the live version). For example, based on the identified performer (e.g., as well as a detected venue, a current date and time, or any suitable combination thereof), the machine may retrieve a list of audio pieces (e.g., a playlist, a concert program, or a concert brochure) that corresponds to the performer. Using the retrieved list, the machine may identify reference versions (e.g., official or canonical versions) of the audio pieces (e.g., recorded by the artist, who may be the same or different from the performer of the live version) and access reference fingerprints of the identified reference versions of the audio pieces. The reference fingerprints may have been previously generated from segments of the reference versions of the audio pieces, and among these reference fingerprints may be a reference fingerprint of the reference version of the audio piece whose live version is currently being performed.
- the machine may compare the live fingerprint of a segment of the live version of an audio piece to the set of reference fingerprints of segments from the reference versions of the audio piece.
- the machine compares the live fingerprint exclusively (e.g., only) to the set of reference fingerprints. Based on this comparison, the machine may identify a match between the live fingerprint and the reference fingerprint and thus identify the audio piece while the audio piece is being performed.
- the machine may provide an identifier of the audio piece in a response to the query for identification of the audio piece. The identifier may be provided during the performance of the live version of the audio piece.
- FIG. 1 is a network diagram illustrating a network environment suitable for audio identification, according to some example embodiments.
- the network environment 100 includes an identification machine 110 , a database 115 , and devices 120 , 130 , 140 , and 150 respectively being operated by users 122 , 132 , 142 , and 152 in an audience, as well as a device 160 and a mixer 161 being operated by a performer 162 .
- the identification machine 110 , the database 115 , the devices 120 , 130 , 140 , 150 , 160 , and the mixer 161 may all be communicatively coupled (e.g., to each other) via a network 190 .
- the identification machine 110 may form all or part of a network-based system 105 (e.g., a cloud-based server system configured to provide one or more audio identification services to the devices 120 , 130 , 140 , and 150 , to their respective users 122 , 132 , 142 , and 152 , or to any suitable combination thereof).
- the identification machine 110 , the database 115 , the devices 120 , 130 , 140 , 150 , and 160 , and the mixer 161 may each be implemented in a computer system, in whole or in part, as described below with respect to FIG. 10 .
- any one or more of the users 122 , 132 , 142 , and 152 in the audience may be a human user (e.g., a human being), a machine user (e.g., a computer configured by a software program to interact with the device 120 ), or any suitable combination thereof (e.g., a human assisted by a machine or a machine supervised by a human).
- the user 122 is not part of the network environment 100 , but is associated with the device 120 and may be a user of the device 120 .
- the device 120 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 122 .
- the user 132 is not part of the network environment 100 , but is associated with the device 130 and may be a user of the device 130 .
- the device 130 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 132 .
- the user 142 is not part of the network environment 100 , but is associated with the device 140 .
- the device 140 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 142 .
- the user 152 is not part of the network environment 100 , but is associated with the device 150 .
- the device 150 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 152 .
- the performer 162 is not part of the network environment 100 , but is associated with the device 160 and the mixer 161 .
- the device 160 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the performer 162 .
- the mixer 161 may be or include an audio playback device, an audio mixing device, an audio processing device, or any suitable combination thereof. According to various example embodiments, the mixer 161 may drive (e.g., output signals that represent audio information to) one or more amplifiers, speakers, or other audio output equipment in producing sound for the audience during a performance of an audio piece by the performer 162 . In some example embodiments, the mixer 161 is a source of one or more segments of a reference version of an audio piece (e.g., an audio piece to be identified later during performance of the audio piece). In certain example embodiments, the mixer 161 may perform operations described herein for any one or more of the devices 120 , 130 , 140 , and 150 .
- any of the machines, databases, or devices shown in FIG. 1 may be implemented in a general-purpose computer modified (e.g., configured or programmed) by software (e.g., one or more software modules) to be a special-purpose computer to perform one or more of the functions described herein for that machine, database, or device.
- a computer system able to implement any one or more of the methodologies described herein is discussed below with respect to FIG. 10 .
- a “database” is a data storage resource and may store data structured as a text file, a table, a spreadsheet, a relational database (e.g., an object-relational database), a triple store, a hierarchical data store, or any suitable combination thereof.
- any two or more of the machines, databases, or devices illustrated in FIG. 1 may be combined into a single machine, and the functions described herein for any single machine, database, or device may be subdivided among multiple machines, databases, or devices.
- the network 190 may be any network that enables communication between or among machines, databases, and devices (e.g., the identification machine 110 and the device 130 ). Accordingly, the network 190 may be a wired network, a wireless network (e.g., a mobile or cellular network), or any suitable combination thereof. The network 190 may include one or more portions that constitute a private network, a public network (e.g., the Internet), or any suitable combination thereof.
- the network 190 may include one or more portions that incorporate a local area network (LAN), a wide area network (WAN), the Internet, a mobile telephone network (e.g., a cellular network), a wired telephone network (e.g., a plain old telephone system (POTS) network), a wireless data network (e.g., WiFi network or WiMax network), or any suitable combination thereof. Any one or more portions of the network 190 may communicate information via a transmission medium.
- transmission medium refers to any intangible (e.g., transitory) medium that is capable of communicating (e.g., transmitting) instructions for execution by a machine (e.g., by one or more processors of such a machine), and includes digital or analog communication signals or other intangible media to facilitate communication of such software.
- FIG. 2 is a block diagram illustrating components of the identification machine 110 , according to some example embodiments.
- the identification machine 110 is shown as including an identifier module 210 , a reception module 220 , a determination module 230 , a correlation module 240 , the query module 250 , a result module 260 , a performer module 270 , a reference module 280 , and a comparison module 290 , all configured to communicate with each other (e.g., via a bus, shared memory, or a switch). Any one or more of the modules described herein may be implemented using hardware (e.g., one or more processors of a machine) or a combination of hardware and software.
- any module described herein may configure a processor (e.g., among one or more processors of a machine) to perform the operations described herein for that module.
- any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules.
- modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
- FIGS. 3-9 are flowcharts illustrating operations in a method 300 of audio identification (e.g., of an audio piece during a live performance of the audio piece), according to some example embodiments.
- FIG. 3 illustrates some interactions between the identification machine 110 and the device 120 (e.g., a first device) during a performance of the audio piece by the performer 162 . These illustrated interactions may form a portion of the method 300 , according to various example embodiments, or may form a separate method in its entirety, according to alternative example embodiments.
- Operation 310 may be performed at or near the beginning of the performance (e.g., at a first time, such as five or ten seconds into the performance) of the audio piece.
- the identifier module 210 of the identification machine 110 obtains an identifier of the audio piece.
- the identifier may be a title of the audio piece (e.g., a song name). As discussed below with respect to FIG. 6 , the identifier may be obtained in any of several ways.
- the device 120 (e.g., the first device) records a live segment of the audio piece being performed.
- the live segment may be recorded by a microphone built into the device 120 .
- operation 370 may be performed at any point during the performance of the audio piece.
- In operation 318, the device 120 generates a live fingerprint of the live segment recorded in operation 317.
- the device 120 may apply one or more audio fingerprinting techniques (e.g., algorithms) to generate the live fingerprint.
- the audio fingerprinting technique (e.g., a first technique) used by the device 120 in operation 318 may be designated or selected (e.g., by the identification machine 110) as a default technique, based on the presence or absence of processing power, available memory, or both, in the device 120.
- the device 120 communicates (e.g., sends) the generated live fingerprint to the identification machine 110 (e.g., via the network 190 ).
- the reception module 220 of the identification machine 110 accesses (e.g., receives) the generated live fingerprint communicated by the device 120 (e.g., at a second time, such as 15 or 20 seconds into the performance).
- operations 317 - 319 are performed by the device 160 of the performer 162 , or by the mixer 161 .
- the reception module 220 of the identification machine 110 may access the generated live fingerprint as communicated by the device 160 , or by the mixer 161 (e.g., at the second time).
- the audio piece includes multiple audio channels (e.g., 64 separate audio channels being input into the mixer 161 , including a monophonic audio channel for a lead guitar, a monophonic audio channel for a bass guitar, left and right stereo audio channels for a synthesizer keyboard, and eight monophonic microphone channels for a drum kit).
- the entire mix of these multiple channels is used for generating the live fingerprint in operation 318 .
- the generating of the live fingerprint in operation 318 may be based on less than all of these multiple audio channels (e.g., generated from a subset of the multiple audio channels).
- the live fingerprint may be generated exclusively from a monophonic audio channel for lead guitar.
- Some example embodiments of the method 300 include operation 328 .
- the determination module 230 of the identification machine 110 determines that the performance has not yet been completed (e.g., has not finished or is not yet done). As discussed below with respect to FIG. 7 , this determination may be made by determining that one or more live fingerprints of segments of the audio piece being performed fail to indicate an end of the audio piece, an end of the performance of the audio piece, or both. Since the performance is not completed, the determination module 230 may determine that the respective times at which the identifier of the audio piece and the live fingerprint were accessed (e.g., the first time and the second time) occurred during the performance of the audio piece.
- the correlation module 240 of the identification machine 110 assigns the identifier obtained in operation 310 to the live fingerprint received in operation 320 . This may be based on the determination in operation 328 that the performance is not over. Accordingly, the correlation module 240 may assign the identifier to the live fingerprint based on an inference that the performance of the audio piece is still ongoing (e.g., continuing).
- the query module 250 of the identification machine 110 accesses metadata of the audio piece (e.g., from the database 115). For example, the query module 250 may generate a query based on (e.g., inclusive of) the identifier assigned to the live fingerprint in operation 330. In some example embodiments, the query is generated based on the determination in operation 328 that the performance is not finished. The query module 250 may submit the generated query to the database 115, and in response, the database 115 may provide the query module 250 with the metadata of the audio piece or access thereto.
- the result module 260 of the identification machine 110 provides the identifier and some or all of the metadata to the device 120 (e.g., via the network 190 ), during the performance of the audio piece.
- the result module 260 may communicate all or part of the identifier obtained in operation 310 and all or part of the metadata accessed in operation 332 to the device 120 (e.g., for presentation thereon, in whole or in part, to the user 122 ).
- the device 120 accesses (e.g., receives) the information that was communicated (e.g., via the network 190 ) from the result module 260 in operation 340 .
- FIG. 4 illustrates some interactions between identification machine 110 and the device 130 (e.g., a second device) during the same performance of the audio piece by the performer 162 . These illustrated interactions may form a portion of the method 300 , according to various example embodiments, or may form a separate method in its entirety, according to alternative example embodiments.
- the device 130 (e.g., the second device) records a live segment of the audio piece being performed.
- the live segment may be recorded by a microphone built into the device 130 .
- In operation 418, the device 130 generates a live fingerprint of the live segment recorded in operation 417.
- the device 130 may apply one or more audio fingerprinting techniques to generate the live fingerprint.
- the audio fingerprinting technique (e.g., the first technique) used by the device 130 in operation 418 may have been designated or selected (e.g., by the identification machine 110) as a default technique, based on the presence or absence of processing power, available memory, or both, in the device 130.
- the audio fingerprinting technique (e.g., a second technique) to be used by the device 140 in operation 518 is a different (e.g., non-default) technique and may be so designated or selected based on the presence or absence of processing power, available memory, or both, in the device 140 .
- the audio piece includes multiple audio channels (e.g., 64 separate audio channels, including a monophonic audio channel for a lead guitar, a monophonic audio channel for a bass guitar, left and right stereo audio channels for a synthesizer keyboard, and eight monophonic microphone channels for a drum kit).
- the generating of the live fingerprint in operation 418 may be based on less than all of these multiple audio channels (e.g., generated from a subset of the multiple audio channels).
- the live fingerprint may be generated exclusively from a monophonic audio channel for lead guitar.
- the live fingerprint may be generated exclusively from a monophonic vocal track (e.g., using vocal melody and lyrics for generating the live fingerprint).
- the live fingerprint is generated from one or more audio channels that are dominant throughout the audio piece, which may facilitate reliable and consistent identification of the audio piece.
- the device 130 communicates the generated live fingerprint to the identification machine 110 (e.g., via the network 190 ).
- the live fingerprint may be communicated in a query for identification of the audio piece, and such a query may be submitted from the device 130 to the network-based system 105 during the performance of the audio piece.
- the reception module 220 of the identification machine 110 accesses the generated live fingerprint communicated by the device 130 (e.g., at a third time, such as 30 or 35 seconds into the performance).
- the determination module 230 of the identification machine 110 determines that the performance is not done (e.g., not yet ended, completed, finished, or over). As discussed in greater detail below with respect to FIG. 7 , this determination may be made by determining that one or more live fingerprints of segments of the audio piece being performed fail to indicate an end of the audio piece, an end of the performance of the audio piece, or both.
- the result module 260 of the identification machine 110 provides the identifier (e.g., assigned in operation 330 ) and some or all of the metadata to the device 130 (e.g., via the network 190 ).
- the result module 260 may communicate all or part of the identifier obtained in operation 310 and all or part of the metadata accessed in operation 332 to the device 130 (e.g., for presentation thereon, in whole or in part, to the user 132 ).
- the device 130 accesses the information that was communicated from the result module 260 in operation 340 . This may have the effect of providing the identifier of the audio piece in a response to the query for identification of the audio piece, during the performance of the audio piece.
- the identifier may be accompanied by additional information (e.g., metadata of the audio piece).
- additional information may include lyrics, album art, original release year, original composer, other performers of the audio piece, or other metadata of the audio piece, as well as an offer to sell a recording (e.g., original or non-original) of the audio piece.
- the identifier may be accompanied by an authorization, such as an authorization to access backstage passes or a merchandise offer (e.g., for free or discounted merchandise related to the audio piece, to the performer, or to both).
- the authorization enables software (e.g., an application, an applet, or a mobile app) executing on the device 130 to access special content that may be presented on the device 130 (e.g., on a screen of the device 130 ). Examples of such special content include screen lighting or imagery (e.g., a slideshow or background image), a game (e.g., a single-player or multiplayer quiz or treasure hunt), or any suitable combination thereof.
- a game may challenge the user 132 to win a prize (e.g., an album on compact disc (CD) or as a music download, exclusive video footage, a t-shirt, or other merchandise item) by correctly identifying multiple audio pieces performed by the performer 162 or by being the first to correctly identify all songs released on a specific album.
- FIG. 5 illustrates some interactions between the identification machine 110 and the device 140 (e.g., a third device) during a live performance of an audio piece by the performer 162 .
- the live performance is the same performance discussed above with respect to FIGS. 3-4 .
- the performer 162 is performing a live version (e.g., a live cover version) of an audio piece differently from a reference version (e.g., a studio version or radio version) of the audio piece as recorded by an artist who may be the same or different from the performer 162 of the live version.
- the identification machine 110 performs only operations 520 , 530 , 540 , 550 , and 560 (e.g., in response to performance of operations 517 , 518 , and 519 by the device 140 ), without performing any operations described above with respect to FIGS. 3 and 4 .
- the device 140 (e.g., the third device) records a live segment of the audio piece being performed.
- the live segment may be recorded by a microphone built into the device 140 .
- the device 140 may record a live segment of a live version (e.g., a live cover version) of the audio piece, as the live version of the audio piece is being performed.
- the live segment may be received (e.g., as a digital feed, a network stream, a broadcast signal, or any suitable combination thereof) by the device 140 via the network 190 (e.g., from the identification machine 110 , the device 160 , or the mixer 161 ).
- In operation 518, the device 140 generates a live fingerprint of the live segment recorded in operation 517.
- the device 140 may apply one or more audio fingerprinting techniques to generate the live fingerprint.
- the audio fingerprinting technique (e.g., the first technique) to be used by the device 140 in operation 518 may be designated or selected (e.g., by the identification machine 110) as a default technique, based on the presence or absence of processing power, available memory, or both, in the device 140.
- the audio fingerprinting technique (e.g., a second technique) to be used by the device 140 in operation 518 is a different (e.g., non-default) technique and may be so designated or selected based on the presence or absence of processing power, available memory, or both, in the device 140 .
- the audio fingerprinting technique (e.g., the second technique) is particularly suitable for live version identification and may implement one or more image processing techniques to derive fingerprints that are robust to both audio degradations and audio variations, while still being compact enough for efficient matching. Further details on such an audio fingerprinting technique are provided below.
- the device 140 communicates the generated live fingerprint to the identification machine 110 (e.g., via the network 190 ).
- the live fingerprint may be communicated in a query for identification of the audio piece, and such a query may be submitted from the device 140 to the network-based system 105 during the performance of the audio piece (e.g., the live version of the audio piece).
- the reception module 220 of the identification machine 110 accesses the generated live fingerprint communicated by the device 140 (e.g., at any point in time during the performance of the audio piece, such as 5, 10, 15, 20, 30, 40, or 45 seconds into the performance).
- the performer module 270 of the identification machine 110 identifies the performer of the live version of the audio piece.
- the performer module 270 may detect the venue of the live performance (e.g., the place or location where the live performance is occurring) and identify the performer based on the detected venue (e.g., by accessing information, which may be stored in the database 115 , that correlates the performer with the venue).
- the detected venue may be a concert hall, an auditorium, a hotel, a conference room, a resort, a school, a theater, an amphitheater, a fairground, a sports arena, a stadium, a private residence, or any suitable combination thereof. As discussed below, the detection of the venue may be based on a geolocation (e.g., Global Positioning System (GPS) coordinates) of the device 140, an identifier (e.g., Internet protocol (IP) address) of a network (e.g., network 190) at the venue (e.g., a local wireless network at the venue), an image (e.g., photo) of a ticket stub for an event that includes the live performance (e.g., generated by the device 140 and accessed by the performer module 270), a user preference for the venue (e.g., stored in a user profile of the user 142), social network data that references the venue (e.g., publicly or privately published in a microblog entry by the user 142), a calendar event of the user 142, a purchase record of the user 142 (e.g., for tickets to an event that includes the live performance), or any suitable combination thereof.
- the venue may be detected by detecting that the device 140 is located at the venue.
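- as an illustration of one such signal, the following hypothetical sketch compares a device's reported GPS coordinates against a table of known venue locations; the venue table, distance threshold, and function names are assumptions:

```python
# Hypothetical sketch of geolocation-based venue detection.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (latitude, longitude) points, in km.
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def detect_venue(device_lat, device_lon, venues, max_km=0.5):
    """venues: iterable of (name, lat, lon); returns nearest venue within max_km."""
    best_name, best_dist = None, float("inf")
    for name, lat, lon in venues:
        dist = haversine_km(device_lat, device_lon, lat, lon)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_km else None
```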
- the reference module 280 of the identification machine 110 accesses a set of one or more reference fingerprints based on the performer identified in operation 530. Furthermore, the accessing of the reference fingerprints may also be based on the detected venue at which the live version is being performed, a current date, a current time, or any suitable combination thereof. As noted above, the reference module 280 may retrieve a list of audio pieces (e.g., a playlist, a concert program, a concert brochure, or a concert poster) for the performer (e.g., based on the detected venue and the current date and time).
- the reference module 280 may identify reference versions (e.g., official or canonical versions) of the audio pieces that correspond to the performer (e.g., and corresponding to the detected venue, the current date, the current time, or any suitable combination thereof).
- the database 115 may store these reference fingerprints, which may have been previously generated from segments of the reference versions of the audio pieces.
- among these reference fingerprints may be a reference fingerprint (e.g., a particular reference fingerprint) of a reference version of the audio piece of which a live version is currently being performed.
- the set of reference fingerprints may be accessed from the database 115 , which may correlate (e.g., assign, map, or link) the reference fingerprint (e.g., the particular reference fingerprint) of the reference version with the identifier of the audio piece (e.g., as assigned in operation 330 ).
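- a hypothetical sketch of this reference-fingerprint lookup is shown below; the database accessor functions are assumptions used only for illustration:

```python
# Hypothetical sketch: given the identified performer (and optionally the
# detected venue and current date), retrieve the performer's list of audio
# pieces, map each piece to a reference version, and collect precomputed
# reference fingerprints for later comparison.
def reference_fingerprints_for(performer, venue, date, db):
    playlist = db.get_playlist(performer=performer, venue=venue, date=date)
    references = []
    for audio_piece in playlist:
        reference_version = db.get_reference_version(audio_piece)
        fingerprint = db.get_fingerprint(reference_version)
        references.append((audio_piece, fingerprint))
    return references
```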
- operation 540 may be performed at any point prior to operation 550 (e.g., before the performance of the audio piece).
- the accessing of the reference fingerprints may be based on a scheduled date and time for the performance itself.
- the comparison module 290 of the identification machine 110 identifies the audio piece being performed by comparing the live fingerprint (e.g., accessed in operation 520 ) to the set of reference fingerprints (e.g., accessed in operation 540 ).
- the comparison module 290 may compare the live fingerprint of a segment of the live version to the reference fingerprints of segments of the reference versions.
- the comparison module 290 compares the live fingerprint exclusively (e.g., only) to the set of reference fingerprints or a subset thereof. This may have the effect of reducing computational complexity, increasing computational speed, increasing accuracy, or any suitable combination thereof.
- the comparison module 290 may identify a match between the live fingerprint and the reference fingerprint (e.g., the particular reference fingerprint) of the reference version of the audio piece of which the live version is currently being performed. Based on this identifying of the match, the comparison module 290 may identify the audio piece while its live version is being performed.
- the identified match between the live fingerprint and the reference fingerprint may be an imperfect match (e.g., a fuzzy match or a near match).
- operation 550 includes performing an analysis of musically meaningful and unique features of the audio piece, and then performing a loose comparison that allows for differences in the playing and interpretation of the audio piece (e.g., different instrumentation, tempo, or intonation).
- operation 550 includes determining harmonic and rhythmic elements from the live fingerprint and the set of reference fingerprints and comparing these elements to find a most likely candidate match among the set of reference fingerprints.
- Such an analysis and comparison may be performed within a predetermined period of time (e.g., a 10 second window). In some situations, the analysis and comparison are performed in short segments (e.g., 3 second segments).
- the analysis and comparison may be performed until a single match (e.g., best candidate) is found, or until the analysis and comparison converge to obtain a stabilized list of a few candidate matches.
- multiple candidate matches may be identified in situations where the set of reference fingerprints includes reference fingerprints from multiple different recordings of the audio piece (e.g., studio recordings, live recordings, and variations, such as acoustic versions or extended remixes).
- the result module 260 of the identification machine 110 provides the identifier (e.g., as assigned in operation 330 ) of the identified audio piece to the device 140 (e.g., by the network 190 ).
- the identifier may be provided with some or all of the metadata for the audio piece.
- the result module 260 may communicate all or part of the identifier obtained in operation 310 and all or part of the metadata accessed in operation 332 to the device 140 (e.g., for presentation thereon, in whole or in part, to the user 142 ).
- the device 140 accesses the information that was communicated from the result module 260 in operation 560 .
- the identifier may be provided as a candidate identifier (e.g., a proposed identifier) among multiple candidate identifiers (e.g., for confirmation by the user 142 via the device 140 ).
- a candidate identifier may be provided as part of a game (e.g., a trivia quiz) in which multiple users (e.g., users 132 , 142 , and 152 ) attempt to identify the audio piece by selecting the correct candidate identifier from among multiple candidate identifiers presented.
- the audio fingerprinting technique used for identifying the live version of the audio piece may be particularly well-suited for generating fingerprints that are robust to both audio degradations and audio variations, while still being compact enough for efficient matching.
- a fingerprint may be derived from a segment of an audio piece (e.g., a live segment or a reference segment) by first using a log-frequency spectrogram to capture the melodic similarity and handle key variations, and then using adaptive thresholding to reduce the feature size and handle noise degradations and local variations.
- the segment may be transformed into a time-frequency representation, such as a log-frequency spectrogram based on the Constant Q Transform (CQT).
- the CQT is a transform with a logarithmic frequency resolution, similar to the human auditory system and consistent with the notes of the Western music scale. Accordingly, the CQT may be well-suited for music analysis.
- the CQT may handle key variations relatively easily, since pitch deviations correspond to frequency translations in the transform.
- the CQT is computed by using a fast algorithm based on the Fast Fourier Transform (FFT) in conjunction with the use of a kernel.
- a CQT-based spectrogram may be derived by using a time resolution of around 0.13 seconds per time frame and a frequency resolution of one quarter tone per frequency channel, with a frequency range spanning from C3 (130.81 Hz) to C8 (4186.01 Hz), resulting in 120 frequency channels.
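- a minimal sketch of such a CQT-based spectrogram, assuming librosa as the implementation library and a 16 kHz sample rate (neither is specified above), might look like the following:

```python
# Sketch of the CQT-based log-frequency spectrogram described above.
import librosa
import numpy as np

def cqt_spectrogram(path):
    y, sr = librosa.load(path, sr=16000)   # assumed sample rate
    hop = 2048                             # 2048 / 16000 Hz ~= 0.13 s per time frame
    # 5 octaves starting at C3 (130.81 Hz), 24 bins per octave
    # (quarter-tone resolution) -> 120 frequency channels.
    C = librosa.cqt(y, sr=sr, hop_length=hop,
                    fmin=librosa.note_to_hz('C3'),
                    n_bins=120, bins_per_octave=24)
    return np.log1p(np.abs(C))             # log-magnitude spectrogram
```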
- the CQT-based spectrogram may be transformed into a binary image. According to various example embodiments, this is performed using an adaptive thresholding method based on two-dimensional median filtering. Thresholding is a technique for image segmentation that uses a threshold value to turn a grayscale image into a binary image.
- the threshold value for each pixel of an image may be adapted based on local statistics of the pixel's neighborhood. For each time-frequency bin in the CQT-based spectrogram, given a window size, the median of the neighborhood may be computed. As an example, the window size may be 35 frequency channels by 15 time frames. Then, the value of the bin may be compared with its median. If the value of the bin is higher than its median, the bin may be assigned the value 1; otherwise, the bin may be assigned the value 0. This process may be restated as the following equation, where X is the CQT-based spectrogram, F is the resulting binary image, and the median is taken over the window centered on bin (i, j): F(i, j) = 1 if X(i, j) > median(X)(i, j), and F(i, j) = 0 otherwise.
- the CQT-based spectrogram may thus be clustered into foreground (e.g., bins assigned the value one) where the energy is locally high, and background (e.g., bins assigned the value zero) where the energy is locally low.
- the result may therefore be used as a compact fingerprint (e.g., a CQT-based fingerprint) that can handle noise degradations while still allowing local variations.
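- a minimal sketch of the adaptive-thresholding step, assuming scipy's two-dimensional median filter, might look like the following:

```python
# Sketch of the adaptive thresholding step: each bin of the CQT-based
# spectrogram is compared against the median of its local neighborhood
# (window of 35 frequency channels by 15 time frames, as described above).
import numpy as np
from scipy.ndimage import median_filter

def binarize_spectrogram(cqt_spec, freq_window=35, time_window=15):
    local_median = median_filter(cqt_spec, size=(freq_window, time_window))
    # 1 (foreground) where energy is locally high, 0 (background) elsewhere.
    return (cqt_spec > local_median).astype(np.uint8)
```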
- Such compact (e.g., CQT-based) fingerprints may be used to perform comparisons and matching between a query fingerprint and one or more reference fingerprints.
- template matching may be performed (e.g., by the comparison module 290 during operation 550 ) between query and reference fingerprints by first using Hamming similarity to compare all pairs of time frames at different pitch shifts and handle key variations, and then using the Hough Transform to find the best alignment and handle tempo variations.
- a similarity matrix may be computed between a query fingerprint and a reference fingerprint.
- Hamming similarity may be calculated between all pairs of time frames in the query fingerprint and the reference fingerprints.
- the Hamming similarity is the percentage of bins that match between two arrays (e.g., arrays of ones and zeroes).
- Each bin in the resulting matrix measures the Hamming similarity between any two pairs of time frames in the query and reference fingerprints.
- the similarity matrix for different pitch shifts in the query may also be computed. In some cases, a number of ±10 pitch shifts may be used (e.g., assuming a maximum key variation of ±5 semitones between a live performance and its studio version). This may have the effect of measuring the similarity of both the foregrounds and the backgrounds between the query and reference fingerprints, which may be beneficial in identifying an audio piece.
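- a minimal sketch of this similarity computation is shown below; representing pitch shifts with np.roll (which wraps bins around the frequency axis) is a simplification:

```python
# Sketch of the pairwise Hamming-similarity computation between a binary
# query fingerprint and a binary reference fingerprint (each 120 x frames).
import numpy as np

def hamming_similarity_matrix(query_fp, ref_fp):
    Q = query_fp.astype(np.float32)
    R = ref_fp.astype(np.float32)
    n_channels = Q.shape[0]
    # Matching ones plus matching zeros, divided by the number of channels,
    # for every (query frame, reference frame) pair.
    return (Q.T @ R + (1.0 - Q).T @ (1.0 - R)) / n_channels

def similarity_over_pitch_shifts(query_fp, ref_fp, max_shift=10):
    # +/-10 quarter-tone shifts ~ +/-5 semitones of key variation.
    return {shift: hamming_similarity_matrix(np.roll(query_fp, shift, axis=0), ref_fp)
            for shift in range(-max_shift, max_shift + 1)}
```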
- the best alignment between the query fingerprint and the reference fingerprint may be identified.
- the best alignment may correspond to a line that is at or near an angle of 45° in the similarity matrix and that intersects the bins with the largest calculated Hamming similarity.
- the Hough Transform may be used to determine the best alignment.
- the Hough Transform is a technique for detecting shapes (e.g., lines) in an image by building a parameter space matrix and identifying the parameter candidates that give the largest values.
- the similarity matrix computed above may be binarized based on a threshold value.
- the Hough Transform may then be computed, and the (ρ, θ) candidate that gives the largest normalized value in the parameter space matrix may be identified (e.g., as the highest overall Hamming similarity).
- the threshold value may be 0.6; a range for ρ may be equal to the number of time frames in the reference fingerprints; and a range for θ may be around -45°±5°, which may correspond to a number of ±10 time shifts (e.g., assuming a maximum tempo variation of ±20% between a live performance and its studio version).
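- a minimal sketch of this alignment step, assuming scikit-image's Hough line transform (whose angle and distance conventions may differ from the description above) and omitting the normalization by line length, might look like the following:

```python
# Sketch of the alignment search: binarize a similarity matrix and use the
# Hough transform to look for a near-diagonal line of high-similarity bins.
import numpy as np
from skimage.transform import hough_line, hough_line_peaks

def best_alignment_score(similarity, threshold=0.6):
    binary = similarity > threshold
    # Angles near -45 degrees (+/- 5 degrees), per the description above.
    angles = np.deg2rad(np.linspace(-50.0, -40.0, 21))
    hspace, thetas, dists = hough_line(binary, theta=angles)
    accums, _, _ = hough_line_peaks(hspace, thetas, dists, num_peaks=1)
    # Larger accumulator value -> more frames lie on the candidate alignment.
    return float(accums[0]) if len(accums) else 0.0
```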
- This may have the effect of identifying a short and noisy excerpt (e.g., recorded from a smartphone at a live performance) by comparing the excerpt to a database of studio recordings from a known performer or known artist.
- no hash functions are used in the above fingerprinting and matching techniques. This may have the effect of obtaining greater accuracy.
- these fingerprinting and matching techniques may be well suited to relatively short queries (e.g., segments of audio less than 10 seconds in duration) matched against relatively small databases (e.g., 50-100 songs per artist or performer).
- the method 300 or portions thereof may include one or more of operations 610 , 620 , 630 , 640 , 650 , and 660 .
- One or more of operations 610 , 620 , 630 , 640 , 650 , and 660 may be performed as part (e.g., a precursor task, a subroutine, or a portion) of operation 310 , in which the identifier module 210 obtains the identifier of the audio piece.
- the identifier module 210 receives the identifier in a user submission from the user 122 via the device 120 (e.g., the first device).
- the user 122 may be a manager, promoter, moderator, or other authoritative person for the event in which the live performance occurs, and the user 122 may submit the identifier to the network-based system 105 (e.g., so that other users 132 , 142 , and 152 may be able to receive the identifier on their respective devices 130 , 140 , and 150 ).
- the identifier is received from the device 160 of the performer 162 , the mixer 161 , or any suitable combination thereof.
- the identifier module 210 receives some metadata of the audio piece (e.g., without the identifier of the audio piece) from the device 120 (e.g., the first device, as a user submission).
- metadata may include one or more descriptors of the audio piece (e.g., an artist name, an album name, a release year, or a genre).
- the user 122 may be an audience member that does not know the identifier of the audio piece, but knows at least some metadata of the audio piece (e.g., the artist name, the album name, the release year, the genre, or even a portion of the identifier of the audio piece).
- the user 122 may submit what he knows to the network-based system 105 . This operation may be repeated for additional users (e.g., user 152 ) to obtain additional metadata of the audio piece.
- the metadata received in operation 620 (e.g., from one or more users 122 and 152 ) may be a basis (e.g., a sufficient basis) for the identifier module 210 to obtain the identifier of the audio piece (e.g., from the database 115 , which may correlate the metadata with the identifier of the audio piece).
- the metadata is received from the device 160 of the performer 162 , the mixer 161 , or any suitable combination thereof.
- the identifier module 210 detects a geolocation of the device 120 (e.g., the first device). This may be performed based on an indication that the user 122 has made the device 120 available for location-based services (e.g., stored by the database 115 in a user profile for the user 122 ). The detected geolocation may be a basis (e.g., a sufficient basis) for the identifier module 210 to obtain the identifier of the audio piece (e.g., from the database 115 , which may correlate the location of the venue at which the audio piece is being performed with the identifier of the audio piece).
- the identifier module 210 queries the database 115 for the identifier of the audio piece. This query may be made based on the metadata of the audio piece received in operation 620 (e.g., one or more descriptors of the audio piece), the geolocation of the device 120 (e.g., the first device) detected in operation 630 , or any suitable combination thereof.
- the identifier module 210 may have performed multiple instances of operation 610 and received multiple submissions that attempt to submit the identifier of the audio piece (e.g., submissions that include both correct and incorrect identifiers). In situations where the multiple submissions are not unanimous, the identifier module 210 performs operation 650 by tallying votes for the identifier of the audio piece. For example, the identifier module 210 may count the quantity of submissions received for each distinct identifier. In some example embodiments, the identifier with the most votes is selected by the identifier module 210 as the identifier of the audio piece in operation 310 .
- an identifier with less than the largest number of votes is selected based on results from one or more of operations 620 , 630 , and 640 .
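- a minimal sketch of such vote tallying might look like the following; the data shapes are illustrative:

```python
# Sketch of tallying user-submitted identifiers and selecting the most
# frequent one for the currently performed audio piece.
from collections import Counter

def tally_identifier(submissions):
    """submissions: list of identifier strings received from audience devices."""
    if not submissions:
        return None
    identifier, votes = Counter(submissions).most_common(1)[0]
    return identifier
```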
- one or more of the devices 120 , 130 , 140 , and 150 may execute software that implements a game (e.g., a multiplayer quiz or trivia game) that solicits the multiple submissions that attempt to submit the identifier of the audio piece.
- a game may challenge the users 122 , 132 , 142 , and 152 to win a prize (e.g., an album on CD) by correctly identifying multiple audio pieces performed by the performer 162 or by being the first to correctly identify all songs released on a specific album.
- the identifier module 210 accesses the identifier of the audio piece (e.g., directly or indirectly) from the device 160 of the performer 162 , the mixer 161 , or any suitable combination thereof.
- the device 160 may be a computer operated by a disc jockey (DJ) and configured to play the audio piece (e.g., execute the performance of the audio piece).
- the mixer 161 may be or include a computer that executes audio mixing software (e.g., programmed with a list of song names and start times).
- the identifier module 210 may thus obtain (e.g., read) the identifier of the audio piece based on a playlist, current date, current time, or any suitable combination thereof.
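- a hypothetical sketch of reading the current identifier from such a playlist, assuming a simple (start time, song name) layout, might look like the following:

```python
# Hypothetical sketch: read the current identifier from a playlist exported
# by the performer's device or mixer, given the current time.
def identifier_from_playlist(playlist, now):
    """playlist: list of (start_datetime, song_name), sorted by start time."""
    current = None
    for start_time, song_name in playlist:
        if start_time <= now:
            current = song_name   # most recent entry that has already started
        else:
            break
    return current
```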
- the identifier module 210 receives the identifier in response to an event within an audio renderer that is executing on the device 160, the mixer 161, or both. Examples of such an event include a play event, a stop event, a pause event, a scratch event, a playback position timer event, or any suitable combination thereof.
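- A sketch of reading the identifier from a performer-side playlist, as described above; it assumes the device 160 or the mixer 161 can export its playlist as simple (start time, title) entries, which is an illustrative format rather than one defined by this disclosure.

```python
from datetime import datetime

# Hypothetical playlist exported by the performer's playback or mixing software:
# each entry pairs a scheduled start time with a song title.
playlist = [
    (datetime(2014, 4, 22, 20, 0), "Opening Song"),
    (datetime(2014, 4, 22, 20, 4), "Second Song"),
    (datetime(2014, 4, 22, 20, 9), "Encore"),
]

def identifier_from_playlist(playlist, now):
    """Return the title of the most recently started playlist entry, or None."""
    started = [(start, title) for start, title in playlist if start <= now]
    return max(started)[1] if started else None

# Based on the current date and time, the third entry has not started yet.
print(identifier_from_playlist(playlist, datetime(2014, 4, 22, 20, 6)))  # "Second Song"
```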
- the method 300 or portions thereof may include one or more of operations 710 , 720 , 722 , 724 , and 726 .
- example embodiments of the method 300 that include one or more of operations 328 and 428 may include operations 710 and 720 .
- operations 328 and 428 involve the determination module 230 of the identification machine 110 determining that the performance of the audio piece is not done. This determination may be made by determining that one or more live fingerprints of segments of the audio piece being performed fail to indicate an end of the audio piece, an end of the performance of the audio piece, or both.
- the reception module 220 of the identification machine 110 accesses (e.g., receives) one or more live fingerprints of segments of the audio piece. These live fingerprints may be received from one or more devices (e.g., devices 120, 130, 140, and 150), and these received live fingerprints may be used by the determination module 230 in performing operation 328, operation 428, or both. Accordingly, operation 710 may be performed any number of times between operations 310 and 320 and any number of times between operations 310 and 420.
- Operation 720 may be performed as part of operation 328 , in which the determination module 230 determines that the performance of the audio piece is not done. In some example embodiments, operation 720 may be performed as part of operation 428 , which is similar to operation 328 . In operation 720 , the determination module 230 determines that the live fingerprints received in operation 710 fail to indicate an end of the audio piece (e.g., that the fingerprints fail to indicate that the performance of the audio piece has ended). One or more of operations 722 , 724 , and 726 may be performed as part of operation 720 .
- the determination module 230 fails to detect silence beyond a threshold period of time (e.g., a first threshold duration corresponding to a period of silence indicative of an end of a performance).
- the determination in operation 720 that the performance is not over may be based on an absence of silence that lasts longer than this threshold period of time.
- the determination module 230 fails to detect applause beyond a threshold period of time (e.g., a second threshold duration corresponding to a period of clapping or cheering indicative of an end of the performance).
- the determination module 230 fails to detect booing beyond a threshold period of time (e.g., a third threshold duration corresponding to a period of groaning or jeering indicative of an end of the performance).
- the determination in operation 720 that the performance is not over may be based on an absence of booing that lasts longer than this threshold period of time.
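- The silence, applause, and booing checks of operations 722-726 might be sketched as follows, assuming each incoming live segment has already been classified with a coarse label; the segment length and the three threshold durations are placeholder values rather than values stated in this disclosure.

```python
def performance_not_done(recent_labels, seconds_per_segment=5,
                         silence_s=20, applause_s=15, booing_s=15):
    """Return True when the most recent segment labels fail to indicate an end.

    recent_labels: newest-last labels such as "music", "silence", "applause", "booing".
    """
    def trailing_seconds(label):
        run = 0
        for item in reversed(recent_labels):
            if item != label:
                break
            run += 1
        return run * seconds_per_segment

    ended = (trailing_seconds("silence") >= silence_s
             or trailing_seconds("applause") >= applause_s
             or trailing_seconds("booing") >= booing_s)
    return not ended

print(performance_not_done(["music", "music", "applause"]))   # True: only 5 s of applause
print(performance_not_done(["music"] + ["applause"] * 4))     # False: 20 s of applause
```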
- the method 300 or portions thereof may include one or more of operations 831 , 832 , 833 , 834 , 835 , 836 , and 837 .
- One or more of operations 831 - 837 may be performed as part of operation 530 , in which the performer module 270 of the identification machine 110 may identify the performer by detecting the venue of the performance of the audio piece (e.g., a live performance of a live version of the audio piece).
- the performer module 270 accesses a geolocation (e.g., GPS coordinate) of the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520 .
- the geolocation is received with the live fingerprint in operation 520 .
- the performer module 270 accesses an identifier of a network at the venue (e.g., an IP address or a domain name of the network 190 ) from the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520 .
- a network may be or include a local wireless network at the venue.
- the identifier may identify the network 190 to which the device 140 is communicatively coupled.
- the identifier of the network 190 is received with the live fingerprint in operation 520 .
- the performer module 270 accesses an image (e.g., a photo) of a ticket stub for an event that includes the live performance of the audio piece.
- an image may be generated (e.g., captured or taken) by a built-in camera within the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520 .
- the image of the ticket stub is received with the live fingerprint in operation 520 .
- the performer module 270 accesses a user preference for the venue (e.g., stored in a user profile of the user 142 within the database 115 ).
- the database 115 may store a user profile that indicates the venue is the closest of multiple available venues to a residence of the user 142 , who is associated with (e.g., corresponds to) the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520 .
- the user preference for the venue is received with the live fingerprint in operation 520 .
- the performer module 270 accesses social network data of the user 142 (e.g., stored within the database 115 or accessible via the network 190 from a third-party social network server).
- the database 115 may store social network data descriptive of the user 142 (e.g., status updates, microblog posts, images, comments, likes, favorites, or other public, private, or semiprivate publications to friends of the user 142 ), and some or all of the social network data may reference the venue or otherwise indicate that the user 142 is located at the venue where the live performance is taking place at the current date and current time.
- the performer module 270 may detect the venue of the live performance based on the social network data of the user 142 .
- the social network data is received with the live fingerprint in operation 520 .
- the performer module 270 accesses a calendar event of the user 142 (e.g., stored within the database 115 or accessible via the network 190 from a third-party calendar server).
- the database 115 may store calendar data for the user 142 (e.g., meetings, appointments, or other scheduled events), and the accessed calendar event may indicate that the user 142 is located at the venue where the live performance is taking place at the current date and current time.
- the performer module 270 may detect the venue of the live performance based on the calendar event of the user 142 .
- the calendar event is received with the live fingerprint in operation 520 .
- the performer module 270 accesses a purchase record (e.g., transaction record) of the user 142 (e.g., stored within the database 115 or accessible via the network 190 from a third-party transaction server).
- the database 115 may store purchase data for the user 142 (e.g., transaction records for purchases made by the user 142 ), and the purchase record may indicate that the user 142 purchased a ticket (e.g., from the venue) for an event at which the live performance is taking place at the current date and current time.
- the performer module 270 may detect the venue of the live performance based on the purchase record of the user 142 .
- the purchase record is received with the live fingerprint in operation 520 .
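- The geolocation branch of operations 831-837 might look like the following sketch, assuming the database 115 can supply venue coordinates; the haversine distance and the 200-meter radius are illustrative assumptions, with the other signals (network identifier, ticket-stub image, user preference, social network data, calendar event, purchase record) serving as fallbacks.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points."""
    earth_radius_m = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

def detect_venue(device_latlon, venues, max_distance_m=200):
    """Return the nearest known venue within max_distance_m of the device, else None.

    venues: iterable of (venue_name, (lat, lon)) pairs, e.g., retrieved from database 115.
    """
    lat, lon = device_latlon
    best = min(venues, key=lambda v: haversine_m(lat, lon, *v[1]), default=None)
    if best is not None and haversine_m(lat, lon, *best[1]) <= max_distance_m:
        return best[0]
    return None  # fall back to the network identifier, ticket stub, calendar, etc.

venues = [("Main Street Arena", (37.7793, -122.4193)),
          ("Riverside Hall", (37.8044, -122.2712))]
print(detect_venue((37.7790, -122.4190), venues))  # "Main Street Arena"
```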
- the method 300 or portions thereof may include operation 910 , which in turn may include one or more of operations 911 , 912 , 913 , 914 , 915 , and 916 .
- operation 910 may be performed at any point prior to operation 540 , in which the reference module 280 of the identification machine 110 accesses the reference fingerprints.
- operation 910 may be performed prior to the beginning of the performance itself.
- operation 910 is performed each time the performer 162 or an artist that originally recorded the audio piece releases new material (e.g., new recordings of audio pieces).
- operation 910 is performed periodically (e.g., at regularly scheduled intervals of time).
- the reference module 280 of the identification machine 110 builds the set of reference fingerprints to be accessed in operation 540 .
- the reference module 280 may do this by generating some or all of the database 115 .
- One or more of operations 911-916 may be performed as part of operation 910.
- the reference module 280 accesses a schedule for a venue at which an event that includes the live performance will take place.
- the reference module 280 may access a venue schedule in the form of an event calendar (e.g., a concert calendar) for the venue, a playlist for the venue, an agenda for the venue, an advertisement (e.g., poster) for the venue, or any suitable combination thereof.
- the schedule may be accessed from information previously collected and stored in the database 115 or from a third-party server corresponding to the venue itself.
- the accessed schedule may correlate the venue with the performer 162 of the audio piece, correlate the venue with an artist that recorded a reference version of the audio piece (e.g., an original artist that recorded a studio recording of the audio piece or a live recording of the audio piece), correlate the venue with a period of time during which the live fingerprint is received in operation 520 , or any suitable combination thereof.
- the reference module 280 determines (e.g., identifies) the performer 162 based on the schedule accessed in operation 911 .
- the performer 162 may be determined based on the artist being correlated with the venue by the schedule accessed in operation 911 .
- the performer 162 may be determined based on the period of time during which the live fingerprint is received in operation 520 being correlated with the artist by the schedule. This determination of the performer 162 may enable the identification machine 110 to infer the likely audio pieces to be played and thus significantly reduce the number of possible audio pieces that may be performed during the live performance.
- the reference module 280 accesses (e.g., retrieves) studio reference fingerprints of segments of studio recordings by an artist (e.g., original artist).
- the artist is the performer 162 , though this need not be the case.
- the studio reference fingerprints may be accessed from information previously collected and stored in the database 115 or from a third-party server (e.g., corresponding to the venue, to the artist, to the performer 162 , or any suitable combination thereof).
- the reference module 280 accesses (e.g., retrieves) live reference fingerprints of segments of live recordings by the artist (e.g., original artist).
- the artist may be the performer 162 , though this need not be the case.
- the live reference fingerprints may be accessed from information previously collected and stored in the database 115 or from a third-party server (e.g., corresponding to the venue, to the artist, to the performer 162 , or any suitable combination thereof).
- the mixer 161 is the source of one or more segments of a reference version of the audio piece whose live version is being performed, and one or more of the live reference fingerprints are generated (e.g., by the reference module 280 ) from such segments received from the mixer 161 .
- the mixer 161 , the device 160 of the performer 162 , or both may provide the reference module 280 with metadata (e.g., at least some of the metadata accessed in operation 332 ) that describes or identifies the audio piece, one or more live recordings of the audio piece, one or more studio recordings of the audio piece, or any suitable combination thereof (e.g., for storage in the database 115 and for access by the query module 250 ).
- the reference module 280 accesses (e.g., retrieves) a previously played playlist from a previous performance by the same artist (e.g., the performer 162). This may enable the identification machine 110 to further infer the most likely audio pieces to be played and thus even further reduce the number of possible audio pieces that may be performed during a live performance.
- the previously played playlist may be a basis for weighting one or more of multiple candidate identifiers of the audio piece.
- identifiers of audio pieces already performed during the current performance may be accorded lower weights or omitted from consideration, since it may be unlikely that the performer 162 will perform the same audio piece twice in one show, particularly back-to-back or within a short time window (e.g., 20 minutes).
- the reference module 280 accesses (e.g., retrieves) fingerprints for segments of likely or most likely audio pieces to be played by the performer 162 . These accessed fingerprints may then be designated by the reference module 280 as the set of reference fingerprints to be accessed in operation 540 . As noted above, these accessed fingerprints may be stored in the database 115 , for later use (e.g., in operation 540 ).
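- A sketch of how operation 910 might assemble the reference set from operations 911-916 follows, assuming the schedule lookup and fingerprint retrieval are available as simple callables; the repeat window mirrors the 20-minute example above, while the weight values themselves are placeholders.

```python
from datetime import timedelta

def build_reference_set(schedule, venue, now, fetch_fingerprints,
                        already_played, repeat_window=timedelta(minutes=20)):
    """Assemble weighted reference fingerprints for the audio pieces likely to be played.

    schedule:           dict mapping a venue to the performer expected there at `now`.
    fetch_fingerprints: callable(performer) -> {song_id: [studio and live fingerprints]}
                        drawn from the database 115 or a third-party server.
    already_played:     dict of {song_id: time last played} during the current show.
    Returns a list of (song_id, fingerprint, weight) tuples.
    """
    performer = schedule.get(venue)              # operations 911-912: schedule -> performer
    if performer is None:
        return []
    candidates = fetch_fingerprints(performer)   # operations 913-914: studio and live fingerprints
    weighted = []
    for song_id, fingerprints in candidates.items():
        weight = 1.0
        last = already_played.get(song_id)       # operation 915: prior playlists / current show
        if last is not None and now - last < repeat_window:
            weight = 0.1                         # a repeat within the window is unlikely
        weighted.extend((song_id, fp, weight) for fp in fingerprints)
    return weighted                              # operation 916: designate the reference set
```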
- one or more of the methodologies described herein may facilitate identification of an audio piece during its performance. Moreover, one or more of the methodologies described herein may facilitate identification of the audio piece during performance of a live version of the audio piece, even where the live version differs from previously recorded versions of the audio piece. Hence, one or more of the methodologies described herein may facilitate retrieval and presentation of information regarding the identified audio piece (e.g., its identifier and some or all of its metadata) to one or more audience members during performance of the same audio piece. Furthermore, one or more of the methodologies described herein may facilitate identification and tagging of recordings that were made during the performance.
- one or more of the methodologies described herein may obviate a need for certain efforts or resources that otherwise would be involved in identifying an audio piece during the performance of an audio piece. Efforts expended by a user may be reduced by one or more of the methodologies described herein. Computing resources used by one or more machines, databases, or devices (e.g., within the network environment 100 ) may similarly be reduced. Examples of such computing resources include processor cycles, network traffic, memory usage, data storage capacity, power consumption, and cooling capacity.
- FIG. 10 is a block diagram illustrating components of a machine 1000 , according to some example embodiments, able to read instructions 1024 from a machine-readable medium 1022 (e.g., a non-transitory machine-readable medium, a machine-readable storage medium, a computer-readable storage medium, or any suitable combination thereof) and perform any one or more of the methodologies discussed herein, in whole or in part.
- FIG. 10 shows the machine 1000 in the example form of a computer system (e.g., a computer) within which the instructions 1024 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1000 to perform any one or more of the methodologies discussed herein may be executed, in whole or in part.
- the machine 1000 operates as a standalone device or may be connected (e.g., networked) to other machines.
- the machine 1000 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a distributed (e.g., peer-to-peer) network environment.
- the machine 1000 may be a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a cellular telephone, a smartphone, a set-top box (STB), a personal digital assistant (PDA), a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1024 , sequentially or otherwise, that specify actions to be taken by that machine.
- Further, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of the instructions 1024 to perform any one or more of the methodologies discussed herein.
- the machine 1000 includes a processor 1002 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), or any suitable combination thereof), a main memory 1004 , and a static memory 1006 , which are configured to communicate with each other via a bus 1008 .
- the processor 1002 may contain microcircuits that are configurable, temporarily or permanently, by some or all of the instructions 1024 such that the processor 1002 is configurable to perform any one or more of the methodologies described herein, in whole or in part.
- a set of one or more microcircuits of the processor 1002 may be configurable to execute one or more modules (e.g., software modules) described herein.
- the machine 1000 may further include a graphics display 1010 (e.g., a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, a cathode ray tube (CRT), or any other display capable of displaying graphics or video).
- the machine 1000 may also include an alphanumeric input device 1012 (e.g., a keyboard or keypad), a cursor control device 1014 (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, an eye tracking device, or other pointing instrument), a storage unit 1016 , an audio generation device 1018 (e.g., a sound card, an amplifier, a speaker, a headphone jack, or any suitable combination thereof), and a network interface device 1020 .
- the storage unit 1016 includes the machine-readable medium 1022 (e.g., a tangible and non-transitory machine-readable storage medium) on which are stored the instructions 1024 embodying any one or more of the methodologies or functions described herein.
- the instructions 1024 may also reside, completely or at least partially, within the main memory 1004 , within the processor 1002 (e.g., within the processor's cache memory), or both, before or during execution thereof by the machine 1000 . Accordingly, the main memory 1004 and the processor 1002 may be considered machine-readable media (e.g., tangible and non-transitory machine-readable media).
- the instructions 1024 may be transmitted or received over the network 190 via the network interface device 1020 .
- the network interface device 1020 may communicate the instructions 1024 using any one or more transfer protocols (e.g., hypertext transfer protocol (HTTP)).
- the machine 1000 may be a portable computing device, such as a smart phone or tablet computer, and have one or more additional input components 1030 (e.g., sensors or gauges).
- additional input components 1030 include an image input component (e.g., one or more cameras), an audio input component (e.g., a microphone), a direction input component (e.g., a compass), a location input component (e.g., a global positioning system (GPS) receiver), an orientation component (e.g., a gyroscope), a motion detection component (e.g., one or more accelerometers), an altitude detection component (e.g., an altimeter), and a gas detection component (e.g., a gas sensor).
- Inputs harvested by any one or more of these input components may be accessible and available for use by any of the modules described herein.
- the term “memory” refers to a machine-readable medium able to store data temporarily or permanently and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-readable medium 1022 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions.
- machine-readable medium shall also be taken to include any medium, or combination of multiple media, that is capable of storing the instructions 1024 for execution by the machine 1000 , such that the instructions 1024 , when executed by one or more processors of the machine 1000 (e.g., processor 1002 ), cause the machine 1000 to perform any one or more of the methodologies described herein, in whole or in part.
- a “machine-readable medium” refers to a single storage apparatus or device, as well as cloud-based storage systems or storage networks that include multiple storage apparatus or devices.
- machine-readable medium shall accordingly be taken to include, but not be limited to, one or more tangible (e.g., non-transitory) data repositories in the form of a solid-state memory, an optical medium, a magnetic medium, or any suitable combination thereof.
- Modules may constitute software modules (e.g., code stored or otherwise embodied on a machine-readable medium or in a transmission medium), hardware modules, or any suitable combination thereof.
- a “hardware module” is a tangible (e.g., non-transitory) unit capable of performing certain operations and may be configured or arranged in a certain physical manner.
- In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations described herein.
- a hardware module may be implemented mechanically, electronically, or any suitable combination thereof.
- a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations.
- a hardware module may be a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC.
- a hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations.
- a hardware module may include software encompassed within a general-purpose processor or other programmable processor. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
- hardware module should be understood to encompass a tangible entity, and such a tangible entity may be physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein.
- “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware modules) at different times. Software (e.g., a software module) may accordingly configure one or more processors, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
- Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
- processors may be temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions described herein.
- As used herein, “processor-implemented module” refers to a hardware module in which the hardware includes one or more processors.
- processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS).
- At least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an application program interface (API)).
- the performance of certain operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines.
- the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
Description
- The subject matter disclosed herein generally relates to the processing of data. Specifically, the present disclosure addresses systems and methods to facilitate audio identification.
- A performer may give a live performance (e.g., a concert or other live show) before an audience that includes one or more individuals (e.g., audience members, fans, or concertgoers). For example, a musical soloist (e.g., a singer-songwriter) may perform at a concert before such an audience. As another example, a musical group (e.g., a rock band) may perform at a concert before such an audience. As a further example, a theater troupe (e.g., including actors, dancers, and a choir) may perform a theatrical show before such an audience.
- One or more audio pieces (e.g., musical pieces or spoken word pieces) may be performed during a live performance. For example, one or more songs may be performed, and a song may be performed with or without visual accompaniment (e.g., a video, a laser show, or a dance routine). In some situations, the performer of an audio piece is an artist that recorded the audio piece (e.g., as a studio recording or as a live recording). For example, a performer may perform a song that was written and recorded by her herself. In other situations, the performer of an audio piece is different from the artist that recorded the audio piece (e.g., as a studio recording or as a live recording). For example, a performer may perform a cover of a song that was written and recorded by someone else.
- Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.
-
FIG. 1 is a network diagram illustrating a network environment suitable for audio identification, according to some example embodiments. -
FIG. 2 is a block diagram illustrating components of an identification machine suitable for audio identification, according to some example embodiments. -
FIGS. 3-9 are flowcharts illustrating operations in a method of audio identification, according to some example embodiments. -
FIG. 10 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium and perform any one or more of the methodologies discussed herein. - Example methods and systems are directed to audio identification. Examples merely typify possible variations. Unless explicitly stated otherwise, components and functions are optional and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.
- During a live performance (e.g., a live concert) of one or more audio pieces (e.g., songs), one or more audience members (e.g., concertgoers) may use a network-based system to identify an audio piece during its performance (e.g., while the audio piece is being performed). The network-based system may provide its users (e.g., the audience members) with one or more audio identification services. A machine may form all or part of the network-based system and may be configured (e.g., by software) to provide such identification services to one or more users (e.g., concertgoers).
- The machine may be configured to obtain an identifier (e.g., a song name) of an audio piece during a performance of an audio piece (e.g., at a first time, such as five seconds into a song). The identifier may be obtained in any one or more of various ways, including, for example, receiving the identifier as a user submission (e.g., from an audience member, from a venue manager, or from the performer herself), inferring the identifier based on some received metadata of the audio piece (e.g., a partial name of the song, an album on which the song appears, or a release year of the song), inferring the identifier based on a detected geolocation of a device whose user is at the performance, tallying votes for the identifier (e.g., from several audience members), and accessing the identifier directly from a device of the performer (e.g., a mixer, a drum machine, a media player, a smartphone, or a tablet computer).
- A user's device (e.g., smartphone or smart watch configured by a mobile app) may record a segment of the audio piece during its performance, generate a fingerprint of the segment, and upload the fingerprint to the machine. The machine may receive the fingerprint during the performance (e.g., at a second time, such as 15 seconds into the song) and assign the identifier to the fingerprint. This identifier may be provided to the user's device to identify the audio piece. The machine may receive additional information (e.g., one or more additional fingerprints or classifications of additional segments of the audio piece or other audio) from additional users' devices, and the machine may determine from this additional information that the audio piece has not ended (e.g., by failing to detect silence, applause, booing, or any suitable combination thereof). The machine may provide the identifier to any one or more of these additional users' devices.
- According to some example embodiments, another user's device (e.g., configured by a mobile app) may record another segment of the audio piece during its performance, generate another fingerprint of the segment, and submit this fingerprint to the machine as a query for identification of the audio piece. While the performance continues, the machine may receive this fingerprint during the performance (e.g., at a third time, such as 30 seconds into the song) and respond during the performance by providing the identifier, which may be based on its determination that additional information (e.g., one or more additional fingerprints or classifications of additional segments of the audio piece or other audio) from additional users' devices fails to indicate an end of the audio piece.
- According to various example embodiments, the machine may be configured to identify an audio piece, even when a live version (e.g., a live cover version) of the audio piece is being performed differently from a reference version (e.g., a studio version or radio version) of the audio piece as recorded by an artist (e.g., same or different from the performer of the live version). The machine may receive a live fingerprint of the segment of the live version (e.g., within a query for identification of the audio piece during its performance). The fingerprinting technique used here, in contrast to traditional fingerprinting techniques that identify the exact time and frequency positions of audio events, may instead identify one or more core characteristics of the audio piece (e.g., the notes and rhythms present) and be robust to differences between the live version and a reference version of the audio piece (e.g., differences in tempo, vocal timbre, vocal strength, vibrato, instrument tuning, ambient noise, reverberation, or distortions). For example, the fingerprinting technique may be based on a chromagram that represents the harmonic structure of the live version (e.g., mapped to one octave). Such a fingerprinting technique may also be used later to identify and retrieve user-uploaded recordings from the performance (e.g., for copyright clearance purposes, to automatically tag or index such recordings, or any suitable combination thereof). The machine may identify the performer of the live version (e.g., by detecting a venue at which the live version is being performed and accessing information that correlates the detected venue with the performer).
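- A rough illustration of such a chromagram-based fingerprint follows, assuming the audio segment is available as a file and that the librosa library is installed; the hop length, normalization, and binarization choices are illustrative assumptions, not the specific fingerprint defined by this disclosure.

```python
import numpy as np
import librosa

def chroma_fingerprint(path, sr=22050, hop_length=4096):
    """Summarize a recorded segment as a compact, octave-folded chroma fingerprint."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    # Twelve pitch classes per frame; folding to one octave discards much of the
    # timbre, tuning, and loudness detail that differs between live and studio versions.
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop_length)
    chroma = chroma / (chroma.max(axis=0, keepdims=True) + 1e-9)
    # Binarizing each frame against its median adds robustness to level changes and noise.
    return (chroma > np.median(chroma, axis=0, keepdims=True)).astype(np.uint8)

def fingerprint_similarity(fp_live, fp_ref):
    """Fraction of agreeing bits over the best time alignment (a coarse tolerance to tempo)."""
    n = min(fp_live.shape[1], fp_ref.shape[1])
    best = 0.0
    for offset in range(fp_ref.shape[1] - n + 1):
        agreement = (fp_live[:, :n] == fp_ref[:, offset:offset + n]).mean()
        best = max(best, agreement)
    return best
```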
- The machine may then access a set of reference fingerprints that correspond to the artist that recorded the audio piece (e.g., based on the identified performer of the live version). For example, based on the identified performer (e.g., as well as a detected venue, a current date and time, or any suitable combination thereof), the machine may retrieve a list of audio pieces (e.g., a playlist, a concert program, or a concert brochure) that corresponds to the performer. Using the retrieved list, the machine may identify reference versions (e.g., official or canonical versions) of the audio pieces (e.g., recorded by the artist, who may be the same or different from the performer of the live version) and access reference fingerprints of the identified reference versions of the audio pieces. The reference fingerprints may have been previously generated from segments of the reference versions of the audio pieces, and among these reference fingerprints may be a reference fingerprint of the reference version of the audio piece whose live version is currently being performed.
- Accordingly, the machine may compare the live fingerprint of a segment of the live version of an audio piece to the set of reference fingerprints of segments from the reference versions of the audio piece. In some example embodiments, the machine compares the live fingerprint exclusively (e.g., only) to the set of reference fingerprints. Based on this comparison, the machine may identify a match between the live fingerprint and the reference fingerprint and thus identify the audio piece while the audio piece is being performed. Thus, based on this comparison, the machine may provide an identifier of the audio piece in a response to the query for identification of the audio piece. The identifier may be provided during the performance of the live version of the audio piece.
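- Matching the live fingerprint exclusively against the assembled reference set might then be sketched as follows; the 0.7 acceptance threshold is an assumed value, and the similarity callable could be the comparison sketched above.

```python
def identify_audio_piece(live_fp, reference_set, similarity, min_similarity=0.7):
    """Return the identifier of the best-matching reference fingerprint, or None.

    reference_set: iterable of (identifier, reference_fingerprint) pairs, such as the
                   set assembled for the identified performer.
    similarity:    callable(live_fp, ref_fp) -> score in [0, 1], e.g., the
                   fingerprint_similarity sketch shown earlier.
    """
    best_id, best_score = None, 0.0
    for identifier, ref_fp in reference_set:
        score = similarity(live_fp, ref_fp)
        if score > best_score:
            best_id, best_score = identifier, score
    # Only report a match when the best candidate clears the acceptance threshold.
    return best_id if best_score >= min_similarity else None
```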
-
FIG. 1 is a network diagram illustrating a network environment suitable for audio identification, according to some example embodiments. The network environment 100 includes an identification machine 110, a database 115, and devices 120, 130, 140, and 150 (used by respective users 122, 132, 142, and 152), as well as a device 160 and a mixer 161 being operated by a performer 162. The identification machine 110, the database 115, the devices 120, 130, 140, 150, and 160, and the mixer 161 may all be communicatively coupled (e.g., to each other) via a network 190. The identification machine 110, with or without the database 115, may form all or part of a network-based system 105 (e.g., a cloud-based server system configured to provide one or more audio identification services to the devices 120, 130, 140, and 150 and their respective users 122, 132, 142, and 152). The identification machine 110, the database 115, the devices 120, 130, 140, 150, and 160, and the mixer 161 may each be implemented in a computer system, in whole or in part, as described below with respect to FIG. 10.
- The user 122 is not part of the network environment 100, but is associated with the device 120 and may be a user of the device 120. For example, the device 120 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 122. Similarly, the user 132 is not part of the network environment 100, but is associated with the device 130 and may be a user of the device 130. For example, the device 130 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 132. - Likewise, the
user 142 is not part of the network environment 100, but is associated with the device 140. As an example, the device 140 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 142. Moreover, the user 152 is not part of the network environment 100, but is associated with the device 150. As an example, the device 150 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 152. Furthermore, the performer 162 is not part of the network environment 100, but is associated with the device 160 and the mixer 161. As an example, the device 160 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the performer 162. - The
mixer 161 may be or include an audio playback device, an audio mixing device, an audio processing device, or any suitable combination thereof. According to various example embodiments, the mixer 161 may drive (e.g., output signals that represent audio information to) one or more amplifiers, speakers, or other audio output equipment in producing sound for the audience during a performance of an audio piece by the performer 162. In some example embodiments, the mixer 161 is a source of one or more segments of a reference version of an audio piece (e.g., an audio piece to be identified later during performance of the audio piece). In certain example embodiments, the mixer 161 may perform operations described herein for any one or more of the devices 120, 130, 140, and 150. - Any of the machines, databases, or devices shown in
FIG. 1 may be implemented in a general-purpose computer modified (e.g., configured or programmed) by software (e.g., one or more software modules) to be a special-purpose computer to perform one or more of the functions described herein for that machine, database, or device. For example, a computer system able to implement any one or more of the methodologies described herein is discussed below with respect to FIG. 10. As used herein, a “database” is a data storage resource and may store data structured as a text file, a table, a spreadsheet, a relational database (e.g., an object-relational database), a triple store, a hierarchical data store, or any suitable combination thereof. Moreover, any two or more of the machines, databases, or devices illustrated in FIG. 1 may be combined into a single machine, and the functions described herein for any single machine, database, or device may be subdivided among multiple machines, databases, or devices. - The
network 190 may be any network that enables communication between or among machines, databases, and devices (e.g., the identification machine 110 and the device 130). Accordingly, the network 190 may be a wired network, a wireless network (e.g., a mobile or cellular network), or any suitable combination thereof. The network 190 may include one or more portions that constitute a private network, a public network (e.g., the Internet), or any suitable combination thereof. Accordingly, the network 190 may include one or more portions that incorporate a local area network (LAN), a wide area network (WAN), the Internet, a mobile telephone network (e.g., a cellular network), a wired telephone network (e.g., a plain old telephone system (POTS) network), a wireless data network (e.g., WiFi network or WiMax network), or any suitable combination thereof. Any one or more portions of the network 190 may communicate information via a transmission medium. As used herein, “transmission medium” refers to any intangible (e.g., transitory) medium that is capable of communicating (e.g., transmitting) instructions for execution by a machine (e.g., by one or more processors of such a machine), and includes digital or analog communication signals or other intangible media to facilitate communication of such software. -
FIG. 2 is a block diagram illustrating components of the identification machine 110, according to some example embodiments. The identification machine 110 is shown as including an identifier module 210, a reception module 220, a determination module 230, a correlation module 240, a query module 250, a result module 260, a performer module 270, a reference module 280, and a comparison module 290, all configured to communicate with each other (e.g., via a bus, shared memory, or a switch). Any one or more of the modules described herein may be implemented using hardware (e.g., one or more processors of a machine) or a combination of hardware and software. For example, any module described herein may configure a processor (e.g., among one or more processors of a machine) to perform the operations described herein for that module. Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices. -
FIGS. 3-9 are flowcharts illustrating operations in a method 300 of audio identification (e.g., of an audio piece during a live performance of the audio piece), according to some example embodiments. FIG. 3 illustrates some interactions between the identification machine 110 and the device 120 (e.g., a first device) during a performance of the audio piece by the performer 162. These illustrated interactions may form a portion of the method 300, according to various example embodiments, or may form a separate method in its entirety, according to alternative example embodiments. -
Operation 310 may be performed at or near the beginning of the performance (e.g., at a first time, such as five or ten seconds into the performance) of the audio piece. In operation 310, the identifier module 210 of the identification machine 110 obtains an identifier of the audio piece. The identifier may be a title of the audio piece (e.g., a song name). As discussed below with respect to FIG. 6, the identifier may be obtained in any of several ways. - In
operation 317, the device 120 (e.g., the first device) records a live segment of the audio piece being performed. For example, the live segment may be recorded by a microphone built into the device 120. According to various example embodiments, operation 317 may be performed at any point during the performance of the audio piece. - In
operation 318, the device 120 generates a live fingerprint of the live segment recorded in operation 317. For example, the device 120 may apply one or more audio fingerprinting techniques (e.g., algorithms) to generate the live fingerprint. In some example embodiments, the audio fingerprinting technique (e.g., a first technique) used by the device 120 in operation 318 is designated or selected (e.g., by the identification machine 110) as a default technique and may be designated or selected based on the presence or absence of processing power, available memory, or both, in the device 120. - In
operation 319, the device 120 communicates (e.g., sends) the generated live fingerprint to the identification machine 110 (e.g., via the network 190). In corresponding operation 320, the reception module 220 of the identification machine 110 accesses (e.g., receives) the generated live fingerprint communicated by the device 120 (e.g., at a second time, such as 15 or 20 seconds into the performance). - According to certain example embodiments, operations 317-319 are performed by the
device 160 of the performer 162, or by the mixer 161. Thus, in operation 320, the reception module 220 of the identification machine 110 may access the generated live fingerprint as communicated by the device 160, or by the mixer 161 (e.g., at the second time). In some cases, the audio piece includes multiple audio channels (e.g., 64 separate audio channels being input into the mixer 161, including a monophonic audio channel for a lead guitar, a monophonic audio channel for a bass guitar, left and right stereo audio channels for a synthesizer keyboard, and eight monophonic microphone channels for a drum kit). According to various example embodiments, the entire mix of these multiple channels is used for generating the live fingerprint in operation 318. In some example embodiments, the generating of the live fingerprint in operation 318 may be based on less than all of these multiple audio channels (e.g., generated from a subset of the multiple audio channels). For example, the live fingerprint may be generated exclusively from a monophonic audio channel for lead guitar. - Some example embodiments of the
method 300 include operation 328. In operation 328, the determination module 230 of the identification machine 110 determines that the performance has not yet been completed (e.g., has not finished or is not yet done). As discussed below with respect to FIG. 7, this determination may be made by determining that one or more live fingerprints of segments of the audio piece being performed fail to indicate an end of the audio piece, an end of the performance of the audio piece, or both. Since the performance is not completed, the determination module 230 may determine that the respective times at which the identifier of the audio piece and the live fingerprint were accessed (e.g., the first time and the second time) occurred during the performance of the audio piece. - In
operation 330, the correlation module 240 of the identification machine 110 assigns the identifier obtained in operation 310 to the live fingerprint received in operation 320. This may be based on the determination in operation 328 that the performance is not over. Accordingly, the correlation module 240 may assign the identifier to the live fingerprint based on an inference that the performance of the audio piece is still ongoing (e.g., continuing). - In
operation 332, the query module 250 of the identification machine 110 accesses metadata of the audio piece (e.g., from the database 115). For example, the query module 250 may generate a query based on (e.g., inclusive of) the identifier assigned to the live fingerprint in operation 330. In some example embodiments, the query is generated based on the determination in operation 328 that the performance is not finished. The query module 250 may submit the generated query to the database 115, and in response, the database 115 may provide the query module 250 with the metadata of the audio piece or access thereto. - In
operation 340, the result module 260 of the identification machine 110 provides the identifier and some or all of the metadata to the device 120 (e.g., via the network 190), during the performance of the audio piece. For example, the result module 260 may communicate all or part of the identifier obtained in operation 310 and all or part of the metadata accessed in operation 332 to the device 120 (e.g., for presentation thereon, in whole or in part, to the user 122). In corresponding operation 341, the device 120 accesses (e.g., receives) the information that was communicated (e.g., via the network 190) from the result module 260 in operation 340. -
FIG. 4 illustrates some interactions between the identification machine 110 and the device 130 (e.g., a second device) during the same performance of the audio piece by the performer 162. These illustrated interactions may form a portion of the method 300, according to various example embodiments, or may form a separate method in its entirety, according to alternative example embodiments. - In
operation 417, the device 130 (e.g., the second device) records a live segment of the audio piece being performed. For example, the live segment may be recorded by a microphone built into the device 130. - In
operation 418, the device 130 generates a live fingerprint of the live segment recorded in operation 417. For example, the device 130 may apply one or more audio fingerprinting techniques to generate the live fingerprint. In some example embodiments, the audio fingerprinting technique (e.g., a first technique) to be used by the device 130 in operation 418 has been designated or selected (e.g., by the identification machine 110) as a default technique and may be so designated or selected based on the presence or absence of processing power, available memory, or both, in the device 130. However, in alternative example embodiments, the audio fingerprinting technique (e.g., a second technique) to be used by the device 140 in operation 518 is a different (e.g., non-default) technique and may be so designated or selected based on the presence or absence of processing power, available memory, or both, in the device 140. - In some cases, the audio piece includes multiple audio channels (e.g., 64 separate audio channels, including a monophonic audio channel for a lead guitar, a monophonic audio channel for a bass guitar, left and right stereo audio channels for a synthesizer keyboard, and eight monophonic microphone channels for a drum kit). In some example embodiments, the generating of the live fingerprint in
operation 418 may be based on less than all of these multiple audio channels (e.g., generated from a subset of the multiple audio channels). For example, the live fingerprint may be generated exclusively from a monophonic audio channel for lead guitar. As another example, the live fingerprint may be generated exclusively from a monophonic vocal track (e.g., using vocal melody and lyrics for generating the live fingerprint). According to various example embodiments, the live fingerprint is generated from one or more audio channels that are dominant throughout the audio piece, which may facilitate reliable and consistent identification of the audio piece. - In
operation 419, the device 130 communicates the generated live fingerprint to the identification machine 110 (e.g., via the network 190). The live fingerprint may be communicated in a query for identification of the audio piece, and such a query may be submitted from the device 130 to the network-based system 105 during the performance of the audio piece. In corresponding operation 420, the reception module 220 of the identification machine 110 accesses the generated live fingerprint communicated by the device 130 (e.g., at a third time, such as 30 or 35 seconds into the performance). - In
operation 428, the determination module 230 of the identification machine 110 determines that the performance is not done (e.g., not yet ended, completed, finished, or over). As discussed in greater detail below with respect to FIG. 7, this determination may be made by determining that one or more live fingerprints of segments of the audio piece being performed fail to indicate an end of the audio piece, an end of the performance of the audio piece, or both. - In
operation 440, the result module 260 of the identification machine 110 provides the identifier (e.g., assigned in operation 330) and some or all of the metadata to the device 130 (e.g., via the network 190). For example, the result module 260 may communicate all or part of the identifier obtained in operation 310 and all or part of the metadata accessed in operation 332 to the device 130 (e.g., for presentation thereon, in whole or in part, to the user 132). In corresponding operation 441, the device 130 accesses the information that was communicated from the result module 260 in operation 440. This may have the effect of providing the identifier of the audio piece in a response to the query for identification of the audio piece, during the performance of the audio piece. According to various example embodiments, the identifier may be accompanied by additional information (e.g., metadata of the audio piece). Such additional information may include lyrics, album art, original release year, original composer, other performers of the audio piece, or other metadata of the audio piece, as well as an offer to sell a recording (e.g., original or non-original) of the audio piece. - In some example embodiments, the identifier may be accompanied by an authorization, such as an authorization to access backstage passes or a merchandise offer (e.g., for free or discounted merchandise related to the audio piece, to the performer, or to both). In various example embodiments, the authorization enables software (e.g., an application, an applet, or a mobile app) executing on the
device 130 to access special content that may be presented on the device 130 (e.g., on a screen of the device 130). Examples of such special content include screen lighting or imagery (e.g., a slideshow or background image), a game (e.g., a single-player or multiplayer quiz or treasure hunt), or any suitable combination thereof. For example, a game may challenge the user 132 to win a prize (e.g., an album on compact disc (CD) or as a music download, exclusive video footage, a t-shirt, or other merchandise item) by correctly identifying multiple audio pieces performed by the performer 162 or by being the first to correctly identify all songs released on a specific album. -
FIG. 5 illustrates some interactions between the identification machine 110 and the device 140 (e.g., a third device) during a live performance of an audio piece by the performer 162. In some example embodiments, the live performance is the same performance discussed above with respect to FIGS. 3-4. In certain example embodiments, the performer 162 is performing a live version (e.g., a live cover version) of an audio piece differently from a reference version (e.g., a studio version or radio version) of the audio piece as recorded by an artist who may be the same or different from the performer 162 of the live version. These illustrated interactions may form a portion of the method 300, according to various example embodiments, or may form a separate method in its entirety, according to alternative example embodiments. For example, in some example embodiments, the identification machine 110 performs only the operations illustrated in FIG. 5, without also performing the operations discussed above with respect to FIGS. 3 and 4. - In
operation 517, the device 140 (e.g., the third device) records a live segment of the audio piece being performed. For example, the live segment may be recorded by a microphone built into thedevice 140. In particular, thedevice 140 may record a live segment of a live version (e.g., a live cover version) of the audio piece, as the live version of the audio piece is being performed. As another example, the live segment may be received (e.g., as a digital feed, a network stream, a broadcast signal, or any suitable combination thereof) by thedevice 140 via the network 190 (e.g., from theidentification machine 110, thedevice 160, or the mixer 161). - In
operation 518, thedevice 140 generates a live fingerprint of the live segment recorded inoperation 517. For example, thedevice 140 may apply one or more audio fingerprinting techniques to generate the live fingerprint. In some example embodiments, the audio fingerprinting technique (e.g., a first technique) to be used by thedevice 140 inoperation 518 is designated or selected (e.g., by the identification machine 110) as a default technique and may be so designated or selected based on the presence or absence of processing power, available memory, or both, in thedevice 140. However, in alternative example embodiments, the audio fingerprinting technique (e.g., a second technique) to be used by thedevice 140 inoperation 518 is a different (e.g., non-default) technique and may be so designated or selected based on the presence or absence of processing power, available memory, or both, in thedevice 140. In some example embodiments, the audio fingerprinting technique (e.g., the second technique) is particularly suitable for live version identification and may implement one or more image processing techniques to derive fingerprints that are robust to both audio degradations and audio variations, while still being compact enough for efficient matching. Further details on such an audio fingerprinting technique are provided below. - In
operation 519, thedevice 140 communicates the generated live fingerprint to the identification machine 110 (e.g., via the network 190). The live fingerprint may be communicated in a query for identification of the audio piece, and such a query may be submitted from thedevice 140 to the network-basedsystem 105 during the performance of the audio piece (e.g., the live version of the audio piece). Incorresponding operation 520, thereception module 220 of theidentification machine 110 accesses the generated live fingerprint communicated by the device 140 (e.g., at any point in time during the performance of the audio piece, such as 5, 10, 15, 20, 30, 40, or 45 seconds into the performance). - In
operation 530, the performer module 270 of the identification machine 110 identifies the performer of the live version of the audio piece. For example, the performer module 270 may detect the venue of the live performance (e.g., the place or location where the live performance is occurring) and identify the performer based on the detected venue (e.g., by accessing information, which may be stored in the database 115, that correlates the performer with the venue). For example, the detected venue may be a concert hall, an auditorium, a hotel, a conference room, a resort, a school, a theater, an amphitheater, a fairground, a sports arena, a stadium, a private residence, or any suitable combination thereof. As discussed below with respect to FIG. 8, the detection of the venue may be based on a geolocation (e.g., Global Positioning System (GPS) coordinates) of the device 140, an identifier (e.g., Internet protocol (IP) address) of a network (e.g., network 190) at the venue (e.g., a local wireless network at the venue), an image (e.g., photo) of a ticket stub for an event that includes the live performance (e.g., generated by the device 140 and accessed by the performer module 270), a user preference for the venue (e.g., stored in a user profile of the user 142), social network data that references the venue (e.g., publicly or privately published in a microblog entry by the user 142), a calendar event of the user 142, a purchase record of the user 142 (e.g., for tickets to an event that includes the live performance), or any suitable combination thereof. In further example embodiments, the venue may be detected by detecting that the device 140 is executing a special application that corresponds to the venue, is accessing a specific uniform resource locator (URL) that corresponds to the venue, or any suitable combination thereof.
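As an illustration only, the following Python sketch shows one way a reported geolocation could be resolved to a known venue and then correlated with a performer. The venue table, the venue-to-performer mapping, and the function names are hypothetical; in the described system this information would come from the database 115.

```python
# A minimal sketch, assuming the venue/performer correlations live in simple
# in-memory tables rather than the database 115 described above.
import math

VENUES = {                                              # hypothetical venue records
    "royal-hall": {"lat": 40.7505, "lon": -73.9934},
    "lakeside-amphitheater": {"lat": 34.0430, "lon": -118.2673},
}
VENUE_TO_PERFORMER = {"royal-hall": "performer-162"}    # hypothetical correlation

def nearest_venue(lat, lon, max_km=1.0):
    def km(a, b):
        # Equirectangular approximation; adequate for sub-kilometer decisions.
        dx = math.radians(b["lon"] - a["lon"]) * math.cos(math.radians(a["lat"]))
        dy = math.radians(b["lat"] - a["lat"])
        return 6371.0 * math.hypot(dx, dy)
    here = {"lat": lat, "lon": lon}
    venue = min(VENUES, key=lambda v: km(here, VENUES[v]))
    return venue if km(here, VENUES[venue]) <= max_km else None

def performer_for_geolocation(lat, lon):
    # Detect the venue from the device's geolocation, then look up the performer.
    venue = nearest_venue(lat, lon)
    return VENUE_TO_PERFORMER.get(venue) if venue else None
```
- In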
operation 540, thereference module 280 of theidentification machine 110 accesses a set of one or more reference fingerprints based on the performer identified inoperation 530. Furthermore, the accessing of the reference fingerprints may be also based on the detected venue at which the live version is being performed, a current date, current time, or any suitable combination thereof. As noted above, thereference module 280 may retrieve a list of audio pieces (e.g., playlist, concert program, a concert brochure, or concert poster) for the performer (e.g., based on the detected venue and at the current date and current time). Based on this retrieved list, thereference module 280 may identify reference versions (e.g., official or canonical versions) of the audio pieces that correspond to the performer (e.g., and corresponding to the detected venue, the current date, the current time, or any suitable combination thereof). Thedatabase 115 may store these reference fingerprints, which may have been previously generated from segments of the reference versions of the audio pieces. Among these reference fingerprints may be a reference fingerprint (e.g., a particular reference fingerprint) of a reference version of the audio piece of which a live version is currently being performed. The set of reference fingerprints may be accessed from thedatabase 115, which may correlate (e.g., assign, map, or link) the reference fingerprint (e.g., the particular reference fingerprint) of the reference version with the identifier of the audio piece (e.g., as assigned in operation 330). According to various example embodiments,operation 540 may be performed at any point prior to operation 550 (e.g., before the performance of the audio piece). In example embodiments, in whichoperation 540 is performed prior to the beginning of the performance, the accessing of the reference fingerprints may be based on a scheduled date and time for the performance itself. - In
operation 550, thecomparison module 290 of theidentification machine 110 identifies the audio piece being performed by comparing the live fingerprint (e.g., accessed in operation 520) to the set of reference fingerprints (e.g., accessed in operation 540). In other words, thecomparison module 290 may compare the live fingerprint of a segment of the live version to the reference fingerprints of segments of the reference versions. In some example embodiments, thecomparison module 290 compares the live fingerprint exclusively (e.g., only) to the set of reference fingerprints or a subset thereof. This may have the effect of reducing computational complexity, increasing computational speed, increasing accuracy, or any suitable combination thereof. Based on this comparison, thecomparison module 290 may identify a match between the live fingerprint and the reference fingerprint (e.g., the particular reference fingerprint) of the reference version of the audio piece of which the live version is currently being performed. Based on this identifying of the match, thecomparison module 290 may identify the audio piece while its live version is being performed. In some example embodiments, the identified match between the live fingerprint and the reference fingerprint may be an imperfect match (e.g., a fuzzy match or a near match). - According to various example embodiments,
operation 550 includes performing an analysis of musically meaningful and unique features of the audio piece, and then performing a loose comparison that allows for differences in the playing and interpretation of the audio piece (e.g., different instrumentation, tempo, or intonation). In some example embodiments, operation 550 includes determining harmonic and rhythmic elements from the live fingerprint and the set of reference fingerprints and comparing these elements to find a most likely candidate match among the set of reference fingerprints. Such an analysis and comparison may be performed within a predetermined period of time (e.g., a 10 second window). In some situations, the analysis and comparison are performed in short segments (e.g., 3 second segments). The analysis and comparison may be performed until a single match (e.g., best candidate) is found, or until the analysis and comparison converge to obtain a stabilized list of a few candidate matches. For example, multiple candidate matches may be identified in situations where the set of reference fingerprints includes reference fingerprints from multiple different recordings of the audio piece (e.g., studio recordings, live recordings, and variations, such as acoustic versions or extended remixes).
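As an illustration only (not the patent's prescribed matching algorithm), the following Python sketch accumulates per-segment similarity scores until the ranking of candidate matches stabilizes. The function `score_segment` is a hypothetical helper assumed to return a mapping from reference identifiers to similarity scores for one short (e.g., 3-second) segment.

```python
# A minimal sketch of segment-wise convergence toward a stabilized candidate list.
from collections import defaultdict

def identify_over_segments(segments, references, score_segment,
                           stable_rounds=3, top_k=3):
    totals = defaultdict(float)
    previous_top = None
    stable = 0
    for segment in segments:                      # e.g. consecutive 3-second segments
        for ref_id, score in score_segment(segment, references).items():
            totals[ref_id] += score
        ranked = sorted(totals, key=totals.get, reverse=True)[:top_k]
        if ranked == previous_top:
            stable += 1                           # ranking unchanged for another segment
            if stable >= stable_rounds:
                return ranked                     # stabilized list of candidate matches
        else:
            stable = 0
        previous_top = ranked
    return sorted(totals, key=totals.get, reverse=True)[:top_k]
```
- In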
operation 560, the result module 260 of the identification machine 110 provides the identifier (e.g., as assigned in operation 330) of the identified audio piece to the device 140 (e.g., via the network 190). The identifier may be provided with some or all of the metadata for the audio piece. For example, the result module 260 may communicate all or part of the identifier obtained in operation 310 and all or part of the metadata accessed in operation 332 to the device 140 (e.g., for presentation thereon, in whole or in part, to the user 142). In corresponding operation 561, the device 140 accesses the information that was communicated from the result module 260 in operation 560. This may have the effect of providing the identifier of the audio piece in a response to the query for identification of the audio piece, during the performance of the live version of the audio piece. In example embodiments where the identified match between the live fingerprint and the reference fingerprint is an imperfect match (e.g., fuzzy match), the identifier may be provided as a candidate identifier (e.g., a proposed identifier) among multiple candidate identifiers (e.g., for confirmation by the user 142 via the device 140). For example, a candidate identifier may be provided as part of a game (e.g., a trivia quiz) in which multiple users compete to correctly identify audio pieces being performed. - As mentioned above, the audio fingerprinting technique used (e.g., by the device 140) for identifying the live version of the audio piece may be particularly well-suited for generating fingerprints that are robust to both audio degradations and audio variations, while still being compact enough for efficient matching. Such a fingerprint may be derived from a segment of an audio piece (e.g., a live segment or a reference segment) by first using a log-frequency spectrogram to capture the melodic similarity and handle key variations, and then using adaptive thresholding to reduce the feature size and handle noise degradations and local variations.
- First, the segment may be transformed into a time-frequency representation, such as a log-frequency spectrogram based on the Constant Q Transform (CQT). The CQT is a transform with a logarithmic frequency resolution, similar to the human auditory system and consistent with the notes of the Western music scale. Accordingly, the CQT may be well-suited for music analysis. The CQT may handle key variations relatively easily, since pitch deviations correspond to frequency translations in the transform. According to certain example embodiments, the CQT is computed by using a fast algorithm based on the Fast Fourier Transform (FFT) in conjunction with the use of a kernel. Thus, a CQT-based spectrogram may be derived by using a time resolution of around 0.13 seconds per time frame and a frequency resolution of one quarter tone per frequency channel, with a frequency range spanning from C3 (130.81 Hz) to C8 (4186.01 Hz), resulting in 120 frequency channels.
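As an illustration only, a CQT-based spectrogram with approximately these parameters could be computed with the third-party librosa library. The file name, sampling rate, and helper name below are assumptions rather than requirements of the described technique.

```python
# A minimal sketch, assuming librosa is installed and "segment.wav" is a
# hypothetical recorded segment of the audio piece.
import librosa
import numpy as np

def cqt_spectrogram(path="segment.wav", sr=22050):
    y, _ = librosa.load(path, sr=sr, mono=True)
    # ~0.13 s per time frame; librosa requires the hop length to be a
    # multiple of 2**(n_octaves - 1) = 16 for a 5-octave CQT.
    hop = 16 * max(1, round(0.13 * sr / 16))
    cqt = librosa.cqt(
        y,
        sr=sr,
        hop_length=hop,
        fmin=librosa.note_to_hz("C3"),   # 130.81 Hz
        n_bins=120,                      # C3 to C8 at quarter-tone spacing
        bins_per_octave=24,              # 2 channels per semitone
    )
    return np.abs(cqt)                   # magnitude spectrogram, 120 x n_frames
```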
- Next, the CQT-based spectrogram may be transformed into a binary image. According to various example embodiments, this is performed using an adaptive thresholding method based on two-dimensional median filtering. Thresholding is a technique for image segmentation that uses a threshold value to turn a grayscale image into a binary image. In adaptive thresholding, the threshold value for each pixel of an image may be adapted based on local statistics of the pixel's neighborhood. For each time-frequency bin in the CQT-based spectrogram, given a window size, the median of the neighborhood may be computed. As an example, the window size may be 35 frequency channels by 15 time frames. Then, the value of the bin may be compared with the value of its median. If the value of the bin is higher than its median, the bin may be assigned a value of 1. Otherwise, the bin may be assigned a value of 0. This process may be restated as the following equation:
- F(i, j) = 1 if X(i, j) > median_W(X)(i, j), and F(i, j) = 0 otherwise, where X is the CQT-based spectrogram, median_W(X) is the result of two-dimensional median filtering of X over the window W (e.g., 35 frequency channels by 15 time frames), i indexes the frequency channels, j indexes the time frames, and F is the resulting binary image.
- Accordingly, the CQT-based spectrogram may be clustered into foreground (e.g., with assigned values of one) where the energy is locally high, or background (e.g., with assigned values of zero) where the energy is locally low. The result may therefore be used as a compact fingerprint (e.g., a CQT-based fingerprint) that can handle noise degradations while still allowing local variations.
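A minimal sketch of this binarization step is shown below, assuming SciPy's two-dimensional median filter is available; the function and variable names are illustrative only.

```python
# A minimal sketch of the adaptive-thresholding step; `spectrogram` is a
# (120 x n_frames) magnitude CQT array such as the one returned above.
import numpy as np
from scipy.ndimage import median_filter

def binarize(spectrogram, freq_window=35, time_window=15):
    # Two-dimensional median filtering: for each time-frequency bin, compute
    # the median of its neighborhood (35 frequency channels x 15 time frames).
    local_median = median_filter(spectrogram,
                                 size=(freq_window, time_window),
                                 mode="reflect")
    # Bins above their local median become foreground (1); others background (0).
    return (spectrogram > local_median).astype(np.uint8)
```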
- Such compact (e.g., CQT-based) fingerprints may be used to perform comparisons and matching between a query fingerprint and one or more reference fingerprints. As an example, template matching may be performed (e.g., by the
comparison module 290 during operation 550) between query and reference fingerprints by first using Hamming similarity to compare all pairs of time frames at different pitch shifts and handle key variations, and then using the Hough Transform to find the best alignment and handle tempo variations. - First, a similarity matrix may be computed between a query fingerprint and a reference fingerprint. As noted above, Hamming similarity may be calculated between all pairs of time frames in the query fingerprint and the reference fingerprints. The Hamming similarity is the percentage of bins that match between two arrays (e.g., arrays of ones and zeroes). In some example embodiments, the query and reference fingerprints are converted according to the function ƒ(x)=2x−1. Then, the matrix product of the query and reference fingerprints may be computed. This matrix product may then be converted according to the function ƒ−1(x)=(x+1)/2, and each value may be normalized by the number of frequency channels in one fingerprint. Each bin in the resulting matrix then measures the Hamming similarity between a pair of time frames, one from the query fingerprint and one from the reference fingerprint. The similarity matrix for different pitch shifts in the query may also be computed. In some cases, a number of ±10 pitch shifts may be used (e.g., assuming a maximum key variation of ±5 semitones between a live performance and its studio version). This may have the effect of measuring the similarity of both the foregrounds and the backgrounds between the query and reference fingerprints, which may be beneficial in identifying an audio piece.
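As an illustration only, the similarity-matrix computation described above could be sketched as follows with NumPy. The zero padding used for pitch-shifted queries is an assumption, since the text does not specify how shifted-out channels are handled.

```python
# A minimal sketch of the Hamming-similarity matrix between binary fingerprints.
# `query` and `ref` are arrays of shape (n_channels, n_frames), e.g. produced
# by the binarize() sketch above.
import numpy as np

def hamming_similarity_matrix(query, ref):
    # Map {0, 1} -> {-1, +1} with f(x) = 2x - 1 and take the matrix product;
    # (dot + n) / (2n) then equals the fraction of matching bins per frame pair.
    q = 2.0 * query - 1.0
    r = 2.0 * ref - 1.0
    n_channels = query.shape[0]
    return (q.T @ r + n_channels) / (2.0 * n_channels)   # shape: (q_frames, r_frames)

def similarity_matrices_with_pitch_shifts(query, ref, max_shift=10):
    # One similarity matrix per pitch shift of the query (+/-10 channel shifts,
    # i.e. roughly +/-5 semitones at quarter-tone spacing). Shifted-out rows are
    # zero-padded; this padding choice is an assumption.
    matrices = {}
    n = query.shape[0]
    for shift in range(-max_shift, max_shift + 1):
        shifted = np.zeros_like(query)
        if shift >= 0:
            shifted[shift:, :] = query[:n - shift, :]
        else:
            shifted[:shift, :] = query[-shift:, :]
        matrices[shift] = hamming_similarity_matrix(shifted, ref)
    return matrices
```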
- Next, the best alignment between the query fingerprint and the reference fingerprint may be identified. For example, the best alignment may correspond to a line that is at or near an angle of 45° in the similarity matrix and that intersects the bins with the largest calculated Hamming similarity. Such a line may be parametrically represented as ρ=x cos θ+y sin θ. As noted above, the Hough Transform may be used to determine the best alignment. The Hough Transform is a technique for detecting shapes (e.g., lines) in an image by building a parameter space matrix and identifying the parameter candidates that give the largest values. In some example embodiments, the similarity matrix computed above may be binarized based on a threshold value. The Hough Transform may then be computed, and the (ρ,θ) candidate that gives the largest normalized value in the space parameter matrix may be identified (e.g., as the highest overall Hamming similarity). As examples, the threshold value may be 0.6; a range for ρ may be equal to the number of time frames in the reference fingerprints; and a range for θ may be around −45°±5°, which may correspond to a number of ±10 time shifts (e.g., assuming a maximum tempo variation of ±20% between a live performance and its studio version). This may have the effect of identifying a short and noisy excerpt (e.g., recorded from a smartphone at a live performance) by comparing the excerpt to a database of studio recordings from a known performer or known artist. According to certain example embodiments, no hash functions are used in the above fingerprinting and matching techniques. This may have the effect of obtaining greater accuracy. In situations with relatively short queries (e.g., segments of audio less than 10 seconds in duration) and relatively small databases (e.g., 50-100 songs per artist or performer), the lack of hash functions may provide such increased accuracy without sacrificing system performance.
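As an illustration only, the alignment step could be sketched with scikit-image's Hough line transform as shown below. The normalization of the accumulator is an assumption, since the text does not specify exactly how the normalized value is computed.

```python
# A minimal sketch of finding the best near-diagonal alignment in a
# Hamming-similarity matrix `sim`, such as one computed above.
import numpy as np
from skimage.transform import hough_line

def best_alignment(sim, threshold=0.6, angle_tolerance_deg=5.0):
    # Binarize the similarity matrix, then search for a near-diagonal line
    # (theta around -45 degrees, i.e. a roughly constant-tempo alignment).
    binary = sim > threshold
    thetas = np.deg2rad(np.linspace(-45.0 - angle_tolerance_deg,
                                    -45.0 + angle_tolerance_deg, 21))
    accumulator, angles, dists = hough_line(binary, theta=thetas)
    # Normalize by an upper bound on how many bins a line could cross, so that
    # long and short alignments remain comparable (normalization is assumed).
    norm = accumulator / float(min(binary.shape))
    idx = np.unravel_index(np.argmax(norm), norm.shape)
    return {
        "score": float(norm[idx]),        # proxy for the overall Hamming similarity
        "theta": float(angles[idx[1]]),   # tempo-related angle of the best line
        "rho": float(dists[idx[0]]),      # offset of the best line
    }
```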
- As shown in
FIG. 6, the method 300 or portions thereof may include one or more of operations 610, 620, 630, 640, 650, and 660, one or more of which may be performed as part of operation 310, in which the identifier module 210 obtains the identifier of the audio piece. In operation 610, according to some example embodiments, the identifier module 210 receives the identifier in a user submission from the user 122 via the device 120 (e.g., the first device). For example, the user 122 may be a manager, promoter, moderator, or other authoritative person for the event in which the live performance occurs, and the user 122 may submit the identifier to the network-based system 105 (e.g., so that other users may be presented with the identifier on their respective devices). In some example embodiments, the identifier is received from the device 160 of the performer 162, the mixer 161, or any suitable combination thereof. - In
operation 620, according to certain example embodiments, theidentifier module 210 receives some metadata of the audio piece (e.g., without the identifier of the audio piece) from the device 120 (e.g., the first device, as a user submission). Such metadata may include one or more descriptors of the audio piece (e.g., an artist name, an album name, a release year, or a genre). For example, theuser 122 may be an audience member that does not know the identifier of the audio piece, but knows at least some metadata of the audio piece (e.g., the artist name, the album name, the release year, the genre, or even a portion of the identifier of the audio piece). In such a situation, theuser 122 may submit what he knows to the network-basedsystem 105. This operation may be repeated for additional users (e.g., user 152) to obtain additional metadata of the audio piece. The metadata received in operation 620 (e.g., from one ormore users 122 and 152) may be a basis (e.g., a sufficient basis) for theidentifier module 210 to obtain the identifier of the audio piece (e.g., from thedatabase 115, which may correlate the metadata with the identifier of the audio piece). In some example embodiments, the metadata is received from thedevice 160 of theperformer 162, themixer 161, or any suitable combination thereof. - In
operation 630, theidentifier module 210 detects a geolocation of the device 120 (e.g., the first device). This may be performed based on an indication that theuser 122 has made thedevice 120 available for location-based services (e.g., stored by thedatabase 115 in a user profile for the user 122). The detected geolocation may be a basis (e.g., a sufficient basis) for theidentifier module 210 to obtain the identifier of the audio piece (e.g., from thedatabase 115, which may correlate the location of the venue at which the audio piece is being performed with the identifier of the audio piece). - In
operation 640, theidentifier module 210 queries thedatabase 115 for the identifier of the audio piece. This query may be made based on the metadata of the audio piece received in operation 620 (e.g., one or more descriptors of the audio piece), the geolocation of the device 120 (e.g., the first device) detected inoperation 630, or any suitable combination thereof. - In
operation 650, the identifier module 210 may have performed multiple instances of operation 610 and received multiple submissions that attempt to submit the identifier of the audio piece (e.g., submissions that include both correct and incorrect identifiers). In situations where the multiple submissions are not unanimous, the identifier module 210 performs operation 650 by tallying votes for the identifier of the audio piece. For example, the identifier module 210 may count the quantity of submissions received for each distinct identifier. In some example embodiments, the identifier with the most votes is selected by the identifier module 210 as the identifier of the audio piece in operation 310. In alternative example embodiments, an identifier with less than the largest number of votes is selected based on results from one or more of the other operations shown in FIG. 6 (e.g., operations 620, 630, and 640).
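As an illustration only, the vote-tallying step could be sketched as follows; submissions are assumed to arrive as plain identifier strings, and tie-breaking is not addressed.

```python
# A minimal sketch of counting submissions per distinct identifier.
from collections import Counter

def tally_identifier_votes(submissions):
    votes = Counter(submissions)                  # count submissions per identifier
    identifier, count = votes.most_common(1)[0]   # identifier with the most votes
    return identifier, count, votes

# Example: tally_identifier_votes(["song-a", "song-b", "song-a"]) -> ("song-a", 2, ...)
```
- In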
operation 660, the identifier module 210 accesses the identifier of the audio piece (e.g., directly or indirectly) from the device 160 of the performer 162, the mixer 161, or any suitable combination thereof. For example, in a nightclub environment, the device 160 may be a computer operated by a disc jockey (DJ) and configured to play the audio piece (e.g., execute the performance of the audio piece). As another example, the mixer 161 may be or include a computer that executes audio mixing software (e.g., programmed with a list of song names and start times). The identifier module 210 may thus obtain (e.g., read) the identifier of the audio piece based on a playlist, current date, current time, or any suitable combination thereof. In some example embodiments, the identifier module 210 receives the identifier in response to an event within an audio renderer that is executing on the device 160, the mixer 161, or both. Examples of such an event include a play event, a stop event, a pause event, a scratch event, a playback position timer event, or any suitable combination thereof. - As shown in
FIG. 7, the method 300 or portions thereof may include one or more of operations 710, 720, 722, 724, and 726. These operations may be included in example embodiments of the method 300 that include one or more of operations 328 and 428, which involve the determination module 230 of the identification machine 110 determining that the performance of the audio piece is not done. This determination may be made by determining that one or more live fingerprints of segments of the audio piece being performed fail to indicate an end of the audio piece, an end of the performance of the audio piece, or both. - In
operation 710, the reception module 220 of the identification machine 110 accesses (e.g., receives) one or more live fingerprints of segments of the audio piece. These live fingerprints may be received from one or more devices (e.g., one or more of the devices described above) and may be used by the determination module 230 in performing operation 328, operation 420, or both. Accordingly, operation 710 may be performed any number of times during the performance of the audio piece. -
Operation 720 may be performed as part of operation 328, in which the determination module 230 determines that the performance of the audio piece is not done. In some example embodiments, operation 720 may be performed as part of operation 428, which is similar to operation 328. In operation 720, the determination module 230 determines that the live fingerprints received in operation 710 fail to indicate an end of the audio piece (e.g., that the fingerprints fail to indicate that the performance of the audio piece has ended). One or more of operations 722, 724, and 726 may be performed as part of operation 720. - In
operation 722, thedetermination module 230 fails to detect silence beyond a threshold period of time (e.g., first threshold duration corresponding to a period of silence indicative of an end of a performance). Thus, the determination inoperation 720 that the performance is not over may be based on an absence of silence that lasts longer than this threshold period of time. - In
operation 724, thedetermination module 230 fails to detect applause beyond a threshold period of time (e.g., a second threshold duration corresponding to a period of clapping or cheering indicative of an end of the performance). Thus, the determination inoperation 720 that the performance is not over may be based on an absence of applause that lasts longer than this threshold period of time. - In
operation 726, the determination module 230 fails to detect booing beyond a threshold period of time (e.g., a third threshold duration corresponding to a period of groaning or jeering indicative of an end of the performance). Thus, the determination in operation 720 that the performance is not over may be based on an absence of booing that lasts longer than this threshold period of time.
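The description above does not prescribe how silence, applause, or booing is detected. As an illustration only, a crude energy-based check for sustained silence might look like the following sketch, in which all thresholds are assumptions.

```python
# A minimal sketch: report whether silence persisted beyond a threshold
# duration. `samples` is a mono float array and `sr` its sample rate.
import numpy as np

def sustained_silence(samples, sr, rms_threshold=0.01,
                      min_duration_s=5.0, frame_s=0.1):
    frame = max(1, int(frame_s * sr))
    run = 0.0
    for start in range(0, len(samples) - frame + 1, frame):
        rms = np.sqrt(np.mean(samples[start:start + frame] ** 2))
        run = run + frame_s if rms < rms_threshold else 0.0
        if run >= min_duration_s:
            return True        # silence lasted beyond the threshold period
    return False               # no such silence: performance likely not done
```
- As shown in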
FIG. 8, the method 300 or portions thereof may include one or more of operations 831, 832, 833, 834, 835, 836, and 837, one or more of which may be performed as part of operation 530, in which the performer module 270 of the identification machine 110 may identify the performer by detecting the venue of the performance of the audio piece (e.g., a live performance of a live version of the audio piece). - In
operation 831, theperformer module 270 accesses a geolocation (e.g., GPS coordinate) of the device 140 (e.g., the third device) from which the live fingerprint was received inoperation 520. In some example embodiments, the geolocation is received with the live fingerprint inoperation 520. - In
operation 832, theperformer module 270 accesses an identifier of a network at the venue (e.g., an IP address or a domain name of the network 190) from the device 140 (e.g., the third device) from which the live fingerprint was received inoperation 520. Such a network may be or include a local wireless network at the venue. For example, the identifier may identify thenetwork 190 to which thedevice 140 is communicatively coupled. In some example embodiments, the identifier of thenetwork 190 is received with the live fingerprint inoperation 520. - In
operation 833, theperformer module 270 accesses an image (e.g., a photo) of a ticket stub for an event that includes the live performance of the audio piece. For example, such an image may be generated (e.g., captured or taken) by a built-in camera within the device 140 (e.g., the third device) from which the live fingerprint was received inoperation 520. In some example embodiments, the image of the ticket stub is received with the live fingerprint inoperation 520. - In
operation 834, theperformer module 270 accesses a user preference for the venue (e.g., stored in a user profile of theuser 142 within the database 115). For example, thedatabase 115 may store a user profile that indicates the venue is the closest of multiple available venues to a residence of theuser 142, who is associated with (e.g., corresponds to) the device 140 (e.g., the third device) from which the live fingerprint was received inoperation 520. In some example embodiments, the user preference for the venue is received with the live fingerprint inoperation 520. - In
operation 835, theperformer module 270 accesses social network data of the user 142 (e.g., stored within thedatabase 115 or accessible via thenetwork 190 from a third-party social network server). For example, thedatabase 115 may store social network data descriptive of the user 142 (e.g., status updates, microblog posts, images, comments, likes, favorites, or other public, private, or semiprivate publications to friends of the user 142), and some or all of the social network data may reference the venue or otherwise indicate that theuser 142 is located at the venue where the live performance is taking place at the current date and current time. Since theuser 142 is associated with (e.g., corresponds to) the device 140 (e.g., the third device) from which the live fingerprint was received inoperation 520, theperformer module 270 may detect the venue of the live performance based on the social network data of theuser 142. In some example embodiments, the social network data is received with the live fingerprint inoperation 520. - In
operation 836, theperformer module 270 accesses a calendar event of the user 142 (e.g., stored within thedatabase 115 or accessible via thenetwork 190 from a third-party calendar server). For example, thedatabase 115 may store calendar data for the user 142 (e.g., meetings, appointments, or other scheduled events), and the accessed calendar event may indicate that theuser 142 is located at the venue where the live performance is taking place at the current date and current time. Since theuser 142 is associated with (e.g., corresponds to) the device 140 (e.g., the third device) from which the live fingerprint was received inoperation 520, theperformer module 270 may detect the venue of the live performance based on the calendar event of theuser 142. In some example embodiments, the calendar event is received with the live fingerprint inoperation 520. - In
operation 837, theperformer module 270 accesses a purchase record (e.g., transaction record) of the user 142 (e.g., stored within thedatabase 115 or accessible via thenetwork 190 from a third-party transaction server). For example, thedatabase 115 may store purchase data for the user 142 (e.g., transaction records for purchases made by the user 142), and the purchase record may indicate that theuser 142 purchased a ticket (e.g., from the venue) for an event at which the live performance is taking place at the current date and current time. Since theuser 142 is associated with (e.g., corresponds to) the device 140 (e.g., the third device) from which the live fingerprint was received inoperation 520, theperformer module 270 may detect the venue of the live performance based on the purchase record of theuser 142. In some example embodiments, the purchase record is received with the live fingerprint inoperation 520. - As shown in
FIG. 9, the method 300 or portions thereof may include operation 910, which in turn may include one or more of operations 911, 912, 913, 914, 915, and 916. Operation 910 may be performed at any point prior to operation 540, in which the reference module 280 of the identification machine 110 accesses the reference fingerprints. For example, operation 910 may be performed prior to the beginning of the performance itself. In some example embodiments, operation 910 is performed each time the performer 162 or an artist that originally recorded the audio piece releases new material (e.g., new recordings of audio pieces). In certain example embodiments, operation 910 is performed periodically (e.g., at regularly scheduled intervals of time). - In
operation 910, the reference module 280 of the identification machine 110 builds the set of reference fingerprints to be accessed in operation 540. The reference module 280 may do this by generating some or all of the database 115. One or more of operations 911-916 may be performed as part of operation 910. - In
operation 911, the reference module 280 accesses a schedule for a venue at which an event that includes the live performance will take place. For example, the reference module 280 may access a venue schedule in the form of an event calendar (e.g., a concert calendar) for the venue, a playlist for the venue, an agenda for the venue, an advertisement (e.g., poster) for the venue, or any suitable combination thereof. The schedule may be accessed from information previously collected and stored in the database 115 or from a third-party server corresponding to the venue itself. According to various example embodiments, the accessed schedule may correlate the venue with the performer 162 of the audio piece, correlate the venue with an artist that recorded a reference version of the audio piece (e.g., an original artist that recorded a studio recording of the audio piece or a live recording of the audio piece), correlate the venue with a period of time during which the live fingerprint is received in operation 520, or any suitable combination thereof. - In
operation 912, thereference module 280 determines (e.g., identifies) theperformer 162 based on the schedule accessed inoperation 911. For example, theperformer 162 may be determined based on the artist being correlated with the venue by the schedule accessed inoperation 911. As another example, theperformer 162 may be determined based on the period of time during which the live fingerprint is received inoperation 520 being correlated with the artist by the schedule. This determination of theperformer 162 may enable theidentification machine 110 to infer the likely audio pieces to be played and thus significantly reduce the number of possible audio pieces that may be performed during the live performance. - In
operation 913, thereference module 280 accesses (e.g., retrieves) studio reference fingerprints of segments of studio recordings by an artist (e.g., original artist). In some example embodiments, the artist is theperformer 162, though this need not be the case. The studio reference fingerprints may be accessed from information previously collected and stored in thedatabase 115 or from a third-party server (e.g., corresponding to the venue, to the artist, to theperformer 162, or any suitable combination thereof). - In
operation 914, the reference module 280 accesses (e.g., retrieves) live reference fingerprints of segments of live recordings by the artist (e.g., original artist). As noted above, the artist may be the performer 162, though this need not be the case. The live reference fingerprints may be accessed from information previously collected and stored in the database 115 or from a third-party server (e.g., corresponding to the venue, to the artist, to the performer 162, or any suitable combination thereof). In some example embodiments where the performer 162 is the artist, the mixer 161 is the source of one or more segments of a reference version of the audio piece whose live version is being performed, and one or more of the live reference fingerprints are generated (e.g., by the reference module 280) from such segments received from the mixer 161. In addition, the mixer 161, the device 160 of the performer 162, or both, may provide the reference module 280 with metadata (e.g., at least some of the metadata accessed in operation 332) that describes or identifies the audio piece, one or more live recordings of the audio piece, one or more studio recordings of the audio piece, or any suitable combination thereof (e.g., for storage in the database 115 and for access by the query module 250). - In
operation 915, the reference module 280 accesses (e.g., retrieves) a previously played playlist from a previously performed performance by the same artist (e.g., the performer 162). This may enable the identification machine 110 to further infer the most likely audio pieces to be played and thus even further reduce the number of possible audio pieces that may be performed during a live performance. According to some example embodiments, the previously played playlist may be a basis for weighting one or more of multiple candidate identifiers of the audio piece. Similarly, identifiers of audio pieces already performed during the current performance may be accorded lower weights or omitted from consideration, since it may be unlikely that the performer 162 will perform the same audio piece twice in one show, particularly back-to-back or within a short time window (e.g., 20 minutes).
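As an illustration only, such weighting could be sketched as follows; the boost and penalty factors are illustrative assumptions rather than values taken from the description above.

```python
# A minimal sketch: boost candidates the artist has played at prior shows and
# penalize candidates already performed during the current show.
def weight_candidates(candidate_scores, past_playlists, already_played,
                      playlist_boost=1.5, replay_penalty=0.1):
    played_before = {song for playlist in past_playlists for song in playlist}
    weighted = {}
    for identifier, score in candidate_scores.items():
        w = score
        if identifier in played_before:
            w *= playlist_boost        # appeared on a previously played playlist
        if identifier in already_played:
            w *= replay_penalty        # unlikely to be repeated in the same show
        weighted[identifier] = w
    return sorted(weighted, key=weighted.get, reverse=True)
```
- In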
operation 916, the reference module 280 accesses (e.g., retrieves) fingerprints for segments of likely or most likely audio pieces to be played by the performer 162. These accessed fingerprints may then be designated by the reference module 280 as the set of reference fingerprints to be accessed in operation 540. As noted above, these accessed fingerprints may be stored in the database 115, for later use (e.g., in operation 540). - According to various example embodiments, one or more of the methodologies described herein may facilitate identification of an audio piece during its performance. Moreover, one or more of the methodologies described herein may facilitate identification of the audio piece during performance of a live version of the audio piece, even where the live version differs from previously recorded versions of the audio piece. Hence, one or more of the methodologies described herein may facilitate retrieval and presentation of information regarding the identified audio piece (e.g., its identifier and some or all of its metadata) to one or more audience members during performance of the same audio piece. Furthermore, one or more of the methodologies described herein may facilitate identification and tagging of recordings that were made during the performance.
- When these effects are considered in aggregate, one or more of the methodologies described herein may obviate a need for certain efforts or resources that otherwise would be involved in identifying an audio piece during the performance of an audio piece. Efforts expended by a user may be reduced by one or more of the methodologies described herein. Computing resources used by one or more machines, databases, or devices (e.g., within the network environment 100) may similarly be reduced. Examples of such computing resources include processor cycles, network traffic, memory usage, data storage capacity, power consumption, and cooling capacity.
-
FIG. 10 is a block diagram illustrating components of amachine 1000, according to some example embodiments, able to readinstructions 1024 from a machine-readable medium 1022 (e.g., a non-transitory machine-readable medium, a machine-readable storage medium, a computer-readable storage medium, or any suitable combination thereof) and perform any one or more of the methodologies discussed herein, in whole or in part. Specifically,FIG. 10 shows themachine 1000 in the example form of a computer system (e.g., a computer) within which the instructions 1024 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing themachine 1000 to perform any one or more of the methodologies discussed herein may be executed, in whole or in part. - In alternative embodiments, the
machine 1000 operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, themachine 1000 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a distributed (e.g., peer-to-peer) network environment. Themachine 1000 may be a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a cellular telephone, a smartphone, a set-top box (STB), a personal digital assistant (PDA), a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing theinstructions 1024, sequentially or otherwise, that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute theinstructions 1024 to perform all or part of any one or more of the methodologies discussed herein. - The
machine 1000 includes a processor 1002 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), or any suitable combination thereof), amain memory 1004, and astatic memory 1006, which are configured to communicate with each other via abus 1008. Theprocessor 1002 may contain microcircuits that are configurable, temporarily or permanently, by some or all of theinstructions 1024 such that theprocessor 1002 is configurable to perform any one or more of the methodologies described herein, in whole or in part. For example, a set of one or more microcircuits of theprocessor 1002 may be configurable to execute one or more modules (e.g., software modules) described herein. - The
machine 1000 may further include a graphics display 1010 (e.g., a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, a cathode ray tube (CRT), or any other display capable of displaying graphics or video). Themachine 1000 may also include an alphanumeric input device 1012 (e.g., a keyboard or keypad), a cursor control device 1014 (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, an eye tracking device, or other pointing instrument), astorage unit 1016, an audio generation device 1018 (e.g., a sound card, an amplifier, a speaker, a headphone jack, or any suitable combination thereof), and anetwork interface device 1020. - The
storage unit 1016 includes the machine-readable medium 1022 (e.g., a tangible and non-transitory machine-readable storage medium) on which are stored theinstructions 1024 embodying any one or more of the methodologies or functions described herein. Theinstructions 1024 may also reside, completely or at least partially, within themain memory 1004, within the processor 1002 (e.g., within the processor's cache memory), or both, before or during execution thereof by themachine 1000. Accordingly, themain memory 1004 and theprocessor 1002 may be considered machine-readable media (e.g., tangible and non-transitory machine-readable media). Theinstructions 1024 may be transmitted or received over thenetwork 190 via thenetwork interface device 1020. For example, thenetwork interface device 1020 may communicate theinstructions 1024 using any one or more transfer protocols (e.g., hypertext transfer protocol (HTTP)). - In some example embodiments, the
machine 1000 may be a portable computing device, such as a smart phone or tablet computer, and have one or more additional input components 1030 (e.g., sensors or gauges). Examples ofsuch input components 1030 include an image input component (e.g., one or more cameras), an audio input component (e.g., a microphone), a direction input component (e.g., a compass), a location input component (e.g., a global positioning system (GPS) receiver), an orientation component (e.g., a gyroscope), a motion detection component (e.g., one or more accelerometers), an altitude detection component (e.g., an altimeter), and a gas detection component (e.g., a gas sensor). Inputs harvested by any one or more of these input components may be accessible and available for use by any of the modules described herein. - As used herein, the term “memory” refers to a machine-readable medium able to store data temporarily or permanently and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-
readable medium 1022 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions. The term “machine-readable medium” shall also be taken to include any medium, or combination of multiple media, that is capable of storing theinstructions 1024 for execution by themachine 1000, such that theinstructions 1024, when executed by one or more processors of the machine 1000 (e.g., processor 1002), cause themachine 1000 to perform any one or more of the methodologies described herein, in whole or in part. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as cloud-based storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, one or more tangible (e.g., non-transitory) data repositories in the form of a solid-state memory, an optical medium, a magnetic medium, or any suitable combination thereof. - Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
- Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute software modules (e.g., code stored or otherwise embodied on a machine-readable medium or in a transmission medium), hardware modules, or any suitable combination thereof. A “hardware module” is a tangible (e.g., non-transitory) unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
- In some embodiments, a hardware module may be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations. For example, a hardware module may be a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC. A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware module may include software encompassed within a general-purpose processor or other programmable processor. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
- Accordingly, the phrase “hardware module” should be understood to encompass a tangible entity, and such a tangible entity may be physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware modules) at different times. Software (e.g., a software module) may accordingly configure one or more processors, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
- Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
- The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented module” refers to a hardware module implemented using one or more processors.
- Similarly, the methods described herein may be at least partially processor-implemented, a processor being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented modules. As used herein, “processor-implemented module” refers to a hardware module in which the hardware includes one or more processors. Moreover, the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an application program interface (API)).
- The performance of certain operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
- Some portions of the subject matter discussed herein may be presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory). Such algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
- Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or any suitable combination thereof), registers, or other machine components that receive, store, transmit, or display information. Furthermore, unless specifically stated otherwise, the terms “a” or “an” are herein used, as is common in patent documents, to include one or more than one instance. Finally, as used herein, the conjunction “or” refers to a non-exclusive “or,” unless specifically stated otherwise.
US10733998B2 (en) * | 2017-10-25 | 2020-08-04 | The Nielsen Company (Us), Llc | Methods, apparatus and articles of manufacture to identify sources of network streaming services |
US10803119B2 (en) | 2017-01-02 | 2020-10-13 | Gracenote, Inc. | Automated cover song identification |
US10846334B2 (en) | 2014-04-22 | 2020-11-24 | Gracenote, Inc. | Audio identification during performance |
US10880340B2 (en) | 2008-11-26 | 2020-12-29 | Free Stream Media Corp. | Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device |
CN112466334A (en) * | 2020-12-14 | 2021-03-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio identification method, equipment and medium |
US10972295B1 (en) * | 2020-09-30 | 2021-04-06 | Ringcentral, Inc. | System and method for detecting the end of an electronic conference session |
US10977693B2 (en) | 2008-11-26 | 2021-04-13 | Free Stream Media Corp. | Association of content identifier of audio-visual data with additional data through capture infrastructure |
US11049507B2 (en) | 2017-10-25 | 2021-06-29 | Gracenote, Inc. | Methods, apparatus, and articles of manufacture to identify sources of network streaming services |
CN113641423A (en) * | 2021-08-31 | 2021-11-12 | 青岛海信传媒网络技术有限公司 | Display device and system starting method |
US11392640B2 (en) | 2019-09-05 | 2022-07-19 | Gracenote, Inc. | Methods and apparatus to identify media that has been pitch shifted, time shifted, and/or resampled |
US11487815B2 (en) * | 2019-06-06 | 2022-11-01 | Sony Corporation | Audio track determination based on identification of performer-of-interest at live event |
US20220372852A1 (en) * | 2021-05-24 | 2022-11-24 | Exxonmobil Upstream Research Company | Methods of Increasing Efficiency of Plunger Lift Operations |
US12141197B2 (en) | 2019-11-27 | 2024-11-12 | Gracenote, Inc. | Methods and apparatus to identify media based on historical data |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017192382A1 (en) * | 2016-05-02 | 2017-11-09 | Novelis Inc. | Aluminum alloys with enhanced formability and associated methods |
ES2901638T3 (en) * | 2018-05-17 | 2022-03-23 | Fraunhofer Ges Forschung | Device and method for detecting partial concordances between a first time-varying signal and a second time-varying signal |
CN113240419B (en) * | 2021-05-12 | 2022-03-15 | 前海联大(深圳)技术有限公司 | Use method for safe storage of digital RMB |
US11804231B2 (en) | 2021-07-02 | 2023-10-31 | Capital One Services, Llc | Information exchange on mobile devices using audio |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5040081A (en) | 1986-09-23 | 1991-08-13 | Mccutchen David | Audiovisual synchronization signal generator using audio signature comparison |
US6931134B1 (en) | 1998-07-28 | 2005-08-16 | James K. Waller, Jr. | Multi-dimensional processor and multi-dimensional audio processor system |
US7302574B2 (en) | 1999-05-19 | 2007-11-27 | Digimarc Corporation | Content identifiers triggering corresponding responses through collaborative processing |
US7853664B1 (en) | 2000-07-31 | 2010-12-14 | Landmark Digital Services Llc | Method and system for purchasing pre-recorded music |
WO2003091899A2 (en) | 2002-04-25 | 2003-11-06 | Neuros Audio, Llc | Apparatus and method for identifying audio |
US7164076B2 (en) | 2004-05-14 | 2007-01-16 | Konami Digital Entertainment | System and method for synchronizing a live musical performance with a reference performance |
US20060074679A1 (en) | 2004-09-01 | 2006-04-06 | Pifer Daniel A | System and method of delivering live performance recordings |
US7913155B2 (en) | 2006-02-15 | 2011-03-22 | International Business Machines Corporation | Synchronizing method and system |
US8180063B2 (en) | 2007-03-30 | 2012-05-15 | Audiofile Engineering Llc | Audio signal processing system for live music performance |
WO2010138777A1 (en) | 2009-05-27 | 2010-12-02 | Arsh Technologies, Llc | Automatic resource retrieval and use |
US20110085781A1 (en) | 2009-10-13 | 2011-04-14 | Rovi Technologies Corporation | Content recorder timing alignment |
US20110112913A1 (en) | 2009-11-12 | 2011-05-12 | Ian Murray | System and method for live music performance digital recording, distribution, and digital advertising |
EP3418917B1 (en) | 2010-05-04 | 2022-08-17 | Apple Inc. | Methods and systems for synchronizing media |
US9093120B2 (en) * | 2011-02-10 | 2015-07-28 | Yahoo! Inc. | Audio fingerprint extraction by scaling in time and resampling |
US8949872B2 (en) | 2011-12-20 | 2015-02-03 | Yahoo! Inc. | Audio fingerprint for content identification |
US9661361B2 (en) | 2012-09-19 | 2017-05-23 | Google Inc. | Systems and methods for live media content matching |
US9153239B1 (en) * | 2013-03-14 | 2015-10-06 | Google Inc. | Differentiating between near identical versions of a song |
US9390727B2 (en) * | 2014-01-13 | 2016-07-12 | Facebook, Inc. | Detecting distorted audio signals based on audio fingerprinting |
US20150302086A1 (en) | 2014-04-22 | 2015-10-22 | Gracenote, Inc. | Audio identification during performance |
- 2014-04-22: US US14/258,263, published as US20150302086A1 (Abandoned)
- 2018-02-05: US US15/888,998, published as US10846334B2 (Active)
- 2020-11-23: US US17/102,012, published as US11574008B2 (Active)
- 2023-02-06: US US18/165,107, published as US20230185847A1 (Pending)
Patent Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080133556A1 (en) * | 1999-05-19 | 2008-06-05 | Conwell William Y | Content Identifiers |
US20080013614A1 (en) * | 2005-03-30 | 2008-01-17 | Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. | Device and method for generating a data stream and for generating a multi-channel representation |
US20080065699A1 (en) * | 2006-09-13 | 2008-03-13 | Bloebaum L Scott | System and method for collection and synchronization of metadata to facilitate distributed content search |
US20080082510A1 (en) * | 2006-10-03 | 2008-04-03 | Shazam Entertainment Ltd | Method for High-Throughput Identification of Distributed Broadcast Content |
US20080320078A1 (en) * | 2007-06-22 | 2008-12-25 | Varia Mobil Llc | Venue and event recommendations for a user of a portable media player device |
US20110273455A1 (en) * | 2010-05-04 | 2011-11-10 | Shazam Entertainment Ltd. | Systems and Methods of Rendering a Textual Animation |
US20110289530A1 (en) * | 2010-05-19 | 2011-11-24 | Google Inc. | Television Related Searching |
US20120059826A1 (en) * | 2010-09-08 | 2012-03-08 | Nokia Corporation | Method and apparatus for video synthesis |
US20120210233A1 (en) * | 2010-11-04 | 2012-08-16 | Davis Bruce L | Smartphone-Based Methods and Systems |
US20120124638A1 (en) * | 2010-11-12 | 2012-05-17 | Google Inc. | Syndication including melody recognition and opt out |
US20130007201A1 (en) * | 2011-06-29 | 2013-01-03 | Gracenote, Inc. | Interactive streaming content apparatus, systems and methods |
US20140129571A1 (en) * | 2012-05-04 | 2014-05-08 | Axwave Inc. | Electronic media signature based applications |
US20130339877A1 (en) * | 2012-06-13 | 2013-12-19 | Opus Deli, Inc., D/B/A Deliradio | Venue-related multi-media management, streaming, and electronic commerce techniques implemented via computer networks and mobile devices |
US20140169768A1 (en) * | 2012-12-13 | 2014-06-19 | Reginald Webb | System and Method for Providing Device with Integrated Time Code Generator, Transmitter, and Reader with Interruptible Feedback Monitoring and Talkback |
US20140254820A1 (en) * | 2013-03-08 | 2014-09-11 | Research In Motion Limited | Methods and devices to generate multiple-channel audio recordings |
US20140324616A1 (en) * | 2013-04-26 | 2014-10-30 | Past Eleven Llc. | System and method for location and time specific mobile commerce |
US20150016661A1 (en) * | 2013-05-03 | 2015-01-15 | Digimarc Corporation | Watermarking and signal recognition for managing and sharing captured content, metadata discovery and related arrangements |
US20150104023A1 (en) * | 2013-10-11 | 2015-04-16 | Facebook, Inc., a Delaware corporation | Generating A Reference Audio Fingerprint For An Audio Signal Associated With An Event |
US20150193701A1 (en) * | 2014-01-08 | 2015-07-09 | Stubhub, Inc. | Validity determination of an event ticket and automatic population of admission information |
US20150206544A1 (en) * | 2014-01-23 | 2015-07-23 | International Business Machines Corporation | Adaptive pause detection in speech recognition |
Cited By (99)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160284202A1 (en) * | 2006-07-17 | 2016-09-29 | Eloquence Communications, Inc. | Method and system for advanced patient communication |
US10425675B2 (en) | 2008-11-26 | 2019-09-24 | Free Stream Media Corp. | Discovery, access control, and communication with networked services |
US10334324B2 (en) | 2008-11-26 | 2019-06-25 | Free Stream Media Corp. | Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device |
US10986141B2 (en) | 2008-11-26 | 2021-04-20 | Free Stream Media Corp. | Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device |
US9686596B2 (en) | 2008-11-26 | 2017-06-20 | Free Stream Media Corp. | Advertisement targeting through embedded scripts in supply-side and demand-side platforms |
US9703947B2 (en) | 2008-11-26 | 2017-07-11 | Free Stream Media Corp. | Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device |
US10977693B2 (en) | 2008-11-26 | 2021-04-13 | Free Stream Media Corp. | Association of content identifier of audio-visual data with additional data through capture infrastructure |
US9716736B2 (en) | 2008-11-26 | 2017-07-25 | Free Stream Media Corp. | System and method of discovery and launch associated with a networked media device |
US10880340B2 (en) | 2008-11-26 | 2020-12-29 | Free Stream Media Corp. | Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device |
US10791152B2 (en) | 2008-11-26 | 2020-09-29 | Free Stream Media Corp. | Automatic communications between networked devices such as televisions and mobile devices |
US9838758B2 (en) | 2008-11-26 | 2017-12-05 | David Harrison | Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device |
US9848250B2 (en) | 2008-11-26 | 2017-12-19 | Free Stream Media Corp. | Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device |
US9854330B2 (en) | 2008-11-26 | 2017-12-26 | David Harrison | Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device |
US9866925B2 (en) | 2008-11-26 | 2018-01-09 | Free Stream Media Corp. | Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device |
US10771525B2 (en) | 2008-11-26 | 2020-09-08 | Free Stream Media Corp. | System and method of discovery and launch associated with a networked media device |
US9961388B2 (en) | 2008-11-26 | 2018-05-01 | David Harrison | Exposure of public internet protocol addresses in an advertising exchange server to improve relevancy of advertisements |
US9967295B2 (en) | 2008-11-26 | 2018-05-08 | David Harrison | Automated discovery and launch of an application on a network enabled device |
US10631068B2 (en) | 2008-11-26 | 2020-04-21 | Free Stream Media Corp. | Content exposure attribution based on renderings of related content across multiple devices |
US9986279B2 (en) | 2008-11-26 | 2018-05-29 | Free Stream Media Corp. | Discovery, access control, and communication with networked services |
US10032191B2 (en) | 2008-11-26 | 2018-07-24 | Free Stream Media Corp. | Advertisement targeting through embedded scripts in supply-side and demand-side platforms |
US10567823B2 (en) | 2008-11-26 | 2020-02-18 | Free Stream Media Corp. | Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device |
US10074108B2 (en) | 2008-11-26 | 2018-09-11 | Free Stream Media Corp. | Annotation of metadata through capture infrastructure |
US10142377B2 (en) | 2008-11-26 | 2018-11-27 | Free Stream Media Corp. | Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device |
US9706265B2 (en) | 2008-11-26 | 2017-07-11 | Free Stream Media Corp. | Automatic communications between networked devices such as televisions and mobile devices |
US10419541B2 (en) | 2008-11-26 | 2019-09-17 | Free Stream Media Corp. | Remotely control devices over a network without authentication or registration |
US10846334B2 (en) | 2014-04-22 | 2020-11-24 | Gracenote, Inc. | Audio identification during performance |
US11574008B2 (en) | 2014-04-22 | 2023-02-07 | Gracenote, Inc. | Audio identification during performance |
US20160098616A1 (en) * | 2014-10-02 | 2016-04-07 | Facebook, Inc. | Techniques for managing discussion sharing on a mobile platform |
US10560418B2 (en) * | 2014-10-02 | 2020-02-11 | Facebook, Inc. | Techniques for managing discussion sharing on a mobile platform |
US10534777B2 (en) * | 2015-03-10 | 2020-01-14 | Cdx Nashville, Llc | Systems and methods for continuously detecting and identifying songs in a continuous audio stream |
US20170024094A1 (en) * | 2015-07-22 | 2017-01-26 | Enthrall Sports LLC | Interactive audience communication for events |
US9817557B2 (en) * | 2015-07-22 | 2017-11-14 | Enthrall Sports LLC | Interactive audience communication for events |
US20180249056A1 (en) * | 2015-08-18 | 2018-08-30 | Lg Electronics Inc. | Mobile terminal and method for controlling same |
CN105554590A (en) * | 2015-12-10 | 2016-05-04 | 杭州当虹科技有限公司 | Live streaming media recognition system based on audio fingerprint |
US11290776B2 (en) | 2016-02-29 | 2022-03-29 | Roku, Inc. | Media channel identification and action with multi-match detection and disambiguation based on matching with differential reference-fingerprint feature |
US11012738B2 (en) | 2016-02-29 | 2021-05-18 | Gracenote, Inc. | Media channel identification with multi-match detection and disambiguation based on location |
US10575052B2 (en) | 2016-02-29 | 2020-02-25 | Gracenote, Inc. | Media channel identification and action with multi-match detection based on reference stream comparison |
US10524000B2 (en) | 2016-02-29 | 2019-12-31 | Gracenote, Inc. | Media channel identification and action with multi-match detection and disambiguation based on matching with differential reference-fingerprint feature |
US10536746B2 (en) | 2016-02-29 | 2020-01-14 | Gracenote, Inc. | Media channel identification with multi-match detection and disambiguation based on location |
US11317142B2 (en) | 2016-02-29 | 2022-04-26 | Roku, Inc. | Media channel identification with multi-match detection and disambiguation based on location |
US11627372B2 (en) | 2016-02-29 | 2023-04-11 | Roku, Inc. | Media channel identification with multi-match detection and disambiguation based on single-match |
US10567835B2 (en) | 2016-02-29 | 2020-02-18 | Gracenote, Inc. | Media channel identification with multi-match detection and disambiguation based on single-match |
US11617009B2 (en) | 2016-02-29 | 2023-03-28 | Roku, Inc. | Media channel identification and action with multi-match detection and disambiguation based on matching with differential reference-fingerprint feature |
US10567836B2 (en) | 2016-02-29 | 2020-02-18 | Gracenote, Inc. | Media channel identification with multi-match detection and disambiguation based on single-match |
US10523999B2 (en) | 2016-02-29 | 2019-12-31 | Gracenote, Inc. | Media channel identification and action with multi-match detection and disambiguation based on matching with differential reference-fingerprint feature |
US11412296B2 (en) | 2016-02-29 | 2022-08-09 | Roku, Inc. | Media channel identification with video multi-match detection and disambiguation based on audio fingerprint |
US11012743B2 (en) | 2016-02-29 | 2021-05-18 | Gracenote, Inc. | Media channel identification with multi-match detection and disambiguation based on single-match |
US10631049B2 (en) | 2016-02-29 | 2020-04-21 | Gracenote, Inc. | Media channel identification with video multi-match detection and disambiguation based on audio fingerprint |
US11336956B2 (en) | 2016-02-29 | 2022-05-17 | Roku, Inc. | Media channel identification with multi-match detection and disambiguation based on single-match |
US11463765B2 (en) | 2016-02-29 | 2022-10-04 | Roku, Inc. | Media channel identification and action with multi-match detection based on reference stream comparison |
US10419814B2 (en) | 2016-02-29 | 2019-09-17 | Gracenote, Inc. | Media channel identification with multi-match detection and disambiguation based on time of broadcast |
US11206447B2 (en) | 2016-02-29 | 2021-12-21 | Roku, Inc. | Media channel identification with multi-match detection and disambiguation based on time of broadcast |
US10440430B2 (en) | 2016-02-29 | 2019-10-08 | Gracenote, Inc. | Media channel identification with video multi-match detection and disambiguation based on audio fingerprint |
EP3424218A4 (en) * | 2016-02-29 | 2019-07-31 | Gracenote Inc. | Media channel identification with multi-match detection and disambiguation based on location |
US11089357B2 (en) | 2016-02-29 | 2021-08-10 | Roku, Inc. | Method and system for detecting and responding to changing of media channel |
US10805673B2 (en) | 2016-02-29 | 2020-10-13 | Gracenote, Inc. | Method and system for detecting and responding to changing of media channel |
US11432037B2 (en) | 2016-02-29 | 2022-08-30 | Roku, Inc. | Method and system for detecting and responding to changing of media channel |
US10848820B2 (en) | 2016-02-29 | 2020-11-24 | Gracenote, Inc. | Media channel identification with multi-match detection and disambiguation based on time of broadcast |
US11089360B2 (en) | 2016-02-29 | 2021-08-10 | Gracenote, Inc. | Media channel identification with video multi-match detection and disambiguation based on audio fingerprint |
US10412448B2 (en) | 2016-02-29 | 2019-09-10 | Gracenote, Inc. | Media channel identification with multi-match detection and disambiguation based on location |
US10939162B2 (en) | 2016-02-29 | 2021-03-02 | Gracenote, Inc. | Media channel identification and action with multi-match detection based on reference stream comparison |
US10972786B2 (en) | 2016-02-29 | 2021-04-06 | Gracenote, Inc. | Media channel identification and action with multi-match detection and disambiguation based on matching with differential reference-fingerprint feature |
CN107293307B (en) * | 2016-03-31 | 2021-07-16 | 阿里巴巴集团控股有限公司 | Audio detection method and device |
CN107293307A (en) * | 2016-03-31 | 2017-10-24 | 阿里巴巴集团控股有限公司 | Audio-frequency detection and device |
US10475426B2 (en) | 2016-08-31 | 2019-11-12 | Gracenote, Inc. | Characterizing audio using transchromagrams |
US10147407B2 (en) | 2016-08-31 | 2018-12-04 | Gracenote, Inc. | Characterizing audio using transchromagrams |
US10713296B2 (en) | 2016-09-09 | 2020-07-14 | Gracenote, Inc. | Audio identification based on data structure |
US11907288B2 (en) | 2016-09-09 | 2024-02-20 | Gracenote, Inc. | Audio identification based on data structure |
US12105754B2 (en) | 2016-09-09 | 2024-10-01 | Gracenote, Inc. | Audio identification based on data structure |
US12105753B2 (en) | 2017-01-02 | 2024-10-01 | Gracenote, Inc. | Automated cover song identification |
US10803119B2 (en) | 2017-01-02 | 2020-10-13 | Gracenote, Inc. | Automated cover song identification |
US11461390B2 (en) | 2017-01-02 | 2022-10-04 | Gracenote, Inc. | Automated cover song identification |
US9971307B1 (en) * | 2017-04-14 | 2018-05-15 | Primax Electronics Ltd. | Electronic watch with function of calling for help |
US10410040B2 (en) * | 2017-06-23 | 2019-09-10 | Bio-Key Hong Kong Limited | Fingerprint lock control method and fingerprint lock system |
WO2019052121A1 (en) * | 2017-09-13 | 2019-03-21 | 厦门声连网信息科技有限公司 | Music identification system, method and apparatus, and music management server |
CN107657958A (en) * | 2017-09-13 | 2018-02-02 | 厦门声连网信息科技有限公司 | A kind of music recognition system, device and music management server and method |
US20190104335A1 (en) * | 2017-09-29 | 2019-04-04 | Theater Ears, LLC | Theater ears audio recognition & synchronization algorithm |
US11651776B2 (en) | 2017-10-25 | 2023-05-16 | The Nielsen Company (Us), Llc | Methods, apparatus and articles of manufacture to identify sources of network streaming services |
US10629213B2 (en) | 2017-10-25 | 2020-04-21 | The Nielsen Company (Us), Llc | Methods and apparatus to perform windowed sliding transforms |
US10733998B2 (en) * | 2017-10-25 | 2020-08-04 | The Nielsen Company (Us), Llc | Methods, apparatus and articles of manufacture to identify sources of network streaming services |
US11948589B2 (en) | 2017-10-25 | 2024-04-02 | Gracenote, Inc. | Methods, apparatus, and articles of manufacture to identify sources of network streaming services |
US11049507B2 (en) | 2017-10-25 | 2021-06-29 | Gracenote, Inc. | Methods, apparatus, and articles of manufacture to identify sources of network streaming services |
US11430454B2 (en) | 2017-10-25 | 2022-08-30 | The Nielsen Company (Us), Llc | Methods and apparatus to identify sources of network streaming services using windowed sliding transforms |
US20190213989A1 (en) * | 2018-01-10 | 2019-07-11 | Qrs Music Technologies, Inc. | Technologies for generating a musical fingerprint |
US10861428B2 (en) * | 2018-01-10 | 2020-12-08 | Qrs Music Technologies, Inc. | Technologies for generating a musical fingerprint |
US11355093B2 (en) * | 2018-01-10 | 2022-06-07 | Qrs Music Technologies, Inc. | Technologies for tracking and analyzing musical activity |
US11322122B2 (en) * | 2018-01-10 | 2022-05-03 | Qrs Music Technologies, Inc. | Musical activity system |
US10726852B2 (en) | 2018-02-19 | 2020-07-28 | The Nielsen Company (Us), Llc | Methods and apparatus to perform windowed sliding transforms |
US11487815B2 (en) * | 2019-06-06 | 2022-11-01 | Sony Corporation | Audio track determination based on identification of performer-of-interest at live event |
US12056185B2 (en) | 2019-09-05 | 2024-08-06 | Gracenote, Inc. | Methods and apparatus to identify media |
US11392641B2 (en) * | 2019-09-05 | 2022-07-19 | Gracenote, Inc. | Methods and apparatus to identify media |
US11392640B2 (en) | 2019-09-05 | 2022-07-19 | Gracenote, Inc. | Methods and apparatus to identify media that has been pitch shifted, time shifted, and/or resampled |
US12061646B2 (en) | 2019-09-05 | 2024-08-13 | Gracenote, Inc. | Methods and apparatus to identify media that has been pitch shifted, time shifted, and/or resampled |
US12141197B2 (en) | 2019-11-27 | 2024-11-12 | Gracenote, Inc. | Methods and apparatus to identify media based on historical data |
CN111274891A (en) * | 2020-01-14 | 2020-06-12 | 成都嗨翻屋科技有限公司 | Method and system for extracting pitches and corresponding lyrics for numbered musical notation images |
US10972295B1 (en) * | 2020-09-30 | 2021-04-06 | Ringcentral, Inc. | System and method for detecting the end of an electronic conference session |
CN112466334A (en) * | 2020-12-14 | 2021-03-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio identification method, equipment and medium |
US20220372852A1 (en) * | 2021-05-24 | 2022-11-24 | Exxonmobil Upstream Research Company | Methods of Increasing Efficiency of Plunger Lift Operations |
CN113641423A (en) * | 2021-08-31 | 2021-11-12 | 青岛海信传媒网络技术有限公司 | Display device and system starting method |
Also Published As
Publication number | Publication date |
---|---|
US20230185847A1 (en) | 2023-06-15 |
US10846334B2 (en) | 2020-11-24 |
US20210149954A1 (en) | 2021-05-20 |
US11574008B2 (en) | 2023-02-07 |
US20180239818A1 (en) | 2018-08-23 |
Similar Documents
Publication | Title |
---|---|
US11574008B2 (en) | Audio identification during performance |
US11366850B2 (en) | Audio matching based on harmonogram |
US11477156B2 (en) | Watermarking and signal recognition for managing and sharing captured content, metadata discovery and related arrangements |
JP5833235B2 (en) | Method and system for identifying the contents of a data stream |
JP6060155B2 (en) | Method and system for performing a comparison of received data and providing subsequent services based on the comparison |
JP4945877B2 (en) | System and method for recognizing sound / musical signal under high noise / distortion environment |
US12105753B2 (en) | Automated cover song identification |
US10122483B2 (en) | Crowd sentiment detection and analysis |
WO2020093883A1 (en) | Method and device for obtaining video clip, server, and storage medium |
EP3508986B1 (en) | Music cover identification for search, compliance, and licensing |
US9576050B1 (en) | Generating a playlist based on input acoustic information |
CN102959544A (en) | Methods and systems for synchronizing media |
US10506268B2 (en) | Identifying media content for simultaneous playback |
US12105754B2 (en) | Audio identification based on data structure |
CN103562909A (en) | Methods and systems for identifying content in data stream by client device |
Porter | Evaluating musical fingerprinting systems |
US11410706B2 (en) | Content pushing method for display device, pushing device and display device |
CN111125028B (en) | Method, device, server and storage medium for identifying audio files |
CN118044206A (en) | Event source content and remote content synchronization |
CN116932809A (en) | Music information display method, device and computer readable storage medium |
CN116932810A (en) | Music information display method, device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: GRACENOTE, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ROBERTS, DALE T.;COOVER, BOB;MARCANTONIO, NICOLA;AND OTHERS;SIGNING DATES FROM 20140409 TO 20140421;REEL/FRAME:035160/0883
AS | Assignment |
Owner name: CITIBANK, N.A., AS COLLATERAL AGENT, NEW YORK
Free format text: SUPPLEMENTAL SECURITY AGREEMENT;ASSIGNORS:GRACENOTE, INC.;GRACENOTE MEDIA SERVICES, LLC;GRACENOTE DIGITAL VENTURES, LLC;REEL/FRAME:042262/0601
Effective date: 20170412
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION
AS | Assignment |
Owner name: GRACENOTE DIGITAL VENTURES, LLC, NEW YORK
Free format text: RELEASE (REEL 042262 / FRAME 0601);ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:061748/0001
Effective date: 20221011
Owner name: GRACENOTE, INC., NEW YORK
Free format text: RELEASE (REEL 042262 / FRAME 0601);ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:061748/0001
Effective date: 20221011