APPARATUS AND METHOD FOR IDENTIFYING AUDIO
BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
The invention relates generally to the identification of audio. More particularly, the invention is directed to a portable device configured to identify an audio track.
DESCRIPTION OF RELATED ART
Sound audible to the human ear, i.e., having a frequency between 20 and 20,000 vibrations per second (20-20,000 Hz), is known as audio. Examples of audio include speech, music, or the like. What is more, audio is typically heard from one of three sources, namely live performances, recordings, or broadcasts. In general, recordings and broadcasts are either analog or digital. Analog recordings include magnetic tape recordings and records, while digital recordings include compact discs (CDs), mini-discs, and various data file formats, such as MPEG Audio Layer 3 (MP3) files, or the like. Analog broadcasts include sound reproduction, such as via a stereo, and analog radio broadcasts. Digital broadcasts, on the other hand, include digital radio broadcasts, such as those provided by XM SATELLITE RADIO and SIRIUS SATELLITE RADIO, and streaming broadcasts over the Internet, such as REAL AUDIO, WINDOWS MEDIA, or MP3 streams.
Often, listeners of audio want to identify an audio track to which they are listening. The audio track may be any finite-length audio composition, such as a song, speech, or the like. The identity of the audio track may be important for a number of reasons, such as to enable a user to identify a song in order to purchase the song; to learn more about the artist; to be able to identify the audio in the future; to ascertain to whom royalties must be paid; to index a list of unidentified audio tracks; or the like.
Typically, the identity of the audio track is established by one of a number of methods, such as by the listener recognizing the audio track, reading an associated writing identifying the audio track, or relying on an announcement of the identity of the audio track. For example, a listener may recognize a song or artist that he/she knows, he/she may read a music album's CD jacket to determine the identity of a song, or he/she may listen to a radio announcer announce the title and artist of a song.
However, in certain situations a user may not be able to identify an audio track using one of these aforementioned methods. Indeed, each source of audio has its own associated drawbacks to audio identification. For example, drawbacks in broadcasting are that radio announcers often don't announce the identity of an audio track; they wait too long to make an announcement and a listener cannot wait until the song is completed to hear the announcement; it is often inconvenient to write down the name of the song; etc. An example of a drawback of recordings is that, historically, recordings did not inform the listener of the identity of the audio track.
In recent years, however, digital recordings have introduced audio identification data into recorded audio track data. This audio identification data is otherwise known as metadata and is associated with many types of digital audio files. Examples of such metadata are the ID3 tags associated with MP3 audio files. This metadata typically contains basic information about the audio file, such as song title, artist, track length, etc. Likewise, digital streaming broadcasts sometimes also attach metadata to their digital audio streams.
Another means for identifying digital recordings is provided by GRACENOTE (previously CDDB of Berkeley, California) and described in U.S. Patent Nos. 6,330,593; 6,240,459; 6,230,207; 6,230,192; 6,161,132; 6,154,773; 6,061,680; and 5,987,525. GRACENOTE uses a Compact Disc Database (CDDB) to identify music that is generated from prerecorded CDs. The CDDB uses the unique identifiers found in the CD's table of contents, such as the CD's list of tracks and associated track times, to identify the songs on a CD. The CDDB service works in conjunction with a variety of computer software media players to identify audio tracks. These media players use the CDDB to populate file names and metadata for each song encoded from a CD.
Another application of the CDDB technology allows standalone CD players (not attached to a computer or the Internet) to display song title and artist information. To do this, the device must store the GRACENOTE database locally and perform the same technique as described above, locally on the device.
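As a rough illustration of how a table of contents can yield an identifier, the sketch below computes the publicly documented freedb (CDDB1) disc ID from the track start offsets and the lead-out offset. It assumes offsets are given in CD frames (75 per second, including the standard 150-frame lead-in). GRACENOTE's service may use additional or different identifiers, so this is a sketch of the general idea, not its actual method.

```python
from typing import List

FRAMES_PER_SECOND = 75  # CD audio frames per second


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n (the 'cddb_sum' helper in freedb)."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def freedb_disc_id(track_offsets: List[int], leadout_offset: int) -> str:
    """Compute a freedb/CDDB1-style disc ID from a CD table of contents.

    track_offsets: start of each track in frames (including the 150-frame lead-in)
    leadout_offset: start of the lead-out area in frames
    """
    # Checksum over the start time (in whole seconds) of every track.
    n = sum(digit_sum(offset // FRAMES_PER_SECOND) for offset in track_offsets)
    # Total playing time in seconds, from the first track to the lead-out.
    total_seconds = (leadout_offset // FRAMES_PER_SECOND
                     - track_offsets[0] // FRAMES_PER_SECOND)
    disc_id = ((n % 0xFF) << 24) | (total_seconds << 8) | len(track_offsets)
    return f"{disc_id:08x}"
```

Because the ID depends on the whole disc's track layout, it identifies the CD as a unit, which is why this approach needs the full prerecorded CD.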
A drawback of the CDDB technology is that it requires the presence of a full prerecorded CD to be able to identify the CD's individual audio tracks. Therefore, this technology cannot be used to identify individual audio tracks heard by a listener from sources other than a recorded CD.
Yet another type of device for identifying audio uses a time-stamping technique to identify audio tracks. Two known devices that employ this time-stamping technique are the SONY E-MARKER and the XENOTE I-TAG. These devices are very simple keychain devices that simply record the date and time when a button on the device is depressed. In use, when a listener hears a song on the radio that he/she wants to identify, he/she presses the button on the device and the device records the date and time associated with the depression of the button. Later, when the device is synchronized with a desktop computer, a unique user identifier associated with the listener's device and the recorded date and time information are sent to a server via the Internet. Typically, a web page is then displayed which shows the songs played on a variety of stations that the listener (having the unique identifier) had previously identified as the radio stations most commonly listened to. The device itself does not store any information relating to the station the user was listening to at the time of the selection. The web page that presents the identified songs also often presents options related to purchasing the CD that contains the selected song, etc.
A drawback of such devices that use time-stamping techniques is that they do not fully automate the process of identifying song information, because the user is required to remember what station he/she was listening to when he/she actuated the device. Further, the user must interact with a desktop computer to obtain the audio track identification. Specifically, the user must identify the radio stations that he/she most commonly listens to. In addition, interaction through the Internet is required, and as a result the process is subject to the usual drawbacks associated with the latency, reliability, and speed of the Internet. Put differently, the interaction is typically much slower than that encountered when using a non-Internet based audio track identification solution.
Moreover, because such devices only record the time and date of actuation, use of such devices is limited to radio broadcasts. In addition, such devices require the service provider to maintain a database that contains the complete playlists and accompanying playtimes from every radio station in every market that the service provider wishes to support. Because the radio stations do not provide this information, collecting such playlists and accompanying playtimes is usually performed by a third party. The third party either manually identifies and enters the playlists and accompanying playtimes into a database, or these playlists and accompanying playtimes are automatically identified and stored in the database by a computer. In either event, such identification and storage is complex, requires significant effort, is costly, and is, therefore, typically limited to the most popular stations, thereby excluding many geographic areas and markets.
Yet another prior art means for identifying audio tracks is performed using audio fingerprinting. Audio fingerprinting typically uses software to identify a song by comparing a unique audio identifier or fingerprint (hereinafter "fingerprint") of an audio sample to a database of known "fingerprints" associated with known audio samples.
A number of service providers and/or software applications utilize digital fingerprinting techniques to identify audio tracks. For example, CLANGO, a software product made by AUDIBLE MAGIC CORP of Los Gatos, California, uses digital fingerprinting to identify streaming audio broadcasts that do not provide associated audio track metadata. The fingerprinting performed by AUDIBLE MAGIC CORP is described in U.S. Patent No. 5,918,223.
Another provider of similar audio fingerprinting technology is AUDITUDE, whose software product ID3MAN is aimed at users who possess a collection of digital audio files whose associated identification data is either incorrect or incomplete. Through a combination of techniques, including audio fingerprinting, ID3MAN identifies the audio files and subsequently corrects the identification data associated with those files.
A drawback of these fingerprinting devices or services is that they do not provide any benefit to users listening to music away from their desktop computers (except in the case of a CDDB-enabled CD player, which requires the device to store an extremely large GRACENOTE database and which has its own associated drawbacks, as described above).
A further means for identifying audio uses a cellular telephone network: upon hearing the audio that the user wants to identify, the user calls a designated number to have that audio identified. There are at least two methods that are used to provide this service.
The first method, which was offered under the name BUZZHITS (now defunct), allowed the user to call a designated number and enter a user identifier, which identified the caller (and the caller's geographic market); the service then prompted the user for the broadcast frequency of the radio station broadcasting the audio to be identified. Once the broadcast frequency was supplied, the user was provided with sample audio clips, from which the user selected a sample audio clip to obtain the identity of the audio track. This information was also emailed to the user.
While this phone service solves some of the drawbacks described above, it still requires the user to interact manually with the device, and the user is forced to interact at exactly the time the audio is heard, which is often inconvenient.
What is more, none of the above described audio identification solutions automatically perform additional actions once an audio track has been identified. As more music becomes available for download through the emerging subscription services, consumers will desire an option to purchase and download the music they hear from a variety of sources. Completing a transaction using the above products/services is inherently a multi-step manual process that requires interaction with the Internet and a desktop computer, or cellular phone.
In light of the above, there is a need for an audio identification device and method that addresses the abovementioned drawbacks, while being convenient and easy to use, and providing accurate identification at a low associated cost per identification.
BRIEF SUMMARY OF THE INVENTION
One embodiment of the invention includes a portable device that can record from a microphone, audio player, and/or radio receiver. To identify an audio track, this portable device, when actuated, records an audio sample of an audio track being played through the player or being received by the radio receiver. If the portable device is not currently playing audio or receiving a radio broadcast, it can record the audio sample through the microphone. This recorded audio sample is stored on the portable device's internal storage and, later, when the device is connected to a client computer, is uploaded to that client computer. The client computer then processes the audio sample to generate a "fingerprint" of the audio sample that is then compared to a fingerprint database either locally on the client computer or on an identification server (ID server) coupled to the client computer via the Internet. Once the fingerprint has been identified, the title and artist information is returned to the client computer and ultimately displayed on the portable device.
Alternatively, the portable device itself processes the recorded audio sample and generates the fingerprint. This has the advantage of reducing the amount of storage space needed to store the audio samples, as only the fingerprint is stored on the portable device. This embodiment, however, requires that the portable device have adequate processing power to perform fingerprinting.
As an adjunct, in addition to returning artist and title information, the device also performs additional actions once the audio sample or track has been identified. Examples of additional actions include downloading the identified audio track from a subscription service, recommending more audio tracks similar to the identified audio track, and obtaining prices of the identified audio track from Internet music merchants. These additional actions are preferably selected by choosing a menu item from the player's display, and can be customized and downloaded from third party service providers.
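One way a customizable, downloadable menu of additional actions could be organized is as a registry that maps menu labels to handler functions. The sketch below is hypothetical throughout: the names `register_action`, `menu_items`, and `run_action` and the placeholder recommendation handler are illustrative only, and a real handler would call a subscription, recommendation, or merchant service rather than return a string.

```python
from typing import Callable, Dict, List

# An identified track as returned by the identification step (e.g. title, artist).
Track = Dict[str, str]
Action = Callable[[Track], str]

# Hypothetical registry of menu label -> action handler.
_ACTIONS: Dict[str, Action] = {}


def register_action(menu_label: str) -> Callable[[Action], Action]:
    """Decorator that adds an action to the device menu under menu_label."""
    def decorator(func: Action) -> Action:
        _ACTIONS[menu_label] = func
        return func
    return decorator


def menu_items() -> List[str]:
    """Labels to show on the player's display, in a stable order."""
    return sorted(_ACTIONS)


def run_action(menu_label: str, track: Track) -> str:
    """Run the action the user selected for the identified track."""
    return _ACTIONS[menu_label](track)


@register_action("Find similar tracks")
def recommend(track: Track) -> str:
    # Placeholder: a downloaded action would query a third-party service here.
    return f"recommendations for artist {track['artist']}"
```

A third-party action downloaded to the device would simply register itself under a new label, so the menu grows without changes to the identification code.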
According to the invention there is provided a method for identifying audio on a portable device. An audio sample is recorded on a portable device from an audio track. The audio sample is then stored in a cache on the portable device. The audio sample is transmitted to a computing device to be identified. The audio sample is received by the computing device and fingerprinting is performed on the audio sample to obtain a unique audio fingerprint for the audio sample. A fingerprint database is then searched for a match of the fingerprint to a known fingerprint of a previously identified audio track. A match is located and identification data associated with the previously identified audio track is sent to the portable device. Identification of the audio sample is then received and displayed on the portable device. In another embodiment, the fingerprinting is performed on the portable device.
In another embodiment, a radio broadcast is received and played on the portable device. An instruction to identify an audio track of the radio broadcast is received, and the radio broadcast's broadcast frequency and the date and time that the portable device received the instruction to identify the audio track are automatically recorded. The broadcast frequency, date, and time, along with a unique device identifier, are then transmitted to a computing device to be identified. This data is received by the computing device. A playlist database is then searched for a match of the broadcast frequency, date, and time to a known radio station's broadcast frequency, based on the user's geographic location as determined by the unique device identifier, and to a known date and time that an audio track was broadcast by the radio station. The audio track associated with the broadcast frequency, date, and time is located, and its identity is sent to the portable device. The portable device thereafter receives and displays information associated with the identified audio track. In another embodiment, the identification is performed on the portable device.
According to the invention there is also provided a portable device, computing device, and identification server for performing the above described methods.
Therefore, by combining this functionality with a portable music player, the process of identification can be automated. In addition, the range of possible uses of the device can be broadened to cover a wider array of music and music sources, and additional functionality can be provided at little additional cost. Also, the device can facilitate compiling a database of radio station playlists. Additional actions can be initiated at the device, and these additional actions can be personalized to a user's preferences.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of the nature and objects of the invention, reference should be made to the following detailed description, taken in conjunction with the accompanying drawings, in which:
Figure 1 is a diagrammatic view of a system for identifying audio on a portable device, according to an embodiment of the invention;
Figure 2 is a block diagram of the identification server (ID server) and/or client computer shown in Figure 1;
Figure 3 is a block diagram of the portable device shown in Figure 1;
Figure 4 is a flow chart of a method for identifying audio, where the identification is performed by an ID server, according to an embodiment of the invention;
Figure 5 is a flow chart of another method for identifying audio, where the identification is performed by a client computer, according to another embodiment of the invention;
Figure 6 is a flow chart of yet another method for identifying audio, where the identification is performed by a portable device, according to yet another embodiment of the invention;
Figure 7 is a flow chart of a method for identifying audio from a radio broadcast, where the identification is performed by an identification server (ID server), according to an embodiment of the invention;
Figure 8 is a flow chart of another method for identifying audio from a radio broadcast, where the identification is performed by a client computer, according to another embodiment of the invention; and
Figure 9 is a flow chart of yet another method for identifying audio from a radio broadcast, where the identification is performed by a portable device, according to yet another embodiment of the invention.
Like reference numerals refer to corresponding parts throughout the several views of the drawings.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 is a block diagram of a system 100 for identifying audio, according to an embodiment of the invention. The system 100 comprises at least one identification server 102 (hereinafter "ID server") and at least one client computer 106 coupled to one another via a network 104. The ID server 102 and client computer 106 may be any type of computing device. In one embodiment, however, the client computer 106 is a desktop computer and the network 104 is the Internet.
The client computer 106 is coupled to the network 104 by any suitable communication link 108, such as Ethernet, coaxial cables, copper telephone lines, optical fibers, wireless, infra-red, or the like. A portable audio identification device 112 (hereinafter "portable device") is coupled to the client computer 106. The portable device 112 is preferably sized to be carried in the palm of one's hand. The portable device 112 couples to the client computer 106 by any suitable communication link 110, such as Universal Serial Bus (USB), FireWire, Ethernet, coaxial cable, copper telephone line, optical fiber, wireless, infra-red, or the like.
In an alternative embodiment, the client computer 106 is a fixed wireless base station coupled to a gateway/modem that is in turn connected to the network 104. For example, the client computer 106 might be a WiFi (Wireless Fidelity, IEEE 802.11b wireless networking) base station coupled to the network 104 via a Digital Subscriber Line (DSL) gateway (not shown). In this embodiment, the communication link from the portable device 112 to the client computer 106 is a WiFi wireless communication link.
In yet another embodiment, no client computer 106 is present and the portable device 112 communicates directly with the ID server 102. For example, the portable device 112 includes cellular telephone communication circuitry which communicates with the ID server 102 via a cellular telephone network (network 104).
In a further embodiment, the portable device 112 alone is sufficient to identify audio. For example, the portable device periodically downloads updated playlists and/or fingerprinting databases from the network 104, as explained in further detail below in relation to Figures 6 and 9.
In one embodiment, a playlist provider 114 and fingerprint provider 116 may also be coupled to the network 104. Here, the playlist provider 114 is a server that supplies updated playlists to the ID server 102, client computer 106, and/or portable device 112, while the fingerprint provider 116 is a server that supplies updated fingerprint data for new audio tracks to the ID server 102, client computer 106, and/or portable device 112.
Figure 2 is a block diagram of the ID server 102 and/or client computer 106 shown in Figure 1. ID server 102 and/or client computer 106 are shown in one diagram to avoid repetition. It should, however, be appreciated that all the elements of the ID server 102 and/or client computer 106 listed below need not be present in all embodiments of the invention and are merely included for exemplary purposes.
The ID server 102 and/or client computer 106 preferably include: at least one data processor or central processing unit (CPU) 202; a memory 210; user interface devices 206, such as a monitor and keyboard; communications circuitry 204 for communicating with the network 104 (Figure 1), ID server 102 (Figure 1), client computer 106 (Figure 1) and/or portable device 112 (Figure 1); and at least one bus 208 that interconnects these components.
Memory 210 preferably includes an operating system 212, such as VXWORKS, LINUX, or WINDOWS, having instructions for processing, accessing, storing, or searching data, etc. Memory 210 also preferably includes communications procedures 214 for communicating with the network 104 (Figure 1), ID server 102 (Figure 1), client computer 106 (Figure 1) and/or portable device 112 (Figure 1); fingerprinting procedures 216; searching procedures 218; a fingerprinting database 220; a radio playlist database 224; a geographic identifier 234; a "no identification" message 236; and a cache 238 for temporarily storing data.
The fingerprinting procedures 216 are used to obtain a unique identifier or fingerprint for an audio sample of an audio track, as described in further detail below in relation to Figures 4 and 5. The fingerprinting procedures 216 include instructions for performing fingerprinting on the audio sample to obtain a unique audio fingerprint for the audio sample.
The searching procedures 218 are used for searching the fingerprint database 220 in order to attempt to identify audio, as described in further detail below in relation to Figures 4 to 6. The fingerprinting database 220 includes numerous fingerprints of known audio samples or audio tracks and their associated identification data 222(1)-(N), such as song title, artist, or the like.
In an alternative embodiment, a radio playlist database 224 is provided. In this embodiment, the radio playlist database 224 includes numerous radio frequencies 226(1)-(N) and an associated playlist 228(1)-(N) for each frequency 226(1)-(N). Each playlist 228(1)-(N) includes a date 230(1)-(N) and time 232(1)-(N), and the identity 232(1)-(N) of each audio track broadcast at that date and time. For example, radio station KJAZ may have a frequency of 98.7FM, and a playlist that includes Frank Sinatra's "New York, New York" broadcast on January 21, 2002 at 9:00AM.
As multiple radio stations across the world share the same frequencies, a geographic identifier 234 is provided to identify the radio stations or frequencies 226(1)-(N) in a particular geographic area. This geographic identifier 234 may be provided by any suitable means. In one embodiment, the user supplies the geographic identifier. In another embodiment, the geographic identifier 234 is obtained from the user's unique network address. For example, an Internet Protocol (IP) address of the client computer 106 and/or portable device 112 can be used to approximate the geographic area of the user. In still another embodiment, a Global Positioning System (GPS) incorporated into the client computer 106 and/or portable device 112 can be used to determine the geographic area of the user. If the ID server 102 and/or client computer 106 cannot identify an audio track, the "no identification" message 236 is used to inform the user that no identification can be made. Alternatively, prior to receiving a "no identification" message, the user may be presented with a number of "closest match" possible identifications.
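The radio playlist database and geographic identifier can be sketched as a lookup keyed by (geographic area, frequency), so that stations sharing a frequency in different markets stay distinct. This is a minimal sketch under stated assumptions: the key layout, the string identifiers, and treating the track playing at a given time as the last one that started at or before that time are illustrative choices, not details from the text.

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class Playlist:
    """Broadcast log for one station: sorted start times and track identities."""
    start_times: List[datetime] = field(default_factory=list)
    tracks: List[str] = field(default_factory=list)

    def add(self, start: datetime, track: str) -> None:
        # Keep both lists ordered by start time so lookups can use bisect.
        i = bisect.bisect_right(self.start_times, start)
        self.start_times.insert(i, start)
        self.tracks.insert(i, track)

    def track_at(self, when: datetime) -> Optional[str]:
        # The track on air at `when` is the last one that started at or before it.
        i = bisect.bisect_right(self.start_times, when)
        return self.tracks[i - 1] if i > 0 else None


# Hypothetical key: (geographic identifier, broadcast frequency).
PlaylistDB = Dict[Tuple[str, str], Playlist]


def identify_broadcast(db: PlaylistDB, region: str, frequency: str,
                       when: datetime) -> Optional[str]:
    """Return the track identity, or None (i.e. "no identification")."""
    playlist = db.get((region, frequency))
    if playlist is None:
        return None
    return playlist.track_at(when)
```

Because only start times are stored, a query made after the final logged track has ended would still return that track; a fuller version would also record track durations.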
It should be appreciated by one skilled in the art that certain elements of these devices need not be present on both the ID server 102 and the client computer 106. For example, the fingerprinting procedures 216, searching procedures 218, and fingerprinting database 220 may only be necessary on the device on which fingerprinting of an audio track occurs. In other words, if fingerprinting occurs on the ID server 102, then the aforementioned elements of memory 210 need only be present on the ID server 102. Likewise, in the embodiment where identification occurs on the portable device 112 (Figure 1), the aforementioned elements of memory 210 are not provided on either the ID server 102 or the client computer 106.
Figure 3 is a block diagram of the portable device 112 shown in Figure 1. It should be appreciated by one skilled in the art that all the elements of the portable device 112 listed below need not be present in all embodiments of the invention and are merely included for exemplary purposes.
The portable device 112 preferably includes: at least one data processor or central processing unit (CPU) 302; a memory 310; user interface devices 308, such as buttons, a screen, and a headset; communications circuitry 304 for communicating with the network 104 (Figure 1), ID server 102 (Figure 1), and/or client computer 106 (Figure 1); one or more audio players 350, such as a CD or MP3 player; a microphone 352; a radio receiver 354 and antenna 356 for receiving radio broadcasts; and at least one bus 306 that interconnects these components.
Memory 310 preferably includes an operating system 312, such as VXWORKS, LINUX, or WINDOWS, having instructions for processing, accessing, storing, or searching data, etc. Memory 310 also preferably includes communications procedures 314 for communicating with the network 104 (Figure 1), ID server 102 (Figure 1), and/or client computer 106 (Figure 1); fingerprinting procedures 316; searching procedures 318; a fingerprinting database 320; a radio playlist database 324; a geographic identifier 334; geographic identification procedures 336; a "no identification" message 338; recording procedures 340; player procedures 342; radio procedures 344; a cache 346 for temporarily storing data; frequency detection procedures 358; and a clock 360.
The fingerprinting procedures 316 are used to obtain a unique identifier or fingerprint for an audio sample of an audio track, as described in further detail below in relation to Figure 6.
Also, in this embodiment the searching procedures 318 are used for searching the fingerprint database 320 in order to attempt to identify audio, as described in further detail below. The fingerprinting database 320 includes numerous fingerprints of known audio samples or audio tracks and their associated identification data 322(1)-(N), such as song title, artist, or the like.
In an alternative embodiment, a radio playlist database 324 is provided. In this embodiment, the radio playlist database 324 includes numerous radio frequencies 326(1)-(N) and an associated playlist 328(1)-(N) for each frequency 326(1)-(N). Each playlist 328(1)-(N) includes a date 330(1)-(N) and time 332(1)-(N), and the identity 332(1)-(N) of each audio track broadcast at that date and time.
Also for the above alternative embodiment, a geographic identifier 334 is provided to assist in identifying the radio stations or frequencies 326(1)-(N) in a particular geographic area. For example, the geographic identifier 334 may select from a set of frequencies stored on the device based on the identified geographic area. This geographic identifier 334 may be provided by any suitable means. In one embodiment, the user supplies the geographic identifier 334. In another embodiment, the geographic identifier 334 is obtained by the geographic identification procedures 336. As described above, this can be determined from the user's unique network address. For example, an Internet Protocol (IP) address of the portable device 112 can be used to approximate the geographic area of the user. In still another embodiment, a Global Positioning System (GPS) (not shown) incorporated into the portable device 112 can be used to determine the geographic area of the user.
In all embodiments, if the portable device 112 cannot identify the audio track, a "no identification" message 338 is used to inform the user that no identification can be made.
In the embodiments of the invention where fingerprinting of the audio track is used to identify the audio, the recording procedures 340 record an audio sample 348, which is stored in the cache 346. The audio sample is recorded from the audio player/s 350, microphone 352, and/or the radio receiver 354.
In the embodiment of the invention where a date, time, and radio station or broadcast frequency are used to identify audio from a radio station playlist, the recording procedures 340 are used to record the date, time, and broadcast or radio station frequency 349, which are stored in the cache 346.
The player procedures 342 are preferably provided to play audio on the audio player/s 350. These player procedures 342 are especially needed for playing digital audio, such as MP3 audio tracks, or the like.
The radio procedures 344 are preferably provided to play radio received at the antenna 356 and fed through the radio receiver 354. It should, however, be appreciated that all the aforementioned components of the memory 310 need not be present in all embodiments of the invention and are merely included for exemplary purposes.
The frequency detection procedures 358 are used to detect the frequency of a radio station broadcast, and the clock 360 is used to keep the date and time. The frequency detection procedures 358 and clock 360 are explained in further detail below in relation to Figures 7-9.
Figure 4 is a flow chart of a method for identifying audio, where the identification is performed by the identification server (ID server) 102, according to an embodiment of the invention. In one embodiment, the audio player/s 350 (Figure 3) and/or player procedures 342 (Figure 3) of the portable device 112 play audio at step 402 through the user interface devices 308 (Figure 3). For example, a built-in MP3 player plays audio to a user through a headset. In another embodiment, the radio receiver 354 (Figure 3) and/or radio procedures 344 (Figure 3) receive and play a radio broadcast through the portable device's headset.
Instructions are then received at step 404 to identify the audio. These instructions preferably come from the user, such as by the user depressing an "identify now" button on the portable device, or the like. In an alternative embodiment, the instruction to record is received automatically. For example, an audio sample is automatically recorded every 2 minutes. The steps 402 and 404 of playing audio and receiving instructions are not essential to the invention and in some embodiments need not occur.
An audio sample is then recorded at step 406 by the recording procedures 340 (Figure 3) and saved as an audio sample 348 (Figure 3) in the cache 346 (Figure 3). In one embodiment, audio is recorded continuously and automatically segmented into audio samples of sufficient length to undergo fingerprinting. For example, audio is continually recorded and automatically segmented into 30-second audio samples that are continually sent to the ID server 102 to be identified.
Where audio is not being played by the audio player/s 350 (Figure 3) and/or player procedures 342 (Figure 3) of the portable device 112, audio is recorded at step 406 through the microphone 352 (Figure 3).
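The continuous recording mode described above can be sketched as a generator that cuts a stream of PCM samples into fixed 30-second segments. The 8 kHz sample rate is an assumption (the text does not specify one), as is the choice to discard a trailing partial segment on the grounds that it may be too short to fingerprint.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE_HZ = 8000   # assumed capture rate; not specified in the text
SEGMENT_SECONDS = 30    # segment length from the example above


def segment_stream(pcm: Iterable[int],
                   sample_rate: int = SAMPLE_RATE_HZ,
                   seconds: int = SEGMENT_SECONDS) -> Iterator[List[int]]:
    """Split a continuous stream of PCM samples into fixed-length audio samples.

    Each yielded list holds exactly sample_rate * seconds values; a trailing
    partial segment is dropped.
    """
    size = sample_rate * seconds
    buffer: List[int] = []
    for value in pcm:
        buffer.append(value)
        if len(buffer) == size:
            yield buffer      # hand a complete segment to the cache/uploader
            buffer = []
```

Using a generator means each segment can be cached or sent as soon as it is complete, without holding the whole recording in memory.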
In the embodiment where the portable device 112 couples to network 104 via the client computer 106, the communication procedures 314 (Figure 3) transmit the audio sample to the client computer 106 at step 408. As mentioned previously, this communication occurs over communications link 110, such as a serial port connection, wireless connection, or the like. The audio sample is then received by the client computer and sent at step 410 to the ID server 102.
It should be appreciated that where the portable device does not have a persistent communication link with the client computer 106 and/or ID server 102, then the audio samples are saved in the cache 346 (Figure 3) until such time as a connection is established between the portable device 112 and the client computer 106 and/or ID server 102.
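A minimal sketch of this store-and-forward behaviour of the cache is shown below, assuming a hypothetical send callback that raises an exception when transmission fails. The class name SampleCache and its methods are illustrative only.

```python
from collections import deque
from typing import Callable, Deque


class SampleCache:
    """Hypothetical store-and-forward queue for audio samples awaiting upload."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._pending: Deque[bytes] = deque()
        self._send = send  # assumed to raise if the transmission fails

    def add(self, sample: bytes) -> None:
        self._pending.append(sample)

    def flush(self, connected: bool) -> int:
        """Send cached samples oldest-first while connected; return number sent."""
        if not connected:
            return 0
        sent = 0
        while self._pending:
            self._send(self._pending[0])
            self._pending.popleft()  # drop a sample only after it was sent
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Because a sample is removed only after the send call returns, a failure part-way through leaves the unsent samples cached for the next connection.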
In an alternative embodiment, where no client computer is present, such as where the portable device 112 communicates with the ID server 102 via a cellular telephone network, the audio sample is transmitted at step 408 directly to the ID server 102.
The ID server 102 then receives the audio sample at step 412. The fingerprinting procedures 216 (Figure 2) on the ID server 102 subsequently perform at step 414 fingerprinting on the audio sample to determine a unique identifier or fingerprint for the audio sample based on the audio sample's characteristics or acoustical features. Such characteristics or acoustical features include the audio sample's analog waveform, loudness, pitch, brightness, bandwidth, Mel Frequency Cepstral Coefficients (MFCC), or the like. One suitable method for fingerprinting an audio sample is disclosed in U.S. patent No. 5,918,223, which is incorporated herein by reference.
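As an illustration of feature-based fingerprinting, the following Python sketch summarizes a sample by the mean and standard deviation of three of the listed features: loudness (RMS amplitude), brightness (spectral centroid), and bandwidth (spectral spread around the centroid). It is not a reproduction of the method of U.S. Patent No. 5,918,223; pitch and MFCC features are omitted, the frame length is an assumption, and the naive DFT is used only to keep the sketch dependency-free (a practical implementation would use an FFT).

```python
import cmath
import math
from typing import List, Sequence, Tuple


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def frame_features(frame: Sequence[float], sample_rate: int) -> Tuple[float, float, float]:
    """Return (loudness, brightness, bandwidth) for one frame."""
    n = len(frame)
    loudness = math.sqrt(sum(x * x for x in frame) / n)  # RMS amplitude
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return loudness, 0.0, 0.0
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total  # brightness, Hz
    spread = math.sqrt(sum((f - centroid) ** 2 * m
                           for f, m in zip(freqs, mags)) / total)  # bandwidth, Hz
    return loudness, centroid, spread


def fingerprint(samples: Sequence[float], sample_rate: int,
                frame_len: int = 256) -> Tuple[float, ...]:
    """Summarize a sample as the mean and standard deviation of each feature."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("sample shorter than one frame")
    feats = [frame_features(f, sample_rate) for f in frames]
    result: List[float] = []
    for column in zip(*feats):  # one column per feature
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        result.extend([mean, std])
    return tuple(result)
```

Summarizing frame-level features by their statistics yields a fixed-length vector regardless of sample length, which is what allows fingerprints to be compared directly in the database search.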
The fingerprint database 220 (Figure 2) is then searched at step 416 for a match or partial match of the fingerprint of the audio sample recorded on the portable device to a known fingerprint of a previously identified audio track. If a match is not located (418 - No), then a "no identification" message 236 (Figure 2) is sent at step 430 to the portable device 112. In the embodiment where the portable device 112 couples to the ID server 102 via the client computer, the client computer receives the "no identification" message and sends it to the portable device at step 432.
The portable device then receives the "no identification" message at step 434 and displays it at step 436, informing the user that the audio could not be identified. Such a message is preferably displayed on a screen on the portable device.
If a match of said fingerprint to a known fingerprint of a previously identified audio track is located (418 - Yes), then identification data associated with said previously identified audio track is sent at step 420 to the portable device 112. In the embodiment where the portable device 112 couples to the ID server 102 via the client computer, the client computer receives the identification of the audio sample and sends the identification to the portable device at step 422.
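The search for a "match or partial match" can be sketched, under the assumption that fingerprints are fixed-length numeric vectors, as a nearest-neighbour lookup with a distance threshold. The track IDs, the Euclidean metric, and the max_distance parameter are assumptions made for illustration; a result of None corresponds to the "no identification" branch (418 - No).

```python
import math
from typing import Dict, Optional, Sequence


def find_match(query: Sequence[float],
               database: Dict[str, Sequence[float]],
               max_distance: float) -> Optional[str]:
    """Return the ID of the nearest stored fingerprint, or None if none is close.

    A "partial match" is modelled as any stored fingerprint within
    max_distance (Euclidean) of the query.
    """
    best_id: Optional[str] = None
    best_dist = math.inf
    for track_id, known in database.items():
        if len(known) != len(query):
            continue  # incompatible fingerprint format
        d = math.dist(query, known)
        if d < best_dist:
            best_id, best_dist = track_id, d
    return best_id if best_dist <= max_distance else None
```

A linear scan keeps the sketch simple; a large database would more likely use an index structure for nearest-neighbour search.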
The portable device receives and displays the identification of the audio sample at steps 424 and 426, respectively. For example, an artist and song title are displayed. In a preferred embodiment, additional actions are then taken at step 428. These additional actions are performed only after the identity of the audio sample is known and include, for example, the client computer 106 and/or portable device 112 automatically displaying the identified artist's Web page, biography, or discography; automatically displaying a Web page selling the artist's song or album; downloading the audio track from a subscription service; recommending similar audio; obtaining prices of the identified audio track from Internet music merchants; or the like. These additional actions are preferably selected by choosing a menu item from the portable device's display, and can be customized and/or downloaded from third-party service providers. The portable device can also return information about commercials or other spoken-word recordings. Further, these additional actions can be taken on any audio sample, whether the identity of the audio is known or unknown. For example, the user may have downloaded a digital audio file whose identity is completely or partially known, but may still want to send the song for identification to receive additional information on the audio track or to take some action on that audio track.
Figure 5 is a flow chart of another method for identifying audio, where the identification is performed by a client computer 106, according to another embodiment of the invention. In one embodiment, the audio player/s 350 (Figure 3) and/or player procedures 342 (Figure 3) of the portable device 112 play audio at step 502 through the user interface devices 308 (Figure 3). For example, a built-in MP3 player plays audio to a user through a headset.
Instructions are then received at step 504 to identify audio. These instructions preferably come from the user, such as by the user depressing an "identify now" button on the portable device. In an alternative embodiment, the instruction to record is received automatically. For example, an audio sample is automatically recorded every 2 minutes. It should be noted that the steps 502 and 504 of playing audio and receiving instructions to identify the audio are not essential to the invention and in some embodiments need not occur.
An audio sample is then recorded at step 506 by the recording procedures 342 (Figure 3) and saved as an audio sample 348 (Figure 3) in the cache 346 (Figure 3). In one embodiment, audio is recorded continuously and automatically segmented into audio samples of sufficient length to undergo fingerprinting. For example, audio is continually recorded and automatically segmented into 30-second audio samples, each of which is identified by the client computer 106.
Where audio is not being played by the audio player/s 350 (Figure 3) and/or player procedures 342 (Figure 3) of the portable device 112, audio is recorded at step 506 through the microphone 352 (Figure 3).
In the embodiment where the portable device 112 couples to the client computer 106 via a wireless connection, the communication procedures 314 (Figure 3) continue transmitting audio samples to the client computer 106 at step 508 until the wireless connection is brought down. The audio sample is then received at step 510 by the client computer 106.
It should be appreciated that where the portable device does not have a persistent communication link with the client computer 106, the audio samples are saved in the cache until such time as a connection is established between the portable device 112 and the client computer 106.
The fingerprinting procedures 216 (Figure 2) on the client computer 106 subsequently perform at step 512 fingerprinting on the audio sample to determine a unique identifier or fingerprint for the audio sample based on the audio sample's characteristics or acoustical features, as described above.
In yet another embodiment, the fingerprinting procedures 316 (Figure 3) on the portable device 112 perform fingerprinting on the audio sample to determine a unique identifier or fingerprint for the audio sample based on the audio sample's characteristics or acoustical features. This fingerprint is then sent to the client computer 106, which searches the fingerprint database at step 514.
The fingerprint database 220 (Figure 2) is then searched at step 514 for a match or partial match to the fingerprint of the audio sample recorded on the portable device. If a match is not located (516 - No), then a "no identification" message 236 (Figure 2) is sent at step 526 to the portable device 112, which receives the "no identification" message at step 528 and displays it at step 530, informing the user that the audio could not be identified. Such a message is preferably displayed on a screen on the portable device.
If a match is located (516 - Yes), then an identification of the audio sample is sent at step 518 to the portable device 112. The portable device receives and displays the identification of the audio sample at steps 520 and 522, preferably on the portable device's screen. For example, an artist and song title are displayed. In a preferred embodiment, additional actions are then taken at step 524, as described above.
Figure 6 is a flow chart of yet another method for identifying audio, where the identification is performed by a portable device 112, according to yet another embodiment of the invention. In one embodiment, the audio player/s 350 (Figure 3) and/or player procedures 342 (Figure 3) of the portable device 112 play audio at step 602 through the user interface devices 308 (Figure 3). For example, a built-in MP3 player plays audio to a user through a headset. An instruction to identify audio is then received at step 604. In an alternative embodiment, the instruction to record is received automatically. It should be noted that the steps 602 and 604 of playing audio and receiving instructions to identify the audio are not essential to the invention and in some embodiments need not occur.
An audio sample is then recorded at step 606 by the recording procedures 342 (Figure 3) and saved as an audio sample 348 (Figure 3) in the cache 346 (Figure 3). In one embodiment, audio is recorded continuously and automatically segmented into audio samples of sufficient length to undergo fingerprinting. The audio samples 348 (Figure 3) are preferably saved temporarily in the cache 346 (Figure 3).
Where audio is not being played by the audio player/s 350 (Figure 3) and/or player procedures 342 (Figure 3) of the portable device 112, audio is recorded at step 606 through the microphone 352 (Figure 3).
The fingerprinting procedures 316 (Figure 3) on the portable device 112 subsequently perform at step 608 fingerprinting on the audio sample to determine a unique identifier or fingerprint for the audio sample based on the audio sample's characteristics or acoustical features. Such characteristics or acoustical features include the audio sample's analog waveform, loudness, pitch, brightness, bandwidth, Mel Frequency Cepstral Coefficients (MFCC), or the like. One suitable method for fingerprinting an audio sample is disclosed in U.S. patent No. 5,918,223, which is incorporated herein by reference.
The fingerprint database 320 (Figure 3) is then searched at step 610 for a match or partial match to the fingerprint of the audio sample recorded on the portable device. If a match is not located (612 - No), then a "no identification" message 340 (Figure 3) is displayed at step 614 informing the user that the audio could not be identified. Such a message is preferably displayed on a screen on the portable device.
If a match is located (612 - Yes), then an identification of the audio sample is displayed at step 616, preferably on the portable device's screen. For example, an artist and song title are displayed. In a preferred embodiment, additional actions are then taken at step 618, as described above.
One of the advantages of performing fingerprinting on the portable device is that it saves memory on the portable device, as fingerprints are substantially smaller than the audio samples. In addition, another embodiment of the invention identifies audio using a peer-to-peer network consisting only of networked portable devices. For example, if the device has wireless networking capabilities and fingerprinting is performed on the portable device, then identification of the audio sample may occur by searching for fingerprints on other networked portable devices.
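A minimal sketch of the peer-to-peer lookup follows, assuming each networked portable device exposes a hypothetical lookup function that maps a fingerprint to a track identifier or None. The names Lookup and identify_via_peers are illustrative, not part of the described embodiment.

```python
from typing import Callable, Iterable, Optional

# Hypothetical peer interface: lookup(fingerprint) -> track ID or None.
Lookup = Callable[[bytes], Optional[str]]


def identify_via_peers(fingerprint: bytes, peers: Iterable[Lookup]) -> Optional[str]:
    """Query networked peer devices in turn; return the first identification."""
    for lookup in peers:
        try:
            result = lookup(fingerprint)
        except ConnectionError:
            continue  # unreachable peer: skip it rather than fail the search
        if result is not None:
            return result
    return None  # no peer recognised the fingerprint
```

Only the compact fingerprint travels between devices, which is the property that makes this peer-to-peer arrangement practical.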
Another embodiment, in which fingerprinting is performed on the portable device, provides central kiosks, such as at record stores, where identification of the fingerprint may be performed. This embodiment alleviates the load placed on such kiosks, as they do not generate the fingerprint, and less data has to be transferred and maintained at each kiosk.
Figure 7 is a flow chart of a method for identifying audio from a radio broadcast, where the identification is performed by an identification server (ID server) 102, according to an embodiment of the invention. The radio receiver 354 (Figure 3) and/or radio procedures 344 (Figure 3) of the portable device 112 receive and play a radio broadcast at step 702. An instruction is then received at step 704 to identify an audio track of the radio broadcast. These instructions preferably come from the user, such as by the user depressing an "identify now" button on the portable device, or the like. In an alternative embodiment, the instruction to identify the audio track is received automatically. For example, an attempt to identify the audio track occurs automatically every 2 minutes.
The recording procedures 342 (Figure 3) then record and store, at step 706, the radio station's broadcast frequency and the date and time at which the portable device 112 received the instruction to identify the audio track. For example, a broadcast frequency of 95.7 MHz, a date of February 23, 2002, and a time of 113H00 are recorded. The recording procedures 340 (Figure 3) obtain the frequency from the frequency detection procedures 358 (Figure 3) and the date and time from the clock 360 (Figure 3). In their simplest form, the frequency detection procedures 358 (Figure 3) do nothing more than ascertain which radio frequency the user has selected, i.e., by reading the value to which the radio receiver has been tuned. Alternatively, the frequency detection procedures 358 (Figure 3) may detect the frequency of the broadcast from data that is often broadcast together with the audio signal, using the Radio Data System (RDS). RDS typically transmits the actual station identification, which is more reliable than the frequency because a station identification does not depend on geography. RDS also transmits information about the owner of the radio station, which unambiguously defines which playlist to search. Still other frequency detection procedures 358 (Figure 3) may detect the frequency of the broadcast by measuring the frequency to which the radio receiver is tuned. It should be appreciated that the frequency of the broadcast is determined automatically, i.e., the user does not supply the radio frequency to the frequency detection procedures 358 (Figure 3).
The date and time on the clock 360 (Figure 3) can initially be set by the user, or the portable device can set the clock automatically using known techniques for remotely synchronizing a clock from a reliable time source, such as an atomic clock. Such techniques for remotely synchronizing a clock are disclosed in U.S. Patent Nos. 4,823,328 and 4,768,178, both of which are incorporated herein by reference.
The recorded date, time, and frequency (hereinafter "clock/frequency data") 349 (Figure 3) are then stored in the cache 346 (Figure 3). The portable device 112 then transmits the clock/frequency data 349 (Figure 3) at step 708.
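One possible in-memory layout for the clock/frequency data is sketched below. The field names, the use of UTC timestamps, the optional RDS station identification field, and the rounding of the tuned frequency to 0.1 MHz are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ClockFrequencyData:
    """Hypothetical record for the clock/frequency data.

    frequency_mhz: tuned FM frequency, e.g. 95.7
    timestamp:     UTC time at which the identify instruction was received
    station_id:    optional RDS station identification, when available
    """
    frequency_mhz: float
    timestamp: datetime
    station_id: Optional[str] = None


def capture(tuned_mhz: float, station_id: Optional[str] = None,
            now: Optional[datetime] = None) -> ClockFrequencyData:
    """Snapshot the receiver state when the identify instruction arrives."""
    ts = now if now is not None else datetime.now(timezone.utc)
    return ClockFrequencyData(round(tuned_mhz, 1), ts, station_id)
```

Keeping the record immutable (frozen) reflects that it captures a single moment; it can be cached and transmitted later without risk of being altered.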
In the embodiment where the portable device 112 couples to ID server 102 via the client computer 106, the communication procedures 314 (Figure 3) transmit the clock/frequency data 349 (Figure 3) to the client computer 106 at step 708. As mentioned previously, this communication occurs over communications link 110 (Figure 1), such as a serial port connection, wireless connection, or the like. The clock/frequency data 349 (Figure 3) is then received by the client computer and sent to the ID server 102 at step 710.
It should be appreciated that where the portable device does not have a persistent communication link with the client computer 106 and/or ID server 102, then the clock/frequency data 349 (Figure 3) are saved in the cache until such time as a connection is established between the portable device 112 and the client computer 106 and/or ID server 102.
In an alternative embodiment, where no client computer is present, such as where the portable device 112 communicates directly with the ID server 102 via a cellular telephone network, the clock/frequency data 349 (Figure 3) are transmitted directly to the ID server 102 at step 708.
The ID server 102 then receives the clock/frequency data 349 (Figure 3) at step 712. The radio playlist database 224 (Figure 2) is then searched at step 716 for a match to the clock/frequency data 349 (Figure 3) recorded on the portable device 112. If a match is not located (718 - No), then a "no identification" message 236 (Figure 2) is sent at step 730 to the portable device 112. In the embodiment where the portable device 112 couples to the ID server 102 via the client computer, the client computer receives the "no identification" message and sends it to the portable device at step 732.
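The playlist search can be sketched as a time-interval lookup, assuming the radio playlist database stores, for each station frequency, a list of (start time, end time, track) entries sorted by start time. The data layout and the lookup_track name are hypothetical; keying by frequency alone is a simplification, and a station identification such as one obtained via RDS would avoid ambiguity between regions.

```python
import bisect
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: frequency -> [(start, end, track), ...] sorted by start.
Playlist = Dict[float, List[Tuple[datetime, datetime, str]]]


def lookup_track(playlists: Playlist, frequency_mhz: float,
                 when: datetime) -> Optional[str]:
    """Return the track playing on `frequency_mhz` at `when`, or None."""
    entries = playlists.get(frequency_mhz)
    if not entries:
        return None  # unknown station: "no identification"
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, when) - 1  # last entry starting at or before `when`
    if i < 0:
        return None  # request predates the first logged track
    _, end, track = entries[i]
    return track if when < end else None  # None if `when` falls in a gap
```

Binary search over start times keeps each lookup logarithmic in the length of a station's playlist.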
At step 734, the portable device then receives the "no identification" message, which is displayed at step 736, informing the user that the audio could not be identified. Such a message is preferably displayed on a screen on the portable device.
If a match is located (718 - Yes), then an identification of the audio track is sent at step 720 to the portable device 112. In the embodiment where the portable device 112 couples to the ID server 102 via the client computer, the client computer receives the identification of the audio sample and sends the identification to the portable device at step 722. The portable device receives and displays the identification of the audio sample, such as an artist and song title, at steps 724 and 726. In a preferred embodiment, additional actions are then taken at step 728, as described above.
Figure 8 is a flow chart of another method for identifying audio from a radio broadcast, where the identification is performed by a client computer 106, according to another embodiment of the invention. The radio receiver 354 (Figure 3) and/or radio procedures 344 (Figure 3) of the portable device 112 receive and play at step 802 a radio broadcast on the portable device 112. Instructions are then received at step 804 to identify an audio track played on the radio. These instructions preferably come from the user, such as by the user depressing an "identify now" button on the portable device, or the like. In an alternative embodiment, the instruction to identify the audio track is received automatically. For example, an attempt to identify the audio track occurs automatically every 2 minutes.
The recording procedures 342 (Figure 3) then store the clock/frequency data 349 (Figure 3) at step 806. In a similar manner to that explained above, the recording procedures 340 (Figure 3) obtain the frequency from the frequency detection procedures 358 (Figure 3) and the date and time from the clock 360 (Figure 3).
The clock/frequency data 349 (Figure 3) are stored in the cache 346 (Figure 3) and subsequently transmitted at step 808 to the client computer 106. As mentioned previously, this communication occurs over communications link 110 (Figure 1), such as a serial port connection, wireless connection, or the like. The clock/frequency data 349 (Figure 3) are then received at step 810 by the client computer 106.
It should be appreciated that where the portable device does not have a persistent communication link with the client computer 106, then the clock/frequency data 349 (Figure 3) are saved in the cache until such time as a connection is established between the portable device 112 and the client computer 106.
The radio playlist database 224 (Figure 2) is then searched at step 814 for a match to the clock/frequency data 349 (Figure 3) recorded on the portable device 112. If a match is not located (816 - No), then a "no identification" message 236 (Figure 2) is sent at step 826 to the portable device 112. The portable device then receives and displays the "no identification" message at steps 828 and 830.
If a match is located (816 - Yes), then an identification of the audio track is sent at step 818 to the portable device 112. The portable device receives and displays the identification of the audio sample, such as an artist and song title, at steps 820 and 822. In a preferred embodiment, additional actions are then taken at step 824, as described above.
Figure 9 is a flow chart of yet another method for identifying audio from a radio broadcast, where the identification is performed by a portable device 112, according to yet another embodiment of the invention. The radio receiver 354 (Figure 3) and/or radio procedures 344 (Figure 3) of the portable device 112 receive and play at step 902 a radio broadcast on the portable device 112. In a similar manner to that described above, instructions are then received at step 904 to identify an audio track played on the radio.
The recording procedures 342 (Figure 3) then store the clock/frequency data 349 (Figure 3) at step 906. In a similar manner to that explained above, the recording procedures 340 (Figure 3) obtain the frequency from the frequency detection procedures 358 (Figure 3) and the date and time from the clock 360 (Figure 3).
The clock/frequency data 349 (Figure 3) are stored in the cache 346 (Figure 3). The radio playlist database 324 (Figure 3) is then searched at step 910 for a match to the clock/frequency data 349 (Figure 3). If a match is not located (912 - No), then a "no identification" message 338 (Figure 3) is displayed at step 914. If a match is located (912 - Yes), then an identification of the audio track is displayed at step 916. In a preferred embodiment, additional actions are then taken at step 918, as described above.
Moreover, the fingerprint database 220 (Figure 2) and/or 320 (Figure 3) on the ID server 102 (Figure 1), client computer 106 (Figure 1), or portable device 112 (Figure 1) is preferably updated periodically from the fingerprint provider 116 (Figure 1). Similarly, the radio playlist database 224 (Figure 2) and/or 324 (Figure 3) on the ID server 102 (Figure 1), client computer 106 (Figure 1), and/or portable device 112 (Figure 1) is preferably updated periodically from the playlist provider 114 (Figure 1).
In still a further embodiment of the invention, the portable device appends clock/frequency data to the audio sample. This data may be used for many purposes, such as determining a user's listening habits for targeted advertising, or the like.
In yet another embodiment of the invention, when a user is listening to a radio broadcast on a secondary device, such as a car radio, he/she can use the portable device to identify the broadcast channel to which the secondary device is tuned and then record from that channel. To do this, the portable device searches through all the channels until it finds the broadcast station whose signal matches the ambient audio as heard through the portable device's microphone. Subsequently, the audio is identified using one of the above-described techniques. This embodiment addresses drawbacks associated with recording ambient noise. Furthermore, the tuned radio frequency is recorded, which can be used to augment the fingerprinting process and provide additional information to the database.
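The channel scan can be illustrated by comparing the microphone signal with each candidate channel's demodulated audio using normalized (Pearson) correlation over a small range of lags, since the ambient audio reaches the microphone slightly after the broadcast. The threshold, the lag range, and the function names are assumptions for illustration; a practical receiver would likely also need to resample and band-limit both signals before comparing them.

```python
import math
from typing import Dict, Optional, Sequence


def best_correlation(ambient: Sequence[float], channel: Sequence[float],
                     max_lag: int) -> float:
    """Peak Pearson correlation for lags 0..max_lag, assuming ambient lags channel.

    Negative correlations are ignored: the score never drops below 0.0.
    """
    best = 0.0
    for lag in range(max_lag + 1):
        n = min(len(ambient) - lag, len(channel))
        if n < 2:
            break
        x = ambient[lag:lag + n]  # microphone audio shifted back by `lag` samples
        y = channel[:n]
        mx = sum(x) / n
        my = sum(y) / n
        sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
        sxx = sum((p - mx) ** 2 for p in x)
        syy = sum((q - my) ** 2 for q in y)
        if sxx == 0 or syy == 0:
            continue  # a silent segment carries no information
        best = max(best, sxy / math.sqrt(sxx * syy))
    return best


def find_tuned_channel(ambient: Sequence[float],
                       channels: Dict[float, Sequence[float]],
                       max_lag: int = 0,
                       threshold: float = 0.5) -> Optional[float]:
    """Return the frequency whose audio best matches the ambient signal, or None."""
    scores = {freq: best_correlation(ambient, audio, max_lag)
              for freq, audio in channels.items()}
    if not scores:
        return None
    freq = max(scores, key=scores.__getitem__)
    return freq if scores[freq] >= threshold else None
```

Normalizing the correlation makes the score insensitive to the volume at which the secondary device plays, so a quiet car radio can still be matched.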
Still further, another embodiment of the invention utilizes broadcasting a predefined set of audible tones. In one embodiment the tones are used to identify the beginning and end of audio that has been designated for identification. For example, before and after an audio track, a radio station transmits one or more audible tones. The portable device 112 (Figure 1) is configured to record the audio track encapsulated by the audible tones. Once recorded, the audio track is identified as described above.
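A minimal sketch of extracting the audio encapsulated by marker tones is shown below, assuming a single hypothetical marker frequency and analysis frames that hold a whole number of tone cycles. It uses the Goertzel algorithm to measure the energy at the marker frequency in each frame; the 0.5 detection ratio and the function names are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence


def goertzel_power(frame: Sequence[float], sample_rate: int, freq: float) -> float:
    """Squared DFT magnitude of `frame` at the bin nearest `freq` (Goertzel)."""
    n = len(frame)
    k = round(n * freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def tone_present(frame: Sequence[float], sample_rate: int, freq: float,
                 min_ratio: float = 0.5) -> bool:
    """True if most of the frame's energy sits at `freq`.

    For a pure tone on an exact bin, power == energy * n / 2, so the ratio is 1.
    """
    energy = sum(x * x for x in frame)
    if energy == 0:
        return False
    ratio = goertzel_power(frame, sample_rate, freq) / (energy * len(frame) / 2)
    return ratio >= min_ratio


def extract_between_markers(samples: Sequence[float], sample_rate: int,
                            marker_hz: float, frame_len: int) -> Optional[List[float]]:
    """Return the audio between the first opening and closing marker-tone runs."""
    flags = [tone_present(samples[i:i + frame_len], sample_rate, marker_hz)
             for i in range(0, len(samples) - frame_len + 1, frame_len)]
    i = 0
    while i < len(flags) and not flags[i]:  # skip audio before the opening tone
        i += 1
    while i < len(flags) and flags[i]:      # skip the opening tone itself
        i += 1
    start = i
    while i < len(flags) and not flags[i]:  # the encapsulated audio track
        i += 1
    if start == len(flags) or i == len(flags):
        return None                         # opening or closing tone missing
    return list(samples[start * frame_len:i * frame_len])
```

Goertzel evaluates a single frequency bin, which is cheaper than a full spectrum when only one marker frequency needs to be watched.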
In another embodiment, the audible tones themselves contain identification data. For example, a series of audible tones (such as three quick beeps) represents an identifier, such as a series of numbers. This identifier is then used to look up associated information in a database, either on the portable device 112 (Figure 1) or on the ID server 102 (Figure 1).
Another example would be where a series of audible tones itself represents an artist's name, song title, radio station identifier, or the like. Such a series of tones may also be identifiable by the human ear, but may be required to conform to a prescribed set of rules that distinguishes the tones from normal audio. For example, the tones may have to begin with a prescribed set of tones or conform to a certain prescribed length. One use of such an embodiment is where a local band promoting itself registers its audible tone identifier in the database so that it can be identified by the system. Upon synchronization, the portable device returns information, such as telling the user where to get the recording for that particular local band or where to buy a CD containing the local band's songs. What is more, such audible tone identifiers can be associated with particular artists, radio stations, etc., and used in marketing and promotion. Such audible tone identifiers may be transmitted to other users via any suitable means, such as email, beaming from one portable device to another, or the like. In addition, such audible tones and audible tone identifiers can be used to assist any of the abovementioned methods for audio track identification.
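The tone-identifier embodiment can be sketched with a hypothetical tone alphabet in which each digit is sent as one pure tone, and in which a prescribed start sequence distinguishes identifier tones from normal audio. The frequencies, the prefix (9, 0, 9), and the symbol length are invented for illustration; each symbol is classified with the Goertzel algorithm by picking the strongest alphabet tone.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical tone alphabet: digit d is sent as a pure tone at 600 + 100*d Hz.
DIGIT_TONES: Dict[int, float] = {d: 600.0 + 100.0 * d for d in range(10)}
# Hypothetical prescribed start sequence that marks tones as an identifier.
PREFIX = (9, 0, 9)


def goertzel_power(frame: Sequence[float], sample_rate: int, freq: float) -> float:
    """Squared DFT magnitude of `frame` at the bin nearest `freq` (Goertzel)."""
    n = len(frame)
    k = round(n * freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode_identifier(samples: Sequence[float], sample_rate: int,
                      symbol_len: int) -> Optional[str]:
    """Decode one digit per symbol; return the digits after PREFIX, or None."""
    digits: List[int] = []
    for i in range(0, len(samples) - symbol_len + 1, symbol_len):
        frame = samples[i:i + symbol_len]
        powers = {d: goertzel_power(frame, sample_rate, f)
                  for d, f in DIGIT_TONES.items()}
        digits.append(max(powers, key=powers.__getitem__))  # strongest tone wins
    if len(digits) <= len(PREFIX) or tuple(digits[:len(PREFIX)]) != PREFIX:
        return None  # does not follow the prescribed rules: treat as normal audio
    return "".join(str(d) for d in digits[len(PREFIX):])
```

The decoded string (for example a hypothetical "427") would then serve as the key for the database look-up of the associated artist, song title, or station information.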
The foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously many modifications and variations are possible in view of the above teachings. For example, any of the aforementioned embodiments or methods may be combined with one another, especially if a combination of embodiments or methods can be used to assist in the identification of an audio track. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. Furthermore, the order of steps in the methods is not necessarily intended to occur in the sequence laid out. It is intended that the scope of the invention be defined by the following claims and their equivalents.