JP7314221B2

JP7314221B2 - ELECTRONIC DEVICE FOR DETECTING SOUND SOURCE AND METHOD OF OPERATION THEREOF

Info

Publication number: JP7314221B2
Application number: JP2021148570A
Authority: JP
Inventors: ジョンウンパク; デファンキム; ドンウソ; ドンファンキム; ジスチョン; ジャンヒイ
Original assignee: Naver Corp
Current assignee: Naver Corp
Priority date: 2020-09-14
Filing date: 2021-09-13
Publication date: 2023-07-25
Anticipated expiration: 2041-09-13
Also published as: KR102380540B1; JP2022048130A; KR20220035635A

Description

Various embodiments relate to an electronic device for detecting at least one sound source (audio source) used in multimedia content, and to a method of operating the same.

Sound source detection technology detects the sound sources used in multimedia content. In general, a plurality of sound sources are registered on a server, and a fingerprint of each sound source is recorded. Using sound source detection technology, such a server detects, from among the registered sound sources, the sound source used in the multimedia content based on a fingerprint of the multimedia content. The server then provides information about the sound source and the start position, within the sound source, of the portion used in the multimedia content.

However, such a server performs poorly when detecting the sound sources used in multimedia content. Specifically, because the server must compare the fingerprint of the entire multimedia content against the fingerprints of the registered sound sources, its computational load increases and its operating efficiency falls. In addition, the server has difficulty accurately detecting the portion of a sound source that was used in the multimedia content.

Various embodiments provide an electronic device, and a method of operating the same, capable of efficiently detecting at least one sound source used in multimedia content.

Various embodiments provide an electronic device, and a method of operating the same, capable of efficiently detecting the portion of a sound source used in multimedia content by using the fingerprint of the multimedia content in parts rather than as a whole.

Various embodiments provide an electronic device, and a method of operating the same, capable of more accurately identifying the portion of a sound source used in multimedia content by detecting, for that portion, not only its time position within the sound source but also its time position within the multimedia content.

Various embodiments provide an electronic device, and a method of operating the same, capable of detecting a confidence level for a sound source based on the time position within the sound source, and the time position within the multimedia content, of the portion of the sound source used in the multimedia content.

A method of operating an electronic device according to various embodiments may include: dividing, by a processor of the electronic device, a fingerprint of multimedia content into a plurality of search intervals according to a preset time interval; detecting at least one sound source having a detection interval to which at least one of the search intervals is matched; determining position information indicating a time position of the detection interval within the multimedia content and a time position of the detection interval within the sound source; and providing information related to the sound source together with the position information.

A computer program according to various embodiments may be recorded on a non-transitory computer-readable recording medium in order to cause the electronic device to execute the operating method.

A non-transitory computer-readable recording medium according to various embodiments has recorded on it a program for causing the electronic device to execute the operating method.

An electronic device according to various embodiments includes a memory and a processor coupled to the memory and configured to execute at least one instruction recorded in the memory. The processor may be configured to divide a fingerprint of multimedia content into a plurality of search intervals according to a preset time interval, detect at least one sound source having a detection interval to which at least one of the search intervals is matched, determine position information indicating a time position of the detection interval within the multimedia content and a time position of the detection interval within the sound source, and provide information related to the sound source together with the position information.

According to various embodiments, the electronic device can efficiently detect at least one sound source used in multimedia content. Specifically, the electronic device can efficiently detect the detection interval within a sound source that matches the multimedia content by extending the time range outward from one of the search intervals of the multimedia content's fingerprint. In addition, by detecting not only the time position of the detection interval within the sound source but also its time position within the multimedia content, the electronic device can identify the detection interval more accurately in both the sound source and the multimedia content. Furthermore, the electronic device can detect a confidence level for the sound source by comparing the multimedia content with the sound source based on the offset difference, for the detection interval, between the time offset from the start point of the multimedia content and the time offset from the start point of the sound source. The electronic device can therefore provide the user with the confidence level in addition to the information related to the sound source and the position information.

FIG. 1 is a diagram illustrating an electronic device according to various embodiments.
FIGS. 2 and 3a to 3d are exemplary diagrams for explaining operational features of the processor of FIG. 1.
FIG. 4 is a detailed diagram of the processor of FIG. 1.
FIG. 5 is a diagram illustrating a method of operating an electronic device according to various embodiments.
FIG. 6 is a detailed diagram of the sound source confidence detection step of FIG. 5.
FIGS. 7 to 17 are exemplary diagrams for explaining a method of operating an electronic device according to various embodiments.

Various embodiments of this document are described below with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating an electronic device 100 according to various embodiments. FIGS. 2 and 3a to 3d are exemplary diagrams for explaining operational features of the processor 160 of FIG. 1. FIG. 4 is a detailed diagram of the processor 160 of FIG. 1.

Referring to FIG. 1, an electronic device 100 according to various embodiments may include at least one of a connection terminal 110, a communication module 120, an input module 130, an output module 140, a memory 150, or a processor 160. In some embodiments, at least one of the components of the electronic device 100 may be omitted, or at least one other component may be added. In some embodiments, at least two of the components of the electronic device 100 may be implemented as a single integrated circuit. For example, the electronic device 100 may include at least one of a server, a smartphone, a mobile phone, a navigation device, a PC, a notebook PC, a digital broadcast terminal, a PDA (personal digital assistant), a PMP (portable multimedia player), a tablet, a game console, a wearable device, an IoT (internet of things) device, a home appliance, a medical device, or a robot.

The connection terminal 110 may physically connect the electronic device 100 to an external device 102. For example, the external device 102 may include another electronic device. To this end, the connection terminal 110 may include at least one connector. For example, the connector may include at least one of an HDMI connector, a USB connector, an SD card connector, or an audio connector.

The communication module 120 may perform communication between the electronic device 100 and external devices 102 and 104. The communication module 120 may establish a communication channel between the electronic device 100 and the external devices 102 and 104, and may communicate with the external devices 102 and 104 over that channel. Here, the external devices 102 and 104 may include at least one of a satellite, a base station, or another electronic device. The communication module 120 may include at least one of a wired communication module or a wireless communication module. The wired communication module may be connected by wire to the external device 102 through the connection terminal 110 and may communicate over that wired connection. The wireless communication module may include at least one of a short-range communication module or a long-range communication module. The short-range communication module may communicate with the external device 102 using a short-range communication scheme. For example, the short-range communication scheme may include at least one of Bluetooth, Wi-Fi Direct, or infrared communication (IrDA: Infrared Data Association). The long-range communication module may communicate with the external device 104 using a long-range communication scheme. Here, the long-range communication module may communicate with the external device 104 via a network 190. For example, the network 190 may include at least one of a cellular network, the Internet, or a computer network such as a LAN (local area network) or a WAN (wide area network).

The input module 130 may input signals to be used by at least one component of the electronic device 100. The input module 130 may include at least one of an input device configured to let a user input signals directly to the electronic device 100, a sensor device configured to sense the surrounding environment and generate signals, or a camera module configured to capture images and generate image data. For example, the input device may include at least one of a microphone, a mouse, or a keyboard. In one embodiment, the sensor device may include at least one of touch circuitry configured to sense a touch or sensor circuitry configured to measure the strength of the force generated by a touch.

The output module 140 may output information. The output module 140 may include at least one of a display module configured to display information visually or an audio module configured to reproduce information audibly. For example, the display module may include at least one of a display, a hologram device, or a projector. As an example, the display module may be combined with at least one of the touch circuitry or the sensor circuitry of the input module 130 and implemented as a touch screen. For example, the audio module may include at least one of a speaker or a receiver.

The memory 150 may record various data used by at least one component of the electronic device 100. For example, the memory 150 may include at least one of volatile memory or non-volatile memory. The data may include at least one program and the input data or output data associated with it. A program may be recorded in the memory 150 as software including at least one instruction, and may include, for example, at least one of an operating system, middleware, or an application.

The processor 160 may execute a program in the memory 150 to control at least one component of the electronic device 100. In doing so, the processor 160 may perform data processing or computation, and may execute the instructions recorded in the memory 150. The processor 160 may attempt to detect at least one sound source (audio source) used in multimedia content. Here, the multimedia content may consist of at least one of image data or audio data. As one example, the multimedia content consists of image data and audio data, and may include a music video or a video shared over a network. As another example, the multimedia content consists of audio data, and may be produced as a podcast or by a broadcaster. As shown in FIG. 2, at least one sound source may be used in the audio data of the multimedia content, and at least a portion of each sound source may be used.

According to various embodiments, the processor 160 may detect at least a portion of a sound source as a detection interval 200 corresponding to the multimedia content. An offset difference may then be defined for the detection interval 200. The offset difference (ΔT_m − ΔT_a) may indicate the difference between the time offset (ΔT_m) from the start point (T_m0) of the multimedia content to the start point (T_d0) of the detection interval 200 and the time offset (ΔT_a) from the start point (T_a0) of the sound source to the start point (T_d0) of the detection interval 200. The offset difference (ΔT_m − ΔT_a) may be defined as a single value or as a value within a certain range. As an example, it may be defined as a value within a range centered on the difference between the time offset (ΔT_m) from the start point (T_m0) of the multimedia content and the time offset (ΔT_a) from the start point (T_a0) of the sound source. When the offset difference (ΔT_m − ΔT_a) is defined as a value within a range, different playback speeds of the same sound source can be taken into account.

As a first example, as shown in FIG. 3a, the detection interval 200 may be the entire sound source and may be used in a partial region of the multimedia content. Here, because the time offset (ΔT_a) from the start point (T_a0) of the sound source to the start point (T_d0) of the detection interval 200 is 0, the offset difference (ΔT_m − ΔT_a) may be determined by the time offset (ΔT_m) from the start point (T_m0) of the multimedia content to the start point (T_d0) of the detection interval 200 (offset difference = ΔT_m). As a second example, as shown in FIG. 3b, the detection interval 200 may be a partial region of the sound source and may be used in a partial region of the multimedia content. Here, because the time offset (ΔT_m) from the start point (T_m0) of the multimedia content to the start point (T_d0) of the detection interval 200 is 0, the offset difference (ΔT_m − ΔT_a) may be determined from the time offset (ΔT_a) from the start point (T_a0) of the sound source to the start point (T_d0) of the detection interval 200 (offset difference = −ΔT_a). As a third example, as shown in FIG. 3c, the detection interval 200 may be a partial region of the sound source and may be used in a partial region of the multimedia content. Here, because the time offset (ΔT_a) from the start point (T_a0) of the sound source to the start point (T_d0) of the detection interval 200 is 0, the offset difference (ΔT_m − ΔT_a) may be determined by the time offset (ΔT_m) from the start point (T_m0) of the multimedia content to the start point (T_d0) of the detection interval 200 (offset difference = ΔT_m). As a fourth example, as shown in FIG. 3d, the detection interval 200 may be a partial region of the sound source and may be used across the entire multimedia content. Here, because the time offset (ΔT_m) from the start point (T_m0) of the multimedia content to the start point (T_d0) of the detection interval 200 is 0, the offset difference (ΔT_m − ΔT_a) may be determined from the time offset (ΔT_a) from the start point (T_a0) of the sound source to the start point (T_d0) of the detection interval 200 (offset difference = −ΔT_a).
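The four cases can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming offsets are expressed in seconds; the function names, the tolerance parameter, and the numeric values are hypothetical and are not taken from the patent.

```python
def offset_difference(delta_t_m: float, delta_t_a: float) -> float:
    """Return the offset difference (delta_t_m - delta_t_a) in seconds.

    delta_t_m: offset from the start of the multimedia content to the
               start of the detection interval.
    delta_t_a: offset from the start of the sound source to the start
               of the detection interval.
    """
    return delta_t_m - delta_t_a


def within_range(observed: float, center: float, tolerance: float) -> bool:
    """Check whether an observed offset difference lies in a range centered
    on `center`. The symmetric `tolerance` is an assumption; the text only
    says a range may be used to allow for different playback speeds."""
    return abs(observed - center) <= tolerance


# FIG. 3a / 3c style: the interval starts at the beginning of the sound source.
case_a = offset_difference(delta_t_m=30.0, delta_t_a=0.0)   # equals delta_t_m
# FIG. 3b / 3d style: the interval starts at the beginning of the content.
case_b = offset_difference(delta_t_m=0.0, delta_t_a=45.0)   # equals -delta_t_a
```

A negative offset difference therefore simply means that the content begins partway into the sound source.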

According to various embodiments, the processor 160 may use the offset difference (ΔT_m − ΔT_a) to detect a confidence level for the sound source. The confidence level indicates how accurately the detected sound source has been identified as one used in the multimedia content; a higher confidence level means higher accuracy. Specifically, the processor 160 may align the detection interval 200 with the multimedia content based on the offset difference (ΔT_m − ΔT_a). The processor 160 may then compare the multimedia content with the detection interval 200 to detect the confidence level of the sound source. According to one embodiment, the processor 160 may detect the confidence level through bitwise operations. The processor 160 may calculate bit error rates (BERs) for the detection interval 200 by comparing the fingerprint of the multimedia content with the fingerprint of the detection interval 200. The processor 160 may convert each bit error rate into a score using a predetermined score function. The processor 160 may then detect the confidence level from the sum of the scores using a predetermined confidence function.
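The BER-to-score-to-confidence chain can be sketched as follows. The patent does not disclose the score function or the confidence function, so `linear_score`, its `cutoff` of 0.35, the 32-bit frame width, and the normalisation by the number of blocks are all assumptions chosen purely for illustration.

```python
from typing import Callable, Sequence

BITS_PER_FRAME = 32  # assumed width of one fingerprint frame


def bit_error_rate(a: Sequence[int], b: Sequence[int],
                   bits: int = BITS_PER_FRAME) -> float:
    """Fraction of differing bits between two equally long frame blocks."""
    if len(a) != len(b) or not a:
        raise ValueError("blocks must be non-empty and of equal length")
    # XOR marks the differing bits; counting the 1s gives the bit errors.
    errors = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return errors / (len(a) * bits)


def linear_score(ber: float, cutoff: float = 0.35) -> float:
    """Hypothetical score function: 1.0 at BER 0, falling to 0.0 at `cutoff`."""
    return max(0.0, 1.0 - ber / cutoff)


def confidence(content_blocks: Sequence[Sequence[int]],
               source_blocks: Sequence[Sequence[int]],
               score_fn: Callable[[float], float] = linear_score) -> float:
    """Hypothetical confidence function: the sum of per-block scores,
    normalised by the number of blocks so the result lies in [0, 1]."""
    scores = [score_fn(bit_error_rate(c, s))
              for c, s in zip(content_blocks, source_blocks)]
    return sum(scores) / len(scores) if scores else 0.0
```

Here each block pair stands for an aligned stretch of the content fingerprint and the detection interval fingerprint; any monotonically decreasing score function could be substituted for `linear_score`.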

According to various embodiments, as shown in FIG. 4, the processor 160 may include at least one of an API (application programming interface) 461, a process API 463, a control unit 465, a content acquisition unit 467, a fingerprint unit 469, a matching unit 471, a comparison unit 473, or a clustering unit 475. In some embodiments, at least one of the components of the processor 160 may be omitted, or at least one other component may be added. In some embodiments, at least two of the components of the processor 160 may be implemented as a single integrated circuit.

The API 461 may detect a user's request. The process API 463 may generate a command based on the user's request. The control unit 465 may control at least one of the components of the processor 160. In doing so, the control unit 465 may mediate between at least two of the components of the processor 160 and may perform work on behalf of at least one of them. The content acquisition unit 467 may acquire multimedia content based on the command. The fingerprint unit 469 may obtain a fingerprint of the multimedia content; it may extract the fingerprint directly from the audio data of the multimedia content. The matching unit 471 may detect at least one sound source based on the fingerprint of the multimedia content. Here, a plurality of sound sources may be registered in advance in the memory 150, with a fingerprint recorded for each registered sound source. The matching unit 471 may detect at least one of the registered sound source fingerprints by matching the fingerprint of the multimedia content against them. The comparison unit 473 may compare the fingerprint of the multimedia content with the fingerprint of the detected sound source to detect the confidence level of the detected sound source. Based on the detected sound source, the clustering unit 475 may extend at least one of the comparison targets for the multimedia content or the comparison results for the multimedia content so as to cover sound sources that are identical or similar to the detected sound source. Specifically, the clustering unit 475 may acquire information on sound sources identical or similar to the detected sound source and extend the comparison targets for the multimedia content to those sound sources. The clustering unit 475 may also group sound sources identical or similar to the detected sound source based on the comparison result of the comparison unit 473.

FIG. 5 is a diagram illustrating a method of operating the electronic device 100 according to various embodiments. FIG. 6 is a detailed diagram of the sound source confidence detection step (step 550) of FIG. 5. FIGS. 7 to 17 are exemplary diagrams for explaining the method of operating the electronic device 100 according to various embodiments.

Referring to FIG. 5, in step 510, the electronic device 100 may divide a fingerprint 710 of multimedia content into a plurality of search intervals 720. Here, the multimedia content may consist of at least one of image data or audio data. As one example, the multimedia content consists of image data and audio data, and may include a music video or a video shared over a network. As another example, the multimedia content consists of audio data, and may be produced as a podcast or by a broadcaster. At least one sound source may be used in the audio data, and at least a portion of each sound source may be included. The processor 160 may obtain the fingerprint 710 of the multimedia content. According to one embodiment, the processor 160 may extract the fingerprint 710 directly from the audio data of the multimedia content. For example, when a user selects multimedia content, the processor 160 may extract the fingerprint 710 from the audio data of that content. According to another embodiment, the processor 160 may receive the fingerprint 710 of the multimedia content as a query from the external devices 102 and 104. Here, a fingerprint may represent the frequency distribution of the audio data over time. As shown in FIG. 7, the processor 160 may divide the fingerprint 710 of the multimedia content into a plurality of search intervals 720 according to a preset time interval. As an example, the time interval may be set within a range of several seconds.
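Step 510 amounts to slicing the fingerprint at fixed intervals. The sketch below assumes the fingerprint is a sequence of integer frames and that the preset time interval has already been converted to a frame count; both are illustrative assumptions, as is keeping a shorter final interval, which the text does not address.

```python
from typing import List, Sequence, Tuple


def split_into_search_intervals(
    fingerprint: Sequence[int], frames_per_interval: int
) -> List[Tuple[int, List[int]]]:
    """Split a fingerprint into consecutive search intervals.

    Returns (start_frame, frames) pairs so that each interval keeps its
    position within the multimedia content. A shorter trailing interval
    is kept rather than dropped (an assumption).
    """
    if frames_per_interval <= 0:
        raise ValueError("frames_per_interval must be positive")
    return [
        (start, list(fingerprint[start:start + frames_per_interval]))
        for start in range(0, len(fingerprint), frames_per_interval)
    ]
```

Keeping the start frame alongside each interval is what later allows a match to be placed on the content's timeline.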

In step 520, the electronic device 100 may detect at least one sound source having a detection interval 1010 to which at least one of the search intervals 720 is matched. Here, a plurality of sound sources may be registered in advance in the memory 150, with a fingerprint 910 recorded for each registered sound source. As shown in FIG. 8, the processor 160 may compare each of the search intervals 720 with the fingerprints 910 of the registered sound sources. In this way, the processor 160 may detect at least one of the registered sound source fingerprints 910 based on one of the search intervals 720. As shown in FIG. 9, the processor 160 may then compare the fingerprint 710 of the multimedia content with the fingerprint 910 of the detected sound source while extending the time range outward from that search interval 720. As a result, as shown in FIG. 10, the processor 160 can detect, within the fingerprint 910 of the detected sound source, the detection interval 1010 to which at least one of the search intervals 720 is matched.
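The seed-and-extend idea in step 520 can be illustrated as below. This is a simplified sketch that assumes fingerprints are lists of integer frames and uses exact frame equality; a practical matcher would tolerate bit errors, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def find_seed(search_interval: List[int], source_fp: List[int]) -> Optional[int]:
    """Return the first frame position in source_fp where the search
    interval occurs, or None if it does not occur."""
    n = len(search_interval)
    for pos in range(len(source_fp) - n + 1):
        if source_fp[pos:pos + n] == search_interval:
            return pos
    return None


def extend_match(
    content_fp: List[int], source_fp: List[int],
    c_start: int, s_start: int, length: int,
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Grow a seed match in both directions while the frames still agree.

    Returns half-open frame ranges (content_range, source_range) that
    together describe the detection interval.
    """
    c_lo, s_lo = c_start, s_start
    while c_lo > 0 and s_lo > 0 and content_fp[c_lo - 1] == source_fp[s_lo - 1]:
        c_lo -= 1
        s_lo -= 1
    c_hi, s_hi = c_start + length, s_start + length
    while (c_hi < len(content_fp) and s_hi < len(source_fp)
           and content_fp[c_hi] == source_fp[s_hi]):
        c_hi += 1
        s_hi += 1
    return (c_lo, c_hi), (s_lo, s_hi)
```

Only one short interval has to be searched across the registered fingerprints; the rest of the comparison is confined to the single candidate sound source.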

At step 530, the electronic device 100 may determine location information of the detection interval 1010 within the multimedia content and within the detected sound source. The location information may indicate the time position of the detection interval 1010 within the fingerprint 710 of the multimedia content and the time position of the detection interval 1010 within the fingerprint 910 of the detected sound source. Based on the location information of the detection interval 1010, the processor 160 may detect an offset difference (ΔT_m − ΔT_a) for the detection interval 1010. The offset difference (ΔT_m − ΔT_a) may indicate the difference between the time offset (ΔT_m) from the start point (T_m0) of the fingerprint 710 of the multimedia content to the start point (T_d0) of the detection interval 1010 and the time offset (ΔT_a) from the start point (T_a0) of the fingerprint 910 of the detected sound source to the start point (T_d0) of the detection interval 1010.
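The offset difference of step 530 is simple arithmetic, shown here as a small sketch. It assumes that all times are in seconds and that the start point of the detection interval is measured separately on the content timeline and on the sound source timeline; the function and parameter names are hypothetical.

```python
def offset_difference(
    t_m0: float, t_d0_in_content: float, t_a0: float, t_d0_in_source: float
) -> float:
    """Return (delta_T_m - delta_T_a) for one detection interval.

    delta_T_m: offset from the start of the content fingerprint (t_m0)
               to the start of the detection interval on that timeline.
    delta_T_a: offset from the start of the sound source fingerprint (t_a0)
               to the start of the detection interval on that timeline.
    """
    delta_t_m = t_d0_in_content - t_m0
    delta_t_a = t_d0_in_source - t_a0
    return delta_t_m - delta_t_a
```

For instance, with hypothetical values where the detection interval starts 513 seconds into the content and 12 seconds into the sound source, and both fingerprints start at 0, the offset difference is 501 seconds: the sound source would have to be shifted by that amount to line up with the content.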

At step 540, the electronic device 100 may align the detection interval 1010 with the fingerprint 710 of the multimedia content based on the offset difference (ΔT_m − ΔT_a) for the detection interval 1010. As shown in FIG. 10, the processor 160 may align the detection interval 1010 so that the detection interval 1010 and the at least one search interval 720 matched to it correspond to each other. The processor 160 may align the start point of the detection interval 1010 with the start point of the at least one search interval 720.

At step 550, the electronic device 100 may compare the fingerprint 710 of the multimedia content with the detection interval 1010 to detect a confidence for the detected sound source. The confidence indicates how accurately the detected sound source is identified as one used in the multimedia content; the higher the confidence, the higher the accuracy. The processor 160 may compare the detection interval 1010 with the at least one search interval 720 matched to it to detect the confidence of the detected sound source. According to one embodiment, the processor 160 may detect the confidence of the detected sound source through a bitwise operation between the detection interval 1010 and the at least one search interval 720. This is described in more detail with reference to FIG. 6.

Referring to FIG. 6, at step 651, the electronic device 100 may generate a comparison interval 1110 through a comparison operation between the fingerprint 710 of the multimedia content and the detection interval 1010. As an example, the comparison operation may include an exclusive OR (XOR). The processor 160 may generate the comparison interval 1110, as shown in FIG. 11, through a comparison operation between the detection interval 1010 and the at least one search interval 720 matched to it.
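The XOR comparison of step 651 can be sketched as follows, assuming that the aligned search interval and detection interval are equal-length sequences of per-frame integers. The function name is hypothetical. In the result, a set bit marks a position where the two fingerprints disagree.

```python
from typing import List, Sequence


def xor_compare(
    search_frames: Sequence[int], detection_frames: Sequence[int]
) -> List[int]:
    """Build a comparison interval by XOR-ing aligned frames.

    Assumption: both inputs are already aligned and have the same length.
    """
    if len(search_frames) != len(detection_frames):
        raise ValueError("aligned intervals must have the same length")
    return [s ^ d for s, d in zip(search_frames, detection_frames)]
```

Identical frames produce 0, so a comparison interval consisting entirely of zeros indicates that the two intervals are bit-for-bit the same.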

At step 653, the electronic device 100 may divide the comparison interval 1110 into a plurality of bit intervals 1210. As shown in FIG. 12, the processor 160 may divide the comparison interval 1110 into the plurality of bit intervals 1210 at a preset time interval. As an example, the time interval may be approximately 1 second.

At step 655, the electronic device 100 may calculate a bit error rate for each of the bit intervals 1210. The processor 160 may operate on the consecutive bits of each bit interval 1210 and calculate the bit error rate of each bit interval 1210 from its bits. Here, each bit error rate is expressed as a value from 0 to 1, and the lower the bit error rate, the higher the similarity. The similarity refers to the similarity between the detection interval 1010 and the at least one search interval 720 matched to it. That is, a bit error rate of 0 may mean that the detection interval 1010 and the at least one search interval 720 are identical. As an example, when the bit error rates of the bit intervals 1210 are calculated as shown in FIG. 13, this may indicate that the detection interval 1010 of the detected sound source was used from 513 to 551 seconds of the multimedia content. As another example, when the bit error rates of the bit intervals 1210 are calculated as shown in FIG. 14, this may indicate that detection intervals 1010 of multiple sound sources were used in the multimedia content, and that detection intervals 1010 of multiple sound sources were used even within the same time range. In such a case, the processor 160 may extract the highest bit error rate among the bit error rates of the detection intervals 1010.
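Steps 653 and 655 can be sketched together. The example assumes that the comparison interval is a sequence of per-frame XOR results, that each frame carries a fixed number of bits, and that the preset time interval of a bit interval is given as a frame count. The function name, the parameters, and the 32-bit default are assumptions made for illustration only.

```python
from typing import List, Sequence


def bit_error_rates(
    comparison: Sequence[int],
    frames_per_bit_interval: int,
    bits_per_frame: int = 32,
) -> List[float]:
    """Split a comparison interval into bit intervals and return each one's
    bit error rate, i.e. the fraction of set bits (mismatches), in [0, 1]."""
    if frames_per_bit_interval <= 0 or bits_per_frame <= 0:
        raise ValueError("interval length and frame width must be positive")
    rates = []
    for i in range(0, len(comparison), frames_per_bit_interval):
        chunk = comparison[i:i + frames_per_bit_interval]
        errors = sum(bin(frame).count("1") for frame in chunk)
        rates.append(errors / (len(chunk) * bits_per_frame))
    return rates
```

A rate of 0.0 for a bit interval means that the corresponding stretch of the detection interval and the search interval is identical, in line with the description above.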

At step 657, the electronic device 100 may convert the bit error rates into scores for the respective bit intervals 1210. The processor 160 may convert each bit error rate into a score using a predetermined score function. Here, as shown in FIG. 15, a lower bit error rate may be converted into a higher score, and a higher bit error rate may be converted into a lower score. If at least one of the bit error rates exceeds a threshold, the processor 160 may assign a score of 0 to that bit error rate. For the remaining bit error rates, which are less than or equal to the threshold, the processor 160 may calculate a score from each bit error rate and assign the calculated score to it. For example, the score function may be expressed as Equation (1) below.

$$y = \max\bigl(0.0,\ 10 \cdot \bigl((1 - x)^2 - (1.0 - \mathrm{threshold})^2\bigr)\bigr) \tag{1}$$

Here, x denotes the bit error rate and y denotes the score. The threshold may be greater than 0 and less than or equal to 0.5 and, as an example, may be between 0.35 and 0.45 inclusive. The closer the threshold is set to 0, the lower the possibility of false detection of a sound source, but the possibility of detecting a noisy sound source may increase. Conversely, the closer the threshold is set to 0.5, the lower the possibility of detecting a noisy sound source, but the possibility of false detection of a sound source may increase.
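As a quick check of how Equation (1) behaves, the following sketch evaluates the score function directly. The default threshold of 0.4 is an assumed value picked from the example range of 0.35 to 0.45, and the function name is hypothetical.

```python
def score(bit_error_rate: float, threshold: float = 0.4) -> float:
    """Equation (1): y = max(0.0, 10 * ((1 - x)^2 - (1.0 - threshold)^2)).

    Bit error rates above the threshold make the inner term negative,
    so max() clamps their score to 0.
    """
    return max(0.0, 10.0 * ((1.0 - bit_error_rate) ** 2 - (1.0 - threshold) ** 2))
```

With this assumed threshold, a bit error rate of 0 yields the maximum score of about 6.4, the score falls as the bit error rate rises, and it reaches 0 exactly at the threshold.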

At step 659, the electronic device 100 may detect the confidence for the detected sound source from the sum of the scores. The confidence indicates how accurately the detected sound source is identified as one used in the multimedia content; the higher the confidence, the higher the accuracy. The processor 160 may detect the confidence from the sum of the scores using a predetermined confidence function. Here, as shown in FIG. 16, the confidence may be expressed as a value from 0 to 1. When the sum of the scores lies within a certain range, it strongly affects the confidence, so the confidence may rise markedly as the sum of the scores increases. When the sum of the scores lies outside that range, its effect on the confidence may diminish. For example, the confidence function may be expressed as Equation (2) below.

$$y = \tanh(\mathrm{weight} \cdot x) \tag{2}$$

Here, x denotes the sum of the scores and y denotes the confidence. The weight may be determined by at least one of the threshold of the score function or a reference value for the confidence described later, and may be, for example, between 0.1 and 0.2 inclusive.
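Equation (2) can be sketched in a few lines. The default weight of 0.15 is an assumed value taken from the middle of the example range of 0.1 to 0.2, and the function name is hypothetical.

```python
import math
from typing import Iterable


def confidence(scores: Iterable[float], weight: float = 0.15) -> float:
    """Equation (2): y = tanh(weight * x), where x is the sum of the scores.

    Because the scores are non-negative, the result lies in [0, 1).
    """
    return math.tanh(weight * sum(scores))
```

Because tanh saturates, additional matching bit intervals raise the confidence quickly while the sum is small and contribute less and less once the sum is large, which matches the behavior described for step 659.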

Thereafter, the electronic device 100 may return to FIG. 5 and proceed to step 560.

Referring again to FIG. 5, at step 560, the electronic device 100 may provide the information associated with the detected sound source, the location information, and the confidence. The information associated with the sound source may include at least one of an identifier, a name, or an artist of the sound source. The location information may indicate the time position of the detection interval 1010 within the fingerprint 710 of the multimedia content and the time position of the detection interval 1010 within the fingerprint 910 of the detected sound source. As shown in FIG. 17, the processor 160 may provide the information associated with the detected sound source, the location information, and the confidence in correspondence with the multimedia content. When a plurality of sound sources are detected from the multimedia content, the processor 160 may provide the information associated with the detected sound sources, the location information, and the confidence as a list of sound sources. As an example, the processor 160 may provide the information associated with a detected sound source, the location information, and the confidence regardless of the confidence of that sound source. As another example, the processor 160 may provide the information associated with a detected sound source, the location information, and the confidence only if the confidence of that sound source is greater than or equal to a reference value. In other words, if the confidence of a detected sound source is less than the reference value, the processor 160 need not provide the information associated with that sound source, the location information, or the confidence. The processor 160 may provide the information associated with the detected sound source, the location information, and the confidence in response to a query from the external devices 102 and 104. According to one embodiment, the processor 160 may transmit the information associated with the detected sound source, the location information, and the confidence to the external devices 102 and 104. According to another embodiment, the processor 160 may output the information associated with the detected sound source, the location information, and the confidence directly through the output module 140.
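The two reporting options of step 560 (report every detection, or only those at or above a reference value) can be sketched as below. The record type and all of its field names are hypothetical and serve only to group the sound source information, the location information, and the confidence in one place.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Detection:
    """Hypothetical result record for one detected sound source."""
    source_id: str          # identifier of the sound source
    title: str              # name of the sound source
    artist: str             # artist of the sound source
    content_start_s: float  # time position within the multimedia content
    source_start_s: float   # time position within the sound source
    confidence: float       # value from 0 to 1


def select_results(
    detections: Sequence[Detection], reference_value: Optional[float] = None
) -> List[Detection]:
    """Return the list of detections to provide.

    With no reference value, every detection is provided regardless of its
    confidence; otherwise only those with confidence >= reference_value.
    """
    if reference_value is None:
        return list(detections)
    return [d for d in detections if d.confidence >= reference_value]
```

Returning a list in both cases keeps the response shape the same whether one or several sound sources were detected.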

According to various embodiments, a user may identify the sound sources used in multimedia content and make use of them in various ways. As an example, if the multimedia content is a video such as a broadcast or a performance, the user may obtain a cue sheet for the multimedia content based on the sound sources used in it. As another example, the user may use the result for copyright protection or royalty settlement for the sound sources used in the multimedia content.

According to various embodiments, after providing the information associated with the detected sound source, the location information, and the confidence, the electronic device 100 may provide various services associated with the detected sound source. According to one embodiment, the processor 160 may provide the detected sound source to the external devices 102 and 104. When the information associated with the detected sound source is selected on the external devices 102 and 104, the processor 160 may provide the detected sound source to the external devices 102 and 104. According to another embodiment, the processor 160 may provide other multimedia content associated with the detected sound source. When the information associated with the detected sound source is selected on the external devices 102 and 104, the processor 160 may search for other multimedia content based on that information and provide the retrieved multimedia content to the external devices 102 and 104. According to yet another embodiment, the processor 160 may provide additional information associated with the detected sound source. When the information associated with the detected sound source is selected on the external devices 102 and 104, the processor 160 may search for additional information based on that information, for example by using news or a social network service (SNS), and provide the retrieved additional information to the external devices 102 and 104.

According to various embodiments, the electronic device 100 may efficiently detect at least one sound source used in multimedia content. Specifically, the electronic device 100 may efficiently detect the detection interval 1010 within a sound source that matches the multimedia content by extending the time range from one of the search intervals 720 of the fingerprint 710 of the multimedia content. In addition, by detecting not only the time position of the detection interval 1010 within the sound source but also its time position within the multimedia content, the electronic device 100 may identify the detection interval 1010 more accurately within both the sound source and the multimedia content. Furthermore, the electronic device 100 may detect a confidence for the sound source by comparing the multimedia content with the sound source based on the offset difference (ΔT_m − ΔT_a) between the time offset (ΔT_m) from the start point (T_m0) of the multimedia content and the time offset (ΔT_a) from the start point (T_a0) of the sound source for the detection interval 1010. Accordingly, the electronic device 100 can provide the user with the confidence in addition to the information associated with the sound source and the location information.

A method of operating the electronic device 100 according to various embodiments may include dividing the fingerprint 710 of multimedia content into a plurality of search intervals 720 at a preset time interval (step 510), detecting at least one sound source having a detection interval 1010 to which at least one of the search intervals 720 is matched (step 520), determining location information indicating the time position of the detection interval 1010 within the multimedia content and the time position of the detection interval 1010 within the sound source (step 530), and providing information associated with the sound source and the location information (step 560).

According to various embodiments, the method of operating the electronic device 100 may further include aligning the detection interval 1010 with the fingerprint 710 based on the offset difference (ΔT_m − ΔT_a) between the time offset (ΔT_m) from the start point (T_m0) of the multimedia content and the time offset (ΔT_a) from the start point (T_a0) of the sound source (step 540), and comparing the fingerprint 710 with the detection interval 1010 to detect a confidence for the sound source (step 550).

According to various embodiments, providing the information associated with the sound source and the location information (step 560) may include providing the confidence together with the information associated with the sound source and the location information.

According to various embodiments, detecting the confidence (step 550) may include generating a comparison interval 1110 through a comparison operation between the fingerprint 710 and the detection interval 1010 (step 651), dividing the comparison interval 1110 into a plurality of bit intervals 1210 (step 653), calculating a bit error rate for each of the bit intervals 1210 (step 655), converting the bit error rates into scores for the respective bit intervals 1210 (step 657), and detecting the confidence from the sum of the scores (step 659).

According to various embodiments, converting into scores (step 657) may include assigning a score of 0 to at least one of the bit error rates if that bit error rate exceeds a threshold, and assigning to each of the remaining bit error rates, which are less than or equal to the threshold, a score calculated from that bit error rate.

According to various embodiments, providing the information associated with the sound source and the location information (step 560) may include providing the information associated with the sound sources and the location information as a list if the multimedia content is associated with a plurality of sound sources.

According to various embodiments, the multimedia content may consist of at least one of image data or audio data.

According to various embodiments, the method of operating the electronic device 100 may further include at least one of providing the sound source when the information associated with the sound source is selected, or providing other multimedia content associated with the sound source when the information associated with the sound source is selected.

According to various embodiments, at least one of the search intervals 720 may be matched to a plurality of sound sources.

The electronic device 100 according to various embodiments may include a memory 150 and a processor 160 coupled to the memory 150 and configured to execute at least one instruction stored in the memory 150.

According to various embodiments, the processor 160 may be configured to divide the fingerprint 710 of multimedia content into a plurality of search intervals 720 at a preset time interval, detect at least one sound source having a detection interval 1010 to which at least one of the search intervals 720 is matched, determine location information indicating the time position of the detection interval 1010 within the multimedia content and the time position of the detection interval 1010 within the sound source, and provide information associated with the sound source and the location information.

According to various embodiments, the processor 160 may be configured to align the detection interval 1010 with the fingerprint 710 based on the offset difference (ΔT_m − ΔT_a) between the time offset (ΔT_m) from the start point (T_m0) of the multimedia content and the time offset (ΔT_a) from the start point (T_a0) of the sound source, and to compare the fingerprint 710 with the detection interval 1010 to detect a confidence for the sound source.

According to various embodiments, the processor 160 may be configured to detect a confidence for the sound source corresponding to the detection interval 1010 and to provide the confidence together with the information associated with the sound source and the location information.

According to various embodiments, the processor 160 may be configured to generate a comparison interval 1110 through a comparison operation between the fingerprint 710 and the detection interval 1010, divide the comparison interval 1110 into a plurality of bit intervals 1210, calculate a bit error rate for each of the bit intervals 1210, convert the bit error rates into scores for the respective bit intervals 1210, and detect the confidence from the sum of the scores.

According to various embodiments, the processor 160 may be configured to assign a score of 0 to at least one of the bit error rates if that bit error rate exceeds a threshold, and to assign to each of the remaining bit error rates, which are less than or equal to the threshold, a score calculated from that bit error rate.

According to various embodiments, the processor 160 may be configured to provide the information associated with the sound sources and the location information as a list if the multimedia content is associated with a plurality of sound sources.

According to various embodiments, the processor 160 may be configured to provide at least one of the sound source or other multimedia content associated with the sound source when the information associated with the sound source is selected.

The apparatus described above may be implemented by hardware components, software components, and/or a combination of hardware components and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the OS. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but those skilled in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, computer storage medium, or device in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over computer systems connected by a network and stored or executed in a distributed manner. Software and data may be recorded on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. Here, the medium may store a computer-executable program persistently, or may store it temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or a combination of several pieces of hardware; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy (registered trademark) disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples of the medium include recording or storage media managed by application stores that distribute applications, or by sites and servers that supply or distribute various other software.

The various embodiments of this document and the terms used in them are not intended to limit the technology described in this document to particular embodiments, and should be understood to include various modifications, equivalents, and/or alternatives of the embodiments. In the description of the drawings, similar reference numerals are given to similar components. A singular expression may include the plural unless the context clearly indicates otherwise. In this document, expressions such as "A or B," "at least one of A and/or B," "A, B, or C," or "at least one of A, B, and/or C" may include all possible combinations of the items listed together. Expressions such as "first," "second," "1st," or "2nd" modify the corresponding components regardless of order or importance; they are used only to distinguish one component from another and do not limit the components. When a (e.g., first) component is described as being "(functionally or communicatively) coupled" or "connected" to another (e.g., second) component, this includes not only the case where the component is directly coupled to the other component but also the case where it is coupled through yet another component (e.g., a third component).

As used in this document, the term "module" includes a unit configured in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be an integrally constructed part, or a minimum unit, or a portion of one, that performs one or more functions. For example, a module may be implemented as an application-specific integrated circuit (ASIC).

According to various embodiments, each of the described components (e.g., a module or program) may include a single entity or a plurality of entities. According to various embodiments, one or more of the components or steps described above may be omitted, or one or more other components or steps may be added. Alternatively or additionally, a plurality of components (e.g., modules or programs) may be integrated into a single component. In such a case, the integrated component may perform one or more functions of each of the plurality of components in the same or a similar manner as the corresponding component performed them before integration. According to various embodiments, the steps performed by a module, program, or other component may be executed sequentially, in parallel, iteratively, or heuristically; one or more of the steps may be executed in a different order or omitted; or one or more other steps may be added.

460: Processor
461: API
462: Process API
465: Control unit
467: Content acquisition unit
469: Fingerprint unit
471: Matching unit
473: Comparison unit
475: Clustering unit

Claims

1. A method of operating an electronic device, the method comprising, by a processor of the electronic device:
dividing a fingerprint of multimedia content into a plurality of search intervals according to a preset time interval;
matching the fingerprints of the search intervals of the multimedia content against fingerprints of sound sources;
detecting at least one sound source having a detection interval matched to at least one of the search intervals;
determining position information indicating the time position of the detection interval within the multimedia content and the time position of the detection interval within the sound source; and
providing information associated with the sound source and the position information.
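
As an illustration only, the dividing step of this claim might be sketched as follows. The representation of a fingerprint as a list of integer sub-fingerprints at a fixed frame rate, and the names `split_into_search_intervals`, `frames_per_second`, and `interval_seconds`, are assumptions made for this sketch and do not come from the patent text.

```python
from typing import List, Tuple


def split_into_search_intervals(
    fingerprint: List[int],
    frames_per_second: int,
    interval_seconds: int,
) -> List[Tuple[float, List[int]]]:
    """Split a fingerprint into consecutive search intervals.

    Assumption: the fingerprint is a list of integer sub-fingerprints
    produced at a fixed rate of `frames_per_second`.
    Returns (start_time_in_seconds, sub_fingerprints) pairs; the last
    interval may be shorter than the preset length.
    """
    if frames_per_second <= 0 or interval_seconds <= 0:
        raise ValueError("frame rate and interval length must be positive")
    frames_per_interval = frames_per_second * interval_seconds
    intervals = []
    for start in range(0, len(fingerprint), frames_per_interval):
        chunk = fingerprint[start:start + frames_per_interval]
        # Keep the start time so a later match can be mapped back to a
        # time position within the multimedia content.
        intervals.append((start / frames_per_second, chunk))
    return intervals
```

Keeping the start time with each interval is one simple way to recover the time position of a matched interval within the multimedia content later on.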

2. The method of claim 1, further comprising, by the processor of the electronic device:
aligning the detection interval in the sound source with the fingerprint of the multimedia content based on an offset difference between the time offset from the start of the multimedia content to the start of the detection interval in the multimedia content and the time offset from the start of the sound source to the start of the detection interval in the sound source; and
comparing the fingerprint of the search interval of the multimedia content with the fingerprint of the detection interval within the sound source to detect a reliability of the sound source.
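
A minimal sketch of alignment by offset difference, under stated assumptions: offsets are expressed as frame indices rather than seconds, fingerprints are lists of integer sub-fingerprints, and the function names `offset_difference` and `aligned_pairs` are hypothetical.

```python
from typing import List, Tuple


def offset_difference(content_start: int, source_start: int) -> int:
    """Offset of the detection interval in the content minus its offset
    in the sound source (both as frame indices, an assumption)."""
    return content_start - source_start


def aligned_pairs(
    content_fp: List[int],
    source_fp: List[int],
    content_start: int,
    source_start: int,
    length: int,
) -> List[Tuple[int, int]]:
    """Pair each sound-source frame of the detection interval with the
    content frame it lines up with after shifting by the offset difference."""
    diff = offset_difference(content_start, source_start)
    pairs = []
    for s in range(source_start, source_start + length):
        c = s + diff  # matching frame index on the content side
        if 0 <= c < len(content_fp) and 0 <= s < len(source_fp):
            pairs.append((content_fp[c], source_fp[s]))
    return pairs
```

With the frames paired this way, the two fingerprints can be compared position by position, which is the input the reliability comparison needs.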

3. The method of claim 2, wherein providing the information associated with the sound source and the position information comprises:
providing the reliability of the sound source together with the information associated with the sound source and the position information.

4. The method of claim 2, wherein detecting the reliability comprises:
generating a comparison interval for the detection interval by a comparison operation between the fingerprint of the search interval of the multimedia content and the fingerprint of the detection interval within the sound source;
dividing the comparison interval into a plurality of bit intervals;
calculating a bit error rate for each of the bit intervals;
converting the bit error rates into respective scores of the bit intervals; and
detecting the reliability from the sum of the scores,
wherein the comparison operation is an exclusive OR.

5. The method of claim 4, wherein converting the bit error rates into the scores comprises:
assigning a score of 0 to any bit error rate that exceeds a threshold; and
assigning, to each remaining bit error rate that is equal to or less than the threshold, a score calculated from that bit error rate.
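
The reliability computation recited in the two preceding claims (exclusive OR, per-interval bit error rate, thresholded scores, sum) could be sketched as below. The claims do not state how a score is calculated from a bit error rate, so the mapping `1.0 - ber` is purely an assumption for illustration, as are the 32-bit sub-fingerprint width and the names `reliability`, `frames_per_bit_interval`, and `threshold`.

```python
from typing import List

BITS_PER_FRAME = 32  # assumption: each sub-fingerprint is a 32-bit integer


def reliability(
    content_fp: List[int],
    source_fp: List[int],
    frames_per_bit_interval: int,
    threshold: float,
) -> float:
    """Sum of per-bit-interval scores for two aligned fingerprints."""
    if len(content_fp) != len(source_fp):
        raise ValueError("fingerprints must be aligned to the same length")
    if frames_per_bit_interval <= 0:
        raise ValueError("bit interval length must be positive")
    # Comparison interval: frame-wise exclusive OR; set bits mark mismatches.
    comparison = [c ^ s for c, s in zip(content_fp, source_fp)]
    total = 0.0
    for start in range(0, len(comparison), frames_per_bit_interval):
        block = comparison[start:start + frames_per_bit_interval]
        error_bits = sum(bin(x).count("1") for x in block)
        ber = error_bits / (len(block) * BITS_PER_FRAME)
        # Score 0 above the threshold; otherwise a score derived from the
        # bit error rate (1 - ber is an illustrative choice, not the patent's).
        total += 0.0 if ber > threshold else 1.0 - ber
    return total
```

Zeroing the score of a bit interval whose error rate exceeds the threshold keeps clearly mismatched stretches from contributing to the reliability at all.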

6. The method of claim 1, wherein providing the information associated with the sound source and the position information comprises:
providing the information associated with the sound sources and the position information as a list when the multimedia content is associated with a plurality of sound sources.

7. The method of claim 1, wherein the multimedia content includes audio data.

8. The method of claim 1, further comprising, by the processor of the electronic device:
providing the sound source when the information associated with the sound source is selected; or
providing other multimedia content associated with the sound source when the information associated with the sound source is selected.

9. The method of claim 1, wherein at least one of the search intervals is matched to a plurality of sound sources.

10. A computer program for causing an electronic device to perform the method of operating an electronic device according to any one of claims 1 to 9.

11. A non-transitory computer-readable recording medium on which is recorded a program for causing an electronic device to execute the method according to any one of claims 1 to 9.

12. An electronic device comprising:
a memory; and
a processor coupled to the memory and configured to execute at least one instruction stored in the memory,
wherein the processor is configured to:
divide a fingerprint of multimedia content into a plurality of search intervals according to a preset time interval;
match the fingerprints of the search intervals of the multimedia content against fingerprints of sound sources;
detect at least one sound source having a detection interval matched to at least one of the search intervals;
determine position information indicating the time position of the detection interval within the multimedia content and the time position of the detection interval within the sound source; and
provide information associated with the sound source and the position information.

13. The electronic device of claim 12, wherein the processor is configured to:
align the detection interval in the sound source with the fingerprint of the multimedia content based on an offset difference between the time offset from the start of the multimedia content to the start of the detection interval in the multimedia content and the time offset from the start of the sound source to the start of the detection interval in the sound source; and
compare the fingerprint of the search interval of the multimedia content with the fingerprint of the detection interval within the sound source to detect a reliability of the sound source.

14. The electronic device of claim 13, wherein the processor is configured to:
detect the reliability of the sound source corresponding to the detection interval; and
provide the reliability together with the information associated with the sound source and the position information.

15. The electronic device of claim 13, wherein the processor is configured to:
generate a comparison interval for the detection interval by a comparison operation between the fingerprint of the search interval of the multimedia content and the fingerprint of the detection interval within the sound source;
divide the comparison interval into a plurality of bit intervals;
calculate a bit error rate for each of the bit intervals;
convert the bit error rates into respective scores of the bit intervals; and
detect the reliability from the sum of the scores,
wherein the comparison operation is an exclusive OR.

16. The electronic device of claim 15, wherein the processor is configured to:
assign a score of 0 to any bit error rate that exceeds a threshold; and
assign, to each remaining bit error rate that is equal to or less than the threshold, a score calculated from that bit error rate.

17. The electronic device of claim 12, wherein the processor is configured to provide the information associated with the sound sources and the position information as a list when the multimedia content is associated with a plurality of sound sources.

18. The electronic device of claim 12, wherein the multimedia content includes audio data.

19. The electronic device of claim 12, wherein the processor is configured to provide at least one of the sound source or other multimedia content associated with the sound source when the information associated with the sound source is selected.

20. The electronic device of claim 12, wherein at least one of the search intervals is matched to a plurality of sound sources.