JP4482779B2

JP4482779B2 - Image processing apparatus, image processing method, and recording medium

Info

Publication number: JP4482779B2
Application number: JP2000274767A
Authority: JP
Inventors: 哲二郎近藤; 寿一白木; 秀雄中屋; 裕二奥村
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-09-11
Filing date: 2000-09-11
Publication date: 2010-06-16
Anticipated expiration: 2020-09-11
Also published as: JP2002092612A

Description

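[0001]
[Technical Field of the Invention]
The present invention relates to an image processing apparatus, an image processing method, and a recording medium, and more particularly to an image processing apparatus, an image processing method, and a recording medium that make it possible, for example, to extract objects accurately with simple operations.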
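[0002]
[Description of the Related Art]
Various methods have been proposed for extracting, from an image, an object, that is, the portion of a physical subject that forms the so-called foreground or the like.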
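[0003]
For example, Japanese Unexamined Patent Application Publication No. H10-269369 describes a method that extracts an object from one frame by detecting the object's contour and then, for each subsequent frame, searches the vicinity of the object in the preceding frame to detect the object's contour and extracts the object based on that detection result, repeating this process.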
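[0004]
[Problems to be Solved by the Invention]
However, if contour detection in the frame of interest is restricted to the vicinity of the object in the preceding frame, then whenever the object deforms or moves substantially in the frame of interest, the contour is more likely to be detected incorrectly, which makes accurate object extraction difficult.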
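[0005]
On the other hand, having the user specify the object's contour for every frame and extracting the object based on the specified contour places a heavy operational burden on the user.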
[0006]
The present invention has been made in view of such circumstances and makes it possible to extract objects accurately with simple operations.
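[0007]
[Means for Solving the Problems]
An image processing apparatus or recording medium according to one aspect of the present invention is an image processing apparatus that extracts a predetermined object from an image, comprising: object extraction means for extracting the object from the image of a screen of interest by a plurality of processes; selection means for selecting, based on input from a user, which of the object extraction results of the plurality of processes is to be reflected in the final object extraction result; reflection means for reflecting the object extraction result selected by the selection means in the final object extraction result; determination means for determining the content of the process for extracting the object from the screen of interest based on input from the user and on a processing history, which is the content of the processes used to extract the objects reflected in the final object extraction result; and processing history storage means for storing the processing history; wherein the plurality of processes are processes using different thresholds; the processing history includes, for each pixel constituting the object, the threshold used to extract the object; and the determination means obtains the point on a previous screen corresponding to a predetermined point on the screen of interest input by the user, by correcting the predetermined point with a motion vector between the screen of interest and the previous screen, which was processed temporally before the screen of interest; acquires, based on the processing history for the previous screen, a predetermined threshold that was used when the pixel at that point on the previous screen was extracted as the object; and determines the predetermined threshold, together with values obtained by a predetermined operation using the predetermined threshold, as the thresholds used in the plurality of processes for extracting the object containing the predetermined point on the screen of interest. The recording medium is one on which is recorded a program for causing a computer to function as such an image processing apparatus.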
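[0008]
An image processing method according to one aspect of the present invention is an image processing method for extracting a predetermined object from an image, comprising: an object extraction step of extracting the object from the image of a screen of interest by a plurality of processes; a selection step of selecting, based on input from a user, which of the object extraction results of the plurality of processes is to be reflected in the final object extraction result; a reflection step of reflecting the object extraction result selected in the selection step in the final object extraction result; a determination step of determining the content of the process for extracting the object from the screen of interest based on input from the user and on a processing history, which is the content of the processes used to extract the objects reflected in the final object extraction result; and a processing history storage step of storing the processing history; wherein the plurality of processes are processes using different thresholds; the processing history includes, for each pixel constituting the object, the threshold used to extract the object; and in the determination step, the point on a previous screen corresponding to a predetermined point on the screen of interest input by the user is obtained by correcting the predetermined point with a motion vector between the screen of interest and the previous screen, which was processed temporally before the screen of interest; a predetermined threshold that was used when the pixel at that point on the previous screen was extracted as the object is acquired based on the processing history for the previous screen; and the predetermined threshold, together with values obtained by a predetermined operation using the predetermined threshold, are determined as the thresholds used in the plurality of processes for extracting the object containing the predetermined point on the screen of interest.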
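[0010]
In one aspect of the present invention, an object is extracted from the image of a screen of interest by a plurality of processes, and which of the resulting object extraction results is to be reflected in the final object extraction result is selected based on input from the user; the selected result is then reflected in the final object extraction result. In addition, the content of the process for extracting the object from the screen of interest is determined based on input from the user and on the processing history, which is the content of the processes used to extract the objects reflected in the final object extraction result, and the processing history is stored. The plurality of processes use different thresholds, and the processing history includes, for each pixel constituting the object, the threshold used to extract the object. In this case, the point on the previous screen corresponding to a predetermined point on the screen of interest input by the user is obtained by correcting that point with the motion vector between the screen of interest and the previous screen, which was processed temporally before the screen of interest, and the predetermined threshold that was used when the pixel at the corresponding point on the previous screen was extracted as the object is acquired based on the processing history for the previous screen. The predetermined threshold, together with values obtained by a predetermined operation using it, are then determined as the thresholds used in the plurality of processes for extracting the object containing the predetermined point on the screen of interest.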
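[0011]
[Embodiments of the Invention]
Fig. 1 shows an example hardware configuration of an embodiment of an image processing apparatus to which the present invention is applied.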
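[0012]
This image processing apparatus is built around a computer, on which a program that executes a series of processes for object extraction, described later (hereinafter referred to as the object extraction processing program where appropriate), is installed.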
[0013]
Besides being configured by having a computer execute a program in this way, the image processing apparatus can also be configured with dedicated hardware.
[0014]
The object extraction processing program can be recorded in advance on the hard disk 105 or the ROM 103, which serve as recording media built into the computer.
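[0015]
Alternatively, the object extraction processing program can be stored (recorded) temporarily or permanently on a removable recording medium 111 such as a floppy disk, CD-ROM (Compact Disc Read Only Memory), MO (Magneto Optical) disc, DVD (Digital Versatile Disc), magnetic disk, or semiconductor memory. Such a removable recording medium 111 can be provided as so-called packaged software.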
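[0016]
Besides being installed on the computer from a removable recording medium 111 as described above, the object extraction processing program can be transferred wirelessly from a download site to the computer via a digital satellite broadcasting satellite, or transferred by wire via a network such as a LAN (Local Area Network) or the Internet; the computer receives the program transferred in this way with the communication unit 108 and installs it on the built-in hard disk 105.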
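[0017]
The computer includes a CPU (Central Processing Unit) 102. An input/output interface 110 is connected to the CPU 102 via a bus 101. When the user inputs a command by operating the input unit 107, which consists of a keyboard, mouse, microphone, and the like, the command reaches the CPU 102 via the input/output interface 110, and the CPU 102 executes the object extraction processing program stored in the ROM (Read Only Memory) 103 accordingly. Alternatively, the CPU 102 loads into the RAM (Random Access Memory) 104 and executes the object extraction processing program stored on the hard disk 105, the program transferred from a satellite or network, received by the communication unit 108, and installed on the hard disk 105, or the program read from the removable recording medium 111 mounted in the drive 109 and installed on the hard disk 105. The CPU 102 thereby performs the processing according to the flowcharts described later, or the processing performed by the configurations shown in the block diagrams described later. The CPU 102 then, as necessary, outputs the processing result from the output unit 106, which consists of an LCD (Liquid Crystal Display), speakers, and the like, via the input/output interface 110, transmits it from the communication unit 108, or records it on the hard disk 105.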
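[0018]
In this specification, the processing steps describing a program for causing a computer to perform various processes need not necessarily be executed in time series in the order described in the flowcharts; they also include processes executed in parallel or individually (for example, parallel processing or object-based processing).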
[0019]
The program may be processed by a single computer or distributed among multiple computers. Furthermore, the program may be transferred to and executed on a remote computer.
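[0020]
Fig. 2 shows an example functional configuration of the image processing apparatus of Fig. 1. This functional configuration is realized by the CPU 102 of Fig. 1 executing the object extraction processing program.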
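[0021]
The storage 1 stores the image data of the moving image from which objects are to be extracted. The storage 1 also stores the history information for each frame, described later, supplied from the processing control unit 7.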
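[0022]
The frame-of-interest processing unit 2 takes a predetermined frame of the image data stored in the storage 1 as the frame of interest, reads out its image data, and performs processing on the frame of interest under the control of the processing control unit 7.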
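[0023]
Specifically, the frame-of-interest processing unit 2 includes a frame-of-interest buffer 21, a background buffer 22, an object buffer 23, a selector 24, and so on. The frame-of-interest buffer 21 stores the image data of the frame of interest read from the storage 1. The background buffer 22 stores, as the background image, the portion of the frame-of-interest image stored in the frame-of-interest buffer 21 other than the portion stored in the object buffer 23 described below. The object buffer 23 stores the object of the frame of interest extracted by the object extraction unit 3 described later. The selector 24 selects one of the frame of interest stored in the frame-of-interest buffer 21, the background image stored in the background buffer 22, or the object stored in the object buffer 23, and supplies it to the display unit 5.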
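[0024]
Under the control of the processing control unit 7, the object extraction unit 3 extracts the object from the frame of interest stored in the frame-of-interest buffer 21 by a plurality of processes.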
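[0025]
Specifically, the object extraction unit 3 includes a boundary detection unit 31, a cutout unit 32, a result processing unit 33, and so on. The boundary detection unit 31 detects boundary portions in the image of the frame of interest stored in the frame-of-interest buffer 21 and creates a plurality of types (here, for example, three types) of boundary images, each a binary image distinguishing boundary portions from non-boundary portions (portions that are not boundaries). The cutout unit 32 refers to the three boundary images created by the boundary detection unit 31 and cuts out, from the frame of interest stored in the frame-of-interest buffer 21, the regions constituting the object. The cutout unit 32 has three output buffers 32A to 32C and stores the regions cut out with reference to the three types of boundary image in the output buffers 32A to 32C, respectively. The result processing unit 33 has three result buffers 33A to 33C corresponding to the three output buffers 32A to 32C of the cutout unit 32; it composites the contents of each of the output buffers 32A to 32C with the object extraction result stored in the object buffer 23 and stores the three composite results in the result buffers 33A to 33C, respectively. Furthermore, based on input given by the user operating the mouse 9, the result processing unit 33 selects one of the contents of the result buffers 33A to 33C and reflects it in the contents of the object buffer 23.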
[0026]
The history management unit 4 manages the history information under the control of the processing control unit 7.
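[0027]
Specifically, the history management unit 4 includes a designated position storage unit 41, a history image storage unit 42, a parameter table storage unit 43, and so on. The designated position storage unit 41 stores the history of coordinates of positions on the frame of interest input by the user operating the mouse 9. The history image storage unit 42 stores a history image representing the history of the processing performed in the object extraction unit 3, and the parameter table storage unit 43 stores parameters representing the content of that processing in association with IDs, which serve as the pixel values of the history image. In other words, the parameter table storage unit 43 stores each set of parameters representing processing content in the object extraction unit 3 in association with a unique ID, and the history image storage unit 42 stores, for each pixel constituting the object, the ID corresponding to the processing used to extract that pixel as part of the object. Therefore, the processing used to extract a given object pixel can be identified by looking up, in the parameter table storage unit 43, the parameters associated with the ID stored as that pixel's value in the history image.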
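[0028]
As described above, the history image storage unit 42 stores an image whose pixel values are the IDs associated with the parameters representing the processing used when each pixel was extracted as part of the object. Because this image represents the history of the processing used for object extraction, it is called a history image. Hereinafter, the contents of the designated position storage unit 41, the history image storage unit 42, and the parameter table storage unit 43 are collectively referred to as history information where appropriate.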
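[0029]
The designated position storage unit 41, the history image storage unit 42, and the parameter table storage unit 43 each have at least two banks, and by bank switching they can store history information for both the frame of interest and the frame one frame before it (the previous frame).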
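[0030]
The display unit 5 displays the image output by the selector 24 (the frame-of-interest image, the background image, or the object) and the images stored in the result buffers 33A to 33C of the result processing unit 33.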
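[0031]
Under the control of the processing control unit 7, the motion detection unit 6 detects the motion vector of the frame-of-interest image relative to the previous-frame image and supplies it to the processing control unit 7.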
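[0032]
Specifically, the motion detection unit 6 has a built-in previous frame buffer 61; it reads the image data of the previous frame from the storage 1 and stores it in the previous frame buffer 61. The motion detection unit 6 then detects motion vectors by performing block matching or the like between the previous-frame image data stored in the previous frame buffer 61 and the frame-of-interest image data stored in the frame-of-interest buffer 21 of the frame-of-interest processing unit 2, and supplies them to the processing control unit 7.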
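The text says only that motion vectors are found by "block matching or the like". As a minimal sketch, assuming a full search that minimizes the sum of absolute differences (SAD) over a square block whose top-left corner is the query point, a fixed search range, and grayscale frames given as lists of rows, a vector (v_x, v_y) can be found such that the best-matching previous-frame block lies at (x - v_x, y - v_y). The function name `block_match` and the parameters `block` and `search` are illustrative, not from the source.

```python
def block_match(prev, cur, x, y, block=4, search=3):
    """Full-search block matching using the sum of absolute differences.

    Returns (vx, vy) such that the block of `cur` with top-left corner (x, y)
    best matches the block of `prev` with top-left corner (x - vx, y - vy).
    The block at (x, y) in `cur` is assumed to lie inside the frame.
    """
    h, w = len(cur), len(cur[0])
    best_sad, best_v = None, (0, 0)
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            px, py = x - vx, y - vy
            # Skip candidates whose block would leave the previous frame.
            if px < 0 or py < 0 or px + block > w or py + block > h:
                continue
            sad = sum(
                abs(cur[y + j][x + i] - prev[py + j][px + i])
                for j in range(block)
                for i in range(block)
            )
            if best_sad is None or sad < best_sad:
                best_sad, best_v = sad, (vx, vy)
    return best_v
```

With this sign convention, the previous-frame point for a click at (x, y) is (x - v_x, y - v_y), which matches the correction formula later given for the position correction unit 71.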
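[0033]
The processing control unit 7 controls the frame-of-interest processing unit 2, the object extraction unit 3, the history management unit 4, and the motion detection unit 6 based on event information and the like supplied from the event detection unit 8. The processing control unit 7 also determines the content of the processing in the object extraction unit 3 based on the event information supplied from the event detection unit 8 and the history information managed by the history management unit 4, and has the object extraction unit 3 extract the object according to that decision. In addition, the processing control unit 7 has a built-in position correction unit 71, which corrects, according to the motion vectors from the motion detection unit 6, position information on the frame-of-interest image supplied as event information from the event detection unit 8, position information stored in the designated position storage unit 41 of the history management unit 4, and the like. As described later, the corrected position information is either supplied to the object extraction unit 3 and used for object extraction, or supplied to the history management unit 4 and stored in the designated position storage unit 41.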
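[0034]
The event detection unit 8 detects events generated by the user operating the mouse 9 and supplies event information representing the content of each event to the processing control unit 7.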
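[0035]
The mouse 9 is operated by the user to specify a position on an image displayed on the display unit 5, to give a predetermined command to the apparatus, and so on.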
[0036]
Next, Fig. 3 shows an example of the screen displayed on the display unit 5.
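[0037]
When the object extraction processing program is executed, the display unit 5 displays a window divided in two both horizontally and vertically, as shown in Fig. 3.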
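[0038]
In this four-way divided window, the upper-left screen is the reference screen, and the upper-right, lower-left, and lower-right screens are result screens #1, #2, and #3, respectively.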
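[0039]
The reference screen displays the image output by the selector 24. As described above, the selector 24 selects one of the frame of interest stored in the frame-of-interest buffer 21, the background image stored in the background buffer 22, or the object stored in the object buffer 23 and supplies it to the display unit 5, so the reference screen displays the frame-of-interest image (original image), the object, or the background. The user can switch which of these images is displayed by operating the mouse 9. In the embodiment of Fig. 3, the reference screen shows the background image stored in the background buffer 22, that is, the original image minus the portion captured into the object buffer 23 as the object. The hatched portion on the reference screen represents the portion currently captured in the object buffer 23 (likewise hereinafter).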
[0040]
At the lower left of the reference screen are a Change Display button 201, a Use Record button 202, a Delete Partly button 203, and an Undo button 204.
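[0041]
The Change Display button 201 is operated to switch the image displayed on the reference screen. Each time the Change Display button 201 is clicked with the mouse 9, the selector 24 cycles through the outputs of the frame-of-interest buffer 21, the background buffer 22, and the object buffer 23, so that the image on the reference screen switches cyclically in the order original image, object, background.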
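[0042]
The Use Record button 202 is operated to set whether the history information stored in the history management unit 4 is used for extracting the object from the frame of interest stored in the frame-of-interest buffer 21. When the Use Record button 202 is clicked with the mouse 9, a pull-down menu is displayed on the reference screen for setting, among other things, whether use of the history information is permitted. In this embodiment, use of the history information is assumed, in principle, to be permitted.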
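[0043]
The Delete Partly button 203 is operated to delete part of the image stored as the object in the object buffer 23 (returning it from the object to the background). When the user, by operating the mouse 9, specifies a range on a part of the object displayed on the reference screen and then clicks the Delete Partly button 203, that part of the object is deleted from the object buffer 23. The Delete Partly button 203 is therefore used, for example, when part of the background has been captured into the object buffer 23 as the object, to remove that background portion from the object.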
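[0044]
The Undo button 204 is operated to delete the most recently captured portion of the image that was captured as the object into the object buffer 23 from the result buffers 33A to 33C of the result processing unit 33. When the Undo button 204 is operated, the image stored in the object buffer 23 returns to its state immediately before the last capture from the result buffers 33A to 33C. To support this, the object buffer 23 has multiple banks and holds at least the state immediately before an image was captured from the result buffers 33A to 33C. When the Undo button 204 is operated, the object buffer 23 switches to the bank selected immediately before, thereby switching the image output to the selector 24.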
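[0045]
Result screens #1 to #3 display the contents of the result buffers 33A to 33C, which store objects extracted from the frame of interest by different processes; that is, each shows the result of object extraction by one of three different processes. At the lower left of each of result screens #1 to #3 are a Rank Result button 206, a Grab All button 207, and a Grab Partly button 208.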
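[0046]
The Rank Result button 206 is operated to rank the object extraction results displayed on result screens #1 to #3. When the user, operating the mouse 9, clicks the Rank Result buttons 206 of result screens #1 to #3 in order of preference, the results displayed on result screens #1 to #3 are ranked in that click order. The object extraction unit 3 then extracts the object again based on the ranking, and the new extraction results are displayed on result screens #1 to #3.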
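[0047]
The Grab All button 207 is operated to reflect (capture) one of the object extraction results displayed on result screens #1 to #3 in the object buffer 23. When the user, operating the mouse 9, clicks the Grab All button 207 of the result screen showing the result the user considers best, the entire contents of the result buffer storing that result are selected and reflected in the object buffer 23.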
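[0048]
The Grab Partly button 208 is operated to reflect (capture) part of one of the object extraction results displayed on result screens #1 to #3 in the object buffer 23. When the user, operating the mouse 9, specifies a range on a part of the preferred extraction result on one of result screens #1 to #3 and then clicks the Grab Partly button 208 with the mouse 9, that part of the extraction result is selected and reflected in the object buffer 23.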
[0049]
Next, an outline of the processing of the image processing apparatus of Fig. 2 will be described with reference to the flowchart of Fig. 4.
[0050]
When some event occurs as a result of the user operating the mouse 9, the event detection unit 8 determines the content of the event in step S1.
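[0051]
If it is determined in step S1 that the event instructs "screen selection", which switches the image displayed on the reference screen of the display unit 5, that is, if the Change Display button 201 (Fig. 3) has been clicked, the event detection unit 8 supplies event information representing "screen selection" to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S2, controls the selector 24 of the frame-of-interest processing unit 2, and ends the processing.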
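[0052]
The selector 24 thereby switches its selection among the outputs of the frame-of-interest buffer 21, the background buffer 22, and the object buffer 23, and as a result the image displayed on the reference screen is switched, as shown in Fig. 5, to the original image of the frame of interest stored in the frame-of-interest buffer 21, the background image stored in the background buffer 22, or the object stored in the object buffer 23.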
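[0053]
If it is determined in step S1 that the event instructs "undo", which deletes the image most recently reflected in the object buffer 23, that is, if the Undo button 204 has been clicked, the event detection unit 8 supplies event information representing "undo" to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S3, controls the object buffer 23 of the frame-of-interest processing unit 2 to delete the portion of the object most recently reflected in it, and proceeds to step S4.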
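[0054]
In step S4, the processing control unit 7 controls the history management unit 4 to delete the history information relating to the image deleted from the object buffer 23 in step S3, and ends the processing.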
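[0055]
That is, when an image forming part of the object is reflected (captured) in the object buffer 23, the history information managed by the history management unit 4 is updated for the reflected image, as described later. Therefore, when an image is deleted from the object buffer 23, the history information relating to the deleted image is also deleted.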
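[0056]
If, on the other hand, it is determined in step S1 that the event instructs "partial deletion", which deletes part of the image reflected in the object buffer 23, that is, if a predetermined range has been specified and the Delete Partly button 203 has then been clicked, the event detection unit 8 supplies event information representing "partial deletion" to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S5, controls the object buffer 23 of the frame-of-interest processing unit 2 to delete the specified range from the image stored as the object, and proceeds to step S4.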
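[0057]
In step S4, the processing control unit 7 controls the history management unit 4 to delete the history information relating to the image deleted from the object buffer 23 in step S5, and ends the processing.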
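[0058]
Thus, suppose that, as shown in Fig. 6(A), the object buffer 23 stores an object obj1 representing a person's torso, while the background buffer 22 stores an object obj2 representing the person's head together with the background, such as scenery. If the object extraction unit 3 extracts obj2 representing the head and it is reflected in the object buffer 23, then, as shown in Fig. 6(B), the object buffer 23 contains objects obj1 and obj2, and the background buffer 22 contains only the background, such as scenery.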
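[0059]
If the user then clicks the Undo button 204 with the mouse 9, then, as shown in Fig. 6(C), the object buffer 23 returns to the state before obj2 (the head) was reflected, in which only obj1 (the torso) is stored, and the background buffer 22 also returns to the state in which it stores obj2 (the head) together with the background, such as scenery. In other words, the contents of the background buffer 22 and the object buffer 23 return to the state shown in Fig. 6(A).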
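[0060]
If, instead, the user, by operating the mouse 9, specifies a range on part of the object obj2 representing the head, as shown in Fig. 6(B), and then clicks the Delete Partly button 203, the specified portion of obj2 is deleted from the object buffer 23, as shown in Fig. 6(D), and that portion is added to the background, such as scenery, in the background buffer 22.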
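[0061]
If it is determined in step S1 that the event represents "position designation", which specifies a position on the image displayed on the reference screen or on one of result screens #1 to #3, that is, if, for example, as shown in Fig. 7, the user operates the mouse 9 to click a position on the object in the original image or background image displayed on the reference screen, the event detection unit 8 supplies event information representing "position designation" to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S6, determines, based on the position clicked with the mouse 9 and so on, the content of the three object extraction processes to be performed by the object extraction unit 3, and controls the object extraction unit 3 to extract the object by those three processes.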
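[0062]
Accordingly, in step S7, the object extraction unit 3 performs the three object extraction processes and stores the three resulting object extraction results in the result buffers 33A to 33C of the result processing unit 33, respectively.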
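[0063]
Then, in step S8, the display unit 5 displays the object extraction results stored in the result buffers 33A to 33C on result screens #1 to #3, respectively, and the processing ends.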
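[0064]
If it is determined in step S1 that the event represents "rank designation", which specifies the ranking (by quality) of the object extraction results displayed on result screens #1 to #3, that is, if the Rank Result buttons 206 displayed on result screens #1 to #3 have been clicked in a certain order, the event detection unit 8 supplies event information representing "rank designation" to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S6, determines the content of the three object extraction processes to be performed by the object extraction unit 3 based on the specified ranking, and controls the object extraction unit 3 to extract the object by those three processes. The processing then proceeds through steps S7 and S8 in the same way as described above.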
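[0065]
If it is determined in step S1 that the event represents "grab all" or "grab partly", which selects one of the object extraction results displayed on result screens #1 to #3 and reflects all or part of it in the object buffer 23, that is, if the Grab All button 207 of one of result screens #1 to #3 has been clicked, or if a range has been specified on part of one of the extraction results displayed on result screens #1 to #3 and the Grab Partly button 208 has then been clicked, the event detection unit 8 supplies event information representing "grab all" or "grab partly" to the processing control unit 7, which, on receiving it, proceeds to step S9.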
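[0066]
In step S9, the processing control unit 7 controls the result processing unit 33 of the object extraction unit 3 to select the entire object extraction result stored in the result buffer corresponding to whichever of result screens #1 to #3 had its Grab All button 207 clicked, and to reflect (store) it in the object buffer 23. Alternatively, in step S9, the processing control unit 7 controls the result processing unit 33 to select the specified range of the object extraction result stored in the result buffer corresponding to whichever of result screens #1 to #3 had its Grab Partly button 208 clicked, and to reflect it in the object buffer 23.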
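[0067]
Thus, for example, if the contents of the object buffer 23 are as shown in Fig. 8(A) and the object extraction result stored in the result buffer corresponding to some result screen #i is as shown in Fig. 8(B), then when the Grab All button 207 displayed on result screen #i is operated, the contents of the object buffer 23 are updated (overwritten) with that extraction result of Fig. 8(B), as shown in Fig. 8(C).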
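[0068]
If, instead, part of the object extraction result displayed on result screen #i is specified as a range, as indicated by the rectangle in Fig. 8(B), and the Grab Partly button 208 displayed on result screen #i is then operated, the contents of the object buffer 23 are updated, as shown in Fig. 8(D), to the object of Fig. 8(A) with the specified portion of the extraction result of Fig. 8(B) added (composited).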
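The Grab All, Grab Partly, and Undo operations described above amount to three edits on the object buffer 23: Grab All overwrites it with a result buffer (which, as described for the result processing unit 33, already composites the current object with the new cutout), Grab Partly adds only the result pixels inside the user's rectangle, and Undo restores the state saved before the last grab. A minimal sketch, assuming the object is represented as a set of (x, y) pixel coordinates and a single saved state stands in for the bank switching; the class and method names are hypothetical:

```python
class ObjectBuffer:
    """Object mask as a set of (x, y) pixels, with a one-level undo bank."""

    def __init__(self):
        self.pixels = set()
        self._saved = None  # state just before the last grab (stand-in for a bank)

    def grab_all(self, result_pixels):
        # Fig. 8(C): the result buffer's contents replace the object buffer.
        self._saved = set(self.pixels)
        self.pixels = set(result_pixels)

    def grab_partly(self, result_pixels, rect):
        # Fig. 8(D): only result pixels inside rect = (x0, y0, x1, y1),
        # inclusive, are composited onto the current object.
        x0, y0, x1, y1 = rect
        self._saved = set(self.pixels)
        self.pixels |= {(x, y) for (x, y) in result_pixels
                        if x0 <= x <= x1 and y0 <= y <= y1}

    def undo(self):
        # Return to the state before the most recent grab, if one was saved.
        if self._saved is not None:
            self.pixels, self._saved = self._saved, None
```

Keeping only one saved state mirrors the text's minimum requirement ("at least" the state before the last capture); a deeper undo history would need a stack instead.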
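[0069]
Then, in step S10, the processing control unit 7 controls the history management unit 4 to update the history information for the image reflected in the object buffer 23 in step S9, and the processing ends.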
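[0070]
As described above, the object extraction results of the three object extraction processes are displayed on result screens #1 to #3, and when the user clicks the Grab All button 207 or the Grab Partly button 208 on one of them, the extraction result displayed on that screen is reflected in the object buffer 23. The user therefore only has to look at the results of the different extraction processes on result screens #1 to #3 and select a good one, and the object buffer 23 receives whichever of those results the user judged good and selected. As a result, accurate object extraction can be achieved with simple operations.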
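[0071]
Although result screens #1 to #3 each display an object extracted by a different process, a result that is not very good overall may still be good in some part. In that case, by specifying that part as a range and clicking the Grab Partly button 208, the user can reflect just the well-extracted part in the object buffer 23, so that the object buffer 23 ultimately holds a good object extraction result.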
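[0072]
If it is determined in step S1 that the event represents "confirm", which fixes the final object extraction result for the frame of interest as the image stored in the object buffer 23, the event detection unit 8 supplies event information representing "confirm" to the processing control unit 7.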
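[0073]
On receiving this event information, the processing control unit 7 proceeds to step S11, reads the object of the frame of interest stored in the object buffer 23 from the frame-of-interest processing unit 2, reads the history information for the frame of interest from the history management unit 4, and supplies both to the storage 1 for storage. Then, in step S12, the processing control unit 7 determines whether the frame following the frame of interest is stored in the storage 1; if not, it skips steps S13 and S14 and ends the processing.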
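[0074]
If it is determined in step S12 that the frame following the frame of interest is stored in the storage 1, the processing proceeds to step S13, where the processing control unit 7 makes that next frame the new frame of interest and supplies it to the frame-of-interest buffer 21 for storage. The processing control unit 7 also clears the contents of the background buffer 22, the result buffers 33A to 33C, and the previous frame buffer 61, and proceeds to step S14. In step S14, under the control of the processing control unit 7, the initial extraction processing described later is performed on the frame of interest newly stored in the frame-of-interest buffer 21 in step S13, and the processing ends.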
[0075]
Next, the object extraction processing performed by the object extraction unit 3 of Fig. 2 will be described.
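[0076]
In this embodiment, the object extraction unit 3 basically detects boundary portions in the frame of interest and extracts the regions enclosed by those boundary portions as the object.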
[0077]
Fig. 9 shows an example configuration of the boundary detection unit 31 of the object extraction unit 3.
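[0078]
The HSV separation unit 211 reads the frame of interest stored in the frame-of-interest buffer 21 and separates its pixel values into H (hue), S (saturation), and V (value) components. That is, if the pixel values of the frame of interest are expressed in RGB (Red, Green, Blue), for example, the HSV separation unit 211 converts the RGB pixel values into HSV pixel values according to, for example, the following equations.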
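[0079]
V = max(R, G, B)
X = min(R, G, B)
S = (V - X) / V × 255
H = (G - B) / (V - X) × 60, when V = R
H = ((B - R) / (V - X) + 2) × 60, when V = G
H = ((R - G) / (V - X) + 4) × 60, otherwise
Here, the R, G, and B components, which are the original pixel values of the frame of interest, are each assumed to be represented by, for example, 8 bits (integer values from 0 to 255). max() denotes the maximum of the values in parentheses, and min() the minimum.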
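The conversion can be sketched directly in Python. This sketch reads the G and B cases as ((B - R)/(V - X) + 2) × 60 and ((R - G)/(V - X) + 4) × 60, the standard hexcone hue formula, and adds two guards the text does not state: S = 0 and H = 0 when V = 0, and H = 0 for grey pixels where V = X. As in the equations above, H is not wrapped into [0, 360), so it can be negative when R is the maximum and B > G.

```python
def rgb_to_hsv(r, g, b):
    """Convert 8-bit R, G, B to (H, S, V) following the equations above.

    H is in degrees, S is scaled to 0..255, V is the 0..255 maximum.
    """
    v = max(r, g, b)
    x = min(r, g, b)
    if v == 0:
        return 0.0, 0.0, v          # black: S and H undefined, use 0 (assumption)
    s = (v - x) / v * 255
    if v == x:
        return 0.0, s, v            # grey: hue undefined, use 0 (assumption)
    if v == r:
        h = (g - b) / (v - x) * 60
    elif v == g:
        h = ((b - r) / (v - x) + 2) * 60
    else:
        h = ((r - g) / (v - x) + 4) * 60
    return h, s, v
```

For example, pure red, green, and blue map to hues 0, 120, and 240 degrees, each with S = 255 and V = 255.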
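[0080]
Of the pixel values converted into HSV components, the HSV separation unit 211 supplies the H, S, and V components to the edge detection units 212H, 212S, and 212V, respectively.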
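[0081]
The edge detection units 212H, 212S, and 212V perform edge detection on the images composed of the H, S, and V components from the HSV separation unit 211 (hereinafter referred to as the H plane, S plane, and V plane, respectively, where appropriate).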
[0082]
That is, the edge detection unit 212H detects edges in the H-plane image by filtering it with the Sobel operator.
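[0083]
Specifically, denoting by I(x, y) the H component of the pixel that is (x+1)-th from the left and (y+1)-th from the top of the H-plane image, and by E(x, y) the result of filtering that pixel with the Sobel operator, the edge detection unit 212H obtains an edge image composed of the pixel values E(x, y) given by the following equation.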
[0084]

[0085]
The edge detection units 212S and 212V likewise obtain edge images for the S-plane and V-plane images in the same way as the edge detection unit 212H.
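The equation for E(x, y) did not survive in this copy of the text. As an assumption, the sketch below uses the common Sobel gradient magnitude E = sqrt(Gx² + Gy²) with the usual 3 × 3 horizontal and vertical kernels; some implementations use |Gx| + |Gy| instead, and the patent's exact form cannot be confirmed here. Border pixels are left at 0. The same function would be applied to each of the H, S, and V planes.

```python
import math


def sobel(plane):
    """Sobel gradient magnitude of a 2-D list `plane`, where I(x, y) = plane[y][x]."""
    h, w = len(plane), len(plane[0])
    edge = [[0.0] * w for _ in range(h)]  # border pixels stay 0 (assumption)
    for y in range(1, h - 1):
        up, mid, dn = plane[y - 1], plane[y], plane[y + 1]
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column, weights 1-2-1.
            gx = (up[x + 1] + 2 * mid[x + 1] + dn[x + 1]) - (up[x - 1] + 2 * mid[x - 1] + dn[x - 1])
            # Vertical gradient: bottom row minus top row, weights 1-2-1.
            gy = (dn[x - 1] + 2 * dn[x] + dn[x + 1]) - (up[x - 1] + 2 * up[x] + up[x + 1])
            edge[y][x] = math.hypot(gx, gy)
    return edge
```

On a vertical step from 0 to 100, the two columns adjacent to the step get E = 400, and flat regions get 0.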
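[0086]
The edge images obtained from the H-, S-, and V-plane images are supplied from the edge detection units 212H, 212S, and 212V to the binarization units 213H, 213S, and 213V, respectively. The binarization units 213H, 213S, and 213V binarize the H-, S-, and V-plane edge images by comparing them with predetermined thresholds, and supply the resulting binarized images (images whose pixel values are 0 or 1) to the thinning units 214H, 214S, and 214V, respectively.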
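[0087]
The thinning units 214H, 214S, and 214V thin the boundary portions in the binarized H-, S-, and V-plane images supplied from the binarization units 213H, 213S, and 213V, and supply the resulting H-, S-, and V-plane boundary images to the boundary image storage units 215H, 215S, and 215V, respectively, for storage.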
[0088]
Next, the thinning processing performed on the binarized H-plane image by the thinning unit 214H of Fig. 9 will be described with reference to Fig. 10.
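[0089]
In the thinning processing, as shown in the flowchart of Fig. 10(A), a flag v is first reset to 0 in step S21, and the processing proceeds to step S22. In step S22, the pixels constituting the binarized H-plane image are referenced in raster scan order, and the processing proceeds to step S23. In step S23, it is determined whether any pixel of the binarized image has not yet been referenced in raster scan order; if so, the first such pixel found in raster scan order is taken as the pixel of interest, and the processing proceeds to step S24.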
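[0090]
In step S24, it is determined whether at least one of the four pixels adjacent above, below, left, and right of the pixel of interest has the value 0 and the value c of the pixel of interest is not equal to a predetermined value a (a value other than 0 and 1). If it is determined in step S24 that none of the four adjacent pixels has the value 0, or that the value c of the pixel of interest equals the predetermined value a, the processing returns to step S22 and the same processing is repeated.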
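[0091]
If it is determined in step S24 that at least one of the four adjacent pixels has the value 0 and that the value c of the pixel of interest is not equal to the predetermined value a, the processing proceeds to step S25, where the flag v is set to 1, and then to step S26.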
[0092]
In step S26, it is determined whether the sum (a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8) of the values a1 to a8 of the eight pixels adjacent to the pixel of interest c, shown in Fig. 10(B), is 6 or less.
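[0093]
If it is determined in step S26 that the sum of the eight adjacent pixel values is greater than 6, the processing proceeds to step S28, where the value c of the pixel of interest is set to the predetermined value a, and the processing returns to step S22.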
[0094]
If it is determined in step S26 that the sum is 6 or less, the processing proceeds to step S27, where it is determined whether the following condition holds.
[0095]
(a2+a4+a6+a8)-(a1&a2&a3)-(a4&a5&a6)-(a7&a8&a1)=1
where & denotes logical AND.
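[0096]
If it is determined in step S27 that the condition does not hold, the processing proceeds to step S28, where, as described above, the value c of the pixel of interest is set to the predetermined value a, and the processing returns to step S22.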
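[0097]
If it is determined in step S27 that the condition holds, the processing proceeds to step S29, where the value c of the pixel of interest is set to 0, and the processing returns to step S22.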
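[0098]
If, on the other hand, it is determined in step S23 that no pixel of the binarized H-plane image remains unreferenced in raster scan order, that is, every pixel of the binarized image has been processed as the pixel of interest, the processing proceeds to step S30, where it is determined whether the flag v is 0.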
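[0099]
If it is determined in step S30 that the flag v is not 0, that is, the flag v is 1, the processing returns to step S21 and the same processing is repeated. If it is determined in step S30 that the flag v is 0, the processing ends.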
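[0100]
The thinning unit 214H then converts to 1 the values of those pixels in the image resulting from the thinning processing whose value is the predetermined value a, and supplies the converted image to the boundary image storage unit 215H as the boundary image. The boundary image storage unit 215H thus stores a boundary image in which the boundary portions of the H-plane image are 1 and the non-boundary portions are 0.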
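The thinning loop of Fig. 10(A) can be sketched in Python under several stated assumptions. The printed condition of step S27 depends on the neighbour layout of Fig. 10(B), which is not available here, so this sketch substitutes the standard Yokoi 4-connectivity number (from the work the text cites) and deletes a pixel only when it equals 1. It further assumes that only pixels still equal to 1 are examined (0 and the marker a are skipped), that marked neighbours count as 1 in the sums, that pixels outside the image count as 0, that the marker a is 2, and that end points (fewer than two foreground neighbours) are kept, a common safeguard the text does not mention. The final step maps every non-zero pixel to 1.

```python
MARK = 2  # the "predetermined value a" (any value other than 0 and 1); assumed


def yokoi_n4(n):
    """Yokoi 4-connectivity number; n = [x1..x8], counter-clockwise from east."""
    return sum(n[k] - n[k] * n[(k + 1) % 8] * n[(k + 2) % 8] for k in (0, 2, 4, 6))


def thin(binary):
    """Thin a 0/1 image (list of rows) and return the 0/1 boundary image."""
    h, w = len(binary), len(binary[0])
    p = [row[:] for row in binary]

    def fg(x, y):
        # 1 for foreground (1 or MARK) inside the image, else 0.
        return 1 if 0 <= x < w and 0 <= y < h and p[y][x] != 0 else 0

    while True:
        v = 0  # flag v (steps S21 and S25)
        for y in range(h):              # raster scan (S22, S23)
            for x in range(w):
                if p[y][x] != 1:
                    continue
                # S24: at least one 4-neighbour must be background.
                if fg(x, y - 1) and fg(x, y + 1) and fg(x - 1, y) and fg(x + 1, y):
                    continue
                v = 1
                n = [fg(x + 1, y), fg(x + 1, y - 1), fg(x, y - 1), fg(x - 1, y - 1),
                     fg(x - 1, y), fg(x - 1, y + 1), fg(x, y + 1), fg(x + 1, y + 1)]
                # S26/S27 with the substitutions noted above: delete or mark.
                if 2 <= sum(n) <= 6 and yokoi_n4(n) == 1:
                    p[y][x] = 0          # S29
                else:
                    p[y][x] = MARK       # S28
        if v == 0:                       # S30
            break
    return [[1 if val else 0 for val in row] for row in p]
```

Each pass turns at least one remaining 1 into 0 or the marker, so the loop always terminates. Because pixels are updated in place during the raster scan, the thinned line tends to settle toward the side scanned last.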
[0101]
The thinning units 214S and 214V perform the same processing as the thinning unit 214H to obtain the S-plane and V-plane boundary images, respectively.
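[0102]
Details of the thinning method described with reference to Fig. 10 are disclosed, for example, in Yokoi, Toriwaki, and Fukumura, "On topological properties of sampled binary pictures" (in Japanese), Transactions of the IEICE (D), J56-D, pp. 662-669, 1973. The thinning method is not limited to the one described above.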
[0103]
Fig. 11 shows examples of boundary images.
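[0104]
Fig. 11(A) shows an original image, Fig. 11(B) shows the V-plane boundary image obtained from it, and Fig. 11(C) shows the H-plane boundary image obtained from it. Comparing Figs. 11(B) and 11(C) shows that in the V plane even relatively small concave or convex features are detected as boundary portions, whereas in the H plane only relatively large concave or convex features are. Thus, the H, S, and V planes differ in the scale of the concave or convex features that are detected as boundaries.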
[0105]
In Figs. 11(B) and 11(C), the white portions (boundary portions) represent pixels whose value in the boundary image is 1, and the black portions represent pixels whose value is 0.
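[0106]
Besides creating three boundary images, one for each of the H, S, and V planes, as described above, the boundary detection unit 31 may also create three boundary images for a single plane by using three binarization thresholds, one boundary image per threshold. Hereinafter, the thresholds used when three boundary images are created for the H, S, and V planes are denoted TH_H, TH_S, and TH_V, respectively, and the three thresholds used when three boundary images are created for a single plane are denoted TH1, TH2, and TH3, where appropriate.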
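The two ways of obtaining three binarized images can be sketched as follows, assuming a pixel counts as an edge when its edge strength is greater than or equal to the threshold (the text says only that the edge image is compared with a threshold). The function names are hypothetical; in the apparatus, each binarized image would then be thinned before being used as a boundary image.

```python
def binarize(edge, th):
    """1 where the edge strength is >= th (assumed comparison), else 0."""
    return [[1 if e >= th else 0 for e in row] for row in edge]


def three_by_plane(edges, th_h, th_s, th_v):
    """One binarized image per plane; `edges` maps 'H', 'S', 'V' to edge images."""
    return [binarize(edges["H"], th_h),
            binarize(edges["S"], th_s),
            binarize(edges["V"], th_v)]


def three_by_threshold(edge, th1, th2, th3):
    """Three binarized images of a single plane, one per threshold TH1..TH3."""
    return [binarize(edge, th) for th in (th1, th2, th3)]
```

A higher threshold keeps only stronger edges, so in the single-plane mode TH1 < TH2 < TH3 yields boundary images ranging from detailed to coarse.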
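[0107]
Next, the cutout processing performed by the cutout unit 32 of Fig. 2 will be described with reference to the flowchart of Fig. 12. Although the boundary detection unit 31 yields three boundary images, as described above, the cutout processing is described here for one of them. Hereinafter, the one of the three output buffers 32A to 32C that stores the image cut out from the frame of interest based on the boundary image under consideration (the boundary image of interest) is referred to as the output buffer of interest, where appropriate.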
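[0108]
In the cutout processing, after the contents of the output buffer of interest are cleared, in step S41 the value of the pixel at the position on the frame-of-interest image specified by the user with the mouse 9 (the designated position) is read from the frame-of-interest buffer 21 and written to the output buffer of interest. As described with reference to Fig. 4, the object extraction unit 3 performs processing when the user performs "position designation" or "rank designation"; in step S41, the pixel value written is the one at the position on the frame of interest specified by the user's most recent "position designation". The processing then proceeds to step S42, where it is determined whether the output buffer of interest contains any unprocessed pixel (pixel value).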
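[0109]
If it is determined in step S42 that the output buffer of interest contains an unprocessed pixel, the processing proceeds to step S43, where any one of the unprocessed pixels stored in the output buffer of interest is taken as the pixel of interest, and then to step S44. In step S44, the values of the eight pixels adjacent to the pixel of interest (above, below, left, right, upper left, lower left, upper right, and lower right) are obtained from the boundary image, and the processing proceeds to step S45.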
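[0110]
In step S45, it is determined whether any of these eight pixels in the boundary image is a boundary pixel forming a boundary portion (in this embodiment, a pixel whose value is 1). If it is determined in step S45 that there is a boundary pixel among them, step S46 is skipped, the processing returns to step S42, and the same processing is repeated. That is, if a boundary pixel exists among the eight adjacent pixels, those eight pixel values are not written to the output buffer of interest.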
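[0111]
If it is determined in step S45 that there is no boundary pixel among the eight adjacent pixels, the processing proceeds to step S46, where the values of those eight pixels are read from the frame-of-interest buffer 21 and stored at the corresponding addresses of the output buffer of interest. That is, when no boundary pixel exists among the eight adjacent pixels, those eight pixels are regarded as lying inside the object that contains the position the user clicked with the mouse 9 (the position specified by "position designation"), and their values are written to the output buffer of interest.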
[0112]
The processing then returns to step S42, and the same processing is repeated.
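[0113]
If a value has already been written to a pixel of the output buffer of interest to which a value is being written in step S46, it is overwritten. If a pixel whose value is overwritten has already been taken as the pixel of interest, it remains a processed pixel and is not treated as unprocessed even after being overwritten.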
[0114]
If, on the other hand, it is determined in step S42 that no unprocessed pixel remains in the output buffer of interest, the processing ends.
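The cutout of Fig. 12 is a form of seeded region growing. A minimal sketch, assuming the output buffer is represented by the set of written (x, y) coordinates (the pixel values would then be copied from the frame-of-interest buffer at those coordinates), that neighbours outside the image are ignored, and that a FIFO queue stands in for picking "any one" unprocessed pixel; the function name is hypothetical:

```python
from collections import deque


def cut_out(boundary, seed):
    """Return the set of (x, y) pixels cut out starting from `seed` (steps S41-S46).

    boundary[y][x] == 1 marks a boundary pixel.
    """
    h, w = len(boundary), len(boundary[0])
    written = {seed}               # S41: the designated pixel is written first
    unprocessed = deque([seed])
    while unprocessed:             # S42
        x, y = unprocessed.popleft()   # S43
        nbrs = [(x + dx, y + dy)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dx, dy) != (0, 0) and 0 <= x + dx < w and 0 <= y + dy < h]
        # S44/S45: if any neighbour is a boundary pixel, write none of them.
        if any(boundary[ny][nx] == 1 for nx, ny in nbrs):
            continue
        for p in nbrs:             # S46: write all eight neighbours
            if p not in written:   # rewritten pixels keep their processed state
                written.add(p)
                unprocessed.append(p)
    return written
```

Pixels next to a boundary are written but never expanded, so growth stops one pixel short of the boundary line and the boundary pixels themselves are never written.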
[0115]
Next, the cutout processing performed by the cutout unit 32 will be further described with reference to Fig. 13.
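[0116]
As shown in Fig. 13(A), the cutout unit 32 reads from the frame-of-interest buffer 21 the value of the pixel at the position on the frame-of-interest image specified by the user with the mouse 9 (the designated position) and writes it to the output buffer. The cutout unit 32 then takes any one of the unprocessed pixels stored in the output buffer as the pixel of interest and obtains the values of its eight adjacent pixels from the boundary image. If none of these eight pixels in the boundary image is a boundary pixel, the cutout unit 32 reads their values from the frame-of-interest buffer 21 and writes them to the output buffer. As a result, as shown in Fig. 13(B), starting from the pixel at the position specified by the user with the mouse 9 (marked ● in Fig. 13(B)), the values of the pixels forming the interior of the region enclosed by boundary pixels are successively written to the output buffer.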
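[0117]
By repeating the above processing until no unprocessed pixel remains among the pixels stored in the output buffer, the region of the frame-of-interest image enclosed by boundary pixels is stored in the output buffer.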
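[0118]
Thus, with the cutout processing described above, the region of the frame of interest that contains the point the user designated as part of the object and is enclosed by boundary portions is cut out starting from that point, so the regions constituting the object can be cut out accurately. If the cutout of object regions were fully automatic, it would be difficult to judge whether a given region belongs to the object, and the cutout could consequently start from a pixel that is not part of the object. In contrast, in the cutout processing of Fig. 12, the cutout starts from the point the user designated as part of the object, so it always starts from a pixel in a region of the object, and the regions constituting the object can be cut out accurately.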
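[0119]
The cutout processing of Fig. 12 is performed based on each of the three boundary images obtained by the boundary detection unit 31, and the cutout results obtained from the three boundary images are stored in the output buffers 32A to 32C, respectively. The contents of the output buffers 32A to 32C are then transferred to the result buffers 33A to 33C, respectively, so that result screens #1 to #3 display object extraction results obtained by different processes.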
[0120]
Next, the history information managed by the history management unit 4 of Fig. 2 will be described with reference to Figs. 14 to 16.
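[0121]
When all or part of the object extraction result stored in one of the result buffers 33A to 33C is reflected in (written to) the object buffer 23 by "grab partly" or "grab all", the history management unit 4 updates the designated positions stored in the designated position storage unit 41, the history image stored in the history image storage unit 42, and the entries of the parameter table storage unit 43.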
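[0122]
For example, suppose an object is being extracted from a frame of interest showing a person's whole body, and the torso and lower body have already been extracted as the object and stored in the object buffer 23. In this case, as shown in Fig. 14(A), the history image storage unit 42 stores a history image consisting of torso pixels whose value is ID1, which is associated with the plane of the boundary image used to extract the torso and the threshold used to obtain that boundary image, and lower-body pixels whose value is ID2, which is associated with the plane of the boundary image used to extract the lower body and the threshold used to obtain that boundary image.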
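[0123]
In the embodiment of Fig. 14(A), the boundary image used to extract the torso is from the H plane, and the threshold used to obtain that H-plane boundary image (the threshold used in the binarization for obtaining it) is 100. The boundary image used to extract the lower body is from the V plane, and the threshold used to obtain it is 80.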
[0124]
In this case, the parameter table storage unit 43 stores ID1 in association with the H plane and threshold 100, and ID2 in association with the V plane and threshold 80.
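[0125]
Hereinafter, the combination of the plane of the boundary image used to extract the object stored in the object buffer 23 and the threshold used to obtain that boundary image is referred to as a parameter set, where appropriate.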
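[0126]
If the user then clicks a pixel of the head in the frame of interest showing the whole body with the mouse 9, the object extraction unit 3 performs three types of object extraction processing as described above; the head extraction results of the three processes are stored in the result buffers 33A to 33C, respectively, as shown in Fig. 14(B), and the contents of the result buffers 33A to 33C are displayed on result screens #1 to #3, respectively.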
[0127]
When the user, referring to the head extraction results displayed on result screens #1 to #3, performs "grab all" on a good one, that result is selected and reflected in the object buffer 23, as shown in Fig. 14(C).
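[0128]
In this case, the history management unit 4 registers in the parameter table storage unit 43 the parameter set (the plane of the boundary image used to obtain the head extraction result reflected in the object buffer 23, and the threshold used to obtain that boundary image) in association with a unique ID3.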
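[0129]
Furthermore, as shown in Fig. 14(D), the history management unit 4 writes ID3 as the value of the pixels constituting the head in the history image storage unit 42, thereby updating the history image. In the embodiment of Fig. 14(D), the boundary image used to extract the head is from the S plane, and the threshold used to obtain it is 50.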
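[0130]
As shown in Fig. 14(E), the history management unit 4 also adds to the designated position storage unit 41 the coordinates (x4, y4) of the point on the frame of interest (the designated position) that the user clicked when obtaining the head extraction result reflected in the object buffer 23. In the embodiment of Fig. 14(E), the designated position storage unit 41 already stores the coordinates of three designated positions, (x1, y1), (x2, y2), and (x3, y3), to which the new coordinates (x4, y4) are added.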
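The history information of Fig. 14 can be modelled with three small structures: a history image holding an ID per pixel, a parameter table mapping each ID to a (plane, threshold) parameter set, and a list of designated positions. A sketch, assuming 0 means "no ID" in the history image and that an identical parameter set reuses its existing ID (the text does not say whether it does); the class and method names are hypothetical:

```python
class History:
    """History image, parameter table, and designated positions for one frame."""

    def __init__(self, width, height):
        self.image = [[0] * width for _ in range(height)]  # 0 = not part of the object
        self.params = {}      # ID -> (plane, threshold)
        self.positions = []   # designated positions, in click order

    def record(self, pixels, plane, threshold, clicked):
        """Update after a grab: register the parameter set and stamp its ID."""
        key = (plane, threshold)
        pid = next((i for i, p in self.params.items() if p == key), None)
        if pid is None:
            pid = len(self.params) + 1
            self.params[pid] = key
        for x, y in pixels:
            self.image[y][x] = pid
        self.positions.append(clicked)
        return pid

    def params_at(self, x, y):
        """Parameter set used to extract pixel (x, y), or None if it has no ID."""
        return self.params.get(self.image[y][x])
```

Replaying the example of Fig. 14 (torso with H plane and threshold 100, lower body with V plane and 80, head with S plane and 50) yields the IDs 1, 2, and 3.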
[0131]
The history information for the frame of interest described above is used, for example, to extract the object from the next frame when that frame becomes the new frame of interest.
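[0132]
For example, suppose a history image such as that shown in Fig. 15(A) has been obtained for the frame one frame before the frame of interest (the previous frame), and the user clicks a point in the frame of interest with the mouse 9. In this case, the processing control unit 7 controls the motion detection unit 6 to obtain the motion vector (v_x, v_y), relative to the previous frame, at the designated position (x, y) that the user clicked. The processing control unit 7 then has its built-in position correction unit 71 correct the designated position (x, y) with the motion vector (v_x, v_y) to obtain the position (x', y') on the previous frame corresponding to (x, y). In this case, the position correction unit 71 obtains (x', y') by, for example, the equation (x', y') = (x, y) - (v_x, v_y).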
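[0133]
The processing control unit 7 then obtains the ID of the parameter set at the position (x', y') on the previous frame by referring to the previous frame's history image stored in the history image storage unit 42, and obtains the parameter set associated with that ID by referring to the parameter table storage unit 43. The processing control unit 7 then decides to create three boundary images based on the parameter set obtained in this way and to cut out regions from those three boundary images starting from the designated position (x, y), and supplies decision information to that effect to the object extraction unit 3.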
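The correction and lookup just described amount to one subtraction and two table lookups. A sketch, assuming the previous frame's history image and parameter table are plain Python structures (a list of rows of IDs, and a dict from ID to (plane, threshold)), and that a corrected point falling outside the frame or on a pixel with no ID yields None; the function name is hypothetical:

```python
def previous_params(click, motion, history_image, param_table):
    """Parameter set used in the previous frame at the point matching `click`.

    click = (x, y) on the frame of interest, motion = (v_x, v_y).
    """
    (x, y), (vx, vy) = click, motion
    xp, yp = x - vx, y - vy            # (x', y') = (x, y) - (v_x, v_y)
    if not (0 <= yp < len(history_image) and 0 <= xp < len(history_image[0])):
        return None                    # assumption: no usable history outside the frame
    pid = history_image[yp][xp]
    return param_table.get(pid)        # None if the pixel carries no registered ID
```

A None result would correspond to falling back on the default decision rather than the history-based one.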
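[0134]
As a result, as shown in Fig. 15(B), the object extraction unit 3 creates, for the frame of interest, three boundary images based on the parameter set corresponding to the decision information and cuts out regions from them starting from the designated position (x, y), thereby extracting three patterns of the object. The three patterns extracted from the frame of interest in this way are stored in the result buffers 33A to 33C, respectively, as shown in Fig. 15(C), and displayed on result screens #1 to #3, respectively.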
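[0135]
When extracting an object from the frame of interest, extracting the part at the user's designated position in the same way as the corresponding part of the previous frame is expected to give a good extraction. Therefore, by creating boundary images based on the parameter set corresponding to the decision information and cutting out regions from them starting from the designated position (x, y), as described above, a good object extraction result can be obtained quickly.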
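[0136]
In the object extraction unit 3, for example, when the user performs "position designation", object extraction by three processes is performed starting from the designated position; if none of the three results is good, the user performs "rank designation", and, as described later, object extraction is performed again with different parameter sets. Thus, without the previous frame's history information, the user may have to perform "rank designation" several times to obtain a reasonably good result. In contrast, when the previous frame's history information is used, a good object extraction result is more likely to be obtained immediately, with only the simple operation of designating a few points on the object in the frame of interest and without any "rank designation".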
[0137]
The history information for the previous frame can also be used for object extraction from the frame of interest as follows, for example.
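[0138]
For example, suppose that for the frame one frame before the frame of interest (the previous frame), a history image such as that shown in Fig. 16(A) and three designated positions (x1, y1), (x2, y2), and (x3, y3) such as those shown in Fig. 16(B) have been obtained.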
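[0139]
In this case, the processing control unit 7 controls the motion detection unit 6 to obtain the motion vector (v_x, v_y) of the frame of interest relative to the designated position (x1, y1) of the previous frame. The processing control unit 7 then has its built-in position correction unit 71 correct the designated position (x1, y1) of the previous frame with the motion vector (v_x, v_y) to obtain the position (x1', y1') on the frame of interest corresponding to (x1, y1). In this case, the position correction unit 71 obtains (x1', y1') by, for example, the equation (x1', y1') = (x1, y1) + (v_x, v_y).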
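[0140]
The processing control unit 7 then obtains the ID of the parameter set at the designated position (x1, y1) of the previous frame by referring to the previous frame's history image stored in the history image storage unit 42, and obtains the parameter set associated with that ID by referring to the parameter table storage unit 43. It then decides to create three boundary images based on the parameter set obtained in this way and to cut out regions from those three boundary images starting from the position (x1', y1') on the frame of interest corresponding to the designated position (x1, y1) of the previous frame, and supplies decision information to that effect to the object extraction unit 3.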
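[0141]
As a result, as shown in Fig. 16(C), the object extraction unit 3 creates, for the frame of interest, three boundary images based on the parameter set corresponding to the decision information and cuts out regions from them starting from the position (x1', y1'), thereby extracting three patterns of the object.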
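[0142]
The same processing is performed for the remaining two designated positions (x2, y2) and (x3, y3) of the previous frame, extracting three patterns of the object starting from each of the corresponding positions (x2', y2') and (x3', y3') on the frame of interest.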
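[0143]
The parts of the object extracted from the frame of interest in this way, starting from the positions on the frame of interest corresponding to each designated position of the previous frame, are then composited, and the resulting object extraction result is stored in the result buffers and displayed on the result screens, as shown in Fig. 16(D).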
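The automatic extraction of Fig. 16 can be sketched as a loop over the previous frame's designated positions. Assumptions, none stated in the text: `motion_at(x, y)` returns the motion vector at a previous-frame point, `params_at(x, y)` returns its (plane, threshold) parameter set or None, `extract(seed, plane, thresholds)` returns three pixel sets (one per threshold), the three thresholds are derived as TH2 - 20, TH2, TH2 + 20 as in the history-based decision of Fig. 18(A), and "compositing" means taking the union of the k-th pattern over all positions. All four names are hypothetical stand-ins.

```python
def propagate(prev_positions, motion_at, params_at, extract):
    """Extract three composite object patterns in the frame of interest."""
    patterns = [set(), set(), set()]
    for x1, y1 in prev_positions:
        params = params_at(x1, y1)
        if params is None:
            continue                      # no history at this point
        plane, th2 = params
        vx, vy = motion_at(x1, y1)
        seed = (x1 + vx, y1 + vy)         # (x1', y1') = (x1, y1) + (v_x, v_y)
        results = extract(seed, plane, (th2 - 20, th2, th2 + 20))
        for k in range(3):
            patterns[k] |= results[k]     # composite the parts
    return patterns
```

Passing the lookups and the extractor in as callables keeps the sketch independent of how the motion detection unit 6 and the object extraction unit 3 are actually realized.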
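[0144]
As described above, extracting the object from the frame of interest in the same way as the corresponding part of the previous frame is expected to give a good extraction. Therefore, by creating three boundary images based on the parameter set corresponding to the decision information and cutting out regions from them starting from the position on the frame of interest corresponding to each designated position of the previous frame, a good object extraction result can be obtained quickly, as in the case of Fig. 15.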
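[0145]
Furthermore, whereas in the case of Fig. 15 the user had to designate a point on the object in the frame of interest, in the case of Fig. 16 no such designation is needed, which further reduces the user's operational burden.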
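[0146]
Whether, as described with reference to Fig. 16, the object is extracted from the frame of interest using the previous frame's history information before the user designates any point on the object, with the results displayed on result screens #1 to #3, can be set, for example, in the pull-down menu displayed by clicking the Use Record button 202 (Fig. 3) on the reference screen described above.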
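[0147]
Next, the processing by which the processing control unit 7 determines the content of the plurality of object extraction processes in step S6 of Fig. 4 will be described with reference to the flowchart of Fig. 17.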
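[0148]
First, in step S51, the processing control unit 7 determines whether the event information from the event detection unit 8 represents "position designation" or "rank designation". If it is determined in step S51 that the event information represents "position designation", the processing proceeds to step S52, where the processing control unit 7 determines whether the history information of the previous frame is stored in the history management unit 4.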
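[0149]
If it is determined in step S52 that the previous frame's history information is stored in the history management unit 4, the processing proceeds to step S53, where the processing control unit 7 determines, based on that history information, the content of the three object extraction processes for extracting the object from the frame of interest, as described with reference to Fig. 15, for example, supplies decision information to that effect to the object extraction unit 3, and ends the processing.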
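[0150]
If it is determined in step S52 that the previous frame's history information is not stored in the history management unit 4, for example because the frame of interest is the first frame of the moving image in the storage 1 to be taken as the frame of interest, the processing proceeds to step S54, where the processing control unit 7 sets the content of the three object extraction processes to default values, supplies decision information to that effect to the object extraction unit 3, and ends the processing.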
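[0151]
If, on the other hand, it is determined in step S51 that the event information represents "rank designation", the processing proceeds to step S55, where the processing control unit 7 determines the content of the three object extraction processes based on the ranking the user made by operating the mouse 9, supplies decision information to that effect to the object extraction unit 3, and ends the processing.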
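[0152]
Next, the method of determining the content of the object extraction processes in steps S53 to S55 of Fig. 17 will be described in detail with reference to the flowchart of Fig. 18. When the event information represents "position designation", the user has specified a position on the frame of interest by clicking the mouse 9, and the coordinates of that position are assumed to be included in the event information. When the event information represents "rank designation", the user has ranked the object extraction results displayed on the three result screens #1 to #3 by operating the mouse 9, and the ranks of those results (here, first to third) are also assumed to be included in the event information.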
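[0153]
When the content of the object extraction processes is determined based on the previous frame's history information in step S53 of Fig. 17, first, as shown in the flowchart of Fig. 18(A), in step S61 the processing control unit 7 refers to the previous frame's history information and decides to use a boundary image of the same plane as the one used when the final object extraction result of the previous frame (the object finally stored in the object buffer 23) was obtained.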
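[0154]
That is, the processing control unit 7 refers to the previous frame's history information to identify the plane of the boundary image used when the pixel at the position on the previous frame corresponding to the position on the frame of interest designated by the user with the mouse 9 was extracted as the object, and decides to have the boundary detection unit 31 create a boundary image of that plane. The boundary-image plane decided in this way for object extraction from the frame of interest is hereinafter referred to as the decided plane, where appropriate.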
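[0155]
The processing then proceeds to step S62, where the processing control unit 7 refers to the previous frame's history information to identify the threshold used in the binarization for obtaining the boundary image when the pixel at the position on the previous frame corresponding to the user's designated position on the frame of interest was extracted as the object, and sets that threshold as the second threshold TH2 of the three thresholds TH1 to TH3 used in binarization to obtain the decided-plane boundary images for the frame of interest. The threshold TH2 determined in this way is hereinafter referred to as the decided threshold, where appropriate.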
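[0156]
The processing control unit 7 then proceeds to step S63, determines the remaining two thresholds TH1 and TH3 from the decided threshold TH2 according to, for example, TH1 = TH2 - 20 and TH3 = TH2 + 20, supplies the decided plane, the decided threshold TH2, and the thresholds TH1 and TH3 derived from it to the object extraction unit 3 as decision information, and ends the processing.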
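Steps S61 to S63 reduce to the following sketch, assuming the parameter set retrieved from the previous frame's history is a (plane, threshold) pair. The offset of 20 is the example value given in the text; the dictionary layout of the decision information and the function name are hypothetical.

```python
def decide_from_history(param_set, offset=20):
    """Decision information for step S53 from a previous-frame (plane, TH2) pair."""
    plane, th2 = param_set                 # S61: decided plane, S62: decided threshold
    th1, th3 = th2 - offset, th2 + offset  # S63: TH1 = TH2 - 20, TH3 = TH2 + 20
    return {"plane": plane, "thresholds": (th1, th2, th3)}
```

For the head of Fig. 14 (S plane, threshold 50), this gives the S plane with thresholds 30, 50, and 70.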
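[0157]
In this case, in the boundary detection unit 31 (Fig. 9) of the object extraction unit 3, edge detection is performed by whichever of the edge detection units 212H, 212S, and 212V handles the decided-plane image. The binarization unit connected to that edge detection unit then performs binarization using three thresholds, the decided threshold TH2 and the thresholds TH1 and TH3 derived from it, thereby creating three boundary images. The cutout unit 32 of the object extraction unit 3 then performs, for each of the three boundary images created by the boundary detection unit 31, the region cutout described with reference to Figs. 12 and 13, starting from the position on the frame of interest designated by the user.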
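[0158]
As described above, when history information for the previous frame exists, the content of the object extraction processing for the frame of interest is determined based on that history information and the position on the frame of interest designated by the user, so parts of the frame of interest with features similar to those in the previous frame are extracted by the same processing as in the previous frame. Accurate object extraction can therefore be achieved with simple operations.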
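[0159]
Next, the processing of the processing control unit 7 when the content of the object extraction processing is set to default values in step S54 of Fig. 17 will be described with reference to the flowchart of Fig. 18(B).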
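[0160]
In this case, first, in step S71, the processing control unit 7 determines whether the average V component of the pixels near the pixel at the position on the frame of interest clicked by the user with the mouse 9 (the designated position), for example, the average V component of an 8 × 8 (horizontal × vertical) block of pixels containing the designated pixel, is less than 50.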
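[0161]
If it is determined in step S71 that the average V component of the 8 × 8 pixels containing the designated pixel is less than 50, the processing proceeds to step S72, where the processing control unit 7 decides to have the boundary detection unit 31 create V-plane boundary images.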
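[0162]
That is, it has been found empirically that, in regions with small V components, using H-plane or S-plane boundary images makes the region cutout less accurate than using V-plane boundary images; therefore, in step S72, the decided plane is set to the V plane, as described above.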
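[0163]
The processing then proceeds to step S73, where the processing control unit 7 sets the three thresholds TH1 to TH3 used in binarization to obtain the V-plane (decided-plane) boundary images to default values, for example, 40, 100, and 180, respectively, supplies the fact that the decided plane is the V plane and the thresholds TH1 to TH3 to the object extraction unit 3 as decision information, and ends the processing.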
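[0164]
In this case, in the boundary detection unit 31 (Fig. 9) of the object extraction unit 3, edge detection is performed by the edge detection unit 212V, which handles the V-plane (decided-plane) image. The binarization unit 213V connected to the edge detection unit 212V then performs binarization using the three thresholds 40, 100, and 180 as TH1 to TH3, thereby creating three boundary images. The cutout unit 32 of the object extraction unit 3 then performs, for each of the three boundary images created by the boundary detection unit 31, the region cutout described with reference to Figs. 12 and 13, starting from the position on the frame of interest designated by the user.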
【０１６５】
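On the other hand, if it is determined in step S71 that the average V component of the 8 × 8 pixels containing the pixel at the designated position is not less than 50, the processing proceeds to step S74. There, the processing control unit 7 decides that the boundary detection unit 31 will create a boundary image for each of the H, S, and V planes.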
【０１６６】
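That is, for regions with reasonably large V components, the plane whose boundary image allows accurate region cutout depends on the characteristics of the region. Moreover, in this case no history information exists for the previous frame, so it is difficult to predict which plane's boundary image is appropriate for the cutout. Therefore, in step S74, all three planes, H, S, and V, are set as the decided planes.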
【０１６７】
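Then, proceeding to step S75, the processing control unit 7 sets the thresholds TH_H, TH_S, and TH_V each to a default value of, for example, 100. These thresholds are used in the binarization for obtaining the boundary images of the H, S, and V planes, which are the decided planes. The processing control unit 7 supplies the fact that the decided planes are the H, S, and V planes, together with the thresholds TH_H, TH_S, and TH_V, to the object extraction unit 3 as decision information, and ends the processing.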
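The default decision of steps S71 to S75 reduces to a small branch on the local V average. The text only says that the 8 × 8 block contains the designated pixel, so in this sketch the block is assumed to be centred on that pixel as far as the image borders allow. The function names are hypothetical. The limit 50 and the thresholds 40, 100, 180, and 100 are the example values from the text.

```python
from typing import Dict, List, Tuple


def mean_v_around(v_plane: List[List[int]], x: int, y: int, size: int = 8) -> float:
    """Average V over a size x size block containing (x, y), clipped to the image."""
    h, w = len(v_plane), len(v_plane[0])
    # Centre the block on (x, y), then shift it back inside the image if needed.
    x0 = min(max(x - size // 2, 0), max(w - size, 0))
    y0 = min(max(y - size // 2, 0), max(h - size, 0))
    block = [v for row in v_plane[y0:y0 + size] for v in row[x0:x0 + size]]
    return sum(block) / len(block)


def default_decision(v_plane: List[List[int]], point: Tuple[int, int],
                     v_limit: int = 50) -> Dict[str, Tuple[int, ...]]:
    """Map each decided plane to the binarization thresholds to use on it."""
    x, y = point
    if mean_v_around(v_plane, x, y) < v_limit:
        # Steps S72 and S73: dark area, so use the V plane with three default thresholds.
        return {"V": (40, 100, 180)}
    # Steps S74 and S75: no basis for choosing, so try H, S and V with one threshold each.
    return {"H": (100,), "S": (100,), "V": (100,)}
```

Either branch yields exactly three boundary images, so the user always sees three candidates. Only the dark-area branch lets the user skip the ranking of planes.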
【０１６８】
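In this case, in the boundary detection unit 31 of the object extraction unit 3 (FIG. 9), edge detection is performed on the images of the H, S, and V planes by the edge detection units 212H, 212S, and 212V, respectively. The binarization units 213H, 213S, and 213V connected to those edge detection units then perform binarization using the thresholds TH_H, TH_S, and TH_V, which are all 100 here as described above. This creates three boundary images, one each for the H, S, and V planes. Further, for each of the H, S, and V plane boundary images created by the boundary detection unit 31, the cutout unit 32 of the object extraction unit 3 cuts out the region described with reference to FIGS. 12 and 13, starting from the position on the frame of interest designated by the user.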
【０１６９】
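Next, the processing of the processing control unit 7 when the content of the object extraction processing is determined from the designated ranking in step S55 of FIG. 17 will be described with reference to the flowchart of FIG. 18(C).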
【０１７０】
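In this case, first, in step S81, the processing control unit 7 determines which of two kinds of ranking was applied:
- a ranking of object extraction results obtained using the boundary images of the three planes H, S, and V (hereinafter referred to as ranking of planes, as appropriate); or
- a ranking of object extraction results obtained using boundary images of a single plane, created by binarization with three different thresholds (hereinafter referred to as ranking of thresholds, as appropriate).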
【０１７１】
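If it is determined in step S81 that the ranking is a ranking of planes, the processing proceeds to step S82. The processing control unit 7 recognizes the ranks assigned to the object extraction results obtained using the boundary images of the three planes H, S, and V. It then decides that the boundary detection unit 31 will create the boundary image of the plane that produced the first-ranked object extraction result. That is, the processing control unit 7 sets the plane of the boundary image that yielded the first-ranked result as the decided plane.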
【０１７２】
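Then, proceeding to step S83, the processing control unit 7 sets the three thresholds TH1 to TH3 to default values of, for example, 40, 100, and 180, respectively. These thresholds are used in the binarization for obtaining the boundary images of the decided plane. The processing control unit 7 supplies the decided plane (for example, the V plane) and the thresholds TH1 to TH3 to the object extraction unit 3 as decision information, and ends the processing.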
【０１７３】
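In this case, in the boundary detection unit 31 of the object extraction unit 3 (FIG. 9), edge detection is performed by the one of the edge detection units 212H, 212S, and 212V that handles the image of the decided plane, that is, the plane of the boundary image that produced the first-ranked object extraction result. The binarization unit connected to that edge detection unit then binarizes its output using the three thresholds 40, 100, and 180 as TH1 to TH3. This creates three boundary images. Further, for each of the three boundary images created by the boundary detection unit 31, the cutout unit 32 of the object extraction unit 3 cuts out the region described with reference to FIGS. 12 and 13, starting from the position on the frame of interest most recently designated by the user.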
【０１７４】
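On the other hand, if it is determined in step S81 that the ranking is a ranking of thresholds, the processing proceeds to step S84. There, the processing control unit 7 decides that the boundary detection unit 31 will create boundary images of the same plane that was used to obtain the ranked object extraction results of the frame of interest. That is, the processing control unit 7 sets the plane of the boundary images used to obtain the previous object extraction results for the frame of interest as the decided plane.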
【０１７５】
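Then, proceeding to step S85, the processing control unit 7 determines the three thresholds TH1 to TH3, used in the binarization for obtaining the boundary images of the decided plane, from the ranking of thresholds. Among the three thresholds used to obtain the previous object extraction results, the processing control unit 7 sets:
- TH1 to the threshold ranked first;
- TH2 to the average of the thresholds ranked first and second;
- TH3 to the threshold ranked second.

The processing control unit 7 then supplies the decided plane and the thresholds TH1 to TH3 to the object extraction unit 3 as decision information, and ends the processing.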
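The update rule of step S85 fits in a few lines. In this sketch (function name hypothetical), `ranks[i]` is assumed to be the rank, 1 to 3, that the user gave to the result obtained with `prev[i]`. The check that the ranks form a permutation is added for safety and is not part of the described method.

```python
from typing import Sequence, Tuple


def update_thresholds(prev: Sequence[float],
                      ranks: Sequence[int]) -> Tuple[float, float, float]:
    """Step S85: new (TH1, TH2, TH3) from last round's thresholds and the user's ranking.

    prev[i] is the threshold that produced result i; ranks[i] is its rank (1 = best).
    """
    if sorted(ranks) != [1, 2, 3] or len(prev) != 3:
        raise ValueError("expected three thresholds ranked 1, 2 and 3")
    by_rank = {rank: th for th, rank in zip(prev, ranks)}
    first, second = by_rank[1], by_rank[2]
    # TH1 = first-ranked, TH2 = mean of first and second, TH3 = second-ranked.
    return first, (first + second) / 2, second
```

As an illustration, start from the default thresholds 40, 100, and 180. Then apply the ranking pattern described for FIG. 19: first, second, third, followed twice by third, first, second. The numbers are illustrative and not read from the figure. The thresholds become (40, 70.0, 100), then (70.0, 85.0, 100), then (85.0, 92.5, 100). The spread between TH1 and TH3 shrinks from 140 to 60, 30, and 15, which is the narrowing behaviour the text relies on.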
【０１７６】
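In this case, in the boundary detection unit 31 of the object extraction unit 3 (FIG. 9), edge detection is performed by the one of the edge detection units 212H, 212S, and 212V that handles the image of the decided plane, which is the same plane used to obtain the previous object extraction results. The binarization unit connected to that edge detection unit then binarizes its output using the three thresholds TH1 to TH3, determined as described above from the ranking of the previously used thresholds. This creates three boundary images. Further, for each of the three boundary images created by the boundary detection unit 31, the cutout unit 32 of the object extraction unit 3 cuts out the region described with reference to FIGS. 12 and 13, starting from the position on the frame of interest most recently designated by the user.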
【０１７７】
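When the content of the object extraction processing is determined as described above, consider the case where the user designates a position on the frame of interest, no history information exists for the previous frame, and the average V component near the designated position is 50 or more. In that case, three object extraction results are obtained from the boundary images of the three planes H, S, and V (steps S71, S74, and S75 in FIG. 18(B)). When those three results are then ranked by plane, three new object extraction results are obtained from three boundary images. These are created from the image of the single first-ranked plane and the three thresholds TH1 to TH3 (steps S81 to S83 in FIG. 18(C)).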
【０１７８】
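Further, suppose the user designates a position on the frame of interest, no history information exists for the previous frame, and the average V component near the designated position is less than 50. Three object extraction results are then likewise obtained from three boundary images, created from the image of a single plane (the V plane) and the three default thresholds TH1 to TH3 (steps S71 to S73 in FIG. 18(B)).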
【０１７９】
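Also, when the user designates a position on the frame of interest and history information for the previous frame exists, three object extraction results are likewise obtained from three boundary images. These are created from the image of a single plane and three thresholds TH1 to TH3, all determined from that history information (FIG. 18(A)).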
【０１８０】
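As described above, three object extraction results are obtained from three boundary images created from the image of one plane and three thresholds. When those three results are then ranked (ranking of thresholds), the three thresholds TH1 to TH3 are updated on the basis of that ranking, as described above (steps S81, S84, and S85 in FIG. 18(C)).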
【０１８１】
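For example, suppose that, as shown in FIG. 19(A), the thresholds TH1, TH2, and TH3 are ranked first, second, and third, respectively. In the next object extraction processing, as shown in FIG. 19(B), TH1 is set to the previous first-ranked threshold, TH2 to the average of the previous first- and second-ranked thresholds, and TH3 to the previous second-ranked threshold. The three object extraction results obtained with these thresholds are then ranked. If, as shown in FIG. 19(B), TH1, TH2, and TH3 are ranked third, first, and second, respectively, the next object extraction processing again applies the same rule, as shown in FIG. 19(C). TH1 becomes the previous first-ranked threshold, TH2 the average of the previous first- and second-ranked thresholds, and TH3 the previous second-ranked threshold. The results obtained with these thresholds are ranked once more. If, as shown in FIG. 19(C), TH1, TH2, and TH3 are again ranked third, first, and second, respectively, the thresholds for the next object extraction processing are set by the same rule, as shown in FIG. 19(D).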
【０１８２】
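Therefore, as the ranking is repeated, the thresholds TH1, TH2, and TH3 converge to values better suited to extracting the object from the frame of interest, and accurate object extraction becomes possible.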
【０１８３】
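Further, when history information for the previous frame exists, the three thresholds TH1 to TH3 are determined from that history information. They are therefore already reasonably well suited to extracting the object from the frame of interest. The user can thus obtain a good object extraction result for the frame of interest without performing the "rank designation" operation many times, and in the best case without performing it at all.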
【０１８４】
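In the embodiment of FIGS. 18 and 19, suppose there is no history information and the average V component near the designated position is 50 or more. Three object extraction results are first obtained from the boundary images of the three planes H, S, and V. When those results are ranked by plane, three object extraction results are obtained from three boundary images created from the image of the single first-ranked plane and the three thresholds TH1 to TH3. Only after that does ranking of thresholds become possible.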
【０１８５】
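In contrast, even without history information, if the average V component near the designated position is less than 50, the rule of thumb described above applies. Three object extraction results are obtained directly from three boundary images created from the image of a single plane (the V plane) and the three thresholds TH1 to TH3, and ranking of thresholds then becomes possible. In this case, therefore, the user does not need to rank the planes, which reduces the user's operating burden accordingly.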
【０１８６】
Next, the initial extraction processing in step S14 of FIG. 4 will be described.
【０１８７】
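In the initial extraction processing, a new frame of interest is processed without the user performing a "position designation" to indicate where the object is. Multiple object extraction processes (three in this embodiment) are performed using the final object extraction result and the history information obtained for the previous frame. The resulting object extraction results are displayed on the display unit 5.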
【０１８８】
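FIG. 20 shows a display example of the screen of the display unit 5 immediately after the initial extraction processing has been performed for a new frame of interest.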
【０１８９】
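In the embodiment of FIG. 20, the reference screen displays the image (original image) of the new frame of interest. Result screens #1 to #3 display the three object extraction results of the three object extraction processes, which were performed using the final object extraction result and history information obtained for the previous frame.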
【０１９０】
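In the embodiment of FIG. 20, a reset record button (Reset Record) 205 is newly displayed below the undo button 204 on the reference screen. The reset record button 205 is operated to discard the history information of the previous frame. That is, clicking the reset record button 205 with the mouse 9 makes the previous frame's history information stored in the history management unit 4 unavailable. Clicking the reset record button 205 again makes the history information available again.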
【０１９１】
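The object extraction results for the new frame of interest shown on result screens #1 to #3 are obtained by the object extraction unit 3, for example through the following first to third initial extraction processes, respectively.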
【０１９２】
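Suppose, for example, that a history image like the one shown in FIG. 21(A) has been obtained for the previous frame. FIG. 21(A) shows the history image obtained when a person was extracted as an object from a previous frame showing that person's whole body. The history information of FIG. 21(A) also indicates how each part was extracted:
- the head region used the S-plane image and a binarization threshold of 50;
- the torso region used the H-plane image and a binarization threshold of 100;
- the lower-body region used the V-plane image and a binarization threshold of 80.

Accordingly, in the history image of FIG. 21, all pixels of the head region share one ID as their pixel value. The same holds for all pixels of the torso region and for all pixels of the lower-body region.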
【０１９３】
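In the first initial extraction process, the processing control unit 7 first obtains, for example, the centroid of each region in the previous frame's history image that consists of pixels sharing the same ID as their pixel value. In the embodiment of FIG. 21(A), the centroids of the head region, the torso region, and the lower-body region are therefore obtained.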
【０１９４】
Denoting the coordinates of the centroid of a region by (X, Y), the centroid is obtained, for example, according to the following equations.
【０１９５】
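$$X = \frac{1}{N}\sum_{k=1}^{N} x_k, \qquad Y = \frac{1}{N}\sum_{k=1}^{N} y_k$$
Here, N is the number of pixels in the region, the summation runs over k from 1 to N, and (x_k, y_k) are the coordinates of the k-th pixel of the region.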
【０１９６】
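Suppose the centroids (x1, y1), (x2, y2), and (x3, y3) of the head region, the torso region, and the lower-body region have been obtained, for example as shown in FIG. 21(B). The processing control unit 7 then controls the motion detection unit 6 to obtain the motion vector (v_x, v_y) of the frame of interest relative to the centroid (x1, y1) of the previous frame. The processing control unit 7 also has its built-in position correction unit 71 correct the previous-frame centroid (x1, y1) by the motion vector (v_x, v_y). This gives the position (x1', y1') on the frame of interest that corresponds to the centroid (x1, y1). In this case, the position correction unit 71 computes this position using, for example, (x1', y1') = (x1, y1) + (v_x, v_y).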
【０１９７】
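The processing control unit 7 then obtains the ID, that is, the pixel value of the pixel at the previous-frame centroid (x1, y1), by referring to the previous frame's history image stored in the history image storage unit 42. It obtains the parameter set associated with that ID by referring to the parameter table storage unit 43. The processing control unit 7 then decides to create boundary images based on the parameter set obtained in this way. It also decides to cut out, from those three boundary images, a region starting from the position (x1', y1') on the frame of interest corresponding to the previous-frame centroid (x1, y1). It supplies decision information to that effect to the object extraction unit 3.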
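The per-region decision of the first initial extraction process can be sketched as below. Several assumptions are made that the text does not state:
- centroids are snapped to the nearest pixel with Python's `round`;
- a centroid that falls on a pixel without history is skipped;
- the motion detection unit 6 is replaced by a hypothetical callable `motion_at(x, y)` that returns (v_x, v_y);
- the parameter set is modelled as a (plane, threshold) pair.

```python
from typing import Callable, Dict, List, Tuple

Params = Tuple[str, int]  # hypothetical parameter set: (plane, threshold)


def plan_initial_extraction(
    history_image: List[List[int]],
    parameter_table: Dict[int, Params],
    centroids: Dict[int, Tuple[float, float]],
    motion_at: Callable[[int, int], Tuple[int, int]],
) -> List[Tuple[Tuple[int, int], Params]]:
    """For each previous-frame centroid, return (seed on frame of interest, parameter set)."""
    plans = []
    for cx, cy in centroids.values():
        x, y = round(cx), round(cy)  # assumed: snap the centroid to the nearest pixel
        pixel_id = history_image[y][x]  # ID read at the centroid pixel
        if pixel_id == 0:
            continue  # assumed: skip a centroid that lands outside every region
        vx, vy = motion_at(x, y)
        seed = (x + vx, y + vy)  # (x', y') = (x, y) + (v_x, v_y)
        plans.append((seed, parameter_table[pixel_id]))
    return plans
```

Reading the ID at the centroid pixel, as the text describes, works well for compact regions like the head or torso. The centroid of a strongly non-convex region can fall outside that region, which is why the skip is included.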
【０１９８】
The processing control unit 7 performs the same processing for the remaining two of the three centroids shown in FIG. 21(B), namely (x2, y2) and (x3, y3).
【０１９９】
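As a result, as shown in FIG. 21(C), the object extraction unit 3 creates boundary images for the frame of interest based on the parameter sets given in the decision information. It cuts out from those boundary images regions starting from the positions (x1', y1'), (x2', y2'), and (x3', y3') on the frame of interest, which correspond to the previous-frame centroids (x1, y1), (x2, y2), and (x3, y3), respectively. This extracts the object. The object extraction result is stored in the result buffer 33A as shown in FIG. 21(D) and displayed on result screen #1 as shown in FIG. 20.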
【０２００】
Next, in the second initial extraction process, the object extraction unit 3 extracts the object from the frame of interest by, for example, template matching.
【０２０１】
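That is, the object extraction unit 3 reads the previous frame's final object extraction result and the frame of interest from the storage 1 via the frame-of-interest processing unit 2. As shown in FIG. 22(A), it overlays the previous frame's final object extraction result on the frame of interest and computes the sum of absolute differences between the pixel values (for example, luminance) of corresponding pixels. As shown in FIG. 22(B), it repeats this computation while shifting the overlay position, for example one pixel at a time. It then finds the positional relationship at which the sum of absolute differences is smallest. Further, as shown in FIG. 22(C), at that positional relationship it detects the pixels in the frame of interest whose absolute difference from the previous frame's final object extraction result is, for example, 20 or less. It writes their pixel values into the result buffer 33B as the object extraction result for the frame of interest. The object extraction result written into the result buffer 33B in this way is displayed on result screen #2 as shown in FIG. 20.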
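The second initial extraction process is essentially exhaustive template matching with a sum of absolute differences (SAD). The sketch below uses hypothetical names and makes these assumptions:
- the previous frame's final extraction result is modelled as a dictionary from (x, y) to luminance for object pixels only;
- the SAD is taken over those pixels;
- the search is limited to ±`search` pixels;
- shifts that would push any object pixel outside the frame are skipped.

The tolerance of 20 is the example value from the text.

```python
from typing import Dict, List, Tuple

Pixels = Dict[Tuple[int, int], int]  # (x, y) -> luminance of an object pixel


def template_match(obj: Pixels, frame: List[List[int]], search: int = 8,
                   tol: int = 20) -> Tuple[Tuple[int, int], Pixels]:
    """Find the shift minimising the SAD, then keep pixels within tol of the template."""
    h, w = len(frame), len(frame[0])
    best = None  # (sad, (dx, dy))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Assumed: only consider shifts that keep every object pixel inside the frame.
            if not all(0 <= x + dx < w and 0 <= y + dy < h for x, y in obj):
                continue
            sad = sum(abs(frame[y + dy][x + dx] - v) for (x, y), v in obj.items())
            if best is None or sad < best[0]:
                best = (sad, (dx, dy))
    if best is None:
        raise ValueError("no shift within the search range keeps the object in the frame")
    dx, dy = best[1]
    extracted = {
        (x + dx, y + dy): frame[y + dy][x + dx]
        for (x, y), v in obj.items()
        if abs(frame[y + dy][x + dx] - v) <= tol
    }
    return (dx, dy), extracted
```

The second pass with `tol` is what lets the result differ from a pure copy of the previous object. Pixels whose appearance changed by more than the tolerance, for example because of deformation, are left out rather than pasted in.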
【０２０２】
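Next, in the third initial extraction process, the object extraction unit 3 extracts the object from the frame of interest, for example in the same way as described with reference to FIG. 16. This object extraction result is written into the result buffer 33C and displayed on result screen #3 as shown in FIG. 20.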
【０２０３】
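The initial extraction processing described above runs automatically, so to speak, once the frame of interest changes to a new frame, without waiting for any input from the user. It therefore reduces the user's operating burden of "position designation" and "rank designation".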
【０２０４】
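Now suppose that the user performs a "grab all" or "grab partly" operation on one of the three object extraction results obtained by the initial extraction processing. All or part of that result is then reflected in the object buffer 23 and becomes the final object extraction result of the frame of interest. In that case, no history information exists for the parts of the final result that came from the initial extraction processing. When the next frame becomes the frame of interest, the previous frame's history information may therefore be missing, which can increase the user's operating burden.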
【０２０５】
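To address this, the history information of the previous frame can be carried over (inherited) into the history information of the next frame.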
【０２０６】
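For example, suppose, as shown in FIG. 23(A), that the final object extraction result has been obtained for the previous frame and the initial extraction processing has then run. The object extraction results for the frame of interest are displayed on the result screens, and part of one of them is reflected in the object buffer 23 by the user's "grab partly" operation.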
【０２０７】
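In this case, the processing control unit 7 controls the motion detection unit 6 to obtain the motion vector of the object portion reflected in the object buffer 23. As shown in FIG. 23(B), it uses that motion vector to correct the position of the part of the previous frame's history image that corresponds to the region reflected in the object buffer 23. The processing control unit 7 then controls the history management unit 4 to copy this corrected part of the history image as the history image of the frame of interest.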
【０２０８】
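The processing control unit 7 also refers to the designated position storage unit 41 to determine whether any position the user designated on the previous frame by clicking the mouse 9 lies within the part of the previous frame's history image that was copied as the history image of the frame of interest. If such a position exists, it corrects that position (designated position) by the motion vector described with reference to FIG. 23(B), as shown in FIG. 23(C). The processing control unit 7 then controls the history management unit 4 to store the coordinates of the corrected designated position in the designated position storage unit 41 as the coordinates of a designated position of the frame of interest.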
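The inheritance of history information in FIG. 23 amounts to shifting part of the previous history image, and the designated positions inside that part, by one motion vector. This sketch uses hypothetical names and makes three assumptions. The motion vector maps previous-frame positions onto the frame of interest. ID 0 means "no history". "The part that was copied" is taken to mean the set of previous-frame pixels actually copied.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def inherit_history(prev_history: List[List[int]], reflected: Set[Point],
                    motion: Point, prev_positions: List[Point]
                    ) -> Tuple[List[List[int]], List[Point]]:
    """Copy motion-compensated history into the reflected region and carry positions over."""
    h, w = len(prev_history), len(prev_history[0])
    vx, vy = motion
    new_history = [[0] * w for _ in range(h)]  # assumed: 0 means "no history"
    copied_from = set()
    for x, y in reflected:
        px, py = x - vx, y - vy  # previous-frame pixel that moved to (x, y)
        if 0 <= px < w and 0 <= py < h and prev_history[py][px]:
            new_history[y][x] = prev_history[py][px]
            copied_from.add((px, py))
    # Keep only designated positions inside the copied part, shifted by the motion vector.
    new_positions = [(px + vx, py + vy) for px, py in prev_positions
                     if (px, py) in copied_from]
    return new_history, new_positions
```

Only the reflected region receives history. Pixels of the frame of interest outside it stay at 0, so a later extraction on those pixels falls back to the no-history path.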
【０２０９】
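This inheritance of history information may be limited to cases where an object extraction result from the first or third initial extraction process is reflected in the object buffer 23. It need not be performed when a result from the second initial extraction process is reflected in the object buffer 23.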
【０２１０】
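The reason is as follows. In the first and third initial extraction processes, as described above, the object is extracted from the frame of interest on the basis of the previous frame's history information. If such a result is reflected in the object buffer 23, then even if object extraction were performed on the frame of interest without the previous frame's history information, a result obtained by the same processing as in the previous frame would very likely become the final result. History information similar to that of the previous frame would therefore very likely be created anyway.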
【０２１１】
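In contrast, in the second initial extraction process, as described above, the object is extracted from the frame of interest by template matching. Even if that result is reflected in the object buffer 23, the reflected portion was extracted independently of the previous frame's history information. It therefore cannot be said that, without the previous frame's history information, a result obtained by the same processing as in the previous frame would likely become the final result. Nor can it be said that history information similar to that of the previous frame would likely be created.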
【０２１２】
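For this reason, as described above, history information can be inherited only when a result of the first or third initial extraction process is reflected in the object buffer 23, and not when a result of the second initial extraction process is reflected.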
【０２１３】
However, history information may also be inherited whenever a result of any of the first to third initial extraction processes is reflected in the object buffer 23.
【０２１４】
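As described above, a plurality of object extraction results are obtained by a plurality of processes, the user judges which of them is good, and that result is reflected in the final object extraction result. Accurate object extraction can therefore be performed with an easy operation.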
【０２１５】
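Further, when history information of the previous frame exists, object extraction for the frame of interest is based on that history information and the position on the frame of interest input by the user. Here too, accurate object extraction can be performed with an easy operation.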
【０２１６】
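That is, in the image processing apparatus of FIG. 2, the user's interaction proceeds as follows:
1. When the user designates a position on the object in the frame of interest, regions of the frame of interest are cut out (the object is extracted) by three processes starting from that position, and the three resulting object extraction results are displayed.
2. If necessary, the user designates a ranking of the three results. Regions of the frame of interest are then cut out again by three processes based on that ranking, and the three new results are displayed.
3. When the user designates the result he or she considers appropriate, that result is reflected in the final object extraction result.

The user repeats these low-burden operations (designating a position on the object, designating a ranking where necessary, and designating an appropriate result) as many times as needed. In this way the user can accurately extract the object from the frame of interest.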
【０２１７】
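Further, when the next frame becomes the frame of interest, the apparatus refers to the history information created for the immediately preceding frame (the previous frame). It recognizes the parameter set that was used to extract, as part of the object, the previous-frame pixel corresponding to the pixel at the position the user input on the frame of interest. The contents of the three object extraction processes are determined from that parameter set. The user can therefore obtain a good object extraction result from the frame of interest merely by designating a position on the object in that frame.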
【０２１８】
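In this embodiment, the user designates a position on the object in the frame of interest. Alternatively, the user may, for example, designate a partial range of the object in the frame of interest.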
【０２１９】
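Also, in this embodiment, object extraction for the frame of interest uses only the history information of the previous frame. Alternatively, for example, the history information of the past several frames may be weighted, and object extraction may be based on the weighted history information. Further, in this embodiment frames are processed in temporal order, so the history information of the previous frame is used for object extraction from the frame of interest. If, for example, frames are instead processed in reverse temporal order, the history information of the temporally following frame can be used.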
【０２２０】
Further, in this embodiment objects are extracted from each frame of a moving image, but the present invention is also applicable to object extraction from still images.
【０２２１】
The present invention is also applicable not only to extracting the so-called foreground but also, for example, to extracting some component of the background.
【０２２２】
Further, the object extraction processing described in this embodiment is merely an example, and the choice of object extraction processing is not particularly limited.
【０２２３】
The present invention is also widely applicable to image processing apparatuses such as broadcasting systems and editing systems.
[Effect of the Invention]
According to one aspect of the present invention, accurate object extraction can be performed with an easy operation.
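[Brief Description of the Drawings]
[FIG. 1] A block diagram showing a hardware configuration example of an embodiment of an image processing apparatus to which the present invention is applied.
[FIG. 2] A block diagram showing a functional configuration example of the image processing apparatus of FIG. 1.
[FIG. 3] A diagram showing a display example of a screen on the display unit 5.
[FIG. 4] A flowchart explaining the processing of the image processing apparatus of FIG. 2.
[FIG. 5] A diagram for explaining switching of the display on the reference screen.
[FIG. 6] A diagram for explaining "undo" and "delete partly".
[FIG. 7] A diagram showing the user designating a point on an object.
[FIG. 8] A diagram for explaining "grab all" and "grab partly".
[FIG. 9] A block diagram showing a configuration example of the boundary detection unit 31.
[FIG. 10] A diagram for explaining thinning processing.
[FIG. 11] A diagram showing boundary images.
[FIG. 12] A flowchart for explaining the processing of the cutout unit 32.
[FIG. 13] A diagram for explaining the processing of the cutout unit 32.
[FIG. 14] A diagram for explaining updating of history information.
[FIG. 15] A diagram for explaining object extraction based on history information.
[FIG. 16] A diagram for explaining object extraction based on history information.
[FIG. 17] A flowchart for explaining the processing of the processing control unit 7.
[FIG. 18] Flowcharts explaining the processing of steps S53 to S55 of FIG. 17 in more detail.
[FIG. 19] A diagram for explaining updating of thresholds.
[FIG. 20] A diagram showing a display example of a screen on the display unit 5.
[FIG. 21] A diagram for explaining the first initial extraction process.
[FIG. 22] A diagram for explaining the second initial extraction process.
[FIG. 23] A diagram for explaining inheritance of history information.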
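[Description of Reference Numerals]
1 storage, 2 frame-of-interest processing unit, 3 object extraction unit, 4 history management unit, 5 display unit, 6 motion detection unit, 7 processing control unit, 8 event detection unit, 9 mouse, 21 frame-of-interest buffer, 22 background buffer, 23 object buffer, 24 selector, 31 boundary detection unit, 32 cutout unit, 32A to 32C output buffers, 33 result processing unit, 33A to 33C result buffers, 41 designated position storage unit, 42 history image storage unit, 43 parameter table storage unit, 61 previous frame buffer, 71 position correction unit, 101 bus, 102 CPU, 103 ROM, 104 RAM, 105 hard disk, 106 output unit, 107 input unit, 108 communication unit, 109 drive, 110 input/output interface, 111 removable recording medium, 201 change display button, 202 use record button, 203 delete partly button, 204 undo button, 205 reset record button, 206 rank result button, 207 grab all button, 208 grab partly button, 211 HSV separation unit, 212H, 212S, 212V edge detection units, 213H, 213S, 213V binarization units, 214H, 214S, 214V thinning units, 215H, 215S, 215V boundary image storage units
[0001]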
BACKGROUND OF THE INVENTION
The present invention relates to an image processing apparatus, an image processing method, and a recording medium. More particularly, it relates to an image processing apparatus, an image processing method, and a recording medium that enable accurate object extraction with, for example, an easy operation.
[0002]
[Prior art]
Various methods have been proposed for extracting an object from an image, that is, the portion showing a physical body that forms the so-called foreground or the like.
[0003]
For example, Japanese Patent Application Laid-Open No. 10-269369 describes the following method. An object is extracted from a certain frame by detecting its contour. For the next frame, the area around the object in the previous frame is searched to detect the object's contour, and the object is extracted from the detection result. This procedure is repeated frame by frame.
[0004]
[Problems to be solved by the invention]
However, if contour detection in the frame of interest is limited to the area around the object in the previous frame, the contour is likely to be detected incorrectly when the object deforms or moves significantly in the frame of interest. This makes accurate object extraction difficult.
[0005]
On the other hand, for example, if the user designates the contour of the object for each frame and the object is extracted based on the designated contour, the operation burden on the user becomes large.
[0006]
The present invention has been made in view of such circumstances, and makes it possible to perform accurate object extraction with an easy operation.
[0007]
[Means for Solving the Problems]
An image processing apparatus or a recording medium according to one aspect of the present invention is either an image processing apparatus that extracts a predetermined object from an image, or a recording medium on which is recorded a program for causing a computer to function as such an image processing apparatus. The apparatus includes:
- object extraction means for extracting an object from the image of a screen of interest by a plurality of processes;
- selection means for selecting, on the basis of input from a user, which of the object extraction results of the plurality of processes is to be reflected in the final object extraction result;
- reflection means for reflecting the object extraction result selected by the selection means in the final object extraction result;
- determination means for determining the content of the processing for extracting the object from the screen of interest, on the basis of input from the user and of a processing history, that is, the content of the processing used to extract the object reflected in the final object extraction result;
- processing history storage means for storing the processing history.

The plurality of processes use different thresholds, and the history information includes, for each pixel of the object, the threshold used to extract it. The determination means operates as follows. It corrects a predetermined point on the screen of interest input by the user by the motion vector between the screen of interest and a previous screen processed temporally before it, which gives the corresponding point on the previous screen. It acquires, from the processing history for the previous screen, the predetermined threshold that was used when the pixel at that point was extracted as an object. It then adopts the predetermined threshold, together with values obtained by a predetermined calculation using it, as the thresholds for the plurality of processes that extract the object containing the predetermined point on the screen of interest.
[0008]
An image processing method according to one aspect of the present invention is an image processing method for extracting a predetermined object from an image. It includes:
- an object extraction step of extracting an object from the image of a screen of interest by a plurality of processes;
- a selection step of selecting, on the basis of input from a user, which of the object extraction results of the plurality of processes is to be reflected in the final object extraction result;
- a reflection step of reflecting the object extraction result selected in the selection step in the final object extraction result;
- a determination step of determining the content of the processing for extracting the object from the screen of interest, on the basis of input from the user and of a processing history, that is, the content of the processing used to extract the object reflected in the final object extraction result;
- a processing history storage step of storing the processing history.

The plurality of processes use different thresholds, and the history information includes, for each pixel of the object, the threshold used to extract it. In the determination step, a predetermined point on the screen of interest input by the user is corrected by the motion vector between the screen of interest and a previous screen processed temporally before it, which gives the corresponding point on the previous screen. The predetermined threshold that was used when the pixel at that point was extracted as an object is acquired from the processing history for the previous screen. The predetermined threshold, together with values obtained by a predetermined calculation using it, is then adopted as the thresholds for the plurality of processes that extract the object containing the predetermined point on the screen of interest.
[0010]
In one aspect of the present invention, an object is extracted from the image of a screen of interest by a plurality of processes. From the results of those processes, the one to be reflected in the final object extraction result is selected on the basis of input from a user, and the selected result is reflected in the final object extraction result. Further, the content of the processing for extracting the object from the screen of interest is determined from the user's input and from the processing history, that is, the content of the processing used to extract the object reflected in the final object extraction result. The processing history is also stored. The plurality of processes use different thresholds, and the history information includes, for each pixel of the object, the threshold used to extract it. In this case, a predetermined point on the screen of interest input by the user is corrected by the motion vector between the screen of interest and the previous screen processed temporally before it, which gives the corresponding point on the previous screen. The predetermined threshold used when the pixel at that point was extracted as an object is then acquired from the processing history for the previous screen. Finally, the predetermined threshold and values obtained by a predetermined calculation using it are adopted as the thresholds for the plurality of processes that extract the object containing the predetermined point on the screen of interest.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 shows a hardware configuration example of an embodiment of an image processing apparatus to which the present invention is applied.
[0012]
This image processing apparatus is based on a computer. A program for executing the series of object extraction processes described later (hereinafter referred to as the object extraction processing program, as appropriate) is installed on the computer.
[0013]
In addition, the image processing apparatus can be configured by causing a computer to execute a program as described above, or can be configured by dedicated hardware.
[0014]
Here, the object extraction processing program is recorded in advance in a hard disk 105 or a ROM 103 as a recording medium built in the computer.
[0015]
Alternatively, the object extraction processing program can be stored (recorded), temporarily or permanently, on a removable recording medium 111. Examples include a floppy disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory. Such a removable recording medium 111 can be provided as so-called package software.
[0016]
Besides being installed on the computer from the removable recording medium 111 as described above, the object extraction processing program can be transferred to the computer in other ways. It can be sent wirelessly from a download site via an artificial satellite for digital satellite broadcasting, or by wire via a network such as a LAN (Local Area Network) or the Internet. The computer can receive the program transferred in this way with the communication unit 108 and install it on the built-in hard disk 105.
[0017]
The computer includes a CPU (Central Processing Unit) 102. An input/output interface 110 is connected to the CPU 102 via a bus 101. When the user enters a command by operating an input unit 107, which includes a keyboard, a mouse, a microphone, and the like, the command reaches the CPU 102 via the input/output interface 110. The CPU 102 then executes the object extraction processing program stored in a ROM (Read Only Memory) 103. Alternatively, the CPU 102 loads the program into a RAM (Random Access Memory) 104 and executes it from one of the following sources:
- the program stored on the hard disk 105;
- the program transferred from a satellite or a network, received by the communication unit 108, and installed on the hard disk 105;
- the program read from the removable recording medium 111 loaded in the drive 109 and installed on the hard disk 105.

The CPU 102 thereby performs the processing according to the flowcharts described later, or the processing performed by the configurations of the block diagrams described later. As necessary, the CPU 102 then outputs the processing result via the input/output interface 110 from an output unit 106, which includes an LCD (Liquid Crystal Display), a speaker, and the like. It may also transmit the result from the communication unit 108 or record it on the hard disk 105.
[0018]
In this specification, the processing steps of the program that causes the computer to perform various processes need not be executed in time series in the order shown in the flowcharts. They also include processes executed in parallel or individually (for example, parallel processing or object-based processing).
[0019]
Further, the program may be processed by one computer or may be distributedly processed by a plurality of computers. Furthermore, the program may be transferred to a remote computer and executed.
[0020]
FIG. 2 shows a functional configuration example of the image processing apparatus of FIG. 1. This functional configuration is realized by the CPU 102 in FIG. 1 executing the object extraction processing program.
[0021]
The storage 1 stores image data of a moving image from which an object is to be extracted. The storage 1 also stores history information of each frame, which will be described later, supplied from the processing control unit 7.
[0022]
The frame-of-interest processing unit 2 takes a predetermined frame of the image data stored in the storage 1 as the frame of interest and reads its image data. Under the control of the processing control unit 7, it performs processing related to the frame of interest.
[0023]
That is, the frame-of-interest processing unit 2 includes a frame-of-interest buffer 21, a background buffer 22, an object buffer 23, a selector 24, and the like. The frame-of-interest buffer 21 stores the image data of the frame of interest read from the storage 1. The background buffer 22 stores, as a background image, the part of the frame-of-interest image in the frame-of-interest buffer 21 that is not stored in the object buffer 23 described later. The object buffer 23 stores the object of the frame of interest extracted by the object extraction unit 3 described later. The selector 24 selects the frame of interest stored in the frame-of-interest buffer 21, the background image stored in the background buffer 22, or the object stored in the object buffer 23, and supplies it to the display unit 5.
[0024]
Under the control of the processing control unit 7, the object extraction unit 3 extracts an object from the frame of interest stored in the frame-of-interest buffer 21 by a plurality of processes.
[0025]
That is, the object extraction unit 3 includes a boundary detection unit 31, a cutout unit 32, a result processing unit 33, and the like.

The boundary detection unit 31 detects the boundary portions of the frame-of-interest image stored in the frame-of-interest buffer 21. It creates a plurality of types (for example, three types) of binary boundary images, each separating boundary portions from the portions that are not boundaries (hereinafter referred to as non-boundary portions, as appropriate).

The cutout unit 32 refers to the three boundary images created by the boundary detection unit 31 and cuts out the regions that make up the object from the frame of interest stored in the frame-of-interest buffer 21. The cutout unit 32 has three output buffers 32A to 32C and stores the regions cut out with reference to the three types of boundary images in the output buffers 32A to 32C, respectively.

The result processing unit 33 has three result buffers 33A to 33C, corresponding to the three output buffers 32A to 32C of the cutout unit 32. It combines the object extraction result stored in the object buffer 23 with the contents of each of the output buffers 32A to 32C and stores the three combined results in the result buffers 33A to 33C, respectively. On the basis of input given by the user operating the mouse 9, the result processing unit 33 then selects one of the contents of the result buffers 33A to 33C and reflects it in the contents of the object buffer 23.
[0026]
The history management unit 4 manages history information under the control of the processing control unit 7.
[0027]
That is, the history management unit 4 includes a designated position storage unit 41, a history image storage unit 42, a parameter table storage unit 43, and the like.
- The designated position storage unit 41 stores a history of the coordinates of positions on the frame of interest input by the user operating the mouse 9.
- The history image storage unit 42 stores a history image representing the history of the processing performed in the object extraction unit 3. For each pixel of the object, it stores the ID corresponding to the processing that was used to extract that pixel as part of the object.
- The parameter table storage unit 43 stores parameters representing the content of the processing in the object extraction unit 3, each associated with a unique ID. These IDs serve as the pixel values of the history image.

Therefore, the processing that was used to extract a given pixel of the object can be identified by looking up, in the parameter table storage unit 43, the parameter associated with the ID stored as that pixel's value in the history image.
[0028]
As described above, the history image storage unit 42 stores an image whose pixel values are IDs. Each ID is associated with parameters representing the processing used when that pixel was extracted as an object. Because this image records the history of the processing used for object extraction, it is called a history image. Hereinafter, the contents of the designated position storage unit 41, the history image storage unit 42, and the parameter table storage unit 43 are collectively referred to as history information, as appropriate.
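One way to picture the history image and parameter table is the small store below. It is a sketch under assumptions, not the apparatus's data layout. A parameter set is modelled as a (plane, threshold) pair, IDs are allocated from 1 with 0 meaning "no history", and an already registered parameter set reuses its ID. As a result, every pixel extracted with the same processing carries the same ID, as in the history image described for FIG. 21(A). The class and method names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Params = Tuple[str, int]  # hypothetical parameter set: (plane, threshold)


class HistoryStore:
    """History image (pixel -> ID) plus parameter table (ID -> parameter set)."""

    def __init__(self, width: int, height: int) -> None:
        self.history_image: List[List[int]] = [[0] * width for _ in range(height)]
        self.parameter_table: Dict[int, Params] = {}
        self._ids: Dict[Params, int] = {}  # reverse index so equal parameters share an ID

    def record(self, pixels: Iterable[Tuple[int, int]], params: Params) -> int:
        """Mark pixels (x, y) as extracted with params and return the ID used."""
        pid = self._ids.get(params)
        if pid is None:
            pid = len(self.parameter_table) + 1  # IDs start at 1; 0 means "no history"
            self._ids[params] = pid
            self.parameter_table[pid] = params
        for x, y in pixels:
            self.history_image[y][x] = pid
        return pid

    def params_at(self, x: int, y: int) -> Optional[Params]:
        """Parameter set used for pixel (x, y), or None if it has no history."""
        return self.parameter_table.get(self.history_image[y][x])
```

Storing a small integer per pixel instead of the full parameter set keeps the history image compact. The indirection through the table is also what allows the parameters for a pixel to be looked up from its ID alone.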
[0029]
The designated position storage unit 41, the history image storage unit 42, and the parameter table storage unit 43 each have at least two banks. By switching banks, they can hold history information both for the frame of interest and for the frame immediately before it (the previous frame).
[0030]
The display unit 5 displays the image output from the selector 24 (the image of the frame of interest, the background image, or the object) and the images stored in the result buffers 33A to 33C of the result processing unit 33.
[0031]
Under the control of the processing control unit 7, the motion detection unit 6 detects a motion vector of the image of the frame of interest based on the image of the previous frame and supplies the motion vector to the processing control unit 7.
[0032]
That is, the motion detection unit 6 includes a previous frame buffer 61. It reads the image data of the previous frame from the storage 1 and stores it in the previous frame buffer 61. The motion detection unit 6 then detects the motion vector by block matching between the previous-frame image data in the previous frame buffer 61 and the frame-of-interest image data in the frame-of-interest buffer 21 of the frame-of-interest processing unit 2. It supplies the motion vector to the processing control unit 7.
[0033]
The processing control unit 7 controls the frame-of-interest processing unit 2, the object extraction unit 3, the history management unit 4, and the motion detection unit 6 on the basis of event information supplied from the event detection unit 8. It also determines the content of the processing in the object extraction unit 3 from that event information and from the history information managed by the history management unit 4, and has the object extraction unit 3 extract the object accordingly. Further, the processing control unit 7 has a built-in position correction unit 71. This unit corrects two kinds of position information according to the motion vector from the motion detection unit 6: positions on the frame of interest supplied as event information from the event detection unit 8, and positions stored in the designated position storage unit 41 of the history management unit 4. As described later, the corrected position information is either supplied to the object extraction unit 3 for use in object extraction, or supplied to the history management unit 4 and stored in the designated position storage unit 41.
[0034]
The event detection unit 8 detects an event that occurs when the user operates the mouse 9 and supplies event information representing the content of the event to the processing control unit 7.
[0035]
The mouse 9 is operated by the user when specifying a position on the image displayed on the display unit 5 or when giving a predetermined command to the apparatus.
[0036]
Next, FIG. 3 shows a display example of a screen on the display unit 5.
[0037]
When the object extraction processing program is executed, a window divided into two both horizontally and vertically, as shown in FIG. 3, is displayed.
[0038]
Of the four divided windows, the upper left one is the reference screen, and the upper right, lower left, and lower right ones are result screens #1, #2, and #3, respectively.
[0039]
The image output from the selector 24 is displayed on the reference screen. As described above, the selector 24 selects one of the frame of interest stored in the attention frame buffer 21, the background image stored in the background buffer 22, and the object stored in the object buffer 23, and supplies it to the display unit 5; accordingly, the reference screen displays the image of the frame of interest (the original image), the object, or the background. The user can switch which of these images is displayed on the display unit 5 by operating the mouse 9. In the example of FIG. 3, the reference screen displays the background image stored in the background buffer 22, that is, the original image with the portion captured as an object in the object buffer 23 removed. The hatched portion on the reference screen represents the portion currently captured in the object buffer 23 (the same applies hereinafter).
[0040]
In the lower left of the reference screen, a change display button 201, a use record button 202, a delete partly button 203, and an undo button 204 are provided.
[0041]
The change display button 201 is operated to switch the image displayed on the reference screen. That is, each time the change display button 201 is clicked with the mouse 9, the selector 24 cyclically selects among the outputs of the attention frame buffer 21, the background buffer 22, and the object buffer 23, so that the image displayed on the reference screen cycles in the order of the original image, the object, and the background.
[0042]
The use record button 202 is operated to set whether the history information stored in the history management unit 4 is used to extract an object from the frame of interest stored in the attention frame buffer 21. That is, when the use record button 202 is clicked with the mouse 9, a pull-down menu for setting whether use of the history information is permitted is displayed on the reference screen. In the present embodiment, use of the history information is basically assumed to be permitted.
[0043]
The delete partly button 203 is operated to delete part of the image stored as an object in the object buffer 23 (that is, to return that part of the object to the background). When the user operates the mouse 9 to designate a range covering a portion of the object displayed on the reference screen and then clicks the delete partly button 203, the designated portion of the object is deleted from the object buffer 23. The delete partly button 203 is therefore used, for example, when part of the background has been captured into the object buffer 23 as part of the object, to remove that background portion from the object.
[0044]
The undo button 204 is operated to delete, from the image captured as an object into the object buffer 23 from the result buffers 33A to 33C of the result processing unit 33, the portion that was captured most recently. When the undo button 204 is operated, the image stored in the object buffer 23 therefore returns to the state immediately before the last capture from the result buffers 33A to 33C. To make this possible, the object buffer 23 has a plurality of banks and holds at least the state immediately before each capture from the result buffers 33A to 33C. When the undo button 204 is operated, the object buffer 23 switches the image output to the selector 24 by switching to the bank that was selected immediately before.
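The bank mechanism behind the undo button 204 amounts to a small stack of object-buffer states. The sketch below assumes the object is represented as an immutable set of (y, x) pixel positions and caps the number of banks with a hypothetical `max_banks`; the text only requires that at least the state before the most recent capture be kept. `BankedObjectBuffer` is an illustrative name.

```python
class BankedObjectBuffer:
    """Hypothetical model of the object buffer 23: each capture is stored in
    a fresh bank so that the most recent capture can be undone."""

    def __init__(self, initial=frozenset(), max_banks=8):
        if max_banks < 2:
            raise ValueError("at least two banks are needed to support undo")
        self._banks = [initial]
        self._max_banks = max_banks

    @property
    def current(self):
        """State currently output to the selector."""
        return self._banks[-1]

    def capture(self, new_state):
        """Store the state after a grab all / grab partly in a new bank."""
        self._banks.append(new_state)
        if len(self._banks) > self._max_banks:
            self._banks.pop(0)  # discard the oldest bank

    def undo(self):
        """Switch back to the bank selected immediately before, if one exists."""
        if len(self._banks) > 1:
            self._banks.pop()
        return self.current
```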
[0045]
Result screens #1 to #3 display the contents of the result buffers 33A to 33C, respectively, each of which stores an object extracted from the frame of interest by a different process; that is, they display the results of object extraction performed by three different processes. A rank result button 206, a grab all button 207, and a grab partly button 208 are provided at the lower left of each of result screens #1 to #3.
[0046]
The rank result button 206 is operated to rank the object extraction results displayed on result screens #1 to #3. That is, when the user operates the mouse 9 to click the rank result buttons 206 of result screens #1 to #3 in order, starting from the result the user considers best, the object extraction results displayed on result screens #1 to #3 are ranked in that click order. The object extraction unit 3 then extracts the object again based on the ranking, and the new object extraction results are displayed on result screens #1 to #3.
[0047]
The grab all button 207 is operated to reflect (capture) one of the object extraction results displayed on result screens #1 to #3 into the object buffer 23. That is, when the user operates the mouse 9 to click the grab all button 207 of the result screen displaying the object extraction result the user considers preferable, the entire content of the result buffer storing that object extraction result is selected and reflected in the object buffer 23.
[0048]
The grab partly button 208 is operated to reflect (capture) part of one of the object extraction results displayed on result screens #1 to #3 into the object buffer 23. That is, when the user operates the mouse 9 to designate a range covering part of the object extraction result the user considers preferable on one of result screens #1 to #3 and then clicks the grab partly button 208 of that screen, the designated part of the object extraction result is selected and reflected in the object buffer 23.
[0049]
Next, an overview of the processing of the image processing apparatus of FIG. 2 will be described with reference to the flowchart of FIG. 4.
[0050]
When some event occurs as a result of the user operating the mouse 9, the event detection unit 8 determines the content of the event in step S1.
[0051]
When it is determined in step S1 that the event is a “screen selection” instruction for switching the image displayed on the reference screen of the display unit 5, that is, when the change display button 201 (FIG. 3) has been clicked, the event detection unit 8 supplies event information indicating “screen selection” to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S2, controls the selector 24 of the attention frame processing unit 2, and ends the process.
[0052]
The selector 24 thereby switches its output among the attention frame buffer 21, the background buffer 22, and the object buffer 23. As a result, as shown in FIG., the image displayed on the reference screen is switched among the original image of the frame of interest stored in the attention frame buffer 21, the background image stored in the background buffer 22, and the object stored in the object buffer 23.
[0053]
When it is determined in step S1 that the event is an “undo” instruction to delete the image most recently reflected in the object buffer 23, that is, when the undo button 204 has been clicked, the event detection unit 8 supplies event information indicating “undo” to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S3, controls the object buffer 23 of the attention frame processing unit 2 to delete the object portion most recently reflected in the object buffer 23, and proceeds to step S4.
[0054]
In step S4, the processing control unit 7 controls the history management unit 4 to delete the history information on the image deleted from the object buffer 23 in step S3, and ends the process.
[0055]
This is because, as will be described later, whenever an object image is reflected (captured) in the object buffer 23, the history information managed by the history management unit 4 is updated for the reflected image. Accordingly, when an image is deleted from the object buffer 23, the history information on the deleted image is also deleted.
[0056]
On the other hand, when it is determined in step S1 that the event is a “partial deletion” instruction for deleting part of the image reflected in the object buffer 23, that is, when a predetermined range has been designated and the delete partly button 203 has then been clicked, the event detection unit 8 supplies event information indicating “partial deletion” to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S5, controls the object buffer 23 of the attention frame processing unit 2 to delete the range-designated part of the image stored as the object in the object buffer 23, and proceeds to step S4.
[0057]
In step S4, the processing control unit 7 controls the history management unit 4 to delete the history information on the image deleted from the object buffer 23 in step S5, and ends the process.
[0058]
Consider, for example, the case shown in FIG. 6A, where an object obj1 representing a person's torso is stored in the object buffer 23, and an object obj2 representing the person's head is stored in the background buffer 22 together with the landscape. When the object extraction unit 3 extracts the object obj2 representing the head and it is reflected in the object buffer 23, the stored contents of the object buffer 23 become the objects obj1 and obj2, and the stored contents of the background buffer 22 become only the background, such as the landscape, as shown in FIG. 6B.
[0059]
In this case, when the user clicks the undo button 204 with the mouse 9, the stored contents of the object buffer 23 return, as shown in FIG. 6C, to the state before the object obj2 representing the head was reflected, that is, the state in which only the object obj1 representing the torso is stored; the stored contents of the background buffer 22 likewise return to the state in which the object obj2 representing the head is stored together with the landscape background. In other words, the stored contents of the background buffer 22 and the object buffer 23 return to the state shown in FIG. 6A.
[0060]
Alternatively, when the user operates the mouse 9 to designate a range covering part of the object obj2 representing the head, for example, as shown in FIG. 6B, and clicks the delete partly button 203, the stored contents of the object buffer 23 become, as shown in FIG. 6D, the state in which the range-designated portion of the object obj2 has been deleted, and that range-designated portion is added to the background, such as the landscape, stored in the background buffer 22.
[0061]
On the other hand, when it is determined in step S1 that the event is a “position designation” designating a position on the image displayed on the reference screen or on one of result screens #1 to #3, that is, for example, as illustrated in FIG. 7, when the user operates the mouse 9 to click a position where an object is displayed in the original image or background image shown on the reference screen, the event detection unit 8 supplies event information indicating “position designation” to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S6, determines the contents of the three object extraction processes to be performed by the object extraction unit 3 based on the position clicked with the mouse 9 and other information, and controls the object extraction unit 3 so that the object is extracted by those three processes.
[0062]
Accordingly, the object extraction unit 3 performs three object extraction processes in step S7, and stores the three object extraction results obtained as a result in the result buffers 33A to 33C of the result processing unit 33, respectively.
[0063]
In step S8, the display unit 5 displays the object extraction results stored in the result buffers 33A to 33C on result screens #1 to #3, respectively, and the process ends.
[0064]
When it is determined in step S1 that the event is an “order designation” designating the order of preference of the object extraction results displayed on result screens #1 to #3, that is, when the rank result buttons 206 displayed on result screens #1 to #3 have been clicked in a certain order, the event detection unit 8 supplies event information indicating “order designation” to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S6, determines the contents of the three object extraction processes to be performed by the object extraction unit 3 based on the designated order, and controls the object extraction unit 3 so that the object is extracted by those three processes. The process then proceeds to steps S7 and S8 in turn, and the same processing as described above is performed.
[0065]
On the other hand, when it is determined in step S1 that the event is an “all acquisition” or a “partial acquisition”, which selects one of the object extraction results displayed on result screens #1 to #3 and reflects all or part of it in the object buffer 23, that is, when the grab all button 207 of one of result screens #1 to #3 has been clicked, or when part of one of the object extraction results displayed on result screens #1 to #3 has been range-designated and the grab partly button 208 has been clicked, the event detection unit 8 supplies event information indicating “all acquisition” or “partial acquisition” to the processing control unit 7. On receiving this event information, the processing control unit 7 proceeds to step S9.
[0066]
In step S9, the processing control unit 7 controls the result processing unit 33 of the object extraction unit 3 so that the object extraction result stored in the result buffer corresponding to the result screen (among #1 to #3) whose grab all button 207 was clicked is selected and reflected (stored) in the object buffer 23. Alternatively, in step S9, the processing control unit 7 controls the result processing unit 33 of the object extraction unit 3 so that the range-designated portion of the object extraction result stored in the result buffer corresponding to the result screen whose grab partly button 208 was clicked is selected and reflected in the object buffer 23.
[0067]
Therefore, for example, when the stored contents of the object buffer 23 are as shown in FIG. 8A and the object extraction result stored in the result buffer corresponding to a certain result screen #i is as shown in FIG. 8B, operating the grab all button 207 displayed on result screen #i updates the stored contents of the object buffer 23, as shown in FIG. 8C, by overwriting them with the object extraction result stored in the result buffer shown in FIG. 8B.
[0068]
Similarly, when part of the object extraction result displayed on result screen #i is range-designated, as indicated by the rectangle in FIG. 8B, and the grab partly button 208 displayed on result screen #i is operated, the stored contents of the object buffer 23 are updated, as shown in FIG. 8D, to the object of FIG. 8A with the range-designated portion of the object extraction result of FIG. 8B added (combined).
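Grab all, grab partly, and delete partly can be modeled as operations on a mapping from pixel position to pixel value, with the background being every pixel of the frame that is not in the mapping. This sketch assumes that grab all writes every pixel of the result over the object while keeping previously captured pixels (the text says only that the object buffer is "updated (overwritten)"), and that a designated range is an axis-aligned half-open rectangle (top, left, bottom, right). The function names are hypothetical.

```python
def _inside(p, rect):
    """True if pixel p = (y, x) lies in the half-open rectangle
    (top, left, bottom, right)."""
    top, left, bottom, right = rect
    return top <= p[0] < bottom and left <= p[1] < right


def grab_all(obj, result):
    """Write every pixel of `result` into the object, overwriting existing values."""
    merged = dict(obj)
    merged.update(result)
    return merged


def grab_partly(obj, result, rect):
    """Write only the pixels of `result` that lie inside `rect` into the object."""
    merged = dict(obj)
    merged.update({p: v for p, v in result.items() if _inside(p, rect)})
    return merged


def delete_partly(obj, rect):
    """Remove the pixels inside `rect` from the object (they return to the background)."""
    return {p: v for p, v in obj.items() if not _inside(p, rect)}
```

All three functions return new dictionaries, which fits naturally with keeping the previous state in a separate bank for undo.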
[0069]
In step S10, the process control unit 7 controls the history management unit 4 to update the history information regarding the image reflected in the object buffer 23 in step S9, and ends the process.
[0070]
As described above, the object extraction results of the three object extraction processes are displayed on result screens #1 to #3, respectively, and when the user clicks the grab all button 207 or the grab partly button 208 on one of these screens, the object extraction result displayed on that screen is reflected in the object buffer 23. The user therefore only has to look at the object extraction results obtained by the different processes on result screens #1 to #3 and select a good one; of those results, the one the user judged to be good is reflected in the object buffer 23. As a result, accurate object extraction can be performed with an easy operation.
[0071]
Moreover, since result screens #1 to #3 display objects extracted by different processes, an object extraction result that is not very good as a whole may still be good in some part. In that case, the user can designate that part as a range and click the grab partly button 208 to reflect only the well-extracted part of the object in the object buffer 23, so that a good object extraction result is accumulated in the object buffer 23.
[0072]
On the other hand, when it is determined in step S1 that the event is a “confirm” instruction, which confirms the image stored in the object buffer 23 as the final object extraction result for the frame of interest, the event detection unit 8 supplies event information indicating “confirm” to the processing control unit 7.
[0073]
On receiving the event information indicating “confirm”, the processing control unit 7 proceeds to step S11, reads the object of the frame of interest stored in the object buffer 23 from the attention frame processing unit 2 and the history information on the frame of interest from the history management unit 4, and supplies them to the storage 1 for storage. In step S12, the processing control unit 7 determines whether the frame following the frame of interest is stored in the storage 1. If it determines that the following frame is not stored, it skips steps S13 and S14 and ends the process.
[0074]
If it is determined in step S12 that the frame following the frame of interest is stored in the storage 1, the process proceeds to step S13, where the processing control unit 7 takes that frame as the new frame of interest and supplies it to the attention frame buffer 21 for storage. The processing control unit 7 also clears the stored contents of the background buffer 22, the result buffers 33A to 33C, and the previous frame buffer 61, and proceeds to step S14. In step S14, under the control of the processing control unit 7, an initial extraction process, described later, is performed on the frame of interest newly stored in the attention frame buffer 21 in step S13, and the process ends.
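The branching of the flowchart in FIG. 4 can be summarized as a table from event type to the steps executed. This sketch records only step labels; the event names paraphrase the text, and `STEPS` and `steps_for` are hypothetical.

```python
# Steps of FIG. 4 executed for each event type reported by the event detection unit 8.
STEPS = {
    "screen selection": ["S2"],
    "undo": ["S3", "S4"],
    "partial deletion": ["S5", "S4"],
    "position designation": ["S6", "S7", "S8"],
    "order designation": ["S6", "S7", "S8"],
    "all acquisition": ["S9", "S10"],
    "partial acquisition": ["S9", "S10"],
    "confirm": ["S11", "S12", "S13", "S14"],
}


def steps_for(event, next_frame_stored=True):
    """Return the step labels run for `event`. On "confirm", steps S13 and
    S14 are skipped when no following frame is stored in the storage 1."""
    if event not in STEPS:
        raise ValueError(f"unknown event: {event!r}")
    if event == "confirm" and not next_frame_stored:
        return ["S11", "S12"]
    return list(STEPS[event])
```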
[0075]
Next, an object extraction process performed by the object extraction unit 3 in FIG. 2 will be described.
[0076]
In the present embodiment, the object extraction unit 3 basically detects a boundary portion in the frame of interest and extracts a region surrounded by the boundary portion as an object.
[0077]
That is, FIG. 9 shows a configuration example of the boundary detection unit 31 of the object extraction unit 3.
[0078]
The HSV separation unit 211 reads the frame of interest stored in the attention frame buffer 21 and separates its pixel values into H (hue), S (saturation), and V (value, i.e., brightness) components. That is, when the pixel values of the frame of interest are expressed in RGB (red, green, blue), for example, the HSV separation unit 211 converts the RGB pixel values into HSV pixel values according to, for example, the following equations.
[0079]
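$$
\begin{aligned}
V &= \max(R, G, B) \\
X &= \min(R, G, B) \\
S &= \frac{V - X}{V} \times 255 \\
H &= \begin{cases}
\dfrac{G - B}{V - X} \times 60 & \text{if } V = R \\[6pt]
\left(\dfrac{B - R}{V - X} + 2\right) \times 60 & \text{if } V = G \\[6pt]
\left(\dfrac{R - G}{V - X} + 4\right) \times 60 & \text{otherwise}
\end{cases}
\end{aligned}
$$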
Here, the R, G, and B components that are the original pixel values of the frame of interest are represented by, for example, 8 bits (an integer value in the range of 0 to 255). Further, max () represents the maximum value in the parentheses, and min () represents the minimum value in the parentheses.
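The conversion can be written directly from the equations above. This is a sketch under stated assumptions: the text leaves H undefined when V = X and S undefined when V = 0, so the function returns 0 for the undefined component in those cases. `rgb_to_hsv` is an illustrative name.

```python
def rgb_to_hsv(r, g, b):
    """Convert 8-bit R, G, B values with the equations of paragraph [0079].

    Returns (H, S, V). Returning 0 for an undefined H or S is an assumption.
    H can be negative when V = R and G < B; many implementations add 360 in
    that case, but the equations do not, so neither does this sketch.
    """
    v = max(r, g, b)
    x = min(r, g, b)
    if v == 0:
        return 0.0, 0.0, v  # black: hue and saturation undefined
    s = (v - x) / v * 255
    if v == x:
        return 0.0, s, v  # gray: hue undefined
    d = v - x
    if v == r:
        h = (g - b) / d * 60
    elif v == g:
        h = ((b - r) / d + 2) * 60
    else:
        h = ((r - g) / d + 4) * 60
    return h, s, v
```

Pure red, green, and blue map to hues of 0, 120, and 240, each with S = 255 and V = 255.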
[0080]
The HSV separation unit 211 supplies the H, S, and V components of the converted pixel values to the edge detection units 212H, 212S, and 212V, respectively.
[0081]
The edge detection units 212H, 212S, and 212V perform edge detection on the images composed of the H, S, and V components from the HSV separation unit 211 (hereinafter referred to as the H-plane, S-plane, and V-plane images, respectively).
[0082]
That is, the edge detection unit 212H detects an edge from the H-plane image by filtering the H-plane image by a Sobel operator.
[0083]
Specifically, letting I(x, y) denote the H component of the pixel that is (x+1)th from the left and (y+1)th from the top of the H-plane image, and E(x, y) denote the result of filtering that pixel with the Sobel operator, the edge detection unit 212H obtains an edge image composed of the pixel values E(x, y) given by the following expression.
[0084]

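The expression for E(x, y) referred to above did not survive in this copy of the text. As a stand-in, the sketch below uses the standard 3×3 Sobel kernels and the gradient magnitude sqrt(gx² + gy²); the patent may combine the two directional responses differently (for example, as a sum of absolute values), and setting border pixels to 0 is also an assumption. `sobel_edge_image` is a hypothetical name.

```python
import math


def sobel_edge_image(plane):
    """Edge image of one plane (a list of rows) using 3x3 Sobel kernels.

    plane[y][x] corresponds to I(x, y) in the text. Assumptions: E is the
    gradient magnitude sqrt(gx**2 + gy**2), and border pixels, whose 3x3
    neighbourhood is incomplete, are set to 0.
    """
    h, w = len(plane), len(plane[0])
    p = plane
    edge = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal derivative: right column minus left column.
            gx = (p[y - 1][x + 1] + 2 * p[y][x + 1] + p[y + 1][x + 1]) - (
                p[y - 1][x - 1] + 2 * p[y][x - 1] + p[y + 1][x - 1]
            )
            # Vertical derivative: bottom row minus top row.
            gy = (p[y + 1][x - 1] + 2 * p[y + 1][x] + p[y + 1][x + 1]) - (
                p[y - 1][x - 1] + 2 * p[y - 1][x] + p[y - 1][x + 1]
            )
            edge[y][x] = math.hypot(gx, gy)
    return edge
```

On a vertical step from 0 to 10, the two interior columns next to the step both get E = 40, while a flat plane gives E = 0 everywhere.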
[0085]
The edge detection units 212S and 212V obtain edge images for the S-plane and V-plane images in the same way as the edge detection unit 212H.
[0086]
The edge images obtained from the H-, S-, and V-plane images are supplied from the edge detection units 212H, 212S, and 212V to the binarization units 213H, 213S, and 213V, respectively. The binarization units 213H, 213S, and 213V binarize the edge images of the H, S, and V planes by comparing them with a predetermined threshold, and supply the resulting binarized images of the H, S, and V planes (images whose pixel values are 0 or 1) to the thinning units 214H, 214S, and 214V, respectively.
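Binarization compares each edge strength with a threshold. The text does not say whether the comparison is strict, so this sketch assumes that values equal to the threshold count as edges; `binarize` is an illustrative name. Applying it with several thresholds shows how one plane can yield several boundary images, as the text describes for the thresholds TH1, TH2, and TH3.

```python
def binarize(edge, threshold):
    """Return a 0/1 image: 1 where the edge strength reaches `threshold`
    (>= is an assumption; the text does not specify > or >=)."""
    return [[1 if e >= threshold else 0 for e in row] for row in edge]
```

For the edge row [0, 20, 40, 60], thresholds 20, 40, and 60 keep 3, 2, and 1 edge pixels respectively, so a higher threshold leaves fewer, stronger boundaries.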
[0087]
The thinning units 214H, 214S, and 214V thin the boundary portions in the binarized images of the H, S, and V planes supplied from the binarization units 213H, 213S, and 213V, and supply the resulting boundary images of the H, S, and V planes to the boundary image storage units 215H, 215S, and 215V, respectively, for storage.
[0088]
Next, with reference to FIG. 10, the thinning process performed on the binarized image of the H plane in the thinning unit 214H of FIG. 9 will be described.
[0089]
In the thinning process, as shown in the flowchart of FIG. 10A, a flag v is first reset to 0 in step S21, and the process proceeds to step S22. In step S22, the pixels constituting the binarized image of the H plane are referred to in raster scan order, and the process proceeds to step S23. In step S23, it is determined whether any pixel of the binarized image of the H plane has not yet been referred to in raster scan order. If such a pixel exists, the first unreferenced pixel in raster scan order is set as the target pixel, and the process proceeds to step S24.
[0090]
In step S24, it is determined whether at least one of the four pixels adjacent to the target pixel above, below, left, and right has a pixel value of 0 and the pixel value c of the target pixel is not equal to a predetermined value a (a value other than 0 and 1). If it is determined in step S24 that none of the four adjacent pixels has a pixel value of 0, or that the pixel value c of the target pixel is equal to the predetermined value a, the process returns to step S22, and the same processing is repeated thereafter.
[0091]
If it is determined in step S24 that at least one of the four pixels adjacent to the target pixel above, below, left, and right has a pixel value of 0 and that the pixel value c of the target pixel is not equal to the predetermined value a, the process proceeds to step S25, where the flag v is set to 1, and then to step S26.
[0092]
In step S26, it is determined whether the sum a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 of the pixel values a1 to a8 of the eight pixels adjacent to the target pixel c, shown in FIG. 10B, is 6 or less.
[0093]
If it is determined in step S26 that the sum of the pixel values of the eight pixels adjacent to the target pixel c is greater than 6, the process proceeds to step S28, where the predetermined value a is set as the pixel value c of the target pixel, and the process returns to step S22.
[0094]
If it is determined in step S26 that the sum of the pixel values of the eight pixels adjacent to the target pixel c is 6 or less, the process proceeds to step S27, where it is determined whether the following conditional expression is satisfied.
[0095]
(a2 + a4 + a6 + a8)-(a1 & a2 & a3)-(a4 & a5 & a6)-(a7 & a8 & a1) = 1
where & denotes the logical product (AND).
[0096]
If it is determined in step S27 that the conditional expression is not satisfied, the process proceeds to step S28, and as described above, the predetermined value a is set to the pixel value c of the target pixel, and the process returns to step S22.
[0097]
If it is determined in step S27 that the conditional expression is satisfied, the process proceeds to step S29, the pixel value c of the target pixel is set to 0, and the process returns to step S22.
[0098]
On the other hand, if it is determined in step S23 that no pixel of the binarized H-plane image remains unreferenced in raster scan order, that is, when every pixel of the binarized image has been processed as the target pixel, the process proceeds to step S30, where it is determined whether the flag v is 0.
[0099]
If it is determined in step S30 that the flag v is not 0, that is, if the flag v is 1, the process returns to step S21, and the same processing is repeated thereafter. If it is determined in step S30 that the flag v is 0, the process ends.
[0100]
Thereafter, the thinning unit 214H converts the pixel values of the pixels having the predetermined value a in the image obtained by the above thinning process to 1, and supplies the converted image to the boundary image storage unit 215H as a boundary image. The boundary image storage unit 215H thus stores a boundary image of the H plane in which boundary portions have the value 1 and non-boundary portions have the value 0.
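The thinning loop of FIG. 10A can be sketched as follows, under several assumptions. The neighbor labeling of FIG. 10B is not reproduced in this text, so in step S27 the sketch substitutes the standard 4-connectivity number from the Yokoi, Toriwaki, and Fukumura paper cited in paragraph [0102], with neighbors x1 to x8 taken counter-clockwise from the east; a pixel is deleted when this number equals 1. Pixels already set to the predetermined value a (here the hypothetical constant `MARK = 2`) count as foreground when neighbors are examined, and only pixels whose value is still 1 are treated as candidates in step S24. `thin` and `_fg` are illustrative names.

```python
MARK = 2  # hypothetical stand-in for the predetermined value "a" (not 0 or 1)


def _fg(img, y, x):
    """1 if (y, x) is inside the image and non-zero (1 or MARK), else 0."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return 1 if img[y][x] != 0 else 0
    return 0


def thin(binary):
    """Thin a 0/1 image following steps S21-S30 of FIG. 10A, then map MARK to 1."""
    img = [row[:] for row in binary]
    h, w = len(img), len(img[0])
    while True:
        v = 0                                    # S21: reset the flag
        for y in range(h):                       # S22/S23: raster scan
            for x in range(w):
                if img[y][x] != 1:               # assumption: only unmarked foreground pixels
                    continue
                four = (_fg(img, y, x + 1), _fg(img, y - 1, x),
                        _fg(img, y, x - 1), _fg(img, y + 1, x))
                if 0 not in four:                # S24: no 4-neighbour is 0, skip
                    continue
                v = 1                            # S25
                # x1..x8 = E, NE, N, NW, W, SW, S, SE (counter-clockwise from east)
                n = [_fg(img, y, x + 1), _fg(img, y - 1, x + 1), _fg(img, y - 1, x),
                     _fg(img, y - 1, x - 1), _fg(img, y, x - 1), _fg(img, y + 1, x - 1),
                     _fg(img, y + 1, x), _fg(img, y + 1, x + 1)]
                if sum(n) > 6:                   # S26 -> S28: keep the pixel
                    img[y][x] = MARK
                    continue
                # S27 (substituted): Yokoi 4-connectivity number
                conn = sum(n[k] - n[k] * n[(k + 1) % 8] * n[(k + 2) % 8]
                           for k in (0, 2, 4, 6))
                img[y][x] = 0 if conn == 1 else MARK  # S29 delete / S28 keep
        if v == 0:                               # S30: no candidate pixel in this pass
            break
    return [[1 if p != 0 else 0 for p in row] for row in img]
```

With these assumptions, a closed one-pixel-wide ring is left unchanged, while an open line segment shrinks to a single pixel because end points are not protected; this follows from the sketch as written, not necessarily from the patented method.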
[0101]
The thinning units 214S and 214V likewise obtain the boundary images of the S and V planes by performing the same processing as the thinning unit 214H.
[0102]
The thinning method described with reference to FIG. 10 is disclosed in detail in, for example, Yokoi, Toriwaki, and Fukumura, “Topological properties of sampled binary figures,” IEICE Transactions (D), vol. J56-D, pp. 662-669, 1973. The thinning method is not limited to the one described above.
[0103]
FIG. 11 shows an example of the boundary image.
[0104]
FIG. 11A shows an original image, FIG. 11B shows the V-plane boundary image obtained from the original image of FIG. 11A, and FIG. 11C shows the H-plane boundary image obtained from the same original image. Comparing FIGS. 11B and 11C shows that relatively small concave or convex portions are detected as boundary portions in the V plane, whereas only relatively large concave or convex portions are detected as boundary portions in the H plane. Thus, the H, S, and V planes differ in the scale of the concave or convex portions detected as boundary portions.
[0105]
In FIGS. 11B and 11C, white portions (boundary portions) represent pixels whose value is 1 in the boundary image, and black portions represent pixels whose value is 0.
[0106]
As described above, the boundary detection unit 31 creates three boundary images, one for each of the H, S, and V planes; alternatively, it may binarize any one of the planes with three different thresholds to create three boundary images corresponding to those thresholds. Hereinafter, the thresholds used when the three boundary images are created for the H, S, and V planes are denoted TH_H, TH_S, and TH_V, respectively, and the three thresholds used when three boundary images are created for a single plane are denoted TH1, TH2, and TH3, as appropriate.
[0107]
Next, the clipping process performed by the clipping unit 32 of FIG. 2 will be described with reference to the flowchart of FIG. 12. As described above, the boundary detection unit 31 obtains three boundary images; here, the clipping process is described for one of them. Hereinafter, the boundary image being processed is referred to as the boundary image of interest, and of the three output buffers 32A to 32C, the one that stores the image extracted from the frame of interest based on the boundary image of interest is referred to as the output buffer of interest.
[0108]
In the clipping process, after the stored contents of the output buffer of interest are cleared, in step S41 the pixel value of the pixel at the position on the image of the frame of interest designated by the user with the mouse 9 (the designated position) is read from the attention frame buffer 21 and written to the output buffer of interest. That is, the object extraction unit 3 performs processing when the user makes a “position designation” or an “order designation”, as described with reference to FIG. 4; in step S41, the pixel value of the pixel at the position on the frame of interest designated by the user's most recent “position designation” is written to the output buffer of interest. In step S42, it is determined whether an unprocessed pixel (pixel value) is stored in the output buffer of interest.
[0109]
If it is determined in step S42 that an unprocessed pixel is stored in the output buffer of interest, the process proceeds to step S43, where any one of the unprocessed pixels stored in the output buffer of interest is set as the target pixel, and then to step S44. In step S44, the pixel values of the eight pixels adjacent to the target pixel (above, below, left, right, upper left, lower left, upper right, and lower right) are obtained from the boundary image, and the process proceeds to step S45.
[0110]
In step S45, it is determined whether a boundary pixel, that is, a pixel belonging to a boundary portion (a pixel whose value is 1 in the present embodiment), exists among the eight pixels adjacent to the target pixel in the boundary image. If a boundary pixel exists among those eight pixels, step S46 is skipped and the process returns to step S42, after which the same processing is repeated. That is, when a boundary pixel exists among the eight pixels adjacent to the target pixel, the pixel values of those eight pixels are not written to the output buffer of interest.
[0111]
If it is determined in step S45 that no boundary pixel exists among the eight pixels adjacent to the target pixel, the process proceeds to step S46, where the pixel values of those eight pixels are read from the target frame buffer 21 and stored at the corresponding addresses of the target output buffer. In other words, when no boundary pixel exists among the eight adjacent pixels, those eight pixels are regarded as part of the object containing the position the user clicked with the mouse 9 (the position designated by the “position designation”), and their pixel values are written to the target output buffer.
[0112]
Thereafter, the process returns to step S42, and the same processing is repeated thereafter.
[0113]
Note that if a pixel value has already been written at a location to be written in step S46, it is overwritten in the target output buffer. A pixel that has already served as the target pixel is not treated as unprocessed when it is overwritten; it remains a processed pixel.
[0114]
On the other hand, if it is determined in step S42 that no unprocessed pixel is stored in the target output buffer, the process ends.
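The clipping loop of steps S41 to S46 amounts to a region-growing (flood-fill) pass bounded by the boundary image. The following Python sketch is one possible reading of it, not the patented implementation; the function name `clip_region`, the use of 2D lists indexed [row][col], and the choice to treat positions outside the frame as boundary pixels are assumptions for illustration.

```python
from collections import deque


def clip_region(frame, boundary, start):
    """Region cut-out following steps S41 to S46.

    frame:    2D list of pixel values of the frame of interest, [row][col]
    boundary: 2D list of the same size; 1 marks a boundary pixel
    start:    (row, col) of the designated position
    Returns a dict {(row, col): pixel value} standing in for the output buffer.
    """
    h, w = len(frame), len(frame[0])
    r0, c0 = start
    output = {(r0, c0): frame[r0][c0]}      # S41: write the designated pixel
    pending = deque([(r0, c0)])             # unprocessed pixels in the buffer
    processed = set()
    while pending:                          # S42: any unprocessed pixel left?
        r, c = pending.popleft()            # S43: take it as the target pixel
        if (r, c) in processed:
            continue
        processed.add((r, c))
        neighbours = [(r + dr, c + dc)
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr, dc) != (0, 0)]
        inside = [(y, x) for y, x in neighbours if 0 <= y < h and 0 <= x < w]
        # S44-S45: skip if any neighbour is a boundary pixel; neighbours
        # outside the frame are treated as boundary (assumption).
        if len(inside) < 8 or any(boundary[y][x] == 1 for y, x in inside):
            continue
        for y, x in inside:                 # S46: copy the 8 neighbours
            output[(y, x)] = frame[y][x]    # overwriting is harmless
            if (y, x) not in processed:     # processed pixels stay processed
                pending.append((y, x))
    return output
```

A pixel next to a boundary is written to the buffer but never expands further, which mirrors the skip of step S46 after step S45.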
[0115]
Next, the clipping process performed by the clipping unit 32 will be further described with reference to FIG.
[0116]
As illustrated in FIG. 13A, the cutout unit 32 reads, from the target frame buffer 21, the pixel value of the pixel at the position (designated position) on the image of the frame of interest designated by the user with the mouse 9, and writes it to the output buffer. The cutout unit 32 then takes any one of the unprocessed pixels stored in the output buffer as the target pixel and acquires the pixel values of the eight pixels adjacent to it from the boundary image. When none of these eight pixels in the boundary image is a boundary pixel, the cutout unit 32 reads their pixel values from the target frame buffer 21 and writes them to the output buffer. As a result, as shown in FIG. 13B, starting from the pixel at the position designated by the user with the mouse 9 (indicated by the mark ● in FIG. 13B), the pixel values of the pixels inside the area enclosed by boundary pixels are written to the output buffer.
[0117]
By repeating the above processing until no unprocessed pixel remains in the output buffer, the region of the frame-of-interest image enclosed by boundary pixels is stored in the output buffer.
[0118]
Because the clipping process described above takes the point that the user designated as part of the object as its starting point and cuts out the region of the frame of interest enclosed by the boundary portion containing that point, the object can be cut out with high accuracy. If all the areas constituting the object were extracted automatically, it would be difficult to determine whether a given area belongs to the object, and cutting out a region could consequently start from a pixel that does not belong to the object. In the cutout process of FIG. 12, by contrast, the region is cut out starting from a point the user designated as part of the object, so cutting always starts from a pixel of a region constituting the object, and that region can be cut out accurately.
[0119]
The cutout process of FIG. 12 is performed on each of the three boundary images obtained by the boundary detection unit 31, and the region cutout results obtained from the three boundary images are stored in the output buffers 32A to 32C, respectively. The contents of the output buffers 32A to 32C are then transferred to the result buffers 33A to 33C, respectively, so that the object extraction results obtained by the different processes are displayed on the result screens #1 to #3, respectively.
[0120]
Next, history information managed by the history management unit 4 in FIG. 2 will be described with reference to FIGS. 14 to 16.
[0121]
When all or part of the object extraction result stored in any of the result buffers 33A to 33C is reflected (written) in the object buffer 23 by “partial acquisition” or “all acquisition”, the history management unit 4 updates the designated positions stored in the designated position storage unit 41, the history image stored in the history image storage unit 42, and the entries in the parameter table storage unit 43.
[0122]
Consider, for example, extracting as an object the portion showing a person from a frame of interest that displays the person's whole body, and suppose that the torso and the lower body have already been extracted as objects and stored in the object buffer 23. In this case, as shown in FIG. 14A, the history image storage unit 42 stores a history image in which the pixels of the torso have the pixel value ID1, which is associated with the plane of the boundary image used to extract the torso and the threshold used to obtain that boundary image, and the pixels of the lower body have the pixel value ID2, which is associated with the plane of the boundary image used to extract the lower body and the threshold used to obtain that boundary image.
[0123]
In the example of FIG. 14A, the boundary image used to extract the torso is of the H plane, and the threshold used to obtain that H-plane boundary image (the threshold used in the binarization that produces the boundary image) is 100. The boundary image used to extract the lower body is of the V plane, and the threshold used to obtain it is 80.
[0124]
In this case, the parameter table storage unit 43 stores ID1 in association with the H plane and the threshold 100, and ID2 in association with the V plane and the threshold 80.
[0125]
Here, the set of the boundary image plane used to extract the object stored in the object buffer 23 and the threshold value used to obtain the boundary image is hereinafter referred to as a parameter set as appropriate.
[0126]
Thereafter, when the user clicks a pixel of the head in the frame of interest displaying the person's whole body with the mouse 9, the object extraction unit 3 performs the three types of object extraction processing described above. As shown in FIG. 14B, the head extraction results of the three processes are stored in the result buffers 33A to 33C, respectively, and the contents of the result buffers 33A to 33C are displayed on the result screens #1 to #3, respectively.
[0127]
When the user, referring to the head extraction results displayed on the result screens #1 to #3, applies “all acquisition” to a satisfactory one, that result is selected from among the head extraction results on the result screens #1 to #3 and reflected in the object buffer 23, as shown in FIG.
[0128]
In this case, the history management unit 4 registers the parameter set used to obtain the head extraction result reflected in the object buffer 23 (the plane of the boundary image and the threshold used to obtain that boundary image) in the parameter table storage unit 43, in association with a unique ID3.
[0129]
Further, as shown in FIG. 14D, the history management unit 4 updates the history image by writing ID3 as the pixel value of the pixels constituting the head in the history image storage unit 42. In the example of FIG. 14D, the boundary image used to extract the head is of the S plane, and the threshold used to obtain it is 50.
[0130]
Further, as shown in FIG. 14E, the history management unit 4 adds to the designated position storage unit 41 the position of the point on the frame of interest (designated position) that the user clicked when the head extraction result reflected in the object buffer 23 was obtained. In the example of FIG. 14E, the designated position storage unit 41 already stores the coordinates (x1, y1), (x2, y2), and (x3, y3) of three designated positions, and the new coordinates (x4, y4) are added to them.
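The history information described in paragraphs [0121] to [0130] can be pictured as three linked structures: a parameter table mapping IDs to (plane, threshold), a history image holding one ID per pixel, and a list of designated positions. The minimal Python sketch below assumes a hypothetical `HistoryManager` class, (x, y) coordinates indexed as `history_image[y][x]`, an ID of 0 for pixels not yet extracted, and a fresh ID for every acquisition; the text does not say whether an identical parameter set would reuse an existing ID.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryManager:
    """Hypothetical model of the history management unit 4."""
    width: int
    height: int
    # Parameter table (unit 43): ID -> (plane, threshold)
    parameter_table: dict = field(default_factory=dict)
    # Designated positions (unit 41), in the order they were clicked
    designated_positions: list = field(default_factory=list)
    # History image (unit 42): one ID per pixel, 0 = not yet extracted
    history_image: list = field(init=False, default_factory=list)

    def __post_init__(self):
        self.history_image = [[0] * self.width for _ in range(self.height)]

    def register(self, region, plane, threshold, clicked):
        """Record one acquisition that was reflected in the object buffer.

        region:    iterable of (x, y) pixels of the acquired result
        plane:     "H", "S" or "V"
        threshold: binarization threshold of that boundary image
        clicked:   (x, y) designated position that produced the result
        """
        new_id = len(self.parameter_table) + 1
        self.parameter_table[new_id] = (plane, threshold)
        for x, y in region:
            self.history_image[y][x] = new_id
        self.designated_positions.append(clicked)
        return new_id
```

Registering the torso with ("H", 100) and the lower body with ("V", 80), as in FIG. 14A, would yield IDs 1 and 2 in this sketch.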
[0131]
The history information for the frame of interest described above is used to extract an object from a new frame of interest, for example when the next frame becomes the frame of interest.
[0132]
Suppose, for example, that a history image such as that shown in FIG. 15A has been obtained for the frame one frame before the frame of interest (the previous frame), and that the user clicks a point on the frame of interest with the mouse 9. In this case, the processing control unit 7 controls the motion detection unit 6 to obtain the motion vector (v_x, v_y), relative to the previous frame, of the designated position (x, y) that the user clicked with the mouse 9. The processing control unit 7 then uses its built-in position correction unit 71 to correct the designated position (x, y) by the motion vector (v_x, v_y), obtaining the position (x′, y′) on the previous frame corresponding to the designated position (x, y). That is, the position correction unit 71 obtains (x′, y′), for example, by the formula (x′, y′) = (x, y) − (v_x, v_y).
[0133]
Thereafter, the processing control unit 7 obtains the ID of the parameter set at the position (x′, y′) on the previous frame corresponding to the designated position (x, y) by referring to the history image of the previous frame stored in the history image storage unit 42, and obtains the parameter set associated with that ID by referring to the parameter table storage unit 43. The processing control unit 7 then decides to create three boundary images based on the parameter set thus obtained and to cut out, from each of the three boundary images, a region starting from the designated position (x, y), and supplies decision information to that effect to the object extraction unit 3.
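The backward lookup of paragraphs [0132] and [0133], from a position designated on the frame of interest to the parameter set recorded for the previous frame, can be sketched as follows. The function name, the integer motion vector, the [y][x] indexing of the history image, and the clamping of (x′, y′) to the frame are assumptions; in the apparatus the motion vector would come from the motion detection unit 6.

```python
def lookup_parameter_set(history_image, parameter_table, designated, motion_vector):
    """Map a designated position back to the previous frame and return the
    (plane, threshold) recorded there, or None if that pixel has no entry."""
    h, w = len(history_image), len(history_image[0])
    x, y = designated
    vx, vy = motion_vector
    # (x', y') = (x, y) - (v_x, v_y); clamping to the frame is an assumption
    xp = min(max(x - vx, 0), w - 1)
    yp = min(max(y - vy, 0), h - 1)
    pid = history_image[yp][xp]          # ID written into the history image
    return parameter_table.get(pid)      # parameter set for that ID
```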
[0134]
As a result, as shown in FIG. 15B, the object extraction unit 3 creates three boundary images for the frame of interest based on the parameter set given by the decision information, cuts out from each of them a region starting from the designated position (x, y), and thereby extracts three patterns of objects. The three patterns of objects extracted from the frame of interest in this way are stored in the result buffers 33A to 33C and displayed on the result screens #1 to #3, respectively, as shown in FIG.
[0135]
When an object is extracted from the frame of interest, extracting it at the position designated by the user in the same way as the corresponding part of the previous frame was extracted can be expected to give good results. Therefore, by creating boundary images based on the parameter set given by the decision information and cutting out from them a region starting from the designated position (x, y), as described above, a good object extraction result can be obtained quickly.
[0136]
Specifically, when the user performs a “position designation”, the object extraction unit 3 performs object extraction by three processes starting from the designated position; when none of the resulting extraction results is satisfactory, the user performs an “order designation”, and object extraction is repeated with a changed parameter set, as described later. Without the history information of the previous frame, therefore, the user may have to perform “order designation” many times to obtain a reasonably good extraction result. With the history information of the previous frame, by contrast, the user is highly likely to obtain an accurate extraction result immediately, simply by designating a few points on the object in the frame of interest, without performing any “order designation”.
[0137]
Further, when extracting an object from the frame of interest, the history information about the previous frame can be used as follows, for example.
[0138]
Suppose, for example, that for the frame one frame before the frame of interest (the previous frame), a history image such as that shown in FIG. 16A and three designated positions (x1, y1), (x2, y2), and (x3, y3) such as those shown in FIG. 16B have been obtained.
[0139]
In this case, the processing control unit 7 controls the motion detection unit 6 to obtain the motion vector (v_x, v_y) of the frame of interest relative to the designated position (x1, y1) of the previous frame. The processing control unit 7 then uses its built-in position correction unit 71 to correct the designated position (x1, y1) of the previous frame by the motion vector (v_x, v_y), obtaining the position (x1′, y1′) on the frame of interest corresponding to the designated position (x1, y1) of the previous frame. That is, the position correction unit 71 obtains (x1′, y1′), for example, by the formula (x1′, y1′) = (x1, y1) + (v_x, v_y).
[0140]
Thereafter, the processing control unit 7 obtains the ID of the parameter set at the designated position (x1, y1) of the previous frame by referring to the history image of the previous frame stored in the history image storage unit 42, and obtains the parameter set associated with that ID by referring to the parameter table storage unit 43. The processing control unit 7 then decides to create three boundary images based on the parameter set thus obtained and to cut out, from each of them, a region starting from the position (x1′, y1′) on the frame of interest corresponding to the designated position (x1, y1) of the previous frame, and supplies decision information to that effect to the object extraction unit 3.
[0141]
As a result, the object extraction unit 3 creates three boundary images based on the parameter set corresponding to the determination information for the frame of interest as shown in FIG. 16C, and from the three boundary images, By extracting a region starting from the position (x1 ′, y1 ′), three patterns of objects are extracted.
[0142]
Similar processing is performed for the remaining two designated positions (x2, y2) and (x3, y3) among the three designated positions (x1, y1), (x2, y2), (x3, y3) of the previous frame. As a result, three patterns of objects starting from the positions (x2 ′, y2 ′) and (x3 ′, y3 ′) on the target frame corresponding to the two designated positions are extracted.
[0143]
In this way, parts of the object are extracted from the frame of interest, each starting from the position on the frame of interest corresponding to one designated position of the previous frame. These parts are then combined, and the resulting object extraction result is stored in the result buffer and displayed on the result screen, as shown in FIG. 16D.
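In the variant of paragraphs [0139] to [0143], the mapping runs forward: each designated position of the previous frame is shifted by its motion vector to give a start point on the frame of interest, paired with the parameter set recorded at that position. A hypothetical Python helper is sketched below, assuming one motion vector per designated position, a history image indexed [y][x], and a parameter table keyed by ID; keeping start points inside the frame is not handled.

```python
def seeds_for_frame(prev_positions, motion_vectors, history_image, parameter_table):
    """For each designated position (x_k, y_k) of the previous frame, return
    the start point (x_k', y_k') = (x_k, y_k) + (v_x, v_y) on the frame of
    interest together with the parameter set recorded at (x_k, y_k)."""
    seeds = []
    for (x, y), (vx, vy) in zip(prev_positions, motion_vectors):
        params = parameter_table.get(history_image[y][x])
        seeds.append(((x + vx, y + vy), params))
    return seeds
```

Each start point would then drive one clipping pass with boundary images built from its parameter set, and the resulting regions would be merged (for example, as a union of pixel positions) into the combined object extraction result.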
[0144]
As described above, object extraction from the frame of interest can be expected to succeed when it is performed in the same way as the extraction of the corresponding part of the previous frame. Therefore, by creating three boundary images based on the parameter set given by the decision information and cutting out from them a region starting from the position on the frame of interest corresponding to each designated position of the previous frame, a good object extraction result can be obtained quickly, as in the case of FIG. 15.
[0145]
In the case of FIG. 15, the user must designate a point on the object in the frame of interest. In the case of FIG. 16, no such designation is needed, so the user's operation burden is reduced further.
[0146]
Whether to extract the object of the frame of interest using the history information of the previous frame before the user designates a point on the object, as described with reference to FIG. 16, and to display the extraction results on the result screens #1 to #3, can be set, for example, from the pull-down menu displayed by clicking the use record button 202 (FIG. 3) on the reference screen described above.
[0147]
Next, the process that the processing control unit 7 performs in step S6 of FIG. 4 to determine the contents of the multiple object extraction processes will be described with reference to the flowchart of FIG. 17.
[0148]
First, in step S51, the processing control unit 7 determines whether the event information from the event detection unit 8 represents a “position designation” or an “order designation”. If it is determined in step S51 that the event information represents a “position designation”, the process proceeds to step S52, where the processing control unit 7 determines whether the history information of the previous frame is stored in the history management unit 4.
[0149]
If it is determined in step S52 that the history information of the previous frame is stored in the history management unit 4, the process proceeds to step S53, where the processing control unit 7 determines, based on that history information (for example, as described above), the contents of the three object extraction processes for extracting the object from the frame of interest, supplies decision information to that effect to the object extraction unit 3, and ends the process.
[0150]
If it is determined in step S52 that the history information of the previous frame is not stored in the history management unit 4, for example because the frame of interest is the first frame of the moving image stored in the storage 1, the process proceeds to step S54, where the processing control unit 7 sets the contents of the three object extraction processes for extracting the object from the frame of interest to default values, supplies decision information to that effect to the object extraction unit 3, and ends the process.
[0151]
On the other hand, if it is determined in step S51 that the event information represents an “order designation”, the process proceeds to step S55, where the processing control unit 7 determines the contents of the three object extraction processes for extracting the object from the frame of interest based on the ranking the user made by operating the mouse 9, supplies decision information to that effect to the object extraction unit 3, and ends the process.
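The branching of FIG. 17 (steps S51 to S55) reduces to a small dispatch. In this Python sketch the event labels "position" and "order" and the returned strategy names are placeholders chosen for illustration, not identifiers from the source.

```python
def choose_decision_rule(event, has_previous_history):
    """Pick how the three extraction processes are decided (FIG. 17)."""
    if event == "position":              # S51: position designation
        if has_previous_history:         # S52: history of previous frame?
            return "from_history"        # S53
        return "default"                 # S54
    if event == "order":                 # S51: order designation
        return "from_ranking"            # S55
    raise ValueError(f"unknown event: {event!r}")
```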
[0152]
Next, the methods of determining the contents of the object extraction processes in steps S53 to S55 of FIG. 17 will be described in detail with reference to the flowcharts of FIG. 18. When the event information represents a “position designation”, the user has clicked the mouse 9 to designate a position on the frame of interest, and the coordinates of the designated position are included in the event information. When the event information represents an “order designation”, the user has operated the mouse 9 to rank the object extraction results displayed on the three result screens #1 to #3, and the ranks of the results (here, first to third) are also included in the event information.
[0153]
When the contents of the object extraction processes are determined based on the history information of the previous frame in step S53 of FIG. 17, first, in step S61 of the flowchart of FIG. 18A, the processing control unit 7 refers to the history information of the previous frame and decides to use a boundary image of the same plane as the one used when the final object extraction result of the previous frame (the object finally stored in the object buffer 23) was obtained.
[0154]
That is, the processing control unit 7 recognizes, by referring to the history information of the previous frame, the plane of the boundary image used when the pixel of the previous frame corresponding to the position on the frame of interest designated by the user with the mouse 9 was extracted as an object, and decides to have the boundary detection unit 31 create a boundary image of that plane. The plane thus determined for extracting the object of the frame of interest is hereinafter referred to as the decision plane.
[0155]
In step S62, the processing control unit 7 recognizes, by referring to the history information of the previous frame, the threshold used in the binarization that produced the boundary image when the pixel of the previous frame corresponding to the position on the frame of interest designated by the user with the mouse 9 was extracted as an object. It sets this threshold as the second threshold TH2 of the three thresholds TH1 to TH3 used in the binarization for obtaining the boundary images of the decision plane for the frame of interest. The threshold TH2 determined in this way is hereinafter referred to as the decision threshold.
[0156]
The processing control unit 7 then proceeds to step S63 and determines the remaining two thresholds TH1 and TH3 from the decision threshold TH2, for example by the formulas TH1 = TH2 - 20 and TH3 = TH2 + 20. It supplies the decision plane, the decision threshold TH2, and the thresholds TH1 and TH3 determined from it to the object extraction unit 3 as decision information, and ends the process.
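Steps S61 to S63 reuse the previous frame's plane and threshold and bracket the threshold symmetrically. A one-function Python sketch follows; the offset of 20 is the example value from the text, and the function name is hypothetical.

```python
def thresholds_from_history(plane, th2, offset=20):
    """S61-S63: keep the recorded plane as the decision plane, use the recorded
    threshold as TH2, and derive TH1 = TH2 - offset and TH3 = TH2 + offset."""
    return plane, (th2 - offset, th2, th2 + offset)
```

For the head in FIG. 14D (S plane, threshold 50), this would give the S plane with thresholds 30, 50, and 70.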
[0157]
In this case, in the boundary detection unit 31 (FIG. 9) of the object extraction unit 3, edge detection is performed by whichever of the edge detection units 212H, 212S, and 212V handles the image of the decision plane. The binarization unit connected to that edge detection unit then binarizes the result using the decision threshold TH2 and the two thresholds TH1 and TH3 determined from it, creating three boundary images. Further, for each of the three boundary images created by the boundary detection unit 31, the cutout unit 32 of the object extraction unit 3 cuts out the region described with reference to FIGS. 12 and 13, starting from the position on the frame of interest designated by the user.
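Each boundary image is produced by edge detection on one plane followed by binarization with one threshold. The edge detector of FIG. 9 is not specified in this section, so the Python sketch below substitutes a simple sum of absolute forward differences; that detector, the `>=` comparison, and the function names are assumptions. A value of 1 marks a boundary pixel, as in this embodiment.

```python
def edge_strength(plane):
    """Hypothetical edge detector: |horizontal diff| + |vertical diff|."""
    h, w = len(plane), len(plane[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = plane[y][x + 1] - plane[y][x] if x + 1 < w else 0
            dy = plane[y + 1][x] - plane[y][x] if y + 1 < h else 0
            out[y][x] = abs(dx) + abs(dy)
    return out


def boundary_images(plane, thresholds):
    """Binarize one edge map with each threshold (1 = boundary pixel)."""
    edges = edge_strength(plane)
    return [[[1 if e >= th else 0 for e in row] for row in edges]
            for th in thresholds]
```

A lower threshold marks more pixels as boundary and therefore tends to cut out a smaller, tighter region; a higher one lets the region grow further.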
[0158]
As described above, when history information of the previous frame exists, the contents of the object extraction processes for the frame of interest are determined based on that history information and the position on the frame of interest designated by the user. A portion of the frame of interest that has the same characteristics as in the previous frame is therefore extracted by the same processing as in the previous frame, so accurate object extraction can be performed with an easy operation.
[0159]
Next, with reference to the flowchart of FIG. 18B, the process of the process control unit 7 when the content of the object extraction process is determined as a default value in step S54 of FIG. 17 will be described.
[0160]
In this case, first, in step S71, the processing control unit 7 determines whether the average V component of the pixels near the pixel at the position on the frame of interest clicked by the user with the mouse 9 (the designated position), for example the 8 × 8 (horizontal × vertical) pixels including the pixel at the designated position, is less than 50.
[0161]
If it is determined in step S71 that the average V component of the 8 × 8 pixels including the pixel at the designated position is less than 50, the process proceeds to step S72, where the processing control unit 7 decides to have the boundary detection unit 31 create a boundary image of the V plane.
[0162]
This is because, for a region with a small V component, it is known empirically that cutting out the region with a boundary image of the H or S plane is less accurate than with a boundary image of the V plane. In step S72, therefore, the decision plane is set to the V plane as described above.
[0163]
In step S73, the processing control unit 7 sets the three thresholds TH1 to TH3 used in the binarization for obtaining the boundary images of the V plane (the decision plane) to default values, for example 40, 100, and 180. It supplies the fact that the decision plane is the V plane, together with the thresholds TH1 to TH3, to the object extraction unit 3 as decision information, and ends the process.
[0164]
In this case, in the boundary detection unit 31 (FIG. 9) of the object extraction unit 3, edge detection is performed by the edge detection unit 212V, which, among the edge detection units 212H, 212S, and 212V, handles the image of the V plane (the decision plane). The binarization unit 213V connected to the edge detection unit 212V binarizes the result using the three values 40, 100, and 180 as the thresholds TH1 to TH3, creating three boundary images. Further, for each of the three boundary images created by the boundary detection unit 31, the cutout unit 32 of the object extraction unit 3 cuts out the region described with reference to FIGS. 12 and 13, starting from the position on the frame of interest designated by the user.
[0165]
On the other hand, if it is determined in step S71 that the average V component of the 8 × 8 pixels including the pixel at the designated position is not less than 50, the process proceeds to step S74, where the processing control unit 7 decides to have the boundary detection unit 31 generate boundary images of each of the H, S, and V planes.
[0166]
For a region with a reasonably large V component, which plane's boundary image allows the region to be cut out accurately depends on the characteristics of the region. Moreover, since no history information of a previous frame exists in this case, it is difficult to predict which plane's boundary image is suitable for cutting out the region. In step S74, therefore, all three planes H, S, and V are set as decision planes.
[0167]
In step S75, the processing control unit 7 sets the thresholds TH_H, TH_S, and TH_V used in the binarization for obtaining the boundary images of the H, S, and V planes (the decision planes) all to a default value of, for example, 100. It supplies the fact that the decision planes are the H, S, and V planes, together with the thresholds TH_H, TH_S, and TH_V, to the object extraction unit 3 as decision information, and ends the process.
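The default decision of steps S71 to S75 can be sketched in Python as below. The 8 × 8 window, the limit of 50, and the thresholds (40, 100, 180 for the V plane alone, or 100 for each of H, S, and V) are the example values from the text. How the window is placed around the designated pixel and clamped at the frame edges is an assumption, as is returning a dictionary from plane name to thresholds.

```python
def default_decision(v_plane, designated, block=8, v_limit=50):
    """S71-S75: choose planes and thresholds when no history exists.

    v_plane:    2D list of V-component values, indexed [y][x]
    designated: (x, y) clicked by the user
    Returns a dict mapping plane name to its binarization thresholds.
    """
    h, w = len(v_plane), len(v_plane[0])
    x, y = designated
    # Place a block x block window containing the designated pixel,
    # clamped so that it stays inside the frame (assumption).
    x0 = min(max(x - block // 2, 0), max(w - block, 0))
    y0 = min(max(y - block // 2, 0), max(h - block, 0))
    window = [v_plane[j][i]
              for j in range(y0, min(y0 + block, h))
              for i in range(x0, min(x0 + block, w))]
    mean_v = sum(window) / len(window)
    if mean_v < v_limit:
        # S72-S73: low-V region, so V plane only with three thresholds
        return {"V": (40, 100, 180)}
    # S74-S75: all three planes, one default threshold each
    return {"H": (100,), "S": (100,), "V": (100,)}
```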
[0168]
In this case, in the boundary detection unit 31 (FIG. 9) of the object extraction unit 3, the edge detection units 212H, 212S, and 212V perform edge detection on the images of the H, S, and V planes, respectively. The binarization units 213H, 213S, and 213V connected to the edge detection units 212H, 212S, and 212V binarize the results using the thresholds TH_H, TH_S, and TH_V (here, 100, as described above), creating boundary images of the three planes H, S, and V. Further, for each of the boundary images of the H, S, and V planes created by the boundary detection unit 31, the cutout unit 32 of the object extraction unit 3 cuts out the region described with reference to FIGS. 12 and 13, starting from the position on the frame of interest designated by the user.
[0169]
Next, with reference to the flowchart of FIG. 18C, the process of the process control unit 7 in the case where the content of the object extraction process is determined based on the designation order in step S55 of FIG. 17 will be described.
[0170]
In this case, first, in step S81, the processing control unit 7 determines whether the user's ranking was applied to object extraction results obtained using boundary images of the three planes H, S, and V (hereinafter referred to as ranking of planes) or to object extraction results obtained using boundary images of a single plane created by binarization with three different thresholds (hereinafter referred to as ranking of thresholds).
[0171]
If it is determined in step S81 that the ranking is a ranking of planes, the process proceeds to step S82, where the processing control unit 7 recognizes the ranks of the object extraction results obtained using the boundary images of the three planes H, S, and V, and decides to have the boundary detection unit 31 create boundary images of the plane that yielded the first-ranked result. That is, the processing control unit 7 sets the plane of the boundary image that yielded the first-ranked object extraction result as the decision plane.
[0172]
In step S83, the processing control unit 7 sets the three thresholds TH1 to TH3 used in the binarization for obtaining the boundary images of the decision plane to default values, for example 40, 100, and 180. It supplies the decision plane (for example, the V plane) and the thresholds TH1 to TH3 to the object extraction unit 3 as decision information, and ends the process.
[0173]
In this case, in the boundary detection unit 31 (FIG. 9) of the object extraction unit 3, edge detection is performed by whichever of the edge detection units 212H, 212S, and 212V handles the image of the decision plane (the plane of the boundary image that yielded the first-ranked object extraction result). The binarization unit connected to that edge detection unit binarizes the result using the three values 40, 100, and 180 as the thresholds TH1 to TH3, creating three boundary images. Further, for each of the three boundary images created by the boundary detection unit 31, the cutout unit 32 of the object extraction unit 3 cuts out the region described with reference to FIGS. 12 and 13.
[0174]
On the other hand, if it is determined in step S81 that the ranking is a ranking of thresholds, the process proceeds to step S84, where the processing control unit 7 decides to have the boundary detection unit 31 create boundary images of the same plane as the boundary images used to obtain the ranked object extraction results of the frame of interest. That is, the processing control unit 7 sets the plane of the boundary images used to obtain the previous object extraction results for the frame of interest as the decision plane.
[0175]
In step S85, the processing control unit 7 determines the three thresholds TH1 to TH3 used in the binarization for obtaining the boundary images of the decision plane based on the ranking of thresholds. Specifically, among the three thresholds used to obtain the previous object extraction results, the processing control unit 7 sets the one ranked first as TH1, the average of the ones ranked first and second as TH2, and the one ranked second as TH3. It then supplies the decision plane and the thresholds TH1 to TH3 to the object extraction unit 3 as decision information, and ends the process.
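The two ranking rules of steps S81 to S85 can be sketched in one hypothetical Python function. For a ranking of planes, `ranking` lists plane names from first to third; for a ranking of thresholds, it lists the previously used thresholds from first to third. Using a floating-point mean for TH2 is an assumption, since the text only says "average".

```python
def decide_from_ranking(kind, ranking, previous_plane=None):
    """Return (decision plane, (TH1, TH2, TH3)) from the user's ranking."""
    if kind == "plane":
        # S82-S83: first-ranked plane with the default thresholds
        return ranking[0], (40, 100, 180)
    if kind == "threshold":
        # S84-S85: keep the previous plane; TH1 = 1st, TH3 = 2nd,
        # TH2 = their average, so the next trial searches between them
        first, second = ranking[0], ranking[1]
        return previous_plane, (first, (first + second) / 2, second)
    raise ValueError(f"unknown ranking kind: {kind!r}")
```

Repeated threshold rankings therefore narrow the search interval around the user's preferred threshold, much like a manual bisection.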
[0176]
In this case, the boundary detection unit 31 (FIG. 9) of the object extraction unit 3 performs edge detection in whichever of the edge detection units 212H, 212S, and 212V processes the image of the decision plane (the plane of the boundary image used to obtain the previous object extraction results). The binarization unit connected to that edge detection unit then binarizes the result with the three thresholds TH1 to TH3 determined above from the ranking of the previous thresholds, creating three boundary images. Further, the cutout unit 32 of the object extraction unit 3 cuts out a region from each of the three boundary images created by the boundary detection unit 31, as described with reference to FIGS.
[0177]
As described above, when the contents of the object extraction process are determined, if the user designates a position on the frame of interest, no history information exists for the previous frame, and the average V component near the designated position is 50 or more, three object extraction results are obtained from the boundary images of the three planes H, S, and V (steps S71, S74, and S75 in FIG. 18B). When the planes are then ranked on the basis of these three results, three object extraction results are obtained from the image of the first-ranked plane and the three boundary images created from it with the three thresholds TH1 to TH3 (steps S81 to S83 in FIG. 18C).
[0178]
Further, if the user designates a position on the frame of interest, no history information exists for the previous frame, and the average V component near the designated position is less than 50, three object extraction results are obtained from the V-plane image and the three boundary images created with the three default thresholds TH1 to TH3 (steps S71 to S73 in FIG. 18B).
[0179]
In addition, if the user designates a position on the frame of interest and history information for the previous frame exists, three object extraction results are obtained from the image of one plane determined from the history information and the three boundary images created with the three thresholds TH1 to TH3, which are also determined from the history information (FIG. 18A).
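The three cases in paragraphs [0177] to [0179] amount to a small decision rule. The following Python sketch assumes, for illustration only, that the history case has already yielded a plane and three thresholds (their derivation is described elsewhere in the document), that the default thresholds are the values 40, 100, and 180 quoted at the start of this section, and that planes are named by the strings "H", "S", and "V"; `plan_extraction` is a hypothetical name.

```python
# Assumed defaults: taken to be the 40, 100, 180 quoted in the text.
DEFAULT_THRESHOLDS = (40, 100, 180)
# Empirical limit on the average V component near the designated position.
V_LIMIT = 50


def plan_extraction(history, v_mean):
    """Decide which plane(s) and thresholds feed the three boundary images.

    history: None if the previous frame has no usable history; otherwise a
             (plane, (th1, th2, th3)) pair already derived from that history.
    v_mean:  average V component around the position the user designated.
    Returns (planes, thresholds); thresholds is None when the three results
    come from three different planes (their thresholds are not given here).
    """
    if history is not None:
        # FIG. 18A: one plane and three thresholds, both from the history.
        plane, thresholds = history
        return (plane,), tuple(thresholds)
    if v_mean < V_LIMIT:
        # Steps S71 to S73: dark neighbourhood, so use the V plane directly.
        return ("V",), DEFAULT_THRESHOLDS
    # Steps S71, S74, S75: the user compares the H, S and V results first.
    return ("H", "S", "V"), None
```

Only the third branch requires the user to rank planes; the other two go straight to threshold ranking.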
[0180]
After three object extraction results have been obtained in this way from the image of one plane and three boundary images created with three thresholds, the three results can be ranked (with respect to the thresholds). When this ranking is performed, the three thresholds TH1 to TH3 are updated based on it, as described above (steps S81, S84, and S85 in FIG. 18C).
[0181]
That is, for example, when the results for thresholds TH1, TH2, and TH3 are ranked first, second, and third, respectively, as shown in FIG. 19A, the next object extraction process uses, as shown in FIG. 19B, the previous first-ranked threshold as TH1, the average of the previous first- and second-ranked thresholds as TH2, and the previous second-ranked threshold as TH3. The three object extraction results obtained with these thresholds are ranked in turn; if, as shown in FIG., the results for TH1, TH2, and TH3 are ranked third, first, and second, respectively, then, as shown in FIG., the next process again sets TH1 to the previous first-ranked threshold, TH2 to the average of the previous first- and second-ranked thresholds, and TH3 to the previous second-ranked threshold. The results obtained with these thresholds are ranked once more; if, as shown in FIG., TH1, TH2, and TH3 are again ranked third, first, and second, respectively, the thresholds for the next process are determined by the same rule, as shown in FIG.
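Writing $T_{(1)}$ and $T_{(2)}$ for the thresholds whose results were ranked first and second, the rule can be traced numerically. The starting values 40, 100, and 180 are assumed here purely for illustration, combined with the ranking sequence of FIG. 19 (first round: 1st, 2nd, 3rd; later rounds: 3rd, 1st, 2nd):

$$
\begin{aligned}
(TH_1', TH_2', TH_3') &= \bigl(T_{(1)},\ \tfrac{1}{2}(T_{(1)} + T_{(2)}),\ T_{(2)}\bigr)\\
(40, 100, 180) &\xrightarrow{\text{ranks }1,2,3} (40, 70, 100)\\
(40, 70, 100) &\xrightarrow{\text{ranks }3,1,2} (70, 85, 100)\\
(70, 85, 100) &\xrightarrow{\text{ranks }3,1,2} (85, 92.5, 100)
\end{aligned}
$$

In this hypothetical run the spread between TH1 and TH3 halves each round (60, 30, 15), which illustrates the convergence stated in paragraph [0182].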
[0182]
Therefore, by repeating the ranking, the thresholds TH1, TH2, and TH3 converge to values better suited to extracting the object from the frame of interest, and as a result accurate object extraction becomes possible.
[0183]
Further, when history information for the previous frame exists, the three thresholds TH1 to TH3 are determined from that history, so they are likely to be already suited to extracting the object from the frame of interest. The user therefore does not need to perform the “order designation” operation many times; in the best case, a good object extraction result can be obtained for the frame of interest without performing it at all.
[0184]
In the embodiment of FIGS. 18 and 19, when there is no history information and the average V component near the designated position is 50 or more, three object extraction results are first obtained from the boundary images of the three planes H, S, and V. Once the planes have been ranked with respect to these results, three object extraction results are obtained from the image of the first-ranked plane and the three boundary images created from it with the thresholds TH1 to TH3, after which the thresholds can be ranked.
[0185]
On the other hand, even when there is no history information, if the average V component near the designated position is less than 50, then, by the empirical rule described above, three object extraction results are obtained from the image of one plane and the three boundary images created with the thresholds TH1 to TH3, after which the thresholds can be ranked. In this case the user does not need to rank the planes, which reduces the operation burden accordingly.
[0186]
Next, the initial extraction process in step S14 of FIG. 4 will be described.
[0187]
In the initial extraction process, without waiting for the user to perform “position designation” (designating the position of the object) for the new frame of interest, a plurality of (three in this embodiment) object extraction processes are performed using the final object extraction result and history information obtained for the previous frame, and their object extraction results are displayed on the display unit 5.
[0188]
That is, FIG. 20 shows a display example of the screen on the display unit 5 immediately after the initial extraction process is performed for a new frame of interest.
[0189]
In the embodiment of FIG. 20, the reference screen displays the image (original image) of the new frame of interest, and result screens #1 to #3 display the three object extraction results of the three object extraction processes performed using the final object extraction result and history information obtained for the previous frame.
[0190]
In the embodiment of FIG. 20, a reset record button (Reset Record) 205 is newly displayed below the undo button 204 on the reference screen. The reset record button 205 is operated to discard the history information of the previous frame: when it is clicked with the mouse 9, the history information of the previous frame stored in the history management unit 4 becomes unusable, and when it is clicked again, that history information becomes usable again.
[0191]
The object extraction results for the new frame of interest displayed on result screens #1 to #3 are obtained by the object extraction unit 3 performing, for example, the following first to third initial extraction processes, respectively.
[0192]
That is, assume, for example, that a history image such as the one shown in FIG. 21A has been obtained for the previous frame. FIG. 21A shows the history image for the case where the portion displaying a person was extracted as an object from a previous frame showing the person's whole body. The history information in FIG. 21A indicates that the object was extracted using the S-plane image and a binarization threshold of 50 for the head region, the H-plane image and a binarization threshold of 100 for the torso region, and the V-plane image and a binarization threshold of 80 for the lower-body region. Accordingly, in the history image of FIG. 21, all pixels of the head region share one ID as their pixel value, as do all pixels of the torso region and all pixels of the lower-body region.
[0193]
In the first initial extraction process, for example, the processing control unit 7 first obtains, in the history image of the previous frame, the center of gravity of each region formed by a set of pixels having the same ID as their pixel value. In the embodiment shown in FIG. 21A, the centers of gravity of the head region, the torso region, and the lower-body region are thus obtained.
[0194]
If the coordinates of the center of gravity of a certain region are expressed as (X, Y), the center of gravity (X, Y) can be obtained, for example, according to the following equation.
[0195]
$$X = \frac{1}{N}\sum_{k=1}^{N} x_k, \qquad Y = \frac{1}{N}\sum_{k=1}^{N} y_k$$
Here, $N$ is the number of pixels constituting the region, and $(x_k, y_k)$ are the coordinates of the $k$-th pixel of the region.
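The centroid formula can be applied to every region of a history image at once. This is a minimal NumPy sketch; it assumes the history image is a 2-D integer array indexed [row, column] = [y, x] and, hypothetically, that ID 0 marks pixels that do not belong to the object. `region_centroids` is an illustrative name.

```python
import numpy as np


def region_centroids(history_image):
    """Centroid (X, Y) of every region of equal ID in a history image.

    history_image: 2-D integer array; each pixel holds the ID of the parameter
    set used to extract it (ID 0 is assumed here to mean "not object").
    """
    centroids = {}
    for region_id in np.unique(history_image):
        if region_id == 0:
            continue  # hypothetical convention: 0 marks background
        ys, xs = np.nonzero(history_image == region_id)
        # X = (1/N) * sum(x_k), Y = (1/N) * sum(y_k) over the region's N pixels
        centroids[int(region_id)] = (float(xs.mean()), float(ys.mean()))
    return centroids
```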
[0196]
When the centers of gravity (x1, y1), (x2, y2), and (x3, y3) of the head, torso, and lower-body regions have been obtained, as shown for example in FIG. 21B, the processing control unit 7 controls the motion detection unit 6 to obtain the motion vector $(v_x, v_y)$ of the frame of interest with respect to the center of gravity (x1, y1) of the previous frame. The processing control unit 7 then supplies the center of gravity (x1, y1) of the previous frame and the motion vector $(v_x, v_y)$ to its built-in position correction unit 71, which obtains the position (x1′, y1′) on the frame of interest corresponding to that center of gravity according to $(x_1', y_1') = (x_1, y_1) + (v_x, v_y)$.
[0197]
Thereafter, the processing control unit 7 acquires the ID stored as the pixel value of the pixel at the previous frame's center of gravity (x1, y1) by referring to the previous-frame history image stored in the history image storage unit 42, and acquires the parameter set associated with that ID by referring to the parameter table storage unit 43. The processing control unit 7 then decides to create a boundary image based on the acquired parameter set and to cut out from that boundary image the region starting at the position (x1′, y1′) on the frame of interest corresponding to the previous frame's center of gravity (x1, y1), and supplies decision information to that effect to the object extraction unit 3.
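The lookup-and-shift step can be sketched as follows. This is an illustration under stated assumptions, not the apparatus's code: the history image is a nested list of IDs indexed [y][x], the parameter table is a plain dict mapping each ID to a (plane, threshold) pair like those of FIG. 21A, the centroid is rounded to the nearest pixel before the lookup, and the motion vector is already known. `seed_from_history` is a hypothetical name.

```python
def seed_from_history(history_image, parameter_table, centroid, motion_vector):
    """Return (seed position on the frame of interest, parameter set) for one region.

    history_image:   nested list of IDs for the previous frame, indexed [y][x].
    parameter_table: dict mapping each ID to its parameter set, e.g. ("S", 50).
    centroid:        (x, y) centre of gravity of the region in the previous frame.
    motion_vector:   (vx, vy) from the previous frame to the frame of interest.
    """
    x, y = centroid
    vx, vy = motion_vector
    # Read the ID stored at the centroid pixel (rounded to the nearest pixel).
    region_id = history_image[int(round(y))][int(round(x))]
    # The ID selects the plane/threshold used when this pixel was extracted.
    params = parameter_table[region_id]
    # (x', y') = (x, y) + (vx, vy): the cut-out starts here on the new frame.
    seed = (x + vx, y + vy)
    return seed, params
```

One caveat: the centroid of a strongly non-convex region can fall outside the region, so the pixel read there may carry a different ID; a practical implementation might need a fallback for that case.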
[0198]
The processing control unit 7 performs the same processing for the remaining two centers of gravity (x2, y2) and (x3, y3) among the centers of gravity (x1, y1), (x2, y2), and (x3, y3) of the three regions shown in FIG.
[0199]
As a result, as shown in FIG. 21C, the object extraction unit 3 creates, for the frame of interest, boundary images based on the parameter sets given by the decision information and cuts out from them the regions starting at the positions (x1′, y1′), (x2′, y2′), and (x3′, y3′) on the frame of interest that correspond to the previous frame's centers of gravity (x1, y1), (x2, y2), and (x3, y3), thereby extracting the object. The object extraction result is stored in the result buffer 33A as shown in FIG. 21D and is displayed on result screen #1 as shown in FIG.
[0200]
Next, in the second initial extraction process, the object extraction unit 3 extracts an object from the frame of interest, for example, by performing template matching.
[0201]
That is, the object extraction unit 3 reads the final object extraction result of the previous frame and the frame of interest from the storage 1 via the frame-of-interest processing unit 2 and, as shown in FIG., overlaps the two and calculates the sum of absolute differences between the pixel values (for example, luminance) of corresponding pixels. As shown in FIG. 22B, the object extraction unit 3 shifts the position at which the previous frame's final object extraction result overlaps the frame of interest, for example one pixel at a time, computing the sum at each position, and finds the positional relationship that minimizes the sum of absolute differences. Then, as shown in FIG. 22C, the object extraction unit 3 detects, in that positional relationship, the pixels of the frame of interest whose pixel values differ from those of the previous frame's final object extraction result by, for example, 20 or less in absolute value, and writes their pixel values into the result buffer 33B as the object extraction result for the frame of interest. The object extraction result written into the result buffer 33B in this way is displayed on result screen #2 as shown in FIG.
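The second initial extraction process can be sketched with NumPy as an exhaustive sum-of-absolute-differences (SAD) search. Several details are assumptions made here rather than statements from the text: the previous result is given as a cropped luminance array plus a boolean object mask, the SAD is summed over object pixels only, every full-overlap position in the frame is tried, and pixels that are not kept are written as 0. `template_extract` is a hypothetical name; the limit 20 is the example value from the text.

```python
import numpy as np


def template_extract(prev_object, prev_mask, frame, max_diff=20):
    """SAD template matching of the previous result against the frame of interest.

    prev_object: 2-D luminance array, previous frame's final result (cropped).
    prev_mask:   boolean array of the same shape, True on object pixels.
    frame:       2-D luminance array of the frame of interest (row = y).
    Returns (frame-sized array of extracted pixel values, (top, left) offset).
    """
    th, tw = prev_object.shape
    fh, fw = frame.shape
    if th > fh or tw > fw:
        raise ValueError("template larger than frame")
    tpl = prev_object.astype(np.int64)
    best = None
    # Shift the overlap position one pixel at a time and keep the minimum SAD.
    for top in range(fh - th + 1):
        for left in range(fw - tw + 1):
            patch = frame[top:top + th, left:left + tw].astype(np.int64)
            sad = np.abs(patch - tpl)[prev_mask].sum()
            if best is None or sad < best[0]:
                best = (sad, top, left)
    _, top, left = best
    window = frame[top:top + th, left:left + tw]
    # Keep object pixels whose absolute difference is max_diff or less.
    keep = prev_mask & (np.abs(window.astype(np.int64) - tpl) <= max_diff)
    result = np.zeros_like(frame)
    result[top:top + th, left:left + tw][keep] = window[keep]
    return result, (top, left)
```

An exhaustive search is the simplest reading of "one pixel at a time"; restricting it to a window around the previous position would be a natural speed-up.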
[0202]
Next, in the third initial extraction process, the object extraction unit 3 extracts an object from the frame of interest, for example, in the same manner as described with reference to FIG. This object extraction result is written into the result buffer 33C and displayed on result screen #3 as shown in FIG.
[0203]
Because the initial extraction process described above runs automatically, without waiting for user input, once the frame of interest has been changed to a new frame, the burden of user operations such as “position designation” and “order designation” can be reduced.
[0204]
However, suppose the user performs “acquire all” or “partial acquisition” on one of the three object extraction results obtained by the initial extraction process, so that all or part of it is reflected in the object buffer 23 and used as the final object extraction result of the frame of interest. No history information then exists for the portion of the final result that came from the initial extraction process. Consequently, when the next frame becomes the frame of interest, history information for its previous frame is missing, and the user's operation burden may increase.
[0205]
Therefore, the history information of the previous frame can be inherited as the history information of the next frame.
[0206]
That is, suppose, for example, that, as shown in FIG. 23A, after the final object extraction result has been obtained for the previous frame, the initial extraction process is performed, its object extraction results for the frame of interest are displayed on the result screens, and part of one result is reflected in the object buffer 23 by the user's “partial acquisition” operation.
[0207]
In this case, the processing control unit 7 controls the motion detection unit 6 to obtain the motion vector of the object portion reflected in the object buffer 23 and, as shown in FIG., uses that motion vector to correct the position of the portion of the previous frame's history image that corresponds to the region reflected in the object buffer 23. The processing control unit 7 then controls the history management unit 4 to copy the corrected portion of the history image as the history image of the frame of interest.
[0208]
The processing control unit 7 also determines, by referring to the designated position storage unit 41, whether each position on the previous frame that the user designated by clicking the mouse 9 lies within the range of the previous frame's history image that was copied as the history image of the frame of interest. If it does, the processing control unit 7 corrects that designated position with the motion vector described with reference to FIG., as shown in FIG. 23C, and controls the history management unit 4 to store the corrected coordinates in the designated position storage unit 41 as a designated position of the frame of interest.
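The inheritance described in paragraphs [0207] and [0208] can be sketched as a masked shift of the ID map plus a filtered shift of the designated points. For illustration this assumes an integer motion vector, a history image stored as a 2-D integer array with 0 meaning "no history", and a boolean mask (in previous-frame coordinates) marking the part of the history image whose counterpart was reflected in the object buffer 23. `inherit_history` is a hypothetical name.

```python
import numpy as np


def inherit_history(prev_history, reflected_mask, motion_vector, prev_points):
    """Carry history from the previous frame over to the frame of interest.

    prev_history:   2-D int array of parameter-set IDs (0 = no history).
    reflected_mask: boolean array marking the history region to be copied.
    motion_vector:  integer (vx, vy) from the previous frame to the new one.
    prev_points:    (x, y) positions the user designated on the previous frame.
    """
    vx, vy = motion_vector
    h, w = prev_history.shape
    new_history = np.zeros_like(prev_history)
    ys, xs = np.nonzero(reflected_mask)
    nx, ny = xs + vx, ys + vy
    # Drop pixels that the motion vector moves outside the frame.
    inside = (nx >= 0) & (nx < w) & (ny >= 0) & (ny < h)
    new_history[ny[inside], nx[inside]] = prev_history[ys[inside], xs[inside]]
    # Only designated points inside the copied region are carried over, shifted.
    new_points = [(x + vx, y + vy) for x, y in prev_points
                  if 0 <= y < h and 0 <= x < w and reflected_mask[y, x]]
    return new_history, new_points
```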
[0209]
This inheritance of history information is performed when an object extraction result obtained by the first or third of the first to third initial extraction processes described above is reflected in the object buffer 23, and is not performed when the object extraction result obtained by the second initial extraction process is reflected.
[0210]
This is because the first and third initial extraction processes extract the object from the frame of interest based on the history information of the previous frame, as described above. When their result is reflected in the object buffer 23, it is therefore highly likely that, even if object extraction for the frame of interest had been performed without the previous frame's history information, a result obtained by the same processing as for the previous frame would have become the final object extraction result, and hence that history information similar to that of the previous frame would have been generated.
[0211]
In the second initial extraction process, by contrast, the object is extracted from the frame of interest by template matching, as described above, so the reflected portion is extracted independently of the previous frame's history information. Even when that result is reflected in the object buffer 23, it therefore cannot be said that object extraction without the previous frame's history information would likely have produced, as the final result, a result obtained by the same processing as for the previous frame, and hence it cannot be said that history information similar to that of the previous frame would likely have been created.
[0212]
For this reason, as described above, history information can be inherited only when an object extraction result obtained by the first or third initial extraction process is reflected in the object buffer 23, and not when the object extraction result obtained by the second initial extraction process is reflected.
[0213]
However, history information may also be inherited when an object extraction result obtained by any of the first to third initial extraction processes is reflected in the object buffer 23.
[0214]
As described above, a plurality of object extraction results are obtained by a plurality of processes, the user judges which of them are good, and those are reflected in the final object extraction result, so accurate object extraction can be performed with an easy operation.
[0215]
Furthermore, when there is history information of the previous frame, the object extraction of the frame of interest is performed based on the history information and the position on the frame of interest input by the user. Thus, accurate object extraction can be performed.
[0216]
That is, in the image processing apparatus of FIG. 2, when the user designates a position on the object in the frame of interest, regions of the frame of interest are cut out by three processes (object extraction) starting from that position, and the three resulting object extraction results are displayed. When the user, as needed, ranks these three results, the regions are cut out again by three processes based on the ranking, and the three new results are displayed. When the user designates a suitable one of the three results, it is reflected in the final object extraction result. The user can therefore extract the object from the frame of interest accurately by repeating, as many times as necessary, the low-burden operations of designating a position on the object, ranking the results when needed, and designating a suitable result.
[0217]
Further, when the next frame becomes the frame of interest, the history information created for the previous frame is referred to in order to identify the parameter set that was used to extract, as part of the object, the pixel of the previous frame corresponding to the pixel at the position the user input on the frame of interest, and the contents of the three object extraction processes are determined from that parameter set. The user can therefore obtain a good object extraction result for the frame of interest simply by performing the low-burden operation of designating a position on the object in the frame of interest.
[0218]
In the present embodiment, the user designates a position on the object in the frame of interest; however, the user may instead designate, for example, a range covering part of the object in the frame of interest.
[0219]
In this embodiment, object extraction for the frame of interest is based on the history information of the previous frame only; however, it may instead be based on, for example, weighted history information of several past frames. Also, in the present embodiment the frames are assumed to be processed in time order, so the history information of the previous frame is used to extract the object from the frame of interest; when the frames are processed, for example, in reverse time order, the history information of the temporally following frame can be used instead.
[0220]
Furthermore, in the present embodiment, an object is extracted from each frame of a moving image, but the present invention can also be applied to object extraction from a still image.
[0221]
In addition to extracting a so-called foreground portion, the present invention can also be applied to, for example, extracting some constituent elements of the background.
[0222]
Furthermore, the object extraction process described in the present embodiment is an exemplification, and what kind of object extraction process is adopted is not particularly limited.
[0223]
The present invention can be widely applied to image processing apparatuses such as broadcast systems and editing systems.
【The invention's effect】
According to one aspect of the present invention, accurate object extraction can be performed with an easy operation.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a hardware configuration example of an embodiment of an image processing apparatus to which the present invention has been applied.
FIG. 2 is a block diagram illustrating a functional configuration example of the image processing apparatus in FIG. 1;
FIG. 3 is a diagram showing a display example of a screen on the display unit 5;
FIG. 4 is a flowchart illustrating processing of the image processing apparatus in FIG.
FIG. 5 is a diagram for explaining switching of display on the basic screen.
FIG. 6 is a diagram for explaining “undo” and “partial deletion”;
FIG. 7 is a diagram showing how a user designates a point on an object.
FIG. 8 is a diagram for explaining “all acquisition” and “partial acquisition”;
FIG. 9 is a block diagram illustrating a configuration example of the boundary detection unit 31.
FIG. 10 is a diagram for explaining thinning processing;
FIG. 11 is a diagram illustrating a boundary image.
FIG. 12 is a flowchart for explaining processing of the cutout unit 32.
FIG. 13 is a diagram for explaining processing of the cutout unit 32;
FIG. 14 is a diagram for explaining update of history information;
FIG. 15 is a diagram for explaining object extraction based on history information;
FIG. 16 is a diagram for explaining object extraction based on history information;
FIG. 17 is a flowchart for explaining processing of the processing control unit;
FIG. 18 is a flowchart explaining in more detail the processing of steps S53 to S55 in FIG.
FIG. 19 is a diagram for explaining threshold update;
FIG. 20 is a diagram showing a display example of a screen on the display unit 5.
FIG. 21 is a diagram for explaining a first initial extraction process;
FIG. 22 is a diagram for explaining a second initial extraction process;
FIG. 23 is a diagram for explaining inheritance of history information;
[Explanation of symbols]
1 storage, 2 attention frame processing section, 3 object extraction section, 4 history management section, 5 display section, 6 motion detection section, 7 processing control section, 8 event detection section, 9 mouse, 21 attention frame buffer, 22 background buffer, 23 object buffer, 24 selector, 31 boundary detection unit, 32 cutout unit, 32A to 32C output buffer, 33 result processing unit, 33A to 33C result buffer, 41 designated position storage unit, 42 history image storage unit, 43 parameter table storage unit, 61 previous frame buffer, 71 position correction unit, 101 bus, 102 CPU, 103 ROM, 104 RAM, 105 hard disk, 106 output unit, 107 input unit, 108 communication unit, 109 drive, 110 input/output interface, 111 removable recording medium, 201 Change display button, 202 Use record button, 203 Delete partly button, 204 Undo button, 205 Reset record button, 206 Rank result button, 207 Grab all button, 208 Grab part button, 211 HSV separation unit, 212H, 212S, 212V edge detection unit, 213H, 213S, 213V binarization unit, 214H, 214S, 214V thinning unit, 215H, 215S, 215V boundary image storage unit

Claims

An image processing apparatus for extracting a predetermined object from an image,
Object extraction means for extracting an object from an image of a screen of interest by a plurality of processes;
Selection means for selecting, based on an input from a user, the object extraction result to be reflected in a final object extraction result from among the object extraction results of the plurality of processes;
Reflecting means for reflecting the object extraction result selected by the selection means in the final object extraction result;
Determination means for determining the content of the process for extracting the object from the screen of interest based on a processing history, which is the content of the processing used to extract the object reflected in the final object extraction result, and on the input from the user; and
Processing history storage means for storing the processing history, wherein
The plurality of processes are processes using different thresholds,
The history information includes a threshold value used for extracting the object for each pixel constituting the object,
The determination means
corrects a predetermined point on the screen of interest input by the user with a motion vector between the screen of interest and a previous screen processed temporally before the screen of interest, thereby obtaining the point on the previous screen corresponding to the predetermined point on the screen of interest,
obtains, based on the processing history for the previous screen, a predetermined threshold used when the pixel at the point on the previous screen corresponding to the predetermined point on the screen of interest was extracted as an object, and
determines the predetermined threshold and a value obtained by a predetermined calculation using the predetermined threshold as the thresholds used in the plurality of processes for extracting an object including the predetermined point on the screen of interest.
Image processing apparatus.

When an object extracted from the screen of interest based on the processing history for the previous screen has been reflected in the final object extraction result for the screen of interest, the processing history storage means inherits the processing history by copying, as the processing history for the screen of interest, the threshold of each pixel constituting the object included in the processing history for the previous screen, as the threshold of the pixel at the position moved by the motion vector between the screen of interest and the previous screen.
The image processing apparatus according to claim 1.

Input history storage means is further provided for storing an input history, which is a history of the coordinates of the points input by the user when extracting the object reflected in the final object extraction result, and
the determination means, before the user inputs a point on the screen of interest,
obtains, based on the input history for the previous screen, the point on the screen of interest corresponding to a predetermined point input by the user on the previous screen by correcting that predetermined point with the motion vector between the screen of interest and the previous screen,
obtains, based on the processing history for the previous screen, the threshold used when the pixel at the predetermined point on the previous screen was extracted as an object, and
determines the threshold used when the pixel at the predetermined point on the previous screen was extracted as an object as the threshold used in the process of extracting the object including the point on the screen of interest corresponding to the predetermined point on the previous screen.
The image processing apparatus according to claim 1.

When an object extracted from the screen of interest based on the input history for the previous screen has been reflected in the final object extraction result for the screen of interest, the input history storage means inherits the input history by copying, as the input history for the screen of interest, the point on the screen of interest corresponding to the predetermined point on the previous screen, obtained by correcting the predetermined point in the input history for the previous screen with the motion vector between the screen of interest and the previous screen.
The image processing apparatus according to claim 3.

An image processing method for extracting a predetermined object from an image,
An object extraction step of extracting an object from an image of a screen of interest by a plurality of processes;
A selection step of selecting, based on an input from the user, the object extraction result to be reflected in a final object extraction result from among the object extraction results of the plurality of processes;
A reflecting step of reflecting the object extraction result selected in the selection step in the final object extraction result;
A determination step of determining the content of the processing for extracting the object from the screen of interest based on a processing history, which is the content of the processing used to extract the object reflected in the final object extraction result, and on the input from the user; and
A processing history storage step of storing the processing history, wherein
The plurality of processes are processes using different thresholds,
The history information includes a threshold value used for extracting the object for each pixel constituting the object,
In the determining step,
By correcting a predetermined point on the screen of interest input by the user with a motion vector between the screen of interest and a previous screen processed temporally before it, the point on the previous screen corresponding to that predetermined point is obtained;
Based on the processing history for the previous screen, a predetermined threshold value is obtained, namely the threshold value used when the pixel at that point on the previous screen was extracted as an object;
The predetermined threshold value and a value obtained by a predetermined calculation using the predetermined threshold value are determined as the threshold values used in the plurality of processes for extracting an object including the predetermined point on the screen of interest.
An image processing method.
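The determining step can be illustrated end to end with the sketch below. The claim leaves both the "predetermined calculation" and the extraction process itself unspecified, so two assumptions are made here and both are hypothetical: the extra thresholds are the inherited threshold scaled down and up by 25%, and each "process" is a 4-connected region grow that accepts pixels whose value differs from the clicked pixel by at most the threshold. The motion vector is again taken as the displacement from the previous screen to the screen of interest, so the previous-screen point is found by subtracting it.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]
Image = List[List[float]]  # image[y][x]


def previous_point(point: Point, motion: Tuple[int, int]) -> Point:
    # Assumed convention: p_current = p_previous + motion, so subtract it.
    return (point[0] - motion[0], point[1] - motion[1])


def candidate_thresholds(base: float, step: float = 0.25) -> List[float]:
    # Hypothetical "predetermined calculation": scale the inherited threshold
    # down and up by a fixed fraction, keeping the inherited value itself.
    return [base * (1 - step), base, base * (1 + step)]


def region_grow(image: Image, seed: Point, threshold: float) -> Set[Point]:
    """4-connected region grow: keep pixels within `threshold` of the seed value."""
    h, w = len(image), len(image[0])
    ref = image[seed[1]][seed[0]]
    region: Set[Point] = set()
    stack = [seed]
    while stack:
        x, y = stack.pop()
        if (x, y) in region or not (0 <= x < w and 0 <= y < h):
            continue
        if abs(image[y][x] - ref) > threshold:
            continue
        region.add((x, y))
        stack.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
    return region


def determine_and_extract(
    image: Image,
    clicked: Point,
    motion: Tuple[int, int],
    prev_threshold_history: Dict[Point, float],
) -> Dict[float, Set[Point]]:
    """Inherit a threshold via the previous screen and run one extraction per candidate."""
    base = prev_threshold_history[previous_point(clicked, motion)]
    return {t: region_grow(image, clicked, t) for t in candidate_thresholds(base)}
```

Offering the inherited threshold alongside slightly tighter and looser variants gives the user several candidate results to choose from in the selection step, which is the role the claim assigns to the "plurality of processes".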

A recording medium on which a program for causing a computer to perform image processing for extracting a predetermined object from an image is recorded,
Object extracting means for extracting an object by a plurality of processes from the image of a screen of interest;
Selection means for selecting, based on input from a user, which of the object extraction results obtained by the plurality of processes is to be reflected in the final object extraction result;
Reflecting means for reflecting the object extraction result selected by the selection means in the final object extraction result;
Determination means for determining the content of processing for extracting an object from the screen of interest based on the input from the user and on the processing history, which is the content of the processing used to extract the object reflected in the final object extraction result; and
Processing history storage means for storing the processing history;
The program causes the computer to function as each of the above means,
The plurality of processes are processes using different thresholds,
The history information includes a threshold value used for extracting the object for each pixel constituting the object,
The determining means:
By correcting a predetermined point on the screen of interest input by the user with a motion vector between the screen of interest and a previous screen processed temporally before it, obtains the point on the previous screen corresponding to that predetermined point;
Based on the processing history for the previous screen, obtains a predetermined threshold value, namely the threshold value used when the pixel at that point on the previous screen was extracted as an object;
Determines the predetermined threshold value and a value obtained by a predetermined calculation using the predetermined threshold value as the threshold values used in the plurality of processes for extracting an object including the predetermined point on the screen of interest.
A recording medium on which the program is recorded.
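The selection, reflection, and processing-history storage means listed above can be sketched as a single helper. This is a minimal illustration, assuming that the candidate results are keyed by the threshold that produced them, that the user's choice arrives as one of those thresholds, and that the processing history stores one threshold per pixel. Overwriting the recorded threshold of a pixel that is already part of the final object is an assumption of this sketch, not something the claim specifies. The name `reflect_selection` is hypothetical.

```python
from typing import Dict, Set, Tuple

Point = Tuple[int, int]


def reflect_selection(
    results: Dict[float, Set[Point]],
    selected_threshold: float,
    final_object: Set[Point],
    processing_history: Dict[Point, float],
) -> None:
    """Merge the user-selected extraction result into the final object in place
    and record, per pixel, the threshold that extracted it.

    Assumption: if a pixel is already in the final object, its recorded
    threshold is overwritten by the latest selection.
    """
    chosen = results[selected_threshold]  # selection driven by user input
    for p in chosen:
        final_object.add(p)  # reflection into the final extraction result
        processing_history[p] = selected_threshold  # per-pixel processing history
```

The per-pixel history written here is the kind of record that the determining means later reads back, after motion-vector correction, when it derives thresholds for the next screen.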