JP7080614B2 - Image processing equipment, image processing system, image processing method, and program

Info

Publication number: JP7080614B2
Application number: JP2017191753A
Authority: JP
Inventors: 剛史古川
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-09-29
Filing date: 2017-09-29
Publication date: 2022-06-06
Anticipated expiration: 2037-09-29
Also published as: JP2019067129A

Description

The present invention relates to a technique for extracting a specific region from a captured image.

There is a technique called foreground background separation, which extracts a region containing a predetermined object (subject) in a captured image from that image as a foreground region. With this technique, for example, an image of a moving person included in a captured image can be obtained automatically. Methods of foreground background separation include the background subtraction method, which extracts the foreground region based on the difference between the captured image and a background image stored in advance, and the inter-frame difference method, which extracts the foreground region based on the differences among a plurality of consecutively captured images.

Patent Document 1 describes preventing erroneous detection of a moving object in a captured image by updating the background image used in the background subtraction method according to changes in the brightness of the shooting environment.

Japanese Unexamined Patent Publication No. 2000-324477

However, with the conventional technique, when extraction is based on the differences among a plurality of images captured at a plurality of different timings, the region of the predetermined subject to be extracted and another region may be extracted without distinction. For example, when a captured image contains both a moving object to be extracted and a display whose content changes over time, the region of the moving object and the region of the display are extracted in the same way.

The present invention has been made in view of the above problem. Its object is to prevent the region of the predetermined subject to be extracted and another region from being extracted without distinction when extraction is based on the differences among a plurality of images captured at a plurality of different timings.

In order to solve the above problem, an image processing device according to the present invention has, for example, the following configuration. That is, the image processing device extracts a subject region corresponding to a predetermined subject from an image based on photographing by a photographing device, and includes: specifying means for specifying, based on the differences among a plurality of images based on photographing by the photographing device at a plurality of timings within a first period designated by a first predetermined operation, a region that lies inside an extraction target image and is not to be a target of extraction of the subject region, the extraction target image being based on photographing by the photographing device within a second period designated by a second predetermined operation; and extraction means for extracting the subject region inside the extraction target image based on the difference between the extraction target image and another image based on photographing by the photographing device at a timing different from the photographing timing of the extraction target image, the extraction means extracting the subject region composed of pixels that are not included in the region specified by the specifying means.

According to the present invention, it is possible to prevent the predetermined subject to be extracted and another region from being extracted without distinction when extraction is based on the differences among a plurality of images captured at a plurality of different timings.

A block diagram showing the configuration of the image processing system 100.
A diagram for explaining changes in the captured image in the embodiment.
A diagram for explaining the background change region in the embodiment.
A flowchart for explaining the process by which the image processing device 100 detects the background change region.
A diagram for explaining the background image and the foreground image in the embodiment.
A flowchart for explaining the foreground background separation process performed by the image processing device 100.

[System configuration]
FIG. 1(a) is a diagram for explaining a schematic configuration of an image processing system 10 according to an embodiment. The image processing system 10 includes an image processing device 100, a photographing device 110, and an image processing server 120.

The photographing device 110 generates a captured image by photographing and inputs the captured image to the image processing device 100. The photographing device 110 is, for example, a digital video camera equipped with an image signal interface, such as a serial digital interface (SDI), for inputting the captured image. Note that the captured image in the present embodiment includes an image on which image processing such as filtering or resolution conversion has been performed after photographing.

The image processing device 100 performs image processing on the captured image input from the photographing device 110 and extracts the foreground region from the captured image, thereby separating the captured image into a foreground region and a background region. In the present embodiment, this process is called foreground background separation. In the present embodiment, the foreground region is a subject region corresponding to a predetermined object (subject) in the captured image, and the background region is a region that does not correspond to the predetermined subject. For example, the image processing device 100 acquires a captured image taken by the photographing device 110 in a stadium where a soccer game is being played, and separates the acquired image into a foreground region containing predetermined subjects such as players, referees, and the ball, and a background region containing the field surface, the spectator seats, and the like. The image processing device 100 then outputs a foreground image based on the foreground region and a background image based on the background region to the image processing server 120.

The detailed configuration of the image processing device 100 will be described later. The subject photographed by the photographing device 110 is not limited to soccer; it may be another sport such as rugby or sumo, or a live performance on a stage. Likewise, the predetermined subject extracted as the foreground region by the image processing device 100 is not limited to players or a ball.

The image processing server 120 performs image processing based on the images input from the image processing device 100. For example, the image processing server 120 acquires the foreground image and the background image from the image processing device 100 via a network cable, generates an image for display, and displays it on a display unit (not shown).

In the present embodiment, the image processing system 10 has a plurality of photographing devices 110 and a plurality of image processing devices 100, as shown in FIG. 1(a). The plurality of photographing devices 110 are installed, for example, in a stadium to be photographed, and each photographs from a different direction. Each of the plurality of image processing devices 100 acquires captured images from its corresponding photographing device 110, performs foreground background separation, and outputs the foreground image and the background image to the image processing server 120. That is, the image processing server 120 acquires a plurality of foreground images and a plurality of background images obtained by extraction processing on each of the images captured by the plurality of photographing devices. The image processing server 120 then generates a virtual viewpoint image containing the predetermined subject extracted as the foreground region. Although FIG. 1(a) shows two photographing devices 110 in the image processing system 10, the number of photographing devices 110 is not limited to this and may be three or more.

The virtual viewpoint image in the present embodiment represents the image that would be obtained if the subject were photographed from a virtual viewpoint. In other words, a virtual viewpoint image is an image representing the view from a designated viewpoint. The virtual viewpoint may be designated, for example, by a user of the image processing server 120, or may be designated automatically based on the result of image analysis or the like. That is, virtual viewpoint images include an arbitrary viewpoint image (free viewpoint image) corresponding to a viewpoint arbitrarily designated by the user. An image corresponding to a viewpoint that the user designates from among a plurality of candidates, and an image corresponding to a viewpoint designated automatically by the device, are also included in virtual viewpoint images. In the present embodiment, unless otherwise specified, the term "image" covers both moving images and still images. That is, the image processing system 10 of the present embodiment can process both still images and moving images.

To generate the virtual viewpoint image, the image processing server 120 acquires viewpoint information corresponding to the designation of the virtual viewpoint. The image processing server 120 also generates a three-dimensional model of the predetermined subject extracted as the foreground region, based on a plurality of foreground images acquired from the plurality of image processing devices 100 corresponding to the plurality of photographing devices 110 with different shooting directions. A known method, such as a method using Visual Hull, is used to generate the three-dimensional model. The image processing server 120 then performs rendering based on the acquired viewpoint information, the three-dimensional model, and the background image, and generates a virtual viewpoint image containing the predetermined subject.

The method by which the image processing server 120 generates the virtual viewpoint image is not limited to one using a three-dimensional model and may be another method. For example, the image processing server 120 may generate the virtual viewpoint image by applying a projective transformation to each of the acquired foreground image and background image based on the viewpoint information and compositing the transformed foreground image and background image. Further, the processing performed by the image processing server 120 is not limited to generating images such as virtual viewpoint images. It may be, for example, processing that displays the acquired foreground image itself, or processing that associates the foreground image, the background image, the three-dimensional model, and the like with one another and outputs them to an external database.

As described above with reference to FIG. 1(a), in the image processing system 10 of the present embodiment, foreground background separation for the images captured by the plurality of photographing devices 110 is distributed across the plurality of image processing devices 100. Compared with performing foreground background separation collectively on the image processing server 120, this reduces the load on the image processing server 120 and reduces the processing delay of the image processing system 10 as a whole. The configuration of the image processing system 10 is not limited to the one described above. For example, a single image processing device 100 may acquire captured images from a plurality of photographing devices 110 and perform foreground background separation on each captured image. Further, the image processing device 100 and the image processing server 120 may be configured as a single unit, or the components of the image processing device 100 described later may be divided among a plurality of devices.

[Device configuration]
FIG. 1(b) is a diagram for explaining the hardware configuration of the image processing device 100 according to the present embodiment. The configuration of the image processing server 120 is the same as that of the image processing device 100. The image processing device 100 includes a CPU 111, a RAM 112, a ROM 113, an input unit 114, an external interface 115, and an output unit 116.

The CPU 111 controls the entire image processing device 100 using computer programs and data stored in the RAM 112 and the ROM 113. The image processing device 100 may have one or more pieces of dedicated hardware different from the CPU 111, or a GPU (Graphics Processing Unit), and the GPU or the dedicated hardware may perform at least part of the processing otherwise performed by the CPU 111. Examples of dedicated hardware include an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array), and a DSP (digital signal processor). The RAM 112 temporarily stores computer programs and data read from the ROM 113, data supplied from outside via the external interface 115, and the like. The ROM 113 holds computer programs and data that do not need to be changed.

The input unit 114 is composed of, for example, operation buttons, a jog dial, a touch panel, a keyboard, and a mouse; it accepts operations by the user and inputs various instructions to the CPU 111. The external interface 115 communicates with external devices such as the photographing device 110 and the image processing server 120. Communication with an external device may be performed by wire using a LAN (Local Area Network) cable, an SDI cable, or the like, or wirelessly via an antenna. The output unit 116 is composed of, for example, a display unit such as a display and an audio output unit such as a speaker; it displays a GUI (Graphical User Interface) with which the user operates the image processing device 100 and outputs guide audio.

Next, the functional configuration of the image processing device 100 shown in FIG. 1(a) will be described in detail. The image processing device 100 has a foreground background separation unit 101 (hereinafter, separation unit 101), a change region detection unit 102 (hereinafter, detection unit 102), and a communication unit 103. Each of these functional units is realized by the CPU 111 loading a program stored in the ROM 113 into the RAM 112 and executing it. Note that at least some of the functional units of the image processing device 100 shown in FIG. 1(a) may be realized by one or more pieces of dedicated hardware different from the CPU 111, or by a GPU.

The separation unit 101 performs foreground background separation on the captured image input from the photographing device 110 and outputs the foreground image and the background image to the communication unit 103. In the present embodiment, the separation unit 101 extracts the foreground region in the captured image using the background subtraction method. In the background subtraction method, a captured image containing the subject to be extracted is compared with a background image that is stored in advance and does not contain the subject, and the region in which the difference in pixel values is larger than a threshold is extracted. For example, comparing a captured image taken during a match, when the players to be extracted are within the shooting range, with a background image taken before the match, when no players are within the shooting range, extracts the region of the captured image corresponding to the players. The method of foreground background separation used by the separation unit 101 is not limited to this; for example, the inter-frame difference method may be used. In the inter-frame difference method, a region is extracted based on the differences among a plurality of images captured consecutively by the same photographing device.
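The comparison performed in the background subtraction method can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the implementation of the separation unit 101: the function names, the use of the largest per-channel absolute difference as the pixel difference, the pixel values, and the threshold of 50 are all hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels


def pixel_difference(a: Pixel, b: Pixel) -> int:
    """Largest per-channel absolute difference (an assumed difference measure)."""
    return max(abs(x - y) for x, y in zip(a, b))


def difference_mask(image: Image, reference: Image, threshold: int) -> List[List[bool]]:
    """True where the image differs from the reference by more than the threshold."""
    return [
        [pixel_difference(p, q) > threshold for p, q in zip(row, ref_row)]
        for row, ref_row in zip(image, reference)
    ]


# Hypothetical 2x3 scene: field everywhere, plus one "player" pixel in the captured image.
FIELD: Pixel = (30, 120, 40)
PLAYER: Pixel = (200, 200, 200)
background = [[FIELD, FIELD, FIELD], [FIELD, FIELD, FIELD]]
captured = [[FIELD, PLAYER, FIELD], [FIELD, FIELD, FIELD]]

foreground_mask = difference_mask(captured, background, threshold=50)
```

Passing a stored, subject-free background image as `reference` corresponds to the background subtraction method. Passing the previous frame from the same photographing device instead would correspond to the inter-frame difference method.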

If the separation unit 101 used the conventional background subtraction method as is, it might not be able to extract only the region of the subject to be extracted. For example, consider a case in which the region showing a player is to be extracted from a captured image of a soccer game in a stadium. FIG. 5(a) shows an example of a captured image 5000 that is the target of the extraction process. In addition to a player 5001 and a field 2001, the captured image 5000 shows a display 2002, a display 2003, and a display 2004 for advertisements, installed beside the field. Each of the displays 2002-2004 shows an image that changes over time.

In this case, when the background subtraction method is executed on the captured image 5000 using an image captured in advance, as shown in FIG. 5(b), as a background image 5100, a difference region image 5200 as shown in FIG. 5(c) is obtained. Specifically, the difference region image 5200 in the present embodiment is an image composed of those pixels in the captured image 5000 whose pixel value differs from that of the corresponding pixel in the background image 5100 by more than a threshold. However, the difference region image 5200 need only be an image of the region in which the difference between the captured image 5000 and the background image 5100 is equal to or greater than a threshold, and is not limited to one specified by calculating the difference in pixel values pixel by pixel. For example, the difference between the captured image 5000 and the background image 5100 may be calculated for each block composed of a plurality of pixels. The difference between corresponding blocks in a plurality of images may be calculated using the average of the pixel values in the block, or using the mode of the pixel values in the block, or the like.
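The block-wise variant can be sketched as follows, using the average pixel value of each block. This is a minimal sketch assuming single-channel (grayscale) images for brevity; the block size of 2, the threshold of 50, the pixel values, and the function names are hypothetical and do not come from the embodiment.

```python
from typing import List

Gray = List[List[int]]  # single-channel image, rows of pixel values


def block_means(image: Gray, block: int) -> List[List[float]]:
    """Average pixel value of each block x block tile (edge tiles may be smaller)."""
    height, width = len(image), len(image[0])
    means = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            values = [
                image[y][x]
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            row.append(sum(values) / len(values))
        means.append(row)
    return means


def block_difference_mask(image: Gray, background: Gray, block: int, threshold: float) -> List[List[bool]]:
    """True for blocks whose mean differs from the background's by at least the threshold."""
    return [
        [abs(a - b) >= threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(block_means(image, block), block_means(background, block))
    ]


background = [[10] * 4 for _ in range(4)]
captured = [
    [10, 10, 10, 10],
    [10, 90, 10, 10],      # a single deviating pixel in the top-left block
    [10, 10, 100, 100],
    [10, 10, 100, 100],    # the bottom-right block differs as a whole
]
mask = block_difference_mask(captured, background, block=2, threshold=50)
```

In this example the single deviating pixel raises the mean of its block by only 20, so that block stays below the threshold, while the block that differs as a whole is marked. Averaging over a block therefore trades spatial resolution for robustness to isolated pixel differences.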

As shown in FIG. 5(c), when the conventional background subtraction method is executed, the regions corresponding to the display surfaces of the displays 2002-2004 are extracted in the same way as the region corresponding to the player 5001. If these regions are extracted in the same way, the image quality of the virtual viewpoint image that the image processing server 120 generates from the extraction result may deteriorate. For example, suppose the image processing device 100 generates a difference region image from each of a plurality of captured images obtained by photographing the field 2001 and its surroundings from a plurality of directions with a plurality of photographing devices. The player 5001 is then included in every difference region image. The display surfaces of the displays 2002-2004, on the other hand, are not included in a captured image taken from behind the displays 2002-2004, and so are not included in the difference region image generated from that captured image. It is difficult to accurately associate the pixels of a difference region image that includes both the player 5001 and the display surfaces with the pixels of a difference region image that includes only the player 5001. Consequently, if the image processing server 120 attempts to generate a three-dimensional model of the player 5001 from these difference region images, the accuracy of the model deteriorates. As a result, the image quality of the virtual viewpoint image also deteriorates.

The image processing device 100 in the present embodiment therefore includes the detection unit 102 in order to obtain a foreground image 5300, as shown in FIG. 5(d), that includes the region corresponding to the player 5001 and does not include the regions corresponding to the display surfaces of the displays 2002-2004. The detection unit 102 detects a background change region in the captured image. The background change region is a region that is not to be a target of extraction as the foreground region. More specifically, the background change region is a region that should be identified as background, that is, a region that should not be extracted as the foreground region, yet changes over time. In the example shown in FIG. 5 above, the regions corresponding to the display surfaces of the displays 2002-2004 are the background change region. However, the subject corresponding to the background change region is not limited to this; it may be any subject that is different from the predetermined subject to be extracted and that changes over time. The method of detecting the background change region will be described later.

When the detection unit 102 detects the background change region, it provides information indicating the detected region to the separation unit 101. The separation unit 101 then performs foreground background separation on the captured image acquired from the photographing device 110 using the information acquired from the detection unit 102, and can thereby extract the region of the subject to be extracted as the foreground region. The details of the foreground background separation performed by the separation unit 101 will be described later.
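One way to picture how the separation unit 101 might use the information from the detection unit 102 is to remove the background change region from the difference region, so that the resulting foreground region consists only of pixels outside the background change region. The sketch below assumes that both regions are represented as boolean masks of the same size; the representation, the function name, and the mask values are hypothetical, and the actual separation processing is described later in the document.

```python
from typing import List

Mask = List[List[bool]]


def exclude_region(difference: Mask, background_change: Mask) -> Mask:
    """Keep only difference pixels that lie outside the background change region."""
    return [
        [diff and not change for diff, change in zip(diff_row, change_row)]
        for diff_row, change_row in zip(difference, background_change)
    ]


# Hypothetical masks: the top row covers a display surface, the centre pixel a player.
difference = [
    [True, True, False],
    [False, True, False],
]
background_change = [
    [True, True, False],
    [False, False, False],
]
foreground = exclude_region(difference, background_change)
```

Here the pixels that changed on the display surface are dropped, and only the pixel attributed to the player remains in the foreground mask.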

The communication unit 103 transmits the foreground image and the background image input from the separation unit 101 to the image processing server 120. The communication unit 103 is composed of, for example, a LAN card equipped with a high-speed serial interface such as PCI Express.

[Detection of background change region]
Next, detection of the background change region will be described. FIGS. 2(a) to 2(c) show examples of images captured by the photographing device 110. FIG. 2(a) is a captured image 2000 taken by the photographing device 110 at time T, FIG. 2(b) is a captured image 2100 taken by the photographing device 110 at time T+1, and FIG. 2(c) is a captured image 2200 taken by the photographing device 110 at time T+2.

The subject being photographed is a soccer game, and the captured image 2000 shows the field 2001 and the displays 2002-2004 for advertisements. A line 2005 indicating the goal area is drawn in the field 2001. The captured images 2000-2200 were taken in a situation in which no person such as a player is within the shooting range, for example while preparations for the match are under way in the stadium, and in which the display surfaces of the displays 2002-2004 are within the shooting range. At the time of this photographing, a rehearsal of the advertisement display is in progress, so the images shown on the displays 2002-2004 change.

The display 2002 for advertisements shows an advertisement image 2012 at time T and an advertisement image 2112 at time T+1. For example, the advertisement image 2112 is an image partway through the transition in which the advertisement image 2012 changes to the next advertisement image 2212 with a vertical scrolling effect. Similarly, the advertisement images 2013 and 2014 change to advertisement images 2113 and 2114, respectively. The advertisement image 2113 is an image partway through the transition in which the advertisement image 2013 changes to the next advertisement image 2213 with a horizontal scrolling effect.

FIG. 3(a) shows an example of the difference in pixel values between the captured image 2000 and the captured image 2200. For example, the pixel value (R, G, B) of the pixels corresponding to the display surface of the display 2002 in the captured image 2000 at time T is (220, 10, 10). The pixel values (R, G, B) of the pixels corresponding to the display surfaces of the display 2003 and the display 2004 are (10, 230, 10) and (10, 10, 240), respectively.

In the captured image 2200 at time T+2, on the other hand, the pixels corresponding to the display surfaces of the displays 2002-2004 have the pixel values (10, 230, 10), (10, 10, 240), and (220, 10, 10), respectively. To simplify the explanation, FIG. 3 assumes that the image shown on each of the displays 2002-2004 has one uniform pixel value over its entire display surface at any given time. That is, within a single captured image, the pixels corresponding to the display surface of the display 2002 are represented by a single pixel value, and the same holds for the display 2003 and the display 2004. The displayed image is not limited to this, however, and may have pixel values that differ by position within the display surface, as shown in FIG. 2, for example. The color space of the captured image is also not limited to RGB and may be another color space.

Next, the operation by which the detection unit 102 detects the background change region is described with reference to FIG. 4. The process shown in FIG. 4 starts when the image processing device 100, in the mode for detecting the background change region, acquires a captured image from the imaging device 110 (S4010). The start timing of the process shown in FIG. 4 is not limited to this, however. The mode of the image processing device 100 is set, for example, in response to a user operation. Specifically, when the user performs a predetermined operation that designates the shooting period of the images used to detect the background change region, the image processing device 100 is set to the mode for detecting the background change region. The shooting period designated here is a period during which no predetermined subject such as a player is within the shooting range of the imaging device 110, for example during a rehearsal before the start of a match at a stadium. The user may perform an operation at the start of the shooting period and another at its end.

Because the shooting period of the images used to detect the background change region can be designated by a user operation, any period suited to detecting the background change region can be set. For example, using captured images that do not contain the predetermined subject to be extracted as the foreground allows the background change region to be identified more accurately than using captured images that contain both subjects to be extracted and subjects not to be extracted.

In the following description, the captured images input from the imaging device 110 to the image processing device 100 are assumed to be the frames of a moving image, but they may instead be a plurality of still images captured at a plurality of points in time. The process shown in FIG. 4 may be performed in real time in parallel with shooting by the imaging device 110, or after shooting on the basis of stored captured images. In either case, the process of FIG. 4 is executed on captured images taken when the predetermined subject to be extracted is not within the shooting range of the imaging device 110, for example before the start of a match, when no player 5001 is in the field 2001.

The process shown in FIG. 4 is realized by the CPU 111 loading a program stored in the ROM 113 into the RAM 112 and executing it. At least part of the process shown in FIG. 4 may instead be realized by a GPU or by one or more pieces of dedicated hardware different from the CPU 111.

In S4020, the detection unit 102 selects a target frame from the frames of the moving image acquired from the imaging device 110 in the mode for detecting the background change region, and calculates the pixel-value difference between the target frame and a frame from an earlier point in time. For example, the detection unit 102 compares the captured image 2000 shown in FIG. 2(a), captured at time T, with the captured image 2200 shown in FIG. 2(c), captured at time T+2, and calculates the difference between the pixel values of corresponding pixels. The target frame in this case is the captured image 2200. When the pixel value of a pixel in the captured image 2000 is (R0, G0, B0) and the pixel value of the corresponding pixel in the captured image 2200 is (R1, G1, B1), the difference value δd (the sum of the absolute per-channel differences) is obtained by the following equation (1).
δd = |R0 - R1| + |G0 - G1| + |B0 - B1| ... (1)

FIG. 3(a) shows the difference value of each pixel calculated from the captured image 2000 and the captured image 2200. The pixel value of the pixels corresponding to the display surface of the display 2002 is (220, 10, 10) at time T and (10, 230, 10) at time T+2. The difference value δd for these pixels is therefore 430, as calculated below.
δd = |220 - 10| + |10 - 230| + |10 - 10| = 430 ... (2)
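As an illustration only, equation (1) can be sketched in a few lines of Python. The function name `pixel_difference` is hypothetical and does not appear in the specification; the pixel values are those given for the displays 2002-2004 at times T and T+2.

```python
def pixel_difference(p0, p1):
    """Equation (1): sum of absolute per-channel differences of two RGB pixels."""
    return sum(abs(a - b) for a, b in zip(p0, p1))


# Pixel values of the display surfaces at time T and at time T+2.
delta_2002 = pixel_difference((220, 10, 10), (10, 230, 10))
delta_2003 = pixel_difference((10, 230, 10), (10, 10, 240))
delta_2004 = pixel_difference((10, 10, 240), (220, 10, 10))
```

With these inputs the three results are 430, 450, and 440, which match the difference values stated for the three display surfaces.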

The detection unit 102 calculates the above difference value for every pixel of the target frame. Among the calculated values, the difference values of the pixels corresponding to the display 2003, the display 2004, the field 2001, and the line 2005 are, for example, the values shown in FIG. 3(a). Specifically, applying equation (1) gives a difference value of 450 for the pixels corresponding to the display surface of the display 2003 and 440 for the pixels corresponding to the display 2004. When the images shown on the displays 2002-2004 have pixel values that differ by position within the display surface, a different difference value may be calculated for each position within the display surface.

In contrast to the dynamically changing display surfaces, the pixel values of the pixels corresponding to the field 2001 and the line 2005 hardly change between the captured image 2000 at time T and the captured image 2200 at time T+2. Specifically, as shown in FIG. 3(a), the pixel value of the pixels corresponding to the field 2001 is (180, 230, 30) at time T and (178, 228, 28) at time T+2. The difference value δd for these pixels is therefore 6, as calculated below.
δd = |180 - 178| + |230 - 228| + |30 - 28| = 6 ... (3)

For the pixels corresponding to the line 2005, the same calculation gives a difference value δd of 3. Once the difference value has been calculated for all pixels of the target frame, the process proceeds to S4030. In this embodiment the difference value is calculated for all pixels of the target frame, but this is not a limitation. For example, when a candidate range for the background change region is set in advance or designated by the user, the detection unit 102 may calculate the difference value only for the pixels within that set or designated range.

In S4030, the detection unit 102 accumulates the difference values δd calculated for a plurality of target frames. Because this is the first time the detection unit 102 performs S4030 in the flow of FIG. 4, the pixel difference values δd calculated in S4020 above are stored as the integrated values.

In S4040, the detection unit 102 determines whether the difference values δd have been accumulated for a predetermined number of target frames. In this embodiment, the number of frames that the detection unit 102 accumulates is 100. The number of frames to accumulate is set, for example, on the basis of a user operation. This is not a limitation, however, and the number may be set automatically according to, for example, the amount by which the captured images change over time. When the detection unit 102 determines that accumulation for the predetermined number of target frames is not complete, it changes the target frame to a frame later than the one for which the difference values were calculated and returns to S4020.

For example, the detection unit 102 calculates the pixel-value differences between the captured image 2200 captured at time T+2 and the captured image captured at time T+4. To simplify the explanation, the pixel-value differences between time T+2 and time T+4 are assumed to equal those between time T and time T+2 described above. That is, between time T+2 and time T+4 the difference values δd of the pixels corresponding to the display surfaces of the display 2002, the display 2003, and the display 2004 are 430, 450, and 440, respectively, and the difference values δd of the pixels corresponding to the field 2001 and the line 2005 are 6 and 3, respectively. The detection unit 102 then adds the calculated difference values to the integrated values already stored in S4030.

The detection unit 102 repeats the calculation of the difference values in S4020 and their addition to the integrated values in S4030 for the predetermined number of frames, which is 100. After the difference values for 100 frames have been accumulated, the integrated values of δd for the pixels corresponding to the display surfaces of the display 2002, the display 2003, and the display 2004 are 43000, 45000, and 44000, respectively. The integrated values of δd for the pixels corresponding to the field 2001 and the line 2005 are 600 and 300, respectively.
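The loop over S4020 to S4040 can be sketched as follows. This is a minimal illustration rather than the implementation in the specification: the function name `accumulate_differences`, the nested-list image representation, and the two-pixel test sequence are all assumptions, and each frame is compared with the frame immediately before it in the list. Accumulating 100 difference values in this way requires 101 frames.

```python
def accumulate_differences(frames):
    """Sum the equation (1) differences between consecutive frames, per pixel.

    frames: list of images, each a list of rows of (R, G, B) tuples.
    """
    height, width = len(frames[0]), len(frames[0][0])
    integrated = [[0] * width for _ in range(height)]
    for prev, cur in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                integrated[y][x] += sum(
                    abs(a - b) for a, b in zip(prev[y][x], cur[y][x])
                )
    return integrated


# Hypothetical 1x2 sequence: a display pixel (difference 430 per step)
# next to a field pixel (difference 6 per step).
frame_a = [[(220, 10, 10), (180, 230, 30)]]
frame_b = [[(10, 230, 10), (178, 228, 28)]]
frames = [frame_a if i % 2 == 0 else frame_b for i in range(101)]
integrated = accumulate_differences(frames)
```

Under the simplifying assumption that every step yields the same difference, the display pixel accumulates 100 × 430 = 43000 and the field pixel 100 × 6 = 600, as in the text.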

An example of the integrated values of δd calculated by the detection unit 102 is described with reference to FIG. 3(b). FIG. 3(b) shows the integrated values for the pixels on the line segment 2105 in the captured image 2100 of FIG. 2(b). The horizontal axis represents the horizontal pixel position in the captured image, and the vertical axis represents the integrated value.

The pixels corresponding to the display surface of the display 2002 have horizontal coordinates from 300 to 750, and the integrated value of δd for these pixels is 43000. The pixels that do not correspond to the display surface of any of the displays 2002-2004 (those with horizontal coordinates 0 to 299, 751 to 899, and 1351 to 1499) have an integrated value of 600.

When it is determined in S4040 that the difference values δd have been accumulated for the predetermined number of target frames, the process proceeds to S4050. In S4050, the detection unit 102 evaluates, for each pixel, whether the integrated value is equal to or greater than a threshold. For example, when integrated values such as those in FIG. 3(b) are obtained, the evaluation is performed with the threshold 3101 set to 30000. The threshold 3101 may be set in response to a user operation, or set automatically on the basis of, for example, the average of the integrated values.

As described above, the integrated value of δd for the pixels corresponding to the display surface of the display 2002 is 43000, which is at least the threshold 3101 value of 30000. The integrated values of the pixels corresponding to the display surfaces of the display 2003 and the display 2004 are likewise at least 30000. The detection unit 102 detects the pixels whose integrated value is equal to or greater than the threshold 3101 as the background change region. As a result, a background change region containing the regions corresponding to the display surfaces of the displays 2002-2004 is detected. Regions other than the displays 2002-2004, namely the regions corresponding to the field 2001 and the line 2005, have integrated values below the threshold 3101 and are therefore not included in the background change region.
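The evaluation in S4050 amounts to a per-pixel comparison against the threshold. The sketch below assumes a nested-list layout for the integrated values; the function name `detect_background_change` and the six-pixel sample row are hypothetical, while the threshold 30000 and the integrated values are those used in the text.

```python
def detect_background_change(integrated, threshold):
    """Return a boolean mask that is True where the integrated value >= threshold."""
    return [[value >= threshold for value in row] for row in integrated]


# Hypothetical row: field, display 2002, field, display 2003, display 2004, line 2005.
integrated = [[600, 43000, 600, 45000, 44000, 300]]
mask = detect_background_change(integrated, threshold=30000)
```

Only the three display pixels are marked True; the field and line pixels stay False and remain eligible for foreground extraction.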

When the detection in S4050 is finished, the process shown in FIG. 4 ends in S4100. As described above, the detection unit 102 acquires a plurality of captured images taken at a plurality of different timings within a period designated by a predetermined user operation (for example, a period during which the predetermined subject to be extracted is not within the shooting range of the imaging device 110). It then calculates the differences between the pixel values of corresponding pixels in the captured images. Finally, among the pixel positions in the images captured by the imaging device 110, the detection unit 102 identifies the positions of pixels whose calculated difference is equal to or greater than the threshold as pixel positions of the background change region, which is excluded from foreground region extraction.

This embodiment mainly describes the case in which the image processing device 100 identifies the pixel positions of the background change region on the basis of differences between a plurality of images, but the identification method is not limited to this. For example, the image processing device 100 may identify the pixel positions of the background change region in response to a user operation.

FIG. 3(c) shows the detected background change region: the regions corresponding to the display surfaces of the displays 2002-2004 are detected as background change regions 2301-2303. In this embodiment, the detection unit 102 detects the background change region by calculating the integrated value of δd as the difference between the pixel values of corresponding pixels in a plurality of captured images with different capture timings. Specifically, the detection unit 102 identifies pixel positions whose amount of pixel-value change over the shooting period of a plurality of temporally consecutive images captured by the same imaging device is equal to or greater than the threshold as pixel positions excluded from foreground region extraction. This method allows the background change region to be detected in a variety of cases, for example when the images shown on the displays 2002-2004 change gradually or switch periodically.

This is not a limitation, however, and the detection unit 102 may detect the background change region by background subtraction. For example, an image captured while nothing is shown on the displays 2002-2004 is used as the background image. The detection unit 102 may then detect, as the background change region, the region made up of those pixels of an image captured while images are shown on the displays 2002-2004 whose pixel value differs from that of the corresponding pixel in the background image by at least a threshold. This method can detect the same kind of background change region as the method described with reference to FIG. 4.

Various other methods can also be used to detect the regions corresponding to the displays 2002-2004 as the background change region. For example, the image processing device 100 may acquire information on at least one of the position and the shape of the displays 2002-2004 and identify the background change region on the basis of the acquired information. Specific examples of such information include an image representing a display, an image shown on a display, a three-dimensional model of a display, and information corresponding to a user operation.

For example, the image processing device 100 may acquire an image representing the displays 2002-2004 and detect the background change region by matching the acquired image against the captured image. The image processing device 100 may detect the background change region by detecting, in the captured image, markers shown on the displays 2002-2004. It may detect the background change region by locating the displays 2002-2004 with reference to design information such as a three-dimensional model of the stadium that includes the displays 2002-2004. It may display an image that lets the user designate the position and shape of the displays 2002-2004 and detect the background change region by identifying that position and shape from the user's designation. It may also detect the background change region on the basis of the luminance information of the pixels of the captured image. The same variety of methods can be used when a region corresponding to a subject other than the displays 2002-2004 is to be identified as the background change region.

[Foreground-background separation]
Next, foreground-background separation by the separation unit 101 is described. FIG. 5(a) shows an example of a captured image 5000 that is captured by the imaging device 110 and input to the separation unit 101. Unlike the images captured to detect the background change region described above, the captured image 5000 is taken in a scene such as a match in progress at the stadium. The captured image 5000 is therefore taken when a player 5001 is on the field 2001 within the shooting range and the display surfaces of the displays 2002-2004 are within the shooting range. The images shown on the displays 2002-2004 change just as they did during the rehearsal. The separation unit 101 extracts the region of the player 5001 contained in the captured image 5000 by foreground-background separation.

The foreground-background separation performed by the separation unit 101 is described with reference to FIG. 6. The process shown in FIG. 6 starts when the image processing device 100, in the mode for performing foreground-background separation, acquires a captured image from the imaging device 110 (S6010). The start timing of the process shown in FIG. 6 is not limited to this, however. The mode of the image processing device 100 is set, for example, in response to a user operation. Specifically, when the user performs a predetermined operation that designates the shooting period of the images subject to foreground-background separation for detecting the subject region, the image processing device 100 is set to the mode for performing foreground-background separation. The shooting period designated here is a period during which a predetermined subject such as a player is within the shooting range of the imaging device 110, for example during a match at a stadium. The shooting period for foreground-background separation is designated by an operation different from the one that designates the shooting period for detecting the background change region described with reference to FIG. 4. This is not a limitation, however, and the mode may instead be changed according to how many times the same operation has been performed.

The process shown in FIG. 6 may be performed in real time in parallel with shooting by the imaging device 110, or after shooting on the basis of stored captured images. In either case, the process of FIG. 6 is executed on captured images taken when the predetermined subject to be extracted is within the shooting range of the imaging device 110, for example during a match, when the player 5001 is in the field 2001.

The process shown in FIG. 6 is realized by the CPU 111 loading a program stored in the ROM 113 into the RAM 112 and executing it. At least part of the process shown in FIG. 6 may instead be realized by a GPU or by one or more pieces of dedicated hardware different from the CPU 111.

In S6020, the separation unit 101 selects a target frame from the frames of the moving image acquired from the imaging device 110 in the mode for performing foreground-background separation, and calculates the difference between the target frame, which is the captured image 5000, and a background image 5100. The target frame is the image from which the foreground region is to be extracted (the extraction target image). When S6020 is executed for the first time in the flow of FIG. 6, the first frame of the acquired moving image is selected, for example. The background image 5100 is a pre-stored image based on shooting by the imaging device 110 at a timing different from that of the captured image 5000 serving as the extraction target image, and it does not contain the predetermined subject to be extracted.

When S6020 is executed for the first time, a captured image such as the one shown in FIG. 5(b), taken for example during the pre-match rehearsal, is used as the background image 5100. In this case, the background image is an image based on shooting by the imaging device 110 during the shooting period for detecting the background change region described with reference to FIG. 4. Calculating the difference between the captured image 5000 and the background image 5100 by background subtraction yields a difference region image 5200 such as the one shown in FIG. 5(c). Specifically, the difference region image 5200 consists of those pixels of the captured image 5000 whose pixel-value difference (absolute value of the difference) from the corresponding pixel of the background image 5100 is equal to or greater than a threshold. In addition to the player 5001, which should be extracted as the foreground region, the difference region image 5200 contains the advertisement images 2212-2214 shown on the display surfaces of the displays 2002-2004.
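The background subtraction in S6020 can be sketched as a per-pixel mask. The specification does not state how the per-pixel difference is computed in this step or what threshold is used, so the sketch assumes the same sum of absolute channel differences as equation (1) and a hypothetical threshold of 50. The function name `difference_region` and the three-pixel sample images (field, display, player) are likewise illustrative.

```python
def difference_region(frame, background, threshold):
    """Mask of pixels whose absolute difference from the background is >= threshold.

    Both images are lists of rows of (R, G, B) tuples of identical size.
    """
    return [
        [
            sum(abs(a - b) for a, b in zip(fp, bp)) >= threshold
            for fp, bp in zip(frame_row, background_row)
        ]
        for frame_row, background_row in zip(frame, background)
    ]


# Hypothetical 1x3 images: field pixel, display pixel, pixel covered by a player.
background = [[(180, 230, 30), (220, 10, 10), (180, 230, 30)]]
frame = [[(178, 228, 28), (10, 230, 10), (250, 250, 250)]]
diff_mask = difference_region(frame, background, threshold=50)
```

The mask marks both the display pixel and the player pixel, which illustrates why the difference region image alone cannot separate the player from the changing advertisements.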

In S6030, the separation unit 101 acquires from the detection unit 102 information indicating the background change region detected in the process described with reference to FIG. 4. As described above, the detected background change region is, for example, the background change regions 2301-2303 shown in FIG. 3(c), which correspond to the display surfaces of the displays 2002-2004.

In S6040, the separation unit 101 generates a foreground image 5300 such as the one shown in FIG. 5(d) by excluding from the difference region image 5200 the regions that overlap the background change regions 2301-2303. That is, the separation unit 101 extracts as the foreground region the region made up of those pixels of the difference region image 5200 that are not at pixel positions identified as the background change region. The foreground image 5300 contains only the player 5001 and not the advertisement images 2212-2214 shown on the display surfaces of the displays 2002-2004.
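Expressed on boolean masks, S6040 is a logical "difference AND NOT background change" per pixel. The following sketch assumes both masks are nested lists of the same size; the function name `extract_foreground` and the sample masks are hypothetical.

```python
def extract_foreground(diff_mask, change_mask):
    """Keep difference pixels that lie outside the background change region."""
    return [
        [d and not c for d, c in zip(diff_row, change_row)]
        for diff_row, change_row in zip(diff_mask, change_mask)
    ]


# Hypothetical 1x3 masks: field pixel, display pixel, player pixel.
diff_mask = [[False, True, True]]     # differs from the background image
change_mask = [[False, True, False]]  # background change region (display surface)
foreground_mask = extract_foreground(diff_mask, change_mask)
```

Only the player pixel survives; the display pixel is dropped even though it differs from the background image.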

In S6050, the separation unit 101 overwrites the stored background image 5100 with the pixel values of the pixels of the difference region image 5200 that lie within the background change region. That is, the separation unit 101 updates the background image 5100 on the basis of those pixels of the difference region image 5200 that are not included in the region extracted as the foreground region. Updating the background image 5100 yields a new background image 5400 such as the one shown in FIG. 5(e). The background image 5400 obtained here is an image based on shooting by the imaging device 110 within the shooting period for foreground-background separation. When S6020 is performed again in the flow of FIG. 6, the difference between the new target frame and the updated background image 5400 is calculated. Inter-frame differencing may be used to calculate this difference.
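A minimal sketch of the background update in S6050, assuming the nested-list image and mask layout used for illustration here: pixels that are in the difference region and inside the background change region are copied from the current frame into a copy of the stored background. The function name `update_background` and the sample data are hypothetical, and returning a new image instead of modifying the stored one in place is a design choice of this sketch, not something the specification prescribes.

```python
def update_background(background, frame, diff_mask, change_mask):
    """Return a copy of background with changed display pixels taken from frame."""
    updated = [list(row) for row in background]
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            # Overwrite only difference pixels inside the background change region.
            if diff_mask[y][x] and change_mask[y][x]:
                updated[y][x] = pixel
    return updated


# Hypothetical 1x3 data: field pixel, display pixel, player pixel.
background = [[(180, 230, 30), (220, 10, 10), (180, 230, 30)]]
frame = [[(178, 228, 28), (10, 230, 10), (250, 250, 250)]]
diff_mask = [[False, True, True]]
change_mask = [[False, True, False]]
new_background = update_background(background, frame, diff_mask, change_mask)
```

The display pixel takes the new advertisement color while the player pixel is left out of the background, so the next frame is compared against the current display content.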

Note that when the region detected as the background change region is fixed, as with the display surfaces of the installed displays 2002 to 2004, the separation unit 101 need not update the background image 5100. Also, when the image data of the images shown on the displays 2002 to 2004 can be acquired, the background image 5100 may be updated using that image data instead of the difference region image.

In S6060, the separation unit 101 outputs the image of the foreground region extracted in S6040 (foreground image 5300) and the background image 5400 generated in S6050 to the image processing server 120 as separate images. By acquiring the foreground image 5300 and the background image 5400 as separate, distinguishable images, the image processing server 120 can generate a three-dimensional model of the predetermined subject and a three-dimensional model of the background individually, and can then generate a virtual viewpoint image.

When S6060 finishes, the process moves to S6070, where the separation unit 101 determines whether S6020 to S6060 have been completed for all frames of the moving image acquired from the imaging device 110 in the mode for foreground-background separation. If processing is determined not to be complete for all frames, the separation unit 101 changes the target frame to the next frame and returns to S6020. If processing is determined to be complete for all frames, the processing of FIG. 6 ends in S6500.
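The per-frame loop from S6020 to S6070 can be sketched as below. This is an illustration under stated assumptions, not the implementation of the embodiment: images are single-channel nested lists, the difference is thresholded per pixel, and the name `separate_video`, the threshold value, and the sample frames are hypothetical.

```python
def separate_video(frames, background, bg_change_mask, threshold):
    """Return one (foreground_mask, background_image) pair per frame."""
    outputs = []
    for frame in frames:  # S6070: repeat until every frame has been processed
        height, width = len(frame), len(frame[0])
        # S6020: difference between the target frame and the current background
        diff = [[abs(frame[y][x] - background[y][x]) >= threshold
                 for x in range(width)] for y in range(height)]
        # S6040: drop difference pixels inside the background change region
        fg = [[diff[y][x] and not bg_change_mask[y][x]
               for x in range(width)] for y in range(height)]
        # S6050: write the remaining changed pixels into a new background image
        background = [[frame[y][x] if diff[y][x] and not fg[y][x]
                       else background[y][x]
                       for x in range(width)] for y in range(height)]
        # S6060: output foreground and background separately
        outputs.append((fg, background))
    return outputs


# Hypothetical 1x3 frames; the last pixel lies on a display surface.
frames = [[[10, 200, 90]], [[10, 10, 120]]]
initial_background = [[10, 10, 10]]
bg_change_mask = [[False, False, True]]

results = separate_video(frames, initial_background, bg_change_mask, threshold=50)
for fg, bg in results:
    print(fg, bg)
```

Because the second frame is compared with the background updated after the first frame, the display pixel (120 against 90) no longer exceeds the threshold in this example.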

In the above description using FIG. 6, the separation unit 101 extracts the foreground region by removing the background change region from the region composed of those pixels of the captured image 5000 whose pixel value differs from that of the corresponding pixel in the background image 5100 by a threshold or more. However, this is not limiting: the separation unit 101 may instead extract the foreground region by determining the difference between a partial image of the captured image 5000, composed of pixels not included in the background change region, and the region of the background image 5100 corresponding to that partial image. This method reduces the number of pixels for which a pixel value difference is calculated, and therefore reduces the processing load on the image processing device 100. With either method, a foreground region composed of pixels not included in the background change region is extracted from the captured image 5000 based on the difference between the captured image 5000 and the background image 5100.
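The second method, in which the difference is evaluated only for pixels outside the background change region, can be sketched as follows. The function name `extract_foreground_partial`, the returned comparison counter, and the sample data are hypothetical and serve only to make the reduction in compared pixels visible.

```python
def extract_foreground_partial(frame, background, bg_change_mask, threshold):
    """Threshold the difference only where the background change mask is False.

    Returns the foreground mask and the number of pixels actually compared.
    """
    mask = []
    compared = 0
    for y, row in enumerate(frame):
        out_row = []
        for x, value in enumerate(row):
            if bg_change_mask[y][x]:
                out_row.append(False)  # skipped: never a foreground pixel
                continue
            compared += 1
            out_row.append(abs(value - background[y][x]) >= threshold)
        mask.append(out_row)
    return mask, compared


# Hypothetical 2x3 grayscale data; the last column is a display surface.
frame = [[12, 180, 240], [11, 175, 30]]
background = [[10, 10, 10], [10, 10, 10]]
bg_change_mask = [[False, False, True], [False, False, True]]

fg_mask, n_compared = extract_foreground_partial(frame, background, bg_change_mask, 50)
print(fg_mask, n_compared)
```

Here four of the six pixels are compared, and the resulting mask is the same one that would be obtained by thresholding every pixel and then removing the background change region.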

In the present embodiment, the description has focused on the case where the background change region is specified based on the difference between a plurality of captured images captured by the imaging device 110 before the captured image 5000 that is the target of foreground extraction. However, this is not limiting. For example, when the foreground region is extracted after a match from the captured image 5000 captured during the match, the background change region may be specified based on the difference between a plurality of captured images captured after the match.

As described above, the image processing device 100 according to the present embodiment extracts a subject region (foreground region) corresponding to a predetermined subject from an image based on imaging by the imaging device 110. Specifically, the image processing device 100 specifies a region inside an extraction target image that is not to be a target of subject region extraction (a background change region), based on the difference between a plurality of images based on imaging at a plurality of timings within a first period designated by a first predetermined operation. The extraction target image is an image based on imaging by the imaging device 110 within a second period designated by a second predetermined operation different from the first predetermined operation. The image processing device 100 then extracts the subject region inside the extraction target image based on the difference between the extraction target image and another image based on imaging by the imaging device 110 at a timing different from the imaging timing of the extraction target image. The subject region extracted in this way is composed of pixels not included in the region specified as not being a target of extraction.
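The specification of the background change region from images captured during the first period can be sketched as follows. The sketch assumes that consecutive images are compared pairwise and that a pixel position is marked once any pair differs by a threshold or more; the pairing scheme, the name `specify_background_change`, and the sample values are assumptions made for illustration.

```python
def specify_background_change(images, first_threshold):
    """Mark positions whose value changes by first_threshold or more
    between any two consecutive images of the first period."""
    height, width = len(images[0]), len(images[0][0])
    mask = [[False] * width for _ in range(height)]
    for prev, curr in zip(images, images[1:]):
        for y in range(height):
            for x in range(width):
                if abs(curr[y][x] - prev[y][x]) >= first_threshold:
                    mask[y][x] = True
    return mask


# Hypothetical 1x3 images; the middle pixel shows a changing advertisement.
first_period_images = [
    [[10, 50, 10]],
    [[10, 120, 12]],
    [[11, 40, 10]],
]

bg_change_mask = specify_background_change(first_period_images, first_threshold=30)
print(bg_change_mask)
```

The resulting mask is computed once, before the second period, and can then be applied to every extraction target image.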

With this configuration, a specific region can be extracted from among the regions in which a plurality of images differ. For example, when a captured image contains both a moving predetermined subject and a display showing a moving image, the image processing device 100 can extract only the region of the predetermined subject. The image processing server 120, which acquires the extraction result from the image processing device 100, can therefore accurately identify the shape of the predetermined subject and generate a high-quality virtual viewpoint image including that subject.

In the present embodiment, the description has focused on the case where the regions corresponding to the display surfaces of the displays 2002 to 2004 in the captured image are detected as the background change region and the region corresponding to the player 5001 is extracted as the foreground region. However, the background change region and the foreground region are not limited to these. For example, a subject such as a ball may be extracted as the foreground region. Also, a display surface onto which a projector projects an image, spectator seats in which spectators move, and the like may be detected as the background change region. Specifically, by detecting the spectator seats as a background change region from captured images taken before the start of a match while the spectators are moving, the spectator seats can be kept from being extracted as a foreground region from captured images taken during the match.

In the present embodiment, the case where the image processing device 100 generates and outputs a foreground image and a background image for each frame of the moving image has been described. However, this is not limiting. For example, when the image processing server 120 stores the background image in advance or performs image processing that does not use a background image, the image processing device 100 may output only the foreground image without outputting the background image. The image processing device 100 may also output the background image at a lower frame rate than the foreground image, or at a lower resolution than the foreground image. This reduces the load on the image processing device 100, the image processing server 120, and the communication path between them. In particular, because the present embodiment can extract only the region of the predetermined subject as the foreground region, the data amount of the high-quality foreground image can be reduced, and as a result the total data amount of the foreground image and the background image can be made smaller.

In the present embodiment, the description has focused on the case where foreground-background separation is performed so that, among the pixels of the captured image, the pixels at positions specified as the background change region are not extracted as the foreground region. However, the specified pixel positions may contain pixels corresponding to the predetermined subject to be extracted, for example when part of the display surface of the displays 2002 to 2004 is occluded by the player 5001 in the captured image. In such a case, among the pixels at the specified pixel positions, only the pixels that do not correspond to the predetermined subject may be excluded from extraction. Specifically, a method may be adopted in which a person-shaped region is detected in the difference region image, the pixels in the detected region are not excluded, and, among the remaining pixels, those at the specified pixel positions are excluded from the difference region image. With such a method, the image of the predetermined subject can be extracted even when the specified pixel positions in the captured image contain pixels corresponding to that subject.
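The occlusion handling described above can be sketched with three boolean masks. The person-shaped region is assumed to be supplied by some detector that is outside the scope of this sketch; the name `extract_with_occlusion`, the `person_mask` input, and the sample masks are hypothetical.

```python
def extract_with_occlusion(diff_mask, bg_change_mask, person_mask):
    """Keep a difference pixel if it lies in the detected person region,
    or if it lies outside the background change region."""
    return [
        [d and (p or not b) for d, b, p in zip(d_row, b_row, p_row)]
        for d_row, b_row, p_row in zip(diff_mask, bg_change_mask, person_mask)
    ]


# Hypothetical 2x3 masks: the player partly covers a display surface
# occupying the two right-hand columns.
diff_mask = [[True, True, True], [False, True, True]]
bg_change_mask = [[False, True, True], [False, True, True]]
person_mask = [[True, True, False], [False, True, False]]

foreground_mask = extract_with_occlusion(diff_mask, bg_change_mask, person_mask)
for row in foreground_mask:
    print(row)
```

In this example the middle column is kept even though it lies on the display surface, because it belongs to the detected person region, while the uncovered display pixels in the right-hand column are excluded.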

The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiment is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of that system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions. The program may also be provided recorded on a computer-readable recording medium.

10 Image processing system
100 Image processing device
110 Imaging device
120 Image processing server

Claims

An image processing device that extracts a subject region corresponding to a predetermined subject from an image based on imaging by an imaging device, the image processing device comprising:
specifying means for specifying, based on a difference between a plurality of images based on imaging by the imaging device at a plurality of timings within a first period designated by a first predetermined operation, a region that is inside an extraction target image based on imaging by the imaging device within a second period designated by a second predetermined operation and that is not to be a target of extraction of the subject region; and
extraction means for extracting the subject region inside the extraction target image based on a difference between the extraction target image and another image based on imaging by the imaging device at a timing different from the imaging timing of the extraction target image, the extraction means extracting the subject region composed of pixels not included in the region specified by the specifying means.

The image processing device according to claim 1, wherein the extraction means extracts the subject region by removing the region specified by the specifying means from a region inside the extraction target image that is determined based on the difference between the extraction target image and the other image.

The image processing device according to claim 1, wherein the extraction means extracts the subject region by determining a difference between a partial image of the extraction target image, the partial image being composed of pixels not included in the region specified by the specifying means, and a region of the other image corresponding to the partial image.

The image processing device according to any one of claims 1 to 3, further comprising output means for separately outputting an image based on pixels of the extraction target image that are not included in the region extracted as the subject region by the extraction means, and an image of the subject region.

The image processing device according to any one of claims 1 to 4, wherein the specifying means specifies, as the region not to be a target of extraction of the subject region, a region inside the extraction target image at pixel positions where the difference between the pixel values of corresponding pixels in the plurality of images is equal to or greater than a first threshold, and
the extraction means extracts, as the subject region, a region composed of those pixels of the extraction target image whose pixel value differs from that of the corresponding pixel in the other image by a second threshold or more and which are not included in the region specified by the specifying means.

The image processing device according to any one of claims 1 to 5, wherein the predetermined subject is not included in the imaging range of the imaging device during the first period, and the predetermined subject is included in the imaging range of the imaging device during at least part of the second period.

The image processing device according to any one of claims 1 to 6, wherein the region specified by the specifying means is a region corresponding to a subject that is different from the predetermined subject and that changes with the passage of time.

The image processing device according to claim 7, wherein the subject region extracted by the extraction means is a region corresponding to a person as the predetermined subject, and
the region specified by the specifying means is a region corresponding to a display surface of a display device as the different subject.

The image processing device according to any one of claims 1 to 8, wherein the other image is an image based on imaging by the imaging device within the first period, and
the extraction means extracts the subject region using a background subtraction method.

The image processing device according to any one of the above items, wherein the other image is an image based on imaging by the imaging device within the second period, and the extraction means extracts the subject region using an inter-frame difference method.

An image processing system comprising: the image processing device according to any one of claims 1 to 10; and
image generation means for generating a virtual viewpoint image including the predetermined subject, based on the image of the subject region extracted by the extraction means and an image of a region corresponding to the predetermined subject extracted from an image based on imaging by another imaging device that captures images from a direction different from that of the imaging device.

The image processing system according to claim 11, further comprising: acquisition means for acquiring viewpoint information corresponding to designation of a virtual viewpoint for generation of the virtual viewpoint image; and
model generation means for generating a three-dimensional model of the predetermined subject based on the image of the subject region extracted by the extraction means and the image of the region corresponding to the predetermined subject extracted from the image based on imaging by the other imaging device,
wherein the image generation means generates the virtual viewpoint image including the predetermined subject based on the viewpoint information acquired by the acquisition means and the three-dimensional model generated by the model generation means.

An image processing method executed by a system that extracts a subject region corresponding to a predetermined subject from an image based on imaging by an imaging device, the image processing method comprising:
a specifying step of specifying, based on a difference between a plurality of images based on imaging by the imaging device at a plurality of timings within a first period designated by a first predetermined operation, a region that is inside an extraction target image based on imaging by the imaging device within a second period designated by a second predetermined operation and that is not to be a target of extraction of the subject region; and
an extraction step of extracting the subject region inside the extraction target image based on a difference between the extraction target image and another image based on imaging by the imaging device at a timing different from the imaging timing of the extraction target image, the extraction step extracting the subject region composed of pixels not included in the region specified in the specifying step.

The image processing method according to claim 13, wherein in the extraction step the subject region is extracted by removing the region specified in the specifying step from a region inside the extraction target image that is determined based on the difference between the extraction target image and the other image.

The image processing method according to claim 13, wherein in the extraction step the subject region is extracted by determining a difference between a partial image of the extraction target image, the partial image being composed of pixels not included in the region specified in the specifying step, and a region of the other image corresponding to the partial image.

A program for operating a computer as each means of the image processing apparatus according to any one of claims 1 to 10.