JP2012034119A

JP2012034119A - Terminal device and processing method

Info

Publication number: JP2012034119A
Application number: JP2010171001A
Authority: JP
Inventors: Takahiro Shimazu; 宝浩島津
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2010-07-29
Filing date: 2010-07-29
Publication date: 2012-02-16

Abstract

PROBLEM TO BE SOLVED: To allow a conference to proceed smoothly by promptly outputting, while a camera is under drive control, an image related to the image that will be captured after the drive control. SOLUTION: A video conference terminal 110, which outputs an image of a participant captured by a camera 113 to the video conference terminal 110 at its own site or at another site, records substitute image information indicating each participant to be captured by the camera 113 in association with position information of that participant. The participant to be captured by the camera 113 is determined, the position information of the determined participant is identified, and the camera 113 is drive-controlled on the basis of that position information so as to capture the participant. During the drive control, at least the substitute image indicating that participant is output.

Description

The present invention relates to a terminal device and a processing method for transmitting and receiving information between terminal devices, and in particular to a terminal device and a processing method for holding a conference using information transmitted and received between the device's own site and another site connected via a network.

A video conference system transmits and receives, among a plurality of terminal devices, image information showing the situation of the participants at each site and material information used in the conference. Based on the information transmitted and received by the terminal devices, the system displays the images of the participants at each site, the materials, and so on. The conference participants at each site conduct the conference while checking the participant images and materials displayed by the terminal devices.

The participant images are captured by a camera installed at each site. The terminal device at each site outputs the captured participant images to the other sites and displays the participant images it receives on its display. At a site with several participants, the terminal device can change which participant is captured by drive control of the camera, such as pan and tilt. In recent years, to avoid capturing and outputting scenery or other images that do not show a participant during pan/tilt, it has been proposed that, when pan/tilt is executed to change the imaging target from the participant currently being captured to a different participant, the image of the participant captured before the pan/tilt be output instead (Patent Document 1).

Patent Document 1: JP-A-7-322116

However, one problem with the conventional technique described in Patent Document 1 is that the image of the participant targeted by the pan/tilt cannot be output until the pan/tilt has made it possible to capture that participant. In particular, even when pan/tilt is performed to capture a participant who is important to the conference, such as the speaker, the configuration outputs the image of the participant captured before the pan/tilt. It therefore takes time before the important image is output, which prevents the conference from proceeding smoothly.

To solve the above-described problems, an object of the present invention is to provide a terminal device and a processing method that allow a conference to proceed smoothly by promptly outputting, when drive control such as pan/tilt of the camera is performed, an image related to the image to be captured after the drive control.

To solve the above-described problems and achieve the object, the terminal device according to claim 1 is a terminal device that outputs a subject video captured by an imaging unit to a terminal device at its own site or at another site, and includes: recording means for recording substitute image information indicating a subject to be captured by the imaging unit, in association with position information of the subject; detecting means for detecting a state change related to the subject at its own site or at another site; determining means for determining, on the basis of the state change detected by the detecting means, the subject to be captured by the imaging unit; specifying means for specifying the position information of the subject determined by the determining means; control means for performing drive control of the imaging unit, on the basis of the position information specified by the specifying means, so as to capture the subject determined by the determining means; and output means for outputting, during the drive control by the control means, at least the substitute image information indicating the subject determined by the determining means.
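As a rough illustration of how the means recited in claim 1 cooperate, the following Python sketch runs one pass from an already determined subject to the output. It is a hypothetical model, not the claimed implementation: position information is assumed to be a single pan angle in degrees, images are plain strings, and the names `Entry`, `on_state_change`, and the `drive_camera` callback are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Entry:
    position: float    # assumed: pan angle of the subject, in degrees
    substitute: str    # assumed: substitute image, here just a file name


def on_state_change(
    table: Dict[str, Entry],
    subject: str,
    drive_camera: Callable[[float], str],
) -> List[Tuple[str, str]]:
    """Return the (kind, payload) items sent to the output, in order."""
    # Specifying means: look up the recorded position of the determined subject.
    entry = table[subject]
    # Output means: the substitute image goes out as soon as drive control starts.
    outputs = [("substitute", entry.substitute)]
    # Control means: drive the camera toward the position; the callback
    # returns the live frame captured once the camera has arrived.
    outputs.append(("live", drive_camera(entry.position)))
    return outputs


table = {"a1": Entry(-30.0, "a1.jpg"), "a2": Entry(30.0, "a2.jpg")}
print(on_state_change(table, "a1", lambda pos: f"live@{pos}"))
# → [('substitute', 'a1.jpg'), ('live', 'live@-30.0')]
```

The point of the ordering is that the remote site sees the target participant's substitute image immediately, and the live frame only once the (simulated) drive has finished.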

The terminal device according to claim 2 is the terminal device according to claim 1, wherein the detecting means detects, as the state change, an utterance of a user serving as the subject, and the determining means determines, as the subject to be captured, the user located in the direction in which the utterance was detected by the detecting means.
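One way to realize the determining means of claim 2 is to match the detected utterance direction against the recorded positions of the participants. The sketch below assumes directions are pan angles in degrees and picks the nearest registered participant; the function name `determine_subject` and the 15-degree tolerance are hypothetical choices, not values from the specification.

```python
from typing import Dict, Optional


def determine_subject(
    positions: Dict[str, float],
    detected_direction: float,
    tolerance: float = 15.0,  # hypothetical matching tolerance, in degrees
) -> Optional[str]:
    """Return the participant recorded closest to the detected utterance direction."""
    if not positions:
        return None
    nearest = min(positions, key=lambda name: abs(positions[name] - detected_direction))
    # Ignore utterances that do not come from near any registered participant.
    if abs(positions[nearest] - detected_direction) > tolerance:
        return None
    return nearest


positions = {"a1": -30.0, "a2": 30.0}
print(determine_subject(positions, -22.0))  # a1
print(determine_subject(positions, 0.0))    # None (30 degrees from both)
```

Rejecting directions far from every registered participant is a design choice of this sketch, so that noise from elsewhere in the room does not trigger a pan.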

The terminal device according to claim 3 is the terminal device according to claim 2, wherein the control means performs drive control of the imaging unit when the utterance detected by the detecting means continues for a predetermined time or longer.

The terminal device according to claim 4 is the terminal device according to any one of claims 1 to 3, wherein the recording means records the substitute image information based on the subject video captured by the imaging unit, in association with position information based on the position at which the video was captured.

The terminal device according to claim 5 is the terminal device according to any one of claims 1 to 4, wherein, when the subject video is captured by the imaging unit under drive control by the control means, the recording means updates the already recorded substitute image information to substitute image information based on the newly captured subject video.

The terminal device according to claim 6 is the terminal device according to any one of claims 1 to 5, wherein the output means switches the information output to the terminal device at its own site or at the other site from the video captured by the imaging unit during the drive control to the substitute image information recorded by the recording means.

The processing method according to claim 7 is a processing method for outputting a subject video captured by an imaging unit to a terminal device at its own site or at another site, and includes: a recording step of recording substitute image information indicating a subject to be captured by the imaging unit, in association with position information of the subject; a detecting step of detecting a state change related to the subject at its own site or at another site; a determining step of determining, on the basis of the state change detected in the detecting step, the subject to be captured by the imaging unit; a specifying step of specifying the position information of the subject determined in the determining step; a control step of performing drive control of the imaging unit, on the basis of the position information specified in the specifying step, so as to capture the subject determined in the determining step; and an output step of outputting, during the drive control in the control step, at least the substitute image information indicating the subject determined in the determining step.

According to the invention of claim 1, the substitute image information of the subject to be captured by the imaging unit can be output even while the imaging unit is under drive control, so the subject to be captured can be confirmed promptly.

According to the invention of claim 2, drive control of the imaging unit is performed with the user who spoke as the imaging target, and the substitute image information is output during the drive control, so the image of an important subject can be confirmed promptly.

According to the invention of claim 3, drive control is performed only when an utterance lasts for the predetermined time or longer, so an important subject can be selected more appropriately as the imaging target.

According to the invention of claim 4, the substitute image information can be recorded from images captured by the imaging unit, so substitute image information suited to the environment in which the device is used can be employed.

According to the invention of claim 5, the substitute image information can be recorded from images captured by the drive-controlled imaging unit, so the image of an important subject can be optimized by using the latest substitute image information as processing progresses.

According to the invention of claim 6, the information output during drive control of the imaging unit is switched from the captured video to the substitute image information, so the amount of information output can be reduced.

According to the invention of claim 7, the substitute image information of the subject to be captured by the imaging unit can be output even while the imaging unit is under drive control, so the subject to be captured can be confirmed promptly.

As described above, the terminal device and the processing method according to the present invention have the effect that, when drive control such as pan/tilt of the camera is performed, an image related to the image to be captured after the drive control is output promptly, allowing the conference to proceed smoothly.

FIG. 1 is an explanatory diagram showing an example of the video conference system according to the embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an example of the functional configuration of the video conference terminal according to the embodiment of the present invention.
FIG. 3 is an explanatory diagram showing an example of the image table of the video conference terminal according to the embodiment of the present invention.
FIG. 4 is an explanatory diagram showing an example of how the video conference terminal according to the embodiment of the present invention captures participant images.
FIG. 5 is an explanatory diagram showing an example of how a participant image according to the embodiment of the present invention is shown on the display.
FIG. 6 is a flowchart showing the processing performed by the video conference terminal according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the processing performed by the video conference terminal according to a modification of the present invention.

Preferred embodiments of the terminal device and the processing method according to the present invention are described in detail below with reference to the accompanying drawings.

(Embodiment)
(Overall configuration)
With reference to FIG. 1, a case will be described in which the terminal device according to the embodiment of the present invention is applied to video conference terminals installed at a plurality of sites of a video conference system. FIG. 1 is an explanatory diagram showing an example of the video conference system according to the embodiment of the present invention. In this embodiment, the terminal device according to the present invention is realized by the video conference terminals 110 (110a, 110b) installed at the respective sites (A, B), and the processing method according to the present invention is executed by the video conference system 100, in which a plurality of video conference terminals 110 are connected via a network 150.

In FIG. 1, the video conference system 100 consists of the video conference terminals 110a and 110b installed at sites A and B, connected via the network 150. Specifically, the video conference terminals 110a and 110b may be installed at geographically separated sites A and B and connected via a network 150 such as the Internet, or installed at separate sites A and B within one building and connected via a network 150 such as a LAN (local area network). Although FIG. 1 shows the video conference terminals 110a and 110b connected to each other via the network 150, they may instead be connected via a management server or the like installed at any location on the network 150. In the following description, the suffixes "a" and "b" at the end of reference numerals are omitted when the sites need not be distinguished.

In the video conference system 100, the video conference terminals 110 at the respective sites exchange images of subjects such as the conference participants and audio such as the participants' speech. Each video conference terminal 110 includes a main body 111 containing functional units such as a CPU (central processing unit), and, connected to the main body 111, a display 112 that displays various images, a camera 113 that captures the subject, a microphone 114 that collects sound, and a speaker 115 that outputs various sounds.

The video conference terminal 110 captures an image of its own site with the camera 113 and transmits the captured image via the network 150 to the video conference terminal 110 at another site. It also receives images transmitted from the video conference terminal 110 at the other site. The video conference terminal 110 displays on the display 112 both the images captured at its own site and the images received from the other site.

Likewise, the video conference terminal 110 collects sound at its own site with the microphone 114 and transmits the collected sound via the network 150 to the video conference terminal 110 at another site. It receives sound transmitted from the video conference terminal 110 at the other site and outputs the received sound through the speaker 115.

In other words, the video conference terminal 110 reproduces the images and sound exchanged between its own site and the other sites. By viewing the images and sound of the other sites reproduced by the video conference terminal 110 at their own site, participants located remotely from one another hold a video conference.

The video conference terminal 110 also changes the subject included in the imaging range of the camera 113 by performing drive control, such as pan/tilt, on the camera 113. The drive control of the camera 113 is performed on the basis of a participant's operation or of the direction of a participant's utterance detected by the microphone 114. Although this embodiment describes capturing participant images as the subject, the invention is not limited to this; the subject may also be, for example, materials for the video conference. In that case, the video conference terminal 110 detects the presentation of such materials by vibration, sound, light emission, or the like, and controls the imaging direction of the camera 113 on the basis of the detected direction.

When performing drive control of the camera 113, the video conference terminal 110 outputs substitute image information, indicating the participant who will be captured after the drive control, to the display 112 at its own site or to the video conference terminal 110 at another site. As described in detail with reference to FIGS. 3 to 5, when drive control of the camera 113 starts, the video conference terminal 110 switches the output image from the participant image captured before the drive control to the substitute image information showing the participant to be captured after the drive control. When drive control of the camera 113 ends, the video conference terminal 110 outputs the participant image captured by the camera 113 after the drive control.

The substitute image information is an image of the participant captured in the past by the camera 113, or a separately input image such as text, a drawing, or a photograph representing the participant, and is recorded in an image table in association with the participant's position information. The participant's position information is recorded on the basis of, for example, the direction in which the participant's utterance was detected, the imaging direction of the camera 113 when it captured the participant, or an imaging direction entered by a participant's operation.

(Functional configuration)
The functional configuration of the video conference terminal 110 will be described with reference to FIG. 2. FIG. 2 is an explanatory diagram showing an example of the functional configuration of the video conference terminal according to the embodiment of the present invention.

In FIG. 2, the video conference terminal 110 includes a CPU (central processing unit) 201; a RAM (random access memory) 202; a ROM (read-only memory) 203; a video I/F 204 that controls the input and output of various images to and from the display 112 and the camera 113; an audio I/F 205 that controls the input and output of various sounds to and from the speaker 115 and the microphone 114; an operation unit 206 that accepts input of various information; a communication I/F 207 that controls communication with external devices; and a storage medium 208 that stores various information. These components of the video conference terminal 110 are connected to one another by a bus 200.

The CPU 201 controls the entire video conference terminal 110. It executes various programs read from the ROM 203, using the RAM 202 as a work area.

Under the control of the CPU 201, the video I/F 204 displays various images on the display 112. For example, under the control of the CPU 201, the video I/F 204 reads participant images received from the video conference terminal 110 at another site from the storage medium 208 and displays them on the display 112. The video I/F 204 may also display on the display 112 the participant images of its own site, processing screens for the video conference with other sites, and the like.

Under the control of the CPU 201, the video I/F 204 also performs drive control of the camera 113 to capture the participants who are the subjects at its own site. For example, the video I/F 204 pans and tilts the camera 113 so as to capture whichever of several participants has spoken. The drive control of the camera 113 is based, for example, on the direction in which the microphone 114 (described later) detected an utterance lasting a predetermined time or longer, or on an imaging direction entered by a participant through the operation unit 206. The video I/F 204 transmits the participant image captured under drive control to the destination video conference terminal 110 via the communication I/F 207 (described later).

The video I/F 204 outputs a participant image captured by the camera 113 to the storage medium 208 (described later) as substitute image information indicating that participant, and outputs the imaging direction of the camera 113 at the time the participant was captured to the storage medium 208 as the position information of the participant image. Although in this embodiment the video I/F 204 uses the imaging direction of the camera 113 as the position information, the invention is not limited to this. For example, the direction in which the utterance was detected when capturing a participant who spoke, or an imaging direction entered by a participant's operation, may be output to the storage medium 208 as the position information. Under the control of the CPU 201, the storage medium 208 stores an image table associating the substitute image information output from the video I/F 204 with the position information.

While the video I/F 204 is performing drive control, the CPU 201 transmits, instead of the image captured by the camera 113, the substitute image information indicating the participant toward whom the camera is being driven, via the communication I/F 207 to the destination video conference terminal 110. Specifically, the CPU 201 reads the substitute image information from the image table on the basis of the position information of the participant to be captured and transmits it. When drive control of the camera 113 ends, the CPU 201 transmits the participant image captured by the camera 113 to the destination video conference terminal 110 via the communication I/F 207.
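The switching performed by the CPU 201 can be pictured as a per-frame selection between the live camera frame and the substitute image. In the following Python sketch (hypothetical names; frames and images are plain strings), each tick pairs a camera frame with a flag that is assumed to report whether drive control is in progress.

```python
from typing import Iterable, Iterator


def select_outputs(
    camera_frames: Iterable[str],
    driving: Iterable[bool],
    substitute: str,
) -> Iterator[str]:
    """Yield what is transmitted each tick: the substitute image while the
    camera is being driven, otherwise the live camera frame."""
    for frame, is_driving in zip(camera_frames, driving):
        yield substitute if is_driving else frame


frames = ["a2_live", "pan_1", "pan_2", "a1_live"]  # pan_* = scenery mid-pan
driving = [False, True, True, False]
print(list(select_outputs(frames, driving, "a1_substitute")))
# → ['a2_live', 'a1_substitute', 'a1_substitute', 'a1_live']
```

In this model, frames captured mid-pan never reach the destination terminal; they are replaced by the substitute image of the participant the camera is moving toward.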

Output of the substitute image information according to the embodiment of the present invention will now be described with reference to FIGS. 3 to 5. The following describes the case where the video conference terminal 110a at site A transmits participant images or substitute image information to the video conference terminal 110b at site B. FIG. 3 is an explanatory diagram showing an example of the image table of the video conference terminal according to the embodiment of the present invention.

In FIG. 3, the image table 300a is stored in the storage medium 208a and associates the image information of participants a1 and a2, the subjects at site A, with the position information of each participant. The position information is the imaging direction of the camera 113a when it captured each of participants a1 and a2.
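An image table such as 300a could be modeled as a mapping from imaging direction to substitute image. The Python sketch below is an assumption-laden illustration: directions are (pan, tilt) pairs in degrees, lookups match the nearest recorded direction within a hypothetical 10-degree tolerance, and a new capture near an existing entry replaces it, in the spirit of claim 5. The class name `ImageTable` is invented, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, Optional, Tuple

Direction = Tuple[float, float]  # assumed: (pan, tilt) in degrees


class ImageTable:
    """Hypothetical image table: imaging direction -> substitute image."""

    def __init__(self, tolerance: float = 10.0) -> None:
        self.tolerance = tolerance
        self.entries: Dict[Direction, str] = {}

    def _nearest(self, direction: Direction) -> Optional[Direction]:
        # Closest recorded direction, or None if nothing lies within tolerance.
        best = min(self.entries, key=lambda k: math.dist(k, direction), default=None)
        if best is not None and math.dist(best, direction) <= self.tolerance:
            return best
        return None

    def record(self, direction: Direction, image: str) -> None:
        # A new capture near an existing entry replaces that entry.
        old = self._nearest(direction)
        if old is not None:
            del self.entries[old]
        self.entries[direction] = image

    def lookup(self, direction: Direction) -> Optional[str]:
        key = self._nearest(direction)
        return None if key is None else self.entries[key]


table = ImageTable()
table.record((-30.0, 0.0), "a1_v1.jpg")
table.record((30.0, 0.0), "a2.jpg")
table.record((-28.0, 2.0), "a1_v2.jpg")  # replaces the older a1 entry
print(table.lookup((-29.0, 1.0)))  # a1_v2.jpg
print(len(table.entries))          # 2
```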

When a video conference with the video conference terminal 110b at another site starts, the video conference terminal 110a drives the imaging direction of the camera 113a to capture images of all participants a1 and a2 at its own site. It then creates the image table 300a, using the images of participants a1 and a2 captured after the start of the conference as the substitute image information and the imaging directions in which they were captured as the position information. Specifically, for example, when image recognition finds a face in an image captured by the camera 113a, the image containing the face is used as substitute image information and the imaging direction in which the face was captured is used as position information. Instead of recognizing faces automatically, the face may also be identified in the captured image by a user operation.
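The table-building sweep at the start of a conference might look like the following sketch. The `capture` and `has_face` callbacks are hypothetical stand-ins for driving the camera 113a and for the image-recognition step; the specification does not name a particular face-detection method, so none is assumed here, and the function name `build_image_table` is likewise invented.

```python
from typing import Callable, Dict, Iterable


def build_image_table(
    pan_angles: Iterable[float],
    capture: Callable[[float], str],
    has_face: Callable[[str], bool],
) -> Dict[float, str]:
    """Sweep the camera and keep each frame in which a face is recognized."""
    table: Dict[float, str] = {}
    for pan in pan_angles:
        frame = capture(pan)    # drive the camera to `pan` and grab a frame
        if has_face(frame):     # stand-in for the image-recognition step
            table[pan] = frame  # substitute image keyed by imaging direction
    return table


# Fake scene for illustration: what the camera would see at each pan angle.
scene = {-30.0: "face:a1", 0.0: "wall", 30.0: "face:a2"}
print(build_image_table(scene, scene.__getitem__, lambda f: f.startswith("face:")))
# → {-30.0: 'face:a1', 30.0: 'face:a2'}
```

Replacing `has_face` with a prompt to the user would correspond to the alternative in which the face is identified by a user operation.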

With reference to FIG. 4, a case will be described in which the video conference terminal 110a according to the embodiment of the present invention pans/tilts the camera 113a to change the captured participant from participant a2 to participant a1. FIG. 4 is an explanatory diagram showing an example of participant image capture by the video conference terminal according to the embodiment of the present invention.

In FIG. 4, in state 1, the video conference terminal 110a is capturing participant a2, who is within the imaging range 401 of the camera 113a. The video conference terminal 110a outputs the captured image of participant a2 to the video conference terminal 110b at site B, which displays the participant image of participant a2 output from the video conference terminal 110a on the display 112b.

With reference to FIG. 5, the content shown on the display 112b by the video conference terminal 110b according to the embodiment of the present invention will be described. FIG. 5 is an explanatory diagram illustrating an example of how participant images are shown on the display according to the embodiment.

In FIG. 5, in state 1, the video conference terminal 110b shows on the display 112b the participant image 501 of the participant a2 output from site A. Although not illustrated, the display 112b may also show, in addition to the participant image 501 from site A, a participant image of its own site, processing screens related to the video conference, document images, and the like.

Returning to FIG. 4, when the video conference terminal 110a in state 1 detects speech by the participant a1 through the microphone 114a, it pans and tilts the camera 113a toward the direction 400 from which the speech was detected, and transitions to state 2. Speech may be regarded as detected when, for example, it lasts for at least a predetermined time or its volume is at least a predetermined value.

In state 2, while panning and tilting the camera 113a, the video conference terminal 110a identifies the participant a1, who spoke, as the speaker, and outputs the substitute image information of the identified speaker to the video conference terminal 110b at site B.

Specifically, for example, the speaker is identified from the direction in which the speech was detected. That direction is determined, for example, from the direction in which a directional microphone 114 picked up the speech or, when multiple microphones 114a are installed, from the volume levels detected by the individual microphones 114a. The video conference terminal 110a determines the imaging direction of the camera 113a from the direction of the speech and, using the image table 300a, outputs the image information whose position information matches that imaging direction to the video conference terminal 110b at site B as substitute image information.
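The direction-to-speaker lookup might be sketched as follows. Picking the loudest of several fixed microphones is only the crudest form of the volume-based estimate described above, and the microphone azimuths, the pan-only table, and the 15-degree tolerance are hypothetical values chosen for illustration.

```python
from typing import Dict, Optional, Sequence


def estimate_direction(mic_azimuths_deg: Sequence[float], mic_levels: Sequence[float]) -> float:
    """Return the azimuth of the microphone that picked up the loudest signal."""
    loudest = max(range(len(mic_levels)), key=lambda i: mic_levels[i])
    return mic_azimuths_deg[loudest]


def lookup_speaker(
    pan_table: Dict[str, float], pan_deg: float, tolerance_deg: float = 15.0
) -> Optional[str]:
    """Return the participant whose registered pan angle is closest to pan_deg,
    or None if no entry lies within tolerance_deg (no substitute image available)."""
    if not pan_table:
        return None
    best = min(pan_table, key=lambda p: abs(pan_table[p] - pan_deg))
    return best if abs(pan_table[best] - pan_deg) <= tolerance_deg else None
```

The tolerance guards against showing the wrong participant's image when the speech comes from a direction that was never registered in the table.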

In state 2, the video conference terminal 110a keeps outputting the substitute image information for the participant a1 to the video conference terminal 110b even when, as a result of the pan/tilt movement, neither participant a1 nor a2 is within the imaging range 402.

Returning to FIG. 5, in state 2, the video conference terminal 110b shows on the display 112b the substitute image information 510 of the participant a1 output from site A. In other words, in state 2 the video conference terminal 110b displays an image of the speaker, the participant a1, rather than whatever passes through the imaging range of the camera 113a of the video conference terminal 110a while it is panning and tilting.

Returning to FIG. 4, when the pan/tilt control of state 2 is complete and the participant a1 is within the imaging range 403 of the camera 113a, the camera 113a captures the participant image of the participant a1, and the terminal transitions to state 3. The video conference terminal 110a outputs the participant image of the speaking participant a1 to the video conference terminal 110b at site B.

Returning to FIG. 5, in state 3, the video conference terminal 110b shows on the display 112b the participant image 403 of the participant a1 output from site A. Thus, while the video conference terminal 110a at site A is not panning or tilting the camera 113a (states 1 and 3), the participant image captured by the camera 113a is displayed; while pan/tilt control is in progress (state 2), substitute image information showing the participant a1, who will be imaged once the pan/tilt control ends, can be displayed instead. That is, rather than the moving view of the surroundings that the camera 113a would capture while panning and tilting, an image of a participant relevant to the video conference can be displayed.
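The state-dependent choice of what to send can be reduced to a small selection function. This is a sketch under stated assumptions: the two-valued `CameraState` enum and the fallback to the live frame when no substitute image exists are illustrative choices, not taken from the patent.

```python
from enum import Enum, auto
from typing import Optional


class CameraState(Enum):
    AT_REST = auto()  # states 1 and 3: camera stationary on a participant
    MOVING = auto()   # state 2: pan/tilt control in progress


def frame_to_send(state: CameraState, live_frame: bytes, substitute: Optional[bytes]) -> bytes:
    """Choose what the sending terminal outputs to the remote site."""
    if state is CameraState.MOVING and substitute is not None:
        return substitute  # hide the moving scenery behind the speaker's stored image
    return live_frame      # otherwise pass the camera image through (assumed fallback)
```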

Returning to FIG. 2, the audio I/F 205 causes the speaker 115 to output various sounds under the control of the CPU 201. Under the control of the CPU 201, the audio I/F 205 reads audio received from the video conference terminal 110 at another site from the storage medium 208 and causes the speaker 115 to output it. The audio I/F 205 may also be configured to output guidance audio related to the video conference with another site.

Under the control of the CPU 201, the audio I/F 205 collects the voices of participants at its own site through the microphone 114 and outputs the collected participant audio to the storage medium 208.

Under the control of the CPU 201, the audio I/F 205 also detects a speaker from the participant audio at its own site collected by the microphone 114. A speaker may be detected when, for example, the speech lasts for at least a predetermined time or its volume is at least a predetermined value.
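One way to express the duration-or-volume criterion over a stream of per-frame levels is shown below. The 20 ms frame length, the -50 dB noise floor, the -20 dB loudness threshold, and the 1 s minimum duration are hypothetical parameters; the patent only speaks of a "predetermined time" and a "predetermined value".

```python
from typing import Iterable


def detect_utterance(
    levels_db: Iterable[float],
    frame_ms: int = 20,
    floor_db: float = -50.0,
    loud_db: float = -20.0,
    min_duration_ms: int = 1000,
) -> bool:
    """Report speech if a run of frames above floor_db lasts at least min_duration_ms,
    or if any frame in such a run reaches loud_db."""
    frames_needed = -(-min_duration_ms // frame_ms)  # ceiling division on integers
    run = 0
    for level in levels_db:
        if level < floor_db:
            run = 0  # silence breaks the run
            continue
        run += 1
        if run >= frames_needed or level >= loud_db:
            return True
    return False
```

Working in integer milliseconds avoids floating-point rounding when converting the minimum duration into a frame count.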

The operation unit 206 accepts various kinds of input from participants and other users. It consists of a touch panel, operation buttons, and the like, accepts input of information related to the video conference, and outputs the corresponding signals to the CPU 201.

The communication I/F 207 is connected through a communication line to a network 150 such as the Internet, and through the network 150 to external devices such as other video conference terminals 110 and other information terminals. The communication I/F 207 serves as the interface between the network 150 and the inside of the video conference terminal 110 and controls data input and output to and from external devices. A modem or a LAN adapter, for example, can be used as the communication I/F 207.

The communication I/F 207 receives images and audio transmitted from the video conference terminal 110 at another site and, under the control of the CPU 201, outputs them to the storage medium 208.

Under the control of the CPU 201, the communication I/F 207 transmits images and audio stored in the storage medium 208 to the video conference terminal 110 at another site.

The storage medium 208 is, for example, an HD (hard disk) or an FD (flexible disk), the latter being an example of a removable recording medium. Each storage medium 208 has its own drive device; various data are written to it under the control of the CPU 201 and read from it under the control of the respective drive device.

In terms of the correspondence between components and functions, the CPU 201 and the storage medium 208 shown in FIG. 2 implement the function of the recording means according to the present invention. The CPU 201 implements the functions of the determination means, the detection means, and the identification means. The CPU 201 and the video I/F 204 implement the function of the control means, and the CPU 201 and the communication I/F 207 implement the function of the output means. Although all of these functions are implemented in a single video conference terminal here, they may also be implemented by multiple devices working together.

(Processing performed by the video conference terminal 110)
With reference to FIG. 6, the processing performed by the video conference terminal 110 according to the embodiment of the present invention will be described. FIG. 6 is a flowchart showing the processing of the video conference terminal according to the embodiment.

In the flowchart of FIG. 6, the CPU 201 first determines whether a video conference has started (step S601). The start of a video conference is determined, for example, by a connection request being sent to another video conference terminal 110 via the communication I/F 207 in response to a participant operating the operation unit 206, or by a response being received from another video conference terminal 110 via the communication I/F 207.

The CPU 201 waits in step S601 until a video conference starts. Once it has started (step S601: Yes), the CPU 201 controls the video I/F 204 so that the camera 113 images the whole of its own site (step S602). For example, the camera 113 is driven with pan, tilt, and similar control so that an image is captured of each participant taking part in the video conference at the site.

Based on the participant images captured in step S602, the CPU 201 creates an image table and stores it in the storage medium 208 (step S603). Specifically, like the image table 300a shown in FIG. 3, the image table associates the image information of each participant (the subject) with position information, namely the imaging direction of the camera 113 when that participant was captured.

The CPU 201 controls the video I/F 204 so that the camera 113 captures and outputs a participant image (step S604). In step S604 the camera 113 captures the participant image in, for example, an initial direction such as straight ahead of the video conference terminal 110, or in an imaging direction specified by a participant's operation. The participant image is output, for example, to the video conference terminal 110 at another site via the communication I/F 207.

The CPU 201 determines whether the video conference has ended (step S605). The video conference ends in response to, for example, an instruction given by a participant through the operation unit 206 or an instruction from another video conference terminal 110 received via the communication I/F 207.

If the video conference has ended (step S605: Yes), the series of processes ends. If it has not ended (step S605: No), the CPU 201 determines whether the microphone 114 has detected speech from a participant other than the one in the participant image being captured in step S604 (step S606).

If no speech is detected (step S606: No), the process returns to step S604 and repeats. If speech is detected (step S606: Yes), the CPU 201 determines whether the speech has continued for at least a predetermined time (step S607).

If the speech has not continued for the predetermined time (step S607: No), the process returns to step S604 and repeats. By requiring speech of at least the predetermined duration, brief interjections such as expressions of agreement, and remarks of little relevance to the video conference, can be ignored.

If the speech has continued for at least the predetermined time (step S607: Yes), the CPU 201 identifies the speaker, that is, the participant whose speech was detected in steps S606 and S607 (step S608). The speaker is identified, for example, by determining the imaging direction of the camera 113a from the direction in which the speech was detected and then locating the corresponding image information in the image table created in step S603.

For the speaker identified in step S608, the CPU 201 outputs the substitute image information, that is, the image information in the image table created in step S603, as the speaker's image to the video conference terminal 110 at another site via the communication I/F 207 (step S609).

To image the speaker identified in step S608, the CPU 201 performs pan/tilt control of the camera 113 via the video I/F 204, using the direction in which the speech was detected as the imaging direction (step S610).

Via the video I/F 204, the CPU 201 determines whether the speaker has come within the imaging range of the camera 113 under the pan/tilt control of step S610 (step S611). This determination may be made, for example, from the similarity between the participant in the substitute image information and the participant being imaged.
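The similarity test of step S611 might look like the following sketch. It assumes that some face-embedding model (not specified in the patent) has already turned the substitute image and each face in the live frame into numeric feature vectors; cosine similarity and the 0.8 threshold are illustrative choices.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero feature vector as "no match"
    return dot / (norm_a * norm_b)


def speaker_in_view(
    substitute_features: Sequence[float],
    faces_in_frame: Sequence[Sequence[float]],
    threshold: float = 0.8,
) -> bool:
    """True if any face in the live frame resembles the stored substitute image closely enough."""
    return any(cosine_similarity(substitute_features, f) >= threshold for f in faces_in_frame)
```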

If the speaker is not within the imaging range of the camera 113 (step S611: No), the process returns to step S610 and repeats. If the speaker is within the imaging range (step S611: Yes), the CPU 201 makes the speaker, as imaged by the camera 113, the subject of the participant image (step S612) and proceeds to step S604. In other words, in step S612 the image output to the video conference terminal 110 at another site is switched from the substitute image information output in step S609 to the participant image of the speaker captured by the camera 113.
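Steps S609 to S612 can be simulated for a pan-only camera as below. This is a sketch under simplifying assumptions: the camera moves in fixed angular increments, and "speaker in view" (S611) is approximated by angular distance to the registered pan angle rather than by image similarity. The step size and half field of view are hypothetical.

```python
from typing import Dict, List, Tuple


def pan_to_speaker(
    table: Dict[str, Tuple[float, bytes]],
    speaker: str,
    current_pan: float,
    step_deg: float = 10.0,
    half_fov_deg: float = 5.0,
) -> Tuple[List[Tuple[str, object]], float]:
    """table maps participant -> (registered pan angle, substitute image).
    Returns what is sent to the remote terminal at each tick, and the final pan angle."""
    if step_deg <= 0:
        raise ValueError("step_deg must be positive")
    target_pan, substitute = table[speaker]
    sent: List[Tuple[str, object]] = [("substitute", substitute)]  # S609: send stored image first
    pan = current_pan
    while abs(target_pan - pan) > half_fov_deg:  # S611: speaker not yet in view
        pan += max(-step_deg, min(step_deg, target_pan - pan))  # S610: one pan increment
        sent.append(("substitute", substitute))  # substitute stays on screen while moving
    sent.append(("live", speaker))  # S612: switch to the live image of the speaker
    return sent, pan
```

For example, panning from 20 degrees to a speaker registered at -30 degrees takes five 10-degree increments, during which only the substitute image is sent.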

In terms of the correspondence between the steps of the processing method according to the present invention and the processing of the embodiment, the recording step is executed by the processing of the CPU 201 and the storage medium 208 in step S603. The determination step, the detection step, and the identification step are executed by the processing of the CPU 201 in steps S606 to S608. The control step is executed by the processing of the CPU 201 and the video I/F 204 in step S610, and the output step by the processing of the CPU 201 and the communication I/F 207 in step S609.

As described above, according to the embodiment of the present invention, when the camera is panned and tilted, an image of the participant who will be imaged after the movement can be output while the movement is still in progress, so the result of the pan/tilt control can be confirmed quickly. In particular, when the imaging target is changed from the participant currently being imaged to another participant who has started speaking, substitute image information showing the speaker is output during pan/tilt control. The remote side can therefore see at once who is speaking, and the video conference can proceed smoothly.

Furthermore, because the embodiment checks whether detected speech has continued for at least a predetermined time, brief expressions of agreement and remarks of little relevance to the conference are ignored. Participants making substantive contributions can thus be identified quickly, and needless switching of images is avoided.

(Other variations)
Although the embodiment of the present invention describes imaging participants, the invention is not limited to this. For example, an article related to the video conference may be imaged instead. In that case, instead of detecting speech, the camera may be panned and tilted in response to a change in the state of the article, such as vibration, sound, or light emitted when it is presented, or in response to an imaging instruction entered by a participant. This allows the invention to be applied to a wide range of conference-related imaging targets.

Also, although the flowchart of FIG. 6 outputs the speaker's substitute image information when speech continues for at least a predetermined time, the invention is not limited to this. For example, the substitute image information may be output when the speech volume is at least a predetermined value, when the speech has a predetermined voice quality identifying a pre-registered participant, or when at least one of the conditions on speech duration, volume, and voice quality is satisfied. This allows important speakers to be selected preferentially, so the conference can be run appropriately.
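The "at least one condition" variant is a plain logical OR. In the sketch below, `voice_id` stands for the output of a hypothetical speaker-recognition step, and the thresholds are illustrative defaults rather than values from the patent.

```python
from typing import Optional, Set


def should_show_substitute(
    duration_s: float,
    level_db: float,
    voice_id: Optional[str],
    registered_voices: Set[str],
    min_duration_s: float = 1.0,
    min_level_db: float = -20.0,
) -> bool:
    """True if at least one trigger condition (duration, volume, registered voice) holds."""
    long_enough = duration_s >= min_duration_s
    loud_enough = level_db >= min_level_db
    known_voice = voice_id is not None and voice_id in registered_voices
    return long_enough or loud_enough or known_voice
```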

Also, although the flowchart of FIG. 6 switches the output from the substitute image information to the captured participant image of the speaker once the speaker comes within the imaging range, the invention is not limited to this. For example, the switch may instead be made upon determining that panning and tilting to the imaging direction commanded in step S610 has been completed. This allows the output image to be switched with a simpler mechanism.
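The simpler completion check compares the camera's reported position with the commanded one. This sketch assumes the camera can report its current angles per axis; the 0.5-degree tolerance is a hypothetical value.

```python
from typing import Sequence


def pan_tilt_complete(current: Sequence[float], target: Sequence[float], tol_deg: float = 0.5) -> bool:
    """True once every axis (e.g. pan, tilt) is within tol_deg of its commanded value."""
    if len(current) != len(target):
        raise ValueError("current and target must have the same number of axes")
    return all(abs(c - t) <= tol_deg for c, t in zip(current, target))
```

Unlike the similarity check, this needs no image analysis, which is the simplicity the variation refers to; the trade-off is that it cannot notice if the speaker has moved.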

Also, although the embodiment describes the substitute image information shown on the display as being received from the transmitting video conference terminal, the invention is not limited to this. For example, the transmitting video conference terminal may output its image table in addition to participant images and, when it detects speech, output only information identifying the speaker while the camera is being driven. On receiving the speaker information, the receiving video conference terminal may then read the substitute image information from the image table it has already stored.
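The receiving side of this variation might be sketched as a small message handler. The message types (`image_table`, `speaker`, `frame`) and their dictionary layout are hypothetical; the patent does not define a wire format.

```python
from typing import Any, Dict, Optional


class ReceivingTerminal:
    """Receiver side of the variation: resolves speaker IDs against a locally stored table."""

    def __init__(self) -> None:
        self.table: Dict[str, bytes] = {}
        self.shown: Optional[bytes] = None

    def on_message(self, msg: Dict[str, Any]) -> Optional[bytes]:
        kind = msg["type"]
        if kind == "image_table":  # sent once by the transmitting terminal
            self.table = dict(msg["table"])
        elif kind == "speaker":    # sent while the remote camera is being driven
            # Unknown IDs leave the current display unchanged.
            self.shown = self.table.get(msg["participant"], self.shown)
        elif kind == "frame":      # ordinary live participant image
            self.shown = msg["frame"]
        return self.shown
```

Sending only a short identifier during camera movement saves bandwidth compared with re-sending the substitute image each time.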

Also, although the embodiment records the image of each participant captured after the start of the video conference as substitute image information, the invention is not limited to this. For example, pre-registered information or an overall image including the participants may be used. Furthermore, participant images captured as the video conference progresses may also be recorded as substitute image information. With reference to FIG. 7, a case will be described in which the image table is built without imaging the whole site after the conference starts. FIG. 7 is a flowchart showing the processing of the video conference terminal according to a modification of the present invention.

In the flowchart of FIG. 7, the CPU 201 first determines whether a video conference has started (step S701). The CPU 201 waits in step S701 until the conference starts; once it has started (step S701: Yes), the CPU 201 controls the video I/F 204 so that the camera 113 captures and outputs a participant image (step S702). In step S702 the camera 113 captures the participant image in, for example, an initial direction such as straight ahead of the video conference terminal 110, or in an imaging direction specified by a participant's operation. The participant image is output, for example, to the video conference terminal 110 at another site via the communication I/F 207.

Based on the participant image captured in step S702, the CPU 201 updates the image table stored in the storage medium 208 (step S703). That is, if the participant in the image captured in step S702 already exists in the image table, the image information is updated to the latest; if not, a new entry is registered. An existing entry need not be updated every time; it may be updated only when its image information or position information has changed, or when a predetermined time has elapsed since the stored image information was captured.
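The update policy of step S703 can be sketched as follows. It assumes a dictionary-based table and a 300-second maximum age, both hypothetical, and it only implements the position-change and elapsed-time conditions; detecting a change in the image itself would need a similarity measure and is omitted here.

```python
import time
from typing import Dict, Optional, Tuple


def update_entry(
    table: Dict[str, dict],
    participant: str,
    image: bytes,
    direction: Tuple[float, float],
    now: Optional[float] = None,
    max_age_s: float = 300.0,
) -> bool:
    """Register or refresh one image-table entry. Returns True if the table changed."""
    now = time.monotonic() if now is None else now
    entry = table.get(participant)
    if (
        entry is None                                 # new participant: register
        or entry["direction"] != direction            # position information changed
        or now - entry["captured_at"] >= max_age_s    # stored image is too old
    ):
        table[participant] = {"image": image, "direction": direction, "captured_at": now}
        return True
    return False
```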

The CPU 201 determines whether the video conference has ended (step S704). The video conference ends in response to, for example, an instruction given by a participant through the operation unit 206 or an instruction from another video conference terminal 110 received via the communication I/F 207.

If the video conference has ended (step S704: Yes), the series of processes ends. If it has not ended (step S704: No), the CPU 201 determines whether the microphone 114 has detected speech from a participant other than the one in the participant image being captured in step S702 (step S705).

If no speech is detected (step S705: No), the process returns to step S702 and repeats. If speech is detected (step S705: Yes), the CPU 201 determines whether the speech has continued for at least a predetermined time (step S706).

If the speech has not continued for the predetermined time (step S706: No), the process returns to step S702 and repeats. If it has (step S706: Yes), the CPU 201 determines, based on the direction of the speech detected in step S705, whether substitute image information for the participant who spoke is already registered in the image table (step S707).

If it is not registered (step S707: No), the process proceeds to step S709. If it is registered (step S707: Yes), the CPU 201 outputs the substitute image information registered in the image table for the speaker, that is, the participant whose speech was detected in steps S705 and S706, as the speaker's image to the video conference terminal 110 at another site via the communication I/F 207 (step S708).
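The branch at step S707 can be expressed as an ordered list of actions. This sketch assumes a pan-only table keyed by participant and a hypothetical 15-degree matching tolerance; the action names are illustrative, not from the patent.

```python
from typing import Dict, List, Optional, Tuple


def plan_speaker_switch(
    table: Dict[str, Tuple[float, bytes]], speech_pan_deg: float, tolerance_deg: float = 15.0
) -> List[Tuple[str, object]]:
    """Return the ordered actions for steps S707 to S709."""
    match: Optional[str] = None
    if table:
        nearest = min(table, key=lambda p: abs(table[p][0] - speech_pan_deg))
        if abs(table[nearest][0] - speech_pan_deg) <= tolerance_deg:
            match = nearest
    actions: List[Tuple[str, object]] = []
    if match is not None:  # S707: Yes, so S708 sends the registered substitute image
        actions.append(("send_substitute", table[match][1]))
    actions.append(("pan_to", speech_pan_deg))  # S709 runs in both branches
    return actions
```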

To image the speaker whose speech was detected in steps S705 and S706, the CPU 201 performs pan/tilt control of the camera 113 via the video I/F 204, using the direction in which the speech was detected as the imaging direction (step S709).

Via the video I/F 204, the CPU 201 determines whether the speaker has come within the imaging range of the camera 113 under the pan/tilt control of step S709 (step S710). This determination may be made, for example, from the similarity between the participant in the substitute image information and the participant being imaged.

If the speaker is not within the imaging range of the camera 113 (step S710: No), the process returns to step S709 and repeats. If the speaker is within the imaging range (step S710: Yes), the CPU 201 makes the speaker, as imaged by the camera 113, the subject of the participant image (step S711) and proceeds to step S702.

As described above, according to this modification of the present invention, the substitute image information can be updated to the latest participant image. The substitute image information that is output is therefore more current, and the video conference can proceed appropriately.

In the above description, the embodiment and some of the modifications have been described as separate examples, but the present invention is not limited to this. The techniques of the embodiment and the modifications may be combined as appropriate.

The processing methods described in the embodiment and modifications of the present invention can be implemented by executing a previously prepared communication program on a computer such as a personal computer or a workstation. The communication program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The program may also be distributed over a transmission medium via a network such as the Internet.

100 Video conference system
110 (110a, 110b) Video conference terminal
111 (111a, 111b) Main body
112 (112a, 112b) Display
113 (113a, 113b) Camera
114 (114a, 114b) Microphone
115 (115a, 115b) Speaker
a1, a2, b1 Participants
150 Network
200 Bus
201 CPU
202 RAM
203 ROM
204 Video I/F
205 Audio I/F
206 Operation unit
207 Communication I/F
208 Storage medium

Claims

1. A terminal device that outputs a subject video captured by an imaging unit to a terminal device at its own base or at another base, the terminal device comprising:
recording means for recording substitute image information indicating a subject imaged by the imaging unit in association with position information of the subject;
detection means for detecting a state change related to the subject at the own base or the other base;
determination means for determining the subject to be imaged by the imaging unit based on the state change detected by the detection means;
specifying means for specifying the position information of the subject determined by the determination means;
control means for drive-controlling the imaging unit, based on the position information specified by the specifying means, so as to image the subject determined by the determination means; and
output means for outputting, during drive control by the control means, the substitute image information indicating at least the subject determined by the determination means.

2. The terminal device according to claim 1, wherein the detection means detects, as the state change, an utterance by a user who is the subject, and
the determination means determines the user located in the direction in which the utterance was detected by the detection means as the subject to be imaged.

3. The terminal device according to claim 2, wherein the control means drive-controls the imaging unit when an utterance detected by the detection means continues for a predetermined time or longer.

4. The terminal device according to any one of claims 1 to 3, wherein the recording means records substitute image information based on the subject video captured by the imaging unit in association with position information based on the position at which the video was captured.

5. The terminal device according to claim 1, wherein, when the subject video is captured by the imaging unit under drive control by the control means, the recording means updates the already recorded substitute image information to substitute image information based on the captured subject video.

6. The terminal device according to any one of claims 1 to 5, wherein, during the drive control, the output means switches the information output to the terminal device at the own base or the other base from the video captured by the imaging unit to the substitute image information recorded by the recording means.

7. A processing method for outputting a subject video captured by an imaging unit to a terminal device at its own base or at another base, the method comprising:
a recording step of recording substitute image information indicating a subject imaged by the imaging unit in association with position information of the subject;
a detection step of detecting a state change related to the subject at the own base or the other base;
a determination step of determining the subject to be imaged by the imaging unit based on the state change detected in the detection step;
a specifying step of specifying the position information of the subject determined in the determination step;
a control step of drive-controlling the imaging unit, based on the position information specified in the specifying step, so as to image the subject determined in the determination step; and
an output step of outputting, during the drive control in the control step, the substitute image information indicating at least the subject determined in the determination step.
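Claim 3 conditions drive control on an utterance lasting a predetermined time or longer, which keeps the camera from swinging toward brief interjections. The claims specify neither the duration nor how "the same utterance" is recognized, so the following sketch assumes a 1-second minimum and treats detections within 10° of the first one as the same speaker. `UtteranceGate` and its parameters are hypothetical names.

```python
from typing import Optional


class UtteranceGate:
    """Hypothetical sketch of claim 3: trigger drive control once per
    utterance, only after it has lasted at least min_duration_s seconds."""

    def __init__(self, min_duration_s: float = 1.0, same_dir_tol_deg: float = 10.0):
        self.min_duration_s = min_duration_s
        self.tol = same_dir_tol_deg
        self._start: Optional[float] = None
        self._dir: Optional[float] = None
        self._fired = False

    def update(self, t: float, direction_deg: Optional[float]) -> Optional[float]:
        """Feed one detection result at time t (None means silence).
        Returns the direction to drive the camera toward when the threshold
        is first reached for this utterance, otherwise None."""
        if direction_deg is None:                  # silence resets the gate
            self._start, self._dir, self._fired = None, None, False
            return None
        if self._dir is None or abs(direction_deg - self._dir) > self.tol:
            # A new utterance, or one from another direction, restarts timing.
            self._start, self._dir, self._fired = t, direction_deg, False
            return None
        if not self._fired and t - self._start >= self.min_duration_s:
            self._fired = True
            return self._dir
        return None
```

Returning the direction only once per utterance means the caller starts S709 a single time rather than on every detection tick while the speaker keeps talking.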