JP2007150917A

JP2007150917A - Communication terminal and display method thereof

Info

Publication number: JP2007150917A
Application number: JP2005344752A
Authority: JP
Inventors: Kugo Morita; 空悟守田
Original assignee: Kyocera Corp
Current assignee: Kyocera Corp
Priority date: 2005-11-29
Filing date: 2005-11-29
Publication date: 2007-06-14
Anticipated expiration: 2025-11-29
Also published as: JP4832869B2

Abstract

PROBLEM TO BE SOLVED: To provide a communication terminal and a display method thereof that can adaptively and optimally update the size and position of each display image area (screen) according to conditions, without requiring any user operation; that can rearrange (move) the display image areas continuously; and that can arrange display image areas of different shapes at their optimum size.

SOLUTION: When a screen exists at a designated position on the display, an encoder 20 at the transmission source generates the corresponding designation information, screen information and volume information, and transmits them to the other party in the communication. A decoder 30 has a multi-screen display function: it calculates the display magnification of each screen based on the line segments connecting screen centers, the thickness of each screen's reference shape, and the audio volume, and controls the movement and new generation of screens based on this display magnification so that a plurality of screens is optimally arranged on the display.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a communication terminal such as a mobile phone and a display method thereof, and more particularly to a communication terminal capable of multipoint communication and a display method thereof.

A representative example of multipoint communication is the video conference system, in which a plurality of terminals are connected via an MCU (Multi-point Control Unit). The MCU divides and combines the image data sent from the many terminals into one screen and transmits it, together with the audio data, to each terminal, thereby realizing a video conference that connects multiple sites.

Basically, there are two ways to divide and combine the images of the sites into one image:
(1) dividing the image equally (for example, into 4 or 9 parts); and
(2) taking one large image area and dividing the remaining area equally (for example, a 6-part layout).

In case (1), the images from the sites connected by the MCU are combined using areas of the same size.
In case (2), the large area is allocated to the site that is currently speaking, and the images from the remaining sites are allocated to the remaining equal areas.

In either case, because video conference systems use large-screen monitors, the image of each site remains sufficiently large even when the images of several sites are combined into one; as long as one person appears per site, that person's face does not become hard to recognize. Video conference systems are disclosed, for example, in Patent Documents 1 and 2.

FIGS. 1(A) to 1(E) show examples of the display screen of a terminal such as a personal computer (PC) during multipoint communication in a general video conference system.
In the example of FIG. 1, screen 1 is divided into windows with predetermined (rectangular) frames.
For example, screen 1 consists of one large (rectangular) window 2 and a plurality of small (rectangular) windows 3-1 to 3-5, and the speaker is displayed in the large window 2.
In this case, the window sizes and the number of divisions are fixed, and the captured image is displayed as is, so the size of the face varies with the shooting conditions.

With general PC window control, the user can freely resize and select windows by dragging them with a mouse.

Mobile communication terminals such as mobile phones have become more capable year by year: beyond voice calls, they offer mail, web access, games, cameras, videophone, media players, radio, television and more.
At present, a videophone call on a mobile communication terminal requires the user to select a videophone connection when placing the call.

However, as support for packet communication progresses, voice calls themselves will use VoIP over packet communication, and the main usage will be to start the camera during a call to switch to a voice-and-video call, or conversely to stop the camera and continue with voice only. Furthermore, it becomes possible to send documents (e.g., mail), address data, images (still and moving) and audio stored on one's own terminal to the other party during the call, where they are played back automatically so that both parties can view them at the same time, or to view a website together.

As described above, when mobile communication terminals become IP-based through such added functionality, they can communicate with a plurality of parties (including servers) at the same time.
In this case, one terminal must handle a plurality of screens. One method of handling multiple screens is page turning (as on a PDA).
Patent Document 1: Japanese Patent Application Laid-Open No. 06-141310
Patent Document 2: Japanese Patent Application Laid-Open No. 06-141311

For example, on a portable IP-TV phone the screen is small, so when several people hold a call with video, each person's face becomes small.
Letting the user move and resize windows as on a PC is difficult because of the limited screen size and operation keys.
In addition, when several people speak at about the same level, fixed screens cannot adapt to the situation.

Furthermore, if each screen's size is set according to volume, space may be wasted, or not all screens may fit on the display.
Specifically, when all screens have low volume, small screens float on the display surrounded by empty space. Conversely, when all screens are at maximum volume, the total area of the screens exceeds the display area.
The screens could be made to fit by locally normalizing their total area to the display area, but the rearrangement (movement) of the screen positions at the next time step then becomes discontinuous.

An object of the present invention is to provide a communication terminal and a display method thereof that, without user operation, adaptively and optimally update the size and position of the display image areas (screens) according to conditions such as the volume level and the number of display image areas to be displayed, that rearrange (move) the display image areas continuously, and that can arrange areas of different shapes at their optimum size.

A first aspect of the present invention is a communication terminal that reproduces received image data and audio data, comprising: display means for displaying images; and control means capable of forming a plurality of display areas, each displaying one of a plurality of images in which a specific area has been extracted for display on the display means, the control means calculating the display magnification of each display area based at least on the line segments connecting the centers of the display areas, the thickness of the reference shape, and the audio volume, and controlling the movement and new generation of display areas based on that display magnification so as to form a plurality of display areas on the display screen of the display means.

Preferably, each display area has a center position coordinate (P(i)) indicating its display position, a reference shape (Unit(i)) indicating its shape, the volume of its associated audio (V(i)), and a display magnification (R(i)) used when the area is displayed on the screen. The control means sets the display magnification R(i) to the smallest of the provisional display magnifications R(i,j), each calculated from the line segment L(i,j) connecting P(i) to the center P(j) of a surrounding screen, the thicknesses of the reference shapes along that segment (Lm(i,j), Lm(j,i)), and the audio volumes (V(i), V(j)).

Preferably, the control means sets an audio volume of V(k) = 0 and a thickness of Lm(k,i) = 0 at the point where the perpendicular from the center of the display area meets the screen boundary, and calculates the display magnification R(i,k) with respect to that point.

Preferably, the control means moves the display area to the position that maximizes its display magnification (R(i)).

The control means generates the center of a new display area at the position where the display magnification (R(k)) is largest.

Preferably, the reference shapes used by the control means all have the same area.

Preferably, the control means draws a separation line between screens formed in the reference shape and treats each area separated by the separation line as a new display area.

Preferably, the information on the images to be displayed is contained in the information received from the transmitting device, and the control means calculates the display magnification and controls the movement and new generation of display areas based on that received information.

A second aspect of the present invention is a display method for a communication terminal that reproduces received image data and audio data, the method comprising: calculating the display magnification of each display area based on the line segments connecting the centers of the display areas of a plurality of images in which a specific area has been extracted for display, the thickness of the reference shape, and the audio volume; controlling the movement and new generation of display areas based on the display magnification to form a plurality of display areas on the display screen; and displaying the plurality of display areas containing the images to be displayed.

According to the present invention, the size and position of the display image areas (screens) can be updated adaptively and optimally, without user operation, according to conditions such as the volume level and the number of display image areas to be displayed; moreover, the display image areas are rearranged (moved) continuously, and areas of different shapes can be arranged at their optimum size.

Embodiments of the present invention are described below with reference to the drawings.

FIGS. 2 and 3 show a configuration example of a mobile communication terminal according to an embodiment of the present invention: FIG. 2 is a block diagram of the encoding device, and FIG. 3 is a block diagram of the decoding device.

The mobile communication terminal 10 includes an encoding device 20 on the transmitting side and a decoding device 30 on the receiving side, and is capable of multipoint communication.

The encoding device 20 adds instruction information for the receiving terminal, top/bottom (orientation) information for the image, and the like to the encoded audio and image data, and transmits the result to the network as packets.
The source's instruction information added to the audio and image data includes information identifying the source of the indicated image (for example, an IP address or MAC address) and position information indicating a position on the received image.
When a screen (described in detail later) exists at the position indicated on the display, the encoding device 20 on the transmitting side generates the corresponding instruction information, screen information and volume information, and sends them to the other party in the communication.

The encoding device 20 in FIG. 2 includes:
a voice input unit 201 such as a microphone;
an image input unit 202 such as a digital camera;
an operation unit 203 that accepts key input and the like;
a voice encoding processing unit 204 that encodes the audio data input from the voice input unit 201;
an image encoding processing unit 205 that encodes the image data input from the image input unit 202 and cut out to a predetermined area;
a top/bottom correction unit 206 that, based on the top/bottom information associated with the captured image, corrects the orientation of the captured image to match that of the display screen (terminal screen) on the receiving side;
a face area detection unit 207 that detects and extracts the face area from the captured image;
a screen discrimination unit 208 that determines the screen (display image area to be displayed) to be used, based on the face area detected by the face area detection unit 207, and generates screen information;
a clipping unit 209 that cuts out the corresponding area from the received image based on the determination of the screen discrimination unit 208;
an input volume measurement unit 210 that measures the input volume from the voice input unit 201 and generates volume information;
a terminal control unit 211 that controls the terminal based on input information from the operation unit 203;
a control information generation unit 212 that generates control information, including instruction information, top/bottom information, screen information and volume information, based on instructions from the terminal control unit 211;
a storage unit 213 that stores images and video;
a transmission packet generation unit 214 that generates transmission packets from the encoded audio and image data, the control information, and image/video data read from the storage unit 213 on instruction from the terminal control unit 211; and
a network interface (I/F) 215 capable of wireless communication with the network, which transmits the generated packets over the network to the terminal or server of the other party.
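The text does not say how the input volume measurement unit 210 turns audio into volume information. As a minimal sketch, assuming 16-bit PCM frames and a linear (not decibel) measure so that volumes can later be compared as ratios, a per-frame RMS level normalized to full scale could look like this; the function name `frame_volume` and the frame format are hypothetical.

```python
import math
from typing import Sequence


def frame_volume(samples: Sequence[int], full_scale: int = 32768) -> float:
    """Normalized RMS level (0.0 to about 1.0) of one frame of 16-bit PCM samples.

    Hypothetical stand-in for unit 210: the patent does not define the measure.
    """
    if not samples:
        return 0.0  # an empty frame carries no sound
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / full_scale


# A constant half-scale frame has an RMS of exactly half of full scale.
print(frame_volume([16384] * 160))  # 0.5
```

A linear measure is chosen here only because the display rules below compare volumes of different screens; any monotonic loudness measure could be substituted.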

The decoding device 30 reproduces audio data and image data that are transmitted from the encoding device 20 of the communication partner (source) and received via the network.
For example, during multipoint communication, the decoding device 30 selects, based on the control information of the received image, the screen (a display area whose size is controlled) to use for an image containing a face as the specific area, displays the image in that screen, and outputs the audio.
When displaying these screens, the decoding device 30 divides the display into circular windows (the term includes ellipses) that eliminate dead zones.
The reason for dividing into circular (elliptical) windows is as follows.
Screens have generally been divided into rectangles. A human face, however, is basically oval, so the four corners of a rectangle are dead zones, and these dead zones effectively narrow (shrink) the area available for displaying the face.
Therefore, in this embodiment, the display is divided into circular (elliptical) windows that eliminate these dead zones.
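The size of the dead zone can be checked with a short calculation. Assuming the face window is the ellipse inscribed in a w × h rectangle, its area is πwh/4, so the corners lose a fraction 1 - π/4 ≈ 0.215 of the rectangle regardless of its proportions. The sketch below only illustrates that arithmetic; the 176 × 144 window is an arbitrary example.

```python
import math


def dead_zone_fraction(width: float, height: float) -> float:
    """Fraction of a width x height rectangle lying outside its inscribed ellipse."""
    ellipse_area = math.pi * (width / 2) * (height / 2)
    return 1.0 - ellipse_area / (width * height)


# The fraction does not depend on the aspect ratio: about 21.5% of the rectangle.
print(round(dead_zone_fraction(176, 144), 3))  # 0.215
```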
The decoding device 30 also has a multi-screen display function: it calculates the display magnification of each screen based on the line segments connecting screen centers, the thickness of the reference shape, and the audio volume, and controls the movement and new generation of screens based on this display magnification so that a plurality of screens is optimally formed on the display.
The specific processing is described in detail later with reference to the drawings.

The decoding device 30 in FIG. 3 includes:
a network interface (I/F) 301 capable of wireless communication with the network, which receives packets transmitted from the source containing audio data, image (video) data, control information, instruction information, screen information, volume information and the like;
an operation unit 302 that accepts key input and the like;
a received packet analysis unit 303 that analyzes the packets received by the network interface 301 and extracts the audio data, image data, source address and control information (top/bottom information, instruction information, etc.);
an audio decoding processing unit 304 that decodes the audio data extracted by the received packet analysis unit 303;
a video decoding processing unit 305 that decodes the video data extracted by the received packet analysis unit 303;
a display image control unit 306 that controls the size and display form of the screens (display windows) to be displayed, based on the video data decoded by the video decoding processing unit 305, the source address, control information, screen information, size information and top/bottom information;
a volume correction unit 307 that corrects the volume of the audio decoded by the audio decoding processing unit 304;
an audio output unit 308, such as a speaker, that outputs sound at the volume corrected by the volume correction unit 307;
an image correction unit 309 that corrects the image whose size and display form are controlled by the display image control unit 306;
a display unit (image output unit) 310, such as an LCD, that displays the image passed through the image correction unit 309; and
a local terminal control unit 311 that supplies control information (top/bottom information) to the display image control unit 306 based on input information from the operation unit 302.

The encoding device 20 and the decoding device 30 can share the operation units 203 and 302, the network interfaces 215 and 301, and the terminal control unit 211 and local terminal control unit 311.

The following describes, in order, the more specific configuration and functions of the display image control unit 306, which is the characteristic part of this embodiment, together with the specific configuration of the screens and examples of their display form.

The display image control unit 306 in FIG. 3 includes: a control information analysis unit 3061 that extracts screen information, size information, top/bottom information and instruction information from the control information supplied by the received packet analysis unit 303; a masking processing unit 3062 that masks the video decoded by the video decoding processing unit 305 based on the screen information; a display magnification calculation unit 3063 that calculates the display magnification of each screen (display image area) to be displayed based on the size information; a reduction/enlargement processing unit 3064 that reduces or enlarges the masked image according to the display magnification calculated by the display magnification calculation unit 3063; a display position calculation unit 3065 that calculates the display position according to that display magnification and the top/bottom information; and a mapping processing unit 3066 that maps the image obtained by the reduction/enlargement processing unit 3064 to the position on the display unit 310 obtained by the display position calculation unit 3065.

The screens whose size and display form are controlled by the display image control unit 306 of this embodiment are shown as a multi-screen, that is, as a plurality of screens displayed on one display.

In the display magnification calculation unit 3063 of this embodiment, each screen has a center position coordinate (P(i)) indicating its display position, a reference shape (Unit(i)) indicating its shape, the volume of its associated audio (V(i)), and a display magnification (R(i)) used when it is displayed. The display magnification R(i) is the smallest of the provisional display magnifications R(i,j), each calculated from the line segment L(i,j) connecting P(i) to the center P(j) of a surrounding screen, the thicknesses of the reference shapes along that segment (Lm(i,j), Lm(j,i)), and the audio volumes (V(i), V(j)).
The display magnification calculation unit 3063 also sets an audio volume of V(k) = 0 and a thickness of Lm(k,i) = 0 at the point where the perpendicular from the screen center meets the display boundary, and calculates the display magnification R(i,k) with respect to that point.
Each screen moves to the position that maximizes its display magnification R(i).
The center of a new screen is generated at the position where the display magnification R(k) is largest.
The reference shapes all have the same area.
Furthermore, a separation line is drawn between screens formed in the reference shape, and each area separated by the separation line is used as a new screen.
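The per-screen attributes listed above map naturally onto a small record. In the sketch below, which is an assumption about representation rather than the patent's implementation, the reference shape Unit(i) is stored as a function giving the thickness Lm from the center at a given angle; for an ellipse with semi-axes a and b this distance is ab / sqrt((b cos θ)² + (a sin θ)²). The names `Screen` and `ellipse_thickness` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]


def ellipse_thickness(a: float, b: float) -> Callable[[float], float]:
    """Unit(i) for an ellipse: distance from the centre to the boundary at angle theta."""
    def lm(theta: float) -> float:
        return a * b / math.hypot(b * math.cos(theta), a * math.sin(theta))
    return lm


@dataclass
class Screen:
    center: Point                    # P(i), display position
    unit: Callable[[float], float]   # Unit(i), reference shape as thickness-by-angle
    volume: float                    # V(i), volume of the associated audio
    magnification: float = 1.0       # R(i), recalculated on every update


s = Screen(center=(40.0, 30.0), unit=ellipse_thickness(2.0, 1.0), volume=0.5)
print(s.unit(0.0))  # 2.0
```

Storing the shape as a function keeps circles, ellipses and lookup-table shapes interchangeable in the magnification calculation.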

Next, the calculation of the display magnification of the screens whose size and display form are controlled by the display image control unit 306, the calculation of the generation position of a new screen, and the calculation of the screen movement position are described more specifically.

As shown in FIG. 4, each screen 40 has a reference shape (Unit). On the display of the display unit 310, screen 40 is drawn as its reference shape enlarged or reduced according to the display magnification (R).

Calculation of the display magnification (R):
The display magnification calculation unit 3063 calculates the distance L(i,j) between the centers of screens i and j and the thickness of each reference shape (Unit) from its center in that direction (Lm(i,j), Lm(j,i)). Based on these values and on the magnitudes of the received audio for the content shown on each screen (V(i), V(j)), it calculates the display magnification R(i,j) of screen i with respect to screen j as follows.
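The equation for R(i,j) is not reproduced in this text. One formulation consistent with the surrounding description, offered here only as an assumption, follows from two conditions: screens i and j must not overlap along the segment joining their centers, and their magnifications should be in the ratio of their volumes. Taking the non-overlap condition with equality and substituting R(j) = R(i)V(j)/V(i) gives:

```latex
\begin{aligned}
R(i)\,Lm(i,j) + R(j)\,Lm(j,i) &= L(i,j), \qquad \frac{R(i)}{R(j)} = \frac{V(i)}{V(j)} \\
\Rightarrow\quad R(i,j) &= \frac{L(i,j)\,V(i)}{V(i)\,Lm(i,j) + V(j)\,Lm(j,i)}
\end{aligned}
```

Under this assumption, the wall rule described later (V(k) = 0, Lm(k,i) = 0) reduces the expression to R(i,k) = L(i,k)/Lm(i,k), meaning the scaled reference shape just touches the wall, which is a useful consistency check.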

The display magnifications with respect to all surrounding screens are calculated, and the smallest of them is taken as the actual display magnification: R(i) = min_j R(i,j).
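The minimum rule can be sketched in code if one assumes a provisional formula R(i,j) = L(i,j)V(i) / (V(i)Lm(i,j) + V(j)Lm(j,i)), which is not given in this text. Reference shapes are passed as thickness-by-angle functions (circles here, for brevity), and wall terms are omitted, although they would enter the same minimum. All names are hypothetical, and volumes are assumed strictly positive so the denominator never vanishes.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
Thickness = Callable[[float], float]  # Unit(i): angle -> Lm


def circle(radius: float) -> Thickness:
    """Reference shape whose thickness is the same in every direction."""
    return lambda theta: radius


def pairwise_magnification(p_i: Point, p_j: Point, unit_i: Thickness,
                           unit_j: Thickness, v_i: float, v_j: float) -> float:
    """Provisional R(i, j) under the assumed non-overlap / volume-ratio formula."""
    dx, dy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    dist = math.hypot(dx, dy)          # L(i, j)
    theta = math.atan2(dy, dx)         # direction from i towards j
    lm_ij = unit_i(theta)              # Lm(i, j)
    lm_ji = unit_j(theta + math.pi)    # Lm(j, i): from j back towards i
    return dist * v_i / (v_i * lm_ij + v_j * lm_ji)


def display_magnification(i: int, centers: Sequence[Point],
                          units: Sequence[Thickness],
                          volumes: Sequence[float]) -> float:
    """R(i) = min over j != i of R(i, j); infinite if screen i has no neighbours."""
    return min(
        (pairwise_magnification(centers[i], centers[j], units[i], units[j],
                                volumes[i], volumes[j])
         for j in range(len(centers)) if j != i),
        default=math.inf,
    )


centers = [(0.0, 0.0), (10.0, 0.0)]
units = [circle(1.0), circle(1.0)]
print(display_magnification(0, centers, units, [3.0, 1.0]))  # 7.5
print(display_magnification(1, centers, units, [3.0, 1.0]))  # 2.5
```

With volumes 3:1, the two screens get magnifications 7.5 and 2.5, which add up to the center distance of 10, so their scaled shapes just touch.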

Calculation of the generation position of a new screen:
The display magnification calculation unit 3063 places provisional centers on the display and calculates the display magnification (Rmin) at each of them. The position with the largest of these values is used as the center position at which the new screen is generated.

The center P(k) satisfying this condition, that is, the provisional center that maximizes Rmin, is used as the center position of the new screen.
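A minimal sketch of the generation rule, assuming circular reference shapes with radius r0, the provisional formula R(i,j) = L(i,j)V(i) / (V(i)Lm(i,j) + V(j)Lm(j,i)) (an assumption, since this text does not reproduce the equation), the wall rule described in this section (V(k) = 0, Lm(k,i) = 0), and a coarse grid of provisional centers. The grid spacing and all names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Existing = Tuple[Point, float, float]  # (center, reference radius, volume)


def r_min_at(p: Point, v: float, r0: float, screens: Sequence[Existing],
             width: float, height: float) -> float:
    """Rmin for a provisional screen centred at p (circular reference shapes)."""
    # Assumed pairwise rule: L * V(i) / (V(i) * Lm(i, j) + V(j) * Lm(j, i)).
    values: List[float] = [math.dist(p, q) * v / (v * r0 + vq * rq)
                           for q, rq, vq in screens]
    x, y = p
    # Walls: V(k) = 0 and Lm(k, i) = 0 reduce the rule to L(i, k) / Lm(i, k).
    values += [d / r0 for d in (x, width - x, y, height - y)]
    return min(values)


def new_screen_center(v: float, r0: float, screens: Sequence[Existing],
                      width: float, height: float, step: float) -> Point:
    """P(k): the provisional grid centre with the largest Rmin."""
    candidates = [(x * step, y * step)
                  for x in range(1, int(width // step))
                  for y in range(1, int(height // step))]
    return max(candidates,
               key=lambda p: r_min_at(p, v, r0, screens, width, height))


# One existing screen on the left of a 100 x 100 display.
print(new_screen_center(1.0, 1.0, [((25.0, 50.0), 1.0, 1.0)], 100, 100, 10))
# (70, 30); (70, 70) scores the same and comes later in the scan
```

Because existing screens keep moving toward better positions, a coarse grid is enough here, which matches the load-reduction argument given below.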

Calculation of the screen movement position:
Each screen calculates the display magnification (R) at each position within a fixed distance of its current position at time t (set I), and the position with the largest display magnification becomes its center position at the next time t + Δt.

The screen moves to the center P(t + Δt) satisfying this condition, that is, the position in set I that maximizes R.
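The movement rule can be sketched as a local search. The sketch assumes circular reference shapes, the provisional formula R(i,j) = L(i,j)V(i) / (V(i)Lm(i,j) + V(j)Lm(j,i)) (an assumption, since the equation is not reproduced here), the wall rule V(k) = 0, Lm(k,i) = 0, and approximates set I by the current position plus eight points on a circle of radius `max_step`. These choices and all names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Other = Tuple[Point, float, float]  # (center, reference radius, volume)


def r_min(p: Point, v: float, r0: float, others: Sequence[Other],
          width: float, height: float) -> float:
    """Display magnification at p: minimum over other screens and the four walls."""
    values = [math.dist(p, q) * v / (v * r0 + vq * rq) for q, rq, vq in others]
    x, y = p
    values += [d / r0 for d in (x, width - x, y, height - y)]  # wall terms
    return min(values)


def move_step(p: Point, v: float, r0: float, others: Sequence[Other],
              width: float, height: float, max_step: float,
              samples: int = 8) -> Point:
    """One update P(t) -> P(t + dt): best sampled position within max_step of p."""
    candidates = [p]  # staying in place is part of the set I
    for k in range(samples):
        a = 2 * math.pi * k / samples
        candidates.append((p[0] + max_step * math.cos(a),
                           p[1] + max_step * math.sin(a)))
    inside = [c for c in candidates if 0 < c[0] < width and 0 < c[1] < height]
    return max(inside, key=lambda c: r_min(c, v, r0, others, width, height))


# A lone screen near the left wall drifts step by step to the display centre.
p = (20.0, 50.0)
for _ in range(10):
    p = move_step(p, 1.0, 1.0, [], 100.0, 100.0, 5.0)
print(p)  # (50.0, 50.0)
```

Limiting each step to `max_step` is what keeps the motion continuous, in contrast to re-normalizing the whole layout at every time step.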

Screen positions move across the display over time. Therefore, when a new screen is generated, the calculation need not be performed for every free position on the display: even if the generation position is determined by evaluating only a few points and the screen is placed accordingly, the screen moves toward the position with the largest display magnification as time passes. This reduces the computational load of screen generation.

Because the positional relationship between screens changes continually, the thickness of the reference shape (Lm(*)) used to calculate the display magnification (R(*)) must be calculated for the current direction each time.
For a complex shape, this thickness can be calculated by drawing a digital straight line from the center in the target direction, but this increases the computational load. Instead, the load of the display magnification calculation can be reduced by referring to a table, prepared in advance for each reference shape, of the thickness at each angle.
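The table approach can be sketched as follows, assuming an elliptical reference shape, one table entry per degree, and nearest-bin lookup. The resolution and all names are illustrative choices, not taken from the text.

```python
import math
from typing import Callable, List


def ellipse_lm(a: float, b: float) -> Callable[[float], float]:
    """Exact thickness of an ellipse (semi-axes a, b) from its centre at angle t."""
    return lambda t: a * b / math.hypot(b * math.cos(t), a * math.sin(t))


def build_thickness_table(lm: Callable[[float], float],
                          bins: int = 360) -> List[float]:
    """Precompute Lm for `bins` equally spaced angles in [0, 2*pi)."""
    return [lm(2 * math.pi * k / bins) for k in range(bins)]


def lookup_thickness(table: List[float], theta: float) -> float:
    """Nearest-bin lookup of Lm(theta); any real angle is accepted."""
    bins = len(table)
    k = round((theta % (2 * math.pi)) / (2 * math.pi) * bins) % bins
    return table[k]


table = build_thickness_table(ellipse_lm(2.0, 1.0))
print(lookup_thickness(table, 0.0))                     # 2.0
print(round(lookup_thickness(table, -math.pi / 2), 6))  # 1.0
```

The table is built once per reference shape, so each magnification update costs one modulo and one index instead of a ray trace through the shape.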

Processing of the four walls of the display:
Each screen 40 also calculates a display magnification (R) with respect to each of the four walls of the display, according to the following rule.
As shown in FIG. 5, the foot of the perpendicular dropped from the center of the screen to a wall is treated as the center of that wall for calculation purposes. Three quantities are then determined:
- the line segment between the centers, L(i,k);
- the thicknesses of the reference shapes (Unit) along that segment, Lm(i,k) and Lm(k,i);
- the sound volumes received by the screens, V(i) and V(k).

For a wall, the sound volume is set to V(k) = 0 and the reference-shape thickness to Lm(k,i) = 0. The display magnification is then calculated in the same way as the display magnification (R) described above.
To calculate the display magnification R(i) of each screen 40, R(i,k) is computed for each wall, just as R(i,j) is computed for each surrounding screen. The smallest of these values is used as R(i) for actual display.
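The following sketch combines neighbours and walls into R(i). The actual pairwise formula for R(i,j) is defined earlier in the document (FIG. 4) and is not reproduced in this section. The sketch therefore substitutes a hypothetical rule: both shapes scale in proportion to their volumes until they just touch, that is R(i)·Lm(i,j) + R(j)·Lm(j,i) = L(i,j) with R(i) : R(j) = V(i) : V(j). Under that assumption, the wall settings V(k) = 0 and Lm(k,i) = 0 reduce R(i,k) to L(i,k) / Lm(i,k), so the screen may grow exactly to the wall. Circular reference shapes (direction-independent thickness) and positive volumes are also assumed. `Screen`, `pair_magnification`, and `display_magnification` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Screen:
    x: float        # center P(i)
    y: float
    volume: float   # V(i), assumed > 0
    radius: float   # circular reference shape, so Lm is the same in every direction


def pair_magnification(length: float, lm_ij: float, lm_ji: float,
                       v_i: float, v_j: float) -> float:
    """Hypothetical R(i, j): scale both shapes in proportion to their volumes
    so that R_i * lm_ij + R_j * lm_ji == length (the shapes just touch)."""
    return length * v_i / (v_i * lm_ij + v_j * lm_ji)


def display_magnification(i: int, screens: List[Screen],
                          width: float, height: float) -> float:
    """R(i) = min over neighbouring screens j of R(i, j) and over the four walls k."""
    s = screens[i]
    values = [
        pair_magnification(math.hypot(t.x - s.x, t.y - s.y),
                           s.radius, t.radius, s.volume, t.volume)
        for j, t in enumerate(screens) if j != i
    ]
    # Wall k: foot of the perpendicular from the center, with V(k) = 0 and
    # Lm(k, i) = 0, which reduces R(i, k) to distance / Lm(i, k).
    for distance in (s.x, width - s.x, s.y, height - s.y):
        values.append(pair_magnification(distance, s.radius, 0.0, s.volume, 0.0))
    return min(values)
```

For two circles of radius 10 whose centers are 100 apart, with volumes 2:1, this rule gives R = 200/30 and 100/30. The screens then just touch, at 66.7 + 33.3 = 100.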

FIGS. 6(A) to 6(C) show an example in which the ratio of the sound volumes (V(0), V(1)) is varied for screens (S(0), S(1)) whose reference shape is an ellipse.
In FIGS. 6(A) to 6(C), from left to right, the volume ratio V(0):V(1) is 1:1, 2:1, and 3:1. In this way, the screen size can be changed adaptively according to the volume.

FIGS. 7(A) to 7(C) show an example in which the number of screens formed on the display is increased or decreased. The screens (S(0), S(1), S(2), S(3)) have an elliptical reference shape.
In FIGS. 7(A) to 7(C), from left to right, the number of screens is 2, 3, and 4.
In this way, the screen size can be changed adaptively according to the number of screens, so that all screens fit within the display.

FIGS. 8(A) to 8(C) again use screens (S(0), S(1), S(2), S(3)) with an elliptical reference shape. The number of screens is increased or decreased while one screen's sound volume is twice that of the other screens.
In FIGS. 8(A) to 8(C), from left to right, the number of screens is 2, 3, and 4. The volume ratios are V(0):V(1) = 2:1, V(0):V(1):V(2) = 2:1:1, and V(0):V(1):V(2):V(3) = 2:1:1:1, respectively.
In this way, the screen size can be changed adaptively according to the number of screens, so that all screens fit within the display. This example corresponds to the person shown on screen S(0) speaking. When one person is speaking, only that person's screen is enlarged or reduced, adaptively, according to the volume of their voice.

FIGS. 9(A) to 9(C) also use screens (S(0), S(1), S(2), S(3)) with an elliptical reference shape. The number of screens is increased or decreased while one screen's sound volume is half that of the other screens.
In FIGS. 9(A) to 9(C), from left to right, the number of screens is 2, 3, and 4. The volume ratios are V(0):V(1) = 2:1, V(0):V(1):V(2) = 2:1:2, and V(0):V(1):V(2):V(3) = 2:1:2:2, respectively.
In this way, the screen size can be changed adaptively according to the number of screens, so that all screens fit within the display. This example corresponds to people other than the person shown on screen S(0) speaking. Even when several people are speaking, the screen sizes are enlarged or reduced adaptively to suit the situation.

FIGS. 10(A) to 10(D) show examples in which reference shapes of an ellipse (S(oval)), a circle (S(circle)), and a rectangle (S(rectangle)) are mixed, with equal sound volumes.
In FIGS. 10(A) to 10(D), the combinations are, from the left, a rectangle and an ellipse, a circle and an ellipse, and a circle and a rectangle. The bottom panel combines a circle, an ellipse, and a rectangle.
The reference shapes may differ, but their areas are set equal. Each screen therefore adjusts its size adaptively, and screens with equal sound volumes appear visually equal in size.
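Setting the reference areas equal is a small calculation. The sketch below sizes a circle, an ellipse, and a rectangle to a common area A, using πr² = A, πab = A, and wh = A. The aspect ratios (`ellipse_aspect`, `rect_aspect`) are hypothetical parameters; the patent does not specify them.

```python
import math


def equal_area_shapes(area: float, ellipse_aspect: float = 1.5,
                      rect_aspect: float = 4 / 3) -> dict:
    """Dimensions of a circle, an ellipse, and a rectangle that share `area`.
    Aspect ratios are width / height and are illustrative assumptions."""
    r = math.sqrt(area / math.pi)                     # circle: pi * r^2 = area
    b = math.sqrt(area / (math.pi * ellipse_aspect))  # ellipse: pi * a * b = area, a = aspect * b
    a = ellipse_aspect * b
    h = math.sqrt(area / rect_aspect)                 # rectangle: w * h = area, w = aspect * h
    w = rect_aspect * h
    return {"circle": r, "ellipse": (a, b), "rectangle": (w, h)}


shapes = equal_area_shapes(1000.0)
```

Because each shape covers the same area at R = 1, equal magnifications produce screens that look equally large, whatever their outline.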

Furthermore, this embodiment reduces the dead zone outside the screens. As shown in FIGS. 11(A) to 11(C), area separation lines (bold lines) are formed between the screens, and the area bounded by these lines serves as each screen's display area. The display is thus divided and used to the fullest extent, while still adapting to changes in the number of screens and in each screen's sound volume.
In the example of FIGS. 11(A) to 11(C), the volume ratios are V(0):V(1) = 2:1, V(0):V(1):V(2) = 2:1:1, and V(0):V(1):V(2):V(3) = 2:1:1:1.
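The patent does not state how the separation lines are computed. One plausible stand-in is a multiplicatively weighted Voronoi partition. Each pixel is assigned to the screen with the smallest distance-to-center divided by display magnification, so screens with larger R claim more of the display. The sketch below implements that swapped-in technique on a pixel grid. Screens are given as hypothetical `(cx, cy, R)` tuples, and `partition` is an illustrative name.

```python
import math
from typing import List, Tuple


def partition(width: int, height: int,
              screens: List[Tuple[float, float, float]]) -> List[List[int]]:
    """Return a height x width grid of screen indices. Each pixel goes to the
    screen with the smallest (distance to center) / R, i.e. a multiplicatively
    weighted Voronoi diagram. Separation lines lie where the index changes."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # pixel center
            row.append(min(
                range(len(screens)),
                key=lambda k: math.hypot(px - screens[k][0],
                                         py - screens[k][1]) / screens[k][2]))
        grid.append(row)
    return grid


# Two screens with R = 2 and R = 1: the boundary shifts toward the smaller screen.
strip = partition(10, 1, [(2.5, 0.5, 2.0), (7.5, 0.5, 1.0)])
```

In two dimensions, the boundaries of this diagram are circular arcs rather than straight lines. This matches the curved separation lines one would expect between screens of unequal size.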

Next, screen display control on the display unit 310, whose screen sizes and display form are controlled as described above, is explained with reference to FIGS. 12 to 16.

In this embodiment, as described above, the display is divided into circular (elliptical) windows with no dead zone.

As shown in FIGS. 12 and 13, the image to be displayed in a screen is processed on either the transmitting side (encoding device side) or the receiving side (decoding device side). Facial feature points are extracted from the image, and the face area is calculated from them. The image is then cut out so that the screen contains the face area, and mapped onto the screen.
Specifically, facial feature points 111 are searched for in the received image, and contour extraction is performed to extract the face. Once the face area has been extracted, it is tracked according to the motion vectors and cut out.
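The tracking step can be sketched as shifting the previous face box by the motion of the blocks inside it. This sketch assumes the input is a list of `(dx, dy)` block motion vectors covering the face area, each giving the displacement from the previous frame to the current one. It also assumes the box moves by their mean and is clamped to the image. The patent does not specify the averaging, and `track_face_box` is a hypothetical name.

```python
from statistics import mean
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def track_face_box(box: Box, motion_vectors: List[Tuple[float, float]],
                   img_w: int, img_h: int) -> Box:
    """Move the face box by the mean block motion and keep it inside the image."""
    x, y, w, h = box
    if motion_vectors:
        x += mean([v[0] for v in motion_vectors])
        y += mean([v[1] for v in motion_vectors])
    x = min(max(x, 0), img_w - w)
    y = min(max(y, 0), img_h - h)
    return (x, y, w, h)
```

Tracking this way avoids repeating the costlier feature-point search and contour extraction on every frame.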

In this embodiment, as shown in FIG. 14 (in relation to FIGS. 2 and 3), the face area is detected on the encoding device 20 side (encoding side). A circular screen is selected that contains the detected face area while minimizing the non-face area. A rectangular area enclosing the circular screen is cut out as the transmission image and encoded. It is then sent as a packet, together with the screen information and the volume information of the input sound from the microphone or other source.
In the encoding device 20, as shown in FIG. 15, only part of the captured image is encoded. The shaded portion is cut off and excluded from encoding. As a result, the amount of image data to be transmitted is reduced.
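The geometry of this cut-out can be illustrated as follows. The sketch assumes the face area is an axis-aligned rectangle. The circular screen is taken as the smallest circle enclosing it (centered on the rectangle, radius equal to half its diagonal). The transmitted region is the square enclosing that circle, clipped to the frame. `circle_and_crop` is a hypothetical helper, and the 320 × 240 frame is an arbitrary example.

```python
import math
from typing import Tuple


def circle_and_crop(face_box: Tuple[float, float, float, float],
                    img_w: int, img_h: int):
    """Return the smallest circle enclosing the face rectangle and the
    axis-aligned square (clipped to the image) that encloses that circle."""
    x, y, w, h = face_box
    cx, cy = x + w / 2, y + h / 2
    r = math.hypot(w, h) / 2              # half the rectangle's diagonal
    left = max(0, math.floor(cx - r))
    top = max(0, math.floor(cy - r))
    right = min(img_w, math.ceil(cx + r))
    bottom = min(img_h, math.ceil(cy + r))
    return (cx, cy, r), (left, top, right - left, bottom - top)


circle, crop = circle_and_crop((100, 60, 60, 80), 320, 240)
encoded_fraction = crop[2] * crop[3] / (320 * 240)
```

Here a 60 × 80 face gives a circle of radius 50 and a 100 × 100 crop, so only about 13% of the captured frame is encoded.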

As shown in FIG. 16, the invention also applies when the captured content is analyzed and the screen shape is changed according to that content.
In the example of FIG. 16, the screen shape is made elliptical when a person is judged to be captured, and rectangular otherwise.
Specifically, the received image is classified as a "person image" when the face area is equal to or larger than a certain value. It is classified as a "non-person image" when the face area is smaller than that value.
A circular screen is used for a "person image" and a rectangular screen for a "non-person image". Even for a "non-person image", the display size varies according to the sound pressure from the same transmission source.
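The classification rule above amounts to one threshold test. In the sketch below, the threshold value and the `ScreenChoice` type are hypothetical; the patent only says "a certain value". The sketch covers the shape choice only, since display size continues to follow the volume in both cases.

```python
from dataclasses import dataclass


@dataclass
class ScreenChoice:
    kind: str   # "person" or "non-person"
    shape: str  # "circle" or "rectangle"


def choose_screen(face_area: float, threshold: float) -> ScreenChoice:
    """Face area at or above the threshold -> person image with a circular screen;
    otherwise a non-person image with a rectangular screen."""
    if face_area >= threshold:
        return ScreenChoice("person", "circle")
    return ScreenChoice("non-person", "rectangle")
```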

As shown in FIG. 14, the decoding device 30 first decodes the received data in the video decoding processing unit 305. The decoded image is then masked based on the screen information extracted by the control information analysis unit 3061.
The display magnification calculation unit 3063 calculates the display magnification based on the size information. The reduction/enlargement processing unit 3064 then reduces or enlarges the masked image according to the calculated magnification. Meanwhile, the display position calculation unit 3065 calculates the display position according to that magnification. The screen containing the reduced or enlarged image is displayed at this position on the display unit 310.
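The decoder-side chain of masking, reduction/enlargement, and mapping can be sketched on a toy grayscale image stored as nested lists. This sketch assumes several things: a circular screen mask, nearest-neighbour scaling, and a paste that skips masked pixels. `mask_circle`, `scale_nearest`, and `paste` are hypothetical stand-ins for units 3062, 3064, and 3066, not their actual implementations.

```python
from typing import List

Image = List[List[int]]


def mask_circle(img: Image, cx: float, cy: float, r: float, fill: int = 0) -> Image:
    """Masking: keep pixels whose centers lie inside the circular screen."""
    return [
        [px if (x + 0.5 - cx) ** 2 + (y + 0.5 - cy) ** 2 <= r * r else fill
         for x, px in enumerate(row)]
        for y, row in enumerate(img)
    ]


def scale_nearest(img: Image, factor: float) -> Image:
    """Reduction/enlargement by the display magnification (nearest neighbour)."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    return [
        [img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
         for x in range(nw)]
        for y in range(nh)
    ]


def paste(canvas: Image, img: Image, left: int, top: int, fill: int = 0) -> None:
    """Mapping: copy non-masked pixels onto the display at the computed position."""
    for y, row in enumerate(img):
        for x, px in enumerate(row):
            cy, cx = top + y, left + x
            if px != fill and 0 <= cy < len(canvas) and 0 <= cx < len(canvas[0]):
                canvas[cy][cx] = px
```

Skipping masked pixels during mapping leaves neighbouring screens untouched outside each circle.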

Here, consider the processing load when multipoint communication is performed with terminals configured as shown in FIGS. 2 and 3.
When the number of connected terminals handled by the decoding device 30 increases to N, its processing is as follows:

Onew = N × (masking + reduction/enlargement + mapping + display magnification calculation + display position calculation)

In contrast, suppose the decoding device also performed the transmitting-side (encoding device side) processing of this embodiment. Its processing would then be as follows:

Oold = N × (top/bottom correction + face area detection + screen determination + cutout + size calculation + reduction/enlargement + display magnification calculation + display position calculation + mapping)

Comparing the two, the configuration shown in FIGS. 2 and 3 reduces the decoding-side load by the following amount:

Osub = Oold − Onew
     = N × (top/bottom correction + face area detection + screen determination + cutout + size calculation − masking)

Of these terms, face area detection accounts for most of the processing load.

On the transmitting side (encoding device side), the load increases by the following amount:

Enew = top/bottom correction + face area detection + screen determination + cutout + size calculation

However, this load does not depend on the number of connected terminals.
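To make the comparison concrete, the sketch below plugs unit costs into Onew, Oold, and Enew. The costs are purely hypothetical, chosen only so that face area detection dominates, as the text states; they are not measurements. The point the sketch shows is structural: the decoding-side saving Osub grows linearly with N, while the encoding-side increase Enew stays constant.

```python
# Hypothetical per-frame unit costs (illustrative only, not measured values).
ENCODER_SIDE = {
    "top_bottom_correction": 1,
    "face_area_detection": 20,
    "screen_determination": 1,
    "cutout": 1,
    "size_calculation": 1,
}
SHARED = {  # decoding-side steps present in both configurations
    "reduction_enlargement": 2,
    "display_magnification": 1,
    "display_position": 1,
    "mapping": 2,
}
MASKING = 1


def o_new(n: int) -> int:
    """Decoding-side load with the split of FIGS. 2 and 3."""
    return n * (MASKING + sum(SHARED.values()))


def o_old(n: int) -> int:
    """Decoding-side load if it also did the encoder-side steps for all N streams."""
    return n * (sum(ENCODER_SIDE.values()) + sum(SHARED.values()))


def e_new() -> int:
    """Extra encoding-side load; independent of N."""
    return sum(ENCODER_SIDE.values())


for n in (2, 4, 8):
    print(n, o_old(n) - o_new(n), e_new())  # Osub = 23 * N here; Enew stays 24
```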

As described above, this embodiment assigns complementary functions to the two devices.
The encoding device 20 serving as the transmission source acts when a screen exists at a designated position on the display. It generates the corresponding designation information, screen information, and volume information, and sends them to the other party in the communication.
The decoding device 30 displays multiple screens. It calculates the display magnification of each screen based on the line segment connecting screen centers, the thickness of the reference shape, and the sound volume. It then controls the movement and new generation of screens based on that magnification, so that multiple screens are formed optimally on the display.
As a result, screen sizes change adaptively according to the sound volume and the number of screens.
In addition, screen movement becomes continuous, and screens of different shapes can be arranged at their optimal sizes.
Consequently, even when multiple terminals are connected, the other parties in a call are easy to identify. Because the images (screens) are controlled so as not to overlap, the state of everyone in the conversation can be seen at a glance. New participants can also be accommodated easily.

FIG. 1 is a diagram showing an example of the display screen of a terminal such as a personal computer (PC) during multipoint communication in a general video conference system.
FIG. 2 is a block diagram showing the encoding device in a configuration example of the portable communication terminal according to the embodiment of the present invention.
FIG. 3 is a block diagram showing the decoding device in a configuration example of the portable communication terminal according to the embodiment of the present invention.
FIG. 4 is a diagram for explaining the display magnification calculation process.
FIG. 5 is a diagram for explaining the processing of the four walls of the display.
FIG. 6 is a diagram showing an example in which the ratio of the sound volumes (V(0), V(1)) is varied for screens (S(0), S(1)) whose reference shape is an ellipse.
FIG. 7 is a diagram showing an example in which the number of screens formed on the display is increased or decreased for screens (S(0), S(1), S(2), S(3)) whose reference shape is an ellipse.
FIG. 8 is a diagram showing an example in which, for screens (S(0), S(1), S(2), S(3)) whose reference shape is an ellipse, the number of screens is increased or decreased while the sound volume of one screen is twice that of the other screens.
FIG. 9 is a diagram showing an example in which, for screens (S(0), S(1), S(2), S(3)) whose reference shape is an ellipse, the number of screens is increased or decreased while the sound volume of one screen is half that of the other screens.
FIG. 10 is a diagram showing an example in which reference shapes of an ellipse (S(oval)), a circle (S(circle)), and a rectangle (S(rectangle)) are mixed (with equal sound volumes).
FIG. 11 is a diagram showing an example in which area separation lines (bold lines) are formed between the screens and the area bounded by the separation lines is used as the display area of each screen.
FIG. 12 is a diagram for explaining screen display control, illustrating the extraction of a face area from a received image.
FIG. 13 is a diagram for explaining screen display control, illustrating the cutout process performed after a face area is extracted from a received image.
FIG. 14 is a diagram schematically showing the operation of this embodiment.
FIG. 15 is a diagram for explaining the operation of this embodiment, showing an example of the transmission area of a captured image on the transmitting side.
FIG. 16 is a diagram for explaining screen display control, illustrating processing according to the size of the face area.

Explanation of symbols

10 ... portable communication terminal
20 ... encoding device
201 ... audio input unit
202 ... image input unit
203 ... operation unit
204 ... audio encoding processing unit
205 ... image encoding processing unit
206 ... top/bottom correction unit
207 ... face area detection unit
208 ... screen determination unit
209 ... cutout processing unit
210 ... input volume measurement unit
211 ... terminal control unit
212 ... control information generation unit
213 ... storage unit
214 ... transmission packet generation unit
215 ... network interface (I/F)
30 ... decoding device
301 ... network interface (I/F)
302 ... operation unit
303 ... received packet analysis unit
304 ... audio decoding processing unit
305 ... video decoding processing unit
306 ... display image control unit
307 ... volume correction unit
308 ... audio output unit
309 ... image correction unit
310 ... display unit (image output unit)
311 ... own-terminal control unit
3061 ... control information analysis unit
3062 ... masking processing unit
3063 ... display magnification calculation unit
3064 ... reduction/enlargement processing unit
3065 ... display position calculation unit
3066 ... mapping processing unit

Claims

A communication terminal that reproduces received image data and audio data, comprising:
display means for displaying an image; and
control means capable of forming, on the display means, a plurality of display areas, each displaying one of a plurality of images to be displayed by extracting a specific area,
wherein the control means calculates a display magnification of each display area based on at least the line segment connecting the centers of the display areas of the images, the thickness of the reference shape, and the sound volume,
and controls the movement and new generation of the display areas based on the display magnification, thereby forming a plurality of display areas on the display screen of the display means.

The communication terminal according to claim 1, wherein the display area has:
center position coordinates (P(i)) indicating its display position;
a reference shape (Unit(i)) indicating its shape;
a sound volume (V(i)) associated with the display area; and
a display magnification (R(i)) used when the display area is displayed on the screen;
and wherein the control means calculates a temporary display magnification (R(i,j)) with respect to each surrounding screen, based on:
the line segment (L(i,j)) connecting to that screen's center position coordinates (P(j));
the thicknesses (Lm(i,j), Lm(j,i)) of the reference shapes on that line segment; and
the sound volumes (V(i), V(j));
and uses the smallest of these temporary display magnifications as the display magnification (R(i)).

The communication terminal according to claim 1 or 2, wherein the control means sets the sound volume V(k) = 0 and the thickness Lm(k,i) = 0 at the foot of the perpendicular dropped from the center of the display area to the screen boundary, and calculates a display magnification (R(i,k)) with respect to that boundary.

The communication terminal according to any one of claims 1 to 3, wherein the control means moves the display area to a position where the display magnification (R (i)) is maximized.

The communication terminal according to any one of claims 1 to 4, wherein the control means generates a center of a new display area at a position where the display magnification (R (k)) is the largest.

The communication terminal according to any one of claims 1 to 5, wherein the control means sets the areas of the reference shapes equal to one another.

The communication terminal according to any one of claims 1 to 6, wherein the control unit draws a separation line between screens formed in a reference shape, and uses the area separated by the separation line as a new display area.

The communication terminal according to any one of claims 1 to 7, wherein information about the images to be displayed is included in the information received from the transmitting-side device, and the control means calculates the display magnification based on the received information and controls the movement and new generation of the display areas.

A display method of a communication terminal that reproduces received image data and audio data, comprising:
calculating a display magnification of each display area based on the line segment connecting the centers of the display areas of a plurality of images to be displayed by extracting a specific area, the thickness of the reference shape, and the sound volume; and
moving and newly generating display areas based on the display magnification, thereby forming a plurality of display areas on the display screen,
whereby a plurality of display areas, each containing an image to be displayed, are displayed.