JP2007243392A

JP2007243392A - Call terminal, multi-user call system, multi-user call method, and program

Info

Publication number: JP2007243392A
Application number: JP2006060764A
Authority: JP
Inventors: Masaki Fukasaku; 正樹深作; Noriaki Mikami; 典昭三上
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2006-03-07
Filing date: 2006-03-07
Publication date: 2007-09-20

Abstract

PROBLEM TO BE SOLVED: To provide a call terminal that makes transfer of the right to speak more reliable while a group session is established, enabling smooth communication.
SOLUTION: The call terminal supports multi-user calls. It comprises a received voice recognizing means that receives voice data transmitted from the call terminal of a user having the right to speak and performs voice recognition on it, a speech request detecting means that detects, based on the voice recognition result, that the received voice data indicates a speech request, and a reporting means that reports to the user when a speech request is detected.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a call terminal, and more particularly to a call terminal capable of multi-party calls among a plurality of terminals.

In recent years, advances in communication technology for mobile phones and similar devices have made it possible to establish a group session among a plurality of terminals and hold one-to-N (one-to-many) calls. This technology is known, for example, as Push-to-Talk over Cellular (PoC) at the OMA (Open Mobile Alliance), a standardization body for mobile services.

PoC communication among several people, as described above, uses a half-duplex style of communication similar to a transceiver (walkie-talkie). That is, while the person who has acquired the right to speak is speaking, the others are listeners. The right to speak passes back and forth among the members of the PoC communication group, and a common way to prompt another participant to take the right to speak is a verbal cue (a speech request) such as "Mr. XX, please go ahead."

Patent Document 1: JP 2001-188740 A

However, with the conventional approach described above, in which the right to speak is requested by voice alone, the cue (speech request) does not get through if the listener could not make it out over ambient noise or was distracted and not listening, and this can hinder smooth communication. Moreover, when encouraging a PoC communication group member who rarely speaks to contribute, the discussion would likely become more active if, in addition to the verbal speech request, the member could be prompted at the same time through a more effective notification means.

Meanwhile, Patent Document 1 discloses an electronic conference system in which pressing a button that designates a target speaker gives that person the right to speak and notifies that person accordingly. However, for smooth communication in a group session, it is desirable, as in the conventional example, to prompt other participants to take the right to speak through verbal cues within the conversation.

Accordingly, an object of the present invention is to remedy the drawbacks of the conventional example described above and, in particular, to provide a call terminal that makes transfer of the right to speak more reliable while a group session is established and thereby enables smooth communication.

Therefore, a call terminal according to one aspect of the present invention is a call terminal capable of multi-party calls, comprising:
received voice recognition means for receiving voice data transmitted from the call terminal of a person who has the right to speak and performing voice recognition on it;
speech request detection means for detecting, based on the voice recognition result, that the received voice data is a speech request; and
notification means for notifying the user when a speech request is detected.

According to the above invention, when a person who has the right to speak utters a speech request, the call terminal of the user being asked to speak automatically detects the speech request through voice recognition. The call terminal then notifies the user that a speech request has been made, for example by a ringing tone or a vibrator. The user can therefore recognize the speech request not only from the voice in the call but also by other means, which enables smooth communication.

The call terminal further comprises speech request wording storage means for storing preset speech request wording, and the speech request detection means detects a speech request based on the stored speech request wording. This allows the speech request wording to be detected accurately and enables smoother communication.

The notification means performs a plurality of types of notification and, in particular, performs them at staggered times. Furthermore, the call terminal comprises speech request detection counting means for counting the number of speech requests detected within a fixed period, and the notification means performs a plurality of types of notification according to that count. This makes it possible to give notifications suited to the call situation while keeping them from interfering with the user's call, enabling still smoother communication.
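The counting and staggering behaviour described above could be sketched as follows. This is only an illustration: the 30-second window, the mapping from detection count to notifier types, and the names `RequestCounter`, `choose_notifications`, and `staggered_schedule` are all assumptions, not details given in this publication.

```python
from collections import deque


class RequestCounter:
    """Counts speech requests detected within a sliding time window.

    The window length is a hypothetical value; the text only says
    "within a fixed period".
    """

    def __init__(self, window_s=30.0):
        self.window_s = window_s
        self._times = deque()

    def record(self, now):
        """Record a detection at time `now` (seconds) and return the count in the window."""
        self._times.append(now)
        # Drop detections that are older than the window.
        while self._times and now - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times)


def choose_notifications(count):
    """Pick notifier types for a given detection count (assumed escalation policy)."""
    if count <= 1:
        return ["led"]
    if count == 2:
        return ["led", "vibration"]
    return ["led", "vibration", "ring_tone"]


def staggered_schedule(kinds, gap_s=1.0):
    """Return (start_offset_seconds, kind) pairs so notifications start at staggered times."""
    return [(i * gap_s, kind) for i, kind in enumerate(kinds)]
```

A repeated request within the window escalates from a quiet notifier to louder ones, which is one way to keep notifications from disturbing the call unless the request is being missed.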

Another form of the call terminal according to the present invention comprises:
speech request voice recognition means for accepting a spoken speech request addressed to another user and performing voice recognition on it;
synthesized sound generation means for generating a synthesized sound corresponding to the recognized voice data; and
speech request transmission means for transmitting voice data consisting of the generated synthesized sound to the call terminal of the other user.

According to the above invention, the voice of a speech request uttered by the person who has the right to speak is recognized at that person's call terminal and transmitted to the other users' call terminals as a synthesized sound. Because the speech request is delivered in an easy-to-hear synthesized sound, users of the other call terminals can recognize it more easily, which enables smooth communication. In particular, when, as described above, the call terminal of the user being asked to speak automatically detects the speech request through voice recognition, the request can be detected with higher accuracy.

The call terminal further comprises speech request wording storage means for storing preset speech request wording, and the synthesized sound generation means generates, based on the stored speech request wording, a synthesized sound corresponding only to specific voice data. Because a synthesized sound is generated only for the speech request wording, the processing load on the terminal can be reduced.

The call terminal also comprises voice feature data storage means for storing in advance voice feature data representing the operator's voice characteristics, and the speech request voice recognition means performs voice recognition based on that data. This allows speech request wording to be detected accurately in accordance with the operator's speaking habits and enables smoother communication.

The present invention also provides a multi-party call system comprising the call terminal owned by the person who has the right to speak and makes a speech request, and the call terminal of the user who receives the speech request. The call terminals described above are also characterized in that they are configured as a single terminal (one terminal combining both roles).

Furthermore, another aspect of the present invention is a server computer connected via a network to call terminals capable of multi-party calls, comprising:
speech request voice recognition means for receiving a spoken speech request addressed to another user, transmitted from the call terminal of the person who has the right to speak, and performing voice recognition on it;
speech request detection means for detecting, based on the voice recognition result, that the received voice data is a speech request; and
notification control means for notifying the other user's call terminal when a speech request is detected.

Because the voice recognition processing can thus be executed by the server computer, the effects described above are obtained while the processing load on the call terminals is reduced.

A program according to another aspect of the present invention causes an arithmetic unit provided in a call terminal capable of multi-party calls to implement:
received voice recognition means for receiving voice data transmitted from the call terminal of a person who has the right to speak and performing voice recognition on it;
speech request detection means for detecting, based on the voice recognition result, that the received voice data is a speech request; and
notification control means for notifying the user, via a notification device provided in the call terminal, when a speech request is detected.

Another form of the program according to the present invention causes an arithmetic unit provided in a call terminal capable of multi-party calls to implement:
speech request voice recognition means for accepting a spoken speech request addressed to another user and performing voice recognition on it;
synthesized sound generation means for generating a synthesized sound corresponding to the recognized voice data; and
speech request transmission means for transmitting voice data consisting of the generated synthesized sound to the call terminal of the other user.

Furthermore, the present invention provides a multi-party call method using call terminals, comprising:
a received voice recognition step in which the call terminal of the user receiving a speech request receives voice data transmitted from the call terminal of the person who has the right to speak and performs voice recognition on it;
a speech request detection step of detecting, based on the voice recognition result, that the received voice data is a speech request; and
a notification step of notifying the user, via a notification device provided in the call terminal, when a speech request is detected.

The speech request detection step detects a speech request based on speech request wording stored in advance in speech request wording storage means. The notification step performs a plurality of types of notification at staggered times. Furthermore, the notification step counts the number of speech requests detected within a fixed period and performs a plurality of types of notification according to that count.

The method further comprises, before the received voice recognition step: a speech request voice recognition step in which the call terminal of the person who has the right to speak accepts a spoken speech request addressed to another user and performs voice recognition on it; a synthesized sound generation step of generating a synthesized sound corresponding to the recognized voice data; and a speech request transmission step of transmitting voice data consisting of the generated synthesized sound to the call terminal of the other user.

The synthesized sound generation step generates a synthesized sound corresponding only to specific voice data, based on speech request wording stored in advance in the speech request wording storage means. The speech request voice recognition step performs voice recognition based on voice feature data, stored in advance in voice feature data storage means, that represents the operator's voice characteristics.

The multi-party call system, program, and multi-party call method configured as described above operate in the same way as the call terminal described above, and can therefore achieve the object of the present invention.

Because the present invention is configured and functions as described above, a speech request spoken by another party is automatically detected through voice recognition, so that even if the user misses the spoken speech request in the call, the user can recognize through another notification method that a speech request was made. The invention therefore has the excellent effect, not available in the past, of enabling smooth communication during multi-party calls.

The present invention is characterized in that, to achieve smooth communication during a multi-party call, it improves the reliability with which voice data constituting a speech request, which prompts transfer of the right to speak, is recognized. In the embodiments below, a mobile phone is used as an example of a call terminal capable of multi-party calls, and the multi-party call is described as using the technology known as Push-to-Talk over Cellular (PoC) at the OMA (Open Mobile Alliance), a standardization body for mobile services. However, the call terminal is not limited to a mobile phone, and the multi-party call system is not limited to the PoC system.

A first embodiment of the present invention will be described with reference to FIGS. 1 to 6. FIG. 1 is a schematic diagram showing the overall configuration of the multi-party call system, and FIG. 2 is a block diagram showing the configuration of a mobile phone that implements multi-party calls. FIG. 3 shows an example of information stored in the mobile phone. FIG. 4 is an explanatory diagram showing a multi-party call in progress, and FIGS. 5 and 6 are sequence diagrams showing operation during a multi-party call.

[Configuration]
As shown in FIG. 1, the multi-party call system of the present invention is implemented by mobile phones a to d operated by a plurality of users A to D. In this embodiment in particular, the mobile phones a to d are connected via a network N and conduct a multi-party call using the PoC communication described above. The multi-party call in this embodiment is realized by transmitting voice data from the mobile phone (for example, a) of the user who has the right to speak (for example, A) to the mobile phones (for example, b to d) of the other users (listeners; for example, B to D). For another user to speak, that user must obtain the right to speak, and speaks after receiving a speech request from the user who currently holds it.
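The half-duplex arrangement described here, in which only the holder of the right to speak transmits and everyone else listens, can be pictured with a toy model. The class `GroupSession` and its methods are hypothetical names introduced for illustration; this is not the OMA PoC floor control protocol, only a sketch of the one-speaker rule.

```python
class GroupSession:
    """Toy model of a one-to-N call in which one member holds the right to speak."""

    def __init__(self, members, floor_holder):
        if floor_holder not in members:
            raise ValueError("floor holder must be a session member")
        self.members = list(members)
        self.floor_holder = floor_holder

    def send_voice(self, sender, voice_data):
        """Deliver voice data from the floor holder to every other member (the listeners)."""
        if sender != self.floor_holder:
            raise PermissionError(f"{sender} does not hold the right to speak")
        return {m: voice_data for m in self.members if m != sender}

    def hand_over(self, target):
        """Move the right to speak to another member, e.g. after a speech request."""
        if target not in self.members:
            raise ValueError("target must be a session member")
        self.floor_holder = target
```

For example, with members A to D and A holding the floor, `send_voice("A", ...)` reaches B, C, and D, while an attempt by B to send raises an error until `hand_over("B")` has been called.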

In the following, the case where user A (the speaker) has the right to speak and user B (a listener) receives a speech request is described as an example. That is, the case where a speech request is transmitted from mobile phone a to mobile phone b is described. All of the mobile phones a to d have the same configuration, and any of the users A to D can be either the person with the right to speak or a listener.

FIG. 2 shows the configuration of the mobile phone. Mobile phone a is described here as an example, but the other mobile phones b to d have the same configuration. As shown in the figure, mobile phone a is built around a device control unit 6, which is connected to each main part of the device via an internal bus 13. The device control unit 6 includes a CPU (central processing unit), a cache memory, an interrupt controller, a DSP (Digital Signal Processor), and an OS for controlling the device as a whole, and is responsible for controlling the entire mobile terminal.

The devices (notifiers) that notify the user (listener) of a speech request are main internal parts found in ordinary mobile phones: a vibration drive unit 1, an LED unit 2, a display unit 3, and a speaker unit 4. The vibration drive unit 1 has a mechanism that vibrates the whole device; when a speech request is received, it operates to notify the user. The LED unit 2 is located on the front of the device body, where the user can see it light; when a speech request is received, the LED lights to notify the user. The display unit 3 is a display device with a display control function, such as a liquid crystal panel or an organic EL (electroluminescence) panel. When a speech request is received, all or part of the screen changes to notify the user. The image data displayed at this time is user data stored in the nonvolatile area (ROM) of the memory 5. The speaker unit 4 outputs the voice conversation during a PoC call and also serves as a means of notifying the user of a speech request. When a speech request is made, it sounds a ringing tone specified in advance. The ringing tone need not be a mechanical tone; music or voice data stored in the nonvolatile area (ROM) of the memory 5 may be used.

The operation of the notifiers described above (the vibration drive unit 1, LED unit 2, display unit 3, and speaker unit 4) is controlled by the device control unit 6, which serves as the notification control means described above. In particular, as described later, it controls the notifiers so that they perform a notification operation when a speech request is detected at mobile phone b of listener B. The notifiers and the device control unit 6 together constitute the notification means.

The memory 5 includes both ROM and RAM. The ROM is a nonvolatile memory that stores user data such as the phone book and music and image data. The RAM stores data that the CPU needs temporarily while executing programs. The memory 5 in this embodiment stores preset speech request keywords (speech request wording) and thus functions as the speech request wording storage means. FIG. 3 shows an example of the speech request keywords. As shown there, the keywords are stored as individual words or short phrases, and are used by combining them with logical operations (AND, OR, NOT, and so on). For example, for the speech request "Mr. A, please speak," the voice recognition processing or speech request detection processing described later uses the result of ANDing table entries No. 1 and No. 4 in FIG. 3. For the request "Whoever has spoken little, please speak," the processing uses speaking-time data measured in advance for each speaker with the timing function of the timer unit 12, together with the result of ANDing table entries No. 4 and No. 6 in FIG. 3. These speech request keywords are used on mobile phone a of speaker A when making a speech request, and on mobile phone b of the listener when detecting a speech request.
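The AND combination of keyword table entries can be sketched as below. The contents of FIG. 3 are not reproduced in this text, so the three table entries shown are assumptions inferred from the two example requests, and a recognized transcript stands in for the voice data. The function names are hypothetical.

```python
# Hypothetical stand-in for the keyword table of FIG. 3.
# Only entries 1, 4, and 6 are referred to in the text; their wording here is assumed.
KEYWORDS = {
    1: "mr. a",
    4: "please speak",
    6: "spoken little",
}


def contains(transcript, entry_no):
    """True if the recognized transcript contains the keyword of the given table entry."""
    return KEYWORDS[entry_no] in transcript.lower()


def is_named_request(transcript):
    """Entry No. 1 AND entry No. 4: a request addressed to a named person."""
    return contains(transcript, 1) and contains(transcript, 4)


def is_quiet_member_request(transcript):
    """Entry No. 4 AND entry No. 6: a request addressed to whoever has spoken least."""
    return contains(transcript, 4) and contains(transcript, 6)


def quietest_member(speaking_seconds):
    """Pick the member with the shortest measured speaking time (timer unit data)."""
    return min(speaking_seconds, key=speaking_seconds.get)
```

In the second case the request names nobody, so the per-speaker times measured by the timer are what identify the intended recipient.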

The transmission control unit 7 is a circuit that controls wireless transmission of voice data and other data via the communication antenna 9. The reception control unit 8 is a circuit that controls wireless reception of voice data and other data via the communication antenna 9.

The voice recognition unit 10 (speech request voice recognition means, received voice recognition means) is implemented as a device or program that applies a predetermined detection algorithm to voice data in a call, such as call data input from the voice input section or call data received from another mobile phone via the reception control unit 8, to recognize and detect specific keywords or specific control sounds. The voice recognition algorithm itself is known art and is not detailed here, but in particular when the unit is used by user A, who has the right to speak, it is assumed to be able to learn the owner's individual speaking characteristics and thereby improve recognition accuracy. That is, voice feature data representing the user's voice characteristics is stored in advance in the ROM or the like (voice feature data storage means), and voice recognition is performed based on it.

The voice recognition unit 10 (speech request detection means) also runs continuously while the user or another participant is speaking, and detects keywords in the conversation that constitute a speech request. Specifically, it detects speech request keywords such as those shown in FIG. 3, which are stored in the ROM described above. When used on the listener B side, once the voice recognition unit 10 detects a speech request, the user is notified, as described above, through the vibration drive unit 1, the LED unit 2, the display unit 3, or the speaker unit 4 via the device control unit 6.

The synthesized sound generation unit 11 (synthesized sound generation means) is a device that can process input voice data and output the result. In the present invention it operates in particular on the transmitting mobile phone a, that is, the mobile phone of user A on the PoC speaker side, and its purpose is to improve the recognition accuracy of the voice recognition unit 10 in the receiving telephone terminal b (PoC listener side). Specifically, when a speech request keyword is detected in the voice data input from the voice input section as described above, it generates a synthesized sound of the keyword itself and transmits it to the other call terminals via the transmission control unit 7. In other words, it cooperates with the transmission control unit 7 to function as the speech request transmission means as well. The synthesized sound generation unit 11 additionally performs processing such as adding a specific identification sound assigned to each speech request recipient, or changing the intonation (for example, converting regional-accent intonation to standard Japanese intonation).

The timer unit 12 is a timekeeping device for measuring time. As described later, it is used for time measurement within the speech request detection algorithm. The microphone unit 14 is a device that picks up what the speaker says, and is the voice input section.

[Operation]
Next, the operation of the mobile phone configured as described above will be described with reference to FIGS. 4 to 6. First, an overview of the overall operation is given with reference to FIG. 4; then the operation of mobile phone a of user A, the transmitting side, is described with reference to FIG. 5; and finally the operation of mobile phone b of user B, the receiving side, is described with reference to FIG. 6.

<Overview of overall operation>
FIG. 4 is a conceptual diagram showing a PoC conference held using the mobile phones a to d described above. Here, user A is the speaker, user B is a listener, and user A makes a speech request to user B.

PoC communication has been established between mobile phone a of the speaker (user A) and mobile phone b of the listener (user B) in the figure, and speaker A is currently speaking. The utterance is assumed to include the words "Mr. XX, please speak," which constitute a speech request to listener B.

On the transmitting mobile phone a, speaker A's natural voice is input as voice data (Y1). At this stage, even utterances with the same content differ considerably as data because of the speaker's speaking habits, tone of voice, accent, and so on. These individual differences lower the voice recognition accuracy of the receiving mobile phone b, and when recognition accuracy is low, the notification operation may be triggered erroneously in unexpected situations. In this embodiment, therefore, as described above, the voice recognition unit of the transmitting mobile phone a, which has already learned the characteristics of speaker A's voice, detects the speech request portion of the natural-voice data and converts that portion of the voice data into a synthesized sound (Y2). Conversion to a synthesized sound removes the individual differences present in the natural-voice data and can be expected to improve recognition accuracy on the receiving mobile phone b. The synthesized sound may be generated by adding a specific identification sound to the voice data or by changing the intonation of the utterance (removing speaking habits such as an accent).

The voice data, with its speech request portion converted into synthesized sound, is then transmitted over the PoC communication network to the receiving terminal (Y3). In the receiving mobile phone b, the received voice data containing the synthesized sound ("Mr. XX, please speak") is output through a sounding device such as a speaker (Y4), while the speech recognition unit continuously checks the conversation for speech requests. When the speech recognition unit detects the synthesized speech request (Y5), the speech request notification operation is performed (Y6). The conversation output operation Y4 may be omitted so that only the notification operation Y6 is performed.

<Operation of the transmitting mobile phone>
Next, the operation of each of the mobile phones a and b is described in detail. First, the operation of the mobile phone a of speaker A is described with reference to the sequence diagram of FIG. 5.

An utterance by speaker A is input through the microphone unit and converted into voice data (step S1), which is transferred sequentially to the memory unit (step S2). The voice data is stored in the speech recognition buffer of the memory unit 5 (step S3), and the speech recognition unit 10 performs speech recognition on the buffered voice data (step S4). Speech recognition runs continuously while the conversation is in progress and detects the specific keywords that constitute a speech request (step S5).

When a specific keyword is detected, the voice data of that portion is transferred to the synthesized sound generation unit 11 (step S7), converted into synthesized sound there (step S8), and then transferred to the transmission control unit 7 (step S9). Voice data in which no specific keyword is detected is transferred to the transmission control unit 7 unchanged, as natural-voice data (step S6). The transmission control unit 7 encodes the data it receives and transmits it from the communication antenna 9 to the mobile phone b of listener B (step S10).
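
The routing in steps S5 to S9 can be illustrated with a minimal sketch. It is not the patented implementation: the keyword list, the `Segment` type, and all function names are hypothetical, and a text string stands in for the recognition result of each buffered chunk of voice data.

```python
from dataclasses import dataclass

# Hypothetical keyword list; the description only says that speech request
# wording is stored in the terminal in advance.
REQUEST_KEYWORDS = ("please speak", "go ahead")


@dataclass
class Segment:
    text: str                  # stand-in for the recognition result of one chunk
    synthesized: bool = False  # True once the chunk has been replaced by synthesized sound


def contains_request(text: str) -> bool:
    """Step S5: check the recognized text for a speech request keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in REQUEST_KEYWORDS)


def synthesize(segment: Segment) -> Segment:
    """Stand-in for the synthesized sound generation unit 11 (steps S7 and S8)."""
    return Segment(text=segment.text, synthesized=True)


def route_for_transmission(segments):
    """Keyword hits are synthesized (S7 to S9); everything else passes through as is (S6)."""
    return [synthesize(s) if contains_request(s.text) else s for s in segments]


outgoing = route_for_transmission([
    Segment("That concludes my report."),
    Segment("Mr. XX, please speak."),
])
```

Only the second segment is marked as synthesized; the first is handed to the transmission control unit unchanged, which mirrors the two paths described above.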

<Operation of the receiving mobile phone>
Next, the operation of the mobile phone b of listener B is described in detail with reference to the sequence diagram of FIG. 6. The device control unit 6 is always waiting for a speech request to be detected (step S20). When the communication antenna 9 receives data from the mobile phone a of speaker A, the reception control unit 8 decodes the voice data (step S21). The received voice data is then transferred sequentially (step S22); it is output as conversation through the speaker unit 4 (steps S29 and S30) and also stored in the speech recognition buffer of the memory unit 5 (step S23). The speech recognition unit 10 performs speech recognition on the buffered voice data (step S24). Speech recognition runs continuously while the conversation is in progress and detects the specific keywords that constitute a speech request (step S25).

When a speech request is detected, the device control unit 6 is immediately notified of the specific keyword hit (step S26). On receiving this notification, the device control unit 6 issues notification operation requests to the speaker unit 4, the display unit 3, the LED unit 2, and the vibration control unit 1 (step S27), and the units 1 to 4 perform their notification operations in response (step S28). The units may perform the speech request notification simultaneously, or individually at staggered times.
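
As a rough sketch of steps S25 to S28, the receiving side can be modeled as a keyword check followed by a fan-out to every notifier. The keyword list and all names below are hypothetical, a text string again stands in for the recognition result, and the notifiers simply append their names to a log rather than driving hardware.

```python
from typing import Callable, Dict, List

# Hypothetical keyword list standing in for the stored speech request wording.
REQUEST_KEYWORDS = ("please speak", "go ahead")


def make_notifiers(log: List[str]) -> Dict[str, Callable[[], None]]:
    """Build stand-ins for units 1 to 4; each one records its name when triggered."""
    names = ("vibration", "led", "display", "speaker")
    return {name: (lambda n=name: log.append(n)) for name in names}


def on_received_text(text: str, notifiers: Dict[str, Callable[[], None]]) -> bool:
    """Steps S25 to S28: on a keyword hit, request a notification from every unit."""
    hit = any(keyword in text.lower() for keyword in REQUEST_KEYWORDS)
    if hit:
        for notify in notifiers.values():
            notify()
    return hit
```

This sketch triggers all units at once, which corresponds to the simultaneous case; the staggered case would insert delays between the calls.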

As described above, in this embodiment a spoken speech request made during a PoC conference, such as "Mr. XX, go ahead", is detected automatically by speech recognition in the receiving mobile phone b, which can then activate notification means such as vibration of the device, LCD display, LED lighting, and sounding of a control tone. The user thus has the opportunity to learn of the speech request not only from the voice itself but also from other operations of the terminal. Because the speech request is conveyed to the other party more reliably than before, situations in which the request was not heard because of ambient noise, or was missed through inattention, are avoided, and smooth communication can be achieved.

Furthermore, in this embodiment the transmitting telephone terminal a also performs speech recognition and converts the voice data of the recognized speech request portion into synthesized sound, which normalizes individual differences such as the speaker's speaking habits and accent. This improves the speech recognition accuracy of the receiving mobile phone b and makes communication smoother still.

Alternatively, the speech request voice from the transmitting mobile phone a may be transmitted to the receiving mobile phone b as natural voice, without being converted into synthesized sound. In this case the receiving mobile phone b performs speech recognition on the natural voice and detects the speech request. Even so, the speech request voice is detected automatically by speech recognition in the receiving mobile phone b, so smooth communication can be achieved.

As a further alternative, the receiving mobile phone b may perform no speech recognition at all, with the speech request voice from the transmitting mobile phone a simply being converted into synthesized sound and transmitted to the receiving mobile phone b. Even so, the speech request reaches the receiving mobile phone b as a synthesized sound that is easy for the listener to hear (easy to recognize), so smooth communication can be achieved.

The configuration described above can be realized merely by changing the configuration of the mobile phones and requires no modification of the existing PoC communication system that enables multi-party calls, so the cost burden is reduced.

Next, a second embodiment of the present invention is described with reference to FIG. 7. FIG. 7 is a sequence diagram showing the notification operation of the receiving mobile phone b.

If a notification operation is triggered by a speech recognition error, it may hinder communication rather than help it. Users are particularly annoyed when the vibration drive device is activated by mistake. As a control method for reducing the impact of false notifications on the user, this embodiment therefore activates the notification means in stages, at intervals, starting with the weakest. For example, given the four notification means mentioned above (vibration, LCD display, LED lighting, and sound), control is performed so that they operate in the order LED lighting → LCD display → sound → vibration.

The specific operation is described with reference to FIG. 7. Here, the receiving mobile phone b is waiting to detect a speech request from the transmitting mobile phone a (step S40), and the device control unit 6, on receiving a speech request, is to control notification means operation unit 1 and notification means operation unit 2. When the device control unit 6 receives a speech request (step S41), it issues a timer start request to the timer unit 12 (step S42), and the timer unit 12 measures a fixed period (step S43). When the period has elapsed, the timer unit 12 sends a count-up completion notification to the device control unit 6 (step S44). Only then does the device control unit 6 send a notification operation request to notification means operation unit 1 (step S45), and the notification operation is performed (step S46). This first notification uses a notifier whose effect on listener B is weak, such as light emission by the LED unit 2.

In the same way, time is measured again, and once the fixed period has elapsed a second notification is performed, as shown in steps S47 to S51. This time a notifier with a stronger effect is used, such as an audible notification from the speaker unit 4.

By staggering multiple types of notification in time and raising the notification strength step by step in this way, the terminal behaves in a fail-safe manner when a notification is triggered by a speech recognition error, which reduces user dissatisfaction caused by false activation.
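
The staged scheme of this embodiment can be sketched as a schedule of delayed notifications ordered from weakest to strongest. The interval value and function names are hypothetical. The `acknowledged_at` parameter is an assumption added for illustration: the description does not state how or whether later stages are cancelled, so the sketch simply stops once the user is taken to have reacted.

```python
# Weakest first, in the order given in the description.
STRENGTH_ORDER = ["led", "display", "sound", "vibration"]


def build_schedule(interval_s, notifiers=STRENGTH_ORDER):
    """Return (delay_from_detection, notifier) pairs; the first fires after one interval."""
    return [((i + 1) * interval_s, name) for i, name in enumerate(notifiers)]


def run_schedule(schedule, acknowledged_at=None):
    """Return the notifiers that fire.

    Assumption: stages scheduled after the user's acknowledgement time are skipped.
    With acknowledged_at=None every stage fires.
    """
    fired = []
    for delay, name in schedule:
        if acknowledged_at is not None and delay > acknowledged_at:
            break
        fired.append(name)
    return fired


schedule = build_schedule(2.0)
```

With a two-second interval and an acknowledgement five seconds after detection, only the LED and the display would fire, so a false detection never reaches the vibration stage.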

Next, a third embodiment of the present invention is described with reference to FIG. 8. FIG. 8 is a sequence diagram showing the notification operation of the receiving mobile phone b.

This embodiment is characterized in that the notification method is varied according to the number of times a speech request is detected within a fixed period. That is, the device control unit 6 has a function for counting the number of times a speech request (a keyword forming part of it) is detected within a fixed period (speech request detection count means) and a function for performing various notifications according to that count. This reflects the tendency for speech requests to be repeated within a short time. For example, if listener B is "Mr. XX", the conversation is assumed to run as follows: the first speech request is "I would like to hear a comment from Mr. XX", the second is "Well then, Mr. XX, please", and the third is "Mr. XX, are you listening?". In this case the specific keyword whose detections are counted is "Mr. XX".

As shown in FIG. 8, the device control unit 6 is initially waiting for the first detection (first hit) (step S60). When it is notified that the specific keyword has been detected (step S61), a timer start request is issued to the timer unit 12 (step S62), the timer unit 12 starts timing (step S63), and counting of the specific keyword begins. At the same time, the device control unit 6 enters the state of waiting for the second hit (step S64). If a specific keyword hit is notified (second hit) while the device control unit 6 is in this state (step S65), it first sends a notification operation request to notification means operation unit 1, which performs a relatively weak notification (step S66), and that notification means is activated (step S67). If a specific keyword hit is notified once more (third hit) (step S68), it sends a notification operation request to notification means operation unit 2, which performs a strong notification (step S69), and that notification means is activated (step S70).

As shown within the dotted line in FIG. 8, if the timer finishes counting before the second or a subsequent keyword hit is notified, the timer unit 12 sends a count-up completion notification to the device control unit 6 (step S71), and the device control unit 6 returns to the state of waiting for the first hit (step S72).
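
The hit-counting behaviour of FIG. 8 amounts to a small state machine, sketched below under stated assumptions. The class name, the window length, and the use of caller-supplied timestamps in place of the timer unit 12 are all hypothetical. The sketch also assumes that hits after the third keep triggering the strong notification, which the description does not specify.

```python
class HitCounter:
    """Counts keyword hits inside a fixed window and picks a notification level."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.started_at = None  # None means "waiting for the first hit" (S60)
        self.count = 0

    def on_keyword_hit(self, now: float):
        """Return None for a first hit, 'weak' for a second, 'strong' afterwards."""
        # Timer expired before this hit: fall back to waiting for a first hit (S71, S72).
        if self.started_at is not None and now - self.started_at >= self.window_s:
            self.started_at = None
            self.count = 0
        if self.started_at is None:
            self.started_at = now  # start the timer (S62, S63)
            self.count = 1
            return None
        self.count += 1
        return "weak" if self.count == 2 else "strong"
```

For example, with a ten-second window, hits at 0 s, 3 s, and 6 s yield no notification, a weak notification, and a strong notification respectively, while a hit at 20 s is treated as a new first hit.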

This allows speech requests to be conveyed more reliably and makes communication smoother still.

Next, a fourth embodiment of the present invention is described with reference to FIG. 9. In the embodiments above, speech recognition of the speech request is performed in the mobile phones a and b, but this processing may instead be executed by the PoC server system 20. That is, the server system 20 includes the speech recognition unit 10 (speech request speech recognition means, speech request detection means) that the mobile phone b has in the embodiments above, and also has a function for notifying the receiving mobile phone b that a speech request has been detected (notification control means).

The operation is described with reference to FIG. 9. When the speaker (A) utters the speech request "Mr. XX, please speak" addressed to the listener (B), the transmitting mobile phone a forwards the input utterance (Y11) unchanged to the PoC server system 20 (Y12, Y13). The PoC server system 20, which has a speech recognition unit, detects the speech request in the same way as the telephone terminal b described in the embodiments above (Y14) and transmits a control signal conveying the speech request to the mobile terminal of the person to whom the request is addressed (Y15). The voice data is also forwarded unchanged. On receiving the control signal, the mobile terminal b performs the speech request notification (Y17, Y18) while outputting the conversation (Y16).
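
The server-side variant (Y14, Y15) can be sketched as follows. This is only an illustration: the `Packet` type, the phrase list, and the participant table mapping terminal identifiers to names are hypothetical, the transcript stands in for the server's recognition result, and a boolean flag stands in for the control signal. How the server determines the addressed person is not detailed in the description; matching the participant's name in the transcript is an assumption.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical stored speech request wording.
REQUEST_PHRASES = ("please speak",)


@dataclass
class Packet:
    audio: bytes          # voice data, forwarded unchanged
    speech_request: bool  # stand-in for the control signal of Y15


def server_forward(audio: bytes, transcript: str,
                   participants: Dict[str, str]) -> Dict[str, Packet]:
    """Forward audio to every terminal and flag only the addressed participant.

    participants maps a terminal id to the name by which its user is addressed.
    """
    lowered = transcript.lower()
    is_request = any(phrase in lowered for phrase in REQUEST_PHRASES)
    return {
        terminal: Packet(audio, is_request and name.lower() in lowered)
        for terminal, name in participants.items()
    }
```

A terminal that receives a packet with `speech_request` set would play the audio and run its notification operation without performing any recognition of its own.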

Because the speech recognition processing can be executed by the server system 20, the effects described above are obtained while the processing load on the mobile phones a and b is reduced.

Next, a fifth embodiment of the present invention is described with reference to FIGS. 10 and 11. FIGS. 10 and 11 show the configuration of the mobile phone.

The mobile phone of this embodiment is, specifically, the transmitting mobile phone a, whose user holds the right to speak and makes speech requests to other users. Its configuration is basically the same as that of the mobile phones a and b of the embodiments above, shown in FIG. 2, but differs in that it is additionally equipped with a keyboard unit 15, as shown in FIGS. 10 and 11. As shown in FIG. 11, each key of the keyboard unit 15 is associated with a PoC conference participant and with a control sound that identifies that participant, and this association information is stored in advance in the memory unit 5 as a control sound conversion table 51. The association is made by user A.

When speaker A wishes to make a speech request to a particular person, speaker A presses the key corresponding to that person (Y21), and the synthesized sound generation unit 11 generates the corresponding control sound. Because the control sound is unique to each participant, the receiving telephone terminal b can detect, through speech recognition processing, that it is a speech request addressed to the other user B. In other words, the control sound functions as the synthesized sound of the speech request described above.

In this way, the user of the transmitting mobile phone a obtains the same effect as making a speech request merely by pressing a key, without having to say "Mr. XX, please". This function may also be used as a supplement when a spoken speech request is issued as described above; that is, the control sound may be transmitted together with the speech request in synthesized sound generated from the voice.
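
The key-to-control-sound association can be pictured as a simple lookup table. The sketch below is illustrative only: the table contents, the tone identifiers, and the function names are hypothetical, and a string identifier stands in for the actual control sound generated by the synthesized sound generation unit 11.

```python
# Hypothetical contents of the control sound conversion table 51:
# key -> (participant, control sound identifier). User A would define these.
CONTROL_SOUND_TABLE = {
    "1": ("User B", "tone-01"),
    "2": ("User C", "tone-02"),
    "3": ("User D", "tone-03"),
}


def control_sound_for_key(key: str) -> str:
    """Transmitting side: look up the control sound for the pressed key (Y21)."""
    _participant, tone = CONTROL_SOUND_TABLE[key]
    return tone


def is_addressed_to_me(received_tone: str, my_tone: str) -> bool:
    """Receiving side: a terminal reacts only to its own participant's control sound."""
    return received_tone == my_tone
```

Pressing key "1" would send `tone-01`, which only the terminal registered with that tone treats as a speech request.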

The present invention can be used in terminal devices that have a communication function and are capable of multi-party calls, including mobile terminals such as mobile phones, PHS (Personal Handyphone System) handsets, and PDAs (Personal Data Assistance, Personal Digital Assistants: portable personal information and communication devices), as well as PCs (Personal Computers), and therefore has industrial applicability.

FIG. 1 is a schematic diagram showing the overall configuration of the multi-party call system.
FIG. 2 is a block diagram showing the configuration of the mobile phone in Example 1.
FIG. 3 is a diagram showing an example of the speech request keywords stored in the mobile phone.
FIG. 4 is an explanatory diagram showing a multi-party call in Example 1.
FIG. 5 is a sequence diagram showing the operation during a multi-party call in Example 1.
FIG. 6 is a sequence diagram showing the operation during a multi-party call in Example 1.
FIG. 7 is a sequence diagram showing the operation during a multi-party call in Example 2.
FIG. 8 is a sequence diagram showing the operation during a multi-party call in Example 3.
FIG. 9 is an explanatory diagram showing a multi-party call in Example 4.
FIG. 10 is a block diagram showing the configuration of the mobile phone in Example 5.
FIG. 11 is an explanatory diagram showing a multi-party call in Example 5.

Explanation of symbols

1 Vibration drive device unit
2 LED unit
3 Display unit
4 Speaker unit
5 Memory
6 Device control unit
7 Transmission control unit
8 Reception control unit
9 Communication antenna
10 Speech recognition unit
11 Synthesized sound generation device unit
12 Timer unit
13 Internal bus
14 Microphone unit
15 Keyboard unit
A User (speaker)
B User (listener)
a, b, c, d Mobile phone

Claims

A call terminal capable of multi-party calls,
A received voice recognition means for receiving and recognizing voice data transmitted from a call terminal of a person who has the right to speak;
A speech request detection means for detecting that the received speech data is a speech request based on the speech recognition result;
An informing means for informing the user when a request to speak is detected;
A call terminal characterized by comprising:

A speech request message storage means for storing a speech request message set in advance;
The speech request detection means detects a speech request based on the stored speech request wording.
The call terminal according to claim 1.

The call terminal according to claim 1 or 2, wherein the notification means performs a plurality of types of notification.

The call terminal according to claim 3, wherein the notification means performs the plurality of types of notifications at different times.

A speech request detection frequency counting means for counting the number of speech request detections within a predetermined time is provided,
The call terminal according to claim 3, wherein the notification means performs the plurality of types of notifications according to the number of detected speech requests.

A call terminal capable of multi-party calls,
A speech request speech recognition means for recognizing a speech by receiving an input of a speech request to another user;
A synthesized sound generating means for generating a synthesized sound corresponding to the speech-recognized voice data;
A speech request transmitting means for transmitting the voice data composed of the generated synthesized sound to the other user's call terminal;
A call terminal characterized by comprising

A speech request message storage means for storing a speech request message set in advance;
The synthesized sound generating means generates a synthesized sound corresponding to only specific voice data based on the stored message request wording;
The call terminal according to claim 6.

Voice feature data storage means for storing voice feature data representing the voice feature of the operator in advance,
The speech request speech recognition means performs speech recognition based on the speech feature data.
The call terminal according to claim 6 or 7, characterized by the above.

A multi-party call system comprising the call terminal according to claim 1 and the call terminal according to claim 6.

A server computer connected via a network to a call terminal capable of multi-party calls,
A speech request speech recognition means for receiving and recognizing voice data of a speech request transmitted to another user from the call terminal of the person having the right to speak;
A speech request detection means for detecting that the received speech data is a speech request based on the speech recognition result;
Notification control means for notifying the other user's call terminal of that when a request to speak is detected;
A server computer comprising:

In the computing device equipped in the call terminal that can make multi-party calls,
A received voice recognition means for receiving and recognizing voice data transmitted from a call terminal of a person who has the right to speak;
A speech request detection means for detecting that the received speech data is a speech request based on the speech recognition result;
Notification control means for notifying the user via a notification device equipped in the call terminal when a request to speak is detected;
A program to realize

In the computing device equipped in the call terminal that can make multi-party calls,
A speech request speech recognition means for recognizing a speech by receiving an input of a speech request to another user;
A synthesized sound generating means for generating a synthesized sound corresponding to the speech-recognized voice data;
A speech request transmitting means for transmitting the voice data composed of the generated synthesized sound to the other user's call terminal;
A program to realize

A multi-party call method using a call terminal,
A received speech recognition step in which the call terminal of a user who receives a speech request receives voice data transmitted from the call terminal of a person who has the right to speak and recognizes the speech; a speech request detection step of detecting, based on the speech recognition result, that the received voice data is a speech request; and a notification step of notifying the user via a notification device equipped in the call terminal when a speech request is detected;
A multi-party call method comprising:

The speech request detection step detects a speech request based on a speech request word stored in advance in the speech request word storage unit.
The multi-party call method according to claim 13.

The multi-party call method according to claim 13 or 14, wherein the notification step performs a plurality of types of notifications at different times.

The notifying step counts the number of times a speech request is detected within a certain period of time, and performs a plurality of types of notification according to the number of times a speech request is detected.
The multi-party call method according to claim 13, 14, or 15.

Prior to the received speech recognition step, a speech request speech recognition step in which the call terminal of the person who has the right to speak receives a spoken speech request addressed to another user and recognizes the speech; a synthesized sound generating step of generating a synthesized sound corresponding to the speech-recognized voice data; and a speech request transmitting step of transmitting voice data composed of the generated synthesized sound to the call terminal of the other user;
The multi-party call method according to claim 13, 14, 15 or 16, characterized by comprising:

The synthetic sound generation step generates a synthetic sound corresponding to only specific voice data based on a speech request word stored in advance in the speech request word storage unit.
The multi-party call method according to claim 17.

The speech request voice recognition step performs voice recognition based on voice feature data representing an operator's voice feature stored in advance in the voice feature data storage means.
The multi-party call method according to claim 17 or 18, characterized in that: