JP2003140672A - Phoneme business system - Google Patents
- Publication number
- JP2003140672A
- Authority
- JP
- Japan
- Prior art keywords
- phoneme
- data
- voice
- user
- phonemes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
[0001]
[Technical field of the invention] The present invention relates to a phoneme business system that uses phonemes, the minimum constituent elements of speech.
[0002]
[Prior art] In recent years, personal computers have increasingly come equipped with a function that converts text data, such as e-mail and word-processor documents, into speech. However, a function that does nothing more than convert text data into speech offers poor cost performance. In addition, the available output voices are generic ones, such as a male or a female voice, and are not necessarily the voice the user wants, so users tend not to find listening enjoyable.
[0003] Japanese Unexamined Patent Publication No. H7-140999 discloses a speech synthesis apparatus and a speech synthesis method capable of generating synthetic speech close to human utterance. Specifically, the dictionary holds, along with information such as kana readings and accent types, accent command values and/or phoneme duration information prepared in advance; the phoneme durations are used to generate a parameter sequence of phoneme-segment data, and a speech waveform is synthesized from that sequence, yielding synthetic speech considerably closer to human utterance. Japanese Unexamined Patent Publication No. H11-143483 discloses a system for generating synthetic speech on personal computers, word processors, game machines, and the like, and in particular provides a means by which the user can freely choose among a variety of synthetic voices.
[0004] As a means of solving these problems, reading systems have been devised that use a phoneme database created by sampling the real voice of a speaker.
[0005]
[Problems to be solved by the invention] Phonemes collected from a real person carry that speaker's individual character, and a right similar to copyright ought to be recognized in them. Conventional reading systems, however, provide no means of recognizing that right (hereinafter, "copyright"). Copyright should properly be recognized, and a corresponding payment (a copyright fee) should be made to the copyright owner of the phonemes according to their use; yet in conventional reading systems no copyright fee is paid even when the phonemes are used.
[0006] When phonemes collected from a real person are used, the right holder is disadvantaged because no copyright fee is paid to the copyright owner. This could hinder the growth of businesses built on phonemes. Moreover, conventional systems provided no sales means, so it was difficult for users to obtain the service easily.
[0007]
[Means for solving the problems] In view of the above problems, whenever phonemes are used there must be a means of paying the copyright owner of those phonemes a copyright fee commensurate with the use, and a sales means through which users can easily obtain the service. To this end, the present invention defines the phoneme as the minimum constituent element of speech and provides a phoneme business system comprising: phonemes having individual character; phoneme capture means for capturing the phonemes; copyright holder registration means for registering the copyright owner of the phonemes; speech synthesis means that, using the data to be synthesized (for example, text data such as sentences) and the phoneme database generated by the phoneme capture means, analyzes the data to be synthesized, extracts and concatenates the optimal phoneme for each item of data, and calculates the amount of phoneme usage; delivery means for providing the user with the synthesized speech data produced by the speech synthesis means; copyright fee calculation means for calculating a copyright fee for each phoneme copyright owner according to the usage calculated by the speech synthesis means; monetary payment means for paying the copyright fee to the phoneme copyright owner on the basis of the fee calculation information; sales means for selling phoneme-based services to users; synthesis-target data recording means for recording the data to be synthesized; phoneme database recording means for recording the phoneme database created by the phoneme capture means; and user interface means for accepting service requests from users and interfacing between the user and the system.
[0008]
[Embodiments of the invention] The invention according to claim 1 is a phoneme business system that defines the phoneme as the minimum constituent element of speech and comprises: phonemes having individual character; phoneme capture means for capturing the phonemes; copyright holder registration means for registering the copyright owner of the phonemes; speech synthesis means that, using the data to be synthesized (for example, text data such as sentences) and the phoneme database generated by the phoneme capture means, analyzes the data to be synthesized, extracts and concatenates the optimal phoneme for each item of data, and calculates the amount of phoneme usage; delivery means for providing the user with the synthesized speech data produced by the speech synthesis means; copyright fee calculation means for calculating a copyright fee for each phoneme copyright owner according to the usage calculated by the speech synthesis means; monetary payment means for paying the copyright fee to the phoneme copyright owner on the basis of the fee calculation information; sales means for selling phoneme-based services to users; synthesis-target data recording means for recording the data to be synthesized; phoneme database recording means for recording the phoneme database created by the phoneme capture means; and user interface means for accepting service requests from users and interfacing between the user and the system. With this system, the copyright owner of a phoneme receives a copyright fee commensurate with the use of the phoneme, and users of phoneme-based services can obtain those services easily.
[0009] (Embodiment) An embodiment of the phoneme business system of the present invention is described below with reference to FIGS. 1 to 4.
[0010] FIG. 2 is a schematic explanatory diagram of the phoneme business system of the present invention. An outline of the system follows. (201) is the phoneme business system of the present invention, which sells phoneme-based services to users and pays the copyright owner of each phoneme a copyright fee commensurate with its use. (202) is a phoneme provider, who supplies phonemes to the phoneme business system. (203) is a general user, who receives phoneme-based services from the phoneme business system. (204) is a content provider, such as a company, an administrative body such as a city hall, an educational institution such as a school, or a religious organization, which receives phoneme-based services from the phoneme business system and in turn provides services such as voice information to general users.
[0011] When a phoneme provider supplies phonemes to the system, the system registers the copyright owner of the supplied phonemes (205). Next, content providers and general users request from the system, via a network, telephone, fax, mail, in person, or any combination of these, a speech synthesis service that renders synthesis-target data in a desired voice character (206). Synthesis-target data is data describing the text to be synthesized into speech; its format is not restricted. Its content may be, for example, news, administrative guidance, textbooks, text such as novels recorded in the system in advance, text written by the user, a personal history, a drama, or a regional dialect. The phoneme business system performs speech synthesis using the phoneme database of the voice character the user requested, delivers the synthesized speech data to the user either over a network or recorded on a recording medium such as an optical disc, magnetic disk, or semiconductor memory and sent by mail or by hand, and collects a fee for the service (207).
[0012] A general user loads the delivered synthesized speech data into a terminal device equipped with synthesized-speech data input means and audio output means and plays it back, and can thus listen to synthetic speech in the desired voice character. The synthesized-speech data input means is, for example, a network interface such as a modem, or a data input means for a storage medium such as an optical disc, magnetic disk, or semiconductor memory. The audio output means is a speaker, headphones, earphones, or the like. A content provider records the delivered synthesized speech data on such a recording medium in preparation for service requests from general users. A general user requests news, administrative guidance, or the like in a character voice from the content provider via a network, telephone, fax, mail, in person, or any combination of these (208), and the content provider delivers the requested service to the general user either over a network or recorded on a recording medium such as an optical disc, magnetic disk, or semiconductor memory and sent by mail or by hand (209). The general user then loads the delivered synthesized speech data by the means described above and can listen to the synthetic speech. Finally, the system pays a copyright fee to the copyright owner of each phoneme used, according to how much that phoneme was used in the service (210). This concludes the outline of the system.
[0013] The system is now described in detail. FIG. 1 is a block diagram of the phoneme business system of the present invention. (101) is the real voice uttered by a phoneme registrant, and (102) is the phoneme capture means, which extracts phonemes from the uttered voice and builds them into a database. (103) is the copyright holder registration means, which registers the copyright owner of the phonemes captured by the phoneme capture means. (104) is the speech synthesis means, which uses the phoneme database generated by the phoneme capture means, analyzes the data to be synthesized, combines the optimal phonemes to produce speech, and also calculates the amount of phoneme usage.
[0014] (105) is the copyright fee calculation means, which calculates a copyright fee for each phoneme copyright owner according to the usage calculated by the speech synthesis means. (106) is the monetary payment means, which pays the copyright fee to the phoneme copyright owner on the basis of the information calculated by the copyright fee calculation means. (107) is the delivery means for providing the user with the synthesized speech data produced by the speech synthesis means; it provides the data either over a network such as the Internet or recorded on a recording medium such as an optical disc, magnetic disk, or semiconductor memory and sent by mail or by hand. (108) is the sales means, which sells phoneme-based services to users.
[0015] (109) is the synthesis-target data recording means for recording the data to be synthesized, for example the text data of novels or other writing, and (110) is the phoneme database recording means for recording the phoneme database created by the phoneme capture means. The synthesis-target data recording means and the phoneme database recording means are optical discs, magnetic disks, semiconductor memories, or the like, but are not limited to those listed here as long as data can be recorded. (111) is the user interface means, which accepts service requests from users and interfaces between the user and the system. The user interface means may be a web system used over the Internet, or may operate by telephone, fax, mail, or directly in person.
[0016] The operation is described next. The operation of the system divides broadly into two parts: first, capturing a real voice and storing the phonemes; and second, everything from use of the stored phonemes through sales, calculation of the copyright fee owed to the copyright owner, and payment of that fee.
[0017] First, the phoneme storage operation is described. FIG. 3 is a flowchart of the operation up to phoneme storage in the phoneme business system of the present invention. When a phoneme registrant speaks, the phoneme capture means, which is equipped with a microphone or the like, builds the uttered voice into a database in an arbitrary format and records it in the phoneme database recording means (s301). Next, the copyright holder registration means registers the copyright owner of the phonemes captured by the phoneme capture means (s302). The order of steps (s301) and (s302) may be reversed. This completes the operation up to phoneme storage.
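As an illustration only, and not part of the original specification, steps (s301) and (s302) can be sketched in Python. The class name `PhonemeRegistry`, its method names, and the use of plain dictionaries keyed by voice character are all assumptions made for this sketch; the specification does not prescribe a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class PhonemeRegistry:
    """Hypothetical stand-in for the phoneme database recording means
    and the copyright holder registration means."""
    # (voice character, phoneme label) -> list of waveform samples
    units: dict = field(default_factory=dict)
    # voice character -> registered copyright owner
    owners: dict = field(default_factory=dict)

    def store_phoneme(self, character, label, samples):
        # s301: record a captured phoneme unit in the database
        self.units[(character, label)] = list(samples)

    def register_owner(self, character, owner):
        # s302: register the copyright owner of the captured phonemes
        self.owners[character] = owner

    def owner_of(self, character):
        return self.owners[character]


registry = PhonemeRegistry()
# The two steps are independent, so they may run in either order.
registry.register_owner("voice_a", "Owner A")
registry.store_phoneme("voice_a", "a", [0.1, 0.2])
```

Because the two calls write to separate tables, the sketch reflects the statement that the order of (s301) and (s302) may be reversed.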
[0018] FIG. 4 is a flowchart of the operation from acceptance of a phoneme-based service request through sales to payment of the copyright fee in the phoneme business system of the present invention. When a user, that is, a content provider or a general user, specifies a desired voice character and the data to be synthesized and requests the speech synthesis service via a network such as the Internet, telephone, fax, mail, in person, or any combination of these, the user interface means of the system accepts the service request (s401). The data to be synthesized that the user specifies is either data recorded in advance in the synthesis-target data recording means within the system or data that the content provider or general user submits to the system for synthesis. Data submitted for synthesis by a content provider or general user is recorded in the synthesis-target data recording means within the system.
[0019] Next, the sales means identifies the content of the service request accepted by the user interface means and calculates a charge according to the service the user requested. It presents the resulting charge to the user through the user interface means and, once the user agrees, collects the charge (s402). The charge may be collected on several bases: for example, according to the number of voice characters provided to the user; according to the quality (market value) of the voice character; according to the amount of phoneme data in each character; according to the number or volume of data items to be synthesized; or according to the number or volume of synthesized data items. These factors may of course also be combined in various ways. The charge need not be collected at this stage; user information such as the user name and credit card number may instead be recorded, the charge billed, and payment collected at a later date.
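The combination of charging factors in (s402) can be illustrated with a small Python sketch. The function name `service_charge`, the rates, and the particular way the factors are combined (a per-character fee plus a per-text-character fee, scaled by a quality multiplier) are assumptions chosen for illustration; the specification leaves the actual formula open.

```python
def service_charge(num_voice_characters, text_length,
                   fee_per_voice_character=500, fee_per_text_character=2,
                   quality_multiplier=1.0):
    """Hypothetical charge: combines the number of voice characters,
    the volume of data to be synthesized, and a quality factor."""
    base = (num_voice_characters * fee_per_voice_character
            + text_length * fee_per_text_character)
    # The multiplier models the "quality (market value)" of the voice.
    return round(base * quality_multiplier)


# One voice character, 100 characters of text, default rates.
charge = service_charge(1, 100)
```

Deferred collection, as the paragraph allows, would simply store the computed charge with the user's details and bill it later.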
[0020] Next, the speech synthesis means reads the synthesis-target data specified by the user from the synthesis-target data recording means, converts it where necessary into data that can be synthesized, analyzes it sequentially, reads the phoneme data best suited to each item of data from the phoneme database of the voice character the user specified, and concatenates that phoneme data to create synthesized speech data. Where necessary, the synthesized speech data is converted into the data format best suited to the delivery means and to the terminal device the user uses (s403). The delivery means then delivers the synthesized speech data created by the speech synthesis means to the user (s404).
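The look-up-and-concatenate step in (s403) can be sketched as follows. This is a deliberately simplified illustration: the function name `synthesize` is hypothetical, the input is assumed to be already analyzed into phoneme labels, and "best suited" is reduced to an exact label match, whereas the specification does not say how the optimal phoneme is chosen.

```python
def synthesize(phoneme_labels, voice_db):
    """Concatenate the stored samples for each phoneme label.

    phoneme_labels: labels produced by analyzing the text, e.g. ["a", "k", "i"]
    voice_db: dict mapping label -> list of samples for one voice character
    """
    output = []
    for label in phoneme_labels:
        if label not in voice_db:
            raise KeyError(f"no phoneme unit stored for {label!r}")
        # Append this unit's samples to the synthesized waveform.
        output.extend(voice_db[label])
    return output


# Toy database: each phoneme is represented by a few integer samples.
voice_db = {"a": [1, 2], "k": [3], "i": [4, 5]}
waveform = synthesize(["a", "k", "i"], voice_db)
```

A real concatenative synthesizer would also smooth the joins between units and adjust pitch and duration; none of that is modeled here.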
[0021] A general user loads the synthesized speech data delivered from the system into a terminal device equipped with synthesized-speech data input means and audio output means and plays it back, and can thus listen to the text read aloud in the desired voice character. A content provider records the delivered synthesized speech data on the recording medium described above in preparation for service requests from general users. Next, the speech synthesis means calculates the amount of phonemes used during synthesis (s405). Although phoneme usage is the measure used here, the measure may instead be the amount of synthesis-target data used or the amount of synthesized speech used. "Usage" also encompasses both data volume and synthesis time.
[0022] Next, the copyright fee calculation means calculates a copyright fee commensurate with the usage, on the basis of the usage figures from the speech synthesis means (s406). On the basis of this calculation, the monetary payment means pays the copyright fee to the copyright owner of the phonemes (s407).
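Steps (s405) and (s406) amount to counting phoneme uses and multiplying by a rate per owner. The sketch below assumes a flat fee per phoneme use and a mapping from voice character to owner; the function name `copyright_fees`, the flat rate, and the data shapes are hypothetical, since the specification only requires that the fee correspond to usage.

```python
from collections import Counter


def copyright_fees(used_units, owner_of, fee_per_use=1):
    """Compute a fee per copyright owner from phoneme usage.

    used_units: iterable of (voice character, phoneme label) pairs used
                during synthesis (the usage figures of s405)
    owner_of: dict mapping voice character -> copyright owner
    """
    # s405: count how many phoneme units of each voice character were used
    usage = Counter(character for character, _label in used_units)
    # s406: convert usage into a fee for each owner
    fees = {}
    for character, count in usage.items():
        owner = owner_of[character]
        fees[owner] = fees.get(owner, 0) + count * fee_per_use
    return fees


used = [("voice_a", "a"), ("voice_a", "k"), ("voice_b", "i")]
owners = {"voice_a": "Owner A", "voice_b": "Owner B"}
fees = copyright_fees(used, owners, fee_per_use=3)
```

The resulting per-owner totals are what the monetary payment means would pay out in (s407). Measuring usage by data volume or synthesis time instead, as paragraph [0021] permits, would only change what is counted.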
[0023]
[Effects of the invention] With the system of the present invention, the copyright owner of a phoneme receives a copyright fee commensurate with the use of the phoneme, and users of phoneme-based services can obtain those services easily. Businesses that use phonemes may therefore grow substantially.
[FIG. 1] Block diagram of the phoneme business system of the present invention.
[FIG. 2] Schematic explanatory diagram of the phoneme business system of the present invention.
[FIG. 3] Flowchart of the operation up to phoneme storage in the phoneme business system of the present invention.
[FIG. 4] Flowchart of the operation from acceptance of a phoneme-based service request through sales to payment of the copyright fee in the phoneme business system of the present invention.
(101) Real voice uttered by the phoneme registrant
(102) Phoneme capture means
(103) Copyright holder registration means
(104) Speech synthesis means
(105) Copyright fee calculation means
(106) Monetary payment means
(107) Delivery means
(108) Sales means
(109) Synthesis-target data recording means
(110) Phoneme database recording means
(111) User interface means
Continuation of front page. (51) Int. Cl.7, identification code, FI, theme code (reference): G10L 13/00; G10L 3/00 E; 13/06; 5/04 Z
Claims (7)
1. A phoneme business system in which the phoneme is defined as the minimum constituent element of speech, comprising: phonemes having individual character; phoneme capture means for capturing the phonemes; copyright holder registration means for registering the copyright owner of the phonemes; speech synthesis means that, using the data to be synthesized (for example, text data such as sentences) and the phoneme database generated by the phoneme capture means, analyzes the data to be synthesized, extracts and concatenates the optimal phoneme for each item of data, and calculates the amount of phoneme usage; delivery means for providing the user with the synthesized speech data produced by the speech synthesis means; copyright fee calculation means for calculating a copyright fee for each phoneme copyright owner according to the usage calculated by the speech synthesis means; monetary payment means for paying the copyright fee to the phoneme copyright owner on the basis of the fee calculation information; sales means for selling phoneme-based services to users; synthesis-target data recording means for recording the data to be synthesized; phoneme database recording means for recording the phoneme database created by the phoneme capture means; and user interface means for accepting service requests from users and interfacing between the user and the system.
2. The phoneme business system according to claim 1, wherein the phoneme is a sound composed of a combination of vowels and consonants, such as "a", "i", "ka", or "ki".
3. The phoneme business system according to claim 1, wherein the phoneme is a single sound that is the minimum unit of continuous speech (for example, "aki" (autumn) consists of the single sounds "a", "k", and "i").
4. The phoneme business system according to claim 1, wherein the phoneme is a word.
5. The phoneme business system according to claim 1, wherein the phoneme is a clause or a sentence.
6. The phoneme business system according to claim 1, wherein the phoneme is an onomatopoeic or mimetic word.
7. The phoneme business system according to claim 1, wherein the phoneme is digitally synthesized speech.
8. The reading system according to claim 1, wherein the synthesized-speech data input means of the terminal device is a storage device such as a memory card, optical disc, or magnetic disk, or a network interface such as a modem.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001340687A JP2003140672A (en) | 2001-11-06 | 2001-11-06 | Phoneme business system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001340687A JP2003140672A (en) | 2001-11-06 | 2001-11-06 | Phoneme business system |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2003140672A true JP2003140672A (en) | 2003-05-16 |
JP2003140672A5 JP2003140672A5 (en) | 2005-04-14 |
Family ID=19154843
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2001340687A Withdrawn JP2003140672A (en) | 2001-11-06 | 2001-11-06 | Phoneme business system |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2003140672A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04180451A (en) * | 1990-11-15 | 1992-06-26 | Yoshiya Marumoto | Information delivery system |
JPH05233565A (en) * | 1991-11-12 | 1993-09-10 | Fujitsu Ltd | Voice synthesization system |
JPH09171396A (en) * | 1995-10-18 | 1997-06-30 | Baisera:Kk | Voice generating system |
JPH11345261A (en) * | 1998-06-01 | 1999-12-14 | Pfu Ltd | Content management system and recording medium |
JP2001255884A (en) * | 2000-03-13 | 2001-09-21 | Antena:Kk | Voice synthesis system, voice delivery system capable of order-accepting and delivering voice messages using the voice synthesis system, and voice delivery method |
JP2001282281A (en) * | 2000-03-28 | 2001-10-12 | Toshiba Corp | Storage medium, distributing method, and voice output device |
JP2002023777A (en) * | 2000-06-26 | 2002-01-25 | Internatl Business Mach Corp <Ibm> | Voice synthesizing system, voice synthesizing method, server, storage medium, program transmitting device, voice synthetic data storage medium and voice outputting equipment |
JP2002358092A (en) * | 2001-06-01 | 2002-12-13 | Sony Corp | Voice synthesizing system |
JP2003114692A (en) * | 2001-10-05 | 2003-04-18 | Toyota Motor Corp | Providing system, terminal, toy, providing method, program, and medium for sound source data |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9083798B2 (en) | 2004-12-22 | 2015-07-14 | Nuance Communications, Inc. | Enabling voice selection of user preferences |
US8571872B2 (en) | 2005-06-16 | 2013-10-29 | Nuance Communications, Inc. | Synchronizing visual and speech events in a multimodal application |
US8781840B2 (en) | 2005-09-12 | 2014-07-15 | Nuance Communications, Inc. | Retrieval and presentation of network service results for mobile device using a multimodal browser |
US9208785B2 (en) | 2006-05-10 | 2015-12-08 | Nuance Communications, Inc. | Synchronizing distributed speech recognition |
US9343064B2 (en) | 2006-09-11 | 2016-05-17 | Nuance Communications, Inc. | Establishing a multimodal personality for a multimodal application in dependence upon attributes of user interaction |
US9292183B2 (en) | 2006-09-11 | 2016-03-22 | Nuance Communications, Inc. | Establishing a preferred mode of interaction between a user and a multimodal application |
US8494858B2 (en) | 2006-09-11 | 2013-07-23 | Nuance Communications, Inc. | Establishing a preferred mode of interaction between a user and a multimodal application |
US8600755B2 (en) | 2006-09-11 | 2013-12-03 | Nuance Communications, Inc. | Establishing a multimodal personality for a multimodal application in dependence upon attributes of user interaction |
US8498873B2 (en) | 2006-09-12 | 2013-07-30 | Nuance Communications, Inc. | Establishing a multimodal advertising personality for a sponsor of multimodal application |
US8239205B2 (en) | 2006-09-12 | 2012-08-07 | Nuance Communications, Inc. | Establishing a multimodal advertising personality for a sponsor of a multimodal application |
US8862471B2 (en) | 2006-09-12 | 2014-10-14 | Nuance Communications, Inc. | Establishing a multimodal advertising personality for a sponsor of a multimodal application |
US8938392B2 (en) | 2007-02-27 | 2015-01-20 | Nuance Communications, Inc. | Configuring a speech engine for a multimodal application based on location |
US9208783B2 (en) | 2007-02-27 | 2015-12-08 | Nuance Communications, Inc. | Altering behavior of a multimodal application based on location |
US8843376B2 (en) | 2007-03-13 | 2014-09-23 | Nuance Communications, Inc. | Speech-enabled web content searching using a multimodal browser |
US7945851B2 (en) | 2007-03-14 | 2011-05-17 | Nuance Communications, Inc. | Enabling dynamic voiceXML in an X+V page of a multimodal application |
US9123337B2 (en) | 2007-03-20 | 2015-09-01 | Nuance Communications, Inc. | Indexing digitized speech with words represented in the digitized speech |
US8706490B2 (en) | 2007-03-20 | 2014-04-22 | Nuance Communications, Inc. | Indexing digitized speech with words represented in the digitized speech |
US8909532B2 (en) | 2007-03-23 | 2014-12-09 | Nuance Communications, Inc. | Supporting multi-lingual user interaction with a multimodal application |
US8862475B2 (en) | 2007-04-12 | 2014-10-14 | Nuance Communications, Inc. | Speech-enabled content navigation and control of a distributed multimodal browser |
US8229081B2 (en) | 2008-04-24 | 2012-07-24 | International Business Machines Corporation | Dynamically publishing directory information for a plurality of interactive voice response systems |
US9076454B2 (en) | 2008-04-24 | 2015-07-07 | Nuance Communications, Inc. | Adjusting a speech engine for a mobile computing device based on background noise |
US9349367B2 (en) | 2008-04-24 | 2016-05-24 | Nuance Communications, Inc. | Records disambiguation in a multimodal application operating on a multimodal device |
US9396721B2 (en) | 2008-04-24 | 2016-07-19 | Nuance Communications, Inc. | Testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise |
US8380513B2 (en) | 2009-05-19 | 2013-02-19 | International Business Machines Corporation | Improving speech capabilities of a multimodal application |
US8290780B2 (en) | 2009-06-24 | 2012-10-16 | International Business Machines Corporation | Dynamically extending the speech prompts of a multimodal application |
US9530411B2 (en) | 2009-06-24 | 2016-12-27 | Nuance Communications, Inc. | Dynamically extending the speech prompts of a multimodal application |
US8510117B2 (en) | 2009-07-09 | 2013-08-13 | Nuance Communications, Inc. | Speech enabled media sharing in a multimodal application |
US8416714B2 (en) | 2009-08-05 | 2013-04-09 | International Business Machines Corporation | Multimodal teleconferencing |
JP2020154571A (en) * | 2019-03-19 | 2020-09-24 | ジャングルX株式会社 | Game service distribution device, game service distribution method and game service distribution program |
Legal Events
Code | Title | Description | Effective date |
---|---|---|---|
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 | 2004-06-03 |
A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621 | 2004-06-03 |
RD01 | Notification of change of attorney | Free format text: JAPANESE INTERMEDIATE CODE: A7421 | 2005-07-01 |
A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007 | 2006-05-02 |
A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131 | 2006-05-16 |
A761 | Written withdrawal of application | Free format text: JAPANESE INTERMEDIATE CODE: A761 | 2006-07-13 |