JP5860575B1 - Voice recording program, voice recording terminal device, and voice recording system

Info

Publication number: JP5860575B1
Application number: JP2015087381A
Authority: JP
Inventors: 司黒岩; 賢悟渡邉
Original assignee: 株式会社シフトワン
Priority date: 2014-09-24
Filing date: 2015-04-22
Publication date: 2016-02-16
Anticipated expiration: 2034-10-31
Also published as: JP2016066992A; JP5774185B1; JP2016066984A

Abstract

[Problem] To solve the problems of the prior art by providing a voice recording program, a voice recording terminal device, and a voice recording system that let a user select a desired character from among the plurality of characters appearing in a moving image and record the user's own voice for that character, select the voices of the other voice roles from a plurality of voices, and be guided on speaking pace while recording with the moving image in view. [Solution] The "voice recording program" of the present invention records voice while the user watches a moving image (including a continuous display of frame-advanced images), and causes a computer to execute a content reading process, a main voice role selection process, a moving image display process, a recording support process, and a voice storage process. The content reading process reads "content" that has a moving image and a timeline. [Selected drawing] FIG. 4

Description

The present invention relates to technology for recording voice in time with video, and more specifically to a voice recording program, a voice recording terminal device, and a voice recording system with which a user can select a desired voice role from among a plurality of preset voice roles and record voice for it.

Moving images were traditionally associated with film and television, but with the rapid progress of information technology, moving images displayed on computers are now widely used as well. A "moving image file" handled on a computer is also called a container and generally consists of "video data" and "audio data". The file is played back with moving image playback software. A wide variety of terminal devices are used for playback, including personal computers, tablet terminals, and smartphones.

Unlike a still image, video expresses the movement of people and objects continuously; typical examples are live action, which captures actual motion, and animation. Video usually expresses movement by switching a large number of still images at high speed, but here video also includes so-called frame-advance presentation, in which a plurality of still images are switched intermittently. The "moving manga" invented by the present applicant and described in Patent Document 1 is therefore also video in this sense.

As already noted, a moving image contains audio as well as video. The audio associated with a figure appearing in the video (hereinafter a "character", since such figures are not limited to people), such as spoken lines, is often performed by someone other than the figure on screen. Well-known cases include dubbing foreign films into Japanese and adding lines to animated characters (so-called after-recording, or "atereko"). Because plays, animation, and moving manga usually feature several characters, one voice performer (for example, a voice actor) is provided per character. For convenience, responsibility for a character's audio, such as spoken lines, is called a "voice role" here. Thus, if characters A and B appear, voice role A and voice role B are required.

In films and animation, professional voice actors take the voice roles, and viewers (users) have had no opportunity to participate. With moving images displayed on a computer, however, the audio data can be kept separate from the video data, so it is entirely feasible for a user to take a voice role. If the audio for a particular voice role can be replaced with the user's voice, a moving image that reflects the user's own voice can be played back.

Reflecting the user's own voice in a particular voice role amounts to creating a moving image in which the user co-stars with actual voice actors. The number of people aspiring to become voice actors has risen sharply in recent years, so a mechanism that makes such co-starring possible is expected to be in wide demand. Patent Document 2 likewise proposes a method of creating a picture-story show (kamishibai) or the like in which the user narrates and the narration is recorded.

Patent Document 1: Japanese Patent No. 5327823
Patent Document 2: Japanese Patent Laying-Open No. 2005-267065

Patent Document 2 allows the user's own voice to be used as narration for a picture-story show or the like. An ordinary moving image, however, features several characters and therefore has many voice roles, so the voice role that is to carry the user's voice must be selected; Patent Document 2 does not allow a desired voice role to be selected and voiced. Moreover, a user who wants to create a moving image in which their own voice co-stars with actual voice actors would benefit from being able to choose among various voice actors for a single voice role, but no such technology has been proposed, in Patent Document 2 or elsewhere.

Furthermore, recording while watching the video yields audio better suited to the work, but it is difficult to judge from the video alone how the line to be spoken fits into the time allotted to it (the so-called "shaku", or running length).

The object of the present invention is to solve these problems of the prior art, namely to provide a voice recording program, a voice recording terminal device, and a voice recording system that let a user select a desired character from among the plurality of characters appearing in a moving image and record the user's own voice for it, select the voices of the other voice roles from a plurality of voices, and be guided on speaking pace while recording with the video in view.

The present invention was developed with a focus on selecting a desired character from among the plurality of characters appearing in a moving image, recording voice for it, and doing so while the video plays; it is based on an idea not found in the prior art.

The "voice recording program" of the present invention records voice while the user watches video (including a continuous display of frame-advanced images), and causes a computer to execute a content reading process, a main voice role selection process, a video display process, a subordinate voice selection process, and a voice storage process. The content reading process reads "content" that has video and a timeline. The timeline defines two or more voice roles and, for each voice role, the recording time slots allotted to it within the video. The main voice role selection process selects, from the two or more voice roles, the main voice role to be recorded, and the video display process displays the video. The subordinate voice selection process selects, for each "subordinate voice role" (each of the two or more voice roles other than the main voice role), the voice of a desired recording user from among the voices of two or more recording users stored for that role. The voice storage process stores the voice recorded within a recording time slot of the main voice role in association with that recording time slot.

The "voice recording program" of the present invention may further cause the computer to execute a moving image playback process. This process outputs audio together with the video: in recording time slots belonging to the main voice role it outputs the voice stored by the voice storage process, and in recording time slots belonging to a subordinate voice role it outputs the voice chosen in the subordinate voice selection process.
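The choice of audio source during playback can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_playlist`, the tuple layout, the string audio identifiers, and the times are all assumptions made for the example.

```python
def build_playlist(slots, main_role, main_takes, subordinate_voices):
    """Return (start_time, audio_source) pairs, one per recording time slot.

    slots: iterable of (role, slot_id, start_seconds) tuples (hypothetical layout).
    main_takes: slot_id -> audio recorded by the user for the main voice role.
    subordinate_voices: role -> recording user chosen for that subordinate role.
    """
    playlist = []
    for role, slot_id, start in sorted(slots, key=lambda s: s[2]):
        if role == main_role:
            # Main voice role: output the voice stored by the voice storage process.
            source = main_takes[slot_id]
        else:
            # Subordinate voice role: output the selected recording user's voice.
            source = f"{subordinate_voices[role]}/{slot_id}"
        playlist.append((start, source))
    return playlist


# Illustrative data only.
slots = [("A", "A1", 3.0), ("B", "B1", 9.0), ("A", "A2", 20.0)]
playlist = build_playlist(slots, "A", {"A1": "me/A1", "A2": "me/A2"}, {"B": "actor_1"})
```

Here voice role A is the main voice role, so its slots resolve to the user's own takes, while the slot of role B resolves to the voice of the chosen recording user.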

The "voice recording program" of the present invention may further cause the computer to execute a subordinate voice output process. During recording, this process outputs the voice chosen in the subordinate voice selection process in the recording time slots set for the subordinate voice roles. Hearing the subordinate voices makes it easier to time the recording of the main voice role.

The "voice recording program" of the present invention may further cause the computer to execute a recording support process. Based on the timeline, this process displays a recording speed meter when a recording time slot of the main voice role arrives. The recording speed meter dynamically shows the progress through, and the time remaining in, that recording time slot, and displaying it supports the voice recording.

The "voice recording terminal device" of the present invention is a terminal device for recording voice while the user watches video (including a continuous display of frame-advanced images), and comprises content reading means, main voice role selection means, subordinate voice selection means, video display means, and voice recording means. The content reading means reads "content" that has video and a timeline. The timeline defines two or more voice roles and, for each voice role, the recording time slots allotted to it within the video. The main voice role selection means selects, from the two or more voice roles, the main voice role to be recorded, and the subordinate voice selection means selects, for each subordinate voice role, the voice of a desired recording user from among the voices of two or more recording users stored for that role. The video display means displays the video, and the voice recording means records the voice.

The "voice recording system" of the present invention is a system for recording voice while the user watches video (including a continuous display of frame-advanced images), and comprises content input means, content storage means, a voice recording terminal device, and voice storage means. The content input means inputs content, the content storage means stores the input content, and the voice storage means stores the voice recorded by the voice recording terminal device. The content includes video and a timeline; the timeline defines two or more voice roles and, for each voice role, the recording time slots allotted to it within the video. The voice recording terminal device records voice while the user watches the video in the content, and comprises content reading means, main voice role selection means, subordinate voice selection means, video display means, and voice recording means. The content reading means reads the content from the content storage means, the main voice role selection means selects, from the two or more voice roles, the main voice role to be recorded, and the subordinate voice selection means selects, for each subordinate voice role, the voice of a desired recording user from among the voices of two or more recording users stored for that role. The video display means displays the video, and the voice recording means records the voice. The voice storage means stores the voice recorded within a recording time slot in association with that recording time slot.

The voice recording program, voice recording terminal device, and voice recording system of the present invention have the following effects.
(1) Even for the same video, the user can create an original moving image by recording their own voice.
(2) By rearranging the lines, the user can create an even more original moving image.
(3) Because the voice is recorded while the video is in view, the user can record a voice that suits the work (content), for example one delivered with emotion.
(4) Playing back the voices of the subordinate voice roles (the voice roles other than the main voice role) during recording lets the user record with a sense of presence that suits the work.
(5) Because the recording speed meter dynamically shows the progress through, and the time remaining in, the main voice role's recording time slot, the user can record appropriately within the predetermined slot.
(6) Playing back the voice recorded by the user together with voice actors' voices entered in advance lets the user co-star with voice actors.
(7) If the voice recording system allows a plurality of voice recording terminal devices to connect through wireless or wired communication means, strangers can co-star on the same video asynchronously, at any time and from any place.

FIG. 1 is a model diagram for explaining the timeline.
FIG. 2 is a flowchart showing an example of the processing flow of the voice recording program of the present invention, mainly for voice recording.
FIG. 3 is a flowchart showing an example of the processing flow of the voice recording program of the present invention, mainly for moving image playback.
FIG. 4 is a model diagram showing an example of the recording speed meter.
FIG. 5 is a block diagram showing the configuration of the voice recording system of the present invention that is mainly required for voice recording.
FIG. 6 is a block diagram showing the configuration of the voice recording system of the present invention that is mainly required for moving image playback.
FIG. 7 is a block diagram showing an example of use of the voice recording system of the present invention.

Examples of the voice recording program, voice recording terminal device, and voice recording system of the present invention are described below with reference to the drawings.

1. Definitions
Before the example embodiments of the present invention are described, the terms used here are defined.

(Moving images and content)
A moving image is the presentation of video together with audio. As stated above, video here covers not only movement expressed by switching a large number of still images at high speed but also frame-advance presentation, in which a plurality of still images are switched intermittently. Content, by contrast, comprises video data and timeline data and is prepared for each title (work), such as an animation or a story. The timeline specifies when audio occurs within the video, and audio is recorded or output according to it. In other words, what is displayed when content is played back by the moving image playback means is the "moving image".

(Timeline and voice roles)
The timeline is described in more detail with reference to FIG. 1, a model diagram for explaining the timeline. As the figure shows, the timeline specifies at what point between the start and end of the video, and for how long (the so-called shaku), audio is to be presented; it serves as a blueprint for audio timing.

A single moving image work (title) usually features several characters, and each character naturally has its own lines and its own output timing. For convenience, responsibility for a character's voice is called a "voice role" here. In the title shown in FIG. 1, for example, six characters appear and are assigned voice roles A to F. When several voice roles are provided in this way, the timeline sets, for each voice role, the time slots in which voice is input or output (hereinafter "recording time slots"). In FIG. 1, for example, three recording time slots are set for voice role A and two for voice role D.

The timeline is not limited to the voices of the characters; where needed it also sets "sound time slots", which are the recording time slots for sound effects (SE) and background music (BGM). In FIG. 1, seven sound time slots are set for sound effects and three for background music. As with voice roles, two or more kinds of sound effect or background music (for example, sound effect 1 and sound effect 2) can of course be provided.
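One way to picture the timeline described above is as a mapping from each voice role (and each sound track) to its time slots. The following is a minimal sketch under assumptions: the class names `TimeSlot` and `Timeline`, the field names, and all start and end times are hypothetical and are not taken from FIG. 1; only the slot counts for voice roles A and D follow the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TimeSlot:
    """One recording (or sound) time slot, in seconds from the start of the video."""
    slot_id: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class Timeline:
    """Maps each voice role, sound effect, or BGM track to its time slots."""
    voice_slots: dict[str, list[TimeSlot]] = field(default_factory=dict)
    sound_slots: dict[str, list[TimeSlot]] = field(default_factory=dict)

    def roles(self) -> list[str]:
        return sorted(self.voice_slots)

    def slots_for(self, role: str) -> list[TimeSlot]:
        # Return the role's recording time slots in playback order.
        return sorted(self.voice_slots[role], key=lambda s: s.start)


# Illustrative values only: three slots for voice role A, two for voice role D.
timeline = Timeline(
    voice_slots={
        "A": [TimeSlot("A2", 20.0, 26.0), TimeSlot("A1", 3.0, 8.0), TimeSlot("A3", 40.0, 44.5)],
        "D": [TimeSlot("D1", 9.0, 12.0), TimeSlot("D2", 30.0, 35.0)],
    },
    sound_slots={"SE": [TimeSlot("SE1", 0.0, 1.5)], "BGM": [TimeSlot("BGM1", 0.0, 15.0)]},
)
```

Keeping voice slots and sound slots in separate mappings mirrors the distinction the text draws between recording time slots for voice roles and sound time slots for SE and BGM.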

2. Voice recording program
Next, the voice recording program of the present invention is described with reference to the drawings. FIG. 2 is a flowchart showing an example of the processing flow of the program mainly for voice recording, and FIG. 3 is a flowchart showing an example of the processing flow mainly for moving image playback. In each flowchart, the center column shows the processes performed, the left column shows the input information each process requires, and the right column shows the output information it produces. The processes shown here are executed by a computer.

First, the processing by which the user selects the content for a desired title (moving image work) and records voice for a desired voice role (character) is described with reference to FIG. 2. Content containing video and a timeline is first read from, for example, content storage means (a content server) (Step 101). The content server usually stores content for a number of titles, so the user selects and reads the content for the desired title.

Because the timeline in the content that was read defines two or more voice roles, the user selects the voice role they want to perform, that is, the one for which they want to record their own voice (Step 102). For convenience, the voice role selected for recording is called the "main voice role". Once the main voice role is selected, every other voice role is set as a "subordinate voice role".
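The split into a main voice role and subordinate voice roles in Step 102 can be sketched as below. The function name `select_main_role` and the error handling are assumptions for illustration; the patent does not specify them.

```python
def select_main_role(roles: list[str], chosen: str) -> tuple[str, list[str]]:
    """Return the main voice role and the remaining subordinate voice roles."""
    if len(roles) < 2:
        raise ValueError("the timeline must define two or more voice roles")
    if chosen not in roles:
        raise ValueError(f"unknown voice role: {chosen!r}")
    # Every voice role other than the chosen one becomes a subordinate voice role.
    subordinate = [r for r in roles if r != chosen]
    return chosen, subordinate


# Voice roles A to F as in the FIG. 1 example; the choice of "C" is arbitrary.
main_role, subordinate_roles = select_main_role(["A", "B", "C", "D", "E", "F"], "C")
```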

If voices have already been stored for a subordinate voice role, they can be read out. If two or more voices are stored for one voice role, the user can select the one they prefer (Step 103). When there are several subordinate voice roles, a desired voice is selected for each. If the user does not select a voice for a subordinate voice role, a default voice is set for that role. The voice selected (or set) for a subordinate voice role is the "subordinate voice". When selecting a subordinate voice, the user can also listen to some or all of the voices stored for that role before choosing. This requires a process that reads the stored voices and a process that plays them back.

Alternatively, when a subordinate voice is being selected, attribute information (hereinafter "user information") about the person who recorded each voice (hereinafter the "recording user") can be displayed. By consulting the user information, the user can, for example, select the voice of a favorite voice actor as the subordinate voice. This requires user information storage means that stores the user information, a process that reads the user information for the stored voices, and a process that displays it, for example as a list.
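The subordinate voice selection of Step 103, including the fallback to a default voice, might look like the following sketch. The function name, the dictionary layout, and the recording-user identifiers such as `actor_1` are hypothetical.

```python
def choose_subordinate_voices(
    stored: dict[str, list[str]],
    defaults: dict[str, str],
    user_choice: dict[str, str],
) -> dict[str, str]:
    """Pick one recording user's voice per subordinate voice role.

    stored: role -> recording users who have a stored voice for that role.
    defaults: role -> recording user used when the user makes no selection.
    user_choice: role -> recording user explicitly selected (may omit roles).
    """
    selected = {}
    for role, candidates in stored.items():
        choice = user_choice.get(role)
        if choice is None:
            selected[role] = defaults[role]  # no selection: use the default voice
        elif choice in candidates:
            selected[role] = choice
        else:
            raise ValueError(f"{choice!r} has no stored voice for role {role!r}")
    return selected


stored_voices = {"B": ["actor_1", "user_17"], "D": ["actor_2", "actor_3", "user_42"]}
default_voices = {"B": "actor_1", "D": "actor_2"}
subordinate_voices = choose_subordinate_voices(stored_voices, default_voices, {"D": "user_42"})
```

In this example the user picks a voice only for role D, so role B falls back to its default voice.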

When the processing up to this point is complete, voice recording begins (Step 104). For example, when the user performs the operation that triggers recording, the video is shown on the video display means (Step 105). When a recording time slot of the main voice role arrives, the recording support process starts according to the timeline (Step 106). Specifically, the recording support process displays a recording speed meter, which dynamically shows the progress through the main voice role's recording time slot. Instead of (or in addition to) the progress, the recording speed meter may dynamically show the time remaining in the slot.

FIG. 4 is a model diagram showing an example of the recording speed meter. In this figure, the character "Marino" has been selected as the main voice role. As the figure shows, the dynamic display of the recording speed meter starts when a recording time slot set for the main voice role arrives. In the meter shown, a moving needle travels from left to right inside a horizontally elongated rectangular frame. Elapsed time is shown in a dark color and remaining time in a light color, and the boundary between the two is displayed as the moving needle. Because the width of the rectangular frame (the range over which the needle travels) is fixed, the needle moves slowly when the recording time slot is relatively long and quickly when it is relatively short. The recording speed meter is not limited to the form shown in FIG. 4; provided that it indicates when to start recording the main voice role and dynamically shows the progress through, or the time remaining in, the recording time slot, various forms can be used, such as a clock display, a pie chart display, or a numeric percentage display.
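The behavior of the bar-type meter can be expressed numerically. The sketch below assumes a frame of fixed pixel width and times in seconds; the function name `meter_state`, the returned keys, and the 300-pixel default are illustrative assumptions rather than values from the patent.

```python
def meter_state(now: float, start: float, end: float, width_px: int = 300) -> dict:
    """State of a recording speed meter for one slot at playback time `now` (seconds)."""
    duration = end - start
    if duration <= 0:
        raise ValueError("the slot must have a positive duration")
    elapsed = min(max(now - start, 0.0), duration)  # clamp to the slot
    progress = elapsed / duration
    return {
        "visible": start <= now <= end,            # shown only during the slot
        "progress": progress,                      # elapsed fraction (dark part)
        "remaining_s": duration - elapsed,         # time left (light part)
        "needle_px": progress * width_px,          # boundary between dark and light
        "needle_speed_px_s": width_px / duration,  # fixed width: longer slot, slower needle
    }


short_slot = meter_state(now=11.0, start=10.0, end=12.0)
long_slot = meter_state(now=11.0, start=10.0, end=20.0)
```

Because the frame width is constant, the needle speed is simply the width divided by the slot duration, which is why a long slot produces a slow needle and a short slot a fast one.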

The recording speed meter can also display the line to be spoken in the recording time slot. The user need not speak the line as written: they can ad-lib an original line, or produce the sound of a musical instrument or some other sound instead of a voice. The voice spoken (or sound produced) is recorded by the recording means (Step 107) and stored as the voice of the main voice role (hereinafter the "main voice") (Step 108). The main voice is stored in association with (linked to) the recording time slot in question. For example, if voice role A in FIG. 1 is the main voice role, the main voice recorded in recording time slot A1 is stored as the main voice linked to slot A1, and the main voice recorded in recording time slot A2 is stored as the main voice linked to slot A2.
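Storing each take in association with its recording time slot (Step 108) can be sketched with a keyed store. The class name `MainVoiceStore`, the key layout, and the rule that a new take replaces the previous one are assumptions; the patent only requires that the voice be linked to its slot.

```python
from typing import Optional


class MainVoiceStore:
    """Keeps each recorded take keyed by (content, voice role, recording user, slot)."""

    def __init__(self) -> None:
        self._takes: dict[tuple[str, str, str, str], bytes] = {}

    def save(self, content_id: str, role: str, user_id: str, slot_id: str, audio: bytes) -> None:
        # Assumption: a new take for the same slot replaces the previous one.
        self._takes[(content_id, role, user_id, slot_id)] = audio

    def load(self, content_id: str, role: str, user_id: str, slot_id: str) -> Optional[bytes]:
        # Returns None when nothing has been recorded for this slot yet.
        return self._takes.get((content_id, role, user_id, slot_id))


store = MainVoiceStore()
store.save("title_1", "A", "user_42", "A1", b"take for slot A1")
store.save("title_1", "A", "user_42", "A2", b"take for slot A2")
```

Including the recording user in the key is one way to let several users' voices coexist for the same voice role, which the subordinate voice selection relies on.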

While the main voice is being recorded, recording speed meters for the subordinate voice roles can also be displayed according to the timeline, as FIG. 4 shows. In addition, the line for the recording time slot can be displayed on a subordinate voice role's recording speed meter (Step 109), and the subordinate voice selected (or set) in Step 103 can be output (Step 110). Sound effects and background music can likewise be output according to the timeline. Displaying the subordinate voice roles' meters and outputting subordinate voices, sound effects, and so on makes it easier to time the recording of the main voice role and heightens the sense of presence, so that a more fitting voice can be recorded. In this case, however, the meter for the main voice role should be displayed so that it can be distinguished from those for the subordinate voice roles; if they look alike, the timing for the main voice role actually becomes harder to judge. For example, the elapsed and remaining portions of the main voice role's meter may be shaded in red and those of the subordinate voice roles' meters in gray. The meters of different subordinate voice roles may be made distinguishable from one another or may be given a uniform, indistinguishable appearance.

Furthermore, the main voice can be recorded while a main voice is being output according to the timeline. For foreign-language lines, for example, the user records while listening to the lines and matching their pronunciation. As with the subordinate voices, if two or more voices are stored for the main voice role, the desired one can be selected from the stored voices and output. In doing so, the user can select the desired main voice after previewing some (or all) of the stored voices, or while referring to the displayed user information.

Incidentally, the video in FIG. 4 is a so-called frame-by-frame video, and the rectangular frames of the multiple recording speed meters relating to the current image are displayed as a list. In this case, the whole list of rectangular frames can be replaced at the moment the image switches (that is, when the frame advances), or the list can be scrolled upward as time passes regardless of image switching. The same of course applies to video in which images are switched at high speed.

Voice recording is repeated until the video ends (END in FIG. 1), that is, for every recording time zone (for example, A1 to A3 in FIG. 1); when this is complete, the series of recording processes ends (Step 111).

Next, with reference to FIG. 3, the process by which a user selects the content of a desired title and plays it back is described. First, content containing a video and a timeline is read from, for example, the content storage means (content server) (Step 201). The content server normally stores content for multiple titles, so the user selects and reads the content of the desired title. Note that voices have already been recorded for the recording time zones of the content read here. Accordingly, the content server described with the flowchart of FIG. 2 and the content server described here may be provided as separate servers, or a single content server may be provided that stores content in a way that identifies whether each item is recorded or unrecorded.

Once the desired content has been read, the voice to be output for each voice role is selected (Step 202). Unlike recording, moving image playback does not require the roles to be divided into a main voice role and subordinate voice roles. For each of the voice roles, the desired voice is selected from the two or more stored voices. If only one voice is stored for a voice role, that voice is of course designated as is. If the user does not select a voice for a voice role, a predetermined (default) voice is set as that role's voice. The voice selected (set) for each voice role is that role's “playback voice.” When selecting a voice, the user can choose after previewing some (or all) of the voices stored for that voice role, or while referring to the displayed user information.
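
The selection rule of Step 202 (a single stored voice is used as is, an explicit choice wins, otherwise the default applies) might be sketched like this. The function name, the dictionary layout, and the voice identifiers are hypothetical; falling back to the default when the user's choice is not among the stored voices is also an assumption of this sketch.

```python
from typing import Dict, List


def choose_playback_voices(
    stored: Dict[str, List[str]],   # voice role -> identifiers of stored voices
    user_choice: Dict[str, str],    # voice role -> voice picked by the user (may be partial)
    default: Dict[str, str],        # voice role -> predetermined default voice
) -> Dict[str, str]:
    """Return the playback voice for every voice role."""
    playback: Dict[str, str] = {}
    for role, voices in stored.items():
        if len(voices) == 1:
            # Only one voice is stored for this role: designate it as is.
            playback[role] = voices[0]
        elif user_choice.get(role) in voices:
            # The user explicitly selected one of the stored voices.
            playback[role] = user_choice[role]
        else:
            # No (valid) selection: fall back to the default voice.
            playback[role] = default[role]
    return playback


stored = {"A": ["alice", "bob"], "B": ["carol"], "C": ["dave", "erin"]}
playback = choose_playback_voices(
    stored,
    user_choice={"A": "bob"},
    default={"A": "alice", "B": "carol", "C": "dave"},
)
print(playback)
```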

When the processing up to this point is complete, moving image playback begins (Step 203). For example, when the operation that triggers playback is performed, the video is shown on the video display means (Step 204). The playback voices selected and read in Step 202 are then output according to the recording time zones defined in the timeline of the content read in Step 201 (Step 205).

During playback, the recording speed meter for each voice role can be displayed according to the timeline, together with the corresponding lines (Step 206). The voices are output repeatedly until the video ends (END in FIG. 1), that is, for all recording time zones of all voice roles, and the series of moving image playback processes then ends (Step 207).
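
To illustrate how playback voices could be scheduled against the timeline (Step 205), the sketch below flattens the recording time zones of all voice roles into a time-ordered cue list and looks up which cues are active at a given moment. The tuple layout, the function names, and the times in seconds are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Cue = Tuple[float, float, str, str]  # (start, end, zone label, playback voice)


def build_cue_list(
    timeline: Dict[str, List[Tuple[float, float]]],  # voice role -> [(start, end), ...]
    playback_voices: Dict[str, str],                 # voice role -> selected playback voice
) -> List[Cue]:
    """Flatten all recording time zones of all voice roles into a time-ordered list."""
    cues: List[Cue] = []
    for role, zones in timeline.items():
        for index, (start, end) in enumerate(zones, start=1):
            cues.append((start, end, f"{role}{index}", playback_voices[role]))
    cues.sort()  # earliest recording time zone first
    return cues


def active_cues(cues: List[Cue], now: float) -> List[Cue]:
    """Cues whose recording time zone contains the current playback time."""
    return [cue for cue in cues if cue[0] <= now < cue[1]]


timeline = {"A": [(2.0, 5.0), (8.0, 10.0)], "B": [(4.0, 7.0)]}
cues = build_cue_list(timeline, {"A": "bob", "B": "carol"})
for cue in cues:
    print(cue)
```

Zones of different voice roles may overlap, so `active_cues` returns a list rather than a single cue.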

3. Voice Recording Terminal Device and Voice Recording System
This section describes the voice recording terminal device and voice recording system of the present invention with reference to the drawings. Descriptions that would duplicate those given for the voice recording program are omitted, and only matters specific to the terminal device and the system are described. Anything not described here is the same as described for the voice recording program.

(Voice recording terminal device)
First, the voice recording terminal device 100 that forms part of the voice recording system is described with reference to FIG. 5. FIG. 5 is a block diagram showing the configuration of the voice recording system of the present invention that is mainly needed for voice recording. The voice recording terminal device 100 executes all or part of the processing of the voice recording program; it can be manufactured as a dedicated device, but a general-purpose computer can also be used. This computer can be a personal computer (PC), a tablet terminal such as an iPad (registered trademark), a smartphone, a PDA (personal digital assistant), or the like. The computer has a processor such as a CPU and memory such as ROM and RAM, and some also include input means such as a mouse and keyboard and a display (video display means 103). On a typical PC, input is made with devices such as a mouse and keyboard, whereas on tablet terminals and smartphones it is usually made by touch-panel operations (tap, pinch in/out, slide, and so on).

As shown in FIG. 5, the voice recording terminal device 100 includes content reading means 101, main voice role selection means 102, video display means 103, recording support means 104, and voice recording means 105. The content reading means 101 reads content from the content storage means (content server 200). Because the content server normally stores content for multiple titles, the content reading means 101 can select and read the content of the desired title from among them.

Two or more voice roles are set in the timeline contained in the read content, and the main voice role selection means 102 selects the main voice role from among them. The video display means 103 is, for example, a display that shows the video, and the recording support means 104 displays a recording speed meter such as that shown in FIG. 4. Voice is recorded using the voice recording means 105. In FIG. 5, the voice storage means (voice server 300) that stores the main voice is provided outside the voice recording terminal device 100, but the terminal device 100 may instead include the voice server 300. The voice server 300 stores each main voice in association with (linked to) its recording time zone.

To read content from the content server 200, the voice recording terminal device 100 can also be provided with “receiving means 106,” which receives the content over wireless or wired communication means. Similarly, it can be provided with “transmitting means 107,” which transmits the main voice linked to its recording time zone to the voice server 300 (in this case located outside the voice recording terminal device 100) over wireless or wired communication means.

The voice recording terminal device 100 can further be provided with “user information input means 108” for entering “user information,” that is, attribute information about the recording user. In this case, the entered user information is sent by the transmitting means 107 and stored in the user information storage means (user information server 400). The user information server 400 stores each voice stored as a main voice in association with (linked to) the user information of the recording user who recorded it.

(Voice recording system)
As shown in FIG. 5, the voice recording system consists of the voice recording terminal device 100, content input means 201, the content server 200, and the voice server 300. The content input means 201 is used to input content into the content server 200. As already noted, the content server 200 can be included in the voice recording terminal device 100, and the content input means 201 can likewise be included in it. The voice recording system can also be configured to include the user information server 400.

FIG. 6 is a block diagram showing the configuration of the voice recording system of the present invention that is mainly needed for moving image playback. In this figure, a moving image playback device 500 is provided in addition to the configuration of the voice recording system shown in FIG. 5. Although the figure shows the moving image playback device 500 as independent of (separate from) the voice recording terminal device 100, it may instead be incorporated into (integrated with) the terminal device 100.

As shown in FIG. 6, the moving image playback device 500 includes playback voice selection means 501, playback voice reading means 502, and moving image playback means 503. For the multiple voice roles set in the desired content read by the content reading means 101, the playback voice selection means 501 selects the voice to be output for each role. The voice selected here is set as the playback voice. The playback voice reading means 502 reads the playback voice selected (set) by the playback voice selection means 501, and the moving image playback means 503 plays the video contained in the content by displaying it on the video display means 103. The moving image playback device 500 can further include voice group registration means that registers the playback voices selected for the individual voice roles as a single combination (hereinafter, a “voice group”). In this case, the playback voice selection means 501 can select a playback voice for each voice role individually, or can select a voice group if one has already been registered. Synergy within a cast can sometimes yield a better moving image work, so registering a voice group is advantageous because it makes it easy to view the moving image again with the same combination of voices.
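
The voice group idea can be illustrated with a small registry that saves one role-to-voice combination under a name and returns it later. The class name `VoiceGroupRegistry`, its methods, and the group name used below are hypothetical; the description does not specify how groups are identified or stored.

```python
from typing import Dict, List


class VoiceGroupRegistry:
    """Registers combinations of playback voices so a whole cast can be reselected at once."""

    def __init__(self) -> None:
        self._groups: Dict[str, Dict[str, str]] = {}

    def register(self, name: str, voices_by_role: Dict[str, str]) -> None:
        # Copy so that later changes to the caller's dict do not alter the group.
        self._groups[name] = dict(voices_by_role)

    def select(self, name: str) -> Dict[str, str]:
        # Raises KeyError if no group with this name has been registered.
        return dict(self._groups[name])

    def names(self) -> List[str]:
        return sorted(self._groups)


registry = VoiceGroupRegistry()
current_cast = {"A": "bob", "B": "carol", "C": "dave"}
registry.register("favorite cast", current_cast)

# Later: pick the whole combination instead of choosing role by role.
playback_voices = registry.select("favorite cast")
print(playback_voices)
```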

When an output voice is selected with the playback voice selection means 501, the desired playback voice can be chosen after previewing some (or all) of the voices stored in the voice server 300, or while referring to the displayed user information. In this case, the playback voice selection means 501 reads voices from the voice server 300 and outputs them, or reads user information from the user information server 400 and displays it.
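
As a rough sketch of showing user information next to the stored voices during selection, the function below joins two in-memory dictionaries that stand in for the voice server 300 and the user information server 400. The data layout, the field names such as `nickname`, and the function name are all assumptions; the actual servers and their interfaces are not specified.

```python
from typing import Dict, List, Tuple


def list_candidates(
    voice_server: Dict[str, List[Tuple[str, str]]],  # voice role -> [(voice id, user id), ...]
    user_info_server: Dict[str, Dict[str, str]],     # user id -> user information
    role: str,
) -> List[Dict[str, str]]:
    """For one voice role, pair every stored voice with its recording user's information."""
    rows: List[Dict[str, str]] = []
    for voice_id, user_id in voice_server.get(role, []):
        info = user_info_server.get(user_id, {})  # empty if no user information is stored
        rows.append({"voice_id": voice_id, "user_id": user_id, **info})
    return rows


voice_server = {"A": [("v1", "u1"), ("v2", "u2")]}
user_info_server = {"u1": {"nickname": "bob"}, "u2": {"nickname": "erin"}}
candidates = list_candidates(voice_server, user_info_server, "A")
for row in candidates:
    print(row)
```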

The moving image playback device 500 can also be provided with “receiving means 505,” which receives voices from the voice server 300 or user information from the user information server 400 over wireless or wired communication means. In addition, “evaluation means 504” can be provided so that, after playback, the recording user who recorded the selected playback voice can be evaluated. The evaluation result (for example, text data) can be sent to the user information server 400 over wireless or wired communication means using “transmitting means 506.” The receiving means 505 can double as the receiving means 106 shown in FIG. 5, and the transmitting means 506 can double as the transmitting means 107 shown in FIG. 5.

(Usage example of the voice recording system)
FIG. 7 is a block diagram showing one example of use of the voice recording system of the present invention. The figure shows a case in which a client creates a video for a commercial (CM) and looks for a voice actor (user) suited to a voice role in that video. The client creates a timeline along with the video and registers content consisting of the video and the timeline in the content server 200. At this point, it is desirable to make clear which voice role in the video (that is, the main voice) the client wants to cast.

Meanwhile, a recording user uses the voice recording terminal device 100 to record his or her own voice, taking the voice role the client is targeting as the main voice role. The recorded voice is stored in the voice server 300. User information is stored in the user information server 400 in advance.

Once voices from a number of recording users have been stored in the voice server 300, the client plays the moving image using the moving image playback device 500, selecting playback voices with the help of voice previews and user information. The client checks the playback voices of the various recording users and selects the recording user it wishes to hire. When the client makes this selection, the result is sent to that recording user as “user selection information.” Specifically, the “selection information transmitting means 601” shown in FIG. 7 transmits the user selection information, and the recording user receives it via the “selection information receiving means 602.” In addition to exchanging selection information, the system can be arranged so that the client pays an appearance fee into that user's bank account, based on bank account or similar information contained in the user information.

The voice recording program, voice recording terminal device, and voice recording system of the present invention can be used for live-action stories (dramas), animation, and motion comics, as well as for foreign-language conversation practice, voice actor selection for television commercials, and music performance accompanied by video. The invention can therefore be expected to find application in a variety of industries.

DESCRIPTION OF SYMBOLS
100 Voice recording terminal device
101 Content reading means
102 Main voice role selection means
103 Video display means
104 Recording support means
105 Voice recording means
106 Receiving means
107 Transmitting means
108 User information input means
200 Content server
300 Voice server
400 User information server
500 Moving image playback device
501 Playback voice selection means
502 Playback voice reading means
503 Moving image playback means
504 Evaluation means
505 Receiving means
506 Transmitting means
601 Selection information transmitting means
602 Selection information receiving means

Claims

A program that allows a computer to execute a sound recording function while confirming a “video” including a continuous display of a frame-by-frame image,
A function of causing the computer to execute a content reading process of reading “content” having the video and timeline, and
In the timeline, two or more voice roles are set, and a recording time zone assigned to the video for each voice role is set,
A main voice role selection process for selecting a main voice role to be recorded from among the two or more voice roles;
Video display processing for displaying the video;
Subordinate voice selection processing for selecting the voice of a desired recording user from among the voices of two or more recording users stored for each “subordinate role,” that is, each of the two or more voice roles other than the main voice role; and
A voice recording program comprising: a function of causing the computer to execute a voice storage process for storing voice recorded in a recording time zone of the main voice role in association with the recording time zone.

The voice recording program according to claim 1, further comprising a function of causing the computer to execute a moving image reproduction process for outputting sound together with the video,
wherein the moving image reproduction process outputs the voice stored by the voice storage process in the recording time zone corresponding to the main voice role, and outputs the voice selected by the subordinate voice selection processing in the recording time zone corresponding to the subordinate role.

The voice recording program according to claim 1, further comprising a function of causing the computer to execute, during recording, a subordinate voice output process of outputting the voice selected in the subordinate voice selection processing in the recording time zone set for the subordinate role,
wherein outputting the subordinate voice by the subordinate voice output process makes it easier to time the recording of the main voice role.

The voice recording program further comprising a function of causing the computer to execute a recording support process for displaying, based on the timeline, a recording speed meter when the recording time zone of the main voice role is reached,
wherein the recording speed meter supports voice recording by dynamically indicating the progress and/or the remaining time within the recording time zone of the main voice role.

A terminal device for recording sound while confirming “video” including continuous display of frame-by-frame images,
A content reading means for reading “content” having the video and timeline is provided,
In the timeline, two or more voice roles are set, and a recording time zone assigned to the video for each voice role is set,
A main voice role selection means for selecting a main voice role to be recorded from among the two or more voice roles;
Subordinate voice selection means for selecting the voice of a desired recording user from among the voices of two or more recording users stored for each “subordinate role,” that is, each of the two or more voice roles other than the main voice role; and
Video display means for displaying the video;
A voice recording terminal device comprising: voice recording means for recording voice.

Content input means for inputting “content” having a “video” including a continuous display of frames-by-frame images and a timeline;
Content storage means for storing the input content;
An audio recording terminal device for recording audio while checking the video of the content;
Voice storage means for storing the voice recorded by the voice recording terminal device,
In the timeline, two or more voice roles are set, and a recording time zone assigned to the video for each voice role is set,
The voice recording terminal device
Content reading means for reading the content from the content storage means;
A main voice role selection means for selecting a main voice role to be recorded from among the two or more voice roles;
Subordinate voice selection means for selecting the voice of a desired recording user from among the voices of two or more recording users stored for each “subordinate role,” that is, each of the two or more voice roles other than the main voice role; and
Video display means for displaying the video;
Voice recording means for recording voice,
The voice recording system, wherein the voice storage means stores voice recorded in a recording time zone in association with the recording time zone.