WO2024135901A1 - Interactive emotional voice synthesis method based on counterpart voice and conversational speech information - Google Patents
- Publication number
- WO2024135901A1 (PCT/KR2022/021192)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- information
- speech
- features
- user
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Definitions
- the present invention relates to voice synthesis technology and, more specifically, to a voice synthesis method for responsive emotional conversation that reflects the conversation partner's intention and voice characteristics.
- the present invention was conceived to solve the above problems.
- the purpose of the present invention is to provide a voice synthesis method that improves the quality of a conversation system's synthesized speech by using the conversation partner's voice information and conversational speech information as reference information.
- a voice synthesis method for achieving the above object includes extracting voice features from a user's voice information; generating reference information from the extracted voice features; and generating the system's synthesized speech from the system's conversational speech information and the reference information.
- the speech synthesis method further includes extracting text features from the user's conversation utterance information, and the reference information generating step may generate reference information from the extracted speech features and text features.
- Reference information may be referenced to generate a voice synthesis sound by reflecting the user's intention and emotion.
- the prosody of the system's synthesized voice may vary depending on the reference information.
- the voice synthesis method according to the present invention further includes converting the user's voice information into embedding information, and the voice feature extraction step may be extracting voice features from voice information converted into embedding information.
- the speech synthesis method further includes converting the user's conversation speech information into embedding information, and the text feature extraction step may be extracting text features from the conversation speech information converted into embedding information.
- the speech synthesis sound generation step of the system includes extracting text features from dialogue speech information of the system; and generating a speech synthesis sound of the system from the extracted text features and reference information.
- the voice synthesis method according to the present invention may further include outputting the voice synthesis sound of the generated system.
- a user may include multiple users.
- according to another aspect of the present invention, a voice synthesis system is provided that includes a processor that extracts voice features from the user's voice information, generates reference information from the extracted voice features, and generates the system's synthesized speech from the system's conversational speech information and the reference information; and a storage unit that provides the storage space the processor requires.
- according to another aspect, a voice synthesis method is provided that includes receiving a user's voice information; extracting voice features from the input voice information; generating reference information from the extracted voice features; generating the system's synthesized speech from the system's conversational speech information and the reference information; and outputting the generated synthesized speech.
- according to yet another aspect, a voice synthesis system is provided that includes a microphone that receives the user's voice information; a processor that extracts voice features from the voice information input through the microphone, generates reference information from the extracted voice features, and generates the system's synthesized speech from the system's conversational speech information and the reference information; and a speaker that outputs the synthesized speech generated by the processor.
- by using the conversation partner's voice information and conversational speech information as reference information when generating the conversation system's synthesized speech, the quality of synthesized speech that previously had to be generated in a complicated or uniform manner can be improved.
- in particular, the tone or prosody of the synthesized speech can be varied according to the conversation content or emotional state of the conversation partner, so that services using interactive voice interface technology, such as virtual assistants, can offer improved synthesized speech and a greater sense of immersion.
- Figure 1 is a diagram showing a prosody-based voice synthesis method
- Figure 2 is a diagram provided to explain an emotional voice synthesis system according to an embodiment of the present invention.
- Figure 3 is a flowchart provided to explain an emotional voice synthesis method according to another embodiment of the present invention.
- Figure 4 is a diagram showing the configuration of a conversation system according to another embodiment of the present invention.
- Figure 1 is a diagram illustrating a prosody-based voice synthesis method. Such a method either takes prosody information to be imitated as input, in the expectation of producing a synthesized sound with similar prosody, or takes detailed prosody parameters as input to produce a synthesized sound with adjusted prosody.
- an embodiment of the present invention presents an interactive emotional voice synthesis method based on the other person's voice and conversational speech information.
- in generating interactive synthesized speech, the technique reflects reactive emotion by taking the other person's conversational intention and voice characteristics into account.
- the method according to the embodiment of the present invention can produce various types of synthesized sounds depending on the other person's speech or voice.
- Figure 2 is a diagram provided to explain an emotional voice synthesis system according to an embodiment of the present invention.
- the emotional voice synthesis system includes a voice encoder 110, a text encoder 120, a reference encoder 130, a TTS encoder 140, and a TTS decoder 150.
- the voice encoder 110 extracts voice features from the user's voice information converted into embedding information.
- the front end of the voice encoder 110 may include a pretrained speech model that converts the user's voice information into embedding information.
- the text encoder 120 extracts text features from the user's conversation utterance information converted into embedding information.
- the front end of the text encoder 120 may include a pretrained language model that converts the user's conversation utterance information into embedding information.
- the reference encoder 130 generates reference information by combining the voice features extracted from the voice encoder 110 and the text features extracted from the text encoder 120.
- Reference information is information that is referenced to generate a voice synthesis sound by reflecting the user's intention and emotion.
- the voice synthesis sound finally output from the emotional voice synthesis system according to an embodiment of the present invention has different styles and prosody depending on this reference information.
- the TTS encoder 140 extracts text features from conversation information to be uttered by the system.
- the TTS decoder 150 generates a speech synthesis sound of the system from the text features extracted by the TTS encoder 140 and the reference information generated by the reference encoder 130.
- the rear end of the TTS decoder 150 may include output means for outputting the voice synthesis sound of the system generated by the TTS decoder 150.
- the emotional voice synthesis system generates speech with different prosody depending on the other person's voice and conversational speech information, even when generating the same utterance.
- the number of conversation partners is not limited to one person, and conversation is possible even when there are multiple conversation partners.
- the emotional voice synthesis system is not limited to taking raw voice and conversational speech data as input; it is also applicable when the voice synthesis system is implemented with features that can be obtained from the voice and conversational speech.
- Figure 3 is a flowchart provided to explain an emotional voice synthesis method according to another embodiment of the present invention.
- the pretrained speech model first converts the user's voice information into embedding information (S210), and the voice encoder 110 extracts voice features from the voice information converted into embedding information in step S210 (S220).
- the pretrained language model converts the user's conversation utterance information into embedding information (S230), and the text encoder 120 extracts text features from the conversation utterance information converted into embedding information in step S230 (S240).
- the reference encoder 130 generates reference information by combining the voice features extracted in step S220 and the text features extracted in step S240 (S250).
- the TTS encoder 140 extracts text features from the conversation information to be uttered by the system (S260). The TTS decoder 150 then generates the system's synthesized speech from the text features extracted in step S260 and the reference information generated in step S250 (S270). Afterwards, the system's synthesized speech generated in step S270 is output.
- FIG. 4 is a diagram showing the configuration of a conversation system according to another embodiment of the present invention.
- the conversation system according to an embodiment of the present invention includes a microphone 310, a processor 320, a speaker 330, and a storage unit 340.
- the microphone 310 is a voice input means for receiving the user's voice utterance
- the speaker 330 is a voice output means for outputting the voice synthesis sound of the system.
- the processor 320 performs the functions of the system shown in FIG. 2 described above or performs the method shown in FIG. 3.
- the storage unit 340 provides storage space necessary for the processor 320 to operate and function.
- the quality of synthesized speech that previously had to be generated in a complicated or uniform manner can be improved, and the prosody of the generated synthesized speech can be varied by adjusting the tone or delivery of the voice.
- a computer-readable recording medium can be any data storage device that can be read by a computer and store data.
- computer-readable recording media can be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, etc.
- computer-readable codes or programs stored on a computer-readable recording medium may be transmitted through a network connected between computers.
Abstract
An interactive voice synthesis method based on a counterpart's voice and conversational speech information is provided. The voice synthesis method according to an embodiment of the present invention extracts a voice feature from a user's voice information, extracts a text feature from the user's conversational speech information, generates reference information from the extracted voice feature and text feature, and generates a system's synthesized speech from the system's conversational speech information and the reference information. The voice information and conversational speech information of the conversation counterpart are thus used as reference information for generating the conversation system's synthesized speech, so the quality of synthesized speech, which previously had to be generated in a complicated or uniform manner, can be improved.
Description
The present invention relates to voice synthesis technology and, more specifically, to a voice synthesis method for responsive emotional conversation that reflects the conversation partner's intention and voice characteristics.
With recent research on fully end-to-end TTS (text-to-speech) training techniques, voice synthesis technology has advanced to a level at which it is difficult to distinguish synthesized speech from human speech.
However, such technology is suitable only for domains such as general announcements and audiobook reading; in emotion-laden 1:1 and N:1 conversations it still fails to produce utterances that fit the conversational atmosphere, which undermines the sense of immersion in the voice interface.
To address this, research has recently been active on creating prosody in synthesized speech by inserting a limited set of emotional information or by adjusting phoneme duration, pitch, and loudness.
However, individually predicting such information is of limited practical use when a responsive voice must be generated in real time.
The present invention was conceived to solve the above problems. Its purpose is to provide a voice synthesis method that improves the quality of a conversation system's synthesized speech by using the conversation partner's voice information and conversational speech information as reference information.
A voice synthesis method according to an embodiment of the present invention for achieving the above object includes extracting voice features from a user's voice information; generating reference information from the extracted voice features; and generating the system's synthesized speech from the system's conversational speech information and the reference information.
The voice synthesis method according to the present invention may further include extracting text features from the user's conversational speech information, and the reference-information generation step may generate the reference information from the extracted voice features and text features.
The reference information may be referenced to generate synthesized speech that reflects the user's intention and emotion.
The prosody of the system's synthesized speech may vary depending on the reference information.
The voice synthesis method according to the present invention may further include converting the user's voice information into embedding information, and the voice feature extraction step may extract voice features from the voice information converted into embedding information.
The voice synthesis method according to the present invention may further include converting the user's conversational speech information into embedding information, and the text feature extraction step may extract text features from the conversational speech information converted into embedding information.
The step of generating the system's synthesized speech may include extracting text features from the system's conversational speech information; and generating the system's synthesized speech from the extracted text features and the reference information.
The voice synthesis method according to the present invention may further include outputting the generated synthesized speech of the system.
The user may comprise multiple users.
According to another aspect of the present invention, a voice synthesis system is provided that includes a processor that extracts voice features from a user's voice information, generates reference information from the extracted voice features, and generates the system's synthesized speech from the system's conversational speech information and the reference information; and a storage unit that provides the storage space the processor requires.
According to another aspect of the present invention, a voice synthesis method is provided that includes receiving a user's voice information; extracting voice features from the input voice information; generating reference information from the extracted voice features; generating the system's synthesized speech from the system's conversational speech information and the reference information; and outputting the generated synthesized speech.
According to yet another aspect of the present invention, a voice synthesis system is provided that includes a microphone that receives a user's voice information; a processor that extracts voice features from the voice information input through the microphone, generates reference information from the extracted voice features, and generates the system's synthesized speech from the system's conversational speech information and the reference information; and a speaker that outputs the synthesized speech generated by the processor.
As described above, according to embodiments of the present invention, the conversation partner's voice information and conversational speech information are used as reference information to generate the conversation system's synthesized speech, so that the quality of synthesized speech that previously had to be generated in a complicated or uniform manner can be improved.
In particular, according to embodiments of the present invention, the tone or prosody of the synthesized speech can be varied according to the conversation content or emotional state of the conversation partner, so that services using interactive voice interface technology, such as virtual assistants, can offer improved synthesized speech and a greater sense of immersion.
Figure 1 is a diagram showing a prosody-based voice synthesis method;
Figure 2 is a diagram provided to explain an emotional voice synthesis system according to an embodiment of the present invention;
Figure 3 is a flowchart provided to explain an emotional voice synthesis method according to another embodiment of the present invention;
Figure 4 is a diagram showing the configuration of a conversation system according to yet another embodiment of the present invention.
Hereinafter, the present invention is described in more detail with reference to the drawings.
Figure 1 is a diagram illustrating a prosody-based voice synthesis method. Such a method either takes prosody information to be imitated as input, in the expectation of producing a synthesized sound with similar prosody, or takes detailed prosody parameters as input to produce a synthesized sound with adjusted prosody.
With this approach, additional work is required, such as finding a sample that matches the prosody to be generated at that moment, or analyzing detailed prosody information and supplying it as input. In other words, further research is needed before it can serve as the technique behind an interactive speech synthesis model.
Accordingly, an embodiment of the present invention presents an interactive emotional voice synthesis method based on the counterpart's voice and conversational speech information. In generating interactive synthesized speech, the technique reflects reactive emotion by taking the counterpart's conversational intention and voice characteristics into account.
Compared even with speech synthesis that relies on phoneme duration, pitch, and loudness, or that accepts only a predefined set of emotions (happy, angry, sad, and so on) as input, the method according to an embodiment of the present invention can produce diverse kinds of synthesized speech depending on the counterpart's utterance or voice.
In particular, to relieve the one-to-many problem that prosody research in speech synthesis has not resolved, the method admits a wide range of input conditions, so that diverse synthesized outputs can be generated even when the same utterance is spoken in different tones.
Figure 2 is a diagram provided to explain an emotional voice synthesis system according to an embodiment of the present invention. As shown, the emotional voice synthesis system according to an embodiment of the present invention comprises a voice encoder 110, a text encoder 120, a reference encoder 130, a TTS encoder 140, and a TTS decoder 150.
The voice encoder 110 extracts voice features from the user's voice information converted into embedding information. A pretrained speech model that converts the user's voice information into embedding information may precede the voice encoder 110.
The text encoder 120 extracts text features from the user's conversational speech information converted into embedding information. A pretrained language model that converts the user's conversational speech information into embedding information may precede the text encoder 120.
The reference encoder 130 generates reference information by combining the voice features extracted by the voice encoder 110 with the text features extracted by the text encoder 120. The reference information is referenced to generate synthesized speech that reflects the user's intention and emotion. The style and prosody of the synthesized speech finally output by the emotional voice synthesis system according to an embodiment of the present invention vary with this reference information.
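The combination step above can be pictured with a minimal sketch. The sketch below is an illustrative assumption only: the module structure, feature dimensions, and the use of simple concatenation followed by a linear projection are not specified by the patent, which only states that voice features and text features are combined into reference information.

```python
import torch
import torch.nn as nn

class ReferenceEncoder(nn.Module):
    """Illustrative reference encoder: fuses counterpart voice features and
    text features into a single reference embedding (dimensions assumed)."""

    def __init__(self, voice_dim: int = 256, text_dim: int = 256, ref_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(voice_dim + text_dim, ref_dim)

    def forward(self, voice_feat: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        # voice_feat: (batch, voice_dim), text_feat: (batch, text_dim)
        fused = torch.cat([voice_feat, text_feat], dim=-1)
        return torch.tanh(self.proj(fused))  # reference information (embedding)
```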
The TTS encoder 140 extracts text features from the conversational speech information to be uttered by the system.
The TTS decoder 150 generates the system's synthesized speech from the text features extracted by the TTS encoder 140 and the reference information generated by the reference encoder 130.
An output means that outputs the system's synthesized speech generated by the TTS decoder 150 may follow the TTS decoder 150.
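One hedged way to picture how the reference information could condition the TTS decoder 150 is to broadcast the reference embedding across the encoded text frames, as in the sketch below; the recurrent decoder, the mel-spectrogram output, and all dimensions are illustrative assumptions rather than the architecture prescribed by the patent.

```python
import torch
import torch.nn as nn

class ConditionedTTSDecoder(nn.Module):
    """Illustrative TTS decoder: generates acoustic frames from the system's
    text features, conditioned on the reference embedding (all sizes assumed)."""

    def __init__(self, text_dim: int = 256, ref_dim: int = 128, mel_dim: int = 80):
        super().__init__()
        self.rnn = nn.GRU(text_dim + ref_dim, 256, batch_first=True)
        self.to_mel = nn.Linear(256, mel_dim)

    def forward(self, text_feats: torch.Tensor, ref_emb: torch.Tensor) -> torch.Tensor:
        # text_feats: (batch, time, text_dim), ref_emb: (batch, ref_dim)
        ref = ref_emb.unsqueeze(1).expand(-1, text_feats.size(1), -1)
        hidden, _ = self.rnn(torch.cat([text_feats, ref], dim=-1))
        return self.to_mel(hidden)  # (batch, time, mel_dim)
```

Because the reference embedding changes with the counterpart's voice and utterance, the same system text can yield differently styled output, which is the behavior described next.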
As a result, the emotional voice synthesis system according to an embodiment of the present invention produces speech with different prosody, depending on the counterpart's voice and conversational speech information, even when generating the same utterance.
Furthermore, the number of conversation partners (users) is not limited to one; conversation is possible even with multiple partners.
The emotional voice synthesis system according to an embodiment of the present invention is not limited to taking raw voice and conversational speech data as input; it is equally applicable when the voice synthesis system is implemented with features that can be obtained from the voice and conversational speech.
Figure 3 is a flowchart provided to explain an emotional voice synthesis method according to another embodiment of the present invention.
For emotional voice synthesis, the pretrained speech model first converts the user's voice information into embedding information (S210), and the voice encoder 110 extracts voice features from the voice information converted into embedding information in step S210 (S220).
Meanwhile, the pretrained language model converts the user's conversational speech information into embedding information (S230), and the text encoder 120 extracts text features from the conversational speech information converted into embedding information in step S230 (S240).
The reference encoder 130 then generates reference information by combining the voice features extracted in step S220 with the text features extracted in step S240 (S250).
Next, the TTS encoder 140 extracts text features from the conversational speech information to be uttered by the system (S260). The TTS decoder 150 then generates the system's synthesized speech from the text features extracted in step S260 and the reference information generated in step S250 (S270). The system's synthesized speech generated in step S270 is then output.
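Read as code, the flow of steps S210 through S270 might be wired together as in the following sketch; the `models` container and its attribute names are hypothetical placeholders for the components described above, not identifiers taken from the patent.

```python
def synthesize_response(user_audio, user_text, system_text, models):
    """Illustrative wiring of steps S210-S270 (all interfaces are assumptions)."""
    voice_emb = models.speech_pretrained(user_audio)            # S210: user audio -> embedding
    voice_feat = models.voice_encoder(voice_emb)                # S220: voice features
    text_emb = models.language_pretrained(user_text)            # S230: user text -> embedding
    text_feat = models.text_encoder(text_emb)                   # S240: text features
    ref_info = models.reference_encoder(voice_feat, text_feat)  # S250: reference information
    sys_text_feat = models.tts_encoder(system_text)             # S260: system text features
    return models.tts_decoder(sys_text_feat, ref_info)          # S270: synthesized output
```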
Figure 4 is a diagram showing the configuration of a conversation system according to yet another embodiment of the present invention. The conversation system according to an embodiment of the present invention comprises a microphone 310, a processor 320, a speaker 330, and a storage unit 340.
The microphone 310 is a voice input means for receiving the user's spoken utterances, and the speaker 330 is a voice output means for outputting the system's synthesized speech.
The processor 320 performs the functions of the system shown in Figure 2 or carries out the method shown in Figure 3. The storage unit 340 provides the storage space the processor 320 needs to operate and function.
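At the device level, one way the conversation system of Figure 4 could be driven is with a simple per-turn loop like the sketch below, reusing the `synthesize_response` sketch above; `capture_audio`, `transcribe`, `decide_reply`, and `play_audio` are hypothetical helpers standing in for the microphone, dialogue logic, and speaker, and a vocoder stage would be needed if the decoder emits acoustic features rather than a waveform.

```python
class ConversationSystem:
    """Illustrative runtime loop for the microphone / processor / speaker setup."""

    def __init__(self, models, capture_audio, transcribe, decide_reply, play_audio):
        self.models = models
        self.capture_audio = capture_audio   # microphone input (hypothetical helper)
        self.transcribe = transcribe         # user speech -> text (hypothetical helper)
        self.decide_reply = decide_reply     # dialogue manager choosing the system's text
        self.play_audio = play_audio         # speaker output (hypothetical helper)

    def turn(self):
        user_audio = self.capture_audio()
        user_text = self.transcribe(user_audio)
        system_text = self.decide_reply(user_text)
        output = synthesize_response(user_audio, user_text, system_text, self.models)
        self.play_audio(output)              # a vocoder step would precede playback if needed
```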
So far, the interactive emotional voice synthesis method based on the counterpart's voice and conversational speech information has been described in detail with reference to preferred embodiments.
In an embodiment of the present invention, by using the counterpart's utterance and voice as input to an existing interactive voice synthesis model, the quality of synthesized speech that previously had to be generated in a complicated or uniform manner can be improved, and the prosody of the generated synthesized speech can be varied by adjusting the tone or delivery of the voice.
This makes it possible to raise the sense of immersion in services that use interactive voice interface technology, such as virtual assistants, by providing improved synthesized speech, and it can also be applied to research on voice synthesis technology that adapts to voice variation for the same utterance (text).
Meanwhile, the technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiment. In addition, the technical ideas according to various embodiments of the present invention may be implemented in the form of computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium can be any data storage device that can be read by a computer and can store data. For example, the computer-readable recording medium may be a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, and so on. Computer-readable code or programs stored on a computer-readable recording medium may also be transmitted over a network connecting computers.
In addition, although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described; various modifications may of course be made by those of ordinary skill in the art without departing from the gist of the present invention as claimed in the claims, and such modifications should not be understood in isolation from the technical idea or outlook of the present invention.
Claims (12)
- A voice synthesis method comprising: extracting voice features from a user's voice information; generating reference information from the extracted voice features; and generating a system's synthesized speech from the system's conversational speech information and the reference information.
- The voice synthesis method of claim 1, further comprising extracting text features from the user's conversational speech information, wherein the reference-information generation step generates the reference information from the extracted voice features and text features.
- The voice synthesis method of claim 2, wherein the reference information is referenced to generate synthesized speech that reflects the user's intention and emotion.
- The voice synthesis method of claim 3, wherein the prosody of the system's synthesized speech varies depending on the reference information.
- The voice synthesis method of claim 2, further comprising converting the user's voice information into embedding information, wherein the voice feature extraction step extracts the voice features from the voice information converted into embedding information.
- The voice synthesis method of claim 2, further comprising converting the user's conversational speech information into embedding information, wherein the text feature extraction step extracts the text features from the conversational speech information converted into embedding information.
- The voice synthesis method of claim 1, wherein the step of generating the system's synthesized speech comprises: extracting text features from the system's conversational speech information; and generating the system's synthesized speech from the extracted text features and the reference information.
- The voice synthesis method of claim 1, further comprising outputting the generated synthesized speech of the system.
- The voice synthesis method of claim 1, wherein the user comprises multiple users.
- A voice synthesis system comprising: a processor that extracts voice features from a user's voice information, generates reference information from the extracted voice features, and generates the system's synthesized speech from the system's conversational speech information and the reference information; and a storage unit that provides the storage space the processor requires.
- A voice synthesis method comprising: receiving a user's voice information; extracting voice features from the input voice information; generating reference information from the extracted voice features; generating a system's synthesized speech from the system's conversational speech information and the reference information; and outputting the generated synthesized speech.
- A voice synthesis system comprising: a microphone that receives a user's voice information; a processor that extracts voice features from the voice information input through the microphone, generates reference information from the extracted voice features, and generates the system's synthesized speech from the system's conversational speech information and the reference information; and a speaker that outputs the synthesized speech generated by the processor.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2022-0182952 | 2022-12-23 | ||
KR1020220182952A KR20240100869A (en) | 2022-12-23 | 2022-12-23 | Method for synthesizing interactive emotional voice based on the other party's voice and conversational utterance information |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024135901A1 true WO2024135901A1 (en) | 2024-06-27 |
Family
ID=91588846
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2022/021192 WO2024135901A1 (en) | 2022-12-23 | 2022-12-23 | Interactive emotional voice synthesis method based on counterpart voice and conversational speech information |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR20240100869A (en) |
WO (1) | WO2024135901A1 (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050261905A1 (en) * | 2004-05-21 | 2005-11-24 | Samsung Electronics Co., Ltd. | Method and apparatus for generating dialog prosody structure, and speech synthesis method and system employing the same |
KR20140062656A (en) * | 2012-11-14 | 2014-05-26 | 한국전자통신연구원 | Spoken dialog management system based on dual dialog management using hierarchical dialog task library |
KR20200015418A (en) * | 2018-08-02 | 2020-02-12 | 네오사피엔스 주식회사 | Method and computer readable storage medium for performing text-to-speech synthesis using machine learning based on sequential prosody feature |
KR20210106657A (en) * | 2020-02-21 | 2021-08-31 | 주식회사 케이티 | Device, method and computer program for providing conversation service based on emotion of user |
KR20220070979A (en) * | 2020-11-23 | 2022-05-31 | 서울대학교산학협력단 | Style speech synthesis apparatus and speech synthesis method using style encoding network |
KR20220116660A (en) * | 2021-02-15 | 2022-08-23 | 임상현 | Tumbler device with artificial intelligence speaker function |
Also Published As
Publication number | Publication date |
---|---|
KR20240100869A (en) | 2024-07-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22969335 Country of ref document: EP Kind code of ref document: A1 |