BR112019006979A2 - Sequence to sequence transformations for speech synthesis via recurrent neural networks - Google Patents
- Publication number
- BR112019006979A2
- Authority
- BR
- Brazil
- Prior art keywords
- sequence
- transformations
- neural networks
- speech synthesis
- recurrent neural
Concepts (machine-extracted)
ID | Name | Domain | Score | Sections | Count |
---|---|---|---|---|---|
238000000844 | transformation | Methods | 0.000 | title, abstract | 2 |
230000009466 | transformation | Effects | 0.000 | title, abstract | 2 |
238000013528 | artificial neural network | Methods | 0.000 | title | 1 |
230000015572 | biosynthetic process | Effects | 0.000 | title | 1 |
230000000306 | recurrent effect | Effects | 0.000 | title | 1 |
238000003786 | synthesis reaction | Methods | 0.000 | title | 1 |
230000001537 | neural effect | Effects | 0.000 | abstract | 2 |
239000013598 | vector | Substances | 0.000 | abstract | 2 |
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G10L13/047—Architecture of speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention relates to a system that eliminates alignment processing and performs text-to-speech (TTS) functionality using a new neural architecture. The neural architecture includes an encoder and a decoder. The encoder receives an input and encodes it into vectors. The encoder applies a sequence of transformations to the input and generates a vector that represents the entire sequence. The decoder takes the encoding and outputs an audio file, which may include compressed audio frames.
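As a rough illustration of the encoder-decoder flow described in the abstract, the following Python sketch builds a tiny, untrained recurrent encoder and decoder using only the standard library. It is not the patented architecture: the character vocabulary, the Elman-style `tanh` recurrence, the dimensions (`embed_dim`, `hidden_dim`, `frame_dim`), the fixed `num_frames`, and the class name `Seq2SeqTTSSketch` are all hypothetical choices, and the small output vectors merely stand in for (compressed) audio frames.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Untrained placeholder weights; a real model would learn these from data.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Plain matrix-vector product.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in: Matrix, w_rec: Matrix, x: Vector, h: Vector) -> Vector:
    # Elman-style recurrence (biases omitted): h_t = tanh(W_in x_t + W_rec h_{t-1})
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]


class Seq2SeqTTSSketch:
    """Hypothetical, untrained RNN encoder-decoder; not the patented design."""

    def __init__(self, vocab: str, embed_dim: int = 8, hidden_dim: int = 16,
                 frame_dim: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.index: Dict[str, int] = {ch: i for i, ch in enumerate(vocab)}
        self.embeddings = random_matrix(len(vocab), embed_dim, rng)
        self.enc_in = random_matrix(hidden_dim, embed_dim, rng)
        self.enc_rec = random_matrix(hidden_dim, hidden_dim, rng)
        self.dec_in = random_matrix(hidden_dim, frame_dim, rng)
        self.dec_rec = random_matrix(hidden_dim, hidden_dim, rng)
        self.out = random_matrix(frame_dim, hidden_dim, rng)
        self.hidden_dim = hidden_dim
        self.frame_dim = frame_dim

    def encode(self, text: str) -> Vector:
        # Encoder: turn each character into a vector, then apply the same
        # recurrent transformation step by step; the final hidden state is a
        # single fixed-size vector summarizing the whole input sequence.
        h = [0.0] * self.hidden_dim
        for ch in text:
            x = self.embeddings[self.index[ch]]
            h = rnn_step(self.enc_in, self.enc_rec, x, h)
        return h

    def decode(self, encoding: Vector, num_frames: int) -> List[Vector]:
        # Decoder: start from the encoding and emit one frame per step,
        # feeding each frame back in as the next input (autoregressive).
        h = list(encoding)
        frame = [0.0] * self.frame_dim
        frames: List[Vector] = []
        for _ in range(num_frames):
            h = rnn_step(self.dec_in, self.dec_rec, frame, h)
            frame = matvec(self.out, h)
            frames.append(frame)
        return frames


model = Seq2SeqTTSSketch(vocab="abcdefghijklmnopqrstuvwxyz ")
encoding = model.encode("hello world")
frames = model.decode(encoding, num_frames=5)
print(len(encoding), len(frames), len(frames[0]))  # 16 5 4
```

Because the decoder receives only the fixed-size encoding, no per-character alignment or duration information is passed to it, which loosely mirrors the abstract's statement that alignment processing is eliminated. A practical system would learn the weights from data and would likely need a way to decide when to stop generating frames instead of taking a fixed frame count; both are left out here for brevity.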
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662412165P | 2016-10-24 | 2016-10-24 | |
PCT/US2017/058138 WO2018081163A1 (en) | 2016-10-24 | 2017-10-24 | Sequence to sequence transformations for speech synthesis via recurrent neural networks |
US15/792,236 US20180114522A1 (en) | 2016-10-24 | 2017-10-24 | Sequence to sequence transformations for speech synthesis via recurrent neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
BR112019006979A2 (pt) | 2019-06-25 |
Family ID: 61969829
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR112019006979A BR112019006979A2 (pt) | 2016-10-24 | 2017-10-24 | Sequence to sequence transformations for speech synthesis via recurrent neural networks |
Country Status (6)
Country | Link |
---|---|
US (1) | US20180114522A1 (pt) |
AU (1) | AU2017347995A1 (pt) |
BR (1) | BR112019006979A2 (pt) |
CA (1) | CA3037090A1 (pt) |
SG (1) | SG11201903130WA (pt) |
WO (1) | WO2018081163A1 (pt) |
Families Citing this family (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180061408A1 (en) * | 2016-08-24 | 2018-03-01 | Semantic Machines, Inc. | Using paraphrase in accepting utterances in an automated assistant |
US10824798B2 (en) | 2016-11-04 | 2020-11-03 | Semantic Machines, Inc. | Data collection for a new conversational dialogue system |
US10713288B2 (en) | 2017-02-08 | 2020-07-14 | Semantic Machines, Inc. | Natural language content generator |
US10762892B2 (en) | 2017-02-23 | 2020-09-01 | Semantic Machines, Inc. | Rapid deployment of dialogue system |
US11069340B2 (en) | 2017-02-23 | 2021-07-20 | Microsoft Technology Licensing, Llc | Flexible and expandable dialogue system |
US10586530B2 (en) | 2017-02-23 | 2020-03-10 | Semantic Machines, Inc. | Expandable dialogue system |
US10733380B2 (en) * | 2017-05-15 | 2020-08-04 | Thomson Reuters Enterprise Center Gmbh | Neural paraphrase generator |
CN107293296B (zh) * | 2017-06-28 | 2020-11-20 | 百度在线网络技术(北京)有限公司 | Speech recognition result correction method, apparatus, device, and storage medium |
US11132499B2 (en) | 2017-08-28 | 2021-09-28 | Microsoft Technology Licensing, Llc | Robust expandable dialogue system |
US10510358B1 (en) * | 2017-09-29 | 2019-12-17 | Amazon Technologies, Inc. | Resolution enhancement of speech signals for speech synthesis |
WO2019126881A1 (en) * | 2017-12-29 | 2019-07-04 | Fluent.Ai Inc. | System and method for tone recognition in spoken languages |
US11042712B2 (en) * | 2018-06-05 | 2021-06-22 | Koninklijke Philips N.V. | Simplifying and/or paraphrasing complex textual content by jointly learning semantic alignment and simplicity |
US11381715B2 (en) | 2018-07-16 | 2022-07-05 | Massachusetts Institute Of Technology | Computer method and apparatus making screens safe for those with photosensitivity |
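CN110364144B (zh) * | 2018-10-25 | 2022-09-02 | 腾讯科技(深圳)有限公司 | Speech recognition model training method and apparatus |
TWI698857B (zh) | 2018-11-21 | 2020-07-11 | 財團法人工業技術研究院 | Speech recognition system and method, and computer program product |
CN109616093B (zh) * | 2018-12-05 | 2024-02-27 | 平安科技(深圳)有限公司 | End-to-end speech synthesis method, apparatus, device, and storage medium |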
US11508359B2 (en) * | 2019-09-11 | 2022-11-22 | Oracle International Corporation | Using backpropagation to train a dialog system |
CN112489618A (zh) * | 2019-09-12 | 2021-03-12 | 微软技术许可有限责任公司 | Neural text-to-speech synthesis using multi-level context features |
CN111754973B (zh) * | 2019-09-23 | 2023-09-01 | 北京京东尚科信息技术有限公司 | Speech synthesis method and apparatus, and storage medium |
US11373633B2 (en) * | 2019-09-27 | 2022-06-28 | Amazon Technologies, Inc. | Text-to-speech processing using input voice characteristic data |
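KR20210042707A (ko) * | 2019-10-10 | 2021-04-20 | 삼성전자주식회사 | Speech processing method and apparatus |
KR20210158382A (ko) * | 2019-11-28 | 2021-12-30 | 주식회사 엘솔루 | Electronic device for speech recognition and data processing method thereof |
CN111247581B (zh) * | 2019-12-23 | 2023-10-10 | 深圳市优必选科技股份有限公司 | Multilingual text-to-speech synthesis method, apparatus, device, and storage medium |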
NL2025235B1 (en) * | 2020-03-30 | 2021-10-22 | Microsoft Technology Licensing Llc | Updating constraints for computerized assistant actions |
US20220101829A1 (en) * | 2020-09-29 | 2022-03-31 | Harman International Industries, Incorporated | Neural network speech recognition system |
US11461681B2 (en) | 2020-10-14 | 2022-10-04 | Openstream Inc. | System and method for multi-modality soft-agent for query population and information mining |
CN112687259B (zh) * | 2021-03-11 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Speech synthesis method and apparatus, and readable storage medium |
US11600282B2 (en) * | 2021-07-02 | 2023-03-07 | Google Llc | Compressing audio waveforms using neural networks and vector quantizers |
CN115083386B (zh) * | 2022-06-10 | 2024-09-06 | 思必驰科技股份有限公司 | Audio synthesis method, electronic device, and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7403890B2 (en) * | 2002-05-13 | 2008-07-22 | Roushar Joseph C | Multi-dimensional method and apparatus for automated language interpretation |
US9031834B2 (en) * | 2009-09-04 | 2015-05-12 | Nuance Communications, Inc. | Speech enhancement techniques on the power spectrum |
US9672811B2 (en) * | 2012-11-29 | 2017-06-06 | Sony Interactive Entertainment Inc. | Combining auditory attention cues with phoneme posterior scores for phone/vowel/syllable boundary detection |
US10127901B2 (en) * | 2014-06-13 | 2018-11-13 | Microsoft Technology Licensing, Llc | Hyper-structure recurrent neural networks for text-to-speech |
US9799327B1 (en) * | 2016-02-26 | 2017-10-24 | Google Inc. | Speech recognition with attention-based recurrent neural networks |
US10896669B2 (en) * | 2017-05-19 | 2021-01-19 | Baidu Usa Llc | Systems and methods for multi-speaker neural text-to-speech |
WO2018218081A1 (en) * | 2017-05-24 | 2018-11-29 | Modulate, LLC | System and method for voice-to-voice conversion |
2017
- 2017-10-24: BR application BR112019006979A, published as BR112019006979A2 (pt); status: not active (application discontinuation)
- 2017-10-24: SG application SG11201903130WA (en); status: unknown
- 2017-10-24: US application US15/792,236, published as US20180114522A1 (en); status: not active (abandoned)
- 2017-10-24: AU application AU2017347995A, published as AU2017347995A1 (en); status: not active (abandoned)
- 2017-10-24: WO application PCT/US2017/058138, published as WO2018081163A1 (en); status: unknown
- 2017-10-24: CA application CA3037090A, published as CA3037090A1 (en); status: not active (abandoned)
Also Published As
Publication number | Publication date |
---|---|
CA3037090A1 (en) | 2018-05-03 |
US20180114522A1 (en) | 2018-04-26 |
SG11201903130WA (en) | 2019-05-30 |
AU2017347995A1 (en) | 2019-03-28 |
WO2018081163A8 (en) | 2019-05-09 |
WO2018081163A1 (en) | 2018-05-03 |
AU2017347995A8 (en) | 2019-08-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
BR112019006979A2 (pt) | Sequence to sequence transformations for speech synthesis via recurrent neural networks | |
AR125775A2 (es) | Audio data processor for audio decoders and/or renderers and method for processing audio data | |
GB2571651A (en) | Systems for predictive data analytics, and related methods and apparatus | |
PH12019501895A1 (en) | Constraining motion vector information derived by decoder-side motion vector derivation | |
BR112019013832A8 (pt) | Decoder-side motion vector restoration for video coding | |
WO2018097693A3 (ko) | Image encoding/decoding method and apparatus, and recording medium storing a bitstream | |
EP4307676A3 (en) | Method for coding image on basis of selective transform and device therefor | |
PH12018500600A1 (en) | Method and apparatus for controlling audio frame loss concealment | |
NZ734339A (en) | Voice recognition system and method of robot system | |
EA201791457A1 (ru) | Enhanced multiple transforms for prediction residual | |
BR112018013677A2 (pt) | Multivalent and multispecific 41BB-binding fusion proteins | |
BR112012017145A2 (pt) | Video coding using compressive sensing | |
BR112017003887A2 (pt) | Encoder, decoder and method for encoding and decoding audio content using parameters for enhancing a concealment | |
WO2017164645A3 (ko) | Video signal encoding/decoding method and apparatus | |
EP3866470A4 (en) | VIDEO ENCODING AND DECODING METHODS USING DIFFERENTIAL MOTION VECTOR VALUES, AND APPARATUS FOR ENCODING AND DECODING MOTION INFORMATION | |
NZ734552A (en) | Motion vector derivation in video coding | |
BR112015016253A2 (pt) | Signaling of clock tick derivation information for video timing in video coding | |
BR112017025820A2 (pt) | Methods for a video encoder, a video transcoder and a video processing node, video encoder, video transcoder, video processing node, and computer program | |
EP3905675A4 (en) | METHOD AND APPARATUS FOR ENCODING MOTION VECTOR DIFFERENCES AND METHOD AND APPARATUS FOR DECODING MOTION VECTOR DIFFERENCES | |
EP3905694A4 (en) | PREDICTION IMAGE GENERATING DEVICE, MOVING IMAGE DECODING DEVICE, MOVING IMAGE ENCODING DEVICE AND PREDICTION IMAGE GENERATING METHOD | |
BR112021021356A2 (pt) | Global motion for merge mode candidates in inter prediction | |
MX2019003587A (es) | Methods, devices and stream for encoding global rotation motion compensated images | |
GB2543971A (en) | Motion-compensated partitioning | |
EP3902261A4 (en) | PREDICTION IMAGE GENERATING DEVICE, MOVING IMAGE DECODING DEVICE, MOVING IMAGE CODING METHOD AND PREDICTION IMAGE GENERATING METHOD | |
MY180722A (en) | Concept for encoding an audio signal and decoding an audio signal using speech related spectral shaping information |
Legal Events
Code | Title | Description |
---|---|---|
B11A | Dismissal according to art. 33 of the IPL (Industrial Property Law): examination not requested within 36 months of filing | |
B11Y | Definitive dismissal: extension of the time limit for requesting examination expired [patent gazette chapter 11.1.1] | |
B350 | Update of information on the portal [patent gazette chapter 15.35] | |