BR112019006979A2 - Sequence to sequence transformations for speech synthesis via recurrent neural networks - Google Patents

Sequence to sequence transformations for speech synthesis via recurrent neural networks

Info

Publication number
BR112019006979A2
Authority
BR
Brazil
Prior art keywords
sequence
transformations
neural networks
speech synthesis
recurrent neural
Prior art date
Application number
BR112019006979A
Other languages
English (en)
Inventor
Lee Maas Andrew
Klein Daniel
Lawrence Roth Daniel
Leo Wright Hall David
Steven Gillick Laurence
Andrew Wegmann Steven
Original Assignee
Semantic Machines Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Semantic Machines Inc
Publication of BR112019006979A2

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047 Architecture of speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a system that eliminates alignment processing and performs TTS functionality using a new neural architecture. The neural architecture includes an encoder and a decoder. The encoder receives an input and encodes it into vectors. The encoder applies a sequence of transformations to the input and generates a vector that represents the complete sequence. The decoder takes the encoding and produces an audio file, which may include compressed audio frames.
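The encoder and decoder roles described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the class name `Seq2SeqTTS`, the layer sizes, the untrained random weights, the character-level input, and the fixed number of output frames are all assumptions made for illustration, and the output "frames" are plain float vectors rather than real compressed audio. The sketch only shows the data flow: a recurrent encoder folds the whole input into one vector, and a recurrent decoder unrolls that vector into a sequence of frames with no explicit alignment step.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random weight matrix (hypothetical, untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_step(w_in, w_rec, x, h):
    """One recurrent transformation: h' = tanh(W_in x + W_rec h)."""
    a = matvec(w_in, x)
    b = matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]

class Seq2SeqTTS:
    """Toy encoder-decoder; every name and size here is an assumption."""

    def __init__(self, vocab, hidden=8, frame_dim=4, seed=0):
        rng = random.Random(seed)
        self.vocab = {ch: i for i, ch in enumerate(vocab)}
        self.hidden = hidden
        self.frame_dim = frame_dim
        self.enc_in = make_matrix(hidden, len(vocab), rng)
        self.enc_rec = make_matrix(hidden, hidden, rng)
        self.dec_in = make_matrix(hidden, frame_dim, rng)
        self.dec_rec = make_matrix(hidden, hidden, rng)
        self.dec_out = make_matrix(frame_dim, hidden, rng)

    def one_hot(self, ch):
        v = [0.0] * len(self.vocab)
        v[self.vocab[ch]] = 1.0
        return v

    def encode(self, text):
        """Apply one transformation per input symbol; return the final state."""
        h = [0.0] * self.hidden
        for ch in text:
            h = rnn_step(self.enc_in, self.enc_rec, self.one_hot(ch), h)
        return h

    def decode(self, encoding, n_frames):
        """Unroll the encoding into n_frames vectors, feeding each frame back in."""
        h = list(encoding)
        frame = [0.0] * self.frame_dim
        frames = []
        for _ in range(n_frames):
            h = rnn_step(self.dec_in, self.dec_rec, frame, h)
            frame = matvec(self.dec_out, h)
            frames.append(frame)
        return frames

model = Seq2SeqTTS(vocab="abcdefghijklmnopqrstuvwxyz ")
encoding = model.encode("hello world")
frames = model.decode(encoding, n_frames=5)
print(len(encoding), len(frames), len(frames[0]))  # prints: 8 5 4
```

A real system would train the weights on paired text and audio and would let the decoder decide when to stop; the fixed `n_frames` argument is a simplification of this sketch.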
BR112019006979A 2016-10-24 2017-10-24 Sequence to sequence transformations for speech synthesis via recurrent neural networks BR112019006979A2 (pt)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201662412165P 2016-10-24 2016-10-24
PCT/US2017/058138 WO2018081163A1 (en) 2016-10-24 2017-10-24 Sequence to sequence transformations for speech synthesis via recurrent neural networks
US15/792,236 US20180114522A1 (en) 2016-10-24 2017-10-24 Sequence to sequence transformations for speech synthesis via recurrent neural networks

Publications (1)

Publication Number Publication Date
BR112019006979A2 (pt) 2019-06-25

Family

ID=61969829

Family Applications (1)

Application Number Title Priority Date Filing Date
BR112019006979A BR112019006979A2 (pt) 2016-10-24 2017-10-24 Sequence to sequence transformations for speech synthesis via recurrent neural networks

Country Status (6)

Country Link
US (1) US20180114522A1 (pt)
AU (1) AU2017347995A1 (pt)
BR (1) BR112019006979A2 (pt)
CA (1) CA3037090A1 (pt)
SG (1) SG11201903130WA (pt)
WO (1) WO2018081163A1 (pt)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180061408A1 (en) * 2016-08-24 2018-03-01 Semantic Machines, Inc. Using paraphrase in accepting utterances in an automated assistant
US10824798B2 (en) 2016-11-04 2020-11-03 Semantic Machines, Inc. Data collection for a new conversational dialogue system
US10713288B2 (en) 2017-02-08 2020-07-14 Semantic Machines, Inc. Natural language content generator
US10762892B2 (en) 2017-02-23 2020-09-01 Semantic Machines, Inc. Rapid deployment of dialogue system
US11069340B2 (en) 2017-02-23 2021-07-20 Microsoft Technology Licensing, Llc Flexible and expandable dialogue system
US10586530B2 (en) 2017-02-23 2020-03-10 Semantic Machines, Inc. Expandable dialogue system
US10733380B2 (en) * 2017-05-15 2020-08-04 Thomson Reuters Enterprise Center Gmbh Neural paraphrase generator
CN107293296B (zh) * 2017-06-28 2020-11-20 百度在线网络技术(北京)有限公司 Speech recognition result correction method, apparatus, device, and storage medium
US11132499B2 (en) 2017-08-28 2021-09-28 Microsoft Technology Licensing, Llc Robust expandable dialogue system
US10510358B1 (en) * 2017-09-29 2019-12-17 Amazon Technologies, Inc. Resolution enhancement of speech signals for speech synthesis
WO2019126881A1 (en) * 2017-12-29 2019-07-04 Fluent.Ai Inc. System and method for tone recognition in spoken languages
US11042712B2 (en) * 2018-06-05 2021-06-22 Koninklijke Philips N.V. Simplifying and/or paraphrasing complex textual content by jointly learning semantic alignment and simplicity
US11381715B2 (en) 2018-07-16 2022-07-05 Massachusetts Institute Of Technology Computer method and apparatus making screens safe for those with photosensitivity
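CN110364144B (zh) * 2018-10-25 2022-09-02 腾讯科技(深圳)有限公司 Speech recognition model training method and apparatus
TWI698857B (zh) 2018-11-21 2020-07-11 財團法人工業技術研究院 Speech recognition system and method thereof, and computer program product
CN109616093B (zh) * 2018-12-05 2024-02-27 平安科技(深圳)有限公司 End-to-end speech synthesis method, apparatus, device, and storage medium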
US11508359B2 (en) * 2019-09-11 2022-11-22 Oracle International Corporation Using backpropagation to train a dialog system
CN112489618A (zh) * 2019-09-12 2021-03-12 微软技术许可有限责任公司 Neural text-to-speech synthesis using multi-level context features
CN111754973B (zh) * 2019-09-23 2023-09-01 北京京东尚科信息技术有限公司 Speech synthesis method and apparatus, and storage medium
US11373633B2 (en) * 2019-09-27 2022-06-28 Amazon Technologies, Inc. Text-to-speech processing using input voice characteristic data
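KR20210042707A (ko) * 2019-10-10 2021-04-20 삼성전자주식회사 Speech processing method and apparatus
KR20210158382A (ko) * 2019-11-28 2021-12-30 주식회사 엘솔루 Electronic device for speech recognition and data processing method thereof
CN111247581B (zh) * 2019-12-23 2023-10-10 深圳市优必选科技股份有限公司 Multilingual text-to-speech synthesis method, apparatus, device, and storage medium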
NL2025235B1 (en) * 2020-03-30 2021-10-22 Microsoft Technology Licensing Llc Updating constraints for computerized assistant actions
US20220101829A1 (en) * 2020-09-29 2022-03-31 Harman International Industries, Incorporated Neural network speech recognition system
US11461681B2 (en) 2020-10-14 2022-10-04 Openstream Inc. System and method for multi-modality soft-agent for query population and information mining
CN112687259B (zh) * 2021-03-11 2021-06-18 腾讯科技(深圳)有限公司 Speech synthesis method, apparatus, and readable storage medium
US11600282B2 (en) * 2021-07-02 2023-03-07 Google Llc Compressing audio waveforms using neural networks and vector quantizers
CN115083386B (zh) * 2022-06-10 2024-09-06 思必驰科技股份有限公司 Audio synthesis method, electronic device, and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7403890B2 (en) * 2002-05-13 2008-07-22 Roushar Joseph C Multi-dimensional method and apparatus for automated language interpretation
US9031834B2 (en) * 2009-09-04 2015-05-12 Nuance Communications, Inc. Speech enhancement techniques on the power spectrum
US9672811B2 (en) * 2012-11-29 2017-06-06 Sony Interactive Entertainment Inc. Combining auditory attention cues with phoneme posterior scores for phone/vowel/syllable boundary detection
US10127901B2 (en) * 2014-06-13 2018-11-13 Microsoft Technology Licensing, Llc Hyper-structure recurrent neural networks for text-to-speech
US9799327B1 (en) * 2016-02-26 2017-10-24 Google Inc. Speech recognition with attention-based recurrent neural networks
US10896669B2 (en) * 2017-05-19 2021-01-19 Baidu Usa Llc Systems and methods for multi-speaker neural text-to-speech
WO2018218081A1 (en) * 2017-05-24 2018-11-29 Modulate, LLC System and method for voice-to-voice conversion

Also Published As

Publication number Publication date
CA3037090A1 (en) 2018-05-03
US20180114522A1 (en) 2018-04-26
SG11201903130WA (en) 2019-05-30
AU2017347995A1 (en) 2019-03-28
WO2018081163A8 (en) 2019-05-09
WO2018081163A1 (en) 2018-05-03
AU2017347995A8 (en) 2019-08-29

Similar Documents

Publication Publication Date Title
BR112019006979A2 (pt) Sequence to sequence transformations for speech synthesis via recurrent neural networks
AR125775A2 (es) Audio data processor for audio decoders and/or renderers and method for processing audio data
GB2571651A (en) Systems for predictive data analytics, and related methods and apparatus
PH12019501895A1 (en) Constraining motion vector information derived by decoder-side motion vector derivation
BR112019013832A8 (pt) Decoder-side motion vector restoration for video coding
WO2018097693A3 (ko) Image encoding/decoding method and device, and recording medium storing a bitstream
EP4307676A3 (en) Method for coding image on basis of selective transform and device therefor
PH12018500600A1 (en) Method and apparatus for controlling audio frame loss concealment
NZ734339A (en) Voice recognition system and method of robot system
EA201791457A1 (ru) Enhanced multiple transforms for prediction residual
BR112018013677A2 (pt) Multivalent and multispecific 41BB-binding fusion proteins
BR112012017145A2 (pt) Video coding using compressive sensing
BR112017003887A2 (pt) Encoder, decoder and method for encoding and decoding audio content using parameters for enhancing a concealment
WO2017164645A3 (ko) Method and apparatus for encoding/decoding a video signal
EP3866470A4 (en) VIDEO ENCODING AND DECODING METHODS USING DIFFERENTIAL MOTION VECTOR VALUES, AND APPARATUS FOR ENCODING AND DECODING MOTION INFORMATION
NZ734552A (en) Motion vector derivation in video coding
BR112015016253A2 (pt) Signaling of clock tick derivation information for video timing in video coding
BR112017025820A2 (pt) Methods for a video encoder, a video transcoder and a video processing node, video encoder, video transcoder, video processing node, and computer program
EP3905675A4 (en) METHOD AND APPARATUS FOR ENCODING MOTION VECTOR DIFFERENCES AND METHOD AND APPARATUS FOR DECODING MOTION VECTOR DIFFERENCES
EP3905694A4 (en) PREDICTION IMAGE GENERATING DEVICE, MOVING IMAGE DECODING DEVICE, MOVING IMAGE ENCODING DEVICE AND PREDICTION IMAGE GENERATING METHOD
BR112021021356A2 (pt) Global motion for merge mode candidates in inter prediction
MX2019003587A (es) Methods, devices and stream for encoding global rotation motion compensated images
GB2543971A (en) Motion-compensated partitioning
EP3902261A4 (en) PREDICTION IMAGE GENERATING DEVICE, MOVING IMAGE DECODING DEVICE, MOVING IMAGE CODING METHOD AND PREDICTION IMAGE GENERATING METHOD
MY180722A (en) Concept for encoding an audio signal and decoding an audio signal using speech related spectral shaping information

Legal Events

Date Code Title Description
B11A Dismissal according to Art. 33 of the IPL: examination not requested within 36 months of filing
B11Y Definitive dismissal - extension of time limit for request of examination expired [chapter 11.1.1 patent gazette]
B350 Update of information on the portal [chapter 15.35 patent gazette]