Janicki et al., 2005 - Google Patents

Reconstruction of Polish diacritics in a text-to-speech system.

Document ID
5282686764382462304
Author
Janicki A
Herman P
Publication year
2005
Publication venue
INTERSPEECH

Snippet

This paper describes an approach to reconstructing Polish diacritic signs, needed, e.g., in a speech synthesis system. Some telecommunication services (for example, SMS transmission in GSM) remove diacritics from the text. Without them the text is usually still …
Full text: www.isca-archive.org (PDF)
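
The problem stated in the snippet can be illustrated with a minimal sketch. It is not the method of the paper, whose details are not given here: it assumes a simple baseline in which each diacritic-free token is replaced by the most frequent known word form that reduces to it. The names `strip_diacritics`, `build_index`, `restore`, and the toy lexicon with its frequencies are all hypothetical.

```python
# Polish letters with diacritics and their ASCII base letters.
STRIP_MAP = str.maketrans("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ", "acelnoszzACELNOSZZ")


def strip_diacritics(text: str) -> str:
    """Simulate a channel (e.g. SMS) that removes Polish diacritics."""
    return text.translate(STRIP_MAP)


def build_index(lexicon: dict[str, int]) -> dict[str, list[tuple[int, str]]]:
    """Group known word forms under their diacritic-free spelling."""
    index: dict[str, list[tuple[int, str]]] = {}
    for word, freq in lexicon.items():
        index.setdefault(strip_diacritics(word), []).append((freq, word))
    return index


def restore(text: str, index: dict[str, list[tuple[int, str]]]) -> str:
    """Replace each token with its most frequent candidate, if any.

    Assumption: lowercase, whitespace-separated tokens; unknown tokens
    are passed through unchanged.
    """
    restored = []
    for token in text.split():
        candidates = index.get(token)
        restored.append(max(candidates)[1] if candidates else token)
    return " ".join(restored)


# Hypothetical toy lexicon: word form -> made-up corpus frequency.
LEXICON = {"żółw": 5, "je": 20, "sałatę": 3, "łaska": 7, "laska": 4}
INDEX = build_index(LEXICON)

print(strip_diacritics("żółw je sałatę"))
print(restore("zolw je salate", INDEX))
```

The pair "laska" / "łaska" shows why a lexicon lookup alone is not enough: both forms reduce to the same diacritic-free string, so this sketch simply picks the more frequent one, whereas a real system would need context to decide.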

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2765 Recognition
    • G06F17/277 Lexical analysis, e.g. tokenisation, collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2705 Parsing
    • G06F17/2715 Statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2809 Data driven translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/289 Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2872 Rule based translation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197 Probabilistic grammars, e.g. word n-grams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2863 Processing of non-latin text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G06F17/2217 Character encodings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems

Similar Documents

Publication Publication Date Title
EP3125235B1 (en) Learning templates generated from dialog transcripts
US7587308B2 (en) Word recognition using ontologies
Daland et al. Learning diphone‐based segmentation
JP2000353161A (en) Method and device for controlling style in generation of natural language
JP2001249922A (en) Word division system and device
CN111930914A (en) Question generation method and device, electronic equipment and computer-readable storage medium
Li et al. Normalization of Text Messages Using Character- and Phone-based Machine Translation Approaches.
Meylan et al. Word forms-not just their lengths-are optimized for efficient communication
Ajees et al. A named entity recognition system for Malayalam using neural networks
Rajendran et al. A robust syllable centric pronunciation model for Tamil text to speech synthesizer
Shahid et al. Next word prediction for Urdu language using deep learning models
CN113051388A (en) Intelligent question and answer method and device, electronic equipment and storage medium
Janicki et al. Reconstruction of Polish diacritics in a text-to-speech system.
Dutta Word-level language identification using subword embeddings for code-mixed Bangla-English social media data
CN115455912A (en) Text analysis method and device, electronic equipment and computer readable storage medium
KR100784730B1 (en) Method and apparatus for statistical HMM part-of-speech tagging without tagged domain corpus
CN116089601A (en) Dialogue abstract generation method, device, equipment and medium
CN111090720B (en) Hot word adding method and device
Nanayakkara et al. Context aware back-transliteration from English to Sinhala
Hlaing et al. Myanmar Number Normalization for Text-to-Speech
CN112992128A (en) Training method, device and system for intelligent voice robot
Neme A fully inflected Arabic verb resource constructed from a lexicon of lemmas by using finite-state transducers
Wong et al. Linguistic and behavioural studies of Chinese chat language
CN111159360A (en) Method and device for obtaining query topic classification model and query topic classification
CN112580365A (en) Chapter analysis method, electronic device and storage device