US20100324895A1 - Synchronization for document narration - Google Patents

Publication number
US20100324895A1
Authority
US
United States
Prior art keywords
text
portions
expected
recognized
elapsed time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/687,240
Inventor
Raymond C. Kurzweil
Paul Albrecht
Peter Chapman
Lucy Gibson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
EM ACQUISITION CORP Inc
Original Assignee
K NFB READING Tech Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by K NFB READING Tech Inc
Priority to US12/687,240
Priority to PCT/US2010/021104 (WO2010083354A1)
Assigned to K-NFB READING TECHNOLOGY, INC. reassignment K-NFB READING TECHNOLOGY, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ALBRECHT, PAUL, CHAPMAN, PETER, KURZWEIL, RAYMOND C., GIBSON, LUCY
Publication of US20100324895A1
Assigned to K-NFB HOLDING TECHNOLOGY, INC. reassignment K-NFB HOLDING TECHNOLOGY, INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: K-NFB READING TECHNOLOGY, INC.
Assigned to K-NFB READING TECHNOLOGY, INC. reassignment K-NFB READING TECHNOLOGY, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: K-NFB HOLDING TECHNOLOGY, INC.
Assigned to FISH & RICHARDSON P.C. reassignment FISH & RICHARDSON P.C. LIEN (SEE DOCUMENT FOR DETAILS). Assignors: K-NFB HOLDING TECHNOLOGY, IMC.
Assigned to DIMENSIONAL STACK ASSETS LLC reassignment DIMENSIONAL STACK ASSETS LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: K-NFB READING TECHNOLOGY, INC.
Assigned to EM ACQUISITION CORP., INC. reassignment EM ACQUISITION CORP., INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DIMENSIONAL STACK ASSETS, LLC
Assigned to DIMENSIONAL STACK ASSETS LLC reassignment DIMENSIONAL STACK ASSETS LLC RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: FISH & RICHARDSON P.C.

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems

Definitions

  • This invention relates generally to educational and entertainment tools and more particularly to techniques and systems which are used to provide a narration of a text.
  • a computer system used for artificial production of human speech can be called a speech synthesizer.
  • One type of speech synthesizer is a text-to-speech (TTS) system, which converts normal language text into speech.
  • a computer implemented method includes applying speech recognition by one or more computer systems to an audio recording to generate a text version of recognized words in the audio recording.
  • the method also includes determining by the one or more computer systems an elapsed time period from the start of the audio recording to each word in the sequence of words in the audio recording.
  • the method also includes comparing by the one or more computer systems the words in the text version of the recognized words in the audio recording to the words in a sequence of expected words.
  • the method also includes generating by the one or more computer systems a word timing file comprising the elapsed time information for each word in the sequence of expected words by outputting the elapsed time information for a particular word into the word timing file if the recognized word in the text version of the recognized words matches the expected word and correcting a particular word and associating the elapsed time information with the particular word if the particular word in the text version of the recognized words does not match the expected word.
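
The comparison and correction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method claimed in the application: the function name `build_word_timings` and the `(word, elapsed_seconds)` input shape are hypothetical, and the alignment uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever comparison the described system performs.

```python
from difflib import SequenceMatcher


def build_word_timings(recognized, expected):
    """Pair every expected word with an elapsed time (hypothetical helper).

    recognized: list of (word, elapsed_seconds) pairs from a speech recognizer.
    expected:   list of words taken from the document text.
    Returns a list of (expected_word, elapsed_seconds or None).
    """
    rec_words = [word.lower() for word, _ in recognized]
    exp_words = [word.lower() for word in expected]
    timings = [None] * len(expected)

    matcher = SequenceMatcher(a=exp_words, b=rec_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Recognized word matches the expected word: copy its time.
            for k in range(i2 - i1):
                timings[i1 + k] = recognized[j1 + k][1]
        elif tag == "replace":
            # Mismatch: keep the expected spelling ("correct" the word)
            # and reuse the time of the misrecognized word.
            for k in range(min(i2 - i1, j2 - j1)):
                timings[i1 + k] = recognized[j1 + k][1]
        # "delete": expected word was never recognized, time stays None.
        # "insert": extra recognized word with no expected counterpart, ignored.
    return list(zip(expected, timings))


recognized = [("the", 0.0), ("quick", 0.4), ("brown", 0.8), ("fax", 1.2), ("jumps", 1.6)]
expected = ["The", "quick", "brown", "fox", "jumps"]
word_timings = build_word_timings(recognized, expected)
```

Here the misrecognized "fax" is replaced by the expected "fox" while keeping the 1.2 second offset. Writing `word_timings` out one pair per line would give a simple word timing file.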
  • Embodiments may also include devices, software, components, and/or systems to perform any features described herein.
  • FIG. 1 is a block diagram of a system for producing speech-based output from text.
  • FIG. 2 is a screenshot depicting text.
  • FIG. 3 is a screenshot of text that includes highlighting of portions of the text based on a narration voice.
  • FIG. 4 is a flow chart of a voice painting process.
  • FIG. 5 is a screenshot of a character addition process.
  • FIG. 6 is a flow chart of a character addition process.
  • FIG. 7 is a diagram of text with tagged narration data.
  • FIG. 8 is a screenshot of text with tagged narration information.
  • FIG. 9 is a diagram of text with highlighting.
  • FIG. 10 is a flow chart of a synchronization process.
  • FIG. 11 is a screenshot of a book view of text.
  • FIG. 12 is a screenshot of text.
  • FIG. 13 is a screenshot of text.
  • a system 10 for producing speech-based output from text is shown to include a computer 12 .
  • the computer 12 is generally a personal computer or can alternatively be another type of device, e.g., a cellular phone that includes a processor (e.g., CPU). Examples of such cell-phones include an iPhone® (Apple, Inc.). Other devices include an iPod® (Apple, Inc.), a handheld personal digital assistant, a tablet computer, a digital camera, an electronic book reader, etc.
  • the device includes a main memory and a cache memory and interface circuits, e.g., bus and I/O interfaces (not shown).
  • the computer system 12 includes a mass storage element 16 , here typically the hard drive associated with personal computer systems or other types of mass storage, Flash memory, ROM, PROM, etc.
  • the system 10 further includes a standard PC type keyboard 18 , a standard monitor 20 as well as speakers 22 , a pointing device such as a mouse and optionally a scanner 24 all coupled to various ports of the computer system 12 via appropriate interfaces and software drivers (not shown).
  • the computer system 12 can operate under a Microsoft Windows operating system although other systems could alternatively be used.
  • Narration software 30 controls the narration of an electronic document stored on the computer 12 (e.g., controls generation of speech and/or audio that is associated with (e.g., narrates) text in a document).
  • Narration software 30 includes an edit software 30 a that allows a user to edit a document and assign one or more voices or audio recordings to text (e.g., sequences of words) in the document and can include playback software 30 b that reads aloud the text from the document, as the text is displayed on the computer's monitor 20 during a playback mode.
  • Text is narrated by the narration software 30 using several possible technologies: text-to-speech (TTS); audio recording of speech; and possibly in combination with speech, audio recordings of music (e.g., background music) and sound effects (e.g., brief sounds such as gunshots, door slamming, tea kettle boiling, etc.).
  • the narration software 30 controls generation of speech, by controlling a particular computer voice (or audio recording) stored on the computer 12 , causing that voice to be rendered through the computer's speakers 22 .
  • Narration software often uses a text-to-speech (TTS) voice which artificially synthesizes a voice by converting normal language text into speech. TTS voices vary in quality and naturalness.
  • Some TTS voices are produced by synthesizing the sounds for speech using rules, which results in a voice that sounds artificial and which some would describe as robotic.
  • Another way to produce TTS voices concatenates small parts of speech which were recorded from an actual person. This concatenated TTS sounds more natural.
  • Another way to narrate, other than TTS, is to play an audio recording of a person reading the text, such as, for example, a book on tape recording.
  • the audio recording may include more than one actor speaking, and may include other sounds besides speech, such as sound effects or background music.
  • the computer voices can be associated with different languages (e.g., English, French, Spanish, Cantonese, Japanese, etc.).
  • the narration software 30 permits the user to select and optionally modify a particular voice model which defines and controls aspects of the computer voice, including for example, the speaking speed and volume.
  • the voice model includes the language of the computer voice.
  • the voice model may be selected from a database that includes multiple voice models to apply to selected portions of the document.
  • a voice model can have other parameters associated with it besides the voice itself and the language, speed and volume, including, for example, gender (male or female), age (e.g. child or adult), voice pitch, visual indication (such as a particular color of highlighting) of document text that is associated with this voice model, emotion (e.g. angry, sad, etc.), intensity (e.g. mumble, whisper, conversational, projecting voice as at a party, yell, shout).
  • the user can select different voice models to apply to different portions of text such that when the system 10 reads the text the different portions are read using the different voice models.
  • the system can also provide a visual indication, such as highlighting, of which portions are associated with which voice models in the electronic document.
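
One way to picture the voice model parameters listed above is as a small record type. The sketch below is an assumption for illustration only: the class name `VoiceModel`, the field names, and the default values are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class VoiceModel:
    """Hypothetical container for the voice model parameters named in the text."""
    voice: str                               # identifier of a TTS voice or recording
    language: str = "English"
    speed_wpm: int = 150                     # reading speed (assumed default)
    volume: float = 1.0                      # relative to a baseline volume
    gender: Optional[str] = None             # e.g. "male" or "female"
    age: Optional[str] = None                # e.g. "child" or "adult"
    pitch: Optional[float] = None
    highlight_color: Optional[str] = None    # visual indication in the document
    emotion: Optional[str] = None            # e.g. "angry", "sad"
    intensity: str = "conversational"        # e.g. "whisper", "shout"


# The user selects a model and optionally modifies it.
narrator = VoiceModel(voice="default")
whisper = replace(narrator, intensity="whisper", volume=0.4, highlight_color="yellow")
```

Making the record immutable and deriving variants with `replace` keeps the selected model unchanged while the user experiments with modified copies; that is a design choice of this sketch, not something the text prescribes.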
  • text 50 is rendered on a user display 51 .
  • the text 50 includes only words and does not include images. However, in some examples, the text could include portions that are composed of images and portions that are composed of words.
  • the text 50 is a technical paper, namely, “The Nature and Origin of Instructional Objects.” Exemplary texts include, but are not limited to, electronic versions of books, word processor documents, PDF files, electronic versions of newspapers, magazines, fliers, pamphlets, menus, scripts, plays, and the like.
  • the system 10 can read the text using one or more stored voice models. In some examples, the system 10 reads different portions of the text 50 using different voice models.
  • if the text includes multiple characters, a listener may find listening to the text more engaging if different voices are used for each of the characters in the text rather than using a single voice for the entire narration of the text.
  • extremely important or key points could be emphasized by using a different voice model to recite those portions of the text.
  • a “character” refers to an entity and is typically stored as a data structure or file, etc. on computer storage media and includes a graphical representation, e.g., picture, animation, or another graphical representation of the entity and which may in some embodiments be associated with a voice model.
  • a “mood” refers to an instantiation of a voice model according to a particular “mood attribute” that is desired for the character.
  • a character can have multiple associated moods.
  • “Mood attributes” can be various attributes of a character.
  • one attribute can be “normal,” other attributes include “happy,” “sad,” “tired,” “energetic,” “fast talking,” “slow talking,” “native language,” “foreign language,” “hushed voice,” “loud voice,” etc.
  • Mood attributes can include varying features such as speed of playback, volumes, pitch, etc. or can be the result of recording different voices corresponding to the different moods.
  • Homer Simpson the character includes a graphical depiction of Homer Simpson and a voice model that replicates a voice associated with Homer Simpson.
  • Homer Simpson can have various moods, (flavors or instantiations of voice models of Homer Simpson) that emphasize one or more attributes of the voice for the different moods. For example, one passage of text can be associated with a “sad” Homer Simpson voice model, whereas another a “happy” Homer Simpson voice model and a third with a “normal” Homer Simpson voice model.
  • the text 50 is rendered on a user display 51 with the addition of a visual indicium (e.g., highlighting) on different portions of the text (e.g., portions 52 , 53 , and 54 ).
  • the visual indicium (or lack of an indicium) indicates portions of the text that have been associated with a particular character or voice model.
  • the visual indicium is in the form of, for example, a semi-transparent block of color over portions of the text, a highlighting, a different color of the text, a different font for the text, underlining, italicizing, or other visual indications (indicia) to emphasize different portions of the text.
  • portions 52 and 54 are highlighted in a first color while another portion 53 is not highlighted.
  • different voice models are applied to the different portions associated with different characters or voice models, which are represented visually by the text having a particular visual indicium. For example, a first voice model will be used to read the first portions 52 and 54 while a second voice model (a different voice model) will be used to read the portion 53 of the text.
  • text has some portions that have been associated with a particular character or voice model and others that have not. This is represented visually on the user interface as some portions exhibiting a visual indicium and others not exhibiting a visual indicium (e.g., the text includes some highlighted portions and some non-highlighted portions).
  • a default voice model can be used to provide the narration for the portions that have not been associated with a particular character or voice model (e.g., all non-highlighted portions). For example, in a typical story much of the text relates to describing the scene and not to actual words spoken by characters in the story. Such non-dialog portions of the text may remain non-highlighted and not associated with a particular character or voice model.
  • the non-dialog portions can be read using the default voice (e.g., a narrator's voice) while the dialog portions may be associated with a particular character or voice model (and indicated by the highlighting) such that a different, unique voice is used for dialog spoken by each character in the story.
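
The fallback to a default voice can be sketched as a pass over the annotated spans. This is a minimal sketch under assumptions: the function name `resolve_segments` and the `(start, end, character)` span format are hypothetical, and spans are assumed not to overlap.

```python
def resolve_segments(text, annotations, default="Narrator"):
    """Split text into (character, passage) pairs (hypothetical helper).

    annotations: non-overlapping (start, end, character) character offsets
    for the highlighted portions. Anything not covered by an annotation is
    assigned to the default character.
    """
    segments = []
    cursor = 0
    for start, end, character in sorted(annotations):
        if start > cursor:
            # Gap before this annotation: not highlighted, so default voice.
            segments.append((default, text[cursor:start]))
        segments.append((character, text[start:end]))
        cursor = end
    if cursor < len(text):
        segments.append((default, text[cursor:]))
    return segments


text = 'He said "Hi" loudly.'
start = text.index('"Hi"')
segments = resolve_segments(text, [(start, start + len('"Hi"'), "Henry")])
```

Each resulting pair can then be read with the voice model of the named character, with the narrator covering every non-highlighted stretch.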
  • FIG. 3 also shows a menu 55 used for selection of portions of a text to be read using different voice models.
  • a user selects a portion of the text by using an input device such as a keyboard or mouse to select a portion of the text, or, on devices with a touchscreen, a finger or stylus pointing device may be used to select text.
  • a drop down menu 55 is generated that provides a list of the different available characters (e.g., characters 56 , 58 , and 60 ) that can be used for the narration.
  • a character need not be related directly to a particular character in a book or text, but rather provides a specification of the characteristics of a particular voice model that is associated with the character. For example, different characters may have male versus female voices, may speak in different languages or with different accents, may read more quickly or slowly, etc. The same character can be associated with multiple different texts and can be used to read portions of the different texts.
  • Each character 56 , 58 , and 60 is associated with a particular voice model and with additional characteristics of the reading style of the character such as language, volume, speed of narration.
  • the drop down menu includes a “clear annotation” button 62 that clears previously applied highlighting and returns the portion of text to non-highlighted such that it will be read by the Narrator rather than one of the characters.
  • the Narrator is a character whose initial voice is the computer's default voice, though this voice can be overridden by the user. All of the words in the document or text can initially all be associated with the Narrator. If a user selects text that is associated with the Narrator, the user can then perform an action (e.g. select from a menu) to apply another one of the characters for the selected portion of text. To return a previously highlighted portion to being read by the Narrator, the user can select the “clear annotation” button 62 .
  • the drop down menu 55 can include an image (e.g., images 57 , 59 , and 61 ) of the character.
  • where one of the character voices is similar to the voice of the Fox television cartoon character Homer Simpson (e.g., character 58 ), an image of Homer Simpson (e.g., image 59 ) could be included in the drop down menu 55 .
  • Inclusion of the images is believed to make selection of the desired voice model to apply to different portions of the text more user friendly.
  • a process 100 for selecting different characters or voice models to be used when the system 10 reads a text is shown.
  • the system 10 displays 102 the text on a user interface.
  • the system 10 receives 104 a selection of a portion of the text and displays 106 a menu of available characters each associated with a particular voice model.
  • the system receives 108 the user selected character and associates the selected portion of the text with the voice model for the character.
  • the system 10 also generates a highlight 110 or some other type of visual indication to apply to that portion of the text, indicating that the portion is associated with a particular voice model and will be read using that voice model when the user selects to hear a narration of the text.
  • the system 10 determines 112 if the user is making additional selections of portions of the text to associate with particular characters. If the user is making additional selections of portions of the text, the system returns to receiving 104 the user's selection of portions of the text, displays 106 the menu of available characters, receives a user selection and generates a visual indication to apply to a subsequent portion of text.
  • multiple different characters are associated with different voice models and a user associates different portions of the text with the different characters.
  • the characters are predefined and included in a database of characters having defined characteristics.
  • each character may be associated with a particular voice model that includes parameters such as a relative volume, and a reading speed.
  • when the system 10 reads text having different portions associated with different characters, not only can the voice of the characters differ, but other narration characteristics such as the relative volume of the different characters and how quickly the characters read (e.g., how many words per minute) can also differ.
  • a character can be associated with multiple voice models. If a character is associated with multiple voice models, the character has multiple moods that can be selected by the user. Each mood has an associated (single) voice model. When the user selects a character the user also selects the mood for the character such that the appropriate voice model is chosen. For example, a character could have multiple moods in which the character speaks in a different language in each of the moods. In another example, a character could have multiple moods based on the type of voice or tone of voice to be used by the character. For example, a character could have a happy mood with an associated voice model and an angry mood using an angry voice with an associated angry voice model.
  • a character could have multiple moods based on a story line of a text.
  • the wolf character could have a wolf mood in which the wolf speaks in a typical voice for the wolf (using an associated voice model) and a grandma mood in which the wolf speaks in a voice imitating the grandmother (using an associated voice model).
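
The relationship between a character, its moods, and the single voice model behind each mood can be sketched as a lookup table. The class name `Character`, the method `voice_for`, and the voice model identifiers are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Character:
    """Hypothetical character record: each mood maps to exactly one voice model."""
    name: str
    moods: Dict[str, str] = field(default_factory=dict)  # mood -> voice model id
    image: Optional[str] = None                          # graphical representation

    def voice_for(self, mood):
        """Return the voice model id for the mood the user selected."""
        if mood not in self.moods:
            raise KeyError(f"{self.name} has no mood named {mood!r}")
        return self.moods[mood]


wolf = Character(
    name="Wolf",
    moods={
        "wolf": "gruff-voice-model",              # the wolf's typical voice
        "grandma": "grandmother-imitation-model", # the wolf imitating the grandmother
    },
)
selected = wolf.voice_for("grandma")
```

Selecting a character together with a mood therefore resolves to one voice model, which matches the description that each mood has a single associated model.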
  • FIG. 5 shows a screenshot of a user interface 120 on a user display 121 for enabling a user to view the existing characters and modify, delete, and/or generate a character.
  • With the interface, a user generates a cast of characters for the text. Once a character has been generated, the character will be available for associating with portions of the text (e.g., as discussed above).
  • a set of all available characters is displayed in a cast members window 122 .
  • the cast members window 122 includes three characters, a narrator 124 , Charlie Brown 126 , and Homer Simpson 128 . From the cast members window 122 the user can add a new character by selecting button 130 , modify an existing character by selecting button 132 , and/or delete a character by selecting button 134 .
  • the user interface for generating or modifying a voice model is presented as an edit cast member window 136 .
  • the character Charlie Brown has only one associated voice model to define the character's voice, volume and other parameters, but as previously discussed, a character could be associated with multiple voice models (not shown in FIG. 5 ).
  • the edit cast member window 136 includes an input portion 144 for receiving a user selection of a mood or character name.
  • the mood of Charlie Brown has been input into input portion 144 .
  • the character name can be associated with the story and/or associated with the voice model. For example, if the voice model emulates the voice of an elderly lady, the character could be named “grandma.”
  • the edit cast member window 136 also includes a portion 147 for selecting a voice to be associated with the character.
  • the system can include a drop down menu of available voices and the user can select a voice from the drop down menu of voices.
  • the portion 147 for selecting the voice can include an input block where the user can select and upload a file that includes the voice.
  • the edit cast member window 136 also includes a portion 145 for selecting the color or type of visual indicia to be applied to the text selected by a user to be read using the particular character.
  • the edit cast member window 136 also includes a portion 149 for selecting a volume for the narration by the character.
  • a sliding scale is presented and a user moves a slider on the sliding scale to indicate a relative increase or decrease in the volume of the narration by the corresponding character.
  • a drop down menu can include various volume options such as very soft, soft, normal, loud, very loud.
  • the edit cast member window 136 also includes a portion 146 for selecting a reading speed for the character.
  • the reading speed provides an average number of words per minute that the computer system will read at when the text is associated with the character. As such, the portion for selecting the reading speed modifies the speed at which the character reads.
  • the edit cast member window 136 also includes a portion 138 for associating an image with the character.
  • This image can be presented to the user when the user selects a portion of the text to associate with a character (e.g., as shown in FIG. 3 ).
  • the edit cast member window 136 can also include an input for selecting the gender of the character (e.g., as shown in block 140 ) and an input for selecting the age of the character (e.g., as shown in block 142 ).
  • Other attributes of the voice model can be modified in a similar manner.
  • a process 150 for generating elements of a character and its associated voice model is shown.
  • the system displays 152 a user interface for adding a character.
  • the user inputs information to define the character and its associated voice model. While this information is shown as being received in a particular order in the flow chart, other orders can be used. Additionally, the user may not provide each piece of information and the associated steps may be omitted from the process 150 .
  • the system receives 154 a user selection of a character name. For example, the user can type the character name into a text box on the user interface.
  • the voice can be an existing voice selected from a menu of available voices or can be a voice stored on the computer and uploaded at the time the character is generated.
  • the system also receives 160 a user selection of a volume for the character.
  • the volume will provide the relative volume of the character in comparison to a baseline volume.
  • the system also receives 162 a user selection of a speed for the character's reading. The speed will determine the average number of words per minute that the character will read when narrating a text.
  • the system stores 164 each of the inputs received from the user in a memory for later use. If the user does not provide one or more of the inputs, the system uses a default value for the input. For example, if the user does not provide a volume input, the system defaults to an average volume.
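
The storing step with default values can be sketched as a merge of user inputs over a table of defaults. The function name `create_character`, the setting names, and the default values are all assumptions made for this example.

```python
# Hypothetical defaults used when the user omits an input.
DEFAULTS = {"voice": "system-default", "volume": 1.0, "speed_wpm": 150}


def create_character(name, **user_inputs):
    """Store the inputs for a character, filling omitted ones with defaults."""
    unknown = set(user_inputs) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported settings: {sorted(unknown)}")
    # Inputs left as None count as "not provided" and fall back to defaults.
    provided = {key: value for key, value in user_inputs.items() if value is not None}
    return {"name": name, **DEFAULTS, **provided}


charlie = create_character("Charlie Brown", speed_wpm=120, volume=None)
```

In this sketch the omitted volume falls back to the baseline value of 1.0 while the supplied reading speed is kept.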
  • Different characters can be associated with voice models for different languages. For example, if a text included portions in two different languages, it can be beneficial to select portions of the text and have the system read the text in the first language using a first character with a voice model in the first language and read the portion in the second language using a second character with a voice model in the second language. In applications in which the system uses a text-to-speech application in combination with a stored voice model to produce computer generated speech, it can be beneficial for the voice models to be language specific in order for the computer to correctly pronounce and read the words in the text.
  • text can include a dialog between two different characters that speak in different languages.
  • the portions of the dialog spoken by a character in a first language (e.g., English) are associated with a character (and associated voice model) that has a voice model associated with the first language (e.g., a character that speaks in English).
  • the portions of the dialog in a second language (e.g., Spanish) are associated with a character (and associated voice model) that speaks in the second language (e.g., Spanish).
  • portions in the first language (e.g., English) are then read using the character associated with the first language, and portions of the text in the second language (e.g., Spanish) are read using the character associated with the second language.
  • ESL: English as a second language
  • the portions of the ESL text written in English are associated with a character (and associated voice model) that is an English-speaking character.
  • the portions of the text in the foreign (non-English) language are associated with a character (and associated voice model) that is a character speaking the particular foreign language.
  • portions in English are read using a character with an English-speaking voice model and portions of the text in the foreign language are read using a character with a voice model associated with the foreign language.
  • while in the examples above a user selected portions of a text in a document to associate the text with a particular character, such that the system would use the voice model for the character when reading that portion of the text, other techniques for associating portions of text with a particular character can be used.
  • the system could interpret text-based tags in a document as an indicator to associate a particular voice model with associated portions of text.
  • in FIG. 7 , a portion of an exemplary document rendered on a user display 171 that includes text-based tags is shown.
  • the actors' names are written inside square brackets (using a technique that is common in theatrical play scripts).
  • Each line of text has a character name associated with the text.
  • the character name is set out from the text of the story or document with a set of brackets or another computer recognizable indicator such as the pound key, an asterisk, parentheses, a percent sign, etc.
  • the first line 172 shown in document 170 includes the text “[Henry] Hi Sally!” and the second line 174 includes the text “[Sally] Hi Henry, how are you?” Henry and Sally are both characters in the story and character models can be generated to associate a voice model, volume, reading speed, etc. with the character, for example, using the methods described herein.
  • when the computer system reads the text of document 170 , the computer system recognizes the text in brackets, e.g., [Henry] and [Sally], as an indicator of the character associated with the following text and will not read the text included within the brackets.
  • the system will read the first line “Hi Sally!” using the voice model associated with Henry and will read the second line “Hi Henry, how are you?” using the voice model associated with Sally.
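
The bracket-tag convention described above can be sketched with a regular expression. The function name `parse_tagged_lines` is hypothetical, and the fallback to a "Narrator" character for untagged lines is an assumption of this sketch; the text only describes lines that carry a tag.

```python
import re

# A leading [Name] tag followed by the words to be spoken.
TAG_LINE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*(?P<speech>.*)$")


def parse_tagged_lines(lines, default="Narrator"):
    """Return (character, speech) pairs; the bracketed tag itself is not spoken."""
    parsed = []
    for line in lines:
        match = TAG_LINE.match(line)
        if match:
            parsed.append((match.group("name").strip(), match.group("speech")))
        else:
            # Assumption: untagged lines go to a default character.
            parsed.append((default, line.strip()))
    return parsed


script = ["[Henry] Hi Sally!", "[Sally] Hi Henry, how are you?"]
parsed = parse_tagged_lines(script)
```

Because the association lives in the tag rather than in character offsets, editing the words of a line leaves its character assignment intact.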
  • tags can be beneficial in some circumstances. For example, if a student is given an assignment to write a play for an English class, the student's work may go through multiple revisions with the teacher before reaching the final product. Rather than requiring the student to re-highlight the text each time a word is changed, using the tags allows the student to modify the text without affecting the character and voice model associated with the text. For example, in the text of FIG. 7 , if the last line was modified to read, “. . . schen you remembered to wear your gloves” from “. . .
  • a screenshot 180 rendered on a user display 181 of text that includes tagged portions associated with different characters is shown.
  • the character associated with a particular portion of the text is indicated in brackets preceding the text (e.g., as shown in bracketed text 182 , 184 and 186 ).
  • a story may include additional portions that are not to be read as part of the story. For example, in a play, stage motions or lighting cues may be included in the text but should not be spoken when the play is read. Such portions are skipped by the computer system when the computer system is reading the text.
  • a ‘skip’ indicator indicates portions of text that should not be read by the computer system.
  • a skip indicator 188 is used to indicate that the text “She leans back in her chair” should not be read.
  • the computer system automatically identifies text to be associated with different voice models. For example, the computer system can search the text of a document to identify portions that are likely to be quotes or dialog spoken by characters in the story. By determining text associated with dialog in the story, the computer system eliminates the need for the user to independently identify those portions.
  • the computer system searches the text of a story 200 (in this case the story of the Three Little Pigs) to identify the portions spoken by the narrator (e.g., the non-dialog portions).
  • the system associates all of the non-dialog portions with the voice model for the narrator as indicated by the highlighted portions 202 , 206 , and 210 .
  • the remaining dialog-based portions 204 , 208 , and 212 are associated with different characters and voice models by the user. By pre-identifying the portions 204 , 208 , and 212 for which the user should select a character, the computer system reduces the amount of time necessary to select and associate voice models with different portions of the story.
  • the computer system can step through each of the non-highlighted or non-associated portions and ask the user which character to associate with the quotation. For example, the computer system could recognize that the first portion 202 of the text shown in FIG. 9 is spoken by the narrator because the portion is not enclosed in quotations. When reaching the first set of quotations including the text “Please man give me that straw to build me a house,” the computer system could request an input from the user of which character to associate with the quotation. Such a process could continue until the entire text had been associated with different characters.
  • the system automatically selects a character to associate with each quotation based on the words of the text using a natural language process. For example, line 212 of the story shown in FIG. 9 recites “To which the pig answered ‘no, not by the hair of my chinny chin chin.’” The computer system recognizes the quotation “no, not by the hair of my chinny chin chin” based on the text being enclosed in quotation marks. The system reviews the text leading up to or following the quotation for an indication of the speaker. In this example, the text leading up to the quotation states “To which the pig answered.” As such, the system could recognize that the pig is the character speaking this quotation and associate the quotation with the voice model for the pig. In the event that the computer system selects the incorrect character, the user can modify the character selection using one or more of the techniques described herein.
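  • The dialog identification and speaker attribution described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes straight double quotation marks delimit dialog, and the cue pattern (SPEAKER_CUE), the verb list, and the function name split_portions are hypothetical. A quotation with no recognizable cue is returned with a speaker of None so that the user can be asked to choose a character.

```python
import re

# Assumption: dialog is enclosed in straight double quotes.
QUOTE = re.compile(r'"([^"]*)"')
# Hypothetical cue pattern such as "the pig answered"; the verb list is illustrative.
SPEAKER_CUE = re.compile(
    r"\bthe\s+(\w+)\s+(?:answered|said|replied|asked|cried)\b", re.IGNORECASE
)


def split_portions(text, narrator="Narrator"):
    """Return (speaker, portion) pairs; non-dialog text goes to the narrator."""
    portions = []
    matches = list(QUOTE.finditer(text))
    pos = 0
    for i, match in enumerate(matches):
        before = text[pos:match.start()]
        if before.strip():
            portions.append((narrator, before.strip()))
        # Text between this quotation and the next one (or the end of the text).
        next_start = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        after = text[match.end():next_start]
        # Prefer the cue closest to the quotation in the preceding text,
        # then fall back to the first cue in the following text.
        cues_before = SPEAKER_CUE.findall(before)
        cues_after = SPEAKER_CUE.findall(after)
        if cues_before:
            speaker = cues_before[-1].lower()
        elif cues_after:
            speaker = cues_after[0].lower()
        else:
            speaker = None  # no cue found: ask the user
        portions.append((speaker, match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        portions.append((narrator, tail))
    return portions
```

  • A real natural language process would handle far more phrasings than this single pattern; the sketch only shows how the text surrounding a quotation can be searched for the speaker.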
  • the voice models associated with the characters can be electronic Text-To-Speech (TTS) voice models.
  • TTS voices artificially produce a voice by converting normal text into speech.
  • the TTS voice models are customized based on a human voice to emulate a particular voice.
  • the voice models are actual human (as opposed to a computer) voices generated by a human specifically for a document, e.g., high quality audio versions of books and the like.
  • the quality of the speech from a human can be better than the quality of a computer generated, artificially produced voice. While the system narrates text out loud and highlights each word being spoken, some users may prefer that the voice is recorded human speech, and not a computer voice.
  • the user can pre-highlight the text to be read by the person who is generating the speech and/or use speech recognition software to associate the words read by a user to the locations of the words in the text.
  • the computer system reads the document, pausing and highlighting the portions to be read by the individual. As the individual reads, the system records the audio. In another example, a list of all portions to be read by the individual can be extracted from the document and presented to the user. The user can then read each of the portions while the system records the audio and associates the audio with the correct portion of the text (e.g., by placing markers in an output file indicating a corresponding location in the audio file).
  • the system can provide a location at which the user should read and the system can record the audio and associate the text location with the location in the audio (e.g., by placing markers in the audio file indicating a corresponding location in the document).
  • the system synchronizes the highlighting (or other indicia) of each word as it is being spoken with an audio recording so that each word is highlighted or otherwise visually emphasized on a user interface as it is being spoken, in real time.
  • a process 230 for synchronizing the highlighting (or other visual indicia) of each word in an audio with a set of expected words so that each word is visually emphasized on a user interface as it is being spoken is shown.
  • the system processes 232 the audio recording using a speech recognition process executed on a computer.
  • the system, using the speech recognition process, generates 234 a time mark (e.g., an indication of an elapsed time period from the start of the audio recording to each word in the sequence of words) for each word and, preferably, each syllable that the speech recognition process recognizes.
  • the system, using the speech recognition process, generates 236 an output file of each recognized word or syllable and the time it was recognized, relative to the start time of the recording (e.g., the elapsed time). Other parameters and measurements can be saved to the file.
  • the system compares 238 the words in the speech recognition output to the words in the original text (e.g., a set of expected words). The comparison process compares one word from the original text at a time.
  • Speech recognition is an imperfect process, so even with a high quality recording like an audio book, there may be errors of recognition.
  • the system determines whether the word in the speech recognition output matches (e.g., is the same as) the word in the original text. If the word from the original text matches the recognized word, the word is output 240 with the time of recognition to a word timing file. If the words do not match, the system applies 242 a correcting process to find (or estimate) a timing for the original word.
  • the system determines 244 if there are additional words in the original text, and if so, returns to determining 238 whether the word in the speech recognition output matches (e.g., is the same as) the word in the original text. If not, the system ends 246 the synchronization process.
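  • The comparison loop of process 230 can be sketched as follows. This is a simplified illustration under stated assumptions: the recognized words are taken to line up position-for-position with the expected words, the names build_word_timings and reuse_slot_time are hypothetical, and the default correcting step simply reuses the time mark of the recognized word at the same position.

```python
def reuse_slot_time(i, expected, recognized, timings):
    """Hypothetical correcting step: reuse the time of the recognized word
    at the same position, or the last known time if recognition ran out."""
    if i < len(recognized):
        return recognized[i][1]
    return timings[-1][1] if timings else 0.0


def build_word_timings(expected, recognized, correct=reuse_slot_time):
    """expected: list of words from the original text.
    recognized: list of (word, elapsed_seconds) from speech recognition.
    Returns a list of (expected_word, elapsed_seconds)."""
    timings = []
    for i, word in enumerate(expected):
        matches = i < len(recognized) and recognized[i][0].lower() == word.lower()
        if matches:
            # Matching word: output it with its time of recognition.
            timings.append((word, recognized[i][1]))
        else:
            # Mismatch: find or estimate a timing for the original word.
            timings.append((word, correct(i, expected, recognized, timings)))
    return timings
```

  • Passing the correcting step as a parameter keeps the loop independent of which estimation method is used.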
  • the correcting process can use a number of methods to find the correct timing from the speech recognition process or to estimate a timing for the word. For example, the correcting process can iteratively compare the next words until it finds a match between the original text and the recognized text, which leaves it with a known length of mis-matched words. The correcting process can, for example, interpolate the times to get a time that is in-between the first matched word and the last matched word in this length of mis-matched words. Alternatively, if the number of syllables matches in the length of mis-matched words, the correcting process assumes the syllable timings are correct, and sets the timing of the first mis-matched word according to the number of syllables. For example, if the mis-matched word has 3 syllables, the time of that word can be associated with the time from the 3 rd syllable in the recognized text.
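  • The interpolation option can be sketched as follows. The sketch assumes mismatched words have already been marked with a time of None, and it spaces them evenly between the neighboring matched words; the function name fill_gaps and the total_duration parameter (used when a run of mismatches reaches the end of the text) are assumptions made for illustration.

```python
def fill_gaps(timings, total_duration):
    """timings: list of (word, seconds or None). Returns a copy in which each
    run of None times is linearly interpolated between its matched neighbors."""
    out = list(timings)
    n = len(out)
    i = 0
    while i < n:
        if out[i][1] is not None:
            i += 1
            continue
        # Find the end of this run of mismatched words.
        j = i
        while j < n and out[j][1] is None:
            j += 1
        left = out[i - 1][1] if i > 0 else 0.0
        right = out[j][1] if j < n else total_duration
        count = j - i
        step = (right - left) / (count + 1)
        for k in range(count):
            out[i + k] = (out[i + k][0], left + step * (k + 1))
        i = j
    return out
```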
  • Another technique involves using linguistic metrics based on measurements of the length of time to speak certain words, syllables, letters and other parts of speech. These metrics can be applied to the original word to provide an estimate for the time needed to speak that word.
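  • A rough sketch of such a metric is shown below. Everything in it is an assumption made for illustration: syllables are approximated by counting groups of vowels (which miscounts many English words), and the per-syllable duration is a placeholder rather than a measured value.

```python
import re

# Placeholder value chosen for illustration only, not a measured metric.
SECONDS_PER_SYLLABLE = 0.2


def count_syllables(word):
    """Approximate the syllable count as the number of vowel groups (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def estimate_duration(word, seconds_per_syllable=SECONDS_PER_SYLLABLE):
    """Estimate the time needed to speak a word from its syllable count."""
    return count_syllables(word) * seconds_per_syllable
```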
  • a word timing indicator can be produced by close integration with a speech recognizer.
  • Speech recognition is a complex process which generates many internal measurements, variables and hypotheses. Using these very detailed speech recognition measurements in conjunction with the original text (the text that is known to be spoken) could produce highly accurate hypotheses about the timing of each word.
  • the techniques described above could be used, but with the additional information from the speech recognition engine, better results could be achieved.
  • the old speech recognition engine would be part of the new word timing indicator.
  • determining the timings of each word could be facilitated by a software tool that provides a user with a visual display of the recognized words, the timings, the original words and other information, preferably in a timeline display. The user would be able to quickly make an educated guess as to the timings of each word using the information on this display.
  • This software tool provides the user with an interface for the user to indicate which word should be associated with which timing, and to otherwise manipulate and correct the word timing file.
  • association between the location in the audio file and the location in the document can be used.
  • such an association could be stored in a separate file from both the audio file and the document, in the audio file itself, and/or in the document.
  • a second type of highlighting is displayed by the system during playback or reading of a text in order to annotate the text and provide a reading location for the user.
  • This playback highlighting occurs in a playback mode of the system and is distinct from the highlighting that occurs when a user selects text, or the voice painting highlighting that occurs in an editing mode used to highlight sections of the text according to an associated voice model.
  • In this playback mode, for example, as the system reads the text (e.g., using a TTS engine or by playing stored audio), the system tracks the location in the text of the words currently being spoken or produced.
  • the system highlights or applies another visual indicium (e.g., bold font, italics, underlining, a moving ball or other pointer, change in font color) on a user interface to allow a user to more easily read along with the system.
  • One example of a useful playback highlighting mode is to highlight each word (and only that word) as it is being spoken by the computer voice.
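  • Highlighting a single word at a time can be sketched as a lookup into the word timing data. The sketch assumes the timings are available as a sorted list of word start times in seconds; the function name word_index_at is hypothetical.

```python
from bisect import bisect_right


def word_index_at(start_times, elapsed):
    """Return the index of the word being spoken at `elapsed` seconds,
    or None if playback has not yet reached the first word.
    start_times must be sorted in ascending order."""
    index = bisect_right(start_times, elapsed) - 1
    return index if index >= 0 else None
```

  • During playback, the user interface would call this with the current elapsed time and move the highlight whenever the returned index changes.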
  • the system plays back and reads aloud any text in the document, including, for example, the main story of a book, footnotes, chapter titles and also user-generated text notes that the system allows the user to type in. However, as noted herein, some sections or portions of text may be skipped, for example, the character names inside text tags, text indicated by use of the skip indicator, and other types of text as allowed by the system.
  • the text can be rendered as a single document with a scroll bar or page advance button to view portions of the text that do not fit on a current page view, for example, a word processor (e.g., Microsoft Word) document, a PDF document, or other electronic document.
  • the two-dimensional text can be used to generate a simulated three-dimensional book view as shown in FIG. 11 .
  • a text that includes multiple pages can be formatted into the book view shown in FIG. 11 where two pages are arranged side-by-side and the pages are turned to reveal two new pages. Highlighting and association of different characters and voice models with different portions of the text can be used with both standard and book-view texts.
  • the computer system includes page turn indicators which synchronize the turning of the page in the electronic book with the reading of the text in the electronic book.
  • the computer system uses the page break indicators in the two-dimensional document to determine the locations of the breaks between the pages. Page turn indicators are added to every other page of the book view.
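  • As a small illustration of placing page turn indicators, the sketch below assumes the page breaks are given as character offsets marking the end of each page, and that the book view opens with pages 1 and 2 side by side, so a turn is needed at the end of every second page. The function name page_turn_offsets is hypothetical.

```python
def page_turn_offsets(page_breaks):
    """page_breaks: offsets at which pages 1, 2, 3, ... end, in order.
    Returns the offsets at which the book view should turn the page,
    i.e. the ends of pages 2, 4, 6, ..."""
    return page_breaks[1::2]
```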
  • a user may desire to share a document with the associated characters and voice models with another individual.
  • the associations of a particular character with portions of a document and the character models for a particular document are stored with the document.
  • the associations between the assigned characters and different portions of the text are already included with the document.
  • Text-To-Speech (TTS) voice models associated with each character can be very large (e.g., from 15-250 Megabytes) and it may be undesirable to send the entire voice model with the document, especially if a document uses multiple voice models.
  • In order to eliminate the need to provide the voice model, the voice model is noted in the character definition and the system looks for the same voice model on the computer of the person receiving the document. If the voice model is available on the person's computer, the voice model is used. If the voice model is not available on the computer, metadata related to the original voice model such as gender, age, ethnicity, and language are used to select a different available voice model that is similar to the previously used voice model.
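  • The fallback selection can be sketched as follows. The representation of a voice model as a dictionary, the similarity score (a count of matching metadata fields), and the name select_voice are assumptions made for illustration; the disclosure does not specify how similarity is measured.

```python
METADATA_KEYS = ("gender", "age", "ethnicity", "language")


def select_voice(requested, installed):
    """requested: dict with a 'name' and optional metadata fields.
    installed: list of such dicts available on the recipient's computer.
    Returns the same voice model if installed, else the most similar one,
    or None if no voice models are installed."""
    if not installed:
        return None
    for voice in installed:
        if voice["name"] == requested["name"]:
            return voice

    def similarity(voice):
        # Count the metadata fields that match the requested voice model.
        return sum(
            key in requested and voice.get(key) == requested[key]
            for key in METADATA_KEYS
        )

    return max(installed, key=similarity)
```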
  • a subset of words e.g., a subset of TTS generated words or a subset of the stored digitized audio of the human voice model
  • the subset of words can be sent with the document where the subset of words includes only the words that are included in the documents. Because the number of unique words in a document is typically substantially less than all of the words in the English language, this can significantly reduce the size of the voice files sent to the recipient.
  • For example, if a TTS speech generator is used, the TTS engine generates audio files (e.g., wave files) for words and those audio files are stored with the text so that it is not necessary to have the TTS engine installed on a machine to read the text.
  • the number of audio files stored with the text can vary, for example, a full dictionary of audio files can be stored. In another example, only the unique audio files associated with words in the text are stored with the text. This allows the amount of memory necessary to store the audio files to be substantially less than if all words are stored.
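  • Determining which audio files need to accompany a document can be sketched as collecting the document's unique words. The tokenization rule (lowercased runs of letters and apostrophes) and the function name unique_words are assumptions made for illustration.

```python
import re


def unique_words(text):
    """Return the sorted set of distinct words in the text, lowercased.
    One audio file per entry would be stored with the document."""
    return sorted(set(re.findall(r"[a-z']+", text.lower())))
```

  • For a text that repeats words often, the resulting list is much shorter than the text itself, which is what reduces the amount of audio stored.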
  • only a subset of the voice models are sent to the recipient. For example, it might be assumed that the recipient will have at least one acceptable voice model installed on their computer. This voice model could be used for the narrator and only the voice models or the recorded speech for the characters other than the narrator would need to be sent to the recipient.
  • In addition to associating voice models to read various portions of the text, a user can associate sound effects with different portions of the text. For example, a user can select a particular place within the text at which a sound effect should occur and/or can select a portion of the text during which a particular sound effect such as music should be played. For example, if a script indicates that eerie music plays, a user can select those portions of the text and associate a music file (e.g., a wave file) of eerie music with the text.
  • the systems and methods described herein can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, web-enabled applications, or in combinations thereof. Data structures used to represent information can be stored in memory and in persistent storage. Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor and method actions can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output.
  • the invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device.
  • Each computer program can be implemented in a high-level procedural or object oriented programming language, or in assembly or machine language if desired, and in any case, the language can be a compiled or interpreted language.
  • Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory.
  • a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks.
  • Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including, by way of example, semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Disclosed are techniques and systems for synchronizing an audio file with a sequence of words displayed on a user interface.

Description

  • This application claims priority from and incorporates herein U.S. Provisional Application No. 61/144,947, filed Jan. 15, 2009, and titled “SYSTEMS AND METHODS FOR SELECTION OF MULTIPLE VOICES FOR DOCUMENT NARRATION” and U.S. Provisional Application No. 61/165,963, filed Apr. 2, 2009, and titled “SYSTEMS AND METHODS FOR SELECTION OF MULTIPLE VOICES FOR DOCUMENT NARRATION.”
  • BACKGROUND
  • This invention relates generally to educational and entertainment tools and more particularly to techniques and systems which are used to provide a narration of a text.
  • Recent advances in computer technology and computer based speech synthesis have opened various possibilities for the artificial production of human speech. A computer system used for artificial production of human speech can be called a speech synthesizer. One type of speech synthesizer is a text-to-speech (TTS) system, which converts normal language text into speech.
  • SUMMARY
  • Educational and entertainment tools and more particularly techniques and systems which are used to provide a narration of a text are described herein.
  • Systems, software and methods enabling a user to select different voice models to apply to different portions of text such that when the system reads the text the different portions are read using the different voice models are described herein.
  • In some aspects, a computer implemented method includes applying speech recognition by one or more computer systems to an audio recording to generate a text version of recognized words in the audio recording. The method also includes determining by the one or more computer systems an elapsed time period from the start of the audio recording to each word in the sequence of words in the audio recording. The method also includes comparing by the one or more computer systems the words in the text version of the recognized words in the audio recording to the words in a sequence of expected words. The method also includes generating by the one or more computer systems a word timing file comprising the elapsed time information for each word in the sequence of expected words by outputting the elapsed time information for a particular word into the word timing file if the recognized word in the text version of the recognized words matches the expected word and correcting a particular word and associating the elapsed time information with the particular word if the particular word in the text version of the recognized words does not match the expected word. Embodiments may also include devices, software, components, and/or systems to perform any features described herein.
  • The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of a system for producing speech-based output from text.
  • FIG. 2 is a screenshot depicting text.
  • FIG. 3 is a screenshot of text that includes highlighting of portions of the text based on a narration voice.
  • FIG. 4 is a flow chart of a voice painting process.
  • FIG. 5 is a screenshot of a character addition process.
  • FIG. 6 is a flow chart of a character addition process.
  • FIG. 7 is a diagram of text with tagged narration data.
  • FIG. 8 is a screenshot of text with tagged narration information.
  • FIG. 9 is a diagram of text with highlighting.
  • FIG. 10 is a flow chart of a synchronization process.
  • FIG. 11 is a screenshot of a book view of text.
  • FIG. 12 is a screenshot of text.
  • FIG. 13 is a screenshot of text.
  • DETAILED DESCRIPTION
  • Referring now to FIG. 1, a system 10 for producing speech-based output from text is shown to include a computer 12. The computer 12 is generally a personal computer or can alternatively be another type of device, e.g., a cellular phone that includes a processor (e.g., CPU). Examples of such cell-phones include an iPhone® (Apple, Inc.). Other devices include an iPod® (Apple, Inc.), a handheld personal digital assistant, a tablet computer, a digital camera, an electronic book reader, etc. In addition to a processor, the device includes a main memory and a cache memory and interface circuits, e.g., bus and I/O interfaces (not shown). The computer system 12 includes a mass storage element 16, here typically the hard drive associated with personal computer systems or other types of mass storage, Flash memory, ROM, PROM, etc.
  • The system 10 further includes a standard PC type keyboard 18, a standard monitor 20 as well as speakers 22, a pointing device such as a mouse and optionally a scanner 24 all coupled to various ports of the computer system 12 via appropriate interfaces and software drivers (not shown). The computer system 12 can operate under a Microsoft Windows operating system although other systems could alternatively be used.
  • Resident on the mass storage element 16 is narration software 30 that controls the narration of an electronic document stored on the computer 12 (e.g., controls generation of speech and/or audio that is associated with (e.g., narrates) text in a document). Narration software 30 includes an edit software 30 a that allows a user to edit a document and assign one or more voices or audio recordings to text (e.g., sequences of words) in the document and can include playback software 30 b that reads aloud the text from the document, as the text is displayed on the computer's monitor 20 during a playback mode.
  • Text is narrated by the narration software 30 using several possible technologies: text-to-speech (TTS); audio recording of speech; and possibly in combination with speech, audio recordings of music (e.g., background music) and sound effects (e.g., brief sounds such as gunshots, door slamming, tea kettle boiling, etc.). The narration software 30 controls generation of speech, by controlling a particular computer voice (or audio recording) stored on the computer 12, causing that voice to be rendered through the computer's speakers 22. Narration software often uses a text-to-speech (TTS) voice which artificially synthesizes a voice by converting normal language text into speech. TTS voices vary in quality and naturalness. Some TTS voices are produced by synthesizing the sounds for speech using rules in a way which results in a voice that sounds artificial, and which some would describe as robotic. Another way to produce TTS voices concatenates small parts of speech which were recorded from an actual person. This concatenated TTS sounds more natural. Another way to narrate, other than TTS, is to play an audio recording of a person reading the text, such as, for example, a book on tape recording. The audio recording may include more than one actor speaking, and may include other sounds besides speech, such as sound effects or background music. Additionally, the computer voices can be associated with different languages (e.g., English, French, Spanish, Cantonese, Japanese, etc.).
  • In addition, the narration software 30 permits the user to select and optionally modify a particular voice model which defines and controls aspects of the computer voice, including for example, the speaking speed and volume. The voice model includes the language of the computer voice. The voice model may be selected from a database that includes multiple voice models to apply to selected portions of the document. A voice model can have other parameters associated with it besides the voice itself and the language, speed and volume, including, for example, gender (male or female), age (e.g. child or adult), voice pitch, visual indication (such as a particular color of highlighting) of document text that is associated with this voice model, emotion (e.g. angry, sad, etc.), intensity (e.g. mumble, whisper, conversational, projecting voice as at a party, yell, shout). The user can select different voice models to apply to different portions of text such that when the system 10 reads the text the different portions are read using the different voice models. The system can also provide a visual indication, such as highlighting, of which portions are associated with which voice models in the electronic document.
  • Referring to FIG. 2, text 50 is rendered on a user display 51. As shown, the text 50 includes only words and does not include images. However, in some examples, the text could include portions that are composed of images and portions that are composed of words. The text 50 is a technical paper, namely, “The Nature and Origin of Instructional Objects.” Exemplary texts include, but are not limited to, electronic versions of books, word processor documents, PDF files, electronic versions of newspapers, magazines, fliers, pamphlets, menus, scripts, plays, and the like. The system 10 can read the text using one or more stored voice models. In some examples, the system 10 reads different portions of the text 50 using different voice models. For example, if the text includes multiple characters, a listener may find listening to the text more engaging if different voices are used for each of the characters in the text rather than using a single voice for the entire narration of the text. In another example, extremely important or key points could be emphasized by using a different voice model to recite those portions of the text.
  • As used herein a “character” refers to an entity and is typically stored as a data structure or file, etc. on computer storage media and includes a graphical representation, e.g., picture, animation, or another graphical representation of the entity and which may in some embodiments be associated with a voice model. A “mood” refers to an instantiation of a voice model according to a particular “mood attribute” that is desired for the character. A character can have multiple associated moods. “Mood attributes” can be various attributes of a character. For instance, one attribute can be “normal,” other attributes include “happy,” “sad,” “tired,” “energetic,” “fast talking,” “slow talking,” “native language,” “foreign language,” “hushed voice,” “loud voice,” etc. Mood attributes can include varying features such as speed of playback, volume, pitch, etc. or can be the result of recording different voices corresponding to the different moods.
  • For example, for a character, “Homer Simpson” the character includes a graphical depiction of Homer Simpson and a voice model that replicates a voice associated with Homer Simpson. Homer Simpson can have various moods, (flavors or instantiations of voice models of Homer Simpson) that emphasize one or more attributes of the voice for the different moods. For example, one passage of text can be associated with a “sad” Homer Simpson voice model, whereas another can be associated with a “happy” Homer Simpson voice model and a third with a “normal” Homer Simpson voice model.
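  • One possible way to represent a character, its voice model, and its moods as a data structure is sketched below. The class names, field names, and default values are hypothetical and chosen only for illustration; the sketch treats each mood as a separate instantiation of a voice model keyed by its mood attribute, and falls back to the “normal” mood when a requested mood has not been defined.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceModel:
    voice: str                # identifier of the TTS voice or recorded voice
    language: str = "English"
    speed: float = 1.0        # relative speed of playback
    volume: float = 1.0       # relative volume
    pitch: float = 1.0        # relative pitch


@dataclass
class Character:
    name: str
    image: str                                  # graphical representation, e.g. a file name
    moods: dict = field(default_factory=dict)   # mood attribute -> VoiceModel

    def voice_for(self, mood="normal"):
        """Return the voice model for a mood, falling back to 'normal'."""
        return self.moods.get(mood, self.moods["normal"])
```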
  • Referring to FIG. 3, the text 50 is rendered on a user display 51 with the addition of a visual indicium (e.g., highlighting) on different portions of the text (e.g., portions 52, 53, and 54). The visual indicium (or lack of an indicium) indicates portions of the text that have been associated with a particular character or voice model. The visual indicium is in the form of, for example, a semi-transparent block of color over portions of the text, a highlighting, a different color of the text, a different font for the text, underlining, italicizing, or other visual indications (indicia) to emphasize different portions of the text. For example, in text 50 portions 52 and 54 are highlighted in a first color while another portion 53 is not highlighted. When the system 10 generates the narration of the text 50, different voice models are applied to the different portions associated with different characters or voice models that are represented visually by the text having a particular visual indicium. For example, a first voice model will be used to read the first portions 52 and 54 while a second voice model (a different voice model) will be used to read the portion 53 of the text.
  • In some examples, text has some portions that have been associated with a particular character or voice model and others that have not. This is represented visually on the user interface as some portions exhibiting a visual indicium and others not exhibiting a visual indicium (e.g., the text includes some highlighted portions and some non-highlighted portions). A default voice model can be used to provide the narration for the portions that have not been associated with a particular character or voice model (e.g., all non-highlighted portions). For example, in a typical story much of the text relates to describing the scene and not to actual words spoken by characters in the story. Such non-dialog portions of the text may remain non-highlighted and not associated with a particular character or voice model. These portions can be read using the default voice (e.g., a narrator's voice) while the dialog portions may be associated with a particular character or voice model (and indicated by the highlighting) such that a different, unique voice is used for dialog spoken by each character in the story.
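  • The use of a default voice for unassociated portions can be sketched as follows. The sketch assumes associations are stored as non-overlapping (start, end, character) character-offset ranges, which is an assumption made for illustration; the function name narration_plan is hypothetical.

```python
def narration_plan(length, associations, default="Narrator"):
    """length: number of characters in the text.
    associations: non-overlapping (start, end, character) ranges, end exclusive.
    Returns ranges covering the whole text, with gaps assigned to the default."""
    plan = []
    pos = 0
    for start, end, character in sorted(associations):
        if start > pos:
            # Unassociated (non-highlighted) text is read by the default voice.
            plan.append((pos, start, default))
        plan.append((start, end, character))
        pos = end
    if pos < length:
        plan.append((pos, length, default))
    return plan
```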
  • FIG. 3 also shows a menu 55 used for selection of portions of a text to be read using different voice models. A user selects a portion of the text by using an input device such as a keyboard or mouse to select a portion of the text, or, on devices with a touchscreen, a finger or stylus pointing device may be used to select text. Once the user has selected a portion of the text, a drop down menu 55 is generated that provides a list of the different available characters (e.g., characters 56, 58, and 60) that can be used for the narration. A character need not be related directly to a particular character in a book or text, but rather provides a specification of the characteristics of a particular voice model that is associated with the character. For example, different characters may have male versus female voices, may speak in different languages or with different accents, may read more quickly or slowly, etc. The same character can be associated with multiple different texts and can be used to read portions of the different texts.
  • Each character 56, 58, and 60 is associated with a particular voice model and with additional characteristics of the reading style of the character such as language, volume, speed of narration. By selecting (e.g., using a mouse or other input device to click on) a particular character 56, 58, or 60, the selected portion of the text is associated with the voice model for the character and will be read using the voice model associated with the character.
  • Additionally, the drop down menu includes a “clear annotation” button 62 that clears previously applied highlighting and returns the portion of text to non-highlighted such that it will be read by the Narrator rather than one of the characters. The Narrator is a character whose initial voice is the computer's default voice, though this voice can be overridden by the user. All of the words in the document or text can initially be associated with the Narrator. If a user selects text that is associated with the Narrator, the user can then perform an action (e.g. select from a menu) to apply another one of the characters for the selected portion of text. To return a previously highlighted portion to being read by the Narrator, the user can select the “clear annotation” button 62.
  • In order to make selection of the character more user friendly, the drop down menu 55 can include an image (e.g., images 57, 59, and 61) of the character. For example, if one of the character voices is similar to the voice of the Fox television cartoon character Homer Simpson (e.g., character 58), an image of Homer Simpson (e.g., image 59) could be included in the drop down menu 55. Inclusion of the images is believed to make selection of the desired voice model to apply to different portions of the text more user friendly.
  • Referring to FIG. 4, a process 100 for selecting different characters or voice models to be used when the system 10 reads a text is shown. The system 10 displays 102 the text on a user interface. In response to a user selection, the system 10 receives 104 a selection of a portion of the text and displays 106 a menu of available characters each associated with a particular voice model. In response to a user selecting a particular character (e.g., by clicking on the character from the menu), the system receives 108 the user selected character and associates the selected portion of the text with the voice model for the character. The system 10 also generates a highlight 110 or some other type of visual indication to apply to that portion of the text and to indicate that the portion of text is associated with a particular voice model and will be read using the particular voice model when the user selects to hear a narration of the text. The system 10 determines 112 if the user is making additional selections of portions of the text to associate with particular characters. If the user is making additional selections of portions of the text, the system returns to receiving 104 the user's selection of portions of the text, displays 106 the menu of available characters, receives a user selection, and generates a visual indication to apply to a subsequent portion of text.
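  • The association of text portions with characters in process 100 can be pictured with a small data structure. The following is a minimal sketch only, assuming the text is held as a list of words and each word carries the name of its assigned character; the class name `VoicePainting` and its methods are hypothetical and do not appear in the described system.

```python
from itertools import groupby

NARRATOR = "Narrator"  # assumed name for the default character


class VoicePainting:
    """Hypothetical per-word record of which character reads each word."""

    def __init__(self, words):
        self.words = list(words)
        # Initially every word is associated with the Narrator.
        self.assigned = [NARRATOR] * len(self.words)

    def apply(self, start, end, character):
        """Associate words[start:end] with the selected character."""
        for i in range(start, end):
            self.assigned[i] = character

    def clear(self, start, end):
        """Mimic the "clear annotation" action: return words to the Narrator."""
        self.apply(start, end, NARRATOR)

    def runs(self):
        """Group consecutive words read by the same character."""
        out, i = [], 0
        for character, group in groupby(self.assigned):
            n = len(list(group))
            out.append((character, " ".join(self.words[i:i + n])))
            i += n
        return out
```

  • For instance, after `vp = VoicePainting("Hi Sally how are you".split())` and `vp.apply(0, 2, "Henry")`, calling `vp.runs()` yields one run for Henry followed by one run for the Narrator, which is the information a renderer would need to highlight each portion.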
  • As described above, multiple different characters are associated with different voice models and a user associates different portions of the text with the different characters. In some examples, the characters are predefined and included in a database of characters having defined characteristics. For example, each character may be associated with a particular voice model that includes parameters such as a relative volume and a reading speed. When the system 10 reads text having different portions associated with different characters, not only can the voice of the characters differ, but other narration characteristics such as the relative volume of the different characters and how quickly the characters read (e.g., how many words per minute) can also differ.
  • In some embodiments, a character can be associated with multiple voice models. If a character is associated with multiple voice models, the character has multiple moods that can be selected by the user. Each mood has an associated (single) voice model. When the user selects a character the user also selects the mood for the character such that the appropriate voice model is chosen. For example, a character could have multiple moods in which the character speaks in a different language in each of the moods. In another example, a character could have multiple moods based on the type of voice or tone of voice to be used by the character. For example, a character could have a happy mood with an associated voice model and an angry mood using an angry voice with an associated angry voice model. In another example, a character could have multiple moods based on a story line of a text. For example, in the story of the Big Bad Wolf, the wolf character could have a wolf mood in which the wolf speaks in a typical voice for the wolf (using an associated voice model) and a grandma mood in which the wolf speaks in a voice imitating the grandmother (using an associated voice model).
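  • The relationship between a character, its moods, and the single voice model behind each mood can be sketched as follows. This is an illustrative sketch, not the described implementation: the class names, field names, voice names, and numeric defaults are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceModel:
    """Hypothetical voice model parameters (names and defaults are assumed)."""
    voice_name: str
    language: str = "en"
    relative_volume: float = 1.0   # relative to a baseline volume
    words_per_minute: int = 150    # average reading speed


@dataclass
class Character:
    """A character with one voice model per mood."""
    name: str
    moods: dict = field(default_factory=dict)  # mood name -> VoiceModel

    def voice_for(self, mood):
        # Each mood maps to exactly one voice model.
        return self.moods[mood]


# The wolf example from the text: a typical wolf voice and a grandma imitation.
wolf = Character("Wolf", {
    "wolf": VoiceModel("gruff_male", words_per_minute=140),
    "grandma": VoiceModel("elderly_female", relative_volume=0.8),
})
```

  • Selecting a character together with a mood, e.g. `wolf.voice_for("grandma")`, then resolves to the one voice model that should be used for the selected portion of text.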
  • FIG. 5 shows a screenshot of a user interface 120 on a user display 121 for enabling a user to view the existing characters and modify, delete, and/or generate a character. With the interface, a user generates a cast of characters for the text. Once a character has been generated, the character will be available for associating with portions of the text (e.g., as discussed above). A set of all available characters is displayed in a cast members window 122. In the example shown in FIG. 5, the cast members window 122 includes three characters, a narrator 124, Charlie Brown 126, and Homer Simpson 128. From the cast members window 122 the user can add a new character by selecting button 130, modify an existing character by selecting button 132, and/or delete a character by selecting button 134.
  • The user interface for generating or modifying a voice model is presented as an edit cast member window 136. In this example, the character Charlie Brown has only one associated voice model to define the character's voice, volume and other parameters, but as previously discussed, a character could be associated with multiple voice models (not shown in FIG. 5). The edit cast member window 136 includes an input portion 144 for receiving a user selection of a mood or character name. In this example, the mood of Charlie Brown has been input into input portion 144. The character name can be associated with the story and/or associated with the voice model. For example, if the voice model emulates the voice of an elderly lady, the character could be named “grandma.”
  • In another example, if the text which the user is working on is Romeo and Juliet, the user could name one of the characters Romeo and another Juliet and use those characters to narrate the dialog spoken by each of the characters in the play. The edit cast member window 136 also includes a portion 147 for selecting a voice to be associated with the character. For example, the system can include a drop down menu of available voices and the user can select a voice from the drop down menu of voices. In another example, the portion 147 for selecting the voice can include an input block where the user can select and upload a file that includes the voice. The edit cast member window 136 also includes a portion 145 for selecting the color or type of visual indicia to be applied to the text selected by a user to be read using the particular character. The edit cast member window 136 also includes a portion 149 for selecting a volume for the narration by the character.
  • As shown in FIG. 5, a sliding scale is presented and a user moves a slider on the sliding scale to indicate a relative increase or decrease in the volume of the narration by the corresponding character. In some additional examples, a drop down menu can include various volume options such as very soft, soft, normal, loud, very loud. The edit cast member window 136 also includes a portion 146 for selecting a reading speed for the character. The reading speed provides an average number of words per minute that the computer system will read at when the text is associated with the character. As such, the portion for selecting the reading speed modifies the speed at which the character reads. The edit cast member window 136 also includes a portion 138 for associating an image with the character. This image can be presented to the user when the user selects a portion of the text to associate with a character (e.g., as shown in FIG. 3). The edit cast member window 136 can also include an input for selecting the gender of the character (e.g., as shown in block 140) and an input for selecting the age of the character (e.g., as shown in block 142). Other attributes of the voice model can be modified in a similar manner.
  • Referring to FIG. 6, a process 150 for generating elements of a character and its associated voice model is shown. The system displays 152 a user interface for adding a character. The user inputs information to define the character and its associated voice model. While this information is shown as being received in a particular order in the flow chart, other orders can be used. Additionally, the user may not provide each piece of information and the associated steps may be omitted from the process 150.
  • After displaying the user interface for adding a character, the system receives 154 a user selection of a character name. For example, the user can type the character name into a text box on the user interface. The system also receives 156 a user selection of a computer voice to associate with the character. The voice can be an existing voice selected from a menu of available voices or can be a voice stored on the computer and uploaded at the time the character is generated. The system also receives 158 a user selection of a type of visual indicia or color for highlighting the text in the document when the text is associated with the character. For example, the visual indicium or color can be selected from a list of available colors which have not been previously associated with another character. The system also receives 160 a user selection of a volume for the character. The volume will provide the relative volume of the character in comparison to a baseline volume. The system also receives 162 a user selection of a speed for the character's reading. The speed will determine the average number of words per minute that the character will read when narrating a text. The system stores 164 each of the inputs received from the user in a memory for later use. If the user does not provide one or more of the inputs, the system uses a default value for the input. For example, if the user does not provide a volume input, the system defaults to an average volume.
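  • The handling of omitted inputs in process 150 can be sketched as a function that merges the user's inputs over a set of defaults and picks a highlight color not yet used by another character. The default values, the color list, and the function name `create_character` below are assumptions made for illustration only.

```python
# Assumed defaults; the text only says that a default is used when an input is missing.
DEFAULTS = {"voice": "system_default", "color": None,
            "volume": 1.0, "words_per_minute": 150}
AVAILABLE_COLORS = ["yellow", "green", "blue", "pink"]  # hypothetical palette


def create_character(cast, name, **inputs):
    """Store a character in `cast`, filling in defaults for omitted inputs."""
    used = {c["color"] for c in cast.values()}
    provided = {k: v for k, v in inputs.items() if v is not None}
    record = {**DEFAULTS, **provided}
    if record["color"] is None:
        # Offer only colors not previously associated with another character.
        record["color"] = next((c for c in AVAILABLE_COLORS if c not in used), None)
        if record["color"] is None:
            raise ValueError("no unused highlight color is available")
    elif record["color"] in used:
        raise ValueError("color already associated with another character")
    cast[name] = record
    return record
```

  • For example, `create_character(cast, "Henry", volume=1.5)` keeps the supplied volume and falls back to the assumed default reading speed and the first unused color.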
  • Different characters can be associated with voice models for different languages. For example, if a text included portions in two different languages, it can be beneficial to select portions of the text and have the system read the text in the first language using a first character with a voice model in the first language and read the portion in the second language using a second character with a voice model in the second language. In applications in which the system uses a text-to-speech application in combination with a stored voice model to produce computer generated speech, it can be beneficial for the voice models to be language specific in order for the computer to correctly pronounce and read the words in the text.
  • For example, text can include a dialog between two different characters that speak in different languages. In this example, the portions of the dialog spoken by a character in a first language (e.g., English) are associated with a character (and associated voice model) that has a voice model associated with the first language (e.g., a character that speaks in English). Additionally, the portions of the dialog in a second language (e.g., Spanish) are associated with a character (and associated voice model) that speaks in the second language (e.g., Spanish). As such, when the system reads the text, portions in the first language (e.g., English) are read using the character with an English-speaking voice model and portions of the text in the second language (e.g., Spanish) are read using a character with a Spanish-speaking voice model.
  • For example, different characters with voice models can be used to read an English as a second language (ESL) text in which it can be beneficial to read some of the portions using an English-speaking character and other portions using a foreign language-speaking character. In this application, the portions of the ESL text written in English are associated with a character (and associated voice model) that is an English-speaking character. Additionally, the portions of the text in the foreign (non-English) language are associated with a character (and associated voice model) that is a character speaking the particular foreign language. As such, when the system reads the text, portions in English are read using a character with an English-speaking voice model and portions of the text in the foreign language are read using a character with a voice model associated with the foreign language.
  • While in the examples described above, a user selected portions of a text in a document to associate the text with a particular character such that the system would use the voice model for the character when reading that portion of the text, other techniques for associating portions of text with a particular character can be used. For example, the system could interpret text-based tags in a document as an indicator to associate a particular voice model with associated portions of text.
  • Referring to FIG. 7, a portion of an exemplary document rendered on a user display 171 that includes text-based tags is shown. Here, the actors' names are written inside square brackets (using a technique that is common in theatrical play scripts). Each line of text has a character name associated with the text. The character name is set out from the text of the story or document with a set of brackets or other computer recognizable indicator such as the pound key, an asterisk, parentheses, a percent sign, etc. For example, the first line 172 shown in document 170 includes the text “[Henry] Hi Sally!” and the second line 174 includes the text “[Sally] Hi Henry, how are you?” Henry and Sally are both characters in the story and character models can be generated to associate a voice model, volume, reading speed, etc. with the character, for example, using the methods described herein. When the computer system reads the text of document 170, the computer system recognizes the text in brackets, e.g., [Henry] and [Sally], as an indicator of the character associated with the following text and will not read the text included within the brackets. As such, the system will read the first line “Hi Sally!” using the voice model associated with Henry and will read the second line “Hi Henry, how are you?” using the voice model associated with Sally.
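  • The bracketed-tag convention described above can be sketched with a regular expression that separates the character name from the text to be spoken. This is a minimal sketch assuming one tag at the start of each line; the function name `parse_tagged_lines` and the choice to give untagged lines to a default "Narrator" character are assumptions, not part of the described system.

```python
import re

# A leading "[Name]" tag followed by the text that character speaks.
TAG = re.compile(r"^\[(?P<name>[^\]]+)\]\s*(?P<speech>.*)$")


def parse_tagged_lines(lines, default="Narrator"):
    """Return (character, spoken_text) pairs; the tag itself is never spoken."""
    result = []
    for line in lines:
        stripped = line.strip()
        match = TAG.match(stripped)
        if match:
            result.append((match.group("name"), match.group("speech")))
        else:
            # Assumption: untagged lines fall back to a default character.
            result.append((default, stripped))
    return result
```

  • Because the character is derived from the tag each time the text is parsed, editing the words that follow a tag does not disturb the association with the character.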
  • Using the tags to indicate the character to associate with different portions of the text can be beneficial in some circumstances. For example, if a student is given an assignment to write a play for an English class, the student's work may go through multiple revisions with the teacher before reaching the final product. Rather than requiring the student to re-highlight the text each time a word is changed, using the tags allows the student to modify the text without affecting the character and voice model associated with the text. For example, in the text of FIG. 7, if the last line were modified to read “. . . Hopefully you remembered to wear your gloves” instead of “. . . Hopefully you remembered to wear your hat,” then, due to the preceding tag of ‘[Sally]’, the modified text would automatically be read using the voice model for Sally without requiring the user to take additional steps to have the word “gloves” read using the voice model for Sally.
  • Referring to FIG. 8, a screenshot 180 rendered on a user display 181 of text that includes tagged portions associated with different characters is shown. As described above, the character associated with a particular portion of the text is indicated in brackets preceding the text (e.g., as shown in bracketed text 182, 184 and 186). In some situations, a story may include additional portions that are not to be read as part of the story. For example, in a play, stage motions or lighting cues may be included in the text but should not be spoken when the play is read. Such portions are skipped by the computer system when the computer system is reading the text. A ‘skip’ indicator indicates portions of text that should not be read by the computer system. In the example shown in FIG. 8, a skip indicator 188 is used to indicate that the text “She leans back in her chair” should not be read.
  • While in the examples above, the user indicated portions of the text to be read using different voice models by either selecting the text or adding a tag to the text, in some examples the computer system automatically identifies text to be associated with different voice models. For example, the computer system can search the text of a document to identify portions that are likely to be quotes or dialog spoken by characters in the story. By determining text associated with dialog in the story, the computer system eliminates the need for the user to independently identify those portions.
  • Referring to FIG. 9, the computer system searches the text of a story 200 (in this case the story of the Three Little Pigs) to identify the portions spoken by the narrator (e.g., the non-dialog portions). The system associates all of the non-dialog portions with the voice model for the narrator as indicated by the highlighted portions 202, 206, and 210. The remaining dialog-based portions 204, 208, and 212 are associated with different characters and voice models by the user. By pre-identifying the portions 204, 208, and 212 for which the user should select a character, the computer system reduces the amount of time necessary to select and associate voice models with different portions of the story.
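  • The pre-identification of narrator and dialog portions can be sketched by splitting the text on quotation marks. The sketch below assumes that dialog is always enclosed in matching straight or curly double quotes and that everything else belongs to the narrator; the function name `split_dialog` is hypothetical.

```python
import re

# Straight-quoted or curly-quoted spans are treated as dialog.
QUOTE = re.compile(r'"[^"]*"|“[^”]*”')


def split_dialog(text):
    """Split text into ("narrator", ...) and ("dialog", ...) portions, in order."""
    portions = []
    pos = 0
    for match in QUOTE.finditer(text):
        narration = text[pos:match.start()].strip()
        if narration:
            portions.append(("narrator", narration))
        # Drop the surrounding quotation marks from the dialog portion.
        portions.append(("dialog", match.group()[1:-1]))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        portions.append(("narrator", tail))
    return portions
```

  • The "narrator" portions could then be associated with the narrator's voice model automatically, leaving only the "dialog" portions for the user to assign.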
  • In some examples, the computer system can step through each of the non-highlighted or non-associated portions and ask the user which character to associate with the quotation. For example, the computer system could recognize that the first portion 202 of the text shown in FIG. 9 is spoken by the narrator because the portion is not enclosed in quotations. When reaching the first set of quotations including the text “Please man give me that straw to build me a house,” the computer system could request an input from the user of which character to associate with the quotation. Such a process could continue until the entire text had been associated with different characters.
  • In some additional examples, the system automatically selects a character to associate with each quotation based on the words of the text using a natural language process. For example, line 212 of the story shown in FIG. 9 recites “To which the pig answered ‘no, not by the hair of my chinny chin chin.’” The computer system recognizes the quotation “no, not by the hair of my chinny chin chin” based on the text being enclosed in quotation marks. The system reviews the text leading up to or following the quotation for an indication of the speaker. In this example, the text leading up to the quotation states “To which the pig answered.” As such, the system could recognize that the pig is the character speaking this quotation and associate the quotation with the voice model for the pig. In the event that the computer system selects the incorrect character, the user can modify the character selection using one or more of the techniques described herein.
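  • A very simple stand-in for the natural language process described above is a keyword heuristic: look for a known character name in the text immediately before or after the quotation and pick the closest mention. The sketch below is only that heuristic, not a full natural language process; the function name `guess_speaker` and its parameters are hypothetical.

```python
import re


def guess_speaker(before, after, characters, default=None):
    """Return the character named closest to the quotation, or `default`."""
    best = None  # (distance in characters from the quotation, character name)
    for name in characters:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        # Mentions in the text leading up to the quotation.
        for match in pattern.finditer(before):
            distance = len(before) - match.end()
            if best is None or distance < best[0]:
                best = (distance, name)
        # First mention in the text following the quotation.
        match = pattern.search(after)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), name)
    return best[1] if best else default
```

  • With `before="To which the pig answered "` and a cast containing "pig" and "wolf", the heuristic returns "pig". When it guesses wrongly or finds no name, the user would correct the selection manually.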
  • In some embodiments, the voice models associated with the characters can be electronic Text-To-Speech (TTS) voice models. TTS voices artificially produce a voice by converting normal text into speech. In some examples, the TTS voice models are customized based on a human voice to emulate a particular voice. In other examples, the voice models are actual human (as opposed to a computer) voices generated by a human specifically for a document, e.g., high quality audio versions of books and the like. For example, the quality of the speech from a human can be better than the quality of a computer generated, artificially produced voice. While the system narrates text out loud and highlights each word being spoken, some users may prefer that the voice is recorded human speech, and not a computer voice.
  • In order to efficiently record speech associated with a particular character, the user can pre-highlight the text to be read by the person who is generating the speech and/or use speech recognition software to associate the words read by a user with the locations of the words in the text. The computer system reads the document, pausing and highlighting the portions to be read by the individual. As the individual reads, the system records the audio. In another example, a list of all portions to be read by the individual can be extracted from the document and presented to the user. The user can then read each of the portions while the system records the audio and associates the audio with the correct portion of the text (e.g., by placing markers in an output file indicating a corresponding location in the audio file). Alternatively, the system can provide a location at which the user should read and the system can record the audio and associate the text location with the location in the audio (e.g., by placing markers in the audio file indicating a corresponding location in the document).
  • In “playback mode”, the system synchronizes the highlighting (or other indicia) of each word as it is being spoken with an audio recording so that each word is highlighted or otherwise visually emphasized on a user interface as it is being spoken, in real time. Referring to FIG. 10, a process 230 for synchronizing the highlighting (or other visual indicia) of each word in an audio recording with a set of expected words so that each word is visually emphasized on a user interface as it is being spoken is shown. The system processes 232 the audio recording using a speech recognition process executed on a computer. The system, using the speech recognition process, generates 234 a time mark (e.g., an indication of an elapsed time period from the start of the audio recording to each word in the sequence of words) for each word and preferably, each syllable, that the speech recognition process recognizes. The system, using the speech recognition process, generates 236 an output file of each recognized word or syllable and the time it was recognized, relative to the start time of the recording (e.g., the elapsed time). Other parameters and measurements can be saved to the file. The system compares 238 the words in the speech recognition output to the words in the original text (e.g., a set of expected words). The comparison process compares one word from the original text at a time. Speech recognition is an imperfect process, so even with a high quality recording like an audio book, there may be errors of recognition. For each word, based on the comparison of the word in the speech recognition output to the expected word in the original text, the system determines whether the word in the speech recognition output matches (e.g., is the same as) the word in the original text. If the word from the original text matches the recognized word, the word is output 240 with the time of recognition to a word timing file. If the words do not match, the system applies 242 a correcting process to find (or estimate) a timing for the original word. The system determines 244 if there are additional words in the original text, and if so, returns to determining 238 whether the word in the speech recognition output matches (e.g., is the same as) the word in the original text. If not, the system ends 246 the synchronization process.
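  • The comparison of expected words against recognized words can be sketched as a sequence alignment. The sketch below swaps in Python's standard `difflib.SequenceMatcher` to find the matching runs, rather than the word-by-word loop of process 230, and leaves unmatched expected words with no timing so that a correcting process can fill them in later. The function names and the `(word, elapsed_seconds)` input format are assumptions.

```python
from difflib import SequenceMatcher


def normalize(word):
    """Compare words case-insensitively and without surrounding punctuation."""
    return word.lower().strip(".,!?;:\"'")


def align(expected, recognized):
    """Pair each expected word with a recognized time, or None if unmatched.

    expected:   list of words from the original text
    recognized: list of (word, elapsed_seconds) from the speech recognizer
    """
    expected_norm = [normalize(w) for w in expected]
    recognized_norm = [normalize(w) for w, _ in recognized]
    timings = [None] * len(expected)
    matcher = SequenceMatcher(a=expected_norm, b=recognized_norm, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            # A matched word takes the time at which it was recognized.
            timings[block.a + k] = recognized[block.b + k][1]
    return list(zip(expected, timings))
```

  • For example, if the recognizer hears "quack" where the text says "quick", the aligned output keeps the recognized times for the surrounding words and marks "quick" with `None`, signalling that its timing still has to be estimated.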
  • The correcting process can use a number of methods to find the correct timing from the speech recognition process or to estimate a timing for the word. For example, the correcting process can iteratively compare the next words until it finds a match between the original text and the recognized text, which leaves it with a known length of mis-matched words. The correcting process can, for example, interpolate the times to get a time that is in-between the first matched word and the last matched word in this length of mis-matched words. Alternatively, if the number of syllables matches in the length of mis-matched words, the correcting process assumes the syllable timings are correct, and sets the timing of the first mis-matched word according to the number of syllables. For example, if the mis-matched word has 3 syllables, the time of that word can be associated with the time from the 3rd syllable in the recognized text.
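  • The interpolation option of the correcting process can be sketched as follows, assuming the word timings are held as a list of `(word, time)` pairs in which each mis-matched word carries `None`. Each run of mis-matched words is spaced evenly between the matched words on either side. The function name and the handling of runs at the very start or end of the text are assumptions.

```python
def interpolate_missing(timed, start_time=0.0, end_time=None):
    """Fill None timings by linear interpolation between known neighbours."""
    words = [w for w, _ in timed]
    times = [t for _, t in timed]
    n = len(times)
    i = 0
    while i < n:
        if times[i] is not None:
            i += 1
            continue
        # Find the end of this run of mis-matched words.
        j = i
        while j < n and times[j] is None:
            j += 1
        left = times[i - 1] if i > 0 else start_time
        if j < n:
            right = times[j]
        else:
            # Assumption: with no later match, fall back to end_time or to `left`.
            right = end_time if end_time is not None else left
        gap = j - i + 1
        for k in range(i, j):
            times[k] = left + (right - left) * (k - i + 1) / gap
        i = j
    return list(zip(words, times))
```

  • A single mis-matched word between matches at 0.0 s and 0.9 s is therefore placed at 0.45 s; two mis-matched words between 1.0 s and 4.0 s are placed at 2.0 s and 3.0 s.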
  • Another technique involves using linguistic metrics based on measurements of the length of time to speak certain words, syllables, letters and other parts of speech. These metrics can be applied to the original word to provide an estimate for the time needed to speak that word.
  • Alternatively, a word timing indicator can be produced by close integration with a speech recognizer. Speech recognition is a complex process which generates many internal measurements, variables and hypotheses. Using these very detailed speech recognition measurements in conjunction with the original text (the text that is known to be spoken) could produce highly accurate hypotheses about the timing of each word. The techniques described above could be used, but with the additional information from the speech recognition engine, better results could be achieved. The existing speech recognition engine would be part of the new word timing indicator.
  • Additionally, methods of determining the timings of each word could be facilitated by a software tool that provides a user with a visual display of the recognized words, the timings, the original words and other information, preferably in a timeline display. The user would be able to quickly make an educated guess as to the timings of each word using the information on this display. This software tool provides the user with an interface for the user to indicate which word should be associated with which timing, and to otherwise manipulate and correct the word timing file.
  • Other associations between the location in the audio file and the location in the document can be used. For example, such an association could be stored in a separate file from both the audio file and the document, in the audio file itself, and/or in the document.
  • In some additional examples, a second type of highlighting, referred to herein as “playback highlighting,” is displayed by the system during playback or reading of a text in order to annotate the text and provide a reading location for the user. This playback highlighting occurs in a playback mode of the system and is distinct from the highlighting that occurs when a user selects text, or the voice painting highlighting that occurs in an editing mode used to highlight sections of the text according to an associated voice model. In this playback mode, for example, as the system reads the text (e.g., using a TTS engine or by playing stored audio), the system tracks the location in the text of the words currently being spoken or produced. The system highlights or applies another visual indicia (e.g., bold font, italics, underlining, a moving ball or other pointer, change in font color) on a user interface to allow a user to more easily read along with the system. One example of a useful playback highlighting mode is to highlight each word (and only that word) as it is being spoken by the computer voice. The system plays back and reads aloud any text in the document, including, for example, the main story of a book, footnotes, chapter titles and also user-generated text notes that the system allows the user to type in. However, as noted herein, some sections or portions of text may be skipped, for example, the character names inside text tags, text indicated by use of the skip indicator, and other types of text as allowed by the system.
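  • Word-by-word playback highlighting can be sketched as a lookup from elapsed playback time to the index of the word currently being spoken. The sketch assumes a sorted list of word start times (in seconds from the start of the audio) and uses a binary search; the function name `word_at` is hypothetical.

```python
from bisect import bisect_right


def word_at(start_times, elapsed):
    """Index of the word being spoken at `elapsed` seconds, or None before the first word."""
    index = bisect_right(start_times, elapsed) - 1
    return index if index >= 0 else None
```

  • A playback loop could call `word_at` on each display refresh and move the highlight whenever the returned index changes, so that only the word currently being spoken is emphasized.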
  • In some examples, the text can be rendered as a single document with a scroll bar or page advance button to view portions of the text that do not fit on a current page view, for example, text such as a word processor (e.g., Microsoft Word) document, a PDF document, or other electronic document. In some additional examples, the two-dimensional text can be used to generate a simulated three-dimensional book view as shown in FIG. 11.
  • Referring to FIGS. 12 and 13, a text that includes multiple pages can be formatted into the book view shown in FIG. 11 where two pages are arranged side-by-side and the pages are turned to reveal two new pages. Highlighting and association of different characters and voice models with different portions of the text can be used with both standard and book-view texts. In the case of a book-view text, the computer system includes page turn indicators which synchronize the turning of the page in the electronic book with the reading of the text in the electronic book. In order to generate the book-view from a document such as a Word or PDF document, the computer system uses the page break indicators in the two-dimensional document to determine the locations of the breaks between the pages. Page turn indicators are added to every other page of the book view.
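  • The placement of page turn indicators on every other page can be sketched as below. The sketch assumes the document has already been split into a list of pages at its page break indicators, and that a turn is needed after the right-hand page of each two-page spread except the last; the function name `layout_book_view` is hypothetical.

```python
def layout_book_view(pages):
    """Group pages into side-by-side spreads and list where page turns occur.

    Returns (spreads, turn_after), where turn_after holds the indices of the
    pages after which the book view should turn to the next spread.
    """
    spreads = [tuple(pages[i:i + 2]) for i in range(0, len(pages), 2)]
    # A turn follows every second page, provided another spread comes after it.
    turn_after = [i + 1 for i in range(0, len(pages) - 2, 2)]
    return spreads, turn_after
```

  • During narration, reaching the end of a page listed in `turn_after` would trigger the page turn so that the displayed spread stays in step with the reading.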
  • A user may desire to share a document with the associated characters and voice models with another individual. In order to facilitate such sharing, the associations of a particular character with portions of a document and the character models for a particular document are stored with the document. When another individual opens the document, the associations between the assigned characters and different portions of the text are already included with the document.
  • Text-To-Speech (TTS) voice models associated with each character can be very large (e.g., from 15 to 250 megabytes) and it may be undesirable to send the entire voice model with the document, especially if a document uses multiple voice models. In some embodiments, in order to eliminate the need to provide the voice model, the voice model is noted in the character definition and the system looks for the same voice model on the computer of the person receiving the document. If the voice model is available on the person's computer, the voice model is used. If the voice model is not available on the computer, metadata related to the original voice model such as gender, age, ethnicity, and language are used to select a different available voice model that is similar to the previously used voice model.
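  • The fallback from a missing voice model to a similar installed one can be sketched as a metadata comparison. The sketch assumes each voice model is described by a dictionary with a `name` and optional `language`, `gender`, `age`, and `ethnicity` entries, and scores candidates by the number of matching fields with equal weight; the field names, the equal weighting, and the function name `select_voice` are all assumptions.

```python
METADATA_KEYS = ("language", "gender", "age", "ethnicity")


def select_voice(wanted, installed):
    """Return the wanted voice if installed, else the most similar installed voice."""
    for voice in installed:
        if voice["name"] == wanted["name"]:
            return voice  # the same voice model is available locally

    def similarity(voice):
        # Count metadata fields that are present and equal to the wanted values.
        return sum(1 for key in METADATA_KEYS
                   if voice.get(key) is not None and voice.get(key) == wanted.get(key))

    return max(installed, key=similarity) if installed else None
```

  • A practical system might weight language more heavily than the other fields so that text is never read with a voice model for the wrong language; that refinement is left out of this sketch.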
  • In some additional examples, it can be beneficial to send all needed voice models with the document itself to reduce the likelihood that the recipient will not have appropriate voice models installed on their system to play the document. However, due to the size of the TTS voice models and of human voice-based voice models comprised of stored digitized audio, it can be prohibitive to send the entire voice model. As such, a subset of words (e.g., a subset of TTS generated words or a subset of the stored digitized audio of the human voice model) can be sent with the document where the subset of words includes only the words that are included in the document. Because the number of unique words in a document is typically substantially less than all of the words in the English language, this can significantly reduce the size of the voice files sent to the recipient. For example, if a TTS speech generator is used, the TTS engine generates audio files (e.g., wave files) for words and those audio files are stored with the text so that it is not necessary to have the TTS engine installed on a machine to read the text. The number of audio files stored with the text can vary; for example, a full dictionary of audio files can be stored. In another example, only the unique audio files associated with words in the text are stored with the text. This allows the amount of memory necessary to store the audio files to be substantially less than if all words are stored. In other examples, where human voice-based voice models comprised of stored digitized audio are used to provide the narration of a text, either all of the words in the voice model can be stored with the text or only a subset of the words that appear in the text may be stored. Again, storing only the subset of words included in the text reduces the amount of memory needed to store the files.
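  • Storing audio only for the unique words of a document can be sketched as below. The sketch assumes a caller-supplied `synthesize` function that returns audio data for one word (standing in for a TTS engine or a lookup into recorded human speech); that function, the simple tokenization rule, and the names used are assumptions for illustration.

```python
import re


def unique_words(text):
    """Return the sorted set of distinct words in the text, lower-cased."""
    return sorted(set(re.findall(r"[a-z']+", text.lower())))


def bundle_audio(text, synthesize):
    """Map each unique word to its audio so it can be stored with the document."""
    return {word: synthesize(word) for word in unique_words(text)}
```

  • For the line "No, not by the hair of my chinny chin chin", only nine audio entries are needed even though the line contains ten words, because "chin" is stored once.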
  • In some additional examples, only a subset of the voice models are sent to the recipient. For example, it might be assumed that the recipient will have at least one acceptable voice model installed on their computer. This voice model could be used for the narrator and only the voice models or the recorded speech for the characters other than the narrator would need to be sent to the recipient.
  • In some additional examples, in addition to associating voice models to read various portions of the text, a user can additionally associate sound effects with different portions of the text. For example, a user can select a particular place within the text at which a sound effect should occur and/or can select a portion of the text during which a particular sound effect such as music should be played. For example, if a script indicates that eerie music plays, a user can select those portions of the text and associate a music file (e.g., a wave file) of eerie music with the text. When the system reads the story, in addition to reading the text using an associated voice model (based on voice model highlighting), the system also plays the eerie music (based on the sound effect highlighting).
  • The systems and methods described herein can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, web-enabled applications, or in combinations thereof. Data structures used to represent information can be stored in memory and in persistent storage. Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor, and method actions can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. The invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired, and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including, by way of example, semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
  • A portion of the disclosure of this patent document contains material which is subject to copyright protection (e.g., the copyrighted names mentioned herein). This material and the characters used herein are for exemplary purposes only. The characters are owned by their respective copyright owners.
  • Other implementations are within the scope of the following claims:

Claims (19)

1. A computer implemented method comprising:
applying speech recognition by one or more computer systems to an audio recording to generate a text version of recognized portions of text;
determining by the one or more computer systems an elapsed time period from a reference time in the audio recording to each recognized portion in the audio recording;
comparing by the one or more computer systems the recognized portions of text to expected portions of text; and
generating by the one or more computer systems a timing file that is stored on a computer-readable storage medium, the timing file comprising the elapsed time information for each expected portion of text by:
storing the elapsed time information for a recognized portion into the timing file if the recognized portion matches the corresponding expected portion of text; and otherwise
computing the elapsed time information for the expected portion of text and storing the computed elapsed time information into the timing file if the recognized portion does not match the corresponding expected portion of text.
2. The method of claim 1, wherein the one or more recognized portions or expected portions of text comprise words.
3. The method of claim 1, further comprising,
during play back:
providing an audible output corresponding to the audio recording; and
displaying a sequence of words corresponding to at least a portion of the expected portion of text on a user interface rendered on a display device and providing visual indicia indicating a correspondence between the audio recording and the expected portion of text.
4. The method of claim 1 wherein one or more of the recognized portions or the expected portions of text are syllables.
5. The method of claim 1 wherein computing further comprises:
determining the number of syllables in the expected portion of text;
determining the elapsed time for the determined number of syllables in the recognized portion, and
outputting the determined elapsed time to the timing file.
6. The method of claim 1 wherein computing further comprises:
determining the elapsed time for an expected portion of text based on a metric associated with an expected length of time to verbalize the expected portion of text.
7. The method of claim 1 wherein computing comprises:
displaying on a user interface device, the recognized portions of text, the elapsed times, and the expected portions of text;
receiving from a user an indication of timings for the expected portions of text; and
storing elapsed time information in the timing file based on the received user indications.
8. A computer program product residing on a computer readable medium, the computer program product comprising instructions for causing a processor to:
apply speech recognition to an audio recording to generate a text version of recognized portions of text;
determine an elapsed time period from a reference time in the audio recording to each recognized portion in the audio recording;
generate a timing file that is stored on a computer-readable storage medium, the timing file comprising the elapsed time information for each expected portion of text by storing the elapsed time information for a recognized portion into the timing file if the recognized portion matches the corresponding expected portion of text, and otherwise computing the elapsed time information for the expected portion of text and storing the computed elapsed time information into the timing file if the recognized portion does not match the expected portion of text.
9. The computer program product of claim 8, wherein the one or more recognized portions or portions of text comprise words.
10. The computer program product of claim 8 wherein the one or more of the recognized portions or portions of text comprise syllables.
11. The computer program product of claim 8, further comprising, during playback:
provide an audible output corresponding to the audio recording;
display a sequence of words corresponding to at least a portion of the expected portion of text on a user interface rendered on a display device; and
provide visual indicia indicating a correspondence between the portions in the audio recording and the expected portion of text.
12. The computer program product of claim 8 wherein the instructions to compute the elapsed time information further comprise instructions to:
determine the elapsed time for an expected portion of text based on a metric associated with an expected length of time to verbalize the expected portion of text.
13. The computer program product of claim 8 wherein the instructions to compute the elapsed time information comprise instructions to:
display on a user interface device, the recognized portions of text, the elapsed times, and the expected portions of text;
receive from a user an indication of timings for the expected portions of text; and
store elapsed time information in the timing file based on the received user indications.
14. A system comprising:
a memory; and
a computing device configured to:
apply speech recognition to an audio recording to generate a text version of recognized portions of text;
determine an elapsed time period from a reference time in the audio recording to each recognized portion in the audio recording version;
generate a timing file that is stored on a computer-readable storage medium, the timing file comprising the elapsed time information for each expected portion of text by storing the elapsed time information for a recognized portion into the timing file if the recognized portion matches the corresponding expected portion of text, and otherwise computing the elapsed time information for the expected portion of text and storing the computed elapsed time information into the timing file if the recognized portion does not match the expected portion of text.
15. The system of claim 14, wherein the one or more recognized portions or portions of text comprise words.
16. The system of claim 14, wherein the one or more recognized portions or portions of text comprise syllables.
17. The system of claim 14, wherein the computing device is further configured to, during playback:
provide an audible output corresponding to the audio recording;
display a sequence of words corresponding to at least a portion of the expected portion of text on a user interface rendered on a display device; and
provide visual indicia indicating a correspondence between the portions in the audio recording and the expected portion of text.
18. The system of claim 14, wherein the computing device is further configured to:
determine the elapsed time for an expected portion of text based on a metric associated with an expected length of time to verbalize the expected portion of text.
19. The system of claim 14, wherein the computing device is further configured to:
display on a user interface device, the recognized portions of text, the elapsed times, and the expected portions of text;
receive from a user an indication of timings for the expected portions of text; and
store elapsed time information in the timing file based on the received user indications.
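The timing-file generation recited in claims 1 and 6 can be sketched in Python as follows. This is a hedged illustration, not the claimed implementation. It assumes the recognized and expected portions are words already aligned one-to-one. It also assumes the "metric associated with an expected length of time to verbalize" is a fixed number of seconds per character, which is a stand-in chosen for the example. The names `Recognized`, `estimate_duration`, and `build_timing` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recognized:
    text: str       # word produced by the speech recognizer
    elapsed: float  # seconds from the reference time to the start of the word


def estimate_duration(word: str, seconds_per_char: float = 0.07) -> float:
    """Assumed metric: expected time to verbalize a word, proportional to its length."""
    return len(word) * seconds_per_char


def build_timing(expected: list[str], recognized: list[Recognized]) -> list[tuple[str, float]]:
    """Return (expected word, elapsed time) pairs for a timing file.

    Matching words keep the recognizer's elapsed time. For a mismatch the
    time is computed as the previous word's time plus that word's
    estimated duration.
    """
    timings: list[tuple[str, float]] = []
    prev_word, prev_time = None, 0.0
    for exp, rec in zip(expected, recognized):
        if rec.text.lower() == exp.lower():
            t = rec.elapsed  # recognized portion matches: store its time
        else:
            # no match: compute a time from the assumed duration metric
            t = prev_time + (estimate_duration(prev_word) if prev_word else 0.0)
        timings.append((exp, round(t, 3)))
        prev_word, prev_time = exp, t
    return timings


expected_words = ["the", "quick", "brown", "fox"]
recognized_words = [
    Recognized("the", 0.0),
    Recognized("quick", 0.3),
    Recognized("crown", 0.7),  # misrecognition of "brown"
    Recognized("fox", 1.0),
]
timing = build_timing(expected_words, recognized_words)
```

In this example the misrecognized word "brown" receives a computed time based on the preceding word, while the other three words keep the recognizer's times. A syllable count, as in claim 5, could replace the per-character metric without changing the structure.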
US12/687,240 2009-01-15 2010-01-14 Synchronization for document narration Abandoned US20100324895A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US12/687,240 US20100324895A1 (en) 2009-01-15 2010-01-14 Synchronization for document narration
PCT/US2010/021104 WO2010083354A1 (en) 2009-01-15 2010-01-15 Systems and methods for multiple voice document narration

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14494709P 2009-01-15 2009-01-15
US16596309P 2009-04-02 2009-04-02
US12/687,240 US20100324895A1 (en) 2009-01-15 2010-01-14 Synchronization for document narration

Publications (1)

Publication Number Publication Date
US20100324895A1 2010-12-23

Family

ID=43125169

Family Applications (7)

Application Number Title Priority Date Filing Date
US12/687,202 Expired - Fee Related US8359202B2 (en) 2009-01-15 2010-01-14 Character models for document narration
US12/687,231 Expired - Fee Related US8498867B2 (en) 2009-01-15 2010-01-14 Systems and methods for selection and use of multiple characters for document narration
US12/687,240 Abandoned US20100324895A1 (en) 2009-01-15 2010-01-14 Synchronization for document narration
US12/687,271 Expired - Fee Related US8364488B2 (en) 2009-01-15 2010-01-14 Voice models for document narration
US12/687,208 Expired - Fee Related US8352269B2 (en) 2009-01-15 2010-01-14 Systems and methods for processing indicia for document narration
US12/687,220 Expired - Fee Related US8498866B2 (en) 2009-01-15 2010-01-14 Systems and methods for multiple language document narration
US12/687,213 Expired - Fee Related US8954328B2 (en) 2009-01-15 2010-01-14 Systems and methods for document narration with multiple characters having multiple moods

Country Status (1)

Country Link
US (7) US8359202B2 (en)

Families Citing this family (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010067118A1 (en) 2008-12-11 2010-06-17 Novauris Technologies Limited Speech recognition involving a mobile device
US8370151B2 (en) * 2009-01-15 2013-02-05 K-Nfb Reading Technology, Inc. Systems and methods for multiple voice document narration
US9009612B2 (en) * 2009-06-07 2015-04-14 Apple Inc. Devices, methods, and graphical user interfaces for accessibility using a touch-sensitive surface
US20110184738A1 (en) * 2010-01-25 2011-07-28 Kalisky Dror Navigation and orientation tools for speech synthesis
US20110276327A1 (en) * 2010-05-06 2011-11-10 Sony Ericsson Mobile Communications Ab Voice-to-expressive text
US8707195B2 (en) 2010-06-07 2014-04-22 Apple Inc. Devices, methods, and graphical user interfaces for accessibility via a touch-sensitive surface
US8888494B2 (en) * 2010-06-28 2014-11-18 Randall Lee THREEWITS Interactive environment for performing arts scripts
US9870134B2 (en) * 2010-06-28 2018-01-16 Randall Lee THREEWITS Interactive blocking and management for performing arts productions
CN102314874A (en) * 2010-06-29 2012-01-11 鸿富锦精密工业(深圳)有限公司 Text-to-voice conversion system and method
US8452600B2 (en) * 2010-08-18 2013-05-28 Apple Inc. Assisted reader
US9218680B2 (en) * 2010-09-01 2015-12-22 K-Nfb Reading Technology, Inc. Systems and methods for rendering graphical content and glyphs
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
WO2012090196A1 (en) * 2010-12-30 2012-07-05 Melamed Gal Method and system for processing content
US20120218287A1 (en) * 2011-02-25 2012-08-30 Mcwilliams Thomas J Apparatus, system and method for electronic book reading with audio output capability
US20120226500A1 (en) * 2011-03-02 2012-09-06 Sony Corporation System and method for content rendering including synthetic narration
JP5463385B2 (en) * 2011-06-03 2014-04-09 アップル インコーポレイテッド Automatic creation of mapping between text data and audio data
US8751971B2 (en) 2011-06-05 2014-06-10 Apple Inc. Devices, methods, and graphical user interfaces for providing accessibility using a touch-sensitive surface
WO2013015463A1 (en) * 2011-07-22 2013-01-31 엘지전자 주식회사 Mobile terminal and method for controlling same
US20130063494A1 (en) * 2011-09-12 2013-03-14 Microsoft Corporation Assistive reading interface
US9639518B1 (en) * 2011-09-23 2017-05-02 Amazon Technologies, Inc. Identifying entities in a digital work
US9449526B1 (en) 2011-09-23 2016-09-20 Amazon Technologies, Inc. Generating a game related to a digital work
US9128581B1 (en) 2011-09-23 2015-09-08 Amazon Technologies, Inc. Providing supplemental information for a digital work in a user interface
US9613003B1 (en) 2011-09-23 2017-04-04 Amazon Technologies, Inc. Identifying topics in a digital work
JP2013072957A (en) * 2011-09-27 2013-04-22 Toshiba Corp Document read-aloud support device, method and program
US8881269B2 (en) 2012-03-31 2014-11-04 Apple Inc. Device, method, and graphical user interface for integrating recognition of handwriting gestures with a screen reader
US9449523B2 (en) * 2012-06-27 2016-09-20 Apple Inc. Systems and methods for narrating electronic books
KR102023157B1 (en) * 2012-07-06 2019-09-19 삼성전자 주식회사 Method and apparatus for recording and playing of user voice of mobile terminal
US9570066B2 (en) * 2012-07-16 2017-02-14 General Motors Llc Sender-responsive text-to-speech processing
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9117450B2 (en) * 2012-12-12 2015-08-25 Nuance Communications, Inc. Combining re-speaking, partial agent transcription and ASR for improved accuracy / human guided ASR
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
AU2014233517B2 (en) 2013-03-15 2017-05-25 Apple Inc. Training an at least partial voice command system
WO2014144579A1 (en) 2013-03-15 2014-09-18 Apple Inc. System and method for updating an adaptive speech recognition model
JP2016521948A (en) 2013-06-13 2016-07-25 アップル インコーポレイテッド System and method for emergency calls initiated by voice command
KR102222122B1 (en) * 2014-01-21 2021-03-03 엘지전자 주식회사 Mobile terminal and method for controlling the same
US9183831B2 (en) 2014-03-27 2015-11-10 International Business Machines Corporation Text-to-speech for digital literature
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9460075B2 (en) 2014-06-17 2016-10-04 International Business Machines Corporation Solving and answering arithmetic and algebraic problems using natural language processing
US9514185B2 (en) * 2014-08-07 2016-12-06 International Business Machines Corporation Answering time-sensitive questions
US9430557B2 (en) 2014-09-17 2016-08-30 International Business Machines Corporation Automatic data interpretation and answering analytical questions with tables and charts
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
WO2016123205A1 (en) * 2015-01-28 2016-08-04 Hahn Bruce C Deep reading machine and method
CN106156766B (en) * 2015-03-25 2020-02-18 阿里巴巴集团控股有限公司 Method and device for generating text line classifier
US20160314780A1 (en) * 2015-04-27 2016-10-27 Microsoft Technology Licensing, Llc Increasing user interaction performance with multi-voice text-to-speech generation
US9691378B1 (en) * 2015-11-05 2017-06-27 Amazon Technologies, Inc. Methods and devices for selectively ignoring captured audio data
US10698485B2 (en) 2016-06-27 2020-06-30 Microsoft Technology Licensing, Llc Augmenting text narration with haptic feedback
US10489110B2 (en) 2016-11-22 2019-11-26 Microsoft Technology Licensing, Llc Implicit narration for aural user interface
US10930302B2 (en) 2017-12-22 2021-02-23 International Business Machines Corporation Quality of text analytics
KR20200033140A (en) * 2018-09-19 2020-03-27 삼성전자주식회사 System and method for providing voice assistant service
WO2020060151A1 (en) 2018-09-19 2020-03-26 Samsung Electronics Co., Ltd. System and method for providing voice assistant service
CN111048062B (en) * 2018-10-10 2022-10-04 华为技术有限公司 Speech synthesis method and apparatus
CN110399461A (en) * 2019-07-19 2019-11-01 腾讯科技(深圳)有限公司 Data processing method, device, server and storage medium
US11394799B2 (en) * 2020-05-07 2022-07-19 Freeman Augustus Jackson Methods, systems, apparatuses, and devices for facilitating for generation of an interactive story based on non-interactive data
US11875797B2 (en) * 2020-07-23 2024-01-16 Pozotron Inc. Systems and methods for scripted audio production

Citations (109)

Publication number Priority date Publication date Assignee Title
US4397635A (en) * 1982-02-19 1983-08-09 Samuels Curtis A Reading teaching system
US4636173A (en) * 1985-12-12 1987-01-13 Robert Mossman Method for teaching reading
US4913539A (en) * 1988-04-04 1990-04-03 New York Institute Of Technology Apparatus and method for lip-synching animation
US4965727A (en) * 1984-09-13 1990-10-23 Halamka John D Computer card
US5278943A (en) * 1990-03-23 1994-01-11 Bright Star Technology, Inc. Speech animation and inflection system
US5649060A (en) * 1993-10-18 1997-07-15 International Business Machines Corporation Automatic indexing and aligning of audio and text using speech recognition
US5721827A (en) * 1996-10-02 1998-02-24 James Logan System for electrically distributing personalized information
US5732216A (en) * 1996-10-02 1998-03-24 Internet Angles, Inc. Audio message exchange system
US5737725A (en) * 1996-01-09 1998-04-07 U S West Marketing Resources Group, Inc. Method and system for automatically generating new voice files corresponding to new text from a script
US5786814A (en) * 1995-11-03 1998-07-28 Xerox Corporation Computer controlled display system activities using correlated graphical and timeline interfaces for controlling replay of temporal data representing collaborative activities
US5860064A (en) * 1993-05-13 1999-01-12 Apple Computer, Inc. Method and apparatus for automatic generation of vocal emotion in a synthetic text-to-speech system
US5953005A (en) * 1996-06-28 1999-09-14 Sun Microsystems, Inc. System and method for on-line multimedia access
US6017219A (en) * 1997-06-18 2000-01-25 International Business Machines Corporation System and method for interactive reading and language instruction
US6064957A (en) * 1997-08-15 2000-05-16 General Electric Company Improving speech recognition through text-based linguistic post-processing
US6068487A (en) * 1998-10-20 2000-05-30 Lernout & Hauspie Speech Products N.V. Speller for reading system
US6076059A (en) * 1997-08-29 2000-06-13 Digital Equipment Corporation Method for aligning text with audio signals
US6081780A (en) * 1998-04-28 2000-06-27 International Business Machines Corporation TTS and prosody based authoring system
US6151576A (en) * 1998-08-11 2000-11-21 Adobe Systems Incorporated Mixing digitized speech and text using reliability indices
US6199076B1 (en) * 1996-10-02 2001-03-06 James Logan Audio program player including a dynamic program selection controller
US6226615B1 (en) * 1997-08-06 2001-05-01 British Broadcasting Corporation Spoken text display method and apparatus, for use in generating television signals
US6260011B1 (en) * 2000-03-20 2001-07-10 Microsoft Corporation Methods and apparatus for automatically synchronizing electronic audio files with electronic text files
US6263308B1 (en) * 2000-03-20 2001-07-17 Microsoft Corporation Methods and apparatus for performing speech recognition using acoustic models which are improved through an interactive process
US6324511B1 (en) * 1998-10-01 2001-11-27 Mindmaker, Inc. Method of and apparatus for multi-modal information presentation to computer users with dyslexia, reading disabilities or visual impairment
US20020080179A1 (en) * 2000-12-25 2002-06-27 Toshihiko Okabe Data transfer method and data transfer device
US20020099552A1 (en) * 2001-01-25 2002-07-25 Darryl Rubin Annotating electronic information with audio clips
US6442518B1 (en) * 1999-07-14 2002-08-27 Compaq Information Technologies Group, L.P. Method for refining time alignments of closed captions
US6446041B1 (en) * 1999-10-27 2002-09-03 Microsoft Corporation Method and system for providing audio playback of a multi-source document
US20020143534A1 (en) * 2001-03-29 2002-10-03 Koninklijke Philips Electronics N.V. Editing during synchronous playback
US6490557B1 (en) * 1998-03-05 2002-12-03 John C. Jeppesen Method and apparatus for training an ultra-large vocabulary, continuous speech, speaker independent, automatic speech recognition system and consequential database
US6505153B1 (en) * 2000-05-22 2003-01-07 Compaq Information Technologies Group, L.P. Efficient method for producing off-line closed captions
US20030013073A1 (en) * 2001-04-09 2003-01-16 International Business Machines Corporation Electronic book with multimode I/O
US20030014252A1 (en) * 2001-05-10 2003-01-16 Utaha Shizuka Information processing apparatus, information processing method, recording medium, and program
US20030018663A1 (en) * 2001-05-30 2003-01-23 Cornette Ranjita K. Method and system for creating a multimedia electronic book
US20030028380A1 (en) * 2000-02-02 2003-02-06 Freeland Warwick Peter Speech system
US6633741B1 (en) * 2000-07-19 2003-10-14 John G. Posa Recap, summary, and auxiliary information generation for electronic books
US20030212559A1 (en) * 2002-05-09 2003-11-13 Jianlei Xie Text-to-speech (TTS) for hand-held devices
US20030219706A1 (en) * 2002-05-22 2003-11-27 Nijim Yousef Wasef Talking E-book
US20040135814A1 (en) * 2003-01-15 2004-07-15 Vendelin George David Reading tool and method
US20040138881A1 (en) * 2002-11-22 2004-07-15 Olivier Divay Automatic insertion of non-verbalized punctuation
US6792409B2 (en) * 1999-12-20 2004-09-14 Koninklijke Philips Electronics N.V. Synchronous reproduction in a speech recognition system
US20050021343A1 (en) * 2003-07-24 2005-01-27 Spencer Julian A.Q. Method and apparatus for highlighting during presentations
US20050096909A1 (en) * 2003-10-29 2005-05-05 Raimo Bakis Systems and methods for expressive text-to-speech
US20050137867A1 (en) * 2003-12-17 2005-06-23 Miller Mark R. Method for electronically generating a synchronized textual transcript of an audio recording
US20050203750A1 (en) * 2004-03-12 2005-09-15 International Business Machines Corporation Displaying text of speech in synchronization with the speech
US6947896B2 (en) * 1998-09-02 2005-09-20 International Business Machines Corporation Text marking for deferred correction
US6961700B2 (en) * 1996-09-24 2005-11-01 Allvoice Computing Plc Method and apparatus for processing the output of a speech recognition engine
US6961895B1 (en) * 2000-08-10 2005-11-01 Recording For The Blind & Dyslexic, Incorporated Method and apparatus for synchronization of text and audio data
US20060074659A1 (en) * 2004-09-10 2006-04-06 Adams Marilyn J Assessing fluency based on elapsed time
US20060111902A1 (en) * 2004-11-22 2006-05-25 Bravobrava L.L.C. System and method for assisting language learning
US20060149558A1 (en) * 2001-07-17 2006-07-06 Jonathan Kahn Synchronized pattern recognition source data processed by manual or automatic means for creation of shared speaker-dependent speech user profile
US20060190249A1 (en) * 2002-06-26 2006-08-24 Jonathan Kahn Method for comparing a transcribed text file with a previously created file
US7110945B2 (en) * 1999-07-16 2006-09-19 Dreamations Llc Interactive book
US20060242595A1 (en) * 2003-03-07 2006-10-26 Hirokazu Kizumi Scroll display control
US7174295B1 (en) * 1999-09-06 2007-02-06 Nokia Corporation User interface for text to speech conversion
US7191117B2 (en) * 2000-06-09 2007-03-13 British Broadcasting Corporation Generation of subtitles or captions for moving pictures
US7194411B2 (en) * 2001-02-26 2007-03-20 Benjamin Slotznick Method of displaying web pages to enable user access to text information that the user has difficulty reading
US7194693B2 (en) * 2002-10-29 2007-03-20 International Business Machines Corporation Apparatus and method for automatically highlighting text in an electronic document
US20070106508A1 (en) * 2003-04-29 2007-05-10 Jonathan Kahn Methods and systems for creating a second generation session file
US20070118378A1 (en) * 2005-11-22 2007-05-24 International Business Machines Corporation Dynamically Changing Voice Attributes During Speech Synthesis Based upon Parameter Differentiation for Dialog Contexts
US20070174060A1 (en) * 2001-12-20 2007-07-26 Canon Kabushiki Kaisha Control apparatus
US20070171189A1 (en) * 2006-01-20 2007-07-26 Primax Electronics Ltd. Auxiliary reading system of handheld electronic device
US20070271104A1 (en) * 2006-05-19 2007-11-22 Mckay Martin Streaming speech with synchronized highlighting generated by a server
US20080027726A1 (en) * 2006-07-28 2008-01-31 Eric Louis Hansen Text to audio mapping, and animation of the text
US7346506B2 (en) * 2003-10-08 2008-03-18 Agfa Inc. System and method for synchronized text display and audio playback
US7366671B2 (en) * 2004-09-29 2008-04-29 Inventec Corporation Speech displaying system and method
US7366714B2 (en) * 2000-03-23 2008-04-29 Albert Krachman Method and system for providing electronic discovery on computer databases and archives using statement analysis to detect false statements and recover relevant data
US7376560B2 (en) * 2001-10-12 2008-05-20 Koninklijke Philips Electronics N.V. Speech recognition device to mark parts of a recognized text
US20080133219A1 (en) * 2006-02-10 2008-06-05 Spinvox Limited Mass-Scale, User-Independent, Device-Independent Voice Messaging System
US20080140413A1 (en) * 2006-12-07 2008-06-12 Jonathan Travis Millman Synchronization of audio to reading
US20080140313A1 (en) * 2005-03-22 2008-06-12 Searete Llc, A Limited Liability Corporation Of The State Of Delaware Map-based guide system and method
US20080140412A1 (en) * 2006-12-07 2008-06-12 Jonathan Travis Millman Interactive tutoring
US20080140652A1 (en) * 2006-12-07 2008-06-12 Jonathan Travis Millman Authoring tool
US7412643B1 (en) * 1999-11-23 2008-08-12 International Business Machines Corporation Method and apparatus for linking representation and realization data
US20080195370A1 (en) * 2005-08-26 2008-08-14 Koninklijke Philips Electronics, N.V. System and Method For Synchronizing Sound and Manually Transcribed Text
US20080255837A1 (en) * 2004-11-30 2008-10-16 Jonathan Kahn Method for locating an audio segment within an audio file
US20080291325A1 (en) * 2007-05-24 2008-11-27 Microsoft Corporation Personality-Based Device
US20090019389A1 (en) * 2004-07-29 2009-01-15 Andreas Matthias Aust System and method for providing visual markers in electronic documents
US7483834B2 (en) * 2001-07-18 2009-01-27 Panasonic Corporation Method and apparatus for audio navigation of an information appliance
US7487086B2 (en) * 2002-05-10 2009-02-03 Nexidia Inc. Transcript alignment
US7490040B2 (en) * 2002-06-28 2009-02-10 International Business Machines Corporation Method and apparatus for preparing a document to be read by a text-to-speech reader
US20090048832A1 (en) * 2005-11-08 2009-02-19 Nec Corporation Speech-to-text system, speech-to-text method, and speech-to-text program
US20090202226A1 (en) * 2005-06-06 2009-08-13 Texthelp Systems, Ltd. System and method for converting electronic text to a digital multimedia electronic book
US20100023330A1 (en) * 2008-07-28 2010-01-28 International Business Machines Corporation Speed podcasting
US20100031142A1 (en) * 2006-10-23 2010-02-04 Nec Corporation Content summarizing system, method, and program
US7669111B1 (en) * 1997-01-29 2010-02-23 Philip R Krause Electronic text reading environment enhancement method and apparatus
US20100057461A1 (en) * 2007-02-06 2010-03-04 Andreas Neubacher Method and system for creating or updating entries in a speech recognition lexicon
US7693717B2 (en) * 2006-04-12 2010-04-06 Custom Speech Usa, Inc. Session file modification with annotation using speech recognition or text to speech
US20100094632A1 (en) * 2005-09-27 2010-04-15 At&T Corp, System and Method of Developing A TTS Voice
US20100169092A1 (en) * 2008-11-26 2010-07-01 Backes Steven J Voice interface ocx
US20100182325A1 (en) * 2002-01-22 2010-07-22 Gizmoz Israel 2002 Ltd. Apparatus and method for efficient animation of believable speaking 3d characters in real time
US20100216108A1 (en) * 2009-02-20 2010-08-26 Jackson Fish Market, LLC Audiovisual record of a user reading a book aloud for playback with a virtual book
US7809572B2 (en) * 2005-07-20 2010-10-05 Panasonic Corporation Voice quality change portion locating apparatus
US20100281365A1 (en) * 2006-10-19 2010-11-04 Tae Hyeon Kim Encoding method and apparatus and decoding method and apparatus
US20100278453A1 (en) * 2006-09-15 2010-11-04 King Martin T Capture and display of annotations in paper and electronic documents
US20100299131A1 (en) * 2009-05-21 2010-11-25 Nexidia Inc. Transcript alignment
US20110054901A1 (en) * 2009-08-28 2011-03-03 International Business Machines Corporation Method and apparatus for aligning texts
US7987244B1 (en) * 2004-12-30 2011-07-26 At&T Intellectual Property Ii, L.P. Network repository for voice fonts
US7996218B2 (en) * 2005-03-07 2011-08-09 Samsung Electronics Co., Ltd. User adaptive speech recognition method and apparatus
US8009966B2 (en) * 2002-11-01 2011-08-30 Synchro Arts Limited Methods and apparatus for use in sound replacement with automatic synchronization to images
US20110213613A1 (en) * 2006-04-03 2011-09-01 Google Inc., a CA corporation Automatic Language Model Update
US8036894B2 (en) * 2006-02-16 2011-10-11 Apple Inc. Multi-unit approach to text-to-speech synthesis
US8065142B2 (en) * 2007-06-28 2011-11-22 Nuance Communications, Inc. Synchronization of an input text of a speech with a recording of the speech
US20110288861A1 (en) * 2010-05-18 2011-11-24 K-NFB Technology, Inc. Audio Synchronization For Document Narration with User-Selected Playback
US8073694B2 (en) * 2005-09-27 2011-12-06 At&T Intellectual Property Ii, L.P. System and method for testing a TTS voice
US20110320189A1 (en) * 2006-02-27 2011-12-29 Dictaphone Corporation Systems and methods for filtering dictated and non-dictated sections of documents
US8103507B2 (en) * 2005-12-30 2012-01-24 Cisco Technology, Inc. Searchable multimedia stream
US8117034B2 (en) * 2001-03-29 2012-02-14 Nuance Communications Austria Gmbh Synchronise an audio cursor and a text cursor during editing
US8131552B1 (en) * 2000-11-21 2012-03-06 At&T Intellectual Property Ii, L.P. System and method for automated multimedia content indexing and retrieval
US8131545B1 (en) * 2008-09-25 2012-03-06 Google Inc. Aligning a transcript to audio data

Family Cites Families (47)

Publication number Priority date Publication date Assignee Title
WO1993007562A1 (en) * 1991-09-30 1993-04-15 Riverrun Technology Method and apparatus for managing information
US8073695B1 (en) * 1992-12-09 2011-12-06 Adrea, LLC Electronic book with voice emulation features
CA2119397C (en) * 1993-03-19 2007-10-02 Kim E.A. Silverman Improved automated voice synthesis employing enhanced prosodic treatment of text, spelling of text and rate of annunciation
US6442523B1 (en) * 1994-07-22 2002-08-27 Steven H. Siegel Method for the auditory navigation of text
JPH08328590A (en) * 1995-05-29 1996-12-13 Sanyo Electric Co Ltd Voice synthesizer
US6282511B1 (en) * 1996-12-04 2001-08-28 At&T Voiced interface with hyperlinked information
US6052663A (en) * 1997-06-27 2000-04-18 Kurzweil Educational Systems, Inc. Reading system which reads aloud from an image representation of a document
JP3224760B2 (en) * 1997-07-10 2001-11-05 インターナショナル・ビジネス・マシーンズ・コーポレーション Voice mail system, voice synthesizing apparatus, and methods thereof
US6549750B1 (en) * 1997-08-20 2003-04-15 Ithaca Media Corporation Printed book augmented with an electronically stored glossary
US7364068B1 (en) * 1998-03-11 2008-04-29 West Corporation Methods and apparatus for intelligent selection of goods and services offered to conferees
US6144938A (en) * 1998-05-01 2000-11-07 Sun Microsystems, Inc. Voice user interface with personality
US6199042B1 (en) * 1998-06-19 2001-03-06 L&H Applications Usa, Inc. Reading system
JP2002527800A (en) * 1998-10-02 2002-08-27 インターナショナル・ビジネス・マシーンズ・コーポレーション Conversation browser and conversation system
JP2001034282A (en) * 1999-07-21 2001-02-09 Konami Co Ltd Voice synthesizing method, dictionary constructing method for voice synthesis, voice synthesizer and computer readable medium recorded with voice synthesis program
JP3720230B2 (en) * 2000-02-18 2005-11-24 シャープ株式会社 Expression data control system, expression data control apparatus constituting the same, and recording medium on which the program is recorded
WO2001091109A1 (en) * 2000-05-24 2001-11-29 Stars 1-To-1 Interactive voice communication method and system for information and entertainment
US6933928B1 (en) * 2000-07-18 2005-08-23 Scott E. Lilienthal Electronic book player with audio synchronization
JP2002149560A (en) * 2000-08-28 2002-05-24 Sharp Corp Device and system for e-mail
US6985913B2 (en) * 2000-12-28 2006-01-10 Casio Computer Co. Ltd. Electronic book data delivery apparatus, electronic book device and recording medium
US6970820B2 (en) * 2001-02-26 2005-11-29 Matsushita Electric Industrial Co., Ltd. Voice personalization of speech synthesizer
US7020663B2 (en) * 2001-05-30 2006-03-28 George M. Hay System and method for the delivery of electronic books
JP2002358092A (en) * 2001-06-01 2002-12-13 Sony Corp Voice synthesizing system
US20030028377A1 (en) * 2001-07-31 2003-02-06 Noyes Albert W. Method and device for synthesizing and distributing voice types for voice-enabled devices
US6810378B2 (en) * 2001-08-22 2004-10-26 Lucent Technologies Inc. Method and apparatus for controlling a speech synthesis system to provide multiple styles of speech
US20060069567A1 (en) * 2001-12-10 2006-03-30 Tischer Steven N Methods, systems, and products for translating text to speech
US7483832B2 (en) * 2001-12-10 2009-01-27 At&T Intellectual Property I, L.P. Method and system for customizing voice translation of text to speech
JP2003334986A (en) * 2002-05-22 2003-11-25 Dainippon Printing Co Ltd Print system
US20040024585A1 (en) * 2002-07-03 2004-02-05 Amit Srivastava Linguistic segmentation of speech
AU2002950502A0 (en) * 2002-07-31 2002-09-12 E-Clips Intelligent Agent Technologies Pty Ltd Animated messaging
US20040054694A1 (en) * 2002-09-12 2004-03-18 Piccionelli Gregory A. Remote personalization method
EP1552502A1 (en) * 2002-10-04 2005-07-13 Koninklijke Philips Electronics N.V. Speech synthesis apparatus with personalized speech segments
DE102004012208A1 (en) * 2004-03-12 2005-09-29 Siemens Ag Individualization of speech output by adapting a synthesis voice to a target voice
US8666746B2 (en) * 2004-05-13 2014-03-04 At&T Intellectual Property Ii, L.P. System and method for generating customized text-to-speech voices
US7693719B2 (en) * 2004-10-29 2010-04-06 Microsoft Corporation Providing personalized voice font for text-to-speech applications
KR20070093434A (en) * 2004-12-22 2007-09-18 코닌클리케 필립스 일렉트로닉스 엔.브이. Portable audio playback device and method for operation thereof
US7412389B2 (en) * 2005-03-02 2008-08-12 Yang George L Document animation system
US8073697B2 (en) * 2006-09-12 2011-12-06 International Business Machines Corporation Establishing a multimodal personality for a multimodal application
JP2008145234A (en) * 2006-12-08 2008-06-26 Denso Corp Navigation apparatus and program
US8438032B2 (en) * 2007-01-09 2013-05-07 Nuance Communications, Inc. System for tuning synthesized speech
US8886537B2 (en) * 2007-03-20 2014-11-11 Nuance Communications, Inc. Method and system for text-to-speech synthesis with personalized voice
KR20090047159A (en) * 2007-11-07 2009-05-12 삼성전자주식회사 Audio-book playback method and apparatus thereof
US8224652B2 (en) * 2008-09-26 2012-07-17 Microsoft Corporation Speech and text driven HMM-based body animation synthesis
US8863212B2 (en) * 2008-10-16 2014-10-14 At&T Intellectual Property I, Lp Presentation of an adaptive avatar
US8370151B2 (en) * 2009-01-15 2013-02-05 K-Nfb Reading Technology, Inc. Systems and methods for multiple voice document narration
US8359202B2 (en) * 2009-01-15 2013-01-22 K-Nfb Reading Technology, Inc. Character models for document narration
US8150695B1 (en) * 2009-06-18 2012-04-03 Amazon Technologies, Inc. Presentation of written works based on character identities and attributes
US9218680B2 (en) * 2010-09-01 2015-12-22 K-Nfb Reading Technology, Inc. Systems and methods for rendering graphical content and glyphs

Patent Citations (116)

Publication number Priority date Publication date Assignee Title
US4397635A (en) * 1982-02-19 1983-08-09 Samuels Curtis A Reading teaching system
US4965727A (en) * 1984-09-13 1990-10-23 Halamka John D Computer card
US4636173A (en) * 1985-12-12 1987-01-13 Robert Mossman Method for teaching reading
US4913539A (en) * 1988-04-04 1990-04-03 New York Institute Of Technology Apparatus and method for lip-synching animation
US5278943A (en) * 1990-03-23 1994-01-11 Bright Star Technology, Inc. Speech animation and inflection system
US5860064A (en) * 1993-05-13 1999-01-12 Apple Computer, Inc. Method and apparatus for automatic generation of vocal emotion in a synthetic text-to-speech system
US5649060A (en) * 1993-10-18 1997-07-15 International Business Machines Corporation Automatic indexing and aligning of audio and text using speech recognition
US5786814A (en) * 1995-11-03 1998-07-28 Xerox Corporation Computer controlled display system activities using correlated graphical and timeline interfaces for controlling replay of temporal data representing collaborative activities
US5737725A (en) * 1996-01-09 1998-04-07 U S West Marketing Resources Group, Inc. Method and system for automatically generating new voice files corresponding to new text from a script
US5953005A (en) * 1996-06-28 1999-09-14 Sun Microsystems, Inc. System and method for on-line multimedia access
US6961700B2 (en) * 1996-09-24 2005-11-01 Allvoice Computing Plc Method and apparatus for processing the output of a speech recognition engine
US6199076B1 (en) * 1996-10-02 2001-03-06 James Logan Audio program player including a dynamic program selection controller
US5721827A (en) * 1996-10-02 1998-02-24 James Logan System for electrically distributing personalized information
US5732216A (en) * 1996-10-02 1998-03-24 Internet Angles, Inc. Audio message exchange system
US7669111B1 (en) * 1997-01-29 2010-02-23 Philip R Krause Electronic text reading environment enhancement method and apparatus
US6017219A (en) * 1997-06-18 2000-01-25 International Business Machines Corporation System and method for interactive reading and language instruction
US6226615B1 (en) * 1997-08-06 2001-05-01 British Broadcasting Corporation Spoken text display method and apparatus, for use in generating television signals
US6064957A (en) * 1997-08-15 2000-05-16 General Electric Company Improving speech recognition through text-based linguistic post-processing
US6076059A (en) * 1997-08-29 2000-06-13 Digital Equipment Corporation Method for aligning text with audio signals
US6490557B1 (en) * 1998-03-05 2002-12-03 John C. Jeppesen Method and apparatus for training an ultra-large vocabulary, continuous speech, speaker independent, automatic speech recognition system and consequential database
US6081780A (en) * 1998-04-28 2000-06-27 International Business Machines Corporation TTS and prosody based authoring system
US6151576A (en) * 1998-08-11 2000-11-21 Adobe Systems Incorporated Mixing digitized speech and text using reliability indices
US6947896B2 (en) * 1998-09-02 2005-09-20 International Business Machines Corporation Text marking for deferred correction
US6324511B1 (en) * 1998-10-01 2001-11-27 Mindmaker, Inc. Method of and apparatus for multi-modal information presentation to computer users with dyslexia, reading disabilities or visual impairment
US6068487A (en) * 1998-10-20 2000-05-30 Lernout & Hauspie Speech Products N.V. Speller for reading system
US6442518B1 (en) * 1999-07-14 2002-08-27 Compaq Information Technologies Group, L.P. Method for refining time alignments of closed captions
US7110945B2 (en) * 1999-07-16 2006-09-19 Dreamations Llc Interactive book
US20070011011A1 (en) * 1999-07-16 2007-01-11 Cogliano Mary A Interactive book
US7174295B1 (en) * 1999-09-06 2007-02-06 Nokia Corporation User interface for text to speech conversion
US6446041B1 (en) * 1999-10-27 2002-09-03 Microsoft Corporation Method and system for providing audio playback of a multi-source document
US7412643B1 (en) * 1999-11-23 2008-08-12 International Business Machines Corporation Method and apparatus for linking representation and realization data
US6792409B2 (en) * 1999-12-20 2004-09-14 Koninklijke Philips Electronics N.V. Synchronous reproduction in a speech recognition system
US20030028380A1 (en) * 2000-02-02 2003-02-06 Freeland Warwick Peter Speech system
US6260011B1 (en) * 2000-03-20 2001-07-10 Microsoft Corporation Methods and apparatus for automatically synchronizing electronic audio files with electronic text files
US6263308B1 (en) * 2000-03-20 2001-07-17 Microsoft Corporation Methods and apparatus for performing speech recognition using acoustic models which are improved through an interactive process
US7366714B2 (en) * 2000-03-23 2008-04-29 Albert Krachman Method and system for providing electronic discovery on computer databases and archives using statement analysis to detect false statements and recover relevant data
US6505153B1 (en) * 2000-05-22 2003-01-07 Compaq Information Technologies Group, L.P. Efficient method for producing off-line closed captions
US7191117B2 (en) * 2000-06-09 2007-03-13 British Broadcasting Corporation Generation of subtitles or captions for moving pictures
US6633741B1 (en) * 2000-07-19 2003-10-14 John G. Posa Recap, summary, and auxiliary information generation for electronic books
US6961895B1 (en) * 2000-08-10 2005-11-01 Recording For The Blind & Dyslexic, Incorporated Method and apparatus for synchronization of text and audio data
US8131552B1 (en) * 2000-11-21 2012-03-06 At&T Intellectual Property Ii, L.P. System and method for automated multimedia content indexing and retrieval
US20020080179A1 (en) * 2000-12-25 2002-06-27 Toshihiko Okabe Data transfer method and data transfer device
US20020099552A1 (en) * 2001-01-25 2002-07-25 Darryl Rubin Annotating electronic information with audio clips
US7194411B2 (en) * 2001-02-26 2007-03-20 Benjamin Slotznick Method of displaying web pages to enable user access to text information that the user has difficulty reading
US8117034B2 (en) * 2001-03-29 2012-02-14 Nuance Communications Austria Gmbh Synchronise an audio cursor and a text cursor during editing
US20020143534A1 (en) * 2001-03-29 2002-10-03 Koninklijke Philips Electronics N.V. Editing during synchronous playback
US20030013073A1 (en) * 2001-04-09 2003-01-16 International Business Machines Corporation Electronic book with multimode I/O
US20030014252A1 (en) * 2001-05-10 2003-01-16 Utaha Shizuka Information processing apparatus, information processing method, recording medium, and program
US20030018663A1 (en) * 2001-05-30 2003-01-23 Cornette Ranjita K. Method and system for creating a multimedia electronic book
US20060149558A1 (en) * 2001-07-17 2006-07-06 Jonathan Kahn Synchronized pattern recognition source data processed by manual or automatic means for creation of shared speaker-dependent speech user profile
US7483834B2 (en) * 2001-07-18 2009-01-27 Panasonic Corporation Method and apparatus for audio navigation of an information appliance
US7376560B2 (en) * 2001-10-12 2008-05-20 Koninklijke Philips Electronics N.V. Speech recognition device to mark parts of a recognized text
US20070174060A1 (en) * 2001-12-20 2007-07-26 Canon Kabushiki Kaisha Control apparatus
US20100182325A1 (en) * 2002-01-22 2010-07-22 Gizmoz Israel 2002 Ltd. Apparatus and method for efficient animation of believable speaking 3d characters in real time
US20030212559A1 (en) * 2002-05-09 2003-11-13 Jianlei Xie Text-to-speech (TTS) for hand-held devices
US7487086B2 (en) * 2002-05-10 2009-02-03 Nexidia Inc. Transcript alignment
US20030219706A1 (en) * 2002-05-22 2003-11-27 Nijim Yousef Wasef Talking E-book
US20060190249A1 (en) * 2002-06-26 2006-08-24 Jonathan Kahn Method for comparing a transcribed text file with a previously created file
US7953601B2 (en) * 2002-06-28 2011-05-31 Nuance Communications, Inc. Method and apparatus for preparing a document to be read by text-to-speech reader
US7490040B2 (en) * 2002-06-28 2009-02-10 International Business Machines Corporation Method and apparatus for preparing a document to be read by a text-to-speech reader
US20070124672A1 (en) * 2002-10-29 2007-05-31 International Business Machines Corporation Apparatus and method for automatically highlighting text in an electronic document
US7194693B2 (en) * 2002-10-29 2007-03-20 International Business Machines Corporation Apparatus and method for automatically highlighting text in an electronic document
US8009966B2 (en) * 2002-11-01 2011-08-30 Synchro Arts Limited Methods and apparatus for use in sound replacement with automatic synchronization to images
US20040138881A1 (en) * 2002-11-22 2004-07-15 Olivier Divay Automatic insertion of non-verbalized punctuation
US20040135814A1 (en) * 2003-01-15 2004-07-15 Vendelin George David Reading tool and method
US20060242595A1 (en) * 2003-03-07 2006-10-26 Hirokazu Kizumi Scroll display control
US20070106508A1 (en) * 2003-04-29 2007-05-10 Jonathan Kahn Methods and systems for creating a second generation session file
US7979281B2 (en) * 2003-04-29 2011-07-12 Custom Speech Usa, Inc. Methods and systems for creating a second generation session file
US20050021343A1 (en) * 2003-07-24 2005-01-27 Spencer Julian A.Q. Method and apparatus for highlighting during presentations
US7346506B2 (en) * 2003-10-08 2008-03-18 Agfa Inc. System and method for synchronized text display and audio playback
US20050096909A1 (en) * 2003-10-29 2005-05-05 Raimo Bakis Systems and methods for expressive text-to-speech
US20050137867A1 (en) * 2003-12-17 2005-06-23 Miller Mark R. Method for electronically generating a synchronized textual transcript of an audio recording
US20050203750A1 (en) * 2004-03-12 2005-09-15 International Business Machines Corporation Displaying text of speech in synchronization with the speech
US20090019389A1 (en) * 2004-07-29 2009-01-15 Andreas Matthias Aust System and method for providing visual markers in electronic documents
US20060074659A1 (en) * 2004-09-10 2006-04-06 Adams Marilyn J Assessing fluency based on elapsed time
US7366671B2 (en) * 2004-09-29 2008-04-29 Inventec Corporation Speech displaying system and method
US20060111902A1 (en) * 2004-11-22 2006-05-25 Bravobrava L.L.C. System and method for assisting language learning
US20080255837A1 (en) * 2004-11-30 2008-10-16 Jonathan Kahn Method for locating an audio segment within an audio file
US7987244B1 (en) * 2004-12-30 2011-07-26 At&T Intellectual Property Ii, L.P. Network repository for voice fonts
US7996218B2 (en) * 2005-03-07 2011-08-09 Samsung Electronics Co., Ltd. User adaptive speech recognition method and apparatus
US20080140313A1 (en) * 2005-03-22 2008-06-12 Searete Llc, A Limited Liability Corporation Of The State Of Delaware Map-based guide system and method
US20090202226A1 (en) * 2005-06-06 2009-08-13 Texthelp Systems, Ltd. System and method for converting electronic text to a digital multimedia electronic book
US7809572B2 (en) * 2005-07-20 2010-10-05 Panasonic Corporation Voice quality change portion locating apparatus
US20080195370A1 (en) * 2005-08-26 2008-08-14 Koninklijke Philips Electronics, N.V. System and Method For Synchronizing Sound and Manually Transcribed Text
US8073694B2 (en) * 2005-09-27 2011-12-06 At&T Intellectual Property Ii, L.P. System and method for testing a TTS voice
US20100094632A1 (en) * 2005-09-27 2010-04-15 At&T Corp. System and Method of Developing A TTS Voice
US20090048832A1 (en) * 2005-11-08 2009-02-19 Nec Corporation Speech-to-text system, speech-to-text method, and speech-to-text program
US8155958B2 (en) * 2005-11-08 2012-04-10 Nec Corporation Speech-to-text system, speech-to-text method, and speech-to-text program
US20070118378A1 (en) * 2005-11-22 2007-05-24 International Business Machines Corporation Dynamically Changing Voice Attributes During Speech Synthesis Based upon Parameter Differentiation for Dialog Contexts
US8103507B2 (en) * 2005-12-30 2012-01-24 Cisco Technology, Inc. Searchable multimedia stream
US20070171189A1 (en) * 2006-01-20 2007-07-26 Primax Electronics Ltd. Auxiliary reading system of handheld electronic device
US20080133219A1 (en) * 2006-02-10 2008-06-05 Spinvox Limited Mass-Scale, User-Independent, Device-Independent Voice Messaging System
US8036894B2 (en) * 2006-02-16 2011-10-11 Apple Inc. Multi-unit approach to text-to-speech synthesis
US20110320189A1 (en) * 2006-02-27 2011-12-29 Dictaphone Corporation Systems and methods for filtering dictated and non-dictated sections of documents
US20110213613A1 (en) * 2006-04-03 2011-09-01 Google Inc., a CA corporation Automatic Language Model Update
US7693717B2 (en) * 2006-04-12 2010-04-06 Custom Speech Usa, Inc. Session file modification with annotation using speech recognition or text to speech
US20070271104A1 (en) * 2006-05-19 2007-11-22 Mckay Martin Streaming speech with synchronized highlighting generated by a server
US20080027726A1 (en) * 2006-07-28 2008-01-31 Eric Louis Hansen Text to audio mapping, and animation of the text
US20100278453A1 (en) * 2006-09-15 2010-11-04 King Martin T Capture and display of annotations in paper and electronic documents
US20100281365A1 (en) * 2006-10-19 2010-11-04 Tae Hyeon Kim Encoding method and apparatus and decoding method and apparatus
US20100031142A1 (en) * 2006-10-23 2010-02-04 Nec Corporation Content summarizing system, method, and program
US20080140413A1 (en) * 2006-12-07 2008-06-12 Jonathan Travis Millman Synchronization of audio to reading
US20080140412A1 (en) * 2006-12-07 2008-06-12 Jonathan Travis Millman Interactive tutoring
US20080140652A1 (en) * 2006-12-07 2008-06-12 Jonathan Travis Millman Authoring tool
US20100057461A1 (en) * 2007-02-06 2010-03-04 Andreas Neubacher Method and system for creating or updating entries in a speech recognition lexicon
US20080291325A1 (en) * 2007-05-24 2008-11-27 Microsoft Corporation Personality-Based Device
US8065142B2 (en) * 2007-06-28 2011-11-22 Nuance Communications, Inc. Synchronization of an input text of a speech with a recording of the speech
US20120041758A1 (en) * 2007-06-28 2012-02-16 Nuance Communications, Inc. Synchronization of an input text of a speech with a recording of the speech
US20100023330A1 (en) * 2008-07-28 2010-01-28 International Business Machines Corporation Speed podcasting
US8131545B1 (en) * 2008-09-25 2012-03-06 Google Inc. Aligning a transcript to audio data
US20100169092A1 (en) * 2008-11-26 2010-07-01 Backes Steven J Voice interface ocx
US20100216108A1 (en) * 2009-02-20 2010-08-26 Jackson Fish Market, LLC Audiovisual record of a user reading a book aloud for playback with a virtual book
US20100299131A1 (en) * 2009-05-21 2010-11-25 Nexidia Inc. Transcript alignment
US20110054901A1 (en) * 2009-08-28 2011-03-03 International Business Machines Corporation Method and apparatus for aligning texts
US20110288861A1 (en) * 2010-05-18 2011-11-24 K-NFB Technology, Inc. Audio Synchronization For Document Narration with User-Selected Playback
US8392186B2 (en) * 2010-05-18 2013-03-05 K-Nfb Reading Technology, Inc. Audio synchronization for document narration with user-selected playback

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
Alexander Haubold, John R. Kender, "Alignment of Speech to Highly Imperfect Text Transcriptions" ICME 2007: 224-227. *
Biatov. "Large Text and Audio Data Alignment for Multimedia Applications" 2003. *
Cardinal et al. "Segmentation of Recordings Based on Partial Transcriptions" 2005. *
Fisher, W.M.; Fiscus, J.G. "Better alignment procedures for speech recognition evaluation", Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on, vol. 2, pp. 59-62, 27-30 April 1993. *
Hazen. "Automatic Alignment and Error Correction of Human Generated Transcripts for Long Speech Recordings" 2006. *
J. Picone, G.R. Doddington, and D.S. Pallett, "Phone-Mediated Word Alignment for Speech Recognition Evaluation", IEEE Trans. ASSP, Vol. 38, No. 3, March 1990, pp. 559-562. *
J. Picone, K.M. Goudie-Marshall, G.R. Doddington, and W. Fisher, "Automatic Text Alignment for Speech System Evaluation", IEEE Trans. ASSP, Vol. ASSP-34, No. 4, August 1986, pp. 780-784. *
Lynn Wilcox, John S. Boreczky: Annotation and Segmentation for Multimedia Indexing and Retrieval. HICSS (2) 1998: 259-266. *
Mohamed El-Helaly, Aishy Amer. Synchronization of Processed Audio-Video Signals using Time-Stamps. IEEE International Conference on Image Processing. San Antonio, TX: IEEE, 2007, pp. 193-196. *
Moreno, P.J. et al., "A Recursive Algorithm for the Forced Alignment of Very Long Audio Segments," in Proceedings, ICSLP, 1998. *
Vignoli et al. "A Segmental Time-Alignment Technique for Text-Speech Synchronization" 1999. *

Cited By (289)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US11928604B2 (en) 2005-09-08 2024-03-12 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US11012942B2 (en) 2007-04-03 2021-05-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US11023513B2 (en) 2007-12-20 2021-06-01 Apple Inc. Method and apparatus for searching using an active ontology
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US10255028B2 (en) 2008-07-04 2019-04-09 Booktrack Holdings Limited Method and system for making and playing soundtracks
US10140082B2 (en) 2008-07-04 2018-11-27 Booktrack Holdings Limited Method and system for making and playing soundtracks
US10095466B2 (en) 2008-07-04 2018-10-09 Booktrack Holdings Limited Method and system for making and playing soundtracks
US10095465B2 (en) 2008-07-04 2018-10-09 Booktrack Holdings Limited Method and system for making and playing soundtracks
US9223864B2 (en) 2008-07-04 2015-12-29 Booktrack Holdings Limited Method and system for making and playing soundtracks
US9135333B2 (en) 2008-07-04 2015-09-15 Booktrack Holdings Limited Method and system for making and playing soundtracks
US20110153047A1 (en) * 2008-07-04 2011-06-23 Booktrack Holdings Limited Method and System for Making and Playing Soundtracks
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US10643611B2 (en) 2008-10-02 2020-05-05 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US8498867B2 (en) * 2009-01-15 2013-07-30 K-Nfb Reading Technology, Inc. Systems and methods for selection and use of multiple characters for document narration
US8498866B2 (en) * 2009-01-15 2013-07-30 K-Nfb Reading Technology, Inc. Systems and methods for multiple language document narration
US10088976B2 (en) 2009-01-15 2018-10-02 Em Acquisition Corp., Inc. Systems and methods for multiple voice document narration
US20100318364A1 (en) * 2009-01-15 2010-12-16 K-Nfb Reading Technology, Inc. Systems and methods for selection and use of multiple characters for document narration
US8954328B2 (en) 2009-01-15 2015-02-10 K-Nfb Reading Technology, Inc. Systems and methods for document narration with multiple characters having multiple moods
US20100324903A1 (en) * 2009-01-15 2010-12-23 K-Nfb Reading Technology, Inc. Systems and methods for document narration with multiple characters having multiple moods
US20100324904A1 (en) * 2009-01-15 2010-12-23 K-Nfb Reading Technology, Inc. Systems and methods for multiple language document narration
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10475446B2 (en) 2009-06-05 2019-11-12 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10824322B2 (en) 2010-01-11 2020-11-03 Apple Inc. Electronic text manipulation and display
US20130219322A1 (en) * 2010-01-11 2013-08-22 Apple Inc. Electronic text manipulation and display
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US12087308B2 (en) 2010-01-18 2024-09-10 Apple Inc. Intelligent automated assistant
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10741185B2 (en) 2010-01-18 2020-08-11 Apple Inc. Intelligent automated assistant
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US10692504B2 (en) 2010-02-25 2020-06-23 Apple Inc. User profiling for voice input processing
US8903723B2 (en) 2010-05-18 2014-12-02 K-Nfb Reading Technology, Inc. Audio synchronization for document narration with user-selected playback
US9478219B2 (en) 2010-05-18 2016-10-25 K-Nfb Reading Technology, Inc. Audio synchronization for document narration with user-selected playback
US8520025B2 (en) 2011-02-24 2013-08-27 Google Inc. Systems and methods for manipulating user annotations in electronic books
US8543941B2 (en) 2011-02-24 2013-09-24 Google Inc. Electronic book contextual menu systems and methods
US10067922B2 (en) 2011-02-24 2018-09-04 Google Llc Automated study guide generation for electronic books
US9645986B2 (en) 2011-02-24 2017-05-09 Google Inc. Method, medium, and system for creating an electronic book with an umbrella policy
US9063641B2 (en) 2011-02-24 2015-06-23 Google Inc. Systems and methods for remote collaborative studying using electronic books
US9501461B2 (en) 2011-02-24 2016-11-22 Google Inc. Systems and methods for manipulating user annotations in electronic books
US10417405B2 (en) 2011-03-21 2019-09-17 Apple Inc. Device access using voice authentication
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
CN103703431A (en) * 2011-06-03 2014-04-02 Apple Inc. Automatically creating a mapping between text data and audio data
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US11350253B2 (en) 2011-06-03 2022-05-31 Apple Inc. Active transport based notifications
US10672399B2 (en) 2011-06-03 2020-06-02 Apple Inc. Switching between text data and audio data based on a mapping
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
WO2012167276A1 (en) * 2011-06-03 2012-12-06 Apple Inc. Automatically creating a mapping between text data and audio data
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US9613654B2 (en) 2011-07-26 2017-04-04 Booktrack Holdings Limited Soundtrack for electronic text
US9666227B2 (en) 2011-07-26 2017-05-30 Booktrack Holdings Limited Soundtrack for electronic text
US9613653B2 (en) 2011-07-26 2017-04-04 Booktrack Holdings Limited Soundtrack for electronic text
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US9678634B2 (en) 2011-10-24 2017-06-13 Google Inc. Extensible framework for ereader tools
US9141404B2 (en) 2011-10-24 2015-09-22 Google Inc. Extensible framework for ereader tools
US9031493B2 (en) 2011-11-18 2015-05-12 Google Inc. Custom narration of electronic books
US20130191125A1 (en) * 2012-01-25 2013-07-25 Kabushiki Kaisha Toshiba Transcription supporting system and transcription supporting method
US11069336B2 (en) 2012-03-02 2021-07-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
WO2013151610A1 (en) * 2012-04-06 2013-10-10 Google Inc. Synchronizing progress in audio and text versions of electronic books
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US11269678B2 (en) 2012-05-15 2022-03-08 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US9069744B2 (en) 2012-05-15 2015-06-30 Google Inc. Extensible framework for ereader tools, including named entity information
US10102187B2 (en) 2012-05-15 2018-10-16 Google Llc Extensible framework for ereader tools, including named entity information
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9047356B2 (en) 2012-09-05 2015-06-02 Google Inc. Synchronizing multiple reading positions in electronic books
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US10714117B2 (en) 2013-02-07 2020-07-14 Apple Inc. Voice trigger for a digital assistant
WO2014137074A1 (en) * 2013-03-05 2014-09-12 Lg Electronics Inc. Mobile terminal and method of controlling the mobile terminal
KR101952179B1 (en) 2013-03-05 2019-05-22 LG Electronics Inc. Mobile terminal and control method for the mobile terminal
KR20140109167A (en) * 2013-03-05 2014-09-15 LG Electronics Inc. Mobile terminal and control method for the mobile terminal
US10241743B2 (en) 2013-03-05 2019-03-26 Lg Electronics Inc. Mobile terminal for matching displayed text with recorded external audio and method of controlling the mobile terminal
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US9323733B1 (en) 2013-06-05 2016-04-26 Google Inc. Indexed electronic book annotations
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10769385B2 (en) 2013-06-09 2020-09-08 Apple Inc. System and method for inferring user intent from speech inputs
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US11048473B2 (en) 2013-06-09 2021-06-29 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US12010262B2 (en) 2013-08-06 2024-06-11 Apple Inc. Auto-activating smart responses based on activities from remote devices
US9898077B2 (en) * 2013-09-18 2018-02-20 Booktrack Holdings Limited Playback system for synchronised soundtracks for electronic media content
US20150082136A1 (en) * 2013-09-18 2015-03-19 Booktrack Holdings Limited Playback system for synchronised soundtracks for electronic media content
US11314370B2 (en) 2013-12-06 2022-04-26 Apple Inc. Method for extracting salient dialog usage from live data
US10714095B2 (en) 2014-05-30 2020-07-14 Apple Inc. Intelligent assistant for home automation
US10699717B2 (en) 2014-05-30 2020-06-30 Apple Inc. Intelligent assistant for home automation
US10417344B2 (en) 2014-05-30 2019-09-17 Apple Inc. Exemplar-based natural language processing
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10083690B2 (en) 2014-05-30 2018-09-25 Apple Inc. Better resolution when referencing to concepts
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10657966B2 (en) 2014-05-30 2020-05-19 Apple Inc. Better resolution when referencing to concepts
US10878809B2 (en) 2014-05-30 2020-12-29 Apple Inc. Multi-command single utterance input method
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10453443B2 (en) 2014-09-30 2019-10-22 Apple Inc. Providing an indication of the suitability of speech recognition
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10390213B2 (en) 2014-09-30 2019-08-20 Apple Inc. Social reminders
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10438595B2 (en) 2014-09-30 2019-10-08 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US11231904B2 (en) 2015-03-06 2022-01-25 Apple Inc. Reducing response latency of intelligent automated assistants
US10529332B2 (en) 2015-03-08 2020-01-07 Apple Inc. Virtual assistant activation
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10930282B2 (en) 2015-03-08 2021-02-23 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US11468282B2 (en) 2015-05-15 2022-10-11 Apple Inc. Virtual assistant in a communication session
US11127397B2 (en) 2015-05-27 2021-09-21 Apple Inc. Device voice control
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10681212B2 (en) 2015-06-05 2020-06-09 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11010127B2 (en) 2015-06-29 2021-05-18 Apple Inc. Virtual assistant for media playback
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10354652B2 (en) 2015-12-02 2019-07-16 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10942703B2 (en) 2015-12-23 2021-03-09 Apple Inc. Proactive assistance based on dialog communication between devices
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10049663B2 (en) 2016-06-08 2018-08-14 Apple Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10942702B2 (en) 2016-06-11 2021-03-09 Apple Inc. Intelligent device arbitration and control
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10580409B2 (en) 2016-06-11 2020-03-03 Apple Inc. Application integration with a digital assistant
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10698951B2 (en) * 2016-07-29 2020-06-30 Booktrack Holdings Limited Systems and methods for automatic-creation of soundtracks for speech audio
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US11656884B2 (en) 2017-01-09 2023-05-23 Apple Inc. Application integration with a digital assistant
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US10332518B2 (en) 2017-05-09 2019-06-25 Apple Inc. User interface for correcting recognition errors
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
US10741181B2 (en) 2017-05-09 2020-08-11 Apple Inc. User interface for correcting recognition errors
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US10847142B2 (en) 2017-05-11 2020-11-24 Apple Inc. Maintaining privacy of personal information
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10789945B2 (en) 2017-05-12 2020-09-29 Apple Inc. Low-latency intelligent automated assistant
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10748546B2 (en) 2017-05-16 2020-08-18 Apple Inc. Digital assistant services based on device capabilities
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US10909171B2 (en) 2017-05-16 2021-02-02 Apple Inc. Intelligent automated assistant for media exploration
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10671251B2 (en) 2017-12-22 2020-06-02 Arbordale Publishing, LLC Interactive eReader interface generation based on synchronization of textual and audial descriptors
US11657725B2 (en) 2017-12-22 2023-05-23 Fathom Technologies, LLC E-reader interface system with audio and highlighting synchronization for digital books
US11443646B2 (en) * 2017-12-22 2022-09-13 Fathom Technologies, LLC E-Reader interface system with audio and highlighting synchronization for digital books
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US11495218B2 (en) 2018-06-01 2022-11-08 Apple Inc. Virtual assistant operation in multi-device environments
US10684703B2 (en) 2018-06-01 2020-06-16 Apple Inc. Attention aware virtual assistant dismissal
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
US10984798B2 (en) 2018-06-01 2021-04-20 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10720160B2 (en) 2018-06-01 2020-07-21 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10403283B1 (en) 2018-06-01 2019-09-03 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US10944859B2 (en) 2018-06-03 2021-03-09 Apple Inc. Accelerated task performance
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US10504518B1 (en) 2018-06-03 2019-12-10 Apple Inc. Accelerated task performance
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11217251B2 (en) 2019-05-06 2022-01-04 Apple Inc. Spoken notifications
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11360739B2 (en) 2019-05-31 2022-06-14 Apple Inc. User activity shortcut suggestions
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
US11064244B2 (en) 2019-12-13 2021-07-13 Bank Of America Corporation Synchronizing text-to-audio with interactive videos in the video framework
US10805665B1 (en) 2019-12-13 2020-10-13 Bank Of America Corporation Synchronizing text-to-audio with interactive videos in the video framework
US11350185B2 (en) 2019-12-13 2022-05-31 Bank Of America Corporation Text-to-audio for interactive videos using a markup language
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence

Also Published As

Publication number Publication date
US8364488B2 (en) 2013-01-29
US20100299149A1 (en) 2010-11-25
US20100324904A1 (en) 2010-12-23
US8498866B2 (en) 2013-07-30
US20100318364A1 (en) 2010-12-16
US8359202B2 (en) 2013-01-22
US8954328B2 (en) 2015-02-10
US8498867B2 (en) 2013-07-30
US20100324903A1 (en) 2010-12-23
US8352269B2 (en) 2013-01-08
US20100318363A1 (en) 2010-12-16
US20100324905A1 (en) 2010-12-23

Similar Documents

Publication Publication Date Title
US20190196666A1 (en) Systems and Methods Document Narration
US8793133B2 (en) Systems and methods document narration
US8498867B2 (en) Systems and methods for selection and use of multiple characters for document narration
US9478219B2 (en) Audio synchronization for document narration with user-selected playback
US9330657B2 (en) Text-to-speech for digital literature
US6181351B1 (en) Synchronizing the moveable mouths of animated characters with recorded speech
US20080027726A1 (en) Text to audio mapping, and animation of the text
JP2003295882A (en) Text structure for speech synthesis, speech synthesizing method, speech synthesizer and computer program therefor
KR20220165666A (en) Method and system for generating synthesis voice using style tag represented by natural language
US20190088258A1 (en) Voice recognition device, voice recognition method, and computer program product
JP3936351B2 (en) Voice response service equipment
JP2009020264A (en) Voice synthesis device and voice synthesis method, and program
WO2010083354A1 (en) Systems and methods for multiple voice document narration
US20230377607A1 (en) Methods for dubbing audio-video media files
KR102585031B1 (en) Real-time foreign language pronunciation evaluation system and method
JP6957069B1 (en) Learning support system
JP3760420B2 (en) Voice response service equipment
Székely et al. Off the cuff: Exploring extemporaneous speech delivery with TTS
CN117475991A (en) Method and device for converting text into audio and computer equipment

Legal Events

Date Code Title Description
AS Assignment

Owner name: K-NFB READING TECHNOLOGY, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KURZWEIL, RAYMOND C.;ALBRECHT, PAUL;CHAPMAN, PETER;AND OTHERS;SIGNING DATES FROM 20100329 TO 20100819;REEL/FRAME:024921/0307

AS Assignment

Owner name: K-NFB READING TECHNOLOGY, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:K-NFB HOLDING TECHNOLOGY, INC.;REEL/FRAME:030059/0351

Effective date: 20130315

Owner name: K-NFB HOLDING TECHNOLOGY, INC., MASSACHUSETTS

Free format text: CHANGE OF NAME;ASSIGNOR:K-NFB READING TECHNOLOGY, INC.;REEL/FRAME:030058/0669

Effective date: 20130315

AS Assignment

Owner name: FISH & RICHARDSON P.C., MINNESOTA

Free format text: LIEN;ASSIGNOR:K-NFB HOLDING TECHNOLOGY, INC.;REEL/FRAME:034599/0860

Effective date: 20141230

AS Assignment

Owner name: DIMENSIONAL STACK ASSETS LLC, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:K-NFB READING TECHNOLOGY, INC.;REEL/FRAME:035546/0205

Effective date: 20150302

AS Assignment

Owner name: EM ACQUISITION CORP., INC., NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:DIMENSIONAL STACK ASSETS, LLC;REEL/FRAME:036593/0328

Effective date: 20150910

Owner name: DIMENSIONAL STACK ASSETS LLC, NEW YORK

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:FISH & RICHARDSON P.C.;REEL/FRAME:036629/0762

Effective date: 20150830

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION