CN109166581A - Audio recognition method, device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN109166581A (application number CN201811126926.XA)
- Authority
- CN
- China
- Prior art keywords
- speech recognition
- voice
- voice messaging
- result
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0623—Item investigation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Finance (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Accounting & Taxation (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Telephonic Communication Services (AREA)
Abstract
Embodiments of the present invention provide a speech recognition method, apparatus, electronic device, and computer-readable storage medium, applied in the technical field of speech recognition. The method comprises: for a voice interaction process between a user and a terminal device, obtaining the interaction information corresponding to the voice interaction process; establishing a speech recognition corpus according to the interaction information; and, when first voice information input by the user based on the interaction information is received, performing speech recognition on the first voice information using the speech recognition corpus to determine a first recognition result. The embodiments restrict the recognition result of the voice information to be recognized to the scope of the speech recognition corpus established from the interaction information. This narrows the range of recognition results to which the voice information may correspond, which can improve the accuracy of speech recognition and, in turn, the user experience.
Description
Technical field
Embodiments of the present invention relate to the technical field of speech recognition, and in particular to a speech recognition method, apparatus, electronic device, and computer-readable storage medium.
Background art
As speech recognition technology has developed, it has entered a stage of application in broader fields. Providing users with services such as retrieval and navigation through voice input has become a research hotspot, and solving the recognition problems caused by the ambiguity of speech has become the key to giving users a better service experience.
Currently, speech recognition of a segment of speech relies on a trained acoustic model and a trained language model. The trained language model determines the probability of the correspondence between the voice information to be recognized and each of several candidate text words, and the text word with the highest probability is taken as the speech recognition result. For example, when speech recognition is performed on the voice information corresponding to "yiyibushe", the language model determines that the text word "with reluctance" has the highest probability, so the voice information is recognized as "with reluctance". However, if the user utters "yiyibushe" because the user wants to go to the "clothes are not given up" clothes shop, whose name has the same pronunciation, existing speech recognition technology still recognizes the voice information as the text word "with reluctance", and this recognition result does not match what the user intended. In the course of implementation, the inventor found that the ambiguity of speech leads to low speech recognition accuracy.
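The ambiguity described above can be illustrated with a minimal sketch. The probabilities below are invented for illustration (the source gives no numbers), and `recognize_by_max_probability` is a hypothetical helper name, not part of any real recognizer.

```python
# Hypothetical language-model probabilities for the syllables "yiyibushe".
# The values are invented for illustration; the source gives no numbers.
candidate_probabilities = {
    "with reluctance": 0.92,           # everyday phrase
    "clothes are not given up": 0.08,  # name of a clothes shop
}


def recognize_by_max_probability(probabilities):
    """Return the text word whose probability is highest."""
    return max(probabilities, key=probabilities.get)


result = recognize_by_max_probability(candidate_probabilities)
print(result)
```

With these assumed scores the everyday phrase always wins, even when the user means the shop, which is the mismatch the background section describes.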
Summary of the invention
In view of this, embodiments of the present invention provide a speech recognition method, apparatus, electronic device, and computer-readable storage medium that improve the accuracy of speech recognition and thereby the user experience.
To solve the above problem, embodiments of the present invention mainly provide the following technical solutions:
In a first aspect, a speech recognition method based on interaction information is provided, the method comprising:
for a voice interaction process between a user and a terminal device, obtaining the interaction information corresponding to the voice interaction process;
establishing a speech recognition corpus according to the interaction information;
when first voice information input by the user based on the interaction information is received, performing speech recognition on the first voice information using the speech recognition corpus, and determining a first recognition result.
In a second aspect, a speech recognition apparatus based on interaction information is provided, the apparatus comprising:
an obtaining module, configured to obtain, for a voice interaction process between a user and a terminal device, the interaction information corresponding to the voice interaction process;
an establishing module, configured to establish a speech recognition corpus according to the interaction information obtained by the obtaining module;
a receiving module, configured to receive first voice information input by the user based on the interaction information obtained by the obtaining module;
a recognition module, configured to perform speech recognition on the first voice information received by the receiving module, using the speech recognition corpus established by the establishing module, and to determine a first recognition result.
In a third aspect, an electronic device is provided, the electronic device comprising:
a processor, a memory, a communication interface, and a bus;
wherein the processor, the memory, and the communication interface communicate with one another via the bus;
the communication interface is used for information transmission between the electronic device and the communication devices of related equipment;
the processor is configured to call program instructions in the memory to execute the speech recognition method based on interaction information described in the first aspect.
In a fourth aspect, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the speech recognition method based on interaction information described in the first aspect.
Through the above technical solutions, the embodiments of the present invention have at least the following advantages:
Embodiments of the present invention provide a speech recognition method, apparatus, electronic device, and computer-readable storage medium. In the prior art, the text word with the highest probability for the voice information to be recognized is taken as the speech recognition result. By contrast, an embodiment of the present invention obtains, for a voice interaction process between a user and a terminal device, the interaction information corresponding to that process, establishes a speech recognition corpus according to the interaction information, and, when first voice information input by the user based on the interaction information is received, performs speech recognition on the first voice information using the speech recognition corpus to determine a first recognition result. The recognition result of the voice information to be recognized is thus restricted to the scope of the speech recognition corpus established from the interaction information. This narrows the range of recognition results to which the voice information may correspond, which can improve the accuracy of speech recognition and, in turn, the user experience.
The above description is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features, and advantages of the embodiments become more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the embodiments of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a schematic flowchart of a speech recognition method based on interaction information provided by an embodiment of the present invention;
Fig. 2 shows a schematic structural diagram of a speech recognition apparatus based on interaction information provided by an embodiment of the present invention;
Fig. 3 shows a schematic structural diagram of another speech recognition apparatus based on interaction information provided by an embodiment of the present invention;
Fig. 4 shows a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
An embodiment of the present invention provides a speech recognition method based on interaction information. As shown in Fig. 1, the method includes:
Step S101: for a voice interaction process between a user and a terminal device, obtain the interaction information corresponding to the voice interaction process.
In this embodiment, the user can interact with intelligent terminal devices such as tablet computers, smartphones, palmtop computers, wearable devices, mobile internet devices (MID), and in-vehicle navigation systems, either by voice input or by touch or key input on the screen or keyboard of the terminal device. Corresponding interaction information is generated while the user interacts with such a terminal device, and the terminal device interacting with the user obtains the interaction information generated during the interaction.
Step S102: establish a speech recognition corpus according to the interaction information.
In this embodiment, after obtaining the interaction information, the terminal device interacting with the user stores it to establish a speech recognition corpus. Alternatively, the obtained interaction information may first undergo conversion processing and then be stored to establish the speech recognition corpus.
Step S103: when first voice information input by the user based on the interaction information is received, perform speech recognition on the first voice information using the speech recognition corpus and determine a first recognition result.
In this embodiment, while interacting with the terminal device the user obtains the interaction information that the terminal device feeds back. Based on this feedback, the user can decide on a further operation as needed, such as issuing a voice command to the terminal device. After receiving the voice command, the terminal device recognizes it using the established speech recognition corpus to obtain the corresponding speech recognition result.
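Steps S101 to S103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the recognizer is replaced by invented candidate scores, and filtering the candidates by corpus membership is only one plausible way of restricting the result to the corpus.

```python
def obtain_interaction_info(interaction_log):
    """Step S101: collect the information generated during the interaction."""
    return [entry for entry in interaction_log if entry]


def establish_corpus(interaction_info):
    """Step S102: store the interaction information as the corpus."""
    return set(interaction_info)


def recognize_with_corpus(candidate_scores, corpus):
    """Step S103: pick the best-scoring candidate that is in the corpus.

    Returning None when nothing matches is an assumption of this sketch.
    """
    allowed = {text: score for text, score in candidate_scores.items()
               if text in corpus}
    if not allowed:
        return None
    return max(allowed, key=allowed.get)


# Hypothetical feedback shown to the user earlier in the interaction.
log = ["clothes are not given up", "Example Tailor", ""]
corpus = establish_corpus(obtain_interaction_info(log))

# Hypothetical recognizer scores for the first voice information.
scores = {"with reluctance": 0.92, "clothes are not given up": 0.08}
first_result = recognize_with_corpus(scores, corpus)
print(first_result)
```

Because "with reluctance" is not in the corpus, the lower-scoring shop name is selected, which mirrors the narrowing effect the embodiment describes.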
An embodiment of the present invention provides a speech recognition method based on interaction information. In the prior art, the text word with the highest probability for the voice information to be recognized is taken as the speech recognition result. By contrast, the embodiment obtains, for a voice interaction process between a user and a terminal device, the interaction information corresponding to that process, establishes a speech recognition corpus according to the interaction information, and, when first voice information input by the user based on the interaction information is received, performs speech recognition on the first voice information using the speech recognition corpus to determine a first recognition result. The recognition result of the voice information to be recognized is thus restricted to the scope of the speech recognition corpus established from the interaction information. This narrows the range of recognition results to which the voice information may correspond, which can improve the accuracy of speech recognition and, in turn, the user experience.
An embodiment of the present invention provides another possible implementation, in which step S101 includes:
Step S1011 (not shown in the figures): when second voice information input by the user is received, perform speech recognition on the second voice information, determine a second recognition result, and take the second recognition result as the interaction information;
In this embodiment, the user can send a voice command to a terminal device, such as a navigation device, according to the user's own needs. When the terminal device receives the voice command, it decodes the command using speech recognition technology to obtain the corresponding text result. The terminal device may also execute a corresponding operation according to the decoded information and thereby obtain operation result information. The terminal device takes this result information as the interaction information between the user and the terminal device.
For example, user A wants to buy clothes near the current location and issues the corresponding voice command to a mobile phone. After obtaining the voice command, the phone decodes it using speech recognition technology and obtains the speech recognition result "nearby clothes shop". The phone performs positioning to determine its current geographic location, executes a retrieval operation that combines the speech recognition result with the current geographic location, and determines the corresponding operation result information. The phone takes this operation result information as the interaction information of the interaction between user A and the phone.
After the second recognition result is determined, the method further includes:
Step S104 (not shown in the figures): display the second recognition result;
Continuing the example, the phone can present the operation result information obtained from the retrieval operation to user A on the screen or by voice.
The first voice information input by the user based on the interaction information in step S103 includes:
first voice information input by the user based on the second recognition result.
Continuing the example, after learning the result information that the phone presents on the screen or by voice prompt, user A can send a corresponding voice command according to user A's own needs.
Step S102 includes:
Step S1021 (not shown in the figures): establish the speech recognition corpus according to the second speech recognition result.
Continuing the example, the phone stores the operation result information obtained from the retrieval operation to obtain the speech recognition corpus. It may also apply conversion processing to the operation result information before storing it, to obtain the speech recognition corpus.
In this embodiment, speech recognition is performed on the second voice information input by the user to obtain the second recognition result, the second recognition result is displayed to the user, and the speech recognition corpus is established according to the second speech recognition result. The established speech recognition corpus matches the interaction information generated during the preceding human-computer interaction, which provides a reliable basis for subsequently performing speech recognition with the corpus and narrowing the range of speech recognition results.
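Steps S104 and S1021 can be sketched as follows. The helper names, the optional `convert` parameter standing in for the conversion processing, and the sample results are assumptions made for illustration; the source does not specify how results are displayed or converted.

```python
def display(second_recognition_result):
    """Step S104: format the results for the screen (or for speech output)."""
    return "\n".join(f"{number}. {item}"
                     for number, item in enumerate(second_recognition_result, 1))


def establish_corpus_from_results(second_recognition_result, convert=None):
    """Step S1021: store the results, optionally after conversion processing."""
    if convert is None:
        return set(second_recognition_result)
    return {convert(item) for item in second_recognition_result}


# Hypothetical operation results returned by the first interaction round.
results = ["clothes are not given up", "Example Tailor"]

shown = display(results)
corpus = establish_corpus_from_results(results, convert=str.lower)
print(shown)
```

Lower-casing is used here only as a stand-in conversion; a real system might instead convert each entry to a syllable sequence.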
An embodiment of the present invention provides another possible implementation. When the interaction process in step S101 is a voice search interaction process, performing speech recognition on the second voice information in step S1011 and determining the second recognition result includes:
Step A (not shown in the figures): perform speech recognition on the second voice information to obtain a second speech recognition result.
In this embodiment, the user can send a voice command to a terminal device, such as a navigation device, according to the user's own needs. When the terminal device receives the voice command, it decodes the command using speech recognition technology to obtain a speech recognition result in the form of text words or syllables.
Step B (not shown in the figures): search a general search library according to the second speech recognition result, determine the search result corresponding to the second speech recognition result, and take that search result as the second recognition result.
In this embodiment, the terminal device searches through a third-party search engine or another retrieval interface according to the obtained speech recognition result in text-word or syllable form and obtains the corresponding search result. This search result serves as the recognition result of the user's voice command.
For example, user A wants to go to a nearby clothes shop to buy clothes and issues the corresponding voice command to the phone. The phone recognizes the voice command and obtains the speech recognition result "nearby clothes shop". The phone then performs a positioning operation to determine the current geographic location and searches through a predetermined search engine, obtaining multiple clothes shops near user A's current location, including the "clothes are not given up" clothes shop. The information on these clothes shops is the recognition result of the voice command.
In this embodiment, the user's second voice information is recognized and a retrieval operation is performed according to the resulting second speech recognition result to obtain the second recognition result of the second voice information, so that relevant retrieval results matching the user's needs can be provided in response to the user's voice command.
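Step A and Step B can be sketched as follows. The in-memory dictionary is a hypothetical stand-in for the general search library (the source mentions a third-party search engine but names no API), all shop data other than "clothes are not given up" is invented, and speech decoding is stubbed out by passing text directly.

```python
# Hypothetical stand-in for a general search library:
# shop name -> (category, distance from the user in km). All values invented.
GENERAL_SEARCH_LIBRARY = {
    "clothes are not given up": ("clothes shop", 0.4),
    "Example Tailor": ("clothes shop", 0.9),
    "Example Bakery": ("bakery", 0.2),
    "Far Away Fashion": ("clothes shop", 12.0),
}


def step_a(second_voice_information):
    """Step A: decode speech to text. Stubbed: the input is already text."""
    return second_voice_information.strip().lower()


def step_b(speech_result, library, max_km=1.0):
    """Step B: search the library; the matches are the second recognition result."""
    return [name for name, (category, km) in library.items()
            if category in speech_result and km <= max_km]


second_speech_result = step_a("Nearby clothes shop")
second_recognition_result = step_b(second_speech_result, GENERAL_SEARCH_LIBRARY)
print(second_recognition_result)
```

The `max_km` cut-off plays the role of combining the recognized text with the current geographic location; the actual ranking or distance logic is not described in the source.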
An embodiment of the present invention provides another possible implementation, in which step S103 includes:
Step S1031 (not shown in the figures): perform speech recognition on the first voice information to obtain a first speech recognition result.
Step S1032 (not shown in the figures): search the speech recognition corpus for the first speech recognition result, obtain the search result corresponding to the first speech recognition result, and take that search result as the first recognition result.
The first speech recognition result, and the second speech recognition result in step A, may be syllable sequences;
the speech recognition corpus may be a syllable sequence corpus.
In this embodiment, after the first voice information issued by the user according to the interaction information is received, it is decoded using speech recognition technology to obtain the first speech recognition result. The speech recognition corpus is then searched using the first speech recognition result to obtain the corresponding search result, which is taken as the first recognition result of the user's first voice information.
For example, user A has previously carried out a voice search interaction with the phone and obtained multiple nearby clothes shops, including the "clothes are not given up" clothes shop. The phone adds the information on these clothes shops to the speech recognition corpus. Based on the information obtained, the user chooses to go to the "clothes are not given up" clothes shop and issues the voice command corresponding to "yiyibushe". After receiving the command, the phone decodes it using speech recognition technology and obtains the syllable sequence "yiyibushe". Then, according to the index relations between syllable sequences and text words stored in the speech recognition corpus, it determines that an index relation exists between the syllable sequence "yiyibushe" and "clothes are not given up", and therefore determines that the recognition result of the "yiyibushe" voice command is "clothes are not given up".
In this embodiment, the established speech recognition corpus restricts the recognition result of the first voice information, which the user issues according to the interaction information, to the scope of the speech recognition corpus. This narrows the range of first recognition results to which the voice information may correspond and thereby improves the accuracy of speech recognition.
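The syllable-sequence lookup of step S1032 can be sketched as a dictionary index from syllable sequences to text words. The function names, the normalization rule, and the second corpus entry are assumptions made for illustration; only the "yiyibushe" pairing comes from the example above.

```python
def normalize(syllables):
    """Collapse spacing and case so 'Yi yi bu she' and 'yiyibushe' match."""
    return "".join(syllables.lower().split())


def build_syllable_index(entries):
    """Map each syllable sequence to its text word (the index relation)."""
    return {normalize(syllables): text for syllables, text in entries}


def step_s1032(first_speech_result, index):
    """Step S1032: look the syllable sequence up in the corpus index."""
    return index.get(normalize(first_speech_result))


# Corpus built from the earlier search results. The second entry is invented.
index = build_syllable_index([
    ("yi yi bu she", "clothes are not given up"),
    ("shi li cai feng", "Example Tailor"),
])

first_recognition_result = step_s1032("yiyibushe", index)
print(first_recognition_result)
```

Since the index only contains entries from the preceding interaction, the everyday phrase "with reluctance" cannot be returned for "yiyibushe" in this sketch; a sequence with no entry yields `None`.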
An embodiment of the present invention provides a speech recognition apparatus based on interaction information. As shown in Fig. 2, the speech recognition apparatus 20 may include an obtaining module 201, an establishing module 202, a receiving module 203, and a recognition module 204, wherein:
The obtaining module 201 is configured to obtain, for a voice interaction process between a user and a terminal device, the interaction information corresponding to the voice interaction process.
The establishing module 202 is configured to establish a speech recognition corpus according to the interaction information obtained by the obtaining module 201.
The receiving module 203 is configured to receive first voice information input by the user based on the interaction information obtained by the obtaining module 201.
The recognition module 204 is configured to perform speech recognition on the first voice information received by the receiving module 203, using the speech recognition corpus established by the establishing module 202, and to determine a first recognition result.
The speech recognition apparatus based on interaction information of this embodiment can execute the speech recognition method based on interaction information provided by the above embodiment of the present invention. Its implementation principle is similar and is not repeated here.
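The division into modules 201 to 204 can be sketched as one class whose methods mirror the modules. This is an illustrative arrangement only: the class and method names are hypothetical, the first voice information is represented by invented candidate scores, and the corpus filter is an assumed way of restricting the result.

```python
class SpeechRecognitionApparatus:
    """Sketch of apparatus 20; method names mirror modules 201-204."""

    def __init__(self):
        self.interaction_info = []
        self.corpus = set()
        self.first_voice_information = None

    def obtain(self, interaction_log):
        """Obtaining module 201: collect the interaction information."""
        self.interaction_info = list(interaction_log)

    def establish(self):
        """Establishing module 202: build the corpus from that information."""
        self.corpus = set(self.interaction_info)

    def receive(self, candidate_scores):
        """Receiving module 203: accept the first voice information.

        The voice information is represented by hypothetical candidate scores.
        """
        self.first_voice_information = dict(candidate_scores)

    def recognize(self):
        """Recognition module 204: best candidate that lies in the corpus."""
        allowed = {text: score
                   for text, score in self.first_voice_information.items()
                   if text in self.corpus}
        return max(allowed, key=allowed.get) if allowed else None


apparatus = SpeechRecognitionApparatus()
apparatus.obtain(["clothes are not given up", "Example Tailor"])
apparatus.establish()
apparatus.receive({"with reluctance": 0.92, "clothes are not given up": 0.08})
first_recognition_result = apparatus.recognize()
print(first_recognition_result)
```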
An embodiment of the present invention provides a speech recognition apparatus based on interaction information. In the prior art, the text word with the highest probability for the voice information to be recognized is taken as the speech recognition result. By contrast, the embodiment obtains, for a voice interaction process between a user and a terminal device, the interaction information corresponding to that process, establishes a speech recognition corpus according to the interaction information, and, when first voice information input by the user based on the interaction information is received, performs speech recognition on the first voice information using the speech recognition corpus to determine a first recognition result. The recognition result of the voice information to be recognized is thus restricted to the scope of the speech recognition corpus established from the interaction information. This narrows the range of recognition results to which the voice information may correspond, which can improve the accuracy of speech recognition and, in turn, the user experience.
An embodiment of the present invention provides another speech recognition apparatus based on interaction information. As shown in Fig. 3, the apparatus of this embodiment may include an obtaining module 301, an establishing module 302, a receiving module 303, and a recognition module 304, wherein:
The obtaining module 301 is configured to obtain, for a voice interaction process between a user and a terminal device, the interaction information corresponding to the voice interaction process.
The function of the obtaining module 301 in Fig. 3 is the same as or similar to that of the obtaining module 201 in Fig. 2.
The establishing module 302 is configured to establish a speech recognition corpus according to the interaction information obtained by the obtaining module 301.
The function of the establishing module 302 in Fig. 3 is the same as or similar to that of the establishing module 202 in Fig. 2.
The receiving module 303 is configured to receive first voice information input by the user based on the interaction information obtained by the obtaining module 301.
The function of the receiving module 303 in Fig. 3 is the same as or similar to that of the receiving module 203 in Fig. 2.
The recognition module 304 is configured to perform speech recognition on the first voice information received by the receiving module 303, using the speech recognition corpus established by the establishing module 302, and to determine a first recognition result.
The function of the recognition module 304 in Fig. 3 is the same as or similar to that of the recognition module 204 in Fig. 2.
Specifically, receiving module 303, specifically for receiving the second voice messaging of user's input;
Identification module 304 carries out speech recognition specifically for the second voice messaging for receiving to receiving module, determines the
Two recognition results;
Module 301 is obtained, specifically for using the second determining recognition result of identification module 304 as interactive information;
The device further include: display module 305;
Display module 305, for the second recognition result to be shown;
Wherein, the first voice messaging that user is inputted based on interactive information, comprising:
The first voice messaging that user is inputted based on the second recognition result.
Specifically, module 302 is established, is established specifically for the second speech recognition result determined according to identification module 304
Speech recognition corpus.
The embodiment of the present invention performs speech recognition on the second voice information input by the user to obtain the second recognition result, and displays the second recognition result to the user. At the same time, it establishes the speech recognition corpus according to the second speech recognition result, so that the established corpus matches the interactive information generated during the preceding human-computer interaction. This provides a reliable basis for subsequently performing speech recognition based on the speech recognition corpus and for narrowing the range of speech recognition results.
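The corpus-building step described above can be illustrated with a minimal Python sketch. It assumes that the second recognition result is available as a list of displayed text entries; the names `SpeechRecognitionCorpus` and `build_corpus` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechRecognitionCorpus:
    """Hypothetical container for entries a follow-up utterance may match."""

    entries: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        normalized = text.strip()
        # Skip blanks and duplicates so the candidate range stays small.
        if normalized and normalized not in self.entries:
            self.entries.append(normalized)


def build_corpus(second_recognition_result: list[str]) -> SpeechRecognitionCorpus:
    """Build a corpus from the results that were shown to the user (assumed input)."""
    corpus = SpeechRecognitionCorpus()
    for item in second_recognition_result:
        corpus.add(item)
    return corpus


# Example: entries displayed after the user's first query.
displayed = ["Hotel Aurora", "Hotel Borealis", "Hotel Aurora", "  "]
corpus = build_corpus(displayed)
print(corpus.entries)
```

Because the corpus only contains what was displayed, any later recognition step that consults it has a much smaller candidate set than a general recognizer would.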
Specifically, when the voice interaction process is a voice search interaction process,
Identification module 304 includes recognition unit 3041 and search unit 3042;
Recognition unit 3041, configured to perform speech recognition on the second voice information to obtain a second speech recognition result;
Search unit 3042, specifically configured to search a general search library according to the second speech recognition result obtained by recognition unit 3041, determine the search result corresponding to the second speech recognition result, and use that search result as the second recognition result.
In the embodiment of the present invention, the second voice information of the user is recognized, and a corresponding search operation is performed according to the resulting second speech recognition result to obtain the second recognition result of the second voice information, so that retrieval results matching the user's needs can be provided according to the user's voice command.
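The two-stage flow of recognition unit 3041 and search unit 3042 can be sketched as follows. This is only an illustration under stated assumptions: the recognizer is a stand-in callable, the general search library is modeled as a plain list of strings, and matching is a simple substring test. None of these choices is specified by the application.

```python
from typing import Callable


def determine_second_recognition_result(
    second_voice_info: bytes,
    recognize: Callable[[bytes], str],
    general_search_library: list[str],
) -> list[str]:
    """Recognize the utterance, then search the library with the recognized text."""
    # Step 1 (recognition unit): audio -> second speech recognition result.
    second_speech_result = recognize(second_voice_info)
    query = second_speech_result.strip().lower()
    if not query:
        return []
    # Step 2 (search unit): keep library entries that contain the query.
    return [entry for entry in general_search_library if query in entry.lower()]


def fake_recognizer(audio: bytes) -> str:
    """Stand-in recognizer (assumption): treats the bytes as UTF-8 text."""
    return audio.decode("utf-8")


library = ["Jazz Cafe Downtown", "Riverside Jazz Club", "City Museum"]
results = determine_second_recognition_result(b"jazz", fake_recognizer, library)
print(results)
```

The returned list plays the role of the second recognition result, which is what would be displayed to the user.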
Specifically, recognition unit 3041 is configured to perform speech recognition on the first voice information to obtain a first speech recognition result;
Search unit 3042 searches the speech recognition corpus for the first speech recognition result, obtains the search result corresponding to the first speech recognition result, and uses that search result as the first recognition result.
Wherein, the first speech recognition result and the second speech recognition result are syllable sequences,
and the speech recognition corpus established according to the second speech recognition result is a syllable sequence corpus.
In the embodiment of the present invention, the established speech recognition corpus limits the recognition result of the first voice information, which the user utters according to the interactive information, to the range of the speech recognition corpus. This narrows the range of result information that the voice information may correspond to and thereby improves the accuracy of speech recognition.
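As an illustration of searching a syllable sequence corpus, the following Python sketch matches a recognized syllable sequence against corpus entries by edit distance. The application does not specify a matching criterion, so edit distance, the `max_distance` threshold, and the names `edit_distance` and `search_corpus` are assumptions made for this example.

```python
from typing import Mapping, Optional, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two syllable sequences."""
    previous = list(range(len(b) + 1))
    for i, syllable_a in enumerate(a, start=1):
        current = [i]
        for j, syllable_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (syllable_a != syllable_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def search_corpus(
    first_speech_result: Sequence[str],
    syllable_corpus: Mapping[str, Sequence[str]],
    max_distance: int = 1,
) -> Optional[str]:
    """Return the corpus entry closest to the recognized syllables, if close enough."""
    best_entry, best_distance = None, max_distance + 1
    for entry, syllables in syllable_corpus.items():
        distance = edit_distance(first_speech_result, syllables)
        if distance < best_distance:
            best_entry, best_distance = entry, distance
    return best_entry


# Hypothetical corpus built from two displayed results.
syllable_corpus = {
    "Beijing Hotel": ("bei", "jing", "fan", "dian"),
    "Nanjing Hotel": ("nan", "jing", "fan", "dian"),
}
# A slightly misrecognized follow-up utterance still resolves inside the corpus.
print(search_corpus(("bei", "jin", "fan", "dian"), syllable_corpus))
```

Restricting the search to the corpus means a near miss at the syllable level is resolved to one of the few displayed entries rather than to an unrelated word from a general vocabulary.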
The speech recognition device based on interactive information of this embodiment can execute the speech recognition method based on interactive information provided in the above embodiments of the present invention. The implementation principle is similar and is not described again here.
The embodiment of the present invention provides a speech recognition device based on interactive information. In the prior art, the text with the highest probability of corresponding to the voice information to be recognized is determined as the speech recognition result. By contrast, the embodiment of the present invention obtains, for a voice interaction process between a user and a terminal device, the interactive information corresponding to that process, then establishes a speech recognition corpus according to the interactive information, and, upon receiving first voice information that the user inputs based on the interactive information, performs speech recognition on the first voice information by means of the speech recognition corpus to determine a first recognition result. That is, in the embodiment of the present invention the speech recognition result for the voice information to be recognized is limited to the range of the speech recognition corpus established according to the interactive information. This narrows the range of recognition results that the voice information to be recognized may correspond to, which can improve the speech recognition accuracy for that voice information and thereby improve the user experience.
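The overall flow of the device (obtain interactive information, establish the corpus, receive the first voice information, recognize it within the corpus) can be summarized in a short Python sketch. The class and method names are hypothetical, the recognizer is a stand-in callable, and exact case-insensitive matching is an assumption chosen to keep the example small.

```python
from typing import Callable, Optional


class InteractionSpeechRecognizer:
    """Hypothetical wiring of the obtaining, establishing and identification steps."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe  # stand-in for a real speech recognizer
        self._corpus: set[str] = set()

    def obtain_and_establish(self, interactive_info: list[str]) -> None:
        # The corpus is rebuilt from the interactive information of this session.
        self._corpus = {item.strip().lower() for item in interactive_info if item.strip()}

    def identify(self, first_voice_info: bytes) -> Optional[str]:
        # Only hypotheses inside the corpus are accepted as the first recognition result.
        hypothesis = self._transcribe(first_voice_info).strip().lower()
        return hypothesis if hypothesis in self._corpus else None


recognizer = InteractionSpeechRecognizer(lambda audio: audio.decode("utf-8"))
recognizer.obtain_and_establish(["Play", "Pause"])
print(recognizer.identify(b"play"))
print(recognizer.identify(b"stop"))
```

In this sketch a hypothesis outside the corpus yields `None`; a real system would need some fallback policy, which the application does not describe.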
The embodiment of the present invention provides an electronic device. As shown in Fig. 4, the electronic device 40 includes: processor 41, memory 42, communication interface 43 and bus 44;
Wherein, processor 41, memory 42 and communication interface 43 communicate with one another through bus 44;
Communication interface 43 is used for information transmission between the electronic device 40 and communication devices of related equipment;
Processor 41 is used to call the program instructions in memory 42 to implement the functions of the obtaining module, the establishing module, the receiving module and the identification module shown in Fig. 2 or Fig. 3, and of the display module 305 shown in Fig. 3.
Processor 41 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure. Processor 41 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 44 may include a path for transferring information between the above components. Bus 44 may be a PCI bus, an EISA bus, or the like. Bus 44 may be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is shown in Fig. 4, but this does not mean that there is only one bus or one type of bus.
Memory 42 may be a ROM or another type of static storage device that can store static information and instructions, a RAM or another type of dynamic storage device that can store information and instructions, or an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
Specifically, memory 42 is used to store the application program code for executing the solution of the present application, and execution is controlled by processor 41. Processor 41 is used to execute the application program code stored in memory 42 to implement the actions of the speech recognition device based on interactive information provided by the embodiment shown in Fig. 2 or Fig. 3.
The electronic device provided by the embodiment of the present invention is applicable to the above method embodiments, and details are not described again here.
The embodiment of the present invention provides an electronic device. In the prior art, the text with the highest probability of corresponding to the voice information to be recognized is determined as the speech recognition result. By contrast, the embodiment of the present invention obtains, for a voice interaction process between a user and a terminal device, the interactive information corresponding to that process, then establishes a speech recognition corpus according to the interactive information, and, upon receiving first voice information that the user inputs based on the interactive information, performs speech recognition on the first voice information by means of the speech recognition corpus to determine a first recognition result. That is, the speech recognition result for the voice information to be recognized is limited to the range of the speech recognition corpus established according to the interactive information. This narrows the range of recognition results that the voice information to be recognized may correspond to, which can improve the speech recognition accuracy for that voice information and thereby improve the user experience.
The embodiment of the present invention provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the speech recognition method based on interactive information of any one of the above embodiments.
The non-transitory computer-readable storage medium provided by the embodiment of the present invention is applicable to the above method embodiments, and details are not described again here.
The embodiment of the present invention provides a non-transitory computer-readable storage medium. In the prior art, the text with the highest probability of corresponding to the voice information to be recognized is determined as the speech recognition result. By contrast, the embodiment of the present invention obtains, for a voice interaction process between a user and a terminal device, the interactive information corresponding to that process, then establishes a speech recognition corpus according to the interactive information, and, upon receiving first voice information that the user inputs based on the interactive information, performs speech recognition on the first voice information by means of the speech recognition corpus to determine a first recognition result. That is, the speech recognition result for the voice information to be recognized is limited to the range of the speech recognition corpus established according to the interactive information. This narrows the range of recognition results that the voice information to be recognized may correspond to, which can improve the speech recognition accuracy for that voice information and thereby improve the user experience.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPU), an input/output interface, a network interface and memory.
Memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity or device that includes the element.
The above are only embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.
Claims (10)
1. A speech recognition method based on interactive information, characterized by comprising:
For a voice interaction process between a user and a terminal device, obtaining the interactive information corresponding to the voice interaction process;
Establishing a speech recognition corpus according to the interactive information;
Upon receiving first voice information that the user inputs based on the interactive information, performing speech recognition on the first voice information by means of the speech recognition corpus, and determining a first recognition result.
2. The speech recognition method according to claim 1, characterized in that obtaining, for the voice interaction process between the user and the terminal device, the interactive information corresponding to the voice interaction process comprises:
Upon receiving second voice information input by the user, performing speech recognition on the second voice information, determining a second recognition result, and using the second recognition result as the interactive information;
After the second recognition result is determined, the method further comprises:
Displaying the second recognition result;
The first voice information that the user inputs based on the interactive information comprises:
The first voice information that the user inputs based on the second recognition result.
3. The speech recognition method according to claim 2, characterized in that establishing the speech recognition corpus according to the interactive information comprises:
Establishing the speech recognition corpus according to the second speech recognition result.
4. The speech recognition method according to claim 2, characterized in that, when the voice interaction process is a voice search interaction process, performing speech recognition on the second voice information and determining the second recognition result comprises:
Performing speech recognition on the second voice information to obtain a second speech recognition result;
Searching a general search library according to the second speech recognition result, determining the search result corresponding to the second speech recognition result, and using the corresponding search result as the second recognition result.
5. The speech recognition method according to claim 1, characterized in that performing speech recognition on the first voice information by means of the speech recognition corpus and determining the first recognition result comprises:
Performing speech recognition on the first voice information to obtain a first speech recognition result;
Searching the speech recognition corpus for the first speech recognition result, obtaining the search result corresponding to the first speech recognition result, and using the corresponding search result as the first recognition result.
6. The speech recognition method according to claim 4 or 5, characterized in that the first speech recognition result and the second speech recognition result are syllable sequences;
Wherein the speech recognition corpus established according to the second speech recognition result is a syllable sequence corpus.
7. A speech recognition device based on interactive information, characterized by comprising:
An obtaining module, configured to obtain, for a voice interaction process between a user and a terminal device, the interactive information corresponding to the voice interaction process;
An establishing module, configured to establish a speech recognition corpus according to the interactive information obtained by the obtaining module;
A receiving module, configured to receive first voice information that the user inputs based on the interactive information obtained by the obtaining module;
An identification module, configured to perform speech recognition, by means of the speech recognition corpus established by the establishing module, on the first voice information received by the receiving module, and to determine a first recognition result.
8. The speech recognition device according to claim 7, characterized in that
The receiving module is specifically configured to receive second voice information input by the user;
The identification module is specifically configured to perform speech recognition on the second voice information received by the receiving module and to determine a second recognition result;
The obtaining module is specifically configured to use the second recognition result determined by the identification module as the interactive information;
The device further comprises: a display module;
The display module is configured to display the second recognition result;
The first voice information that the user inputs based on the interactive information comprises:
The first voice information that the user inputs based on the second recognition result.
9. An electronic device, characterized by comprising:
A processor, a memory, a communication interface and a bus;
Wherein the processor, the memory and the communication interface communicate with one another through the bus;
The communication interface is used for information transmission between the electronic device and communication devices of related equipment;
The processor is used to call the program instructions in the memory to perform the speech recognition method according to claims 1 to 6.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause the computer to perform the speech recognition method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811126926.XA CN109166581A (en) | 2018-09-26 | 2018-09-26 | Audio recognition method, device, electronic equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811126926.XA CN109166581A (en) | 2018-09-26 | 2018-09-26 | Audio recognition method, device, electronic equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109166581A true CN109166581A (en) | 2019-01-08 |
Family
ID=64880397
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811126926.XA Pending CN109166581A (en) | 2018-09-26 | 2018-09-26 | Audio recognition method, device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109166581A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130006604A1 (en) * | 2011-06-28 | 2013-01-03 | International Business Machines Corporation | Cross-lingual audio search |
US20140236572A1 (en) * | 2013-02-20 | 2014-08-21 | Jinni Media Ltd. | System Apparatus Circuit Method and Associated Computer Executable Code for Natural Language Understanding and Semantic Content Discovery |
CN107305768A (en) * | 2016-04-20 | 2017-10-31 | 上海交通大学 | Easy wrongly written character calibration method in interactive voice |
CN106128453A (en) * | 2016-08-30 | 2016-11-16 | 深圳市容大数字技术有限公司 | The Intelligent Recognition voice auto-answer method of a kind of robot and robot |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110517675A (en) * | 2019-08-08 | 2019-11-29 | 出门问问信息科技有限公司 | Exchange method, device, storage medium and electronic equipment based on speech recognition |
CN110517675B (en) * | 2019-08-08 | 2021-12-03 | 出门问问信息科技有限公司 | Interaction method and device based on voice recognition, storage medium and electronic equipment |
CN111199730A (en) * | 2020-01-08 | 2020-05-26 | 北京松果电子有限公司 | Voice recognition method, device, terminal and storage medium |
CN112927570A (en) * | 2021-02-23 | 2021-06-08 | 京东方科技集团股份有限公司 | Interaction method, interaction device, computer equipment and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190108 |