CN109712620A - Voice interactive method, interactive voice equipment and storage device - Google Patents
- Publication number
- CN109712620A (application number CN201811585283.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- voice
- natural language
- language processing
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
Abstract
This application discloses a voice interaction method, a voice interaction device, and a storage device. The method comprises: obtaining to-be-responded voice data; performing natural language processing on the to-be-responded voice data using non-network-side data; and responding based on the result of the natural language processing. This scheme can reduce the influence of network conditions on the device's voice interaction.
Description
Technical field
This application relates to the field of speech processing, and in particular to a voice interaction method, a voice interaction device, and a storage device.
Background technique
With the continuous development of information technology, user interaction techniques are widely used. Voice interaction, as a new generation of user interaction following keyboard and touch-screen interaction, is convenient and efficient and has gradually been adopted in many fields. In the field of household appliances, for example, smart home devices such as speakers can add a voice interaction mode on top of their traditional functions to interact with users.
Currently, when a device performs voice interaction with a user, it needs to process the user voice data it has collected and then respond to it. However, since processing voice data is relatively complex, the voice data is usually sent to a dedicated language processing platform on the network side, and the response is realized by receiving the processing result fed back by that platform. This online-processing mode of voice interaction is constrained by the network conditions the device is currently in: if the current network conditions are poor, the device may be unable to respond to the user's voice data in time, or unable to respond at all, which seriously affects normal voice interaction between the device and the user.
Summary of the invention
The technical problem mainly solved by this application is to provide a voice interaction method, a voice interaction device, and a storage device that can reduce the influence of network conditions on voice interaction.
To solve the above problem, a first aspect of this application provides a voice interaction method, comprising: obtaining to-be-responded voice data; performing natural language processing on the voice data using non-network-side data; and responding based on the result of the natural language processing.
To solve the above problem, a second aspect of this application provides a voice interaction device comprising a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the above method.
To solve the above problem, a third aspect of this application provides a storage device storing processor-executable program instructions for executing the above method.
In the above scheme, natural language processing is performed on the to-be-responded voice data using non-network-side data. Since this processing does not rely on network-side data, offline voice interaction can be realized, reducing or even avoiding the influence of network conditions on the device's voice interaction. Moreover, because data interaction with the network side takes a certain amount of time, a device that directly relies on non-network-side data for natural language processing, and then responds based on the result of that processing, can respond faster than one relying on network-side data.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of an embodiment of the voice interaction method of this application;
Fig. 2 is a schematic flow diagram of step S120 in another embodiment of the voice interaction method of this application;
Fig. 3a is a schematic flow diagram of step S223 in yet another embodiment of the voice interaction method of this application;
Fig. 3b is a schematic structural diagram of an embodiment of the voice interaction device of this application;
Fig. 4a is a schematic flow diagram of yet another embodiment of the voice interaction method of this application;
Fig. 4b is a schematic structural diagram of an embodiment of the voice interaction system of this application;
Fig. 5 is a schematic structural diagram of another embodiment of the voice interaction device of this application;
Fig. 6 is a schematic structural diagram of an embodiment of the storage device of this application.
Detailed description of embodiments
The scheme of the embodiments of this application is described in detail below with reference to the accompanying drawings.
In the following description, specific details such as particular system structures, interfaces, and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of this application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates that the associated objects are in an "or" relationship. Furthermore, "multiple" herein means two or more.
Referring to Fig. 1, Fig. 1 is a schematic flow diagram of an embodiment of the voice interaction method of this application. In this embodiment, the method is executed by a voice interaction device, which can be any device with processing capability, for example a smart home device, a computer, a tablet computer, a mobile phone, etc. The smart home device can be a household appliance such as an air conditioner, an electric cooker, a microwave oven, a refrigerator, or a speaker. Specifically, the method includes the following steps:
S110: Obtain to-be-responded voice data.
Specifically, the voice interaction device is equipped with a voice collection circuit, which may include a sound collector such as a microphone and a filter-amplifier circuit for filtering and amplifying the electrical signal collected by the sound collector. The voice interaction device detects voice data in the local environment through the voice collection circuit and takes it as the to-be-responded voice data. Taking an intelligent air conditioner as an example, the air conditioner can detect the voice data uttered by the user through the voice collection circuit, take that voice data as the to-be-responded voice data, and then execute the following S120 to realize interaction with the user.
It can be understood that, to reduce the circuit complexity of the voice interaction device and hence its size, the voice interaction device can also obtain the to-be-responded voice data from another voice collection device. For example, the voice interaction device can be connected to the voice collection device via USB, Bluetooth, or the like; after detecting voice data in the local environment, the voice collection device transmits the detected voice data to the voice interaction device as the to-be-responded voice data through a communication mode such as a USB data cable or Bluetooth, so that the voice interaction device obtains the to-be-responded voice data.
S120: Perform natural language processing on the voice data using non-network-side data.
Here, non-network-side data is data other than remote data obtained through a communication network such as the Internet, a local area network, or a mobile communication network. The non-network-side data may include local data of the voice interaction device and data stored in a processing device electrically connected to the voice interaction device (for example, a processing device connected through a USB interface, a bus interface, or another input/output interface).
In this embodiment, natural language processing is performed on the to-be-responded voice data using non-network-side data; that is, the process of performing natural language processing on the to-be-responded voice data does not need to rely on any network-side data and can be realized relying only on the non-network-side data of the voice interaction device. As a result, even when offline, the voice interaction device can process the to-be-responded voice data by itself, without being affected by network conditions. The natural language processing may include recognizing the to-be-responded voice data to obtain text data, then performing semantic understanding on the text data, and generating corresponding response data according to the semantic result.
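The offline processing chain just described (local speech recognition, then semantic understanding, then response generation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the recognizer and the understander are hypothetical stand-ins implemented as lookup tables over locally stored data.

```python
# Minimal sketch of the offline pipeline in S120: every step uses only
# locally stored (non-network-side) data; no network call is made.

# Hypothetical local "models" implemented as lookup tables for illustration.
LOCAL_ASR = {b"\x01\x02": "turn on the air conditioner"}          # audio -> text
LOCAL_INTENTS = {"turn on the air conditioner": {"device": "ac",  # text -> slots
                                                 "action": "on"}}

def recognize(audio: bytes) -> str:
    """Speech recognition using only locally stored data."""
    return LOCAL_ASR.get(audio, "")

def understand(text: str) -> dict:
    """Semantic understanding using only locally stored data."""
    return LOCAL_INTENTS.get(text, {})

def respond(slots: dict) -> str:
    """Generate response data from the semantic result."""
    if slots.get("device") == "ac" and slots.get("action") == "on":
        return "EXECUTE: ac_power_on"
    return "UNHANDLED"

def process_offline(audio: bytes) -> str:
    return respond(understand(recognize(audio)))

print(process_offline(b"\x01\x02"))  # -> EXECUTE: ac_power_on
```

Because no step consults the network, the chain behaves identically whether or not the device currently has connectivity, which is the point of S120.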
S130: Respond based on the result of the natural language processing.
For example, in S120 the language processing of the to-be-responded voice data yields corresponding response data, such as an execution instruction or reply data for informing the user. The voice interaction device executes the execution instruction, or plays or displays the reply data, to respond to the to-be-responded voice data input by the user and realize interaction with the user.
It can be understood that realizing voice interaction requires performing natural language processing on the to-be-responded voice data and then responding based on the result, and that natural language processing requires considerable processing capability. The voice interaction device of this application is therefore configured with processing capability sufficient to realize natural language processing without depending on network-side data, so that the offline voice interaction of this application can be realized.
In this embodiment, natural language processing is performed on the to-be-responded voice data using non-network-side data. Since this processing does not rely on network-side data, offline voice interaction can be realized, reducing or even avoiding the influence of network conditions on the device's voice interaction. Moreover, because data interaction with the network side takes a certain amount of time, the voice interaction device, by directly relying on non-network-side data to perform natural language processing on the to-be-responded voice data and then responding based on the result, can respond faster than if it relied on network-side data.
In the following, the process of performing natural language processing on the to-be-responded voice data using non-network-side data is explained with concrete examples.
Please refer to Fig. 2 and Fig. 3b. Fig. 2 is a schematic flow diagram of S120 in another embodiment of the voice interaction method of this application, and Fig. 3b is a schematic structural diagram of an embodiment of the voice interaction device of this application. In this embodiment, performing natural language processing on the to-be-responded voice data in the above S120 specifically includes the following sub-steps:
S221: Convert the to-be-responded voice data into text data.
To enable subsequent natural language processing, the voice interaction device first converts the to-be-responded voice data obtained in S110 using speech recognition techniques to obtain the corresponding text data.
Specifically, the voice interaction device can convert the to-be-responded voice data into text data based on the vocabulary in a local lexicon. For example, the voice interaction device locally stores an acoustic model, a language model, and a pronunciation dictionary. The acoustic model is a knowledge representation of differences in acoustics, phonetics, environmental variables, speaker gender, accent, and so on. The language model is a knowledge representation of how word sequences are composed. The pronunciation dictionary (lexicon) contains the mapping from words to phones, and its role is to connect the acoustic model and the language model. At least one of the pre-stored acoustic model, language model, and pronunciation dictionary includes a number of vocabulary entries (for example, the pronunciation dictionary includes a number of word-to-phone mappings); the database or data file used to store these vocabulary entries is referred to as the local lexicon. Specifically, the voice interaction device first performs pattern matching between the to-be-responded voice data and the acoustic model, and finally obtains the recognized text data in combination with the pronunciation dictionary and the language model.
The number of vocabulary entries pre-stored in the voice interaction device affects the accuracy of speech recognition. For example, if the voice interaction device has not pre-stored a certain word appearing in the to-be-responded voice data, it cannot correctly convert that spoken word into text. In theory, therefore, the more vocabulary the voice interaction device pre-stores, the better the speech recognition experience. On the other hand, the more vocabulary the device pre-stores, the higher the demand on its processing capability; that is, it may need to be configured with a processor of higher processing capability. The vocabulary pre-stored by the voice interaction device can therefore be determined by jointly considering the required speech recognition accuracy and the processing capability the device can support. In one application scenario, the local lexicon of the voice interaction device can pre-store a medium-scale vocabulary, for example one whose number of entries is between 1000 and 10000, such as 1000, 2000, 5000, or 10000.
In addition, the local lexicon includes at least one of the following: vocabulary of a set field and general vocabulary. The voice interaction device can thus recognize voice data of the set field and of the general field accordingly. The set field, also called a vertical field, is a preset field with a certain industry background, and the vocabulary of the set field consists of words strongly associated with that field. For example, the set field may be the household appliance field, or more narrowly the air conditioning field or the microwave oven field; vocabulary of the air conditioning field includes "open", "air conditioner", "cooling", "heating", and so on. The general field, in contrast to a vertical field, is not specific to a certain industry but applies across many industry fields; general vocabulary consists of words that may be used in any field, such as common spoken words: "today", "tomorrow", "please", "help me", "once", "weather", "how", and the like.
In one application scenario, considering that the vocabulary determines the processing capability required of the voice interaction device, the voice interaction device can be limited to voice interaction in the set field, thereby reducing the vocabulary it needs to pre-store. For example, the voice interaction device stores in its local lexicon the vocabulary of the set field together with the general vocabulary likely to occur in exchanges about that field, such as the spoken words above. For instance, if the voice interaction device is a fan, the set field is the fan field, so its set-field vocabulary includes "switch", "wind speed", and the like. Meanwhile, because spoken language is informal, many semantically empty auxiliary words may also be involved: in sentences such as "help me turn on the fan", "turn on the fan for me", and "please turn on the fan once", the words "help me", "for me", "please", and "once" carry no meaning for understanding, but for speech recognition such general words must still be covered. For interaction in a given set field, the vocabulary actually spoken is often limited; in general Chinese, a vocabulary of about 2000 or more words is roughly enough for unimpeded exchange. Therefore, the voice interaction device only needs to pre-store the vocabulary of its set field and some related general vocabulary, reaching a certain vocabulary size, to realize interaction in the set field.
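As a rough illustration of this design choice, the sketch below checks whether an utterance is fully covered by a small set-field lexicon plus general filler words. The word lists are invented for illustration and are far smaller than the 1000-to-10000-entry lexicon the text describes.

```python
# Hypothetical local lexicon for a fan: set-field words plus general fillers.
SET_FIELD = {"fan", "switch", "on", "off", "wind", "speed", "turn"}
GENERAL = {"please", "help", "me", "for", "the", "once", "a"}
LEXICON = SET_FIELD | GENERAL

def covered(utterance: str) -> bool:
    """True if every word is in the pre-stored lexicon, i.e. the device
    can recognize the utterance using only its local data."""
    return all(w in LEXICON for w in utterance.lower().split())

print(covered("please turn on the fan"))    # True: in-domain request
print(covered("what is the gdp of spain"))  # False: out-of-domain words
```

An out-of-lexicon word is exactly the failure case described above: the device cannot correctly convert that spoken word into text, so limiting the device to the set field keeps the lexicon small while keeping in-domain coverage high.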
In this embodiment, the step of converting the voice data into text is realized by an automatic speech recognition (Automatic Speech Recognition, ASR) module 31 in the voice interaction device. The ASR module 31 can store local data files such as the above acoustic model, pronunciation dictionary, and language model, and is used to convert the voice data into a symbol sequence the computer can recognize, namely text.
It can be understood that, in other embodiments, the local data files used for speech recognition, such as the acoustic model, pronunciation dictionary, and language model, may also not be stored locally on the voice interaction device, but instead in another processing device electrically connected to the voice interaction device.
S222: Judge whether the text data belongs to the content of the set field. If so, execute S223; otherwise end the process.
Specifically, the voice interaction device analyzes the text data obtained through S221 to determine whether the content it involves belongs to the set field. In this embodiment, the voice interaction device is locally preset with a classification model 32, and S222 can specifically include: analyzing the text data using the local classification model 32 and determining, based on the analysis result, whether the text data belongs to the content of the set field. For example, the voice interaction device pre-stores a classification model built with an algorithm such as deep learning; the ASR module 31 inputs the converted text data into the classification model 32 for processing, obtaining a likelihood result that the text data belongs to the set field, which may include a confidence degree and/or score that the text data belongs to the preset field. The voice interaction device compares the obtained likelihood result with a preset threshold; if the likelihood result is greater than the preset threshold, it can be determined that the text data belongs to the content of the set field.
It can be understood that, in other embodiments, when the voice interaction device is not limited to voice interaction in the set field, S222 need not be executed; that is, the voice interaction device executes S221 and then directly executes S223 without this judgment.
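The threshold comparison in S222 can be sketched as below. A keyword-ratio scorer stands in for the deep-learning classification model 32 the text mentions; the keyword list and the 0.5 threshold are assumptions made for illustration only.

```python
# Stand-in for classification model 32: scores how likely the text belongs
# to the set field (here, the air conditioning field), then applies the
# preset threshold of S222.
AC_KEYWORDS = {"air", "conditioner", "cool", "heat", "temperature", "degrees"}
THRESHOLD = 0.5  # preset threshold (value assumed for illustration)

def set_field_score(text: str) -> float:
    """Fraction of words associated with the set field."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in AC_KEYWORDS for w in words) / len(words)

def belongs_to_set_field(text: str) -> bool:
    return set_field_score(text) > THRESHOLD

print(belongs_to_set_field("cool the air conditioner"))  # True: score 0.75
print(belongs_to_set_field("tell me a story"))           # False: score 0.0
```

Only text passing this gate proceeds to S223, which is what keeps the downstream semantic-understanding workload limited to the set field.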
S223: Perform natural language processing on the text data.
Specifically, upon judging that the text data to be processed belongs to the content of the set field, the voice interaction device responds to that judgment by performing natural language processing on the text data. The voice interaction device can process the text data using existing natural language processing methods to obtain response data matching the text data; the response data is, for example, an execution instruction, or reply data for informing the user.
In this embodiment, as shown in Fig. 3a, step S223 specifically includes the following sub-steps:
S223a: Perform semantic understanding on the text data to obtain a semantic result.
Specifically, natural language understanding is first performed on the text data corresponding to the to-be-responded voice data; that is, the text data is converted into slot information the computer can understand, from which the semantic information conveyed by the text data can be obtained. Illustrated in combination with Fig. 3b, the voice interaction device can be equipped with a natural language understanding (Natural Language Understanding, NLU) module 33, and sub-step S223a can be realized by the NLU module 33. For example, the NLU module 33 receives the classification result output by the classification model 32 and, when that result indicates that the text data belongs to the content of the set field, performs semantic understanding on the text data.
In one embodiment, sub-step S223a can include: performing slot filling on the text data to obtain slot information, wherein the slot information is used to indicate the semantic result of the text data. A slot (also called a semantic slot) is an expression of semantic understanding, usually described by several parameters. For example, for the text data "how is the weather in Shenzhen today" corresponding to the to-be-responded voice data, the slot information is: {field: weather; time: today; address: Shenzhen}. The slots can be set differently for the different fields the voice interaction device targets; for example, if the set field of the voice interaction device is the household appliance field, the slot information can include device control, kitchen, scene, and so on.
In this embodiment, the voice interaction device first filters out text data not belonging to the set field, so it only needs to perform semantic understanding on text data of the set field, which reduces the processing capability required for the voice interaction device to realize semantic understanding.
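Slot filling as described above can be sketched with a simple keyword-based filler that reproduces the weather example. The word lists are invented for illustration; a real NLU module would use trained models rather than lookups.

```python
# Hypothetical slot filler reproducing the example in the text:
# "how is the weather in Shenzhen today" -> {field, time, address}.
TIMES = {"today", "tomorrow"}
CITIES = {"shenzhen", "beijing"}

def fill_slots(text: str) -> dict:
    """Convert text data into slot information indicating its semantics."""
    words = text.lower().split()
    slots = {}
    if "weather" in words:
        slots["field"] = "weather"
    for w in words:
        if w in TIMES:
            slots["time"] = w
        if w in CITIES:
            slots["address"] = w.capitalize()
    return slots

print(fill_slots("how is the weather in Shenzhen today"))
# -> {'field': 'weather', 'address': 'Shenzhen', 'time': 'today'}
```

The resulting dictionary is the "semantic result" handed to S223b; which slot keys exist depends on the set field the device targets, as the text notes for the household appliance case.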
S223b: Generate first response data based on the semantic result.
Specifically, the first response data can be a matching execution instruction that is generated and then executed, or reply data generated for interaction. The voice interaction device can be equipped with a dialog management (Dialog Management, DM) module 34 (also called an application logic module 34), a natural language generation (Natural Language Generation, NLG) module 35, and a text-to-speech (Text To Speech, TTS) module 36. Sub-step S223b can be executed by a combination of the DM module 34, NLG module 35, and TTS module 36, or by some of these modules.
In one application scenario, the voice interaction device executing S223b includes: generating an execution instruction matching the semantic result; accordingly, the voice interaction device executing the above S130 specifically includes executing the execution instruction; and/or generating reply data matching the semantic result, wherein the reply data can be text reply data or voice reply data. Generating text reply data matching the semantic result includes: determining reply content matching the semantic result, and performing natural language generation on the reply content to obtain the text reply data. Generating voice reply data matching the semantic result includes: determining reply content matching the semantic result, performing natural language generation on the reply content to obtain text reply data, and converting the text reply data into voice reply data. Accordingly, the voice interaction device executing the above S130 specifically includes: displaying the text reply data, or playing the voice reply data by voice.
Here, the execution instruction can be directly generated by the DM module 34 and output to an execution unit 37 of the voice interaction device, which executes the instruction; the execution unit 37 is, for example, the blade driver of a fan or the compressor motor of an air conditioner. For the reply data, the DM module 34 first determines the content to be replied, and the NLG module 35 converts that content into text-type reply data. If the reply is made by display, the text-type reply data is output to a display unit of the voice interaction device (not shown in Fig. 3b) to be displayed; if the reply is made by voice broadcast, the NLG module 35 outputs the text-type reply data to the TTS module 36, which performs text-to-speech conversion to obtain voice-type reply data, which is then output to a sound playing unit 38 of the voice interaction device to be played. It can be understood that the voice interaction device can both generate an execution instruction and generate reply data. For example, an air conditioner receives the user's voice data "help me turn on the air conditioner", performs natural language processing on it, and generates an air conditioner opening instruction together with the reply data "OK, the air conditioner has been turned on for you"; the air conditioner starts itself in response to the opening instruction and broadcasts the reply data "OK, the air conditioner has been turned on for you" by voice.
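The DM, NLG, and TTS chain in the air conditioner example can be sketched as follows. The module boundaries mirror Fig. 3b, but every function body is an invented stand-in; in particular the TTS step only encodes text rather than synthesizing audio.

```python
# Stand-ins for modules 34-36 of Fig. 3b, handling the "turn on the AC" case.

def dialog_management(slots: dict):
    """DM module 34: decide the execution instruction and reply content."""
    if slots.get("device") == "ac" and slots.get("action") == "on":
        return "ac_power_on", {"ack": True, "device": "air conditioner"}
    return None, {"ack": False}

def natural_language_generation(content: dict) -> str:
    """NLG module 35: turn reply content into text-type reply data."""
    if content.get("ack"):
        return f"OK, the {content['device']} has been turned on for you"
    return "Sorry, I cannot do that"

def text_to_speech(text: str) -> bytes:
    """TTS module 36 (placeholder): real TTS would synthesize audio here."""
    return text.encode("utf-8")

instruction, content = dialog_management({"device": "ac", "action": "on"})
reply_text = natural_language_generation(content)
print(instruction)  # -> ac_power_on  (sent to execution unit 37)
print(reply_text)   # -> OK, the air conditioner has been turned on for you
```

As in the text, the same semantic result yields both an execution instruction for the execution unit 37 and reply data that can be displayed or passed through TTS to the sound playing unit 38.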
It can be understood that the modules or units of the voice interaction device described in Fig. 3b can be different program parts, or can be functional module groups realized by different circuits.
In this embodiment, the voice interaction device filters out voice data not belonging to the set field and performs natural language processing only on voice data of the set field, which reduces the processing capability required for the voice interaction device to realize speech recognition and the other natural language processing, and in turn reduces the implementation difficulty and cost of the voice interaction device. Moreover, since the relevant vocabulary of the set field is limited, more related general vocabulary can be added within the vocabulary range that the processing capability of the voice interaction device supports, allowing more reliable and accurate understanding of natural language in the set field; to some extent, truly natural language interaction can thus be realized.
Please refer to Fig. 4a, which is a schematic flow diagram of yet another embodiment of the voice interaction method of this application. In this embodiment, the method is executed by a voice interaction device, which can be any device with processing capability, for example a smart home device, a computer, a tablet computer, a mobile phone, etc. Specifically, the method includes the following steps:
S410: Obtain to-be-responded voice data.
S420: Perform natural language processing on the to-be-responded voice data using non-network-side data.
S430: Respond based on the result of the natural language processing obtained using non-network-side data.
For the explanation of steps S410-S430, please refer to the related description of the above embodiments, which is not repeated here.
S440: Judge whether natural language processing can be performed on the to-be-responded voice data using non-network-side data. If not, execute S450; otherwise, the process can be ended, and when new to-be-responded data is received, the voice interaction method of this application continues to be executed.
In this embodiment, step S420 relies on non-network-side data to perform natural language processing on the to-be-responded voice data. As described in the above embodiments, realizing this processing requires certain relevant vocabulary or other data pre-stored in the voice interaction device. Since the processing capability of the voice interaction device is limited, its pre-stored related data is also limited, so there may be cases in which the voice data cannot be speech-recognized and subsequently processed using the local pre-stored data of the voice interaction device. Therefore, after executing S420, the voice interaction device monitors whether it has performed natural language processing on the to-be-responded voice data and responded based on the processing result; for example, it detects whether a response action to the to-be-responded voice data (such as executing an instruction or playing reply data by voice) occurs within a preset time. If no response action to the to-be-responded voice data is detected within the preset time, it is determined that natural language processing cannot be performed on the to-be-responded voice data using non-network-side data, i.e., offline voice interaction cannot currently be realized; at this point, the voice interaction device then considers relying on the network side to perform natural language processing on the to-be-responded voice data.
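The offline-first fallback decision of S440 can be sketched as below. The stand-in offline processor signals failure by returning None, simplifying the preset-time detection the text describes; all names and the lexicon contents are assumptions for illustration.

```python
# Sketch of S440: try offline processing first (S420); if the limited
# pre-stored data cannot handle the input (modeled as returning None),
# fall back to the network side (S450-S460).

OFFLINE_LEXICON = {"turn on the fan": "EXECUTE: fan_on"}

def offline_process(text: str):
    """Offline NLP with limited pre-stored data; None means it failed."""
    return OFFLINE_LEXICON.get(text)

def network_process(text: str) -> str:
    """Stand-in for sending the data to network-side equipment 42."""
    return f"NETWORK_RESPONSE: {text}"

def handle(text: str) -> str:
    result = offline_process(text)     # S420
    if result is not None:             # S440: offline processing succeeded
        return result
    return network_process(text)       # S450-S460: rely on network side

print(handle("turn on the fan"))        # served offline
print(handle("what's the news today"))  # falls back to the network side
```

In a real device the failure signal would be the absence of a response action within the preset time rather than a return value, but the control flow (offline first, network as fallback) is the same.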
S450: Rely on network-side data to perform natural language processing on the to-be-responded voice data.
The voice interaction device can first detect whether network communication is currently possible; if not, it initiates a network connection request to establish a communication link for subsequent data interaction. After determining that network communication is possible, the voice interaction device relies on network-side data to perform natural language processing on the to-be-responded voice data. For example, as shown in Fig. 4b, the voice interaction system includes an interconnected voice interaction device 41 and network-side equipment 42; the voice interaction device 41 is the voice interaction device described above, and the network-side equipment 42 can be a remote server or the like, used to perform speech recognition and natural language processing on the received voice data to generate second response data. Specifically, the voice interaction device 41 sends the to-be-responded voice data to the network-side equipment 42 and receives the second response data fed back by the network-side equipment 42, wherein the second response data is obtained by the network-side equipment performing natural language processing on the to-be-responded voice data, such as the execution instruction or reply data described above. The process by which the network-side equipment performs natural language processing on the to-be-responded voice data can refer to the process by which the voice interaction device itself does so, and is not repeated here.
S460: respond based on the natural language processing result obtained by relying on network-side data.
For example, after receiving the second response data fed back from the network side, the voice interaction device executes the second response data, e.g., executes the corresponding instruction or plays the reply data as speech. For details, refer to the description of step S130 above.
In this embodiment, when offline voice interaction is infeasible, the voice interaction device can further rely on network-side data to perform natural language processing on the to-be-responded voice data and respond accordingly — that is, it realizes online voice interaction. By combining offline and online processing, the reliability of voice interaction is ensured.
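The offline-first strategy with online fallback described in this embodiment can be sketched as follows. This is a minimal illustration only; all names such as `VoiceInteractionDevice` are hypothetical and not taken from the patent.

```python
# Sketch of the offline-first / online-fallback flow (S450/S460):
# try local (non-network) natural language processing first, and fall
# back to a network-side service only when the local attempt fails.
# All names here are illustrative, not from the patent.

class VoiceInteractionDevice:
    def __init__(self, local_nlp, remote_nlp):
        self.local_nlp = local_nlp    # on-device NLP, needs no network
        self.remote_nlp = remote_nlp  # network-side NLP (e.g. remote server)

    def respond(self, voice_data):
        result = self.local_nlp(voice_data)  # attempt offline processing
        if result is not None:               # offline succeeded: respond directly
            return ("offline", result)
        # offline failed: rely on the network side (S450/S460)
        return ("online", self.remote_nlp(voice_data))

# Toy stand-ins: the local NLP understands only a small command set.
local = lambda v: "turn_on_ac" if v == "open air conditioner" else None
remote = lambda v: "server_result:" + v

device = VoiceInteractionDevice(local, remote)
```

Because the local path answers in-domain commands immediately and the network path is consulted only on failure, this structure gives both the lower latency of offline processing and the coverage of the server.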
Referring to Fig. 5, Fig. 5 is a structural schematic diagram of an embodiment of the voice interaction device of this application. In this embodiment, the voice interaction device 50 includes a memory 51, a processor 52, and a communication circuit 53, with the communication circuit 53 and the memory 51 each coupled to the processor 52. Specifically, the components of the voice interaction device 50 may be coupled via a bus, or the processor of the voice interaction device 50 may be connected to each of the other components individually. The voice interaction device 50 may be any device with sufficient processing capability, such as a computer, a mobile phone, or a smart home device; smart home devices include household appliances such as air conditioners, rice cookers, microwave ovens, refrigerators, and speakers.
The communication circuit 53 is used to communicate with other devices. For example, when the voice interaction device needs the online natural language processing function, the communication circuit 53 communicates with the network-side device.
The memory 51 stores the program instructions executed by the processor 52 and the data used by the processor 52 during processing, such as the non-network-side data. The memory 51 includes a non-volatile storage portion for storing the above program instructions.
The processor 52 controls the operation of the voice interaction device 50 and may also be called a CPU (Central Processing Unit). The processor 52 may be an integrated circuit chip with signal processing capability. It may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 52 may be implemented jointly by multiple integrated circuit chips.
In this embodiment, the processor 52 calls the program instructions stored in the memory 51 to execute the steps of any of the method embodiments above.
For example, the processor 52 is configured to obtain to-be-responded voice data, perform natural language processing on the to-be-responded voice data using non-network-side data, and respond based on the natural language processing result (e.g., control components such as the execution unit and voice playback unit described above to perform operations matching the natural language processing result).
In some embodiments, the processor 52 performing natural language processing on the to-be-responded voice data includes: converting the to-be-responded voice data into text data, and performing natural language processing on the text data.
Further, the memory 51 stores a lexicon, and the processor 52 converting the to-be-responded voice data into text data may include: converting the to-be-responded voice data into text data based on the vocabulary in the local lexicon. The vocabulary size of the local lexicon may be more than 1,000 and less than 10,000 entries, and the local lexicon may include at least one of: vocabulary of a set field and general-purpose vocabulary.
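As a rough illustration of how a small local lexicon constrains on-device recognition, the toy sketch below keeps only candidate words found in a lexicon mixing set-field and general-purpose vocabulary. Real on-device speech recognition is far more involved than this; every word list and function name here is a hypothetical stand-in.

```python
# Toy sketch: lexicon-constrained text conversion. The patent suggests a
# local lexicon of roughly 1,000-10,000 entries; the tiny sets below are
# illustrative assumptions only.

DOMAIN_WORDS = {"air", "conditioner", "temperature", "cool", "heat"}   # set-field vocabulary
GENERAL_WORDS = {"open", "close", "the", "set", "to"}                  # general-purpose vocabulary
LOCAL_LEXICON = DOMAIN_WORDS | GENERAL_WORDS

def to_text(candidate_words):
    """Keep only candidate words present in the local lexicon."""
    kept = [w for w in candidate_words if w in LOCAL_LEXICON]
    return " ".join(kept)
```

Restricting the search space to a few thousand words is what makes conversion feasible on a device without network-side resources.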
In some embodiments, after converting the to-be-responded voice data into text data, the processor 52 is further configured to judge whether the text data belongs to the content of the set field; performing natural language processing on the text data then includes: performing natural language processing on the text data in response to the text data belonging to the content of the set field.
In some embodiments, the memory 51 stores a classification model, and the processor 52 judging whether the text data belongs to the content of the set field includes: analyzing the text data using the local classification model, and determining from the analysis result whether the text data belongs to the content of the set field.
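A minimal sketch of such a local in-domain check, using simple keyword scoring in place of a trained classification model. All keyword sets, thresholds, and function names are illustrative assumptions, not part of the patent.

```python
# Sketch of the set-field gate: only text judged to belong to the set
# field (here, home-appliance control) is processed by the local NLP;
# out-of-domain text is handed off. A real classification model would
# replace the keyword scoring below.

SET_DOMAIN_KEYWORDS = {"air", "conditioner", "temperature", "fan", "mode"}

def belongs_to_set_domain(text, threshold=1):
    """Return True if the text scores at least `threshold` domain keywords."""
    words = text.lower().split()
    score = sum(1 for w in words if w in SET_DOMAIN_KEYWORDS)
    return score >= threshold

def process_locally(text):
    # Per the embodiment: run local NLP only when the text is in-domain.
    if belongs_to_set_domain(text):
        return "local_nlp_result"
    return None  # out of domain: fall back to the network side
```

Gating local processing this way keeps the on-device model small, since it only ever has to understand the set field.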
In some embodiments, the processor 52 performing natural language processing on the text data includes: performing semantic understanding on the text data to obtain a semantic result, and generating first response data based on the semantic result.
In some embodiments, the processor 52 generating the first response data based on the semantic result may include: generating an execution instruction matching the semantic result; responding based on the natural language processing result then includes: executing the execution instruction.
In some embodiments, generating the first response data based on the semantic result includes: generating reply data matching the semantic result, where the reply data is text reply data or speech reply data; responding based on the natural language processing result then includes: displaying the text reply data or playing the speech reply data.
The processor 52 generating speech reply data matching the semantic result may include: determining reply content matching the semantic result; performing natural language generation on the reply content to obtain text reply data; and converting the text reply data into speech reply data.
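The reply-generation chain just described — semantic understanding, reply-content selection, natural language generation, then text-to-speech — can be sketched with trivial stand-ins for each stage. All function names and canned values below are hypothetical.

```python
# Sketch of the four-stage reply pipeline. Each stage is a toy stand-in
# for the corresponding component described in the embodiment.

def semantic_understanding(text):
    # Map recognized text to a semantic result (intent); trivially keyword-based here.
    if "temperature" in text:
        return {"intent": "query_temperature"}
    return {"intent": "unknown"}

def pick_reply_content(semantic):
    # Determine reply content matching the semantic result (canned lookup).
    return {"query_temperature": ("temperature", "26")}.get(semantic["intent"])

def generate_text_reply(content):
    # Natural language generation: reply content -> text reply data.
    key, value = content
    return f"The current {key} is {value} degrees."

def synthesize(text_reply):
    # Stand-in for a TTS engine: text reply data -> speech reply data (fake audio bytes).
    return text_reply.encode("utf-8")

def respond(text):
    content = pick_reply_content(semantic_understanding(text))
    text_reply = generate_text_reply(content)
    return text_reply, synthesize(text_reply)
```

In a real device the final bytes would be played through the voice playback unit; here they merely stand in for synthesized audio.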
In some embodiments, the processor 52 is further configured to judge whether natural language processing can be performed on the to-be-responded voice data using non-network-side data, and, in response to natural language processing being unable to be performed on the to-be-responded voice data using non-network-side data, to rely on network-side data to perform natural language processing on the to-be-responded voice data.
Further, the processor 52 judging whether natural language processing can be performed on the to-be-responded voice data using non-network-side data may include: detecting whether a response action on the to-be-responded voice data occurs within a preset time. The processor 52 relying on network-side data to perform natural language processing on the to-be-responded voice data may include: sending the to-be-responded voice data to the network-side device through the communication circuit 53, and receiving, through the communication circuit 53, second response data fed back by the network-side device, where the second response data is obtained by the network-side device performing natural language processing on the to-be-responded voice data.
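The preset-time criterion can be illustrated as a simple timeout check: if no local response action is produced within the preset window, the device concludes offline processing has failed and falls back to the network side. This is a toy synchronous sketch; `preset_time` and the function names are assumptions, and a real device would likely monitor the response action asynchronously.

```python
# Sketch of the "response action within a preset time" check used to
# judge whether offline NLP succeeded.
import time

def offline_nlp_succeeded(process_fn, voice_data, preset_time=0.5):
    """Return the local result if it arrives within preset_time seconds, else None."""
    start = time.monotonic()
    result = process_fn(voice_data)          # local processing attempt
    elapsed = time.monotonic() - start
    if result is not None and elapsed <= preset_time:
        return result                        # response action observed in time
    return None                              # no timely response: fall back online
```

`time.monotonic()` is used rather than `time.time()` because it cannot jump backwards, which makes it the safer clock for measuring a timeout.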
It can be understood that, in another embodiment, if the voice interaction device does not need to communicate with the network side, the voice interaction device may omit the communication circuit 53.
Referring to Fig. 6, this application also provides an embodiment of a storage device. In this embodiment, the storage device 60 stores program instructions 61 executable by a processor, and the program instructions 61 are used to execute the method in the above embodiments.
The storage device 60 may specifically be any medium capable of storing program instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc; or it may be a server storing the program instructions, which can send the stored program instructions to other devices for execution, or run the stored program instructions itself.
In some embodiments, the storage device 60 may also be the memory shown in Fig. 5.
In the above scheme, natural language processing is performed on the to-be-responded voice data using non-network-side data. Since this process does not rely on network-side data, offline voice interaction can be realized, reducing or even eliminating the influence of network conditions on the device's voice interaction. Moreover, because network-side data exchange takes a certain amount of time, having the device perform natural language processing directly with non-network-side data and respond based on the result speeds up the response of voice interaction compared to relying on network-side data.
In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules or units is only a division by logical function; in actual implementation there may be other divisions — for example, multiple units or components may be combined or integrated into another system, and some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application — in essence, or the part contributing to the prior art, or all or part of the technical solution — may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (13)
1. A voice interaction method, comprising:
obtaining to-be-responded voice data;
performing natural language processing on the to-be-responded voice data using non-network-side data;
responding based on the result of the natural language processing.
2. The method according to claim 1, wherein the performing natural language processing on the to-be-responded voice data comprises:
converting the to-be-responded voice data into text data;
performing natural language processing on the text data.
3. The method according to claim 2, wherein the converting the to-be-responded voice data into text data comprises:
converting the to-be-responded voice data into text data based on vocabulary in a local lexicon;
wherein the local lexicon includes at least one of: vocabulary of a set field and general-purpose vocabulary.
4. The method according to claim 2, wherein after the converting the to-be-responded voice data into text data, the method further comprises:
judging whether the text data belongs to content of a set field;
and the performing natural language processing on the text data comprises:
performing natural language processing on the text data in response to the text data belonging to the content of the set field.
5. The method according to claim 4, wherein the judging whether the text data belongs to content of a set field comprises:
analyzing the text data using a local classification model, and determining, based on the analysis result, whether the text data belongs to the content of the set field.
6. The method according to claim 2, wherein the performing natural language processing on the text data comprises:
performing semantic understanding on the text data to obtain a semantic result;
generating first response data based on the semantic result.
7. The method according to claim 6, wherein the generating first response data based on the semantic result comprises:
generating an execution instruction matching the semantic result;
and the responding based on the result of the natural language processing comprises:
executing the execution instruction.
8. The method according to claim 6, wherein the generating first response data based on the semantic result comprises:
generating reply data matching the semantic result, wherein the reply data is text reply data or speech reply data;
and the responding based on the result of the natural language processing comprises:
displaying the text reply data or playing the speech reply data.
9. The method according to claim 8, wherein the generating reply data matching the semantic result comprises:
determining reply content matching the semantic result;
performing natural language generation on the reply content to obtain text reply data;
converting the text reply data into speech reply data.
10. The method according to claim 1, wherein the method further comprises:
judging whether natural language processing can be performed on the to-be-responded voice data using the non-network-side data;
in response to natural language processing being unable to be performed on the to-be-responded voice data using the non-network-side data, relying on network-side data to perform natural language processing on the to-be-responded voice data.
11. The method according to claim 10, wherein the judging whether natural language processing can be performed on the to-be-responded voice data using the non-network-side data comprises:
detecting whether a response action on the to-be-responded voice data occurs within a preset time;
and the relying on network-side data to perform natural language processing on the to-be-responded voice data comprises:
sending the to-be-responded voice data to a network-side device;
receiving second response data fed back by the network-side device, wherein the second response data is obtained by the network-side device performing natural language processing on the to-be-responded voice data.
12. A voice interaction device, comprising a memory and a processor coupled to each other, wherein the processor is configured to execute the program instructions stored in the memory to realize the method according to any one of claims 1 to 11.
13. A storage device storing program instructions executable by a processor, wherein the program instructions are used to realize the method according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811585283.5A CN109712620A (en) | 2018-12-24 | 2018-12-24 | Voice interactive method, interactive voice equipment and storage device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109712620A true CN109712620A (en) | 2019-05-03 |
Family
ID=66256173
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811585283.5A Pending CN109712620A (en) | 2018-12-24 | 2018-12-24 | Voice interactive method, interactive voice equipment and storage device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109712620A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106486122A (en) * | 2016-12-26 | 2017-03-08 | 旗瀚科技有限公司 | A kind of intelligent sound interacts robot |
CN108320747A (en) * | 2018-02-08 | 2018-07-24 | 广东美的厨房电器制造有限公司 | Appliances equipment control method, equipment, terminal and computer readable storage medium |
WO2018199374A1 (en) * | 2017-04-24 | 2018-11-01 | 엘지전자 주식회사 | Audio device and control method therefor |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110211580B (en) | Multi-intelligent-device response method, device, system and storage medium | |
CN112201246B (en) | Intelligent control method and device based on voice, electronic equipment and storage medium | |
CN110557451B (en) | Dialogue interaction processing method and device, electronic equipment and storage medium | |
CN107424607A (en) | Voice control mode switching method and device and equipment with device | |
EP2770445A2 (en) | Method and system for supporting a translation-based communication service and terminal supporting the service | |
US7689424B2 (en) | Distributed speech recognition method | |
CN111161726B (en) | Intelligent voice interaction method, device, medium and system | |
CN107507616A (en) | The method to set up and device of gateway scene | |
CN109661704A (en) | Context-aware inquiry identification for electronic equipment | |
CN106847291A (en) | Speech recognition system and method that a kind of local and high in the clouds is combined | |
EP2504745B1 (en) | Communication interface apparatus and method for multi-user | |
CN112151013A (en) | Intelligent equipment interaction method | |
JP2018109663A (en) | Speech processing unit, dialog system, terminal device, program, and speech processing method | |
CN113555018A (en) | Voice interaction method and device | |
CN112767916B (en) | Voice interaction method, device, equipment, medium and product of intelligent voice equipment | |
CN110047486A (en) | Sound control method, device, server, system and storage medium | |
CN110262278B (en) | Control method and device of intelligent household electrical appliance and intelligent household electrical appliance | |
CN116013257A (en) | Speech recognition and speech recognition model training method, device, medium and equipment | |
WO2022222045A1 (en) | Speech information processing method, and device | |
CN109712620A (en) | Voice interactive method, interactive voice equipment and storage device | |
CN111128127A (en) | Voice recognition processing method and device | |
CN115567336B (en) | Wake-free voice control system and method based on smart home | |
CN111414760B (en) | Natural language processing method, related equipment, system and storage device | |
CN114999496A (en) | Audio transmission method, control equipment and terminal equipment | |
CN108303900A (en) | Method, device and system for playing audio |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190503 |