CN109871238A

CN109871238A - Voice interaction method, device and storage medium

Info

Publication number: CN109871238A
Application number: CN201910000655.1A
Authority: CN
Inventors: 牛飞; 王芃; 陈果果; 刘晓峰; 张�杰
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Shanghai Xiaodu Technology Co Ltd
Priority date: 2019-01-02
Filing date: 2019-01-02
Publication date: 2019-06-11

Abstract

The present invention provides a voice interaction method, device and storage medium. The method comprises: receiving a wake-up state message sent by a peripheral device together with the first audio that woke up the peripheral device, the wake-up state message indicating that the peripheral device is in a wake-up state; if it is determined that the first audio contains the wake-up information of the peripheral device, receiving second audio sent by the peripheral device and sending the second audio to a server, so that the server returns a first response instruction to the terminal according to the second audio; and receiving the first response instruction sent by the server, and controlling the peripheral device, according to the first response instruction, to execute the first response action corresponding to the first response instruction. In the present invention, after the peripheral device is woken up by voice, the terminal confirms the wake-up information a second time, which avoids false wake-ups, enriches the interaction between the peripheral device and the terminal, and improves the user experience.

Description

Voice interaction method, device and storage medium

Technical field

The present invention relates to the technical field of voice interaction, and in particular to a voice interaction method, device and storage medium.

Background

Bluetooth is a wireless technology standard that enables short-range data exchange among fixed devices, mobile devices and personal area networks. After a terminal is connected to a Bluetooth device, the terminal can operate the Bluetooth device according to its type; for example, when the Bluetooth device is a Bluetooth speaker, the terminal can play music through it.

In the prior art, the interaction between a terminal and a Bluetooth device is limited to a single function, which does not match the current trend toward intelligent devices and results in a poor user experience.

Summary of the invention

The present invention provides a voice interaction method, device and storage medium. After the peripheral device is woken up by voice, the terminal confirms the wake-up information a second time, which avoids false wake-ups, enriches the interaction between the peripheral device and the terminal, and improves the user experience.

A first aspect of the present invention provides a voice interaction method, applied to a terminal, comprising:

receiving a wake-up state message sent by a peripheral device and the first audio that woke up the peripheral device, the wake-up state message indicating that the peripheral device is in a wake-up state;

if it is determined that the first audio contains the wake-up information of the peripheral device, receiving second audio sent by the peripheral device, and sending the second audio to a server, so that the server returns a first response instruction to the terminal according to the second audio; and

receiving the first response instruction sent by the server, and controlling the peripheral device, according to the first response instruction, to execute the first response action corresponding to the first response instruction.

Optionally, after it is determined that the first audio contains the wake-up information of the peripheral device, the method further comprises:

sending a first sound pickup instruction to the peripheral device, the first sound pickup instruction instructing the peripheral device to start picking up sound.

Optionally, after controlling the peripheral device, according to the first response instruction, to execute the first response action corresponding to the first response instruction, the method further comprises:

if third audio sent by the peripheral device is received within a first preset duration, and the third audio contains the wake-up information of the peripheral device, sending a second sound pickup instruction to the peripheral device, the second sound pickup instruction instructing the peripheral device to continue picking up sound.

Optionally, after receiving the wake-up state message sent by the peripheral device and the first audio that woke up the peripheral device, the method further comprises:

if it is determined that the first audio does not contain the wake-up information of the peripheral device, sending a sleep message to the peripheral device, the sleep message instructing the peripheral device to enter a sleep state.

Optionally, after sending the second audio to the server, the method further comprises:

receiving a stop-sending message sent by the server, the stop-sending message instructing the terminal to stop sending audio to the server, wherein the server sends the stop-sending message when it has not received fourth audio from the terminal within a second preset duration after receiving the second audio; and

sending a stop-pickup message to the peripheral device, the stop-pickup message instructing the peripheral device to stop picking up sound.

Optionally, after sending the second sound pickup instruction to the peripheral device, the method further comprises:

if fourth audio sent by the peripheral device is not received within a third preset duration, entering an idle state and sending an idle message to the peripheral device.
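The optional timing conditions of the first aspect can be illustrated with a minimal Python sketch. The function names, the returned action labels and the use of plain numbers for elapsed time are assumptions made for illustration only; the method does not prescribe them, and the preset durations are left as parameters because no values are given.

```python
from typing import Optional


def handle_third_audio(elapsed_s: float, third_audio_text: Optional[str],
                       wake_word: str, first_preset_s: float) -> str:
    """Decide how the terminal reacts to audio arriving after the first response action.

    elapsed_s is the time since the first response action was executed.
    third_audio_text is a transcript of the third audio, or None if nothing arrived.
    """
    if third_audio_text is None or elapsed_s > first_preset_s:
        return "no_action"
    if wake_word.lower() in third_audio_text.lower():
        # Wake-up information confirmed again: ask the peripheral to keep picking up sound.
        return "send_second_pickup_instruction"
    return "no_action"


def handle_wait_for_fourth_audio(waited_s: float, fourth_audio_received: bool,
                                 third_preset_s: float) -> str:
    """Decide what the terminal does after sending the second sound pickup instruction."""
    if fourth_audio_received:
        # Assumption: fourth audio is forwarded to the server like the second audio.
        return "forward_audio_to_server"
    if waited_s >= third_preset_s:
        # Nothing arrived in time: go idle and tell the peripheral.
        return "enter_idle_and_send_idle_message"
    return "keep_waiting"
```

Keeping the two decisions as pure functions makes the timing rules easy to check in isolation from any Bluetooth or network code.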

A second aspect of the present invention provides a voice interaction method, applied to a peripheral device, comprising:

sending to a terminal a wake-up state message and the first audio that woke up the peripheral device, the wake-up state message notifying the terminal that the peripheral device is in a wake-up state;

sending second audio to the terminal, so that the terminal sends the second audio to a server and the server returns a first response instruction to the terminal according to the second audio, the second audio being sent after the terminal determines that the first audio contains the wake-up information of the peripheral device; and

executing the first response action corresponding to the first response instruction.

Optionally, before sending the second audio to the terminal, the method further comprises:

receiving a first sound pickup instruction sent by the terminal, the first sound pickup instruction instructing the peripheral device to start picking up sound, wherein the terminal sends the first sound pickup instruction when it determines that the first audio contains the wake-up information of the peripheral device.

Optionally, after executing the first response action corresponding to the first response instruction, the method further comprises:

sending third audio to the terminal within a first preset duration; and

receiving a second sound pickup instruction sent by the terminal, the second sound pickup instruction instructing the peripheral device to continue picking up sound, wherein the terminal sends the second sound pickup instruction when it determines that the third audio contains the wake-up information of the peripheral device.

Optionally, the method further comprises:

receiving a sleep message sent by the terminal and entering a sleep state, wherein the terminal sends the sleep message when it determines that the first audio does not contain the wake-up information of the peripheral device.

Optionally, before sending to the terminal the wake-up state message and the first audio that woke up the peripheral device, the method further comprises:

collecting first audio from a user and entering the wake-up state, the peripheral device having determined that the first audio contains the wake-up information of the peripheral device.

Optionally, after sending the second audio to the terminal, the method further comprises:

receiving a stop-pickup message sent by the terminal; and

stopping sound pickup.

Optionally, after stopping sound pickup, the method further comprises:

receiving an idle message sent by the terminal; and

if the third audio is not collected within the first preset duration, entering a sleep state.

A third aspect of the present invention provides a voice interaction device, comprising:

a receiving module, configured to receive a wake-up state message sent by a peripheral device and the first audio that woke up the peripheral device, the wake-up state message indicating that the peripheral device is in a wake-up state;

a second audio processing module, configured to, if it is determined that the first audio contains the wake-up information of the peripheral device, receive second audio sent by the peripheral device and send the second audio to a server, so that the server returns a first response instruction to the voice interaction device according to the second audio; and

a first response instruction processing module, configured to receive the first response instruction sent by the server and to control the peripheral device, according to the first response instruction, to execute the first response action corresponding to the first response instruction.

Optionally, the device further comprises a first sound pickup instruction sending module;

the first sound pickup instruction sending module is configured to send a first sound pickup instruction to the peripheral device, the first sound pickup instruction instructing the peripheral device to start picking up sound.

Optionally, the device further comprises a second sound pickup instruction sending module;

the second sound pickup instruction sending module is configured to, if third audio sent by the peripheral device is received within a first preset duration and the third audio contains the wake-up information of the peripheral device, send a second sound pickup instruction to the peripheral device, the second sound pickup instruction instructing the peripheral device to continue picking up sound.

Optionally, the device further comprises a sleep message sending module;

the sleep message sending module is configured to, if it is determined that the first audio does not contain the wake-up information of the peripheral device, send a sleep message to the peripheral device, the sleep message instructing the peripheral device to enter a sleep state.

Optionally, the device further comprises a stop-sending message processing module;

the stop-sending message processing module is configured to receive a stop-sending message sent by the server, the stop-sending message instructing the voice interaction device to stop sending audio to the server, wherein the server sends the stop-sending message when it has not received fourth audio from the voice interaction device within a second preset duration after receiving the second audio; and to send a stop-pickup message to the peripheral device, the stop-pickup message instructing the peripheral device to stop picking up sound.

Optionally, the device further comprises an idle message sending module;

the idle message sending module is configured to, if fourth audio sent by the peripheral device is not received within a third preset duration, enter an idle state and send an idle message to the peripheral device.

A fourth aspect of the present invention provides a voice interaction device, comprising:

a wake-up state sending module, configured to send to a terminal a wake-up state message and the first audio that woke up the voice interaction device, the wake-up state message notifying the terminal that the voice interaction device is in a wake-up state;

a second audio sending module, configured to send second audio to the terminal, so that the terminal sends the second audio to a server and the server returns a first response instruction to the terminal according to the second audio, the second audio being sent after the terminal determines that the first audio contains the wake-up information of the voice interaction device; and

a first response action execution module, configured to execute the first response action corresponding to the first response instruction.

Optionally, the device further comprises a first sound pickup instruction receiving module;

the first sound pickup instruction receiving module is configured to receive a first sound pickup instruction sent by the terminal, the first sound pickup instruction instructing the voice interaction device to start picking up sound, wherein the terminal sends the first sound pickup instruction when it determines that the first audio contains the wake-up information of the voice interaction device.

Optionally, the device further comprises a second sound pickup instruction receiving module;

the second sound pickup instruction receiving module is configured to send third audio to the terminal within a first preset duration, and to receive a second sound pickup instruction sent by the terminal, the second sound pickup instruction instructing the voice interaction device to continue picking up sound, wherein the terminal sends the second sound pickup instruction when it determines that the third audio contains the wake-up information of the voice interaction device.

Optionally, the device further comprises a sleep message receiving module;

the sleep message receiving module is configured to receive a sleep message sent by the terminal and enter a sleep state, wherein the terminal sends the sleep message when it determines that the first audio does not contain the wake-up information of the voice interaction device.

Optionally, the device further comprises a first audio collection module;

the first audio collection module is configured to collect first audio from a user and enter the wake-up state, the voice interaction device having determined that the first audio contains the wake-up information of the voice interaction device.

Optionally, the device further comprises a stop-pickup module;

the stop-pickup module is configured to receive a stop-pickup message sent by the terminal and to stop sound pickup.

Optionally, the device further comprises a sleep module;

the sleep module is configured to receive an idle message sent by the terminal and, if the third audio is not collected within the first preset duration, to enter a sleep state.

A fifth aspect of the present invention provides a terminal, comprising at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the terminal performs the voice interaction method of the first aspect.

A sixth aspect of the present invention provides a peripheral device, comprising at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the peripheral device performs the voice interaction method of the second aspect.

A seventh aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the voice interaction method of the first aspect.

An eighth aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the voice interaction method of the second aspect.

The present invention provides a voice interaction method, device and storage medium. The method comprises: receiving a wake-up state message sent by a peripheral device and the first audio that woke up the peripheral device, the wake-up state message indicating that the peripheral device is in a wake-up state; if it is determined that the first audio contains the wake-up information of the peripheral device, receiving second audio sent by the peripheral device and sending the second audio to a server, so that the server returns a first response instruction to the terminal according to the second audio; and receiving the first response instruction sent by the server, and controlling the peripheral device, according to the first response instruction, to execute the first response action corresponding to the first response instruction. After the peripheral device is woken up by voice, the terminal confirms the wake-up information a second time, which avoids false wake-ups, enriches the interaction between the peripheral device and the terminal, and improves the user experience.

Brief description of the drawings

Fig. 1 is a schematic diagram of a scenario to which the voice interaction method provided by the present invention is applicable;

Fig. 2 is a first schematic flowchart of the voice interaction method provided by the present invention;

Fig. 3 is a second schematic flowchart of the voice interaction method provided by the present invention;

Fig. 4 is a third schematic flowchart of the voice interaction method provided by the present invention;

Fig. 5 is a first schematic structural diagram of a voice interaction device provided by the present invention;

Fig. 6 is a second schematic structural diagram of a voice interaction device provided by the present invention;

Fig. 7 is a third schematic structural diagram of a voice interaction device provided by the present invention;

Fig. 8 is a first schematic structural diagram of another voice interaction device provided by the present invention;

Fig. 9 is a second schematic structural diagram of another voice interaction device provided by the present invention;

Fig. 10 is a third schematic structural diagram of another voice interaction device provided by the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to those embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

Bluetooth peripheral devices in the prior art come in many forms, such as Bluetooth headsets, Bluetooth speakers, Bluetooth keyboards and sports wristbands. Before use, such a peripheral device must establish a Bluetooth connection with a terminal. For example, a Bluetooth speaker establishes a Bluetooth connection with a terminal as follows: the user long-presses the power key of the Bluetooth speaker to turn it on, searches for the name of the Bluetooth speaker on the terminal, and enters the pairing password, whereupon the Bluetooth connection is established.

Once the connection is established, the terminal can play songs or other audio through the Bluetooth speaker. The audio may be stored in a local folder of the terminal, or may be audio obtained in real time through interaction between the terminal and a server. The terminal sends the file to be played to the Bluetooth speaker, and the Bluetooth speaker plays the corresponding audio.

However, in the prior art the interaction between a terminal and a Bluetooth peripheral device is too limited: the peripheral device can only play content passively under the control of the terminal and cannot interact with the user, which results in a poor user experience. Moreover, devices in the prior art that can interact with the user are smart devices, which must be able to connect to a server themselves, so the deployment cost of such peripheral devices is high.

To solve the above problem of overly limited interaction between a terminal and a Bluetooth peripheral device, and to reduce the deployment cost of the Bluetooth peripheral device while enriching the interaction between the two, the present invention provides a voice interaction method. Fig. 1 is a schematic diagram of a scenario to which the voice interaction method provided by the present invention is applicable. As shown in Fig. 1, the scenario includes a peripheral device, a terminal and a server.

The peripheral device can establish a Bluetooth connection with the terminal. The Bluetooth connection may be a prior-art data connection based on classic Bluetooth, in which the system settings interface of the terminal guides the user to select a designated device and complete pairing. Alternatively, the terminal may establish a smart Bluetooth (DuerOS Mobile Accessories, DMA) connection with the peripheral device. For example, when the terminal is to establish a DMA connection with the peripheral device, scanning, pairing and connecting can all be completed directly in the interface of an application on the terminal, without switching to the system settings interface of the terminal for configuration and then returning to the application to complete the connection. Correspondingly, in this embodiment, when an ordinary Bluetooth connection is established, the peripheral device is an ordinary Bluetooth device; when the terminal establishes a DMA connection with it, the peripheral device is a DMA device, that is, a device that supports the DMA Bluetooth protocol. When the terminal and the peripheral device establish an ordinary Bluetooth connection, reference may be made to prior-art Bluetooth connection methods; the process by which the terminal and the peripheral device establish a DMA connection is described in detail in the following embodiments.

In the present invention, the terminal and the server may be connected wirelessly or by wire. The terminal may be a mobile device such as a mobile phone, a personal digital assistant (PDA), a tablet computer or a portable device (for example, a portable computer, pocket computer or handheld computer), or a fixed device such as a desktop computer.

The voice interaction method provided by the present invention is described below from the perspective of the interaction among the peripheral device, the terminal and the server. Fig. 2 is a first schematic flowchart of the voice interaction method provided by the present invention. As shown in Fig. 2, the voice interaction method provided by this embodiment may include:

S201: the peripheral device sends to the terminal a wake-up state message and the first audio that woke up the peripheral device, the wake-up state message notifying the terminal that the peripheral device is in a wake-up state.

The peripheral device in this embodiment has a sound pickup function. Specifically, the peripheral device may be an in-vehicle bracket with a microphone, a Bluetooth speaker with a sound pickup function, a Bluetooth headset, a light-emitting diode (LED) lamp, an alarm clock, or a similar device.

After the terminal has established a Bluetooth connection or a DMA connection with the peripheral device, the user can wake up the peripheral device whenever voice interaction is needed, for example to check the weather or play a song. In this embodiment the peripheral device is woken up by voice.

One way of waking up the peripheral device is as follows: the peripheral device has preset wake-up information, which may be a preset wake-up word, and the peripheral device can remain in a sound pickup state at all times. When the peripheral device picks up the user saying the wake-up word, or a sentence containing the wake-up word, the peripheral device enters the wake-up state.

Another way of waking up the peripheral device is as follows: the peripheral device is provided with a switch button, and the peripheral device turns on after the user selects the button by clicking or another operation. When the peripheral device then picks up the user saying the wake-up word of the peripheral device, or a sentence containing that wake-up word, the peripheral device enters the wake-up state. In this embodiment, the audio containing the wake-up word that the peripheral device picks up is the first audio.

For example, if the wake-up word of the peripheral device is "Xiaodu", then when the user says "Xiaodu" or "Xiaodu, wake up", the peripheral device picks up the audio and parses it; if it determines that the audio contains the wake-up word of the peripheral device, the peripheral device enters the wake-up state.

After waking up, the peripheral device sends a wake-up state message to the terminal to notify the terminal that the peripheral device is in the wake-up state. In this embodiment the peripheral device also sends the terminal the first audio that woke it up, so that the terminal can confirm a second time whether the first audio really contains the wake-up word of the peripheral device, thereby improving wake-up quality.

A possible application scenario is as follows: the user speaks in a noisy environment, or at a position far from the peripheral device, and the peripheral device parses the picked-up audio to determine whether it contains the wake-up word of the peripheral device. Because of these external factors, the parsing result may be wrong; that is, the peripheral device may mistakenly conclude that the picked-up audio contains its wake-up word and enter the wake-up state, which increases the power consumption of the peripheral device.

S202: the terminal determines that the first audio contains the wake-up information of the peripheral device.

After receiving the first audio sent by the peripheral device, the terminal parses the first audio again to confirm whether it really contains the wake-up information of the peripheral device, which may be the preset wake-up word of the peripheral device. In this embodiment, the first audio may be converted into text, and the text checked word by word to confirm whether the first audio really contains the wake-up word of the peripheral device.
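The text-based second confirmation in S202 can be sketched in Python as follows. The sketch assumes that a transcript of the first audio has already been produced by some speech recognizer, which is outside its scope; the function name `contains_wake_word` and the normalization rule (lowercasing and dropping spaces and punctuation) are illustrative assumptions rather than part of the described method.

```python
def contains_wake_word(transcript: str, wake_word: str) -> bool:
    """Return True if the wake word appears in the transcript of the first audio."""

    def normalize(text: str) -> str:
        # Keep only letters and digits so that spacing, case and punctuation
        # differences in the recognizer output do not hide the wake word.
        return "".join(ch for ch in text.lower() if ch.isalnum())

    target = normalize(wake_word)
    # An empty wake word must never match.
    return bool(target) and target in normalize(transcript)
```

A terminal built along these lines would send the sound pickup instruction when the function returns True and the sleep message otherwise. Because spaces are dropped, a wake word could in principle match across a word boundary in space-delimited languages; a stricter word-level comparison may be preferable there.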

S203: the peripheral device sends the second audio to the terminal.

When the terminal confirms that the first audio contains the wake-up information of the peripheral device, the terminal may send a sound pickup instruction to the peripheral device so that the peripheral device starts picking up sound. Alternatively, the peripheral device starts picking up sound when it has not received a sleep message from the terminal; the terminal sends the sleep message to the peripheral device when it determines that the first audio does not contain the wake-up information of the peripheral device.
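The peripheral-side behaviour described so far (wake up on the wake-up word, then either start picking up sound or go back to sleep depending on the terminal's verdict) can be summarized as a small state machine. The following Python sketch is illustrative only: the state and event names are assumptions, and it models only the variant in which the terminal explicitly sends a sound pickup instruction.

```python
from enum import Enum, auto


class State(Enum):
    SLEEP = auto()
    AWAKE = auto()        # woken locally, waiting for the terminal's confirmation
    PICKING_UP = auto()   # terminal confirmed the wake word; collecting second audio


class Event(Enum):
    WAKE_WORD_DETECTED = auto()        # peripheral's own detection of the wake word
    START_PICKUP_INSTRUCTION = auto()  # terminal confirmed the wake-up information
    SLEEP_MESSAGE = auto()             # terminal found no wake-up information


# Allowed transitions; any other (state, event) pair leaves the state unchanged.
TRANSITIONS = {
    (State.SLEEP, Event.WAKE_WORD_DETECTED): State.AWAKE,
    (State.AWAKE, Event.START_PICKUP_INSTRUCTION): State.PICKING_UP,
    (State.AWAKE, Event.SLEEP_MESSAGE): State.SLEEP,
}


def next_state(state: State, event: Event) -> State:
    """Return the peripheral's state after handling one event."""
    return TRANSITIONS.get((state, event), state)
```

Expressing the transitions as a table keeps the false wake-up path (AWAKE back to SLEEP) explicit and easy to audit.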

The second audio in this embodiment may be the first segment of audio picked up by the peripheral device. Specifically, if the peripheral device detects no valid audio within a preset time period after detecting the first segment of audio, it sends that segment to the terminal. In this embodiment the peripheral device may treat picked-up audio whose volume exceeds a volume threshold as valid audio. It follows that if, after entering the wake-up state, the peripheral device detects only one sentence spoken by the user, it takes the audio corresponding to that sentence as the second audio.
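The volume-threshold rule for delimiting the second audio can be illustrated with a short Python sketch. It assumes the picked-up audio has already been reduced to one volume value per frame and that the preset time period is expressed as a number of frames; both the representation and the name `first_utterance` are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple


def first_utterance(volumes: Sequence[float], threshold: float,
                    max_silent_frames: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first segment of valid audio.

    A frame is valid when its volume exceeds the threshold. The segment ends
    once max_silent_frames consecutive frames without valid audio have passed.
    The end index is exclusive. Returns None if no valid frame is found.
    """
    start = None
    last_voiced = None
    for i, volume in enumerate(volumes):
        if volume > threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= max_silent_frames:
            # The preset silent period has elapsed: the segment is complete.
            break
    if start is None:
        return None
    return (start, last_voiced + 1)
```

Short pauses inside a sentence (fewer silent frames than the limit) do not end the segment, so a sentence with a brief hesitation is still sent as a single second audio.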

In this embodiment, the peripheral device sends the picked-up second audio to the terminal. Because the peripheral device starts picking up sound only after receiving the sound pickup instruction, the audio it picks up is valid audio, and the terminal can interact with the server to obtain the response corresponding to that audio. This avoids the problem of the peripheral device sending picked-up audio to the terminal, and occupying the terminal's memory, when the peripheral device has not been woken up or the terminal has not instructed it to pick up sound.

S204: the terminal receives the second audio sent by the peripheral device and sends the second audio to the server.

In this embodiment, after receiving the second audio sent by the peripheral device, the terminal sends the second audio to the server to obtain the response instruction corresponding to the second audio.

For example, the peripheral device sends the second audio "Help me set an alarm clock for 7:00 tomorrow morning" to the terminal, and the terminal in turn sends the second audio to the server to obtain a response instruction.

S205: the server returns a first response instruction to the terminal according to the second audio.

In this embodiment, the server may parse the second audio after receiving it. Specifically, the server may parse the second audio as follows: convert the second audio into text; segment the text to obtain the words it contains; select target words according to the part of speech of each word; and obtain the response instruction corresponding to the second audio according to the semantics of the target words.

In this embodiment, a natural language processing (NLP) tool such as a tokenizer may be used to segment the text corresponding to the second audio into words. If the text corresponding to the second audio is "Help me set an alarm clock for 7:00 tomorrow morning", the tokenizer splits the text into multiple words; the segmented words may be "help me", "set", "tomorrow morning", "7:00", a structural particle, and "alarm clock".

In this embodiment, optionally, the target words carrying the effective information may be obtained according to the parts of speech of the segmented words. For example, quantifiers, adverbs and adjectives are removed from the segmented text, and words such as nouns and verbs are kept as target words. In the segmentation result above, "help me" and the particle are removed, leaving the target words "set", "tomorrow morning", "7:00" and "alarm clock". When the server determines from the target words that the user's intention is to set an alarm clock for 7:00 tomorrow morning, the server can return to the terminal a first response instruction for setting the alarm clock.
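The part-of-speech filtering step can be sketched in Python as follows. The sketch starts from tokens that are already segmented and tagged, since the tokenizer itself is not specified; the tag names, the set of kept tags, the hand-written intent rule and the dictionary layout of the instruction are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set: keep verbs, nouns and time expressions as target words.
KEPT_TAGS = {"verb", "noun", "time"}


def target_words(tagged_tokens: List[Tuple[str, str]]) -> List[str]:
    """Drop tokens whose part of speech carries no effective information."""
    return [word for word, tag in tagged_tokens if tag in KEPT_TAGS]


def build_instruction(words: List[str]) -> Dict[str, str]:
    """Map target words to a response instruction using one hand-written rule."""
    if "set" in words and "alarm clock" in words:
        when = [w for w in words if w not in ("set", "alarm clock")]
        return {"action": "set_alarm", "when": " ".join(when)}
    return {"action": "unknown"}


# Segmentation result from the example above, with assumed tags.
tagged = [
    ("help me", "other"),
    ("set", "verb"),
    ("tomorrow morning", "time"),
    ("7:00", "time"),
    ("of", "particle"),
    ("alarm clock", "noun"),
]
words = target_words(tagged)
instruction = build_instruction(words)
```

With these inputs, `words` holds the four target words named in the text and `instruction` describes an alarm for "tomorrow morning 7:00". A real server would likely replace the single rule with a trained intent model.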

It is worth noting that, when the text corresponding to the second audio contains multiple sentences, the server may first split the text into clauses and then segment each clause into words. The server then obtains the response audio corresponding to each clause according to the semantics of the target words in that clause, and sends the multiple response audios corresponding to the second audio to the terminal in the order in which the clauses appear in the text.

Illustratively, the text corresponding to the user's second audio is "help me set the alarm clock for 7:00 tomorrow morning; turn off the light". The server divides the text into two clauses, "help me set the alarm clock for 7:00 tomorrow morning" and "turn off the light", and obtains the target words of each clause, such as "set", "tomorrow morning", "7:00" and "alarm clock" for the first clause and "turn off the light" for the second. The server then obtains the response instruction corresponding to each clause and merges the response instructions of the clauses to form the first response instruction.
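The clause splitting and merging in this example can be sketched as follows. This is a sketch under stated assumptions: clauses are split on sentence-level punctuation only, and `instruction_for_clause` is a hypothetical keyword lookup standing in for the semantic analysis of the target words.

```python
import re


def split_clauses(text):
    """Split text into clauses on ASCII and full-width sentence punctuation."""
    parts = re.split(r"[;；。!?！？]", text)
    return [part.strip() for part in parts if part.strip()]


def instruction_for_clause(clause):
    """Hypothetical per-clause lookup; a real server would analyze target words."""
    if "alarm" in clause:
        return {"action": "set_alarm"}
    if "turn off the light" in clause:
        return {"action": "light_off"}
    return {"action": "unknown"}


def build_first_response_instruction(text):
    """Merge the per-clause instructions, preserving the clause order."""
    return [instruction_for_clause(clause) for clause in split_clauses(text)]


merged = build_first_response_instruction(
    "help me set the alarm clock for 7:00 tomorrow morning; turn off the light"
)
```

Keeping the merged instruction as an ordered list preserves the order in which the clauses appear in the text.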

After obtaining the first response instruction that responds to the second audio, the server sends the first response instruction to the terminal, so that the terminal controls the peripheral end to perform the corresponding first response action.

S206, the terminal receives the first response instruction sent by the server and, according to the first response instruction, controls the peripheral end to execute the first response action corresponding to the first response instruction.

Illustratively, the terminal receives the first response instruction for the second audio "help me set the alarm clock for 7:00 tomorrow morning; turn off the light" and, according to the first response instruction, controls the alarm clock to set a ringing alarm for 7:00 tomorrow morning and controls the LED light to turn off.

S207, the peripheral end executes the first response action.

In the present embodiment, the response actions corresponding to multiple response instructions may be stored in the terminal or in the peripheral end.

If the response actions corresponding to the multiple response instructions are stored in the terminal, then after receiving the first response instruction sent by the server, the terminal obtains the corresponding first response action according to the first response instruction and sends the peripheral end an instruction to execute the first response action; the peripheral end then executes the corresponding first response action.

If the response actions corresponding to the multiple response instructions are stored in the peripheral end, then after receiving the first response instruction sent by the server, the terminal forwards the first response instruction to the peripheral end. After receiving the first response instruction, the peripheral end looks up the first response action corresponding to the first response instruction among the stored response actions and executes it.
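The two storage options can be sketched as follows. The class names, method names and the contents of `RESPONSE_ACTIONS` are hypothetical and serve only to show where the lookup happens in each variant.

```python
# Hypothetical table mapping response instructions to response actions.
RESPONSE_ACTIONS = {"set_alarm": "ring_at_0700", "light_off": "switch_led_off"}


class Peripheral:
    def __init__(self, action_table=None):
        self.action_table = action_table or {}
        self.executed = []  # record of executed response actions

    def execute_action(self, action):
        self.executed.append(action)

    def handle_instruction(self, instruction):
        # Variant 2: the table is stored in the peripheral end.
        self.execute_action(self.action_table[instruction])


class Terminal:
    def __init__(self, peripheral, action_table=None):
        self.peripheral = peripheral
        self.action_table = action_table or {}

    def on_first_response_instruction(self, instruction):
        if instruction in self.action_table:
            # Variant 1: the terminal resolves the action and tells the
            # peripheral end to execute it.
            self.peripheral.execute_action(self.action_table[instruction])
        else:
            # Variant 2: the terminal forwards the instruction unchanged.
            self.peripheral.handle_instruction(instruction)


# Variant 1: table stored in the terminal.
peripheral_a = Peripheral()
Terminal(peripheral_a, RESPONSE_ACTIONS).on_first_response_instruction("light_off")

# Variant 2: table stored in the peripheral end.
peripheral_b = Peripheral(RESPONSE_ACTIONS)
Terminal(peripheral_b).on_first_response_instruction("set_alarm")
```

In both variants the peripheral end is the component that finally executes the action; only the location of the lookup differs.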

In the present embodiment, sound pickup is performed by the peripheral end, in contrast to the prior-art approach in which the terminal interacts with the server directly to obtain the response audio. On the one hand, because the sound pickup capability of the terminal is limited, the terminal may fail to pick up sound accurately, or may pick it up poorly, when the user is some distance away; picking up sound with the peripheral end gives a better result in the present embodiment. On the other hand, the interaction between the terminal and the peripheral end becomes more diverse, which improves the user experience.

The voice interaction method provided in this embodiment includes: receiving a wake-up state message sent by the peripheral end and the first audio that woke up the peripheral end, the wake-up state message indicating that the peripheral end is in the wake-up state; if it is determined that the first audio contains the wake-up information of the peripheral end, receiving the second audio sent by the peripheral end and sending the second audio to the server, so that the server returns a first response instruction to the terminal according to the second audio; and receiving the first response instruction sent by the server and, according to the first response instruction, controlling the peripheral end to execute the first response action corresponding to the first response instruction. In this embodiment, after the peripheral end is woken up by voice, the terminal confirms the wake-up information again, which avoids false wake-ups, enriches the interaction between the peripheral end and the terminal, and improves the user experience.

The voice interaction method provided by the present invention is further described below with reference to Fig. 3. Fig. 3 is the second flow diagram of the voice interaction method provided by the present invention. As shown in Fig. 3, the voice interaction method provided in this embodiment may include:

S301, the terminal establishes a DMA connection with the peripheral end.

In the prior art, a Bluetooth connection between a terminal and a peripheral Bluetooth device is established as follows: the terminal obtains connectable Bluetooth devices through the existing Bluetooth scanning mode, namely BLE scanning, and first establishes a BLE connection with the Bluetooth device. After that connection is established, the Bluetooth device returns a response message to the terminal, the response message indicating that the terminal can connect to the Bluetooth device over an RFCOMM link. After receiving the response message, the terminal disconnects the BLE connection with the Bluetooth device and connects to the Bluetooth device again over the RFCOMM link. This prior-art connection method affects the success rate and speed of the RFCOMM connection even when the BLE link is in a normal condition.

The peripheral end in the present embodiment is a peripheral end that supports the DMA protocol. Specifically, the way the terminal and the peripheral end establish a DMA connection in the present embodiment is briefly described as follows: while the terminal is scanning, a DMA peripheral end that supports the DMA protocol sends a broadcast packet to the terminal, and the broadcast packet contains identification information indicating that the peripheral end supports DMA connection. The terminal then connects to the peripheral end directly over the RFCOMM link, which solves the prior-art problem that the success rate and speed of the RFCOMM connection are affected even when the BLE link is in a normal condition.
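The difference between the two connection procedures can be sketched as a decision on the broadcast packet. This is only an illustration: the `BroadcastPacket` type, its `supports_dma` field and the step names are hypothetical and do not reflect the real packet layout or any Bluetooth stack API.

```python
from dataclasses import dataclass


@dataclass
class BroadcastPacket:
    device_name: str
    supports_dma: bool  # hypothetical stand-in for the DMA identification information


def choose_connection_steps(packet):
    """Return the ordered connection steps the terminal would take."""
    if packet.supports_dma:
        # DMA peripheral end: connect directly over the RFCOMM link.
        return ["rfcomm_connect"]
    # Prior-art procedure: BLE connection first, then switch to RFCOMM.
    return [
        "ble_connect",
        "await_response_message",
        "ble_disconnect",
        "rfcomm_connect",
    ]


dma_steps = choose_connection_steps(BroadcastPacket("speaker", supports_dma=True))
legacy_steps = choose_connection_steps(BroadcastPacket("headset", supports_dma=False))
```

The DMA path skips the intermediate BLE connection entirely, so the RFCOMM connection no longer depends on a preceding BLE exchange.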

S302, the peripheral end collects the first audio of the user and enters the wake-up state after determining that the first audio contains the wake-up information corresponding to the peripheral end.

In the present embodiment, after the terminal and the peripheral end establish the DMA connection, a user who wants to carry out voice interaction speaks the wake-up information of the peripheral end to wake it up. Specifically, the wake-up information may be a wake-up word preset for the peripheral end, or a sentence containing the wake-up word. The first audio is the audio that contains the wake-up information corresponding to the peripheral end; after collecting the first audio, the peripheral end enters the wake-up state.

Illustratively, the wake-up information may be a wake-up word preset for the peripheral end. If the wake-up word of the peripheral end is "Xiaodu", then when the user says "Xiaodu" or the sentence "Xiaodu, wake up", the peripheral end collects the first audio, determines that the audio contains its wake-up word, and enters the wake-up state.

S303, the peripheral end sends the terminal a wake-up state message and the first audio that woke up the peripheral end; the wake-up state message is used to notify the terminal that the peripheral end is in the wake-up state.

S304, if the terminal determines that the first audio contains the wake-up information of the peripheral end, the terminal sends a first sound pickup instruction to the peripheral end; the first sound pickup instruction is used to instruct the peripheral end to start picking up sound.

When the terminal determines that the first audio contains the wake-up information of the peripheral end, it sends the first sound pickup instruction to the peripheral end, instructing the peripheral end to start picking up sound.

It is worth noting that, in the present embodiment, if the terminal determines that the first audio does not contain the wake-up information of the peripheral end, the terminal sends dormancy information to the peripheral end; the dormancy information is used to instruct the peripheral end to enter a dormant state.

That is, when the first audio sent by the peripheral end does not contain the wake-up information of the peripheral end, but the peripheral end has mistakenly judged that it does and has entered the wake-up state, the terminal can judge again, according to the first audio, whether the audio really contains the wake-up information of the peripheral end. If the first audio does not contain the wake-up information of the peripheral end, the terminal makes the peripheral end enter the dormant state, which prevents a falsely woken peripheral end from staying in the wake-up state and draining its battery.
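The terminal's second check can be sketched as follows. As a simplifying assumption, the function receives a text transcript of the first audio rather than raw audio, and the wake-up word is written in romanized form; the function name and the returned message labels are hypothetical.

```python
WAKE_WORD = "xiaodu"  # assumed romanized form of the preset wake-up word


def terminal_verify_wake(first_audio_transcript, wake_word=WAKE_WORD):
    """Decide which message the terminal sends back to the peripheral end."""
    if wake_word in first_audio_transcript.lower():
        # Wake-up confirmed: instruct the peripheral end to start picking up sound.
        return "first_sound_pickup_instruction"
    # False wake-up: instruct the peripheral end to enter the dormant state.
    return "dormancy_information"


confirmed = terminal_verify_wake("Xiaodu, wake up")
rejected = terminal_verify_wake("show me the weather")
```

A real terminal would run its own wake-word detector on the audio; the point of the sketch is only that the peripheral end's decision is re-checked before any sound pickup starts.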

S305, the peripheral end sends the second audio to the terminal.

In the present embodiment, the peripheral end starts picking up sound after receiving the first sound pickup instruction, that is, it sends the collected audio to the terminal for processing.

S306, the terminal receives the second audio sent by the peripheral end and sends the second audio to the server.

S307, the server returns a first response instruction to the terminal according to the second audio.

S308, the terminal receives the first response instruction sent by the server and, according to the first response instruction, controls the peripheral end to execute the first response action corresponding to the first response instruction.

S309, the peripheral end executes the first response action.

S310, the peripheral end sends a third audio to the terminal within a first preset duration.

A first preset duration is stored in the peripheral end. If the peripheral end collects a third audio within the first preset duration after it has executed the first response action, it sends the third audio to the terminal. In the present embodiment, if the third audio that the peripheral end sends to the terminal within the first preset duration contains the wake-up information of the peripheral end, the peripheral end can continue picking up sound, realizing interaction among the peripheral end, the terminal and the server.

S311, if the terminal determines that the third audio contains the wake-up information of the peripheral end, the terminal sends a second sound pickup instruction to the peripheral end; the second sound pickup instruction is used to instruct the peripheral end to continue picking up sound.

In the present embodiment, after receiving the third audio sent by the peripheral end, the terminal determines, in the analysis manner described above, whether the third audio contains the wake-up information of the peripheral end. If so, the terminal sends the second sound pickup instruction to the peripheral end, instructing the peripheral end to continue picking up sound. After the peripheral end continues picking up sound, it can send the newly collected audio to the terminal; the terminal obtains the response instruction for the new audio in the voice interaction manner of the above embodiment and then controls the peripheral end to execute the corresponding response action, thereby realizing multi-round interaction.

Further, in the present embodiment, after the terminal has confirmed that the first audio really contains the wake-up information of the peripheral end, whether the audio received by the peripheral end contains the wake-up information corresponding to the peripheral end is judged by the terminal. After confirming that the audio contains that wake-up information, the terminal can send the second sound pickup instruction to the peripheral end to instruct it to continue picking up sound, so as to carry out multi-round voice interaction.

Specifically, for the implementation of S303 and S306-S309 in the present embodiment, reference may be made to the related description of S201 and S204-S207 in the above embodiment, which is not limited here.

In the present embodiment, the terminal establishes a DMA connection with the peripheral end, which solves the prior-art problem that the success rate and speed of the RFCOMM connection are affected even when the BLE link is in a normal condition. After the peripheral end is woken up by voice, the terminal confirms the wake-up information again, which avoids false wake-ups, enriches the interaction between the peripheral end and the terminal, and improves the user experience. Further, the peripheral end sends a third audio to the terminal within the first preset duration; after the terminal confirms that the third audio contains the wake-up information corresponding to the peripheral end, it can send the second sound pickup instruction to the peripheral end to instruct it to continue picking up sound, thereby realizing multi-round voice interaction.

The voice interaction method provided by the present invention is further described below with reference to Fig. 4. Fig. 4 is the third flow diagram of the voice interaction method provided by the present invention. As shown in Fig. 4, the voice interaction method provided in this embodiment may include:

S401, the terminal establishes a DMA connection with the peripheral end.

S402, the peripheral end collects the first audio of the user and enters the wake-up state after determining that the first audio contains the wake-up information corresponding to the peripheral end.

S403, the peripheral end sends the terminal a wake-up state message and the first audio that woke up the peripheral end; the wake-up state message is used to notify the terminal that the peripheral end is in the wake-up state.

S404, if the terminal determines that the first audio contains the wake-up information of the peripheral end, the terminal sends a first sound pickup instruction to the peripheral end; the first sound pickup instruction is used to instruct the peripheral end to start picking up sound.

S405, the peripheral end sends the second audio to the terminal.

S406, the terminal receives the second audio sent by the peripheral end and sends the second audio to the server.

S407, the server sends a stop-sending message to the terminal; the stop-sending message is used to instruct the terminal to stop sending audio to the server.

In the present embodiment, a second preset duration is set in the server. After receiving the second audio sent by the terminal, if the server does not receive a fourth audio from the terminal within the second preset duration, it determines that the user has finished speaking, obtains the corresponding response audio according to the second audio, and sends a stop-sending message to the terminal, where the stop-sending message is used to instruct the terminal to stop sending audio to the server. Specifically, after receiving the stop-sending message sent by the server, the terminal no longer sends new audio to the server.
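The server-side timeout can be sketched as follows. To keep the example deterministic, timestamps are passed in explicitly instead of being read from a clock; the `ServerSession` class, its methods and the `stop_sending` label are hypothetical.

```python
class ServerSession:
    """Tracks audio arrival times for one interaction (hypothetical sketch)."""

    def __init__(self, second_preset_duration):
        self.second_preset_duration = second_preset_duration
        self.last_audio_at = None
        self.chunks = []

    def on_audio(self, chunk, now):
        """Record an audio chunk received from the terminal at time `now`."""
        self.chunks.append(chunk)
        self.last_audio_at = now

    def poll(self, now):
        """Return 'stop_sending' once no new audio arrived for the preset duration."""
        if self.last_audio_at is None:
            return None
        if now - self.last_audio_at >= self.second_preset_duration:
            return "stop_sending"
        return None


session = ServerSession(second_preset_duration=1.5)
session.on_audio(b"second-audio-part-1", now=0.0)
early = session.poll(now=1.0)   # still within the preset duration
session.on_audio(b"second-audio-part-2", now=1.2)
later = session.poll(now=2.0)   # the timer restarted at 1.2
final = session.poll(now=3.0)   # silence has lasted longer than 1.5
```

Each newly received audio restarts the timer, so the stop-sending message is produced only after the user has been silent for the whole second preset duration.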

S408, the terminal sends a stop sound pickup message to the peripheral end; the stop sound pickup message is used to instruct the peripheral end to stop picking up sound.

The peripheral end in the present embodiment is a peripheral end whose sound pickup can be controlled. After receiving the stop-sending message sent by the server, the terminal can send the stop sound pickup message to the peripheral end so that the peripheral end stops picking up sound, which reduces the power consumption of the peripheral end.

S409, the peripheral end stops picking up sound.

S410, the server returns a first response instruction to the terminal according to the second audio.

S411, the terminal receives the first response instruction sent by the server and, according to the first response instruction, controls the peripheral end to execute the first response action corresponding to the first response instruction.

S412, the peripheral end executes the first response action.

S413, if the terminal does not receive a fourth audio sent by the peripheral end within a third preset duration, the terminal enters an idle state and sends an idle message to the peripheral end.

In the present embodiment, a third preset duration is stored in the terminal. If, within the third preset duration after the terminal has controlled the peripheral end, according to the first response instruction, to execute the corresponding first response action, the terminal does not receive a fourth audio, that is, new audio, from the peripheral end, the terminal determines that the user has no new need for voice interaction. The terminal then enters the idle state and also sends an idle message to the peripheral end. Specifically, entering the idle state may mean entering an energy-saving mode, which reduces the power consumption of the terminal when there is no voice interaction.

S414, the peripheral end receives the idle message sent by the terminal; if it does not collect a third audio within the first preset duration, it enters the dormant state.

In the present embodiment, after receiving the idle message sent by the terminal, the peripheral end determines that the terminal's voice interaction is complete. Specifically, if the peripheral end does not collect a third audio within the first preset duration after receiving the idle message, it determines that the user has no need for voice interaction and enters the dormant state.
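The two timeouts in S413 and S414 can be sketched as pure functions of elapsed time. The function names, parameter names and state labels are hypothetical, and elapsed times are passed in as plain numbers rather than measured.

```python
def terminal_state(seconds_since_action, fourth_audio_received, third_preset_duration):
    """Terminal goes idle if no fourth audio arrives within the third preset duration."""
    if not fourth_audio_received and seconds_since_action >= third_preset_duration:
        return "idle"
    return "active"


def peripheral_state(idle_message_received, seconds_since_idle_message,
                     third_audio_collected, first_preset_duration):
    """Peripheral end goes dormant if no third audio is collected after the idle message."""
    timed_out = seconds_since_idle_message >= first_preset_duration
    if idle_message_received and not third_audio_collected and timed_out:
        return "dormant"
    return "awake"


# No new audio for 12 s with a 10 s third preset duration: the terminal goes idle.
t_state = terminal_state(12.0, fourth_audio_received=False, third_preset_duration=10.0)

# Idle message received 6 s ago, nothing collected, 5 s first preset duration.
p_state = peripheral_state(True, 6.0, third_audio_collected=False,
                           first_preset_duration=5.0)
```

The peripheral end's timer starts only after the idle message arrives, so the two devices power down in sequence rather than independently.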

Specifically, for the implementation of S401-S406 and S410-S412 in the present embodiment, reference may be made to the related description of S301-S306 and S307-S309 in the above embodiment, which is not limited here.

In the present embodiment, if the terminal does not receive audio sent by the peripheral end within the preset duration, it enters the idle state; and if the peripheral end does not receive, within the time threshold, wake-up audio containing the wake-up information, it enters the dormant state. The power consumption of the terminal and the peripheral end is thereby reduced when there is no voice interaction.

Fig. 5 is the first structural schematic diagram of a voice interaction device provided by the present invention. The voice interaction device is a terminal. As shown in Fig. 5, the voice interaction device 500 includes: a receiving module 501, a second audio processing module 502 and a first response instruction processing module 503.

The receiving module 501 is configured to receive a wake-up state message sent by the peripheral end and the first audio that woke up the peripheral end; the wake-up state message is used to indicate that the peripheral end is in the wake-up state.

The second audio processing module 502 is configured to, if it is determined that the first audio contains the wake-up information of the peripheral end, receive the second audio sent by the peripheral end and send the second audio to the server, so that the server returns a first response instruction to the voice interaction device according to the second audio.

The first response instruction processing module 503 is configured to receive the first response instruction sent by the server and, according to the first response instruction, control the peripheral end to execute the first response action corresponding to the first response instruction.

The principle and technical effect of the voice interaction device provided in this embodiment are similar to those of the voice interaction method described above, and are not repeated here.

Optionally, Fig. 6 is the second structural schematic diagram of a voice interaction device provided by the present invention. As shown in Fig. 6, the voice interaction device 500 includes: a first sound pickup instruction sending module 504, a second sound pickup instruction sending module 505, a dormancy information sending module 506, a stop-sending message processing module 507 and an idle message sending module 508.

The first sound pickup instruction sending module 504 is configured to send a first sound pickup instruction to the peripheral end; the first sound pickup instruction is used to instruct the peripheral end to start picking up sound.

The second sound pickup instruction sending module 505 is configured to, if a third audio sent by the peripheral end is received within the first preset duration and the third audio contains the wake-up information corresponding to the peripheral end, send a second sound pickup instruction to the peripheral end; the second sound pickup instruction is used to instruct the peripheral end to continue picking up sound.

The dormancy information sending module 506 is configured to, if it is determined that the first audio does not contain the wake-up information of the peripheral end, send dormancy information to the peripheral end; the dormancy information is used to instruct the peripheral end to enter the dormant state.

The stop-sending message processing module 507 is configured to receive the stop-sending message sent by the server, the stop-sending message being used to instruct the voice interaction device to stop sending audio to the server and being sent by the server when it does not receive a fourth audio from the voice interaction device within the second preset duration after receiving the second audio; and to send a stop sound pickup message to the peripheral end, the stop sound pickup message being used to instruct the peripheral end to stop picking up sound.

The idle message sending module 508 is configured to, if a fourth audio sent by the peripheral end is not received within the third preset duration, enter the idle state and send an idle message to the peripheral end.

Fig. 7 is the third structural schematic diagram of a voice interaction device provided by the present invention. As shown in Fig. 7, the voice interaction device 700 includes: a memory 701 and at least one processor 702.

The memory 701 is configured to store program instructions.

The processor 702 is configured to implement the voice interaction method of the present embodiment when the program instructions are executed. For the specific implementation principle, refer to the above embodiments; details are not repeated here.

The voice interaction device 700 may further include an input/output interface 703.

The input/output interface 703 may include an independent output interface and an independent input interface, or may be an integrated interface that combines input and output. The output interface is used to output data and the input interface is used to obtain input data; the output data is a general term for what is output in the above method embodiments, and the input data is a general term for what is input in the above method embodiments.

The present invention also provides a readable storage medium in which execution instructions are stored. When at least one processor of the voice interaction device executes the execution instructions, that is, when the computer-executable instructions are executed by the processor, the voice interaction method in the above embodiments is implemented.

The present invention also provides a program product. The program product includes execution instructions, and the execution instructions are stored in a readable storage medium. At least one processor of the voice interaction device can read the execution instructions from the readable storage medium, and the at least one processor executes the execution instructions so that the voice interaction device implements the voice interaction method provided by the various embodiments above.

Fig. 8 is the first structural schematic diagram of another voice interaction device provided by the present invention. The voice interaction device is a peripheral end. As shown in Fig. 8, the voice interaction device 800 includes: a wake-up state sending module 801, a second audio sending module 802 and a first response action execution module 803.

The wake-up state sending module 801 is configured to send the terminal a wake-up state message and the first audio that woke up the voice interaction device; the wake-up state message is used to notify the terminal that the voice interaction device is in the wake-up state.

The second audio sending module 802 is configured to send the second audio to the terminal, so that the terminal sends the second audio to the server and the server returns a first response instruction to the terminal according to the second audio; the second audio is sent after the terminal determines that the first audio contains the wake-up information of the voice interaction device.

The first response action execution module 803 is configured to execute the first response action corresponding to the first response instruction.

Optionally, Fig. 9 is the second structural schematic diagram of another voice interaction device provided by the present invention. As shown in Fig. 9, the voice interaction device 800 includes: a first sound pickup instruction receiving module 804, a second sound pickup instruction receiving module 805, a dormancy information receiving module 806, a first audio collecting module 807, a stop sound pickup module 808 and a dormancy module 809.

The first sound pickup instruction receiving module 804 is configured to receive the first sound pickup instruction sent by the terminal; the first sound pickup instruction is used to instruct the voice interaction device to start picking up sound, and is sent by the terminal when it determines that the first audio contains the wake-up information corresponding to the voice interaction device.

The second sound pickup instruction receiving module 805 is configured to send a third audio to the terminal within the first preset duration, and to receive the second sound pickup instruction sent by the terminal; the second sound pickup instruction is used to instruct the voice interaction device to continue picking up sound, and is sent by the terminal when it determines that the third audio contains the wake-up information corresponding to the voice interaction device.

The dormancy information receiving module 806 is configured to receive the dormancy information sent by the terminal and enter the dormant state; the dormancy information is sent by the terminal when it determines that the first audio does not contain the wake-up information of the voice interaction device.

The first audio collecting module 807 is configured to collect the first audio of the user and enter the wake-up state, the voice interaction device having determined that the first audio contains the wake-up information corresponding to the voice interaction device.

The stop sound pickup module 808 is configured to receive the stop sound pickup message sent by the terminal and stop picking up sound.

The dormancy module 809 is configured to receive the idle message sent by the terminal and, if a third audio is not collected within the first preset duration, enter the dormant state.

Fig. 10 is the third structural schematic diagram of another voice interaction device provided by the present invention. As shown in Fig. 10, the voice interaction device 1000 includes: a memory 1001 and at least one processor 1002.

The memory 1001 is configured to store program instructions.

The processor 1002 is configured to implement the voice interaction method of the present embodiment when the program instructions are executed. For the specific implementation principle, refer to the above embodiments; details are not repeated here.

The voice interaction device 1000 may further include an input/output interface 1003.

The input/output interface 1003 may include an independent output interface and an independent input interface, or may be an integrated interface that combines input and output. The output interface is used to output data and the input interface is used to obtain input data; the output data is a general term for what is output in the above method embodiments, and the input data is a general term for what is input in the above method embodiments.

In several embodiments provided by the present invention, it should be understood that disclosed device and method can pass through it Its mode is realized.For example, the apparatus embodiments described above are merely exemplary, for example, the division of the unit, only Only a kind of logical function partition, there may be another division manner in actual implementation, such as multiple units or components can be tied Another system is closed or is desirably integrated into, or some features can be ignored or not executed.Another point, it is shown or discussed Mutual coupling, direct-coupling or communication connection can be through some interfaces, the INDIRECT COUPLING or logical of device or unit Letter connection can be electrical property, mechanical or other forms.

The unit as illustrated by the separation member may or may not be physically separated, aobvious as unit The component shown may or may not be physical unit, it can and it is in one place, or may be distributed over multiple In network unit.It can select some or all of unit therein according to the actual needs to realize the mesh of this embodiment scheme 's.

It, can also be in addition, the functional units in various embodiments of the present invention may be integrated into one processing unit It is that each unit physically exists alone, can also be integrated in one unit with two or more units.Above-mentioned integrated list Member both can take the form of hardware realization, can also realize in the form of hardware adds SFU software functional unit.

The above-mentioned integrated unit being realized in the form of SFU software functional unit can store and computer-readable deposit at one In storage media.Above-mentioned SFU software functional unit is stored in a storage medium, including some instructions are used so that a computer Equipment (can be personal computer, server or the network equipment etc.) or processor (English: processor) execute this hair The part steps of bright each embodiment the method.And storage medium above-mentioned includes: USB flash disk, mobile hard disk, read-only memory (English: Read-Only Memory, abbreviation: ROM), random access memory (English: Random Access Memory, letter Claim: RAM), the various media that can store program code such as magnetic or disk.

In the embodiment of the above-mentioned network equipment or terminal device, it should be appreciated that processor can be central processing unit (English: Central Processing Unit, referred to as: CPU), it can also be other general processors, digital signal processor (English: Digital Signal Processor, abbreviation: DSP), specific integrated circuit (English: Application Specific Integrated Circuit, referred to as: ASIC) etc..General processor can be microprocessor or the processor It is also possible to any conventional processor etc..Hardware handles can be embodied directly in conjunction with the step of method disclosed in the present application Device executes completion, or in processor hardware and software module combination execute completion.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A voice interaction method, applied to a terminal, characterized by comprising:

receiving a wake-up state message sent by a peripheral end and a first audio that woke up the peripheral end, the wake-up state message being used to indicate that the peripheral end is in a wake-up state;

if it is determined that the first audio contains wake-up information of the peripheral end, receiving a second audio sent by the peripheral end and sending the second audio to a server, so that the server returns a first response instruction to the terminal according to the second audio;

receiving the first response instruction sent by the server, and controlling, according to the first response instruction, the peripheral end to execute a first response action corresponding to the first response instruction.
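The three terminal-side steps of claim 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: audio is modelled as plain text, and the names `WAKE_WORD`, `FakePeripheral`, `FakeServer`, and `terminal_handle_wake` are hypothetical and do not appear in the original.

```python
from dataclasses import dataclass, field

WAKE_WORD = "hello device"  # hypothetical wake word; the claims do not name one


def contains_wake_info(audio: str) -> bool:
    """Stand-in for the terminal's wake-up check (audio is modelled as text)."""
    return WAKE_WORD in audio.lower()


@dataclass
class FakePeripheral:
    """Hypothetical peripheral end that queues audio and records executed actions."""
    pending_audio: list
    executed: list = field(default_factory=list)

    def next_audio(self) -> str:
        return self.pending_audio.pop(0)

    def execute(self, action: str) -> None:
        self.executed.append(action)


class FakeServer:
    """Hypothetical server that maps a second audio to a response instruction."""

    def respond(self, audio: str) -> str:
        return f"play:{audio}"


def terminal_handle_wake(wake_state_msg: str, first_audio: str,
                         peripheral: FakePeripheral, server: FakeServer) -> bool:
    """Run the claimed steps once; return True if a response action was executed."""
    # Step 1: the wake-up state message and the first audio have been received.
    if wake_state_msg != "AWAKE":
        return False
    # The terminal confirms the wake-up information a second time.
    if not contains_wake_info(first_audio):
        return False
    # Step 2: receive the second audio and forward it to the server.
    second_audio = peripheral.next_audio()
    instruction = server.respond(second_audio)
    # Step 3: control the peripheral end according to the response instruction.
    peripheral.execute(instruction)
    return True
```

In this sketch a first audio without the wake word leaves the peripheral untouched, which mirrors the false-wake-up protection described in the abstract.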

2. The method according to claim 1, characterized in that, after it is determined that the first audio contains the wake-up information of the peripheral end, the method further comprises:

sending a first sound pickup instruction to the peripheral end, the first sound pickup instruction being used to instruct the peripheral end to start sound pickup.

3. The method according to claim 1, characterized in that, after controlling, according to the first response instruction, the peripheral end to execute the first response action corresponding to the first response instruction, the method further comprises:

if a third audio sent by the peripheral end is received within a first preset duration and the third audio contains the wake-up information of the peripheral end, sending a second sound pickup instruction to the peripheral end, the second sound pickup instruction being used to instruct the peripheral end to continue sound pickup.
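The follow-up condition of claim 3 combines a time window with a wake-up check. A minimal sketch, assuming timestamps in seconds, audio modelled as text, and a hypothetical wake word and function name (`follow_up_decision`), none of which come from the original:

```python
from typing import Optional


def follow_up_decision(response_done_at: float,
                       third_audio_at: Optional[float],
                       third_audio_text: str,
                       first_preset_duration: float,
                       wake_word: str = "hello device") -> Optional[str]:
    """Return the message to send to the peripheral end, or None if there is none."""
    if third_audio_at is None:
        return None  # no third audio arrived at all
    elapsed = third_audio_at - response_done_at
    in_window = 0 <= elapsed <= first_preset_duration
    # Both conditions of the claim must hold: in time, and containing wake-up information.
    if in_window and wake_word in third_audio_text.lower():
        return "SECOND_PICKUP_INSTRUCTION"
    return None
```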

4. The method according to claim 1, characterized in that, after receiving the wake-up state message sent by the peripheral end and the first audio that woke up the peripheral end, the method further comprises:

if it is determined that the first audio does not contain the wake-up information of the peripheral end, sending dormancy information to the peripheral end, the dormancy information being used to instruct the peripheral end to enter a dormant state.
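Claims 2 and 4 describe the two outcomes of the terminal's second check of the first audio. The branch can be sketched as below; the enum `TerminalReply`, the function `confirm_wake`, and the text-based wake-word match are illustrative assumptions rather than anything specified in the original.

```python
from enum import Enum


class TerminalReply(Enum):
    """Messages the terminal may send to the peripheral end after the check."""
    START_PICKUP = "first sound pickup instruction"  # claim 2
    DORMANCY = "dormancy information"                # claim 4


def confirm_wake(first_audio_text: str, wake_word: str = "hello device") -> TerminalReply:
    """Re-check the first audio on the terminal and choose the reply.

    The wake word and the substring match are placeholders for whatever
    wake-up detection the terminal actually uses.
    """
    if wake_word in first_audio_text.lower():
        return TerminalReply.START_PICKUP
    return TerminalReply.DORMANCY
```

Sending the dormancy information on a failed check is what returns a falsely woken peripheral end to its dormant state.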

5. The method according to any one of claims 1-4, characterized in that, after sending the second audio to the server, the method further comprises:

receiving a stop-sending message sent by the server, the stop-sending message being used to instruct the terminal to stop sending audio to the server, wherein the stop-sending message is sent by the server when it has not received a fourth audio from the terminal within a second preset duration after receiving the second audio;

sending a stop-pickup message to the peripheral end, the stop-pickup message being used to instruct the peripheral end to stop sound pickup.
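The timeout in claim 5 involves two parties: the server decides when to send the stop-sending message, and the terminal relays a stop-pickup message to the peripheral end. A rough sketch under assumptions (timestamps in seconds; `server_should_stop` and `Terminal` are hypothetical names, and a real server would likely track this with a timer rather than a pure function):

```python
from typing import Optional


def server_should_stop(second_audio_at: float,
                       fourth_audio_at: Optional[float],
                       now: float,
                       second_preset_duration: float) -> bool:
    """True when the server should send the stop-sending message."""
    deadline = second_audio_at + second_preset_duration
    # A fourth audio that arrived before the deadline keeps the session open.
    if fourth_audio_at is not None and fourth_audio_at <= deadline:
        return False
    return now >= deadline


class Terminal:
    """Hypothetical terminal that relays the stop to the peripheral end."""

    def __init__(self) -> None:
        self.uploading = True
        self.sent_to_peripheral = []

    def on_stop_sending(self) -> None:
        self.uploading = False                          # stop sending audio to the server
        self.sent_to_peripheral.append("STOP_PICKUP")   # tell the peripheral end to stop pickup
```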

6. The method according to claim 2, characterized in that, after sending the second sound pickup instruction to the peripheral end, the method further comprises:

if a fourth audio sent by the peripheral end is not received within a third preset duration, entering an idle state and sending an idle message to the peripheral end.

7. A voice interaction method, applied to a peripheral end, characterized by comprising:

sending, to a terminal, a wake-up state message and a first audio that woke up the peripheral end, the wake-up state message being used to notify the terminal that the peripheral end is in a wake-up state;

sending a second audio to the terminal, so that the terminal sends the second audio to a server and the server returns a first response instruction to the terminal according to the second audio, the second audio being sent after the terminal determines that the first audio contains wake-up information of the peripheral end;

executing a first response action corresponding to the first response instruction.
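Seen from the peripheral end, claim 7 amounts to three outgoing or local actions. The sketch below is illustrative only: the class `PeripheralEnd`, the message tuples, and the `confirmed` flag (which borrows the first sound pickup instruction of claim 8 as the signal that the terminal has confirmed the wake-up) are assumptions, not names from the original.

```python
class PeripheralEnd:
    """Hypothetical peripheral end; `send_to_terminal` is any callable taking a message."""

    def __init__(self, send_to_terminal) -> None:
        self.send = send_to_terminal
        self.confirmed = False  # set once the terminal has confirmed the wake-up
        self.actions = []

    def on_wake_word(self, first_audio: str) -> None:
        # Step 1: report the wake-up state together with the audio that caused it.
        self.send(("WAKE_STATE", "AWAKE"))
        self.send(("FIRST_AUDIO", first_audio))

    def on_first_pickup_instruction(self) -> None:
        self.confirmed = True

    def on_user_speech(self, second_audio: str) -> bool:
        # Step 2: the second audio is only sent after the terminal's confirmation.
        if not self.confirmed:
            return False
        self.send(("SECOND_AUDIO", second_audio))
        return True

    def on_response_instruction(self, instruction: str) -> None:
        # Step 3: execute the response action that the instruction names.
        self.actions.append(instruction)
```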

8. The method according to claim 7, characterized in that, before sending the second audio to the terminal, the method further comprises:

receiving a first sound pickup instruction sent by the terminal, the first sound pickup instruction being used to instruct the peripheral end to start sound pickup, wherein the first sound pickup instruction is sent by the terminal when it determines that the first audio contains the wake-up information corresponding to the peripheral end.

9. The method according to claim 7, characterized in that, after executing the first response action corresponding to the first response instruction, the method further comprises:

sending a third audio to the terminal within a first preset duration;

receiving a second sound pickup instruction sent by the terminal, the second sound pickup instruction being used to instruct the peripheral end to continue sound pickup, wherein the second sound pickup instruction is sent by the terminal when it determines that the third audio contains the wake-up information corresponding to the peripheral end.

10. The method according to claim 7, characterized in that the method further comprises:

receiving dormancy information sent by the terminal and entering a dormant state, the dormancy information being sent by the terminal when it determines that the first audio does not contain the wake-up information of the peripheral end.

11. The method according to claim 7, characterized in that, before sending the wake-up state message and the first audio that woke up the peripheral end to the terminal, the method further comprises:

collecting a first audio of a user and entering the wake-up state, wherein the peripheral end determines that the first audio contains the wake-up information of the peripheral end.

12. The method according to any one of claims 7-11, characterized in that, after sending the second audio to the terminal, the method further comprises:

receiving a stop-pickup message sent by the terminal;

stopping sound pickup.

13. The method according to claim 12, characterized in that, after stopping sound pickup, the method further comprises:

receiving an idle message sent by the terminal;

if no third audio is collected within the first preset duration, entering a dormant state.
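Taken together, claims 8 and 10-13 suggest a small state machine on the peripheral end. The table below is one possible reading, offered as a sketch: the state names, event names, and the exact placement of each transition are assumptions made for illustration and are not spelled out in the claims.

```python
from enum import Enum, auto


class State(Enum):
    DORMANT = auto()
    AWAKE = auto()
    PICKUP = auto()
    STOPPED = auto()
    IDLE = auto()


# (current state, event) -> next state; comments name the claim each row is read from.
TRANSITIONS = {
    (State.DORMANT, "wake_word_detected"): State.AWAKE,            # claim 11
    (State.AWAKE, "dormancy_information"): State.DORMANT,          # claim 10
    (State.AWAKE, "first_pickup_instruction"): State.PICKUP,       # claim 8
    (State.PICKUP, "stop_pickup_message"): State.STOPPED,          # claim 12
    (State.STOPPED, "idle_message"): State.IDLE,                   # claim 13
    (State.IDLE, "first_preset_duration_elapsed"): State.DORMANT,  # claim 13
}


def step(state: State, event: str) -> State:
    """Apply one event; events that do not apply in the current state are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, start: State = State.DORMANT) -> State:
    """Fold a sequence of events over the transition table."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

Ignoring events that do not match the current state is a design choice of this sketch; an implementation could equally log or reject them.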

14. A voice interaction device, characterized by comprising:

a receiving module, configured to receive a wake-up state message sent by a peripheral end and a first audio that woke up the peripheral end, the wake-up state message being used to indicate that the peripheral end is in a wake-up state;

a second audio processing module, configured to, if it is determined that the first audio contains wake-up information of the peripheral end, receive a second audio sent by the peripheral end and send the second audio to a server, so that the server returns a first response instruction to the voice interaction device according to the second audio;

a first response instruction processing module, configured to receive the first response instruction sent by the server and to control, according to the first response instruction, the peripheral end to execute a first response action corresponding to the first response instruction.

15. A voice interaction device, characterized by comprising:

a wake-up state sending module, configured to send, to a terminal, a wake-up state message and a first audio that woke up the voice interaction device, the wake-up state message being used to notify the terminal that the voice interaction device is in a wake-up state;

a second audio sending module, configured to send a second audio to the terminal, so that the terminal sends the second audio to a server and the server returns a first response instruction to the terminal according to the second audio, the second audio being sent after the terminal determines that the first audio contains wake-up information of the voice interaction device;

16. A terminal, characterized by comprising: at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the terminal performs the method according to any one of claims 1-6.

17. A peripheral end, characterized by comprising: at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the peripheral end performs the method according to any one of claims 7-13.

18. A computer-readable storage medium, characterized in that computer-executable instructions are stored on the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the method according to any one of claims 1-6 is implemented.

19. A computer-readable storage medium, characterized in that computer-executable instructions are stored on the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the method according to any one of claims 7-13 is implemented.