CN109599111A

CN109599111A - Voice interaction method, device and storage medium

Info

Publication number: CN109599111A
Application number: CN201910000688.6A
Authority: CN
Inventors: 陈果果; 牛飞; 王芃; 邢仁泰; 张涛
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Shanghai Xiaodu Technology Co Ltd
Priority date: 2019-01-02
Filing date: 2019-01-02
Publication date: 2019-04-09

Abstract

The present invention provides a voice interaction method, device, and storage medium. The method comprises: receiving first audio sent by a peripheral device, where the terminal enters an awake state if the first audio contains the wake-up word corresponding to the terminal; receiving second audio sent by the peripheral device and sending the second audio to a server, so that the server returns response audio to the terminal according to the second audio; and receiving the response audio sent by the server and playing the response audio. In the present invention, the peripheral device picks up the sound and the terminal interacts with the server to obtain the response audio, which enriches the interactive functions between the peripheral device and the terminal and improves user experience.

Description

Voice interaction method, device and storage medium

Technical field

The present invention relates to the technical field of voice interaction, and in particular to a voice interaction method, device, and storage medium.

Background art

Bluetooth is a wireless technology standard that enables short-range data exchange among fixed devices, mobile devices, and building personal area networks. After a terminal connects to a Bluetooth device, the terminal can operate the Bluetooth device according to its type; for example, when the Bluetooth device is a Bluetooth speaker, the terminal can play music through it.

In the prior art, the interactive functions between a terminal and a Bluetooth device are limited, which is out of step with the current trend toward intelligent devices and results in a poor user experience.

Summary of the invention

The present invention provides a voice interaction method, device, and storage medium in which a peripheral device picks up sound so that the terminal can interact with a server, which enriches the interactive functions between the peripheral device and the terminal and improves user experience.

A first aspect of the present invention provides a voice interaction method, applied to a terminal, comprising:

Receiving first audio sent by the peripheral device, where the terminal enters an awake state if the first audio contains the wake-up word corresponding to the terminal;

Receiving second audio sent by the peripheral device, and sending the second audio to a server, so that the server returns response audio to the terminal according to the second audio;

Receiving the response audio sent by the server, and playing the response audio.

Optionally, after the terminal sends the second audio to the server, the method further includes:

Receiving a stop-sending message sent by the server, where the stop-sending message instructs the terminal to stop sending picked-up audio to the server, and the server sends the stop-sending message when it has not received third audio from the terminal within a first preset duration after receiving the second audio.

Optionally, the peripheral device is a peripheral device that picks up sound continuously;

After receiving the second audio sent by the peripheral device and sending the second audio to the server, the method further includes:

Receiving third audio sent by the peripheral device, and storing the third audio in the terminal, where the difference between the receiving time of the third audio and the receiving time of the second audio is greater than the first preset duration.

Optionally, after playing the response audio, the method further includes:

If no new audio sent by the peripheral device is received within a second preset duration, entering a dormant state and sending a sleep message to the peripheral device, where the sleep message instructs the peripheral device to end sound pickup.

Optionally, before receiving the first audio sent by the peripheral device, the method further includes:

Sending a sound pickup instruction to the peripheral device, where the sound pickup instruction instructs the peripheral device to start picking up sound.

A second aspect of the present invention provides a voice interaction method, applied to a peripheral device, comprising:

Sending first audio to the terminal, where the terminal enters an awake state if the first audio contains the wake-up word corresponding to the terminal;

Sending second audio to the terminal, so that the terminal sends the second audio to a server and the server returns response audio to the terminal according to the second audio.

Optionally, after sending the second audio to the terminal, the method further includes:

Sending third audio to the terminal, where the difference between the sending time of the third audio and the sending time of the second audio is greater than the first preset duration.

Optionally, the method further includes: receiving a sleep message sent by the terminal;

Ending sound pickup.

Optionally, before sending the first audio to the terminal, the method further includes:

Receiving a sound pickup instruction sent by the terminal, and starting to pick up sound according to the sound pickup instruction.

A third aspect of the present invention provides a voice interaction device, comprising:

A first audio sending module, configured to receive first audio sent by the peripheral device, where the voice interaction device enters an awake state if the first audio contains the wake-up word corresponding to the voice interaction device;

A second audio sending module, configured to receive second audio sent by the peripheral device and send the second audio to a server, so that the server returns response audio to the voice interaction device according to the second audio;

A playing module, configured to receive the response audio sent by the server and play the response audio.

Optionally, the device further includes: a stop-sending message receiving module;

The stop-sending message receiving module is configured to receive a stop-sending message sent by the server, where the stop-sending message instructs the voice interaction device to stop sending picked-up audio to the server, and the server sends the stop-sending message when it has not received third audio from the voice interaction device within a first preset duration after receiving the second audio.

Optionally, the peripheral device is a peripheral device that picks up sound continuously.

Optionally, the device further includes: a third audio receiving module;

The third audio receiving module is configured to receive third audio sent by the peripheral device and store the third audio in the voice interaction device, where the difference between the receiving time of the third audio and the receiving time of the second audio is greater than the first preset duration.

Optionally, the device further includes: a sleep module;

The sleep module is configured to enter a dormant state if no new picked-up audio sent by the peripheral device is received within a second preset duration, and to send a sleep message to the peripheral device, where the sleep message instructs the peripheral device to end sound pickup.

Optionally, the device further includes: a sound pickup instruction sending module;

The sound pickup instruction sending module is configured to send a sound pickup instruction to the peripheral device, where the sound pickup instruction instructs the peripheral device to start picking up sound.

A fourth aspect of the present invention provides a voice interaction device, comprising:

A first audio sending module, configured to send first audio to the terminal, where the terminal enters an awake state if the first audio contains the wake-up word corresponding to the terminal;

A second audio sending module, configured to send second audio to the terminal, so that the terminal sends the second audio to a server and the server returns response audio to the terminal according to the second audio.

Optionally, the device further includes: a third audio sending module;

The third audio sending module is configured to send third audio to the terminal, where the difference between the sending time of the third audio and the sending time of the second audio is greater than the first preset duration.

Optionally, the device further includes: a pickup ending module;

The pickup ending module is configured to receive a sleep message sent by the terminal and to end sound pickup.

Optionally, the device further includes: a sound pickup instruction receiving module;

The sound pickup instruction receiving module is configured to receive a sound pickup instruction sent by the terminal and to start picking up sound according to the sound pickup instruction.

A fifth aspect of the present invention provides a terminal, comprising: at least one processor and a memory;

The memory stores computer-executable instructions;

The at least one processor executes the computer-executable instructions stored in the memory, so that the terminal performs the voice interaction method of the first aspect.

A sixth aspect of the present invention provides a peripheral device, comprising: at least one processor and a memory;

The memory stores computer-executable instructions;

The at least one processor executes the computer-executable instructions stored in the memory, so that the peripheral device performs the voice interaction method of the second aspect.

A seventh aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the voice interaction method of the first aspect.

An eighth aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the voice interaction method of the second aspect.

The present invention provides a voice interaction method, device, and storage medium. The method comprises: receiving first audio sent by a peripheral device, where the terminal enters an awake state if the first audio contains the wake-up word corresponding to the terminal; receiving second audio sent by the peripheral device and sending the second audio to a server, so that the server returns response audio to the terminal according to the second audio; and receiving the response audio sent by the server and playing the response audio. In the present invention, the peripheral device picks up the sound so that the terminal can interact with the server, which enriches the interactive functions between the peripheral device and the terminal and improves user experience.

Brief description of the drawings

Fig. 1 is a schematic diagram of a scenario to which the voice interaction method provided by the present invention applies;

Fig. 2 is a first flowchart of the voice interaction method provided by the present invention;

Fig. 3 is a second flowchart of the voice interaction method provided by the present invention;

Fig. 4 is a schematic diagram of the terminal interface provided by the present invention;

Fig. 5 is a third flowchart of the voice interaction method provided by the present invention;

Fig. 6 is a first structural diagram of a voice interaction device provided by the present invention;

Fig. 7 is a second structural diagram of a voice interaction device provided by the present invention;

Fig. 8 is a third structural diagram of a voice interaction device provided by the present invention;

Fig. 9 is a first structural diagram of another voice interaction device provided by the present invention;

Fig. 10 is a second structural diagram of another voice interaction device provided by the present invention;

Fig. 11 is a third structural diagram of another voice interaction device provided by the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

Bluetooth peripheral devices in the prior art come in many kinds, such as Bluetooth headsets, Bluetooth speakers, Bluetooth keyboards, and sports wristbands. Before use, such a peripheral device must establish a Bluetooth connection with a terminal. For example, a Bluetooth speaker establishes a Bluetooth connection with a terminal as follows: long-press the power key of the Bluetooth speaker to turn it on, search for the name of the Bluetooth speaker on the terminal, and enter the pairing password; the Bluetooth connection is then established.

Once the connection is established, the terminal can play songs or other audio from the terminal through the Bluetooth speaker. The audio may be stored in a local folder of the terminal, or may be live audio that the terminal obtains by interacting with a server. The terminal sends the file to be played to the Bluetooth speaker, and the Bluetooth speaker plays the audio corresponding to the file.

However, the interactive function between a terminal and a Bluetooth peripheral device in the prior art is too limited: the peripheral device can only play audio passively under the control of the terminal and cannot interact with the user, so the user experience is poor. Moreover, devices in the prior art that can interact with the user are smart devices, which must be able to connect to a server themselves, so the deployment cost of such a peripheral device is high.

To solve the problem that the interactive function between the terminal and the Bluetooth peripheral device is too limited, and to reduce the deployment cost of the Bluetooth peripheral device while enriching the interaction between the two, the present invention provides a voice interaction method. Fig. 1 is a schematic diagram of a scenario to which the voice interaction method provided by the present invention applies. As shown in Fig. 1, the scenario includes a peripheral device, a terminal, and a server.

The peripheral device can establish a Bluetooth connection with the terminal. Specifically, the Bluetooth connection may be prior-art data communication based on classic Bluetooth, in which the system settings interface of the terminal guides the user to select a designated device and complete pairing. Alternatively, the terminal can establish a DMA (DuerOS Mobile Accessories) connection with the peripheral device. For example, when the terminal is to establish a DMA connection with the peripheral device, scanning, pairing, and connection can be completed directly in the interface of an application on the terminal, without returning to the system settings interface for configuration and then going back to the application interface to complete the connection. Correspondingly, in this embodiment, when an ordinary Bluetooth connection is established, the peripheral device is an ordinary Bluetooth device; when the terminal establishes a DMA connection with it, the peripheral device is a DMA device, that is, a device that supports the DMA Bluetooth protocol. When the terminal and the peripheral device establish an ordinary Bluetooth connection, the prior-art Bluetooth connection procedure applies; the process by which the terminal and the peripheral device establish a DMA connection is described in the following embodiments.

In the present invention, the terminal and the server may be connected wirelessly or by wire. The terminal may be a mobile device such as a mobile phone, a personal digital assistant (PDA), a tablet computer, or a portable device (for example, a portable computer, pocket computer, or handheld computer); it may also be a fixed device such as a desktop computer.

The voice interaction method provided by the present invention is described below from the perspective of the interaction between the terminal and the server. Fig. 2 is a first flowchart of the voice interaction method provided by the present invention. As shown in Fig. 2, the voice interaction method provided by this embodiment may include:

S201, the terminal receives first audio sent by the peripheral device; if the first audio contains the wake-up word corresponding to the terminal, the terminal enters an awake state.

In this embodiment, the terminal parses each piece of first audio received from the peripheral device. Specifically, the parsing may consist of converting the received first audio into text.

The terminal determines whether the received first audio contains a preset wake-up word. The wake-up word is used to wake up the terminal, specifically to wake up the interaction between the terminal and the server. Correspondingly, the terminal determines either whether the text corresponding to the first audio contains the wake-up word, or whether the audio data of the first audio contains the audio data corresponding to the wake-up word. When the terminal determines that the first audio contains the wake-up word, it enters the awake state, in which it sends to the server the audio picked up after the first audio that carries the wake-up word.

For example, if the wake-up word is "Xiaodu" (小度), then when the text corresponding to the first audio contains "Xiaodu", or the audio data of the first audio contains the audio data corresponding to "Xiaodu", the first audio is determined to carry the wake-up word and the terminal enters the awake state.

In this embodiment, to save terminal memory, a fixed duration may be preset in the terminal. After the terminal sends a sound pickup instruction to the peripheral device and the peripheral device starts picking up sound, if the terminal does not receive audio carrying the wake-up word within the fixed duration, it may disconnect from the peripheral device, or send a stop-pickup instruction to the peripheral device to instruct it to stop picking up sound. When pickup by the peripheral device is actually needed, the terminal can send a sound pickup instruction to the peripheral device again.
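The fixed-duration check can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the action labels, and the 30-second value are hypothetical and do not come from the patent.

```python
def next_action(elapsed_s: float, fixed_duration_s: float, wake_word_seen: bool) -> str:
    """Decide what the terminal does while waiting for the wake-up word.

    The returned labels are illustrative names, not protocol messages.
    """
    if wake_word_seen:
        return "enter_awake_state"
    if elapsed_s >= fixed_duration_s:
        # The text allows either disconnecting or sending a stop-pickup instruction.
        return "stop_pickup_or_disconnect"
    return "keep_listening"


# Hypothetical fixed duration of 30 seconds.
print(next_action(5.0, 30.0, wake_word_seen=False))
print(next_action(31.0, 30.0, wake_word_seen=False))
```

Passing the elapsed time in as an argument keeps the decision free of clock calls, so it can be checked without waiting in real time.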

S202, the terminal receives second audio sent by the peripheral device and sends the second audio to the server.

After the terminal enters the awake state, because the peripheral device keeps picking up sound, the terminal can receive the second audio sent by the peripheral device. In this embodiment, audio received after the terminal enters the awake state is treated as valid audio. After receiving the valid audio, that is, the second audio, the terminal may send it to the server in order to obtain the corresponding response audio.

For example, after the user says "Xiaodu", the peripheral device sends that audio to the terminal and the terminal enters the awake state. The user then continues with "How is the weather in Beijing today", and the peripheral device sends that audio to the terminal. The terminal in turn sends the second audio, "How is the weather in Beijing today", to the server to obtain the response data for the second audio.

S203, the server returns response audio to the terminal according to the second audio.

In this embodiment, the server may parse the second audio after receiving it. Specifically, the server may parse the second audio as follows: convert the second audio into text and segment the text to obtain the words it contains; obtain target words according to the part of speech of each word; and then obtain the response audio corresponding to the second audio according to the semantics of the target words.

In this embodiment, a tokenizer, such as a Neuro-Linguistic Programming (NLP) tool, may be used to segment the text corresponding to the second audio into words. For example, if the text corresponding to the second audio is "How is the weather in Beijing today", the tokenizer splits the text into several words; the words after segmentation may be "today", "Beijing", the possessive particle "的", "weather", and "how".

In this embodiment, optionally, the target words carrying the effective information may be obtained according to the parts of speech of the segmented words: for example, measure words, adverbs, and adjectives are removed from the segmented message, and words such as nouns and verbs are kept. Removing "how" and the particle "的" from the segmentation result above leaves the target words "today", "Beijing", and "weather". From these target words the server determines that the user is asking about today's weather in Beijing, and it can return response audio about it, such as "Beijing is sunny today, with a temperature of 20 degrees".
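The part-of-speech filter can be sketched as follows. The tokenizer itself is not shown: the words are tagged by hand, and the tag names and the `KEEP_TAGS` set are assumptions made for illustration, since the patent does not name a tag set or a tool.

```python
# Hypothetical part-of-speech labels; a real tokenizer would supply its own tag set.
KEEP_TAGS = {"noun", "verb"}


def target_words(tagged_words):
    """Keep only the words whose part of speech is in KEEP_TAGS."""
    return [word for word, tag in tagged_words if tag in KEEP_TAGS]


# Hand-tagged segmentation of the weather question from the example.
tagged = [
    ("today", "noun"),
    ("Beijing", "noun"),
    ("的", "particle"),  # possessive particle, dropped by the filter
    ("weather", "noun"),
    ("how", "pronoun"),
]
print(target_words(tagged))  # ['today', 'Beijing', 'weather']
```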

Note that when the text corresponding to the second audio contains multiple sentences, the server may first split the text into clauses, segment each clause into words, obtain the response audio for each clause according to the semantics of the target words in that clause, and send the multiple response audios corresponding to the second audio to the terminal in the order in which the clauses appear in the text.

For example, if the text corresponding to the user's second audio is "What is fun to do in Beijing? Where is good value to stay?", the server splits the text into two clauses, "What is fun to do in Beijing" and "Where is good value to stay". It then obtains the target words of each clause, such as "Beijing", "fun", "places" and "accommodation", "good value", and obtains the response audio for each clause, such as "Fun places in Beijing include the Forbidden City, the Great Wall, ..." and "For accommodation in Beijing, you can choose the xx hotel".
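Clause splitting with order preserved can be sketched with a regular expression. The punctuation set and both function names are assumptions made for illustration; the patent does not say how the server detects clause boundaries.

```python
import re


def split_clauses(text: str) -> list[str]:
    """Split on sentence-ending punctuation, keeping the original clause order."""
    parts = re.split(r"[?？!！。.;；]+", text)
    return [p.strip() for p in parts if p.strip()]


def answer_in_order(text: str, answer) -> list[str]:
    """Produce one response per clause, in the order the clauses appear."""
    return [answer(clause) for clause in split_clauses(text)]


question = "What is fun to do in Beijing? Where is good value to stay?"
print(split_clauses(question))
```

Here `answer` stands in for whatever the server does to turn one clause into a response; because the list comprehension walks the clauses in order, the responses come out in the order the clauses appear.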

S204, the terminal receives the response audio sent by the server and plays the response audio.

In this embodiment, after the terminal obtains the response audio corresponding to the second audio, it plays the response audio automatically. When the response audio consists of audio for multiple clauses, the terminal plays the response audios one by one in the order in which they were received.

In this embodiment, to save the terminal's battery, or when the user is using the terminal and it is inconvenient for the terminal to play the response audio, the terminal may send the response audio to the peripheral device after receiving it, and the peripheral device plays it. In this implementation, the peripheral device may be one with an audio playback function, such as a Bluetooth speaker or a sports wristband.

In this embodiment the peripheral device picks up the sound, in contrast to the prior-art approach in which the terminal interacts directly with the server to obtain the response audio. On the one hand, the terminal's own pickup is limited, and at some distance from the user it may pick up sound inaccurately or poorly; a peripheral device such as a vehicle-mounted bracket with a microphone gives better pickup. On the other hand, the interaction between the terminal and the Bluetooth device becomes more diverse, which improves user experience.

This embodiment provides a voice interaction method, device, and storage medium. The method includes: sending a sound pickup instruction to the peripheral device, where the sound pickup instruction instructs the peripheral device to start picking up sound; receiving first audio sent by the peripheral device, where the terminal enters an awake state if the first audio contains the wake-up word corresponding to the terminal; receiving second audio sent by the peripheral device and sending the second audio to the server, so that the server returns response audio to the terminal according to the second audio; and receiving the response audio sent by the server and playing the response audio. In this embodiment, the peripheral device picks up the sound so that the terminal can interact with the server, which enriches the interactive functions between the peripheral device and the terminal and improves user experience.
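The terminal-side flow of steps S201 to S204 can be sketched end to end. This is a simplified illustration, not the patented implementation: audio is represented by its transcript, `StubServer` is a hypothetical stand-in for the real server, and the class and method names are assumptions.

```python
class StubServer:
    """Stand-in for the server; a real one would recognize speech and return audio."""

    def respond(self, second_audio: str) -> str:
        return f"response to: {second_audio}"


class Terminal:
    def __init__(self, server, wake_word: str):
        self.server = server
        self.wake_word = wake_word
        self.awake = False
        self.played = []  # response audio played so far, in order

    def on_audio(self, transcript: str):
        """Handle one piece of audio from the peripheral device."""
        if not self.awake:
            # S201: before waking, only check for the wake-up word.
            if self.wake_word in transcript:
                self.awake = True
            return None
        # S202 to S204: forward the audio, then play what the server returns.
        response = self.server.respond(transcript)
        self.played.append(response)
        return response


terminal = Terminal(StubServer(), wake_word="小度")
terminal.on_audio("小度")
print(terminal.on_audio("How is the weather in Beijing today"))
```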

The voice interaction method provided by the present invention is further described below with reference to Fig. 3. Fig. 3 is a second flowchart of the voice interaction method provided by the present invention. As shown in Fig. 3, the voice interaction method provided by this embodiment may include:

S301, the terminal sends a sound pickup instruction to the peripheral device, where the sound pickup instruction instructs the peripheral device to start picking up sound.

In this embodiment, before sending the sound pickup instruction to the peripheral device, the terminal needs to establish a DMA connection with the peripheral device.

In the prior art, a Bluetooth connection between a terminal and a peripheral Bluetooth device is established as follows. The terminal uses the existing Bluetooth scanning mode, that is, Bluetooth Low Energy (BLE) scanning, to find connectable Bluetooth devices, and first establishes a BLE connection with the Bluetooth device. After that connection is established, the Bluetooth device returns a response message to the terminal, indicating that the terminal can connect to the Bluetooth device over an RFCOMM link that supports the RFCOMM protocol. After receiving the response message, the terminal disconnects the BLE connection and then connects to the Bluetooth device over the RFCOMM link. With this prior-art approach, the success rate and speed of the RFCOMM connection are affected when the BLE link is in a normal state.

The peripheral device in this embodiment supports the DMA protocol. The way the terminal and the peripheral device establish a DMA connection is briefly as follows: while the terminal is scanning, a DMA peripheral device that supports the DMA protocol sends a broadcast packet to the terminal; the broadcast packet contains identification information indicating that the peripheral device supports DMA connections; the terminal then connects to the peripheral device directly over the RFCOMM link. This solves the prior-art problem that the success rate and speed of the RFCOMM connection are affected when the BLE link is in a normal state.
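The difference between the two connection paths can be sketched as a choice made from the broadcast packet. The `BroadcastPacket` type, its `supports_dma` field, and the step labels are hypothetical; the patent does not describe the packet layout, and no real Bluetooth API is called here.

```python
from dataclasses import dataclass


@dataclass
class BroadcastPacket:
    device_name: str
    supports_dma: bool  # hypothetical stand-in for the DMA identification information


def connection_steps(packet: BroadcastPacket) -> list[str]:
    """Return the ordered connection steps the terminal would take."""
    if packet.supports_dma:
        # DMA path: connect over RFCOMM directly, with no BLE connection first.
        return ["rfcomm_connect"]
    # Prior-art path: BLE connect, receive the response, drop BLE, then RFCOMM.
    return ["ble_connect", "receive_response", "ble_disconnect", "rfcomm_connect"]


print(connection_steps(BroadcastPacket("peripheral A", supports_dma=True)))
```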

The peripheral device in this embodiment has a sound pickup function. Specifically, the peripheral device may be a vehicle-mounted bracket with a microphone (Mic), a Bluetooth speaker, a Bluetooth headset, a light-emitting diode (LED) lamp with a pickup function, or a similar device.

After the terminal establishes a Bluetooth connection or a DMA connection with the peripheral device, when the user has a voice interaction need, for example to check the weather or play a song, the user can operate on the terminal interface to trigger the terminal to send a sound pickup instruction to the peripheral device. Fig. 4 is a schematic diagram of the terminal interface provided by the present invention. As shown in Fig. 4, after establishing a connection with the peripheral device, the terminal may display on its interface the name of the peripheral device, such as "peripheral device A", together with a "Start pickup" control. The user selects the "Start pickup" control by tapping or another operation, which triggers the terminal to send the sound pickup instruction to the peripheral device; the instruction instructs the peripheral device to start picking up sound.

S302, the peripheral device receives the sound pickup instruction sent by the terminal and starts picking up sound according to the instruction.

In this embodiment, the peripheral device starts picking up sound after receiving the sound pickup instruction from the terminal.

Specifically, the peripheral device in this embodiment may be one that picks up sound continuously, such as a vehicle-mounted bracket with a microphone. When the bracket is a DMA device, because it belongs to the continuous-pickup category, it sends the audio picked up after receiving the sound pickup instruction to the terminal. Correspondingly, the terminal is provided with a file or directory for storing audio, which is dedicated to storing the audio sent by the peripheral device.

It is conceivable that if the peripheral device starts picking up sound after the terminal triggers the sound pickup instruction, the audio it obtains is the user's wake-up audio. For example, the user starts speaking after selecting the "Start pickup" control; if the wake-up word for interaction between the terminal and the server is "Xiaodu", the user can say "Xiaodu" to wake up the interaction between the terminal and the server.

However, if the user fails to say the wake-up word in time during pickup because of other matters, the audio obtained by the peripheral device may be the sound of cars in the surroundings or the user's conversation with other people. In this case, because the terminal has not received audio carrying the wake-up word, the interaction between the terminal and the server is not woken up, but the peripheral device still picks up sound and sends it to the terminal.

S303, the terminal receives the first audio sent by the peripheral hardware end; if the first audio includes the wake-up word corresponding to the terminal, the terminal enters the wake-up state.

S304, the terminal receives the second audio sent by the peripheral hardware end, and sends the second audio to the server.

S305, the terminal receives a stop-sending message sent by the server; the stop-sending message is used to instruct the terminal to stop sending audio to the server.

In the present embodiment, a first preset duration is set in the server. After receiving the second audio sent by the terminal, if the server does not receive a third audio from the terminal within the first preset duration, it determines that the user has finished speaking, obtains the corresponding response audio according to the second audio, and sends the stop-sending message to the terminal.

Specifically, after receiving the stop-sending message sent by the server, the terminal no longer sends new audio to the server. It is conceivable that, if the peripheral hardware end in the present embodiment performs continuous radio reception, it can still obtain a third audio after the first preset duration has elapsed since it sent the second audio to the terminal, and it continues to send this third audio to the terminal; after receiving the third audio, the terminal does not send it to the server for processing. If the peripheral hardware end in the present embodiment is one whose radio reception can be controlled, the terminal can, after receiving the stop-sending message from the server, send a stop-sending message to the peripheral hardware end so that the peripheral hardware end stops radio reception, which reduces the power consumption of the peripheral hardware end.
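Illustratively, the server-side use of the first preset duration can be sketched as follows. This is only a sketch under assumptions: the class `ServerSession`, the message name `STOP_SENDING`, the 1.5-second value and the polling style are all hypothetical, and timestamps are passed in explicitly so that the logic can be followed without a real clock.

```python
from typing import List, Optional

FIRST_PRESET_DURATION = 1.5    # seconds; hypothetical value
STOP_SENDING = "STOP_SENDING"  # hypothetical name for the stop-sending message


class ServerSession:
    """Decides that the user has finished speaking when no new audio arrives in time."""

    def __init__(self, first_preset_duration: float = FIRST_PRESET_DURATION) -> None:
        self.first_preset_duration = first_preset_duration
        self.audio: List[bytes] = []
        self.last_received: Optional[float] = None
        self.stopped = False

    def on_audio(self, chunk: bytes, now: float) -> None:
        # Audio that arrives after the stop-sending message is not processed.
        if self.stopped:
            return
        self.audio.append(chunk)
        self.last_received = now

    def poll(self, now: float) -> Optional[str]:
        # Returns the stop-sending message once, when the silence gap is reached.
        if self.stopped or self.last_received is None:
            return None
        if now - self.last_received >= self.first_preset_duration:
            self.stopped = True
            return STOP_SENDING
        return None
```

After `poll` returns the stop-sending message, the server in this sketch would go on to derive the response audio from the collected `audio`; that step is omitted here because the source does not describe how the response is computed.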

S306, peripheral hardware end send third audio to terminal.

Peripheral hardware end in the present embodiment is the peripheral hardware end for continuing radio reception, specifically, it is in the radio reception instruction for receiving terminal Afterwards, start radio reception；Even if after the stopping that terminal receives server transmission sends message still third sound can be sent to terminal Frequently, wherein the difference of the receiving time of the receiving time of third audio and the second audio is greater than the first preset duration.

It is envisioned that peripheral hardware end belongs to the peripheral hardware end of lasting radio reception, stop receiving as long as terminal is not sent to peripheral hardware end Audio can be sent to always terminal by the instruction of sound, peripheral hardware end, and in the present embodiment, third audio is being sent to terminal by peripheral hardware end Afterwards, also the 4th audio, fifth audio can be sent to terminal successively.

S307, terminal by third audio storage in the terminal.

In the present embodiment, the terminal is provided with a folder or directory for storing audio, and the terminal stores the received third audio in that folder or directory. Because the terminal has received the stop-sending message from the server, it no longer sends subsequently received audio to the server for processing.
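Illustratively, the routing of audio on the terminal before and after the stop-sending message can be sketched as follows. This is a sketch under assumptions rather than the implementation of the invention: `AudioRouter`, its method names and the file naming scheme are hypothetical, and a temporary directory stands in for the terminal's dedicated audio folder when none is given.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


class AudioRouter:
    """Forwards audio to the server until told to stop, then stores it locally."""

    def __init__(
        self,
        send_to_server: Callable[[bytes], None],
        store_dir: Optional[Path] = None,
    ) -> None:
        self.send_to_server = send_to_server
        # Assumption: a temporary directory stands in for the dedicated folder.
        self.store_dir = store_dir or Path(tempfile.mkdtemp())
        self.stop_received = False
        self._count = 0

    def on_stop_sending(self) -> None:
        # Called when the stop-sending message arrives from the server.
        self.stop_received = True

    def on_peripheral_audio(self, audio: bytes) -> Optional[Path]:
        if not self.stop_received:
            # Before the stop-sending message: forward (the "second audio").
            self.send_to_server(audio)
            return None
        # After the stop-sending message: keep the audio on the terminal only.
        self._count += 1
        path = self.store_dir / f"audio_{self._count:04d}.pcm"
        path.write_bytes(audio)
        return path
```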

It is conceivable that, in order to continue the interaction between the server and the terminal, the user can configure the terminal to send the third audio to the server, so that the server parses the third audio and returns the response audio corresponding to the third audio to the terminal.

S308, the server returns the response audio to the terminal according to the second audio.

S309, the terminal receives the response audio sent by the server, and plays the response audio.

For the specific implementation of S303-S304 and S308-S309 in the present embodiment, refer to the related description of S201-S202 and S203-S204 in the above embodiment; details are not repeated here.

In the present embodiment, the terminal establishes a DMA connection with the peripheral hardware end, which solves the prior-art problem that the BLE link, under normal conditions, affects the success rate and speed of establishing an RFCOMM connection. In addition, a first preset duration is set in the server in advance. If the terminal does not send a third audio to the server within the first preset duration after sending the second audio, the server sends the stop-sending message to the terminal, instructing the terminal not to send new audio to the server, and the server parses the second audio and returns the response audio. This enriches the interactive functions of the peripheral hardware end and the terminal and improves the user experience.

The voice interactive method provided by the invention is further described below with reference to Fig. 5. Fig. 5 is flow diagram three of the voice interactive method provided by the invention. As shown in Fig. 5, the voice interactive method provided in this embodiment may include:

S501, the terminal sends a radio reception instruction to the peripheral hardware end; the radio reception instruction is used to instruct the peripheral hardware end to start radio reception.

S502, the peripheral hardware end receives the radio reception instruction sent by the terminal, and starts radio reception according to the radio reception instruction.

S503, the terminal receives the first audio sent by the peripheral hardware end; if the first audio includes the wake-up word corresponding to the terminal, the terminal enters the wake-up state.

S504, the terminal receives the second audio sent by the peripheral hardware end, and sends the second audio to the server.

S505, the terminal receives a stop-sending message sent by the server; the stop-sending message is used to instruct the terminal to stop sending audio to the server.

S506, the server returns the response audio to the terminal according to the second audio.

S507, the terminal receives the response audio sent by the server, and plays the response audio.

S508, if the terminal does not receive new audio sent by the peripheral hardware end within a second preset duration, the terminal enters the dormant state and sends a sleep message to the peripheral hardware end.

In the present embodiment, a second preset duration is stored in the terminal. If the terminal does not receive new audio sent by the peripheral hardware end within the second preset duration after receiving the second audio, it determines that the user has no new voice interaction need, enters the dormant state, and also sends a sleep message to the peripheral hardware end. The sleep message is used to instruct the peripheral hardware end to end radio reception.

Here, the terminal entering the dormant state may mean that the terminal must again receive audio carrying the wake-up word before it can enter the wake-up state in which it interacts with the server. It is worth noting that the second preset duration in the present embodiment is greater than the time the server takes from receiving the second audio to returning the response audio.
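Illustratively, the terminal-side use of the second preset duration can be sketched as follows. This is a minimal sketch under assumptions: `TerminalSleepTimer`, the message name `SLEEP` and the 10-second value are hypothetical, and the value is only meant to be larger than the time the server needs to return the response audio.

```python
from typing import Callable, Optional

SECOND_PRESET_DURATION = 10.0  # seconds; hypothetical, longer than the server's response time
SLEEP = "SLEEP"                # hypothetical name for the sleep message


class TerminalSleepTimer:
    """Puts the terminal into the dormant state after a period without new audio."""

    def __init__(
        self,
        send_to_peripheral: Callable[[str], None],
        second_preset_duration: float = SECOND_PRESET_DURATION,
    ) -> None:
        self.send_to_peripheral = send_to_peripheral
        self.second_preset_duration = second_preset_duration
        self.awake = False
        self.last_audio: Optional[float] = None

    def wake(self, now: float) -> None:
        # Called when audio carrying the wake-up word has been recognized.
        self.awake = True
        self.last_audio = now

    def on_audio(self, now: float) -> None:
        # New audio from the peripheral hardware end restarts the countdown.
        if self.awake:
            self.last_audio = now

    def poll(self, now: float) -> bool:
        # Returns True exactly when the terminal transitions to the dormant state.
        if not self.awake or self.last_audio is None:
            return False
        if now - self.last_audio >= self.second_preset_duration:
            self.awake = False
            self.send_to_peripheral(SLEEP)
            return True
        return False
```

Once dormant, the timer in this sketch ignores further audio until `wake` is called again, which corresponds to the terminal needing a new wake-up word before it interacts with the server.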

S509, the peripheral hardware end ends radio reception.

After receiving the sleep message sent by the terminal, the peripheral hardware end determines that the user has no new voice interaction need, ends radio reception, and waits for the next radio reception instruction from the terminal.
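Illustratively, the behavior of the peripheral hardware end between a radio reception instruction and a sleep message can be sketched as follows. The class `Peripheral`, the message names and the `outbox` list that stands in for the link to the terminal are all hypothetical; this is a sketch under those assumptions only.

```python
from typing import List

START_RADIO_RECEPTION = "START_RADIO_RECEPTION"  # hypothetical instruction name
SLEEP = "SLEEP"                                  # hypothetical sleep message name


class Peripheral:
    """Sends microphone audio to the terminal only while radio reception is active."""

    def __init__(self) -> None:
        self.receiving = False
        self.outbox: List[bytes] = []  # stands in for the link to the terminal

    def on_message(self, message: str) -> None:
        if message == START_RADIO_RECEPTION:
            self.receiving = True
        elif message == SLEEP:
            # End radio reception and wait for the next instruction.
            self.receiving = False

    def on_microphone_chunk(self, chunk: bytes) -> None:
        # Chunks captured outside an active session are dropped.
        if self.receiving:
            self.outbox.append(chunk)
```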

For the specific implementation of S501-S505 and S506-S507 in the present embodiment, refer to the related description of S301-S305 and S308-S309 in the above embodiment; details are not repeated here.

In the present embodiment, if the terminal does not receive new audio sent by the peripheral hardware end within the second preset duration after receiving the second audio, it enters the dormant state and sends a sleep message to the peripheral hardware end, where the sleep message is used to instruct the peripheral hardware end to end radio reception. In this way, when the user has no new voice interaction need, the terminal enters the dormant state and the peripheral hardware end stops radio reception, which saves power at both the terminal and the peripheral hardware end.

Fig. 6 is structural schematic diagram one of a voice interaction device provided by the invention. As shown in Fig. 6, the voice interaction device is a terminal, and the voice interaction device includes: a first audio sending module 601, a second audio sending module 602 and a playing module 603.

The first audio sending module 601 is configured to receive the first audio sent by the peripheral hardware end; if the first audio includes the wake-up word corresponding to the voice interaction device, the voice interaction device enters the wake-up state.

The second audio sending module 602 is configured to receive the second audio sent by the peripheral hardware end and send the second audio to the server, so that the server returns the response audio to the voice interaction device according to the second audio.

The playing module 603 is configured to receive the response audio sent by the server and play the response audio.

The principle and technical effect of the voice interaction device provided in this embodiment are similar to those of the above voice interactive method, and details are not repeated here.

Optionally, Fig. 7 is structural schematic diagram two of a voice interaction device provided by the invention. As shown in Fig. 7, the voice interaction device 600 further includes: a stop-sending message receiving module 604, a third audio receiving module 605, a sleep module 606 and a radio reception instruction sending module 607.

The stop-sending message receiving module 604 is configured to receive the stop-sending message sent by the server. The stop-sending message is used to instruct the voice interaction device to stop sending audio to the server, and is sent by the server when it has not received a third audio from the voice interaction device within the first preset duration after receiving the second audio.

Optionally, the peripheral hardware end is a peripheral hardware end that performs continuous radio reception.

The third audio receiving module 605 is configured to receive the third audio sent by the peripheral hardware end and store the third audio in the voice interaction device, where the difference between the receiving time of the third audio and the receiving time of the second audio is greater than the first preset duration.

The sleep module 606 is configured to enter the dormant state and send a sleep message to the peripheral hardware end if no new audio sent by the peripheral hardware end is received within the second preset duration, where the sleep message is used to instruct the peripheral hardware end to end radio reception.

The radio reception instruction sending module 607 is configured to send a radio reception instruction to the peripheral hardware end, where the radio reception instruction is used to instruct the peripheral hardware end to start radio reception.

Fig. 8 is structural schematic diagram three of a voice interaction device provided by the invention. As shown in Fig. 8, the voice interaction device 800 includes: a memory 801 and at least one processor 802.

The memory 801 is configured to store program instructions.

The processor 802 is configured to implement the voice interactive method of the present embodiment when the program instructions are executed. For the specific implementation principle, refer to the above embodiments; details are not repeated here.

The voice interaction device 800 may also include an input/output interface 803.

The input/output interface 803 may include an independent output interface and an independent input interface, or may be an integrated interface that combines input and output. The output interface is used to output data, and the input interface is used to obtain input data. The output data is a general term for what is output in the above method embodiments, and the input data is a general term for what is input in the above method embodiments.

The present invention also provides a readable storage medium in which execution instructions are stored. When at least one processor of the voice interaction device executes the execution instructions, the voice interactive method in the above embodiments is implemented.

The present invention also provides a program product. The program product includes execution instructions, and the execution instructions are stored in a readable storage medium. At least one processor of the voice interaction device can read the execution instructions from the readable storage medium, and the at least one processor executes the execution instructions so that the voice interaction device implements the voice interactive method provided by the above embodiments.

Fig. 9 is structural schematic diagram one of another voice interaction device provided by the invention. As shown in Fig. 9, the voice interaction device is a peripheral hardware end, and the voice interaction device 900 includes: a first audio sending module 901 and a second audio sending module 902.

The first audio sending module 901 is configured to send the first audio to the terminal; if the first audio includes the wake-up word corresponding to the terminal, the terminal enters the wake-up state.

The second audio sending module 902 is configured to send the second audio to the terminal, so that the terminal sends the second audio to the server and the server returns the response audio to the terminal according to the second audio.

Optionally, Fig. 10 is structural schematic diagram two of another voice interaction device provided by the invention. As shown in Fig. 10, the voice interaction device 900 further includes: a third audio sending module 903, an end radio reception module 904 and a radio reception instruction receiving module 905.

The third audio sending module 903 is configured to send the third audio to the terminal, where the difference between the sending time of the third audio and the sending time of the second audio is greater than the first preset duration.

The end radio reception module 904 is configured to receive the sleep message sent by the terminal and to end radio reception.

The radio reception instruction receiving module 905 is configured to receive the radio reception instruction sent by the terminal and to start radio reception according to the radio reception instruction.

Fig. 11 is structural schematic diagram three of another voice interaction device provided by the invention. As shown in Fig. 11, the voice interaction device 1100 includes: a memory 1101 and at least one processor 1102.

The memory 1101 is configured to store program instructions.

The processor 1102 is configured to implement the voice interactive method of the present embodiment when the program instructions are executed. For the specific implementation principle, refer to the above embodiments; details are not repeated here.

The voice interaction device 1100 may also include an input/output interface 1103.

The input/output interface 1103 may include an independent output interface and an independent input interface, or may be an integrated interface that combines input and output. The output interface is used to output data, and the input interface is used to obtain input data. The output data is a general term for what is output in the above method embodiments, and the input data is a general term for what is input in the above method embodiments.

In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division of the units is only a logical function division, and there may be other division manners in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

In the above embodiments of the network device or terminal device, it should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in this application may be directly executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A voice interactive method, applied to a terminal, characterized by comprising:

receiving a first audio sent by a peripheral hardware end, wherein if the first audio includes the wake-up word corresponding to the terminal, the terminal enters a wake-up state;

receiving a second audio sent by the peripheral hardware end, and sending the second audio to a server, so that the server returns a response audio to the terminal according to the second audio;

2. The method according to claim 1, characterized in that, after the terminal sends the second audio to the server, the method further comprises:

receiving a stop-sending message sent by the server, wherein the stop-sending message is used to instruct the terminal to stop sending audio to the server, and the stop-sending message is sent by the server when the server has not received a third audio sent by the terminal within a first preset duration after receiving the second audio.

3. The method according to claim 2, characterized in that the peripheral hardware end is a peripheral hardware end that performs continuous radio reception;

after the receiving the second audio sent by the peripheral hardware end and sending the second audio to the server, the method further comprises:

receiving the third audio sent by the peripheral hardware end, and storing the third audio in the terminal, wherein the third audio is audio received after the first preset duration has elapsed since the second audio was received.

4. The method according to claim 1, characterized in that, after the playing the response audio, the method further comprises:

if no new audio sent by the peripheral hardware end is received within a second preset duration, entering a dormant state and sending a sleep message to the peripheral hardware end, wherein the sleep message is used to instruct the peripheral hardware end to end radio reception.

5. The method according to claim 1, characterized in that, before the receiving the first audio sent by the peripheral hardware end, the method further comprises:

6. A voice interactive method, applied to a peripheral hardware end, characterized by comprising:

sending a first audio to a terminal, wherein if the first audio includes the wake-up word corresponding to the terminal, the terminal enters a wake-up state;

sending a second audio to the terminal, so that the terminal sends the second audio to a server and the server returns a response audio to the terminal according to the second audio.

7. The method according to claim 6, characterized in that, after the sending the second audio to the terminal, the method further comprises:

sending a third audio to the terminal, wherein the difference between the sending time of the third audio and the sending time of the second audio is greater than a first preset duration.

8. The method according to claim 7, characterized in that the method further comprises:

receiving a sleep message sent by the terminal;

ending radio reception.

9. The method according to claim 6, characterized in that, before the sending the first audio to the terminal, the method further comprises:

10. A voice interaction device, characterized by comprising:

a first audio sending module, configured to receive a first audio sent by a peripheral hardware end, wherein if the first audio includes the wake-up word corresponding to the voice interaction device, the voice interaction device enters a wake-up state;

a second audio sending module, configured to receive a second audio sent by the peripheral hardware end and send the second audio to a server, so that the server returns a response audio to the voice interaction device according to the second audio;

11. The device according to claim 10, characterized in that the voice interaction device further comprises: a stop-sending message receiving module;

the stop-sending message receiving module is configured to receive a stop-sending message sent by the server, wherein the stop-sending message is used to instruct the voice interaction device to stop sending audio to the server, and the stop-sending message is sent by the server when the server has not received a third audio sent by the voice interaction device within a first preset duration after receiving the second audio.

12. The device according to claim 11, characterized in that the peripheral hardware end is a peripheral hardware end that performs continuous radio reception; the voice interaction device further comprises: a third audio receiving module;

the third audio receiving module is configured to receive the third audio sent by the peripheral hardware end and store the third audio in the voice interaction device, wherein the difference between the receiving time of the third audio and the receiving time of the second audio is greater than the first preset duration.

13. The device according to claim 10, characterized in that the voice interaction device further comprises: a sleep module;

the sleep module is configured to enter a dormant state and send a sleep message to the peripheral hardware end if no new audio sent by the peripheral hardware end is received within a second preset duration, wherein the sleep message is used to instruct the peripheral hardware end to end radio reception.

14. The device according to claim 10, characterized in that the voice interaction device further comprises: a radio reception instruction sending module;

the radio reception instruction sending module is configured to send a radio reception instruction to the peripheral hardware end, wherein the radio reception instruction is used to instruct the peripheral hardware end to start radio reception.

15. A voice interaction device, characterized by comprising:

a radio reception instruction receiving module, configured to receive a radio reception instruction sent by a terminal and start radio reception according to the radio reception instruction;

a first audio sending module, configured to send a first audio to the terminal, wherein if the first audio includes the wake-up word corresponding to the terminal, the terminal enters a wake-up state;

a second audio sending module, configured to send a second audio to the terminal, so that the terminal sends the second audio to a server and the server returns a response audio to the terminal according to the second audio.

16. The device according to claim 15, characterized in that the voice interaction device further comprises: a third audio sending module;

the third audio sending module is configured to send a third audio to the terminal, wherein the difference between the sending time of the third audio and the sending time of the second audio is greater than a first preset duration.

17. The device according to claim 16, characterized in that the voice interaction device further comprises: an end radio reception module;

18. The device according to claim 15, characterized in that the voice interaction device further comprises: a radio reception instruction receiving module;

the radio reception instruction receiving module is configured to receive the radio reception instruction sent by the terminal and start radio reception according to the radio reception instruction.

19. A terminal, characterized by comprising: at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the terminal performs the method according to any one of claims 1-5.

20. A peripheral hardware end, characterized by comprising: at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the peripheral hardware end performs the method according to any one of claims 6-9.

21. A computer-readable storage medium, characterized in that computer-executable instructions are stored on the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the method according to any one of claims 1-5 is implemented.

22. A computer-readable storage medium, characterized in that computer-executable instructions are stored on the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the method according to any one of claims 6-9 is implemented.