CN111367491A - Voice interaction method and device, electronic equipment and storage medium - Google Patents

Voice interaction method and device, electronic equipment and storage medium

Info

Publication number
CN111367491A
Authority
CN
China
Prior art keywords
audio information
voice
display interface
information
voice interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010136026.4A
Other languages
Chinese (zh)
Inventor
姜彦兮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Jimi Technology Co Ltd
Chengdu XGIMI Technology Co Ltd
Original Assignee
Chengdu Jimi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Jimi Technology Co Ltd filed Critical Chengdu Jimi Technology Co Ltd
Priority to CN202010136026.4A
Publication of CN111367491A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00 Details not covered by groups G06F 3/00 - G06F 13/00 and G06F 21/00
    • G06F 1/26 Power supply means, e.g. regulation thereof
    • G06F 1/32 Means for saving power
    • G06F 1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F 1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00 Details not covered by groups G06F 3/00 - G06F 13/00 and G06F 21/00
    • G06F 1/26 Power supply means, e.g. regulation thereof
    • G06F 1/32 Means for saving power
    • G06F 1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F 1/3234 Power saving characterised by the action undertaken
    • G06F 1/325 Power saving in peripheral device
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application relates to the field of information processing technologies, and in particular, to a voice interaction method, apparatus, electronic device, and storage medium. The voice interaction method provided by the embodiments of the application includes: in response to a radio reception (sound-pickup) request, adjusting the working state to a radio reception state to collect audio information; sending the audio information to the device host, so that the device host generates a stop-reception instruction when the display interface indicated by the audio information is a target interface; and, in response to the stop-reception instruction, adjusting the working state to a non-reception state to stop collecting audio information. The voice interaction method, apparatus, electronic device, and storage medium provided by the embodiments of the application reduce the power consumption of the voice controller by shortening the time the voice controller spends in the radio reception state, thereby improving battery endurance.

Description

Voice interaction method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a voice interaction method, apparatus, electronic device, and storage medium.
Background
A voice controller is a controller driven by spoken language in a human-computer interaction system. In the prior art, a voice controller can only be awakened by a wake word before subsequent voice interaction can take place. Consequently, once started, an existing voice controller must remain in a radio reception (sound-pickup) state at all times to perform voice detection and wake-word recognition, which leads to high power consumption and poor battery endurance.
Disclosure of Invention
An object of the present invention is to provide a voice interaction method, apparatus, electronic device and storage medium to solve the above problem.
In a first aspect, a voice interaction method provided in an embodiment of the present application includes:
in response to a radio reception request, adjusting the working state to a radio reception state to collect audio information;
sending the audio information to the device host, so that the device host generates a stop-reception instruction when the display interface indicated by the audio information is a target interface; and
in response to the stop-reception instruction, adjusting the working state to a non-reception state to stop collecting audio information.
The voice interaction method provided by the embodiment of the application thus includes: in response to a radio reception request, adjusting the working state to a radio reception state to collect audio information; sending the audio information to the device host, so that the device host generates a stop-reception instruction when the display interface indicated by the audio information is a target interface; and, in response to the stop-reception instruction, adjusting the working state to a non-reception state to stop collecting audio information. Accordingly, the power consumption of the voice controller can be reduced by shortening the time the voice controller spends in the radio reception state, thereby improving battery endurance.
With reference to the first aspect, an embodiment of the present application further provides a first optional implementation manner of the first aspect, where the voice interaction method further includes:
when the radio reception control key is triggered, a radio reception request is generated.
The voice interaction method provided by the embodiment of the application thus further includes: generating a radio reception request when the radio reception control key is triggered. Determining whether the radio reception control key has been triggered can be done with high accuracy, which ensures the reliability of radio reception request generation.
With reference to the first aspect, an embodiment of the present application further provides a second optional implementation manner of the first aspect, where sending the audio information to the device host, so that the device host generates a stop-reception instruction when the display interface indicated by the audio information is a target interface, includes:
sending the audio information to the device host, so that the device host extracts voice information from the audio information, decodes the voice information to obtain text information, performs semantic analysis on the text information to obtain an analysis result, and determines, according to the analysis result, whether the display interface indicated by the audio information is the target interface.
In a second aspect, a voice interaction method provided in an embodiment of the present application includes:
when audio information sent by a voice controller is received, determining whether a display interface indicated by the audio information is a target interface; and
when the display interface indicated by the audio information is the target interface, displaying the display interface indicated by the audio information and generating a stop-reception instruction, so that the voice controller, in response to the stop-reception instruction, adjusts its working state to a non-reception state to stop collecting audio information.
The voice interaction method in the embodiment of the application thus includes: when audio information sent by the voice controller is received, determining whether the display interface indicated by the audio information is the target interface; and, when it is, displaying the display interface indicated by the audio information and generating a stop-reception instruction, so that the voice controller, in response to the stop-reception instruction, adjusts its working state to the non-reception state and stops collecting audio information. Accordingly, the power consumption of the voice controller can be reduced by shortening the time the voice controller spends in the radio reception state, thereby improving battery endurance.
With reference to the second aspect, an embodiment of the present application further provides a first optional implementation manner of the second aspect, where determining, when audio information sent by the voice controller is received, whether the display interface indicated by the audio information is a target interface includes:
when audio information sent by a voice controller is received, extracting the voice information from the audio information;
decoding the voice information to obtain text information;
performing semantic analysis on the text information to obtain an analysis result, and determining, according to the analysis result, whether the display interface indicated by the audio information is the target interface.
With reference to the first optional implementation manner of the second aspect, an embodiment of the present application further provides a third optional implementation manner of the second aspect, where performing semantic analysis on the text information to obtain an analysis result and determining, according to the analysis result, whether the display interface indicated by the audio information is a target interface includes:
performing semantic analysis on the text information to obtain an analysis result;
determining, according to the analysis result, the display interface indicated by the audio information, and acquiring an interface tag corresponding to that display interface; and
determining, according to the interface tag, whether the display interface indicated by the audio information is the target interface.
With reference to the second aspect, an embodiment of the present application further provides a fourth optional implementation manner of the second aspect, where the voice interaction method further includes:
when the display interface indicated by the audio information is not the target interface, displaying the display interface indicated by the audio information and generating a re-judgment instruction, where the re-judgment instruction is used to control the device host to perform, when audio information sent by the voice controller is received again, the step of determining whether the display interface indicated by the audio information is the target interface.
In the embodiment of the application, when the display interface indicated by the audio information is not the target interface, the display interface indicated by the audio information is displayed and a re-judgment instruction is generated, the re-judgment instruction being used to control the device host to perform, when audio information sent by the voice controller is received again, the step of determining whether the display interface indicated by the audio information is the target interface. In this way, multi-turn dialogue between the user and the device host can be achieved through the voice interaction apparatus, enhancing the convenience of controlling the device host.
In a third aspect, a voice interaction apparatus provided in an embodiment of the present application includes:
the first adjusting module is used for responding to a radio reception request by adjusting the working state to a radio reception state to collect audio information;
the sending module is used for sending the audio information to the device host, so that the device host generates a stop-reception instruction when the display interface indicated by the audio information is a target interface; and
the second adjusting module is used for responding to the stop-reception instruction by adjusting the working state to a non-reception state to stop collecting audio information.
The voice interaction apparatus provided in the embodiment of the present application has the same beneficial effects as the voice interaction method provided in the first aspect or any one of the optional implementation manners of the first aspect, and details are not described here.
In a fourth aspect, a voice interaction apparatus provided in an embodiment of the present application includes:
the judging module is used for determining, when audio information sent by the voice controller is received, whether the display interface indicated by the audio information is a target interface; and
the first instruction generation module is used for displaying the display interface indicated by the audio information when that interface is the target interface, and for generating a stop-reception instruction so that the voice controller, in response to the stop-reception instruction, adjusts its working state to a non-reception state to stop collecting audio information.
The voice interaction apparatus provided in the embodiment of the present application has the same beneficial effects as the voice interaction method provided in the second aspect or any optional implementation manner of the second aspect, and details are not described here.
In a fifth aspect, an electronic device provided in an embodiment of the present application includes a processor and a memory, where the memory stores a computer program and the processor is configured to execute the computer program to implement the voice interaction method provided in the first aspect or any optional implementation manner of the first aspect, or in the second aspect or any optional implementation manner of the second aspect.
The electronic device provided in the embodiment of the present application has the same beneficial effects as the voice interaction method provided in the first aspect or any optional implementation manner of the first aspect, or in the second aspect or any optional implementation manner of the second aspect, and details are not repeated here.
In a sixth aspect, an embodiment of the present application further provides a storage medium storing a computer program which, when executed, implements the voice interaction method provided in the first aspect or any optional implementation manner of the first aspect, or in the second aspect or any optional implementation manner of the second aspect.
The storage medium provided by the embodiment of the present application has the same beneficial effects as the voice interaction method provided in the first aspect or any optional implementation manner of the first aspect, or in the second aspect or any optional implementation manner of the second aspect, and details are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting the scope; those skilled in the art can derive other related drawings from these drawings without inventive effort.
Fig. 1 is a flowchart illustrating steps of a voice interaction method according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating another step of a voice interaction method according to an embodiment of the present application.
Fig. 3 is a schematic structural block diagram of a voice interaction apparatus according to an embodiment of the present application.
Fig. 4 is a block diagram of another schematic structure of a voice interaction apparatus according to an embodiment of the present application.
Fig. 5 is a schematic structural block diagram of an electronic device according to an embodiment of the present application.
Fig. 6 is a block diagram of another schematic structure of an electronic device according to an embodiment of the present disclosure.
Reference numerals: 111-a first adjustment module; 112-a sending module; 113-a second adjustment module; 121-a judgment module; 122-a first instruction generation module; 200-an electronic device; 210-a processor; 220-a memory; 230-display.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below with reference to the drawings. It should also be noted that like reference numbers and letters refer to like items in the figures; once an item is defined in one figure, it need not be defined and explained again in subsequent figures.
First embodiment:
please refer to fig. 1, which is a flowchart illustrating a voice interaction method applied to a voice controller according to an embodiment of the present application. It is understood that in the embodiment of the present application, the voice controller can communicate with the control host, for example, can communicate with the control host through a bluetooth communication module. In addition, it should be noted that the voice interaction method provided in the embodiment of the present application is not limited by fig. 1 and the following sequence, and the following describes the step flow of the voice interaction method provided in the embodiment of the present application with reference to fig. 1.
Step S110, responding to the radio reception request, and adjusting the working state to a radio reception state to collect audio information.
In the embodiment of the application, the working state of the voice controller includes a radio reception state and a non-reception state. When the working state is the radio reception state, the voice controller can collect audio information from the application environment and send it to the device host; when the working state is the non-reception state, the voice controller stops collecting audio information. In addition, after the voice controller is powered on or the voice interaction function is enabled, the working state defaults to the non-reception state; only when a radio reception request is generated does the controller, in response to that request, adjust the working state to the radio reception state. Regarding the generation of the radio reception request, as an optional implementation, the embodiment of the present application may include step S011.
Step S011, when the radio reception control key is triggered, a radio reception request is generated.
In the embodiment of the application, the voice controller is provided with a radio reception control key, which can be a mechanical key or a touch key, such as a resistive or capacitive touch key. The user generates a radio reception request by triggering the key; the key can therefore be monitored, and a radio reception request is generated when the monitoring result indicates that the key has been triggered.
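As an illustrative sketch only (not part of the claims), the key monitoring described above can be modeled as edge detection over a polled key state. The callables `read_key_pressed` and `send_request` are hypothetical placeholders, not a real driver API:

```python
def monitor_reception_key(read_key_pressed, send_request, polls):
    """Poll a key-state reader and emit one radio reception request per
    key press (rising edge), as described for step S011.  `polls` bounds
    the loop so this sketch terminates; firmware would loop forever."""
    was_pressed = False
    for _ in range(polls):
        pressed = read_key_pressed()
        if pressed and not was_pressed:
            send_request()  # key just went down: generate a reception request
        was_pressed = pressed
```

Detecting only the rising edge ensures that holding the key down produces a single request rather than one per poll.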
In addition, in this embodiment of the application, the voice controller may collect audio information through a dedicated codec (coder/decoder) chip and perform analog-to-digital conversion, converting audio information captured as an analog signal into audio information represented as a digital signal.
Step S120, sending the audio information to the device host, so that the device host generates a stop-reception instruction when the display interface indicated by the audio information is a target interface.
To ensure the timeliness of voice interaction, in the embodiment of the application the voice controller can send the collected audio information to the device host in real time while it is being collected. Alternatively, to reduce the data transmission load of the audio information and ensure reliable signal transmission, the audio information collected during a preset period before the current time point may be sent to the device host once every preset period; the preset period may be, but is not limited to, 10 ms, 15 ms, or 20 ms.
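The periodic sending scheme above amounts to splitting the sample stream into fixed-duration chunks. This is a hypothetical sketch under assumed parameters (16 kHz PCM, list-of-samples representation); real firmware would stream from a ring buffer:

```python
def chunk_audio(samples, sample_rate_hz, chunk_ms):
    """Split captured PCM samples into fixed-duration chunks so that the
    audio collected during each preset period (e.g. 10, 15, or 20 ms)
    can be sent to the device host as one unit."""
    samples_per_chunk = sample_rate_hz * chunk_ms // 1000
    return [samples[i:i + samples_per_chunk]
            for i in range(0, len(samples), samples_per_chunk)]
```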
As for the transmission mode, in the embodiment of the application the audio information can be sent to the device host through a Bluetooth module. Accordingly, before transmission the audio information is also format-converted, for example into a Bluetooth audio transmission format such as Advanced Audio Coding (AAC), Sub-Band Coding (SBC), aptX, LDAC, or High-quality Wireless Audio (HWA). The aptX format is a Bluetooth audio transmission format produced by a digital audio compression algorithm based on Sub-Band Adaptive Differential Pulse Code Modulation (SB-ADPCM), and LDAC is a near-lossless coding format.
When the device host receives the audio information sent by the voice controller, it can determine whether the display interface indicated by the audio information is the target interface. When it is, the device host displays the display interface indicated by the audio information, generates a stop-reception instruction, and sends the stop-reception instruction to the voice controller.
In this embodiment of the application, the device host may determine whether the display interface indicated by the audio information is the target interface as follows: the device host extracts voice information from the audio information, decodes the voice information to obtain text information, performs semantic analysis on the text information to obtain an analysis result, and determines, according to the analysis result, whether the display interface indicated by the audio information is the target interface. For details, refer to steps S211, S212, and S213 of the voice interaction method provided in the second embodiment, which are not repeated here.
Step S130, in response to the stop-reception instruction, adjusting the working state to a non-reception state to stop collecting audio information.
With this arrangement, the time the voice controller spends in the radio reception state can be shortened. That is, when a user wants to play a video, the user triggers the radio reception control key to generate a radio reception request; the voice controller adjusts its working state to the radio reception state, collects audio information, and sends it to the device host; the device host generates a stop-reception instruction when the display interface indicated by the audio information is the target interface; and the voice controller, upon receiving the stop-reception instruction, adjusts its working state back to the non-reception state. This reduces the power consumption of the voice controller and improves battery endurance.
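The controller-side flow of steps S110 through S130 can be sketched as a two-state machine. This is an illustrative sketch, not part of the claims; all names are hypothetical:

```python
from enum import Enum, auto

class WorkingState(Enum):
    RECEIVING = auto()      # radio reception state: audio is collected
    NOT_RECEIVING = auto()  # non-reception state: collection is stopped

class VoiceControllerSketch:
    """Minimal sketch of the state transitions in steps S110 and S130."""
    def __init__(self):
        # The controller defaults to the non-reception state after power-on.
        self.state = WorkingState.NOT_RECEIVING

    def on_reception_request(self):
        # Step S110: a radio reception request switches to collecting audio.
        self.state = WorkingState.RECEIVING

    def on_stop_reception_instruction(self):
        # Step S130: the host's stop-reception instruction ends collection.
        self.state = WorkingState.NOT_RECEIVING
```

Keeping the default state as `NOT_RECEIVING` is what bounds the time spent collecting audio, which is the power-saving point of the method.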
Second embodiment:
please refer to fig. 2, which is a flowchart illustrating a voice interaction method according to an embodiment of the present application. It should be noted that the voice interaction method provided in the embodiment of the present application is not limited by the sequence shown in fig. 2 and the following, and the following describes the step flow of the voice interaction method provided in the embodiment of the present application with reference to fig. 2.
Step S210, when receiving the audio information sent by the voice controller, determining whether the display interface indicated by the audio information is a target interface.
In the embodiment of the present application, the device host may be a user terminal with a video playing function, such as, but not limited to, a smart television, a projection device, a Personal Computer (PC), a Personal Digital Assistant (PDA), or a Mobile Internet Device (MID). When a user wants to play a video and utters voice information, the voice controller, whose working state is the radio reception state, collects audio information that includes the voice information and sends it to the device host; when the device host receives the audio information sent by the voice controller, it can determine whether the display interface indicated by the audio information is the target interface.
It should be noted that, in this embodiment of the application, the target interface is the bottommost interface of video playing. For example, the first page of a video selection page includes selection tags such as "movie", "drama", and "variety"; the lower-layer interface corresponding to the "movie" tag includes region tags such as "China", "USA", and "Korea", type tags such as "comedy", "horror", and "suspense", and year tags such as "2019", "2018", and "2017"; and the lower-layer interface corresponding to the "comedy" tag includes movie tags such as "mouse king", "ghost house", and "world of chumen". The display interfaces corresponding to the movie tags are therefore bottommost interfaces, while the display interfaces corresponding to the selection tags, region tags, type tags, and year tags are non-bottommost interfaces. In other words, a tag with no corresponding lower-layer interface corresponds to a bottommost interface, i.e., the target interface, whereas a tag with a corresponding lower-layer interface corresponds to a non-bottommost interface, i.e., a non-target interface.
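The leaf-versus-non-leaf reasoning above can be sketched directly against a tag hierarchy. This is an illustrative sketch only; the tree contents are hypothetical examples drawn from the text, not a real catalog:

```python
# Hypothetical interface hierarchy: each tag maps to its lower-layer tags.
INTERFACE_TREE = {
    "home":   ["movie", "drama", "variety"],
    "movie":  ["comedy", "horror", "suspense"],
    "comedy": ["mouse king", "ghost house"],
    # Movie tags such as "mouse king" have no lower-layer interface,
    # so they do not appear as keys.
}

def is_target_interface(tag, tree=INTERFACE_TREE):
    """A tag with no corresponding lower-layer interface is a bottommost
    interface, i.e. the target interface; otherwise it is a non-target."""
    return not tree.get(tag)
```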
Further, as for step S210, in the embodiment of the present application, it includes step S211, step S212, and step S213.
Step S211, when the audio information sent by the voice controller is received, extracting the voice information from the audio information.
Because the collected audio information includes background noise in addition to the voice information uttered by the user, in this embodiment of the application step S211 may include: when the audio information sent by the voice controller is received, detecting the start point and end point of human speech in the audio information, and extracting the voice information from the audio information according to the start point and end point. This process can be implemented using Voice Activity Detection (VAD) technology. In this way, voice information can be extracted from the audio information even in complex application environments with severe background noise, ensuring the reliability of the voice interaction method.
Regarding the VAD technique, in the embodiment of the present application, as a first optional implementation, the initial sub-audio of target length in each piece of collected audio information may be used as background audio, where the duration corresponding to the target length may lie in the interval [200 ms, 400 ms]. Because the user usually does not begin speaking at the moment the radio reception control key is triggered, or within the target-length period after it is triggered, the background audio can be treated as silent audio, and the mean energy of each frame of audio data in the background audio can be used as a reference value. Audio data whose energy exceeds the reference value is then treated as valid data, audio data whose energy is less than or equal to the reference value is treated as invalid data, and the voice information is the set of all valid data.
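The energy-gating rule of this first implementation can be sketched as follows. This is an illustrative sketch only; the frame representation (lists of samples) is an assumption:

```python
def extract_valid_frames(frames, background_frames):
    """Energy-gated VAD sketch: the leading frames (covering roughly
    200-400 ms) are treated as silent background, their mean per-frame
    energy becomes the reference value, and only frames whose energy
    exceeds the reference are kept as valid (voice) data."""
    def energy(frame):
        return sum(s * s for s in frame) / len(frame)

    reference = (sum(energy(f) for f in frames[:background_frames])
                 / background_frames)
    return [f for f in frames if energy(f) > reference]
```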
As a second optional implementation of the VAD technique in the embodiment of the present application, the audio information may be divided into frames to obtain multiple frames of audio data, after which audio features are extracted from each frame, for example at least one of: log frame energy, zero-crossing rate (ZCR), the normalized autocorrelation coefficient at unit delay, the first coefficient of a Pth-order linear predictor, and the logarithm of the Pth-order linear prediction error. The frames of audio data are then classified by a preset classification model to determine whether each frame is valid data; the voice information is the set of all valid data.
It can be understood that, in this second optional implementation, the preset classification model may be obtained by constructing a preset model and training it on an audio sample data set. The preset model may be a Support Vector Machine (SVM) classifier, and the audio sample set includes multiple frames of sample audio, each with a corresponding audio label and audio features, where the audio labels include a voice-information label and a non-voice-information label.
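The framing-and-classification path can be sketched as below. Only two of the listed features (log frame energy and ZCR) are computed, and a trivial threshold rule stands in for the trained SVM; a real implementation would use a model trained on labeled sample audio as described.

```python
import numpy as np

def frame_features(frame):
    """Compute two of the features named above: log frame energy and
    zero-crossing rate (ZCR). The LPC-based features are omitted."""
    energy = float((frame.astype(np.float64) ** 2).sum())
    log_energy = float(np.log(energy + 1e-10))
    signs = np.sign(frame)
    zcr = float(np.mean(np.abs(np.diff(signs))) / 2.0)
    return log_energy, zcr

def classify_frames(frames, model):
    """Classify each frame as valid (voice) or invalid data; `model`
    stands in for the preset SVM classifier (True = voice)."""
    return [model(frame_features(f)) for f in frames]
```

A pre-trained classifier (for example scikit-learn's `svm.SVC`) would replace the `model` callable in practice.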
Step S212, decoding the voice information to obtain text information.
For step S212, in the embodiment of the present application, the step may be implemented based on Automatic Speech Recognition (ASR) technology. The principle of ASR is as follows: the voice information is segmented into multiple pieces of sub-voice information, each piece of sub-voice information is encoded into a numeric vector, each numeric vector is converted into a Chinese character by a preset acoustic model, and the converted characters are joined to obtain the text information.
It can be understood that, in the embodiment of the present application, the preset acoustic model may be a Convolutional Neural Network (CNN) trained on a speech and text data set; details of the preset acoustic model are not described herein.
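The decoding flow of step S212 can be illustrated with a deliberately simplified sketch in which a dictionary lookup stands in for the CNN acoustic model and the segmentation is assumed to have been done already; every name here is hypothetical.

```python
def decode_voice(sub_voice_segments, acoustic_model):
    """Encode each sub-voice segment into a vector, map each vector to
    a character via the acoustic model, and join the characters."""
    chars = []
    for segment in sub_voice_segments:
        vector = tuple(segment)               # stand-in for real encoding
        chars.append(acoustic_model[vector])  # stand-in for the CNN
    return "".join(chars)
```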
Step S213, performing semantic analysis on the text information to obtain an analysis result, and determining whether the display interface indicated by the audio information is a target interface according to the analysis result.
In this embodiment of the application, each display interface has a corresponding interface tag indicating whether that display interface is a target interface. The interface tag may be a digital tag; for example, for a given display interface, if its interface tag is "1", it is determined to be a target interface, and otherwise it is determined to be a non-target interface. Based on this, in the embodiment of the present application, step S213 may include: performing semantic analysis on the text information to obtain an analysis result, determining the display interface indicated by the audio information according to the analysis result, acquiring the interface tag corresponding to that display interface, and judging, according to the interface tag, whether the display interface indicated by the audio information is a target interface.
As for performing semantic analysis on the text information to obtain an analysis result, in the embodiment of the present application this may be implemented based on Natural Language Processing (NLP) technology, which may include syntactic-semantic analysis and information extraction. In the embodiment of the application, operations such as word segmentation, part-of-speech tagging, named-entity recognition and linking, syntactic parsing, semantic role labeling, and word-sense disambiguation may be performed on the text information based on syntactic-semantic analysis to obtain a first target text. Key information such as the action and the object is then extracted from the first target text based on information extraction, and the key information is joined to obtain a second target text as the analysis result. For example, when the first target text is "I want to play the movie Cooking Mouse King", the key information extracted based on information extraction includes the action "play" and the object "Cooking Mouse King"; joining the key information "play" and "Cooking Mouse King" yields the second target text "play Cooking Mouse King".
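The action/object extraction step can be mimicked with a toy extractor. The real pipeline performs segmentation, tagging, and named-entity recognition as described; this sketch only scans a hypothetical action vocabulary and reproduces the worked example above.

```python
ACTIONS = {"play", "open"}  # hypothetical action vocabulary

def extract_command(first_target_text):
    """Pick out the action and its object from the first target text
    and join them into the second target text (the analysis result)."""
    tokens = first_target_text.lower().split()
    for i, tok in enumerate(tokens):
        if tok in ACTIONS:
            # Drop the category word "movie", as in the example above.
            obj = [t for t in tokens[i + 1:] if t != "movie"]
            return " ".join([tok] + obj)
    return None
```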
After the semantic analysis is performed on the text information and the analysis result is obtained, the display interface indicated by the audio information can be determined according to the analysis result, the interface tag corresponding to that display interface can be acquired, and whether the display interface indicated by the audio information is the target interface can be judged according to the interface tag.
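The tail of step S213 can be sketched as a lookup over a hypothetical interface table, in which each interface carries the digital tag described above ("1" marks a target interface); both the table contents and the page identifiers are invented for this example.

```python
# Hypothetical mapping from analysis results to display interfaces;
# the tag "1" marks a target interface.
INTERFACES = {
    "play Cooking Mouse King": {"page": "player:cooking-mouse-king", "tag": "1"},
    "open the comedy type tag": {"page": "list:comedy", "tag": "0"},
}

def judge_display_interface(analysis_result):
    """Determine the display interface indicated by the analysis result,
    read its interface tag, and judge whether it is a target interface."""
    entry = INTERFACES.get(analysis_result)
    if entry is None:
        return None, False
    return entry["page"], entry["tag"] == "1"
```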
Step S220, when the display interface indicated by the audio information is the target interface, displaying the display interface indicated by the audio information and generating a reception stopping instruction, so that the voice controller responds to the reception stopping instruction and adjusts its working state to the non-reception state to stop collecting the audio information.
In the embodiment of the application, when the display interface indicated by the audio information is the target interface, an interface playing instruction can be generated and the indicated display interface displayed according to it; at the same time, a reception stopping instruction is generated and sent to the voice controller, which responds to the reception stopping instruction by adjusting its working state to the non-reception state to stop collecting audio information. This shortens the time the voice controller spends in the reception state: when a user wants to play a video, triggering the reception control key generates a reception request, the voice controller enters the reception state, collects audio information and sends it to the device host; the device host generates a reception stopping instruction once the display interface indicated by the audio information is the target interface; and the voice controller, upon receiving that instruction, returns to the non-reception state. The power consumption of the voice controller is thereby reduced and its battery life improved.
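The controller-side behavior described above amounts to a small two-state machine; the sketch below is an illustration only, with all names chosen for this example.

```python
from enum import Enum

class State(Enum):
    NON_RECEPTION = "non-reception"
    RECEPTION = "reception"

class VoiceControllerStates:
    """Two-state sketch of the voice controller: a reception request
    starts audio collection, a reception stopping instruction ends it."""
    def __init__(self):
        self.state = State.NON_RECEPTION

    def on_reception_request(self):
        self.state = State.RECEPTION       # start collecting audio

    def on_stop_reception(self):
        self.state = State.NON_RECEPTION   # stop collecting, save power

    def is_collecting(self):
        return self.state is State.RECEPTION
```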
In order to implement multiple rounds of conversation between the user and the device host and make the device host more convenient to control, the voice interaction method provided in the embodiment of the application may further include step S230: when the display interface indicated by the audio information is not the target interface, displaying the display interface indicated by the audio information and generating a re-judgment instruction, where the re-judgment instruction is used to control the device host to perform, when audio information sent by the voice controller is received again, the step of judging whether the display interface indicated by that audio information is the target interface.
In addition, based on step S230, the present scheme can be contrasted with a scheme in which, in a "search-confirm" flow, the user must again trigger a confirm key provided on the voice controller to perform the confirmation action. In that scheme, after the display interface indicated by the audio information is found by searching, the user must trigger the confirm key to generate a confirm instruction that is sent to the device host, and only upon receiving the confirm instruction does the device host confirm that the searched display interface is the interface the user intended. By contrast, in the scheme of directly displaying the searched display interface, the user does not need to perform another key confirmation in the "search-confirm" flow, which improves the convenience of the voice interaction method.
For ease of understanding, a first workflow of a voice interaction system that implements both the voice interaction method provided in the first embodiment and the voice interaction method provided in the second embodiment is described below by way of example.
For example, the home page of the video selection page includes selection tags such as "movie", "TV series", and "variety show"; the lower-level interface corresponding to the "movie" tag includes region tags such as "China", "USA", and "Korea", type tags such as "comedy", "horror", and "suspense", and year tags such as "2019", "2018", and "2017"; and the lower-level interface corresponding to the "comedy" tag includes movie tags such as "Cooking Mouse King", "Foggy House", and "Chumen World". In this example, the user has not yet determined a target movie.
After the user triggers the reception control key to generate a reception request, the voice controller adjusts its working state to the reception state to collect audio information. The user can then utter first voice information, "open the video selection page"; after collecting first audio information that includes the first voice information, the voice controller sends the first audio information to the device host.
After that, the user may utter second voice information, "open the movie selection tag"; after collecting second audio information that includes the second voice information, the voice controller sends the second audio information to the device host.
Then, the user may utter third voice information, "open the comedy type tag"; after collecting third audio information that includes the third voice information, the voice controller sends the third audio information to the device host.
If the user takes the movie whose tag "Cooking Mouse King" is located on the home page of the display interface corresponding to the comedy type tag as the target movie, the user can utter fourth voice information, "play Cooking Mouse King". After collecting fourth audio information that includes the fourth voice information, the voice controller sends the fourth audio information to the device host. Because the display interface indicated by the fourth audio information is the target interface, the device host displays the display interface indicated by the fourth audio information, that is, plays the movie "Cooking Mouse King", and generates a reception stopping instruction and sends it to the voice controller, which then adjusts its working state to the non-reception state according to the reception stopping instruction.
After the display interface corresponding to the comedy type tag is displayed, the user can also utter fifth voice information such as "turn to the next page" to indicate page turning. After collecting fifth audio information that includes the fifth voice information, the voice controller sends the fifth audio information to the device host. Because the display interface indicated by the fifth audio information is not the target interface, the device host displays the display interface indicated by the fifth audio information, that is, the next page of the home page of the display interface corresponding to the comedy type tag, and then generates a re-judgment instruction. The above actions can then be repeated until the target movie is successfully played for the user, at which point a reception stopping instruction is generated and sent to the voice controller, and the voice controller adjusts its working state to the non-reception state according to the reception stopping instruction.
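The multi-round flow of this workflow can be condensed into a sketch: each round displays the indicated interface; a non-target interface leads to a re-judgment (another round), while a target interface stops reception. Here `is_target` is a stand-in for the host-side judgment of step S210, and the strings are illustrative.

```python
def run_rounds(audio_infos, is_target):
    """Process audio information round by round until a target
    interface is reached, then stop reception."""
    displayed = []
    receiving = True
    for audio in audio_infos:
        displayed.append(audio)  # display the indicated interface
        if is_target(audio):
            receiving = False    # reception stopping instruction sent
            break
        # else: re-judgment instruction; wait for the next audio info
    return displayed, receiving
```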
By way of example, a second workflow of the voice interaction system that implements the voice interaction method provided in the first embodiment and the voice interaction method provided in the second embodiment is described below.
The home page of the video selection page includes selection tags such as "movie", "TV series", and "variety show"; the lower-level interface corresponding to the "movie" tag includes region tags such as "China", "USA", and "Korea", type tags such as "comedy", "horror", and "suspense", and year tags such as "2019", "2018", and "2017"; and the lower-level interface corresponding to the "comedy" tag includes movie tags such as "Cooking Mouse King", "Foggy House", and "Chumen World". In this example, the user has determined "Cooking Mouse King" as the target movie.
The user generates a reception request by triggering the reception control key, and the voice controller adjusts its working state to the reception state to collect audio information. The user can then utter sixth voice information, "play Cooking Mouse King"; after collecting sixth audio information that includes the sixth voice information, the voice controller sends the sixth audio information to the device host.
Third embodiment:
Based on the same inventive concept as the voice interaction method provided in the first embodiment, an embodiment of the present application further provides a voice interaction apparatus. Referring to fig. 3, the voice interaction apparatus provided in the embodiment of the present application includes a first adjusting module 111, a sending module 112, and a second adjusting module 113.
The first adjusting module 111 is configured to adjust the operating state to a sound receiving state in response to the sound receiving request, so as to collect audio information.
The description of the first adjusting module 111 specifically refers to the detailed description of the step S110 in the voice interaction method provided in the first embodiment, that is, the step S110 may be executed by the first adjusting module 111.
The sending module 112 is configured to send the audio information to the device host, so that the device host generates a radio reception stopping instruction when the display interface indicated by the audio information is the target interface.
The sending module 112 is specifically configured to send the audio information to the device host, so that the device host extracts the voice information from the audio information, decodes the voice information to obtain text information, performs semantic analysis on the text information to obtain an analysis result, and determines whether a display interface indicated by the audio information is a target interface according to the analysis result.
The description about the sending module 112 may specifically refer to the detailed description about the step S120 in the voice interaction method provided in the first embodiment, that is, the step S120 may be executed by the sending module 112.
The second adjusting module 113 is configured to adjust the working state to a non-sound-reception state in response to the sound-reception stopping instruction, so as to stop collecting the audio information.
The description of the second adjusting module 113 may specifically refer to the detailed description of the step S130 in the voice interaction method provided in the first embodiment, that is, the step S130 may be executed by the second adjusting module 113.
The voice interaction device provided by the embodiment of the application can further comprise a request generation module.
And the request generation module is used for generating a radio reception request when the radio reception control key is triggered.
The description of the request generation module may specifically refer to the detailed description of step S011 in the voice interaction method provided in the first embodiment, that is, step S011 may be executed by the request generation module.
Fourth embodiment:
Based on the same inventive concept as the voice interaction method provided in the second embodiment, an embodiment of the present application further provides a voice interaction apparatus. Referring to fig. 4, the voice interaction apparatus provided in the embodiment of the present application includes a determining module 121 and a first instruction generating module 122.
The judging module 121 is configured to, when audio information sent by the voice controller is received, judge whether a display interface indicated by the audio information is a target interface.
The description of the determining module 121 may refer to the detailed description of step S210 in the voice interaction method provided in the second embodiment, that is, step S210 may be executed by the determining module 121.
The first instruction generating module 122 is configured to display the display interface indicated by the audio information when the display interface indicated by the audio information is a target interface, and generate a reception stopping instruction, so that the voice controller responds to the reception stopping instruction, and adjusts the working state to a non-reception state to stop collecting the audio information.
The description of the first instruction generating module 122 may specifically refer to the detailed description of step S220 in the voice interaction method provided in the second embodiment, that is, step S220 may be executed by the first instruction generating module 122.
In this embodiment, the determining module 121 may include an information extracting unit, an information encoding unit, and a semantic analysis unit.
And the information extraction unit is used for extracting the voice information from the audio information when receiving the audio information sent by the voice controller.
In an embodiment of the application, the information extraction unit is specifically configured to, when audio information sent by the voice controller is received, detect a start point and an end point of a human voice from the audio information, and extract the voice information from the audio information according to the start point and the end point.
The description about the information extraction unit can refer to the detailed description about step S211 in the voice interaction method provided in the second embodiment, that is, step S211 can be performed by the information extraction unit.
And the information coding unit is used for decoding the voice information to obtain text information.
The description of the information encoding unit can refer to the detailed description of step S212 in the voice interaction method provided in the second embodiment, that is, step S212 can be performed by the information encoding unit.
And the semantic analysis unit is used for performing semantic analysis on the text information to obtain an analysis result so as to judge whether the display interface indicated by the audio information is a target interface according to the analysis result.
In the embodiment of the application, the semantic analysis unit is specifically configured to perform semantic analysis on the text information to obtain an analysis result, and then determine the display interface indicated by the audio information according to the analysis result, obtain an interface tag corresponding to the display page, and then determine whether the display interface indicated by the audio information is a target interface according to the interface tag.
The description of the semantic analysis unit may refer to the detailed description of step S213, that is, step S213 may be performed by the semantic analysis unit in the voice interaction method provided in the second embodiment.
The voice interaction device provided by the embodiment of the application can further comprise a second instruction generation module.
And the second instruction generating module is used for displaying the display interface indicated by the audio information when the display interface indicated by the audio information is not the target interface, and generating a re-judgment instruction, wherein the re-judgment instruction is used for controlling the equipment host to perform the step of judging whether the display interface indicated by the audio information is the target interface or not when the audio information sent by the voice controller is received again.
The description about the second instruction generating module can refer to the detailed description about the step S230 in the voice interaction method provided in the second embodiment, that is, the step S230 can be executed by the second instruction generating module.
Fifth embodiment:
referring to fig. 5, a schematic block diagram of an electronic device 200 according to an embodiment of the present disclosure is shown. It is understood that when the electronic device 200 is a device to which the voice interaction method provided by the first embodiment is applied or to which the voice interaction apparatus provided by the third embodiment is applied, it may be a voice controller, and when the electronic device 200 is a device to which the voice interaction method provided by the second embodiment is applied or to which the voice interaction apparatus provided by the fourth embodiment is applied, it may be a device host, which may be a user terminal having a video playing function, and the user terminal may be, but is not limited to, a smart television, a projection controller, a PC, a PDA, a MID, etc. Further, structurally, the electronic device 200 may include a processor 210 and a memory 220.
The processor 210 and the memory 220 are electrically connected, directly or indirectly, to enable data transmission or interaction; for example, these components may be electrically connected to one another via one or more communication buses or signal lines. The voice interaction apparatus includes at least one software module that may be stored in the memory 220 in the form of software or firmware, or solidified in the operating system (OS) of the electronic device 200. The processor 210 is configured to execute executable modules stored in the memory 220, such as the software functional modules and computer programs included in the voice interaction apparatus, so as to implement the voice interaction method. The processor 210 may execute the computer program upon receiving an execution instruction.
In the embodiment of the present application, the processor 210 may be an integrated circuit chip having signal processing capability. The processor 210 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical blocks disclosed in the embodiments of the present application. Further, the general-purpose processor may be a microprocessor or any conventional processor.
The memory 220 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), or an Electrically Erasable Programmable Read-Only Memory (EEPROM). The memory 220 is used for storing a program, and the processor 210 executes the program after receiving an execution instruction.
Referring to fig. 6, in the embodiment of the present application, when the electronic device 200 is a device applying the voice interaction method provided by the second embodiment or applying the voice interaction apparatus provided by the fourth embodiment, it may further include a display 230, and the display 230 is electrically connected to the processor 210 directly or indirectly to implement data transmission or interaction, for example, the components may be electrically connected to each other through one or more communication buses or signal lines.
In the embodiment of the present application, the display 230 may be, but is not limited to, a Cathode Ray Tube (CRT) display, a Liquid Crystal Display (LCD), a Plasma Display Panel (PDP), or an Organic Light-Emitting Diode (OLED) display, and is used for displaying the display interface indicated by the audio information.
It should be understood that the configurations shown in fig. 5 and 6 are merely illustrative, and the electronic device 200 provided in the embodiments of the present application may have fewer or more components than those shown in fig. 5 and 6, or have a different configuration than those shown in fig. 5 and 6.
Sixth embodiment:
an embodiment of the present application further provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed, the voice interaction method provided in the first embodiment is implemented, or the voice interaction method provided in the second embodiment is implemented.
To sum up, in the embodiment of the present application, the voice interaction method applied to the voice controller includes: responding to a reception request by adjusting the working state to the reception state to collect audio information; sending the audio information to the device host so that the device host generates a reception stopping instruction when the display interface indicated by the audio information is the target interface; and responding to the reception stopping instruction by adjusting the working state to the non-reception state to stop collecting audio information. Correspondingly, the voice interaction method applied to the device host includes: when audio information sent by the voice controller is received, judging whether the display interface indicated by the audio information is the target interface; and, when it is, displaying the display interface indicated by the audio information and generating a reception stopping instruction, so that the voice controller responds to the reception stopping instruction and adjusts the working state to the non-reception state to stop collecting audio information. Based on this, the voice interaction method and apparatus, the electronic device, and the storage medium provided by the embodiments of the application reduce the power consumption of the voice controller by shortening the time it spends in the reception state, thereby improving its battery life.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in each embodiment of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in each embodiment of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a RAM, a ROM, a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Moreover, it is noted that, in this document, relational terms such as "first," "second," "third," and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Claims (11)

1. A method of voice interaction, comprising:
responding to the radio reception request, and adjusting the working state to a radio reception state to collect audio information;
sending the audio information to an equipment host machine, so that the equipment host machine generates a reception stopping instruction when a display interface indicated by the audio information is a target interface;
and responding to the radio reception stopping instruction, and adjusting the working state to a non-radio reception state so as to stop collecting the audio information.
2. The voice interaction method according to claim 1, further comprising:
and when the sound reception control key is triggered, generating the sound reception request.
3. The voice interaction method according to claim 1, wherein the sending the audio information to a device host for the device host to generate a sound reception stopping instruction when a display interface indicated by the audio information is a target interface comprises:
and sending the audio information to an equipment host machine so that the equipment host machine can extract voice information from the audio information, decoding the voice information to obtain text information, performing semantic analysis on the text information to obtain an analysis result, and judging whether a display interface indicated by the audio information is a target interface according to the analysis result.
4. A method of voice interaction, comprising:
when audio information sent by a voice controller is received, judging whether a display interface indicated by the audio information is a target interface;
and when the display interface indicated by the audio information is a target interface, displaying the display interface indicated by the audio information, and generating a reception stopping instruction, so that the voice controller responds to the reception stopping instruction, and adjusts the working state to a non-reception state to stop collecting the audio information.
5. The voice interaction method according to claim 4, wherein when receiving audio information sent by a voice controller, determining whether a display interface indicated by the audio information is a target interface comprises:
when audio information sent by the voice controller is received, extracting voice information from the audio information;
decoding the voice information to obtain text information;
and performing semantic analysis on the text information to obtain an analysis result, and judging whether a display interface indicated by the audio information is a target interface according to the analysis result.
6. The voice interaction method according to claim 5, wherein performing semantic analysis on the text information to obtain an analysis result, and judging whether a display interface indicated by the audio information is a target interface according to the analysis result, comprises:
performing semantic analysis on the text information to obtain an analysis result;
determining the display interface indicated by the audio information according to the analysis result, and acquiring an interface label corresponding to the display interface; and
judging whether the display interface indicated by the audio information is the target interface according to the interface label.
7. The voice interaction method of claim 4, further comprising:
when the display interface indicated by the audio information is not the target interface, displaying the display interface indicated by the audio information and generating a re-judgment instruction, wherein the re-judgment instruction is used for controlling the device host to execute, when audio information sent by the voice controller is received again, the step of judging whether the display interface indicated by the audio information is the target interface.
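Claims 4 and 7 together imply a judge-until-target loop on the device host: each incoming audio message is judged, a re-judgment instruction is issued while the indicated interface is not the target, and a sound reception stopping instruction ends the loop. A minimal sketch under assumed names (`judge`, `run`) with string-valued instructions, none of which are from the patent:

```python
# Assumed control loop implied by claims 4 and 7.
def judge(interface: str, target: str = "text_input") -> str:
    """Return the instruction the host would generate for this interface."""
    return "STOP_RECEPTION" if interface == target else "RE_JUDGE"

def run(host_inputs):
    """Judge successive audio-indicated interfaces until the target is hit."""
    log = []
    for interface in host_inputs:
        instruction = judge(interface)
        log.append((interface, instruction))
        if instruction == "STOP_RECEPTION":
            break  # controller switches to the non-sound-reception state
    return log

print(run(["home", "settings", "text_input"]))
# -> [('home', 'RE_JUDGE'), ('settings', 'RE_JUDGE'),
#     ('text_input', 'STOP_RECEPTION')]
```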
8. A voice interaction apparatus, comprising:
the first adjusting module is used for adjusting a working state to a sound reception state in response to a sound reception request, so as to collect audio information;
the sending module is used for sending the audio information to a device host, so that the device host generates a sound reception stopping instruction when a display interface indicated by the audio information is a target interface; and
the second adjusting module is used for adjusting the working state to a non-sound-reception state in response to the sound reception stopping instruction, so as to stop collecting the audio information.
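The voice-controller side (claims 1-2 and apparatus claim 8) amounts to a two-state machine: a sound reception request switches the controller into the sound reception state, and the host's stop instruction switches it back, after which no audio is collected. A hedged sketch; the class and method names are assumptions, not from the patent:

```python
# Assumed two-state controller per claims 1-2 / apparatus claim 8.
class VoiceController:
    def __init__(self):
        self.receiving = False  # non-sound-reception state by default
        self.buffer = []        # collected audio chunks

    def on_reception_request(self):
        """E.g. the sound reception control key was triggered (claim 2)."""
        self.receiving = True

    def on_audio(self, chunk: bytes):
        """Collect audio only while in the sound reception state."""
        if self.receiving:
            self.buffer.append(chunk)

    def on_stop_instruction(self):
        """Sound reception stopping instruction received from the host."""
        self.receiving = False

c = VoiceController()
c.on_reception_request()
c.on_audio(b"chunk-1")      # collected
c.on_stop_instruction()
c.on_audio(b"chunk-2")      # ignored: non-sound-reception state
print(len(c.buffer))        # -> 1
```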
9. A voice interaction apparatus, comprising:
the judging module is used for judging, when audio information sent by a voice controller is received, whether a display interface indicated by the audio information is a target interface; and
the first instruction generating module is used for displaying the display interface indicated by the audio information and generating a sound reception stopping instruction when the display interface indicated by the audio information is the target interface, so that the voice controller, in response to the sound reception stopping instruction, adjusts its working state to a non-sound-reception state to stop collecting audio information.
10. An electronic device, comprising a processor and a memory, wherein a computer program is stored in the memory, and the processor is configured to execute the computer program to implement the voice interaction method according to any one of claims 1 to 7.
11. A storage medium having a computer program stored thereon, wherein the computer program, when executed, implements the voice interaction method according to any one of claims 1 to 7.
CN202010136026.4A 2020-03-02 2020-03-02 Voice interaction method and device, electronic equipment and storage medium Pending CN111367491A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010136026.4A CN111367491A (en) 2020-03-02 2020-03-02 Voice interaction method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111367491A true CN111367491A (en) 2020-07-03

Family

ID=71208547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010136026.4A Pending CN111367491A (en) 2020-03-02 2020-03-02 Voice interaction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111367491A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015195215A1 (en) * 2014-06-19 2015-12-23 Apple Inc. Robust end-pointing of speech signals using speaker recognition
CN109599111A (en) * 2019-01-02 2019-04-09 百度在线网络技术(北京)有限公司 Voice interactive method, device and storage medium
CN109830233A (en) * 2019-01-22 2019-05-31 Oppo广东移动通信有限公司 Exchange method, device, storage medium and the terminal of voice assistant
CN109859762A (en) * 2019-01-02 2019-06-07 百度在线网络技术(北京)有限公司 Voice interactive method, device and storage medium
CN109871238A (en) * 2019-01-02 2019-06-11 百度在线网络技术(北京)有限公司 Voice interactive method, device and storage medium
CN109903761A (en) * 2019-01-02 2019-06-18 百度在线网络技术(北京)有限公司 Voice interactive method, device and storage medium
CN109903760A (en) * 2019-01-02 2019-06-18 百度在线网络技术(北京)有限公司 Voice interactive method, device and storage medium
CN109960537A (en) * 2019-03-29 2019-07-02 北京金山安全软件有限公司 Interaction method and device and electronic equipment
CN112346695A (en) * 2019-08-09 2021-02-09 华为技术有限公司 Method for controlling equipment through voice and electronic equipment
CN113593536A (en) * 2021-06-09 2021-11-02 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) Device and system for detecting voice recognition accuracy

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112133296A (en) * 2020-08-27 2020-12-25 北京小米移动软件有限公司 Full-duplex voice control method, device, storage medium and voice equipment
CN112133296B (en) * 2020-08-27 2024-05-21 北京小米移动软件有限公司 Full duplex voice control method and device, storage medium and voice equipment
CN112216305A (en) * 2020-09-30 2021-01-12 上海幻维数码创意科技有限公司 Audio similarity recognition method
CN114237182A (en) * 2021-12-17 2022-03-25 中国电信股份有限公司 Robot scheduling method and system

Similar Documents

Publication Publication Date Title
CN106971723B (en) Voice processing method and device for voice processing
CN107134279B (en) Voice awakening method, device, terminal and storage medium
CN108683937B (en) Voice interaction feedback method and system for smart television and computer readable medium
CN107767863B (en) Voice awakening method and system and intelligent terminal
CN102568478B (en) Video play control method and system based on voice recognition
US20140304605A1 (en) Information processing apparatus, information processing method, and computer program
CN111367491A (en) Voice interaction method and device, electronic equipment and storage medium
CN108281138B (en) Age discrimination model training and intelligent voice interaction method, equipment and storage medium
CN112382285B (en) Voice control method, voice control device, electronic equipment and storage medium
CN112236739A (en) Adaptive automated assistant based on detected mouth movement and/or gaze
US20140303975A1 (en) Information processing apparatus, information processing method and computer program
CN111768783A (en) Voice interaction control method, device, electronic equipment, storage medium and system
CN112735417B (en) Speech translation method, electronic device, and computer-readable storage medium
US20220068265A1 (en) Method for displaying streaming speech recognition result, electronic device, and storage medium
CN111292745B (en) Method and device for processing voice recognition result and electronic equipment
CN107943834A (en) Interactive implementation method, device, equipment and storage medium
CN115798459B (en) Audio processing method and device, storage medium and electronic equipment
CN113611316A (en) Man-machine interaction method, device, equipment and storage medium
CN116166827A (en) Training of semantic tag extraction model and semantic tag extraction method and device
JP2024503957A (en) Video editing methods, equipment, electronic equipment, and media
CN111862943A (en) Speech recognition method and apparatus, electronic device, and storage medium
CN114399992B (en) Voice instruction response method, device and storage medium
US20230188892A1 (en) Contextual awareness in dynamic device groups
CN113380242A (en) Method and system for controlling multimedia playing content through voice
CN114999458A (en) Multi-mode wake-up-free system and method based on voice and sight

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200703