WO2022161077A1 - Speech control method, and electronic device - Google Patents

Speech control method, and electronic device

Info

Publication number
WO2022161077A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
voice
recording data
recording
audio
Prior art date
Application number
PCT/CN2021/142083
Other languages
French (fr)
Chinese (zh)
Inventor
王晓博
许嘉璐
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022161077A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/74 Details of telephonic subscriber devices with voice recognition means

Definitions

  • the present application relates to computer technology, and in particular, to a voice control method and electronic device.
  • As a new type of terminal application (APP) based on voice semantic algorithms, a voice assistant provides service functions such as interactive dialogue, information query, and device control by receiving and recognizing voice signals from users. With the continuous development of deep learning theory and the maturity of intelligent voice hardware, voice assistant applications have become an essential software function for terminal devices such as smartphones, tablet computers, smart TVs, and smart speakers.
  • the user's living room has three devices: a speaker, a TV, and a mobile phone. All three devices have a voice assistant application installed, and the wake-up words are all "small E and small E".
  • the voice assistant application of the speaker, TV and mobile phone selects one of the three devices as the answering device by detecting the audio energy information of the wake-up word. Since the speaker is closest to the user, the three devices negotiate and select the speaker as the answering device based on the audio energy information of the wake-up word.
  • The speaker wakes up its own voice assistant application, and the other devices do not respond to the wake-up word, that is, they do not wake up their respective voice assistant applications. In this way, when the user continues to speak, only the speaker recognizes and responds to the user's voice signal. For example, after the user speaks the voice signal "play song 112222", the speaker recognizes the voice signal and responds by outputting the voice signal "Song 112222 will be played for you".
  • the answering device recognizes and responds to the user's voice signal.
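The negotiation described above can be sketched as follows. This is only an illustrative sketch, assuming each device reports a single numeric audio energy value for the wake-up word; the function name `select_answering_device`, the energy values, and the tie-breaking rule are hypothetical and are not taken from the application.

```python
def select_answering_device(wake_word_energy):
    """Return the name of the device that heard the wake-up word loudest.

    `wake_word_energy` maps a device name to a hypothetical audio energy
    value measured for the wake-up word on that device.
    """
    if not wake_word_energy:
        raise ValueError("no device reported a wake-up word energy")
    # Iterate names in sorted order so that a tie is resolved the same way
    # on every device (assumption: all devices see the same reports).
    return max(sorted(wake_word_energy), key=lambda name: wake_word_energy[name])


# The speaker is closest to the user, so it reports the highest energy.
reports = {"speaker": 0.82, "tv": 0.41, "phone": 0.37}
print(select_answering_device(reports))
```

Because every device evaluates the same rule on the same reports, all devices arrive at the same answering device without a central coordinator.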
  • However, this processing method suffers from misrecognition by the answering device, that is, the answering device may fail to accurately recognize the voice signal that the user inputs after the wake-up word.
  • the present application provides a voice control method and electronic device, so as to solve the problem of misrecognition of voice control in a multi-device scenario and improve the accuracy of voice control.
  • In a first aspect, an embodiment of the present application provides a voice control method, which can be applied to a voice control system, and the voice control system can at least include a first electronic device and a second electronic device with a voice control function.
  • the control method may include: the first electronic device and the second electronic device respectively receive a first voice command input by a user, and the first electronic device responds to the first voice command.
  • the second electronic device records and saves the recording data, and the recording is used to record the second voice command input by the user.
  • the second electronic device sends the audio recording data of the second electronic device to the first electronic device.
  • the first electronic device responds to the second voice instruction according to the recorded data of the first electronic device and/or the recorded data of the second electronic device.
  • the recording data of the first electronic device includes recording data of the second voice instruction input by the user recorded by the first electronic device.
  • The recording of the second electronic device may start before the first electronic device responds to the first voice instruction, which decouples the selection of the answering device from the recording process of the electronic devices: regardless of whether the first electronic device has been determined as the answering device, the second electronic device can record and save the second voice command input by the user.
  • The recording data of the second electronic device is sent to the first electronic device, and the first electronic device responds to the second voice command.
  • That is, the first electronic device acts as the answering device and responds to the first voice command; the first electronic device and the second electronic device both record the second voice command and save the recorded data; the second electronic device sends its own recorded data to the first electronic device; and the first electronic device responds to the second voice instruction according to the recorded data of the first electronic device and/or the recorded data of the second electronic device.
  • Because the voice commands input by the user are also recorded by the non-answering device, and the answering device performs SE, ASR, and other processing based on the recorded data of the answering device and/or the recorded data of the non-answering device, the communication delay between devices incurred while selecting the answering device is effectively eliminated, which solves the frame loss problem of voice control caused by delay in multi-device scenarios.
  • the answering device responds to the second voice command through the recording data collected by multiple devices collaboratively, which can solve the problem of the influence of the audio quality of the voice command picked up by the electronic device on the accuracy of ASR recognition, and improve the accuracy of voice control.
  • the method may further include: the first electronic device invokes a voice pickup instruction to the second electronic device, where the voice pickup instruction is used by the second electronic device to return the recording data of the second electronic device.
  • the recording by the second electronic device may include: recording by the second electronic device when or after the second electronic device receives the first voice instruction input by the user.
  • In this implementation, the second electronic device starts recording when or after it receives the first voice command input by the user, that is, before the answering device is determined, so the second electronic device can record the second voice command input by the user.
  • This can effectively eliminate the communication delay between devices in the process of selecting the answering device, thereby solving the problem of frame loss in voice control caused by delay in multi-device scenarios.
  • The method may further include: when or after the first electronic device receives the first voice instruction input by the user, the first electronic device records, and the recording is used to record the second voice instruction input by the user.
  • the first voice command is used to wake up the voice control function of the first electronic device and/or the second electronic device.
  • the first voice instruction here may be the voice instruction of step 401 in the following embodiment shown in FIG. 3 .
  • The method may further include: the first electronic device and the second electronic device determine that the first electronic device is the answering device of the voice control system according to the audio quality information of the first voice command received by the first electronic device and by the second electronic device, respectively.
  • The method may further include: during the recording process of the first electronic device and the second electronic device, if the first electronic device does not detect the second voice instruction input by the user within the preset time period, the first electronic device deletes the saved recording data and continues to record.
  • The first electronic device invokes a multi-round dialogue pause instruction to the second electronic device, where the multi-round dialogue pause instruction is used to instruct the multi-round dialogue to temporarily stop.
  • the second electronic device deletes the saved recording data and continues recording.
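The timeout handling above can be sketched as follows. This is a minimal sketch under stated assumptions: the class `RecordingBuffer`, the function `handle_listening_timeout`, and the direct method call standing in for the multi-round dialogue pause instruction are all hypothetical, and no real inter-device transport is modeled.

```python
class RecordingBuffer:
    """Holds the audio frames a device has recorded and saved so far."""

    def __init__(self):
        self.frames = []

    def append(self, frame):
        # Recording never stops; new frames are simply appended.
        self.frames.append(frame)

    def discard(self):
        # Delete the saved recording data while recording continues.
        self.frames.clear()


def handle_listening_timeout(answering_buffer, peer_buffers, command_detected):
    """Apply the timeout rule; return True if the dialogue was paused."""
    if command_detected:
        return False
    answering_buffer.discard()
    # Stand-in for invoking the multi-round dialogue pause instruction
    # on each non-answering device (hypothetical direct call).
    for peer in peer_buffers:
        peer.discard()
    return True


speaker, tv = RecordingBuffer(), RecordingBuffer()
speaker.append(b"background noise")
tv.append(b"background noise")
paused = handle_listening_timeout(speaker, [tv], command_detected=False)
speaker.append(b"next frame")  # recording continues after the discard
print(paused, speaker.frames, tv.frames)
```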
  • the first voice instruction here may be the voice instruction before step 701 in the embodiment shown in FIG. 6 below.
  • the second voice instruction here may be the voice instruction of step 703 in the following embodiment shown in FIG. 6 .
  • the method may further include: the first electronic device receiving audio quality information of the recording data of the second electronic device sent by the second electronic device.
  • This implementation can speed up the decision on the optimal audio pickup device, thereby improving the response speed of the voice control.
  • The first electronic device responding to the second voice command according to the recorded data of the first electronic device and/or the recorded data of the second electronic device may include: the first electronic device determines the optimal audio pickup device in the voice control system according to the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device.
  • When the optimal audio pickup device is the first electronic device, the first electronic device responds to the second voice command according to the recording data of the first electronic device, or according to the recording data of the first electronic device and the recording data of the second electronic device.
  • When the optimal audio pickup device is the second electronic device, the first electronic device responds to the second voice command according to the recording data of the second electronic device, or according to the recording data of the second electronic device and the recording data of the first electronic device.
  • the audio quality information is used to indicate the audio quality of the recording data.
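The choice of the optimal audio pickup device can be illustrated with a short sketch. It assumes that the audio quality information is reduced to a single numeric score where higher is better; the `Recording` type, its field names, and the rule that a tie favors the local recording are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    device: str      # name of the device that made the recording
    quality: float   # hypothetical audio quality score, higher is better
    samples: bytes   # recorded audio of the second voice command


def choose_recording(local, remote):
    """Return the recording made by the optimal audio pickup device.

    `local` is the answering device's own recording and `remote` is the
    recording sent by the other device. A tie keeps the local recording.
    """
    return local if local.quality >= remote.quality else remote


own = Recording("speaker", quality=0.35, samples=b"...2222")
peer = Recording("phone", quality=0.90, samples=b"play song 112222")
best = choose_recording(own, peer)
print(best.device)
```

The answering device would then pass `best.samples` to its SE and ASR stages; combining both recordings, which the text also allows, is not shown here.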
  • The first electronic device responding to the second voice command according to the recorded data of the first electronic device and/or the recorded data of the second electronic device may include: the first electronic device responds to the second voice instruction according to the audio content information of the recording data of the first electronic device and/or the audio content information of the recording data of the second electronic device.
  • the audio content information is used to represent the audio content of the recording data.
  • In some cases, the second voice command is responded to according to the audio content information of the recording data of the first electronic device.
  • When the audio content information of the recording data of the first electronic device is less than the audio content information of the recording data of the second electronic device, the second voice command is responded to according to the audio content information of the recording data of the second electronic device.
  • Alternatively, the first electronic device can splice the audio content information of the recording data of the first electronic device with the audio content information of the recording data of the second electronic device, and respond to the second voice command according to the spliced audio content information.
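The two content-based strategies above can be sketched as follows. This sketch assumes that the audio content information is available as text, that "less" content means a shorter string, and that splicing merges the longest overlap between the end of one piece and the start of the other; the function names `pick_richer` and `splice` and these rules are assumptions, since the text does not specify how content is compared or spliced.

```python
def pick_richer(first, second):
    """Use the second device's content only when the first has less of it."""
    return second if len(first) < len(second) else first


def splice(head, tail):
    """Join two partial transcripts, merging their longest overlap.

    The overlap is the longest suffix of `head` that is also a prefix of
    `tail`; if there is none, the two pieces are simply concatenated.
    """
    max_overlap = min(len(head), len(tail))
    for size in range(max_overlap, 0, -1):
        if head.endswith(tail[:size]):
            return head + tail[size:]
    return head + tail


# One device captured only the end of the command, the other all of it.
print(pick_richer("2222", "play song 112222"))
# Two devices each captured a different part, overlapping on "11".
print(splice("play song 11", "112222"))
```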
  • In a second aspect, an embodiment of the present application provides a voice control method, which can be applied to a first electronic device of a voice control system, where the voice control system can also include at least a second electronic device. The voice control method can include: the first electronic device receives the first voice command input by the user, and the first electronic device responds to the first voice command.
  • the first electronic device receives the recording data of the second electronic device sent by the second electronic device, and the recording data of the second electronic device includes the recording data of the second electronic device recording the second voice instruction input by the user.
  • The first electronic device responds to the second voice command according to the recorded data of the first electronic device and/or the recorded data of the second electronic device, and the recorded data of the first electronic device includes recording data of the second voice command input by the user recorded by the first electronic device.
  • the method may further include: the first electronic device invokes a voice pickup instruction to the second electronic device, and the voice pickup instruction is used for the second electronic device to return the recording data of the second electronic device.
  • The method may further include: when or after the first electronic device receives the first voice instruction input by the user, the first electronic device records, and the recording is used to record the second voice instruction input by the user.
  • the first voice command is used to wake up the voice control function of the first electronic device and/or the second electronic device.
  • The method may further include: the first electronic device determines, according to the audio quality information of the first voice command received by the first electronic device and the audio quality information of the first voice command received by the second electronic device, that the first electronic device is an answering device of the voice control system.
  • The method may further include: during the recording process of the first electronic device, if the first electronic device does not detect the second voice command input by the user within the preset time period, the first electronic device deletes the saved recording data and continues to record; the first electronic device invokes a multi-round dialogue pause instruction to the second electronic device, where the multi-round dialogue pause instruction is used to instruct the multi-round dialogue to temporarily stop; and the second electronic device deletes the saved recording data and continues recording.
  • the method may further include: the first electronic device receiving audio quality information of the recording data of the second electronic device sent by the second electronic device.
  • The first electronic device responding to the second voice command according to the recorded data of the first electronic device and/or the recorded data of the second electronic device may include: the first electronic device determines the optimal audio pickup device in the voice control system according to the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device.
  • When the optimal audio pickup device is the first electronic device, the first electronic device responds to the second voice command according to the recording data of the first electronic device.
  • When the optimal audio pickup device is the second electronic device, the first electronic device responds to the second voice command according to the recording data of the second electronic device, or according to the recording data of the second electronic device and the recording data of the first electronic device.
  • the audio quality information is used to indicate the audio quality of the recording data.
  • The first electronic device responding to the second voice command according to the recorded data of the first electronic device and/or the recorded data of the second electronic device may include: the first electronic device responds to the second voice instruction according to the audio content information of the recording data of the first electronic device and/or the audio content information of the recording data of the second electronic device.
  • the audio content information is used to represent the audio content of the recording data.
  • In a third aspect, an embodiment of the present application provides a voice control method, which can be applied to a second electronic device of a voice control system, where the voice control system can also include at least a first electronic device. The voice control method can include:
  • the second electronic device records and saves the recording data, and the recording is used to record the second voice command input by the user.
  • The second electronic device sends the recording data of the second electronic device to the first electronic device, where the recording data of the second electronic device includes recording data of the second voice command input by the user recorded by the second electronic device, and the recording data is used by the first electronic device to respond to the second voice instruction after responding to the first voice instruction.
  • the method may further include: the second electronic device receives a voice pickup instruction called by the first electronic device, and the voice pickup instruction is used for the second electronic device to return the recording data of the second electronic device.
  • the recording by the second electronic device may include: recording by the second electronic device when or after the second electronic device receives the first voice instruction input by the user.
  • The method may further include: the second electronic device determines, according to the audio quality information of the first voice command received by the second electronic device and the audio quality information of the first voice command received by the first electronic device, that the first electronic device is an answering device of the voice control system.
  • The method may further include: during the recording process of the second electronic device, the second electronic device receives a multi-round dialogue pause instruction invoked by the first electronic device, where the multi-round dialogue pause instruction is used to instruct the multi-round dialogue to temporarily stop; and the second electronic device deletes the saved recording data and continues recording.
  • the method may further include: the second electronic device sends audio quality information of the recording data of the second electronic device to the first electronic device.
  • an embodiment of the present application provides a voice control device, the device has the function of implementing the second aspect or any possible design of the second aspect.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions, for example, a transceiver unit or module, and a processing unit or module.
  • an embodiment of the present application provides a voice control device, the device has a function of implementing the third aspect or any possible design of the third aspect.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions, for example, a transceiver unit or module, and a processing unit or module.
  • an embodiment of the present application provides an electronic device, which may include: one or more processors; one or more memories; wherein the one or more memories are used to store one or more programs; the one or more processors are configured to run the one or more programs to implement the method according to the second aspect or any possible design of the second aspect.
  • an embodiment of the present application provides an electronic device, which may include: one or more processors; one or more memories; wherein the one or more memories are used to store one or more programs; the one or more processors are configured to run the one or more programs to implement the method according to the third aspect or any possible design of the third aspect.
  • an embodiment of the present application provides a computer-readable storage medium, which is characterized in that it includes a computer program, and when the computer program is executed on a computer, causes the computer to execute the method described in the second aspect or any possible design of the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium, which is characterized in that it includes a computer program, and when the computer program is executed on a computer, causes the computer to execute the method described in the third aspect or any possible design of the third aspect.
  • an embodiment of the present application provides a chip, which is characterized in that it includes a processor and a memory, the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, to perform the method described in the second aspect or any possible design of the second aspect.
  • an embodiment of the present application provides a chip, characterized in that it includes a processor and a memory, the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, to perform the method described in the third aspect or any possible design of the third aspect.
  • embodiments of the present application provide a computer program product, which, when the computer program product runs on a computer, causes the computer to execute the method described in the second aspect or any possible design of the second aspect.
  • the embodiments of the present application provide a computer program product, which, when the computer program product runs on a computer, causes the computer to execute the method described in the third aspect or any possible design of the third aspect.
  • an embodiment of the present application provides a voice control system, where the voice control system includes at least a first electronic device and a second electronic device having a voice control function.
  • the first electronic device is adapted to perform the method as described in the second aspect or any possible design of the second aspect.
  • the second electronic device is configured to perform the method as described in the third aspect or any possible design of the third aspect.
  • The voice control method and electronic device of the embodiments of the present application solve the frame loss problem of voice control in multi-device scenarios by having multiple devices record directly, without first performing cross-device communication, which improves the accuracy of voice control. Afterwards, responding to the voice command input by the user based on the recording data collected collaboratively by multiple devices effectively mitigates the effect that the audio quality of the voice command picked up by an electronic device has on ASR recognition accuracy, further improving the accuracy of voice control.
  • FIG. 1 is a schematic diagram of a voice control system provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a voice control method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a scenario of multi-device voice control provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of another multi-device voice control scenario provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another voice control method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another multi-device voice control scenario provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a voice control device according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a voice control apparatus provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • "At least one (item)" refers to one or more, and "a plurality" refers to two or more.
  • "And/or" is used to describe the relationship between related objects, indicating that there can be three kinds of relationships. For example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • The character "/" generally indicates that the associated objects are in an "or" relationship.
  • "At least one of the following item(s)" or similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
  • Voice assistant: An application program built on artificial intelligence that, with the help of speech semantic recognition algorithms and through instant question-and-answer voice interaction with users, helps users complete operations such as information query, device control, and text input.
  • Voice assistants usually use staged cascade processing and provide service functions through basic workflows performed in sequence: voice wake-up, voice front-end processing, automatic speech recognition (ASR), natural language understanding (NLU), dialogue management (DM), natural language generation (NLG), and text-to-speech (TTS).
  • The voice front-end processing may include, but is not limited to, speech enhancement (SE).
  • ASR can take the speech signal processed by SE noise reduction as input, and output the textual description result of the user's speech signal.
  • ASR is the basis for voice assistant applications to accurately complete subsequent recognition processing tasks.
  • the audio quality of the user's voice signal input to the ASR directly determines the accuracy of the ASR recognition result.
  • the voice control method of the embodiment of the present application can ensure the accuracy and reliability of the user voice signal input to the ASR, thereby improving the accuracy of the ASR recognition result, and then accurately completing the subsequent recognition processing task.
  • Voice wake-up: when the screen is locked or the voice assistant is dormant, the electronic device receives and detects a specific user voice signal (i.e., the wake-up word), activates or starts the voice assistant, and puts the voice assistant into a state of waiting for voice signal input.
  • AEC Acoustic echo cancellation
  • Answering device: in the multi-device voice control process, multiple electronic devices select an answering device through mutual communication and negotiation, and the answering device recognizes and responds to the user's voice signal.
  • Audio quality: due to the diversity and complexity of usage scenarios, user voice commands picked up and processed by electronic devices are inevitably disturbed by various external and internal noises.
  • The interference of noise affects the audio quality of the user's voice command picked up by the electronic device.
  • The external noise can be, for example, noise from air conditioner fans and unrelated human voices around the device, and the internal noise can be the audio/video played by the electronic device itself.
  • the distance and orientation between the electronic device and the user, as well as the posture of the electronic device and the performance of the microphone module, etc., will also affect the audio quality of the user's voice commands picked up by the electronic device.
  • When the audio quality of the user's voice command picked up by the electronic device is poor, misrecognition will occur.
  • In addition, the communication delay caused by cross-device communication between the multiple electronic devices, together with the delay caused by selecting the answering device, causes a frame loss problem, which leads to misrecognition.
  • For example, because of the above delay, the user says the voice signal "play song 112222" but the answering device recognizes only the voice signal "2222"; that is, the voice signal "play song 11" is not received and recognized, so the answering device cannot accurately recognize and respond to the user's voice command.
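  • The frame loss example above can be reproduced with a toy model. The sketch assumes, purely for illustration, that each character of the utterance occupies one audio frame and that a device only captures frames arriving after it starts recording; `recognized_text` and the delay value are hypothetical.

```python
def recognized_text(utterance: str, delay_frames: int) -> str:
    """Return what is captured when recording starts `delay_frames` late.

    Toy model: one character of the utterance equals one audio frame.
    """
    return utterance[delay_frames:]


utterance = "play song 112222"

# Recording starts immediately: nothing is lost.
on_time = recognized_text(utterance, 0)

# Recording starts 12 frames late (hypothetical delay caused by
# cross-device communication and answering-device selection).
late = recognized_text(utterance, 12)

print(repr(on_time))
print(repr(late))
```

  • With a delay of 12 frames the leading "play song 11" is dropped and only "2222" remains, matching the misrecognition described in the text.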
  • the voice control method of the embodiment of the present application can improve the audio quality and/or reduce the time delay, so as to solve the problem of misrecognition of voice commands in the process of multi-device voice control.
  • The delay caused by implementing multi-device wake-up and data transmission through communication is eliminated, which removes the impact of the delay on ASR recognition accuracy, solves the frame loss problem of voice control in the multi-device scenario, and improves the accuracy of voice control.
  • The voice command input by the user is responded to based on the recording data of the optimal sound-receiving device.
  • In this way, the impact of the audio quality of voice commands picked up by electronic devices on ASR recognition accuracy can be addressed, and the accuracy of voice control can be improved.
  • the voice control method in the embodiment of the present application can be applied to a multi-device scenario.
  • the multi-device scenario may include a scenario where a user uses multiple electronic devices concurrently, or a scenario where user voice interaction occurs within the effective working range of multiple electronic devices.
  • each of the plurality of electronic devices has a voice control function.
  • This voice control function may be provided by a voice assistant.
  • The method of this embodiment can ensure the accuracy and reliability of the voice command input to the ASR, thereby improving the accuracy of the ASR recognition result, so that the subsequent recognition and processing tasks and the response to the voice command are completed accurately. This makes the electronic device more intelligent, realizes efficient and accurate interaction between the electronic device and the user, and improves the user experience.
  • the voice command in the embodiment of the present application refers to the command input by the user to the electronic device in the form of sound.
  • the voice command is used to enable the electronic device to provide the user with service functions such as interactive dialogue, information query, and device control.
  • the voice instruction may be a piece of voice signal input by the user through the microphone of the electronic device.
  • a voice assistant may be installed in the electronic device to enable the electronic device to implement a voice control function.
  • Voice assistants are generally dormant. The user can voice wake up the voice assistant before using the voice control function of the electronic device.
  • the voice signal to wake up the voice assistant may be called a wake-up word (or wake-up voice).
  • the wake word may be pre-registered in the electronic device.
  • the wake-up word may be "small E, small E".
  • the wake-up word may also be any other word or statement, which can be flexibly set according to requirements, and the embodiments of the present application will not illustrate them one by one.
  • the above-mentioned voice assistant may be an embedded application in the electronic device (ie, a system application of the electronic device), or may be a downloadable application.
  • Embedded applications are applications provided as part of the implementation of an electronic device such as a cell phone.
  • a downloadable application is an application that can provide its own internet protocol multimedia subsystem (IMS) connection.
  • the downloadable application may be pre-installed in the electronic device, or may be a third-party application downloaded by the user and installed in the electronic device.
  • FIG. 1 is a schematic diagram of a voice control system according to an embodiment of the present application.
  • The voice control system may include multiple electronic devices that meet one or more of the following conditions: they are connected to the same wireless access point (such as a WiFi access point), they are logged into the same account, they are set by the user in the same group, or the user's voice interaction occurs within the effective working range of the multiple electronic devices.
  • the voice control system may include three electronic devices, for example, a first electronic device 201 , a second electronic device 202 and a third electronic device 203 .
  • the first electronic device 201 , the second electronic device 202 and the third electronic device 203 all have a voice control function, for example, a voice assistant is installed.
  • The first electronic device 201, the second electronic device 202, and the third electronic device 203 can wake up the voice assistant with the same wake-up word, for example, "small E, small E".
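  • The conditions under which electronic devices belong to the same voice control system (same wireless access point, same account, same user-defined group, or the user's voice interaction occurring within the working range of the devices) can be sketched as a simple predicate. The `Device` fields and the `hears_user` flag are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical description of an electronic device."""
    name: str
    access_point: Optional[str] = None  # wireless access point in use
    account: Optional[str] = None       # logged-in account
    group: Optional[str] = None         # group set by the user
    hears_user: bool = False            # user speaks within working range


def _same(a: Optional[str], b: Optional[str]) -> bool:
    """True only when both values are set and equal."""
    return a is not None and a == b


def in_same_system(a: Device, b: Device) -> bool:
    """True if the two devices meet at least one of the conditions."""
    return (
        _same(a.access_point, b.access_point)
        or _same(a.account, b.account)
        or _same(a.group, b.group)
        or (a.hears_user and b.hears_user)
    )


phone = Device("phone", access_point="home-ap")
speaker = Device("speaker", access_point="home-ap")
tv = Device("tv", account="user-1")
watch = Device("watch", access_point="office-ap", account="user-1")

print(in_same_system(phone, speaker))
print(in_same_system(phone, tv))
```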
  • The electronic devices described in the embodiments of the present application may be mobile phones, tablet computers, desktop computers, laptops, handheld computers, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, cellular phones, personal digital assistants (PDAs), augmented reality (AR)/virtual reality (VR) devices, media players, TVs, smart speakers, smart watches, smart headphones, and other devices.
  • the specific form of the electronic device is not particularly limited in the embodiments of the present application.
  • The first electronic device 201, the second electronic device 202, and the third electronic device 203 can be the same type of electronic device; for example, they are all mobile phones.
  • the first electronic device 201 , the second electronic device 202 and the third electronic device 203 can be different types of electronic devices, for example, the first electronic device 201 is a mobile phone, and the second electronic device 202 is a smart speaker , the third electronic device 203 is a television (as shown in FIG. 1 ).
  • The first electronic device 201, the second electronic device 202, and the third electronic device 203 each start recording directly, without cross-device communication, which solves the frame loss problem of voice control in a multi-device scenario and improves the accuracy of voice control.
  • The first electronic device 201, the second electronic device 202, and the third electronic device 203 can each record on their own without being invoked by another device (e.g., a central device), which realizes a decentralized recording method.
  • This decentralized recording method does not require selecting a device to act as the calling device, which effectively eliminates the delay caused by communication between devices and improves the accuracy of subsequent voice control.
  • One or more electronic devices are selected as the optimal sound-receiving device, and the voice command input by the user is responded to based on the recording data of the optimal sound-receiving device.
  • the embodiment of the present application can solve the problem of the influence of the audio quality of the voice command picked up by the electronic device on the accuracy of the ASR recognition by means of multi-device cooperative audio collection.
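  • Selecting one or more optimal sound-receiving devices from cooperatively collected audio could look like the sketch below. This passage does not specify the quality metric, so the per-device quality scores and the function `select_optimal_devices` are assumptions made for illustration only.

```python
def select_optimal_devices(quality_by_device, count=1):
    """Return the `count` devices with the highest quality score.

    `quality_by_device` maps a device name to a hypothetical audio
    quality score, where a higher score means better recording data.
    """
    ranked = sorted(quality_by_device, key=quality_by_device.get, reverse=True)
    return ranked[:count]


# Hypothetical scores for the three devices of FIG. 1.
scores = {"phone": 0.62, "speaker": 0.91, "tv": 0.35}

best_one = select_optimal_devices(scores)
best_two = select_optimal_devices(scores, count=2)
print(best_one, best_two)
```

  • The recording data of the selected device or devices would then be the input passed on for recognition.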
  • the voice control system may also include a server 204 .
  • the server 204 can provide intelligent voice services.
  • FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • The electronic device may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device.
  • the electronic device may include more or fewer components than shown, or some components may be combined, or some components may be split, or a different arrangement of components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices, or may be integrated in one or more processors.
  • a controller can be the nerve center and command center of an electronic device.
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • The memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs the instruction or data again, it can fetch it directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thereby increases the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device. While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • The wireless communication function of the electronic device can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in an electronic device can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G etc. applied on the electronic device.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • The mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • The wireless communication module 160 can provide wireless communication solutions applied on the electronic device, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify the signal, and convert it into electromagnetic waves for radiation through the antenna 2 .
  • the wireless communication module 160 may interact with other electronic devices, for example, after detecting a voice signal matching the wake-up word, send energy information of the detected voice signal to other electronic devices.
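  • The text mentions sending energy information of the detected voice signal to other electronic devices but does not define how the energy is computed. A common choice, assumed here purely for illustration, is the mean squared amplitude of the samples; `frame_energy`, `energy_report`, and the message layout are hypothetical.

```python
def frame_energy(samples):
    """Mean squared amplitude of a sequence of audio samples."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def energy_report(device_id, samples):
    """Build a hypothetical message carrying the energy information."""
    return {"device": device_id, "energy": frame_energy(samples)}


report = energy_report("device-201", [0.5, -0.5, 0.5, -0.5])
print(report)
```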
  • the electronic device in this embodiment of the present application may communicate with other electronic devices through the mobile communication module 150 and/or the wireless communication module 160 .
  • For example, the first electronic device 201 sends a voice pickup instruction and the like to the second electronic device 202 through the mobile communication module 150 and/or the wireless communication module 160.
  • the antenna 1 of the electronic device is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device can communicate with the network and other devices through wireless communication technology.
  • The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
  • the electronic device realizes the display function through the GPU, the display screen 194, and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and so on.
  • the electronic device may include 1 or N display screens 194 , where N is a positive integer greater than 1.
  • the electronic device can realize the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194 and the application processor.
  • the ISP is used to process the data fed back by the camera 193 .
  • When a picture is taken, the shutter opens, light is transmitted through the lens to the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital video.
  • An electronic device may support one or more video codecs.
  • The electronic device can play or record videos in various encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Through the NPU, applications such as intelligent cognition of the electronic device can be implemented, for example image recognition, face recognition, speech recognition, and text understanding.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos on the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area can store data (such as audio data, phone book, etc.) created during the use of the electronic device.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • The electronic device can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • The user can listen to music or a hands-free call through the speaker 170A of the electronic device.
  • The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • The voice can be heard by placing the receiver 170B close to the human ear.
  • The microphone 170C, also called a "mic" or "sound transmitter", is used to convert sound signals into electrical signals.
  • The user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
  • the electronic device may be provided with at least one microphone 170C.
  • the electronic device may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals.
  • the electronic device may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the electronic device in this embodiment of the present application may receive a voice instruction input by the user through the microphone 170C.
  • The earphone interface 170D is used to connect wired earphones.
  • The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the pressure sensor 180A may be provided on the display screen 194 .
  • the capacitive pressure sensor may be comprised of at least two parallel plates of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device determines the intensity of the pressure based on the change in capacitance. When a touch operation acts on the display screen 194, the electronic device detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device can also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, the instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, the instruction to create a new short message is executed.
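  • The short message example above, where the same touch position maps to different instructions depending on touch intensity, can be sketched as follows. The threshold value and the instruction names are hypothetical and chosen only for illustration.

```python
# Hypothetical normalized value for the "first pressure threshold".
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_instruction(intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map a touch on the short message icon to an instruction.

    Below the threshold: view the short message.
    At or above the threshold: create a new short message.
    """
    if intensity < threshold:
        return "view_short_message"
    return "create_short_message"


print(sms_icon_instruction(0.2))
print(sms_icon_instruction(0.9))
```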
  • the gyro sensor 180B can be used to determine the motion attitude of the electronic device. In some embodiments, the angular velocity of the electronic device about three axes (ie, the x, y, and z axes) may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor 180B detects the shaking angle of the electronic device, calculates the distance to be compensated by the lens module according to the angle, and allows the lens to counteract the shaking of the electronic device through reverse motion to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenarios.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device calculates the altitude from the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
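  • The text does not say how altitude is derived from air pressure. One widely used approximation, assumed here for illustration, is the international barometric formula with a standard sea-level pressure of 1013.25 hPa; the function name `altitude_m` is hypothetical.

```python
SEA_LEVEL_HPA = 1013.25  # standard sea-level pressure (assumed reference)


def altitude_m(pressure_hpa, sea_level_hpa=SEA_LEVEL_HPA):
    """Approximate altitude in meters from air pressure in hPa.

    Uses the international barometric formula:
        h = 44330 * (1 - (p / p0) ** (1 / 5.255))
    """
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


print(round(altitude_m(1013.25), 1))
print(round(altitude_m(900.0), 1))
```

  • Lower measured pressure yields a higher estimated altitude; a real device would typically calibrate the sea-level reference to local weather conditions.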
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device can use the magnetic sensor 180D to detect the opening and closing of the flip holster.
  • The electronic device can detect the opening and closing of a flip cover by using the magnetic sensor 180D. Further, features such as automatic unlocking upon flip opening are set according to the detected opening and closing state of the holster or of the flip cover.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the electronic device is stationary. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • The distance sensor 180F is used to measure distance.
  • Electronic devices can measure distances by infrared or laser. In some embodiments, when shooting a scene, the electronic device can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the electronic device emits infrared light outward through the light emitting diode.
  • the electronic device uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device can determine that there is an object nearby. When insufficient reflected light is detected, the electronic device can determine that there is no object nearby.
  • the electronic device can use the proximity light sensor 180G to detect that the user is holding the electronic device close to the ear during a call, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • the electronic device can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device is in the pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints. The electronic device can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, fingerprint-triggered photographing, fingerprint-based answering of incoming calls, and the like.
  • the temperature sensor 180J is used to detect the temperature.
  • the electronic device utilizes the temperature detected by the temperature sensor 180J to implement a temperature handling strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device may reduce the performance of the processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection.
  • when the temperature is lower than another threshold, the electronic device heats the battery 142 to avoid abnormal shutdown of the electronic device caused by the low temperature.
  • the electronic device boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
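  • As a non-limiting illustration, the temperature handling strategy described above can be sketched as a threshold check. The threshold values, the action names and the use of a third, lower threshold for boosting the battery output voltage are hypothetical; the embodiment only states that thresholds exist.

```python
from typing import List


def thermal_actions(temp_c: float,
                    high_c: float = 45.0,       # hypothetical over-temperature threshold
                    low_c: float = 0.0,         # hypothetical "heat the battery" threshold
                    very_low_c: float = -10.0   # hypothetical "boost voltage" threshold
                    ) -> List[str]:
    """Return the protective actions suggested by the reported temperature."""
    actions: List[str] = []
    if temp_c > high_c:
        # Reduce processor performance to cut power consumption (thermal protection).
        actions.append("throttle_processor")
    if temp_c < low_c:
        # Heat the battery to avoid an abnormal shutdown at low temperature.
        actions.append("heat_battery")
    if temp_c < very_low_c:
        # Boost the battery output voltage to avoid an abnormal shutdown.
        actions.append("boost_battery_voltage")
    return actions
```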
  • the touch sensor 180K is also called a “touch panel”.
  • the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 together form a touchscreen, also called a “touch-controlled screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device, which is different from the location where the display screen 194 is located.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the bone that vibrates when a person speaks.
  • the bone conduction sensor 180M can also be in contact with the human pulse and receive the blood pressure pulsation signal.
  • the bone conduction sensor 180M can also be disposed in an earphone to form a bone conduction earphone.
  • the audio module 170 can parse out a voice signal based on the vibration signal of the vocal vibrating bone acquired by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor can parse out heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to realize the heart rate detection function.
  • the keys 190 include a power-on key, a volume key, and the like. Keys 190 may be mechanical keys. It can also be a touch key.
  • the electronic device may receive key input and generate key signal input related to user settings and function control of the electronic device.
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • different application scenarios (for example, time reminders, receiving information, alarm clocks, and games) may also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device.
  • the electronic device can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device interacts with the network through the SIM card to realize functions such as call and data communication.
  • the electronic device employs an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device and cannot be separated from the electronic device.
  • recording is directly started without cross-device communication between multiple devices, so as to solve the problem of frame loss of voice control in the multi-device scenario, and improve the accuracy of voice control.
  • one or more electronic devices are selected from the multiple electronic devices as the optimal sound pickup device, and the voice command input by the user is responded to based on the recording data of the optimal sound pickup device. In selecting the optimal sound pickup device, an electronic device that satisfies at least one of the following is chosen as the voice pickup entrance called by the voice assistant: the clearest pickup (closest to the user), the least noise interference (farthest from the noise source), or the best SE processing effect (the best microphone noise reduction performance, or support for AEC). This can effectively address the influence that the audio quality of the voice commands picked up by the electronic device has on ASR recognition accuracy.
  • the device information may include, but is not limited to, static attribute information or dynamic attribute information of the electronic device.
  • the static attribute information may include, but is not limited to, device model, system version, microphone capability information, and the like.
  • the dynamic attribute information may include, but is not limited to, power information of the electronic device, headphone status information, microphone status information, speaker status information, audio quality information of the recording data, and the like.
  • the speaker status information may be used to indicate whether the speaker of the electronic device is occupied.
  • the audio quality information is used to indicate whether the audio quality of the recorded data is good or bad.
  • the specific form of the audio quality information may include one or more items such as sound intensity information, noise sound intensity information, and signal-to-noise ratio information.
  • FIG. 3 is a schematic flowchart of a voice control method according to an embodiment of the present application. This embodiment is illustrated by taking the three electronic devices shown in FIG. 1 , a speaker 201 , a TV 202 and a mobile phone 203 as examples. As shown in FIG. 3, the method of this embodiment may include:
  • Step 401 the speaker 201 , the television 202 and the mobile phone 203 respectively receive the first voice instruction input by the user.
  • the first voice instruction is used to wake up the voice assistant of the electronic device.
  • the first voice instruction may be the above-mentioned wake-up word "small E small E".
  • the first voice command is used to wake up the respective voice assistants of the speaker 201 , the television 202 and the mobile phone 203 .
  • the electronic device can monitor in real time, through its microphone, whether the user inputs a voice signal.
  • when a user wants to use the voice control function of the electronic device, he or she can make a sound within the sound pickup range of the electronic device, so that the sound is input into the microphone.
  • the electronic device can monitor the corresponding voice signal, such as the first voice command, through the microphone.
  • for example, when the user wants to use the voice control function, he or she can say the wake-up word "small E, small E". If the user's speaking position is within the respective pickup ranges of the speaker 201, the TV 202 and the mobile phone 203, and no other software or hardware is using the microphones to collect voice signals, the speaker 201, the TV 202 and the mobile phone 203 can each detect, through its own microphone, the first voice instruction corresponding to the wake-up word "small E small E".
  • Step 402 in response to the first voice command, the speaker 201 , the TV 202 and the mobile phone 203 wake up their respective voice assistants and start recording.
  • when the electronic device detects the first voice command, the electronic device wakes up the voice assistant in response to the first voice command.
  • the first voice command can first be verified, that is, it is determined whether the received first voice command is a wake-up word registered in the electronic device. If the verification passes, the received first voice command is a wake-up word and the voice assistant is woken up. If the verification fails, the received first voice command is not a wake-up word, and the electronic device may leave the voice assistant in a dormant state.
  • when the speaker 201, the TV 202 and the mobile phone 203 respectively detect the first voice command, they wake up their respective voice assistants and start recording. After starting to record, each of them can detect through its own microphone whether the user inputs other voice commands and, when such a command is detected, generate recording data and save it locally.
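  • As a non-limiting illustration, the wake-up verification and the immediate start of local recording in step 402 can be sketched as follows. Utterances are modelled as already-transcribed text purely for brevity (a real device would match the wake-up word acoustically), and the class name `VoiceDevice` and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


def _normalise(text: str) -> str:
    # Lower-case, drop commas and collapse whitespace so that
    # "small E, small E" and "small E small E" compare equal.
    return " ".join(text.lower().replace(",", " ").split())


@dataclass
class VoiceDevice:
    name: str
    wake_word: str = "small E small E"   # wake-up word registered on the device
    awake: bool = False                  # voice assistant state (dormant by default)
    recording: bool = False
    recordings: List[str] = field(default_factory=list)

    def on_voice(self, utterance: str) -> None:
        if not self.awake:
            # Verification: only a registered wake-up word wakes the assistant.
            if _normalise(utterance) == _normalise(self.wake_word):
                self.awake = True
                self.recording = True    # recording starts without waiting for other devices
            return
        if self.recording:
            # Later commands are recorded and saved on the device itself.
            self.recordings.append(utterance)
```

  • In this sketch each device decides on its own to start recording, which mirrors the point that no cross-device call is needed before recording begins.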
  • after the speaker 201, the television set 202 and the mobile phone 203 start recording, they respectively receive the second voice instruction input by the user. For example, the second voice command spoken by the user is "play song 112222".
  • the speaker 201, the TV 202 and the mobile phone 203 respectively record the second voice command to generate their own recording data, and the content of the recording data is "play song 112222".
  • recording data may be generated once every 0.5 s of recording.
  • the interval of 0.5 s may also be another value, for example, 0.6 s or 1 s, which are not described one by one in the embodiments of the present application.
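  • As a non-limiting illustration, generating recording data at a fixed interval can be sketched as splitting a stream of audio samples into fixed-length segments. The function name, the sample representation and the 16 kHz sample rate used in the usage note are assumptions for this sketch only.

```python
from typing import List, Sequence


def split_into_segments(samples: Sequence[int],
                        sample_rate: int,
                        segment_seconds: float = 0.5) -> List[List[int]]:
    """Cut a sample stream into consecutive segments of `segment_seconds` each.

    The last segment may be shorter if the stream does not divide evenly.
    """
    seg_len = max(1, int(sample_rate * segment_seconds))
    return [list(samples[i:i + seg_len]) for i in range(0, len(samples), seg_len)]
```

  • For instance, at an assumed sample rate of 16 kHz, each 0.5 s segment holds 8000 samples; changing `segment_seconds` to 0.6 or 1 corresponds to the other interval values mentioned above.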
  • the electronic device may further determine audio quality information corresponding to the recorded data according to the recorded data. In other words, the electronic device also evaluates the quality of its own recording data.
  • the audio quality information may include one or more items of sound intensity information, noise sound intensity information, and signal-to-noise ratio information.
  • the speaker 201 , the TV 202 and the mobile phone 203 can respectively perform quality evaluation on the respective recording data, and determine the audio quality information corresponding to the respective recording data.
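  • As a non-limiting illustration, the three items of audio quality information (sound intensity, noise sound intensity and signal-to-noise ratio) can be sketched as root-mean-square levels and their ratio in decibels. How a device separates speech samples from noise samples is not specified in the embodiment; here the two are simply passed in as separate sequences, and all names are hypothetical.

```python
import math
from typing import Dict, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a sample sequence (0.0 for an empty sequence)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def audio_quality(speech: Sequence[float], noise: Sequence[float]) -> Dict[str, float]:
    """Compute sound intensity, noise intensity and SNR (dB) for one recording."""
    sound = rms(speech)
    noise_level = rms(noise)
    if sound == 0.0:
        snr_db = float("-inf")       # no speech energy at all
    elif noise_level == 0.0:
        snr_db = float("inf")        # no measurable noise
    else:
        snr_db = 20.0 * math.log10(sound / noise_level)
    return {"sound_intensity": sound,
            "noise_intensity": noise_level,
            "snr_db": snr_db}
```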
  • Step 403 the speaker 201 , the TV 202 and the mobile phone 203 respectively execute the selection of the answering device, determine the answering device, and the answering device plays the answering voice corresponding to the first voice command.
  • the execution order of step 402 and step 403 is not limited by their sequence numbers, and other execution orders may also be used. For example, the selection of the answering device is performed at the same time as recording is started.
  • the answering device in this embodiment is used to play the answering voice corresponding to the voice command input by the user.
  • the answering device plays the answering voice corresponding to the first voice command, that is, the wake-up answering voice, such as "I'm here". The other electronic devices, which are not used as answering devices, wake up their voice assistants but do not play the answering voice corresponding to the voice command input by the user.
  • the electronic device may determine the answering device based on the audio quality information corresponding to the first voice command.
  • the electronic device can evaluate the quality of the received first voice command, determine the audio quality information corresponding to the first voice command received by itself, and broadcast this audio quality information together with its own device information.
  • the electronic device also receives the audio quality information, corresponding to the first voice command as received by each of the other electronic devices, and the device information that those devices broadcast.
  • the electronic device selects one electronic device as the answering device according to the audio quality information and device information of all the electronic devices. For example, choose the electronic device with the best audio quality as the answering device.
  • when the speaker 201 detects the first voice command, the speaker 201 can also evaluate the quality of the first voice command, determine the audio quality information corresponding to the first voice command received by the speaker 201, and broadcast this audio quality information and the device information of the speaker 201. Similarly, when the TV set 202 detects the first voice command, the TV set 202 can also evaluate the quality of the first voice command, determine the audio quality information corresponding to the first voice command received by the TV set 202, and broadcast this audio quality information and the device information of the TV set 202.
  • likewise, when the mobile phone 203 detects the first voice command, the mobile phone 203 can also evaluate the quality of the first voice command, determine the audio quality information corresponding to the first voice command received by the mobile phone 203, and broadcast this audio quality information and the device information of the mobile phone 203.
  • the speaker 201 can receive the audio quality information corresponding to the first voice command and the device information broadcast by the TV 202 and the mobile phone 203, and, according to the audio quality information and device information of the speaker 201, the TV 202 and the mobile phone 203, select one electronic device from among them as the answering device.
  • the TV 202 can receive the audio quality information corresponding to the first voice command and the device information broadcast by the speaker 201 and the mobile phone 203, and, according to the audio quality information and device information of the speaker 201, the TV 202 and the mobile phone 203, select one electronic device from among them as the answering device.
  • the mobile phone 203 can receive the audio quality information corresponding to the first voice command and the device information broadcast by the speaker 201 and the TV 202, and, according to the audio quality information and device information of the speaker 201, the TV 202 and the mobile phone 203, select one electronic device from among them as the answering device.
  • as an exemplary illustration, the speaker 201, the television 202 and the mobile phone 203 all determine that the speaker 201 is the answering device.
  • the speaker 201 acts as an answering device and plays a wake-up answering voice, such as "I am here".
  • the TV set 202 and the mobile phone 203 do not play the wake-up response voice, but the voice assistants of the TV set 202 and the mobile phone 203 are in the wake-up state as described in step 402 above, and can record.
  • the answering device may also be selected in combination with other information, such as the priority of each electronic device.
  • the specific implementation manner of performing the selection of the answering device may also adopt other manners, and this embodiment of the present application does not limit the foregoing manner.
  • the answering device used in the user's most recent session, or an answering device set by the user, may be used as the answering device in this embodiment.
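  • As a non-limiting illustration, the selection of the answering device can be sketched as every device applying the same deterministic rule to the same set of broadcast reports, so that all devices arrive at the same result without a central coordinator. The report fields, the use of signal-to-noise ratio as the quality measure, the priority field and the tie-breaking by device identifier are assumptions for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class WakeReport:
    device_id: str       # hypothetical identifier taken from the device information
    snr_db: float        # audio quality of the first voice command as heard by the device
    priority: int = 0    # hypothetical device priority (higher wins on equal quality)


def select_answering_device(reports: Iterable[WakeReport]) -> str:
    """Pick the answering device; identical inputs give identical results on every device."""
    best = max(reports, key=lambda r: (r.snr_db, r.priority, r.device_id))
    return best.device_id
```

  • Because the ranking key is fully determined by the reports, the order in which a device happens to receive the broadcasts does not change the outcome.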
  • Step 404 the speaker 201 issues a voice pickup instruction to the TV set 202 and the mobile phone 203 respectively, and the voice pickup instruction is used to instruct the recipient to return its recording data.
  • the speaker 201 starts to perform the distributed sound collection task.
  • the answering device can issue the voice pickup instruction to each of the other non-answering devices, and the voice pickup instruction is used to instruct the non-answering device to return its recording data to the answering device.
  • the voice assistant of the speaker 201 can call the interface between the voice assistant of the television 202 and the voice assistant of the speaker 201 to transmit the voice pickup instruction to the television 202 .
  • the voice assistant of the speaker 201 can call the interface between the voice assistant of the mobile phone 203 and the voice assistant of the speaker 201 to transmit the voice pickup instruction to the mobile phone 203 .
  • the pickup instruction may carry the identification information of the answering device.
  • the identification information of the answering device may be a media access control (media access control, MAC) address of the answering device.
  • the voice pickup instruction may carry the identification information of the speaker 201 to instruct the television 202 to return the recording data to the speaker 201 .
  • Step 405 the television 202 and the mobile phone 203 respectively send the recording data to the speaker 201 .
  • the answering device receives recorded data sent by other non-answering devices. After other non-answering devices send their own recording data, they can continue recording and send new recording data to the answering device.
  • the television 202 sends the audio recording data of the television 202 to the speaker 201 .
  • the mobile phone 203 sends the recording data of the mobile phone 203 to the speaker 201 .
  • the recorded data may include the above-mentioned second voice instruction.
  • the content of the recording data is "play song 112222".
  • the speaker 201 performs quality evaluation on the received recording data of the TV set 202 , and determines the audio quality information corresponding to the recording data of the TV set 202 .
  • the speaker 201 performs quality evaluation on the received recording data of the mobile phone 203 , and determines the audio quality information corresponding to the recording data of the mobile phone 203 .
  • the speaker 201 may also receive audio quality information corresponding to the recording data of the television set 202 sent by the television set 202 .
  • the speaker 201 can also receive audio quality information corresponding to the recording data of the mobile phone 203 sent by the mobile phone 203 .
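  • As a non-limiting illustration, the exchange in steps 404 and 405 can be sketched as a voice pickup instruction that carries the identification information (here a MAC address) of the answering device, and a reply in which a non-answering device addresses its locally saved recording data to that device. The message classes, field names and the example MAC address are hypothetical, and no real transport is modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PickupInstruction:
    answering_device_mac: str      # identification information of the answering device


@dataclass
class RecordingReply:
    sender: str                    # non-answering device returning its data
    destination_mac: str           # where the recording data should be sent
    recording: List[str]           # recording data saved on the sender


def handle_pickup(device_name: str,
                  local_recording: List[str],
                  instruction: PickupInstruction) -> RecordingReply:
    """Build the reply a non-answering device sends back to the answering device."""
    return RecordingReply(sender=device_name,
                          destination_mac=instruction.answering_device_mac,
                          recording=list(local_recording))
```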
  • Step 406 the speaker 201 determines, according to the audio quality information, the optimal sound pickup device among the speaker 201 , the TV 202 and the mobile phone 203 , and plays the response voice corresponding to the second voice command according to the recording data of the optimal sound pickup device.
  • the answering device selects an optimal sound pickup device from the multiple electronic devices according to the audio quality information corresponding to the recording data of the multiple electronic devices (including itself and the other non-answering devices), and uses the recording data of the optimal sound pickup device to perform SE, ASR and other processing, so as to correctly recognize the voice command input by the user and then respond to it accurately.
  • the accurate response to the voice command input by the user includes playing the response voice corresponding to the voice command input by the user.
  • the accurate response to the voice command input by the user may further include triggering the answering device or another non-answering device to execute an event corresponding to the voice command. The event may be playing a song, playing a video, making a call, and so on.
  • the speaker 201 may also send the recording data of the optimal sound pickup device to the server 204 shown in FIG. 1 , and the server 204 uses this recording data to perform SE, ASR and other processing, so as to correctly recognize the voice command input by the user and then respond to it accurately.
  • in this embodiment, the speaker 201 determines, according to the audio quality information of the recording data of the speaker 201 , the TV 202 and the mobile phone 203 , that the speaker 201 is the optimal sound pickup device among the three.
  • the speaker 201 can play the answering voice "Song 112222 will be played for you here".
  • the multimedia resource of the song 112222 can be provided by the server 204 or the mobile phone 203 .
  • the speaker 201 may also play the response voice corresponding to the second voice command according to both its own recording data and the recording data of the optimal sound pickup device.
  • the speaker 201 can splice its own recording data with the recording data of the optimal sound pickup device, and play the response voice corresponding to the second voice command based on the spliced recording data.
  • steps 404 to 406 may also be performed again to process new recording data in a similar manner, so as to correctly recognize a new voice command input by the user and then respond to that command accurately.
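  • As a non-limiting illustration, the decision in step 406 can be sketched as ranking the devices by the audio quality information of their recording data and taking the best one as the optimal sound pickup device (also rendered as the optimal radio device). The ranking order used here (higher signal-to-noise ratio first, then higher sound intensity, then lower noise intensity) is an assumption; the embodiment only states that the choice is made according to the audio quality information.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AudioQuality:
    sound_intensity: float
    noise_intensity: float
    snr_db: float


def select_optimal_pickup(quality: Dict[str, AudioQuality]) -> str:
    """Return the device whose recording data has the best audio quality."""
    def rank(device_id: str) -> Tuple[float, float, float]:
        q = quality[device_id]
        # Assumed ordering: SNR, then loudness of speech, then quietness of noise.
        return (q.snr_db, q.sound_intensity, -q.noise_intensity)

    # Iterating over sorted ids keeps the result stable when two devices rank equally.
    return max(sorted(quality), key=rank)
```

  • The recording data of the returned device would then be passed on to SE and ASR processing, either locally or on the server 204.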
  • the voice control method of the embodiment of the present application may further process the new recording data through the following steps.
  • Step 407 the speaker 201 sends a stop recording instruction to the TV 202 and the mobile phone 203 respectively.
  • the answering device sends a stop recording instruction to other non-answering devices, and the stop recording instruction is used to instruct to stop recording and discard the recording data.
  • Step 408 the television 202 and the mobile phone 203 respectively stop recording, and discard the recording data.
  • Non-answering devices stop recording based on the stop recording command to reduce power consumption.
  • the speaker 201 sends a stop recording instruction to the TV 202 and the mobile phone 203 respectively.
  • the television set 202 and the mobile phone 203 respectively stop recording and discard the recording data.
  • the recorded data corresponding to the second voice instruction is discarded.
  • the speaker 201 receives a new voice command input by the user, namely a third voice instruction.
  • the speaker 201 records the third voice instruction to generate recording data, and the content of the recording data is "change a song".
  • the speaker 201 uses the recorded data to perform processing such as SE, ASR, etc., so as to correctly recognize the voice command input by the user, and then accurately respond to the voice command input by the user.
  • the speaker 201 can play the response voice "OK, switch songs for you", and play the switched songs.
  • the above description takes the case in which the answering device and the optimal sound pickup device are both the speaker 201 as an example.
  • the answering device and the optimal sound pickup device may be the same device or different devices.
  • for example, the answering device is the speaker 201 and the optimal sound pickup device is the television set 202. The embodiments of the present application are not limited to the above examples.
  • when the answering device and the optimal sound pickup device are different devices, the answering device can call the recording data of the optimal sound pickup device.
  • when the voice command received by the answering device is used to turn off the voice assistant, the answering device can stop calling the recording data of the other non-answering devices, stop its own distributed voice recording task, and discard the recording data.
  • in this embodiment, when multiple electronic devices respectively receive the first voice command input by the user, where the first voice command is used to wake up the voice assistant of an electronic device, the multiple electronic devices wake up their respective voice assistants and start recording.
  • the answering device can determine the optimal sound pickup device according to the recording data of each electronic device, and play the response voice corresponding to the second voice command according to the recording data of the optimal sound pickup device.
  • this embodiment realizes a decentralized collaborative recording method in which each electronic device starts recording directly after being woken up and no longer relies on a central device to invoke it.
  • recording has already started before the answering device is determined, and the recording data is used for SE, ASR and other processing, which effectively eliminates the communication delay between devices, thereby solving the problem of frame loss in voice control caused by delay in multi-device scenarios.
  • the voice commands input by the user can be correctly recognized, and then the voice commands input by the user can be accurately responded to, and the accuracy of voice control can be improved.
  • because recording is started in advance, the electronic device can evaluate the quality of its own recording data early, which speeds up the audio evaluation, shortens the time required for the subsequent decision on the optimal sound pickup device, speeds up the processing flow of the voice control method, and improves the response speed of voice control.
  • the embodiment shown in FIG. 3 takes waking up the voice assistant with the wake-up word and then starting recording as an example.
  • the embodiments of the present application are not limited thereto.
  • the embodiments of the present application may also omit the above wake-up process and trigger the recording of the electronic device in another manner, with the accuracy of voice control likewise improved based on multi-device collaborative sound pickup.
  • the other manner may be that the electronic device detects a human voice, or that the electronic device detects the voice of a specific user, and so on, which are not described one by one in the embodiments of the present application.
  • the specific implementation of a voice control method in which the recording of the electronic device is triggered without the above wake-up process is similar to the embodiment shown in FIG. 3 .
  • that is, the answering device issues the voice pickup instruction, the non-answering devices return their recording data, the answering device determines the optimal sound pickup device according to the recording data, and the response voice corresponding to the second voice command is played according to the recording data of the optimal sound pickup device.
  • FIG. 6 is a schematic flowchart of another voice control method provided by an embodiment of the present application. This embodiment is illustrated by taking the three electronic devices shown in FIG. 1 , namely a speaker 201 , a television 202 and a mobile phone 203 , with the speaker 201 as the answering device, as an example. This embodiment concerns an invocation other than the first one after the electronic device wakes up, for example, the second, third or fourth invocation in a multi-round dialogue of the voice assistant. As shown in FIG. 6 , the method of this embodiment may include:
  • Step 701 the speaker 201 issues a multi-round dialogue pause instruction to the TV set 202 and the mobile phone 203 respectively, and the multi-round dialogue pause instruction is used to instruct that the multi-round dialogue be temporarily stopped.
  • the answering device does not detect a new voice command input by the user within a preset time period, that is, there is a time interval between voice commands input by the user.
  • the answering device detects this time interval and triggers the multi-round dialogue pause operation.
  • the answering device may issue a multi-round dialogue pause instruction to each of the other non-answering devices, where the multi-round dialogue pause instruction is used to instruct that the multi-round dialogue be temporarily stopped.
  • the voice assistant of the speaker 201 may call the interface between the voice assistant of the television 202 and the voice assistant of the speaker 201 to transmit the multi-round dialogue pause instruction to the television 202 .
  • the voice assistant of the speaker 201 can call the interface between the voice assistant of the mobile phone 203 and the voice assistant of the speaker 201 to transmit the multi-round dialogue pause instruction to the mobile phone 203 .
  • the speaker 201 deletes the previously saved recording data and keeps recording.
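  • As a non-limiting illustration, the trigger for the multi-round dialogue pause can be sketched as an idle-time check on the answering device. The function name and the preset time period of 5 s are hypothetical; the embodiment only refers to a preset time period without giving a value.

```python
def should_pause_dialogue(last_command_time_s: float,
                          now_s: float,
                          preset_idle_s: float = 5.0) -> bool:
    """Return True when no new voice command arrived within the preset time period."""
    return (now_s - last_command_time_s) >= preset_idle_s
```

  • When this check returns true, the answering device would issue the multi-round dialogue pause instruction to the non-answering devices as described in step 701.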
  • Step 702 the television 202 and the mobile phone 203 respectively delete their saved recording data and keep recording.
  • the television set 202 and the mobile phone 203 respectively delete the recording data saved before the multi-round dialogue pause instruction was issued, and keep recording.
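  • As a non-limiting illustration, the difference between the multi-round dialogue pause instruction of this embodiment and the stop recording instruction of step 407 can be sketched on the side of a non-answering device: both discard the saved recording data, but only the stop recording instruction ends the recording itself. The class and enumeration names are hypothetical, and recording data is modelled as text for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Instruction(Enum):
    PAUSE_DIALOGUE = "pause_dialogue"    # multi-round dialogue pause instruction
    STOP_RECORDING = "stop_recording"    # stop recording instruction


@dataclass
class Recorder:
    recording: bool = True
    saved: List[str] = field(default_factory=list)

    def capture(self, utterance: str) -> None:
        # New recording data is only saved while recording is active.
        if self.recording:
            self.saved.append(utterance)

    def handle(self, instruction: Instruction) -> None:
        # Both instructions discard what was saved so far.
        self.saved.clear()
        if instruction is Instruction.STOP_RECORDING:
            # Only the stop instruction ends recording (to reduce power consumption).
            self.recording = False
```

  • After a pause the device is therefore still able to record the next voice command, such as the fourth voice command in step 703, without being woken up again.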
  • Step 703 the speaker 201 , the television 202 and the mobile phone 203 respectively receive the fourth voice command input by the user, and record the fourth voice command respectively to generate respective recording data.
  • the speaker 201 , the TV 202 and the mobile phone 203 may further perform quality evaluation on the respective received recording data, and determine the audio quality information corresponding to the respective received recording data.
  • for example, the fourth voice command spoken by the user may be “play movie 333333”.
  • the speaker 201 , the TV 202 and the mobile phone 203 respectively record the fourth voice command to generate respective recording data, and the content of the recording data is "play movie 333333".
  • Step 704 the speaker 201 issues a voice pickup instruction to the TV set 202 and the mobile phone 203 respectively, and the voice pickup instruction is used to instruct the recipient to return its recording data.
  • the answering device can issue the voice pickup instruction to each of the other non-answering devices, and the voice pickup instruction is used to instruct the non-answering device to return its recording data to the answering device.
  • Step 705 the television 202 and the mobile phone 203 respectively send the recording data to the speaker 201 .
  • the television 202 sends the audio recording data of the television 202 to the speaker 201 .
  • the mobile phone 203 sends the recording data of the mobile phone 203 to the speaker 201 .
  • the content of the audio recording data is "play movie 333333".
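Steps 704 and 705 amount to a request/reply exchange, which can be sketched as below. The sketch is hypothetical throughout: `Recording`, `NonAnsweringDevice`, `collect_recordings`, and the scalar `quality` score are names and representations assumed for illustration, and the pickup instruction is modelled as a plain method call.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    device: str
    audio: bytes
    quality: float  # assumed scalar score, higher is better


class NonAnsweringDevice:
    """Hypothetical stand-in for the TV 202 or the mobile phone 203."""

    def __init__(self, name: str, audio: bytes, quality: float):
        self._saved = Recording(name, audio, quality)

    def on_pickup_instruction(self) -> Recording:
        # Step 705: return the locally saved recording to the answering device.
        return self._saved


def collect_recordings(own: Recording, peers: list) -> list:
    # Step 704: the pickup instruction is modelled as a direct method call;
    # a real system would send it over the inter-assistant interface.
    return [own] + [peer.on_pickup_instruction() for peer in peers]


command = b"play movie 333333"  # placeholder bytes standing in for audio
recordings = collect_recordings(
    Recording("speaker 201", command, 0.9),
    [NonAnsweringDevice("TV 202", command, 0.7),
     NonAnsweringDevice("mobile phone 203", command, 0.5)],
)
```

The answering device ends up holding one recording per device, including its own, which is the input that step 706 needs.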
  • Step 706, the speaker 201 determines the optimal sound-receiving device among the speaker 201, the TV 202 and the mobile phone 203 according to the audio quality information, and responds to the fourth voice command according to the recording data of the optimal sound-receiving device.
  • the answering device selects an optimal sound-receiving device from multiple electronic devices according to the audio quality information corresponding to the recording data of the multiple electronic devices (including itself and the other non-answering devices), and uses the recording data of the optimal sound-receiving device to perform SE, ASR, etc., so as to correctly identify the voice command input by the user and then accurately respond to it.
  • the accurate response to the voice command input by the user includes playing the response voice corresponding to the voice command input by the user.
  • the accurate response to the voice command input by the user may further include triggering the answering device or other non-responding device to execute an event corresponding to the voice command.
  • the event can be playing a song, playing a video, making a call, etc.
  • the speaker 201 of this embodiment determines, according to the audio quality information of the recording data of the speaker 201, the TV 202 and the mobile phone 203, that the optimal sound-receiving device among the three is the speaker 201.
  • the speaker 201 can play the response voice "The movie 333333 will be played on the TV", and the TV 202 starts to play the movie 333333.
  • the optimal sound-receiving device can change.
  • the fifth voice command spoken by the user may be "turn the sound down a bit" as an example.
  • the speaker 201, the TV 202 and the mobile phone 203 respectively record the fifth voice command to generate their respective recording data, and the content of the recording data is "turn the sound down a bit".
  • the TV 202 is determined as the optimal sound-receiving device among the speaker 201, the TV 202 and the mobile phone 203.
  • the speaker 201 can respond to the fifth voice command based on the recording data of the TV set 202 .
  • different devices can be selected for sound pickup according to the recording effect. For example, after the TV 202 starts to play a movie, strong self-noise (such as the sound produced during movie playback) occurs in the user's home, and the sound played by the TV is mixed into the recording made by the voice assistant of the speaker 201. If the recording data of the speaker 201 is used, ASR recognition errors will occur.
  • the voice control method of this embodiment can improve the accuracy of ASR recognition by dynamically calling the TV to perform sound pickup and complete echo cancellation, thereby accurately responding to the voice commands input by the user and improving the accuracy rate of voice control.
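The per-command selection described for step 706 can be sketched as a simple maximum over quality scores. The function name and every score below are made up for illustration; the embodiment only states that the choice is based on audio quality information, so treating that information as one scalar per device is an assumption.

```python
def select_optimal_device(quality_by_device: dict) -> str:
    """Return the device whose recording has the highest quality score."""
    return max(quality_by_device, key=quality_by_device.get)


# Fourth voice command: the room is quiet and the speaker hears the user best.
fourth = select_optimal_device(
    {"speaker 201": 0.9, "TV 202": 0.7, "mobile phone 203": 0.5})

# Fifth voice command: the movie is playing; scores are made up to reflect
# that the TV can cancel its own playback while the speaker cannot.
fifth = select_optimal_device(
    {"speaker 201": 0.3, "TV 202": 0.8, "mobile phone 203": 0.4})
```

Because the selection is repeated for every command, the answering device stays the same (the speaker 201) while the device whose recording feeds SE and ASR can change from one command to the next.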
  • the above-mentioned embodiments shown in FIG. 3 and FIG. 6 are described by taking as an example the case in which the answering device selects the optimal sound-receiving device according to the audio quality information and responds to the second voice command according to the recording data of the optimal sound-receiving device.
  • the answering device directly responds to the second voice instruction according to the received recording data, or according to the received recording data and its own recording data.
  • the specific implementation manner of responding to the second voice command may be that the answering device splices the audio content information of the received recording data with the audio content information of its own recording data, and responds to the second voice command based on the spliced audio content information.
  • for example, the user speaks the voice signal "play song 112222", but the answering device only recognizes the voice signal "2222", so the audio content information of the recording data of the answering device represents the voice signal "2222". The audio content information of the recording data that the answering device receives from other devices represents the voice signal "play song 112". The answering device can splice the two to obtain the spliced audio content information, which represents the voice signal "play song 112222".
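The splicing example can be reproduced with a small text-level sketch. The embodiment does not say how splicing is performed, so the rule used here (drop the longest overlap between the end of one fragment and the start of the next) is purely an assumption, as is the function name `splice_transcripts`. Note that plain concatenation of "play song 112" and "2222" would yield one digit too many, which is why some overlap handling is needed to match the example.

```python
def splice_transcripts(head: str, tail: str) -> str:
    """Join two partial transcripts, dropping the longest overlap between
    the end of `head` and the start of `tail` (an assumed heuristic)."""
    max_overlap = min(len(head), len(tail))
    for size in range(max_overlap, 0, -1):
        if head.endswith(tail[:size]):
            return head + tail[size:]
    # No shared characters at the boundary: fall back to concatenation.
    return head + tail


# Fragment received from another device, then the answering device's own.
spliced = splice_transcripts("play song 112", "2222")
```

A text overlap is ambiguous when the boundary consists of repeated characters, so a practical system would more likely align the recordings on timestamps before recognition; this sketch only illustrates the idea of combining partial content.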
  • FIG. 8 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present application.
  • the apparatus can be applied to an electronic device of a voice control system (such as the above-mentioned first electronic device 201), and the voice control system can also include at least a second electronic device (such as the second electronic device 202 or the third electronic device 203). The apparatus may include: a transceiver module 81 and a processing module 82.
  • the transceiver module 81 may specifically be the mobile communication module 150 and/or the wireless communication module 160 in the embodiment shown in FIG. 2 .
  • the processing module 82 may be the processor 110 of the embodiment shown in FIG. 2 .
  • the transceiver module 81 is used for receiving the first voice command input by the user, and the processing module 82 is used for responding to the first voice command.
  • the transceiver module 81 is further configured to receive the recording data of the second electronic device sent by the second electronic device, where the recording data of the second electronic device includes data recorded by the second electronic device of the second voice instruction input by the user.
  • the processing module 82 is further configured to respond to the second voice command according to the recording data of the first electronic device and/or the recording data of the second electronic device, where the recording data of the first electronic device includes data recorded by the first electronic device of the second voice command input by the user.
  • the transceiver module 81 is further configured to send a voice pickup instruction to the second electronic device, and the voice pickup instruction is used to instruct the second electronic device to return the recording data of the second electronic device.
  • the processing module 82 is further configured to record when or after the first electronic device receives the first voice instruction input by the user, and the recording is used to record the second voice instruction input by the user.
  • the first voice command is used to wake up a voice control function of the first electronic device and/or the second electronic device.
  • the processing module 82 is further configured to determine, according to the audio quality information of the first voice command received by the first electronic device and the audio quality information of the first voice command received by the second electronic device, that the first electronic device is the answering device of the voice control system.
  • the processing module 82 is further configured to, after the first electronic device responds to the first voice command and before the second voice command input by the user is recorded, delete the saved recording data and continue recording if, during recording by the first electronic device, the second voice command input by the user is not detected within a preset time period.
  • the transceiver module 81 is further configured to send a multi-round dialogue pause instruction to the second electronic device, and the multi-round dialogue pause instruction is used to instruct the multi-round dialogue to temporarily stop.
  • the transceiver module 81 is further configured to receive audio quality information of the recording data of the second electronic device sent by the second electronic device.
  • the processing module 82 is configured to determine the optimal sound-receiving device in the voice control system according to the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device.
  • when the optimal sound-receiving device is the first electronic device, the second voice command is responded to according to the recording data of the first electronic device.
  • when the optimal sound-receiving device is the second electronic device, the second voice command is responded to according to the recording data of the second electronic device, or according to the recording data of the second electronic device and the recording data of the first electronic device.
  • the audio quality information is used to indicate the audio quality of the recording data.
  • the processing module 82 is configured to respond to the second voice instruction according to the audio content information of the audio recording data of the first electronic device and/or the audio content information of the audio recording data of the second electronic device.
  • the audio content information is used to represent the audio content of the recording data.
  • the voice control apparatus in this embodiment of the present application can be used to execute the steps of the answering device (e.g., the speaker 201) in the above method embodiment. For its technical principle and technical effect, refer to the explanation of the above method embodiment, which is not repeated here.
  • FIG. 9 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present application.
  • the apparatus can be applied to an electronic device (such as the second electronic device 202 or the third electronic device 203) of a voice control system, and the voice control system can also include at least a first electronic device (such as the first electronic device 201). The apparatus may include: a transceiver module 91 and a processing module 92.
  • the transceiver module 91 may specifically be the mobile communication module 150 and/or the wireless communication module 160 in the embodiment shown in FIG. 2 .
  • the processing module 92 may be the processor 110 of the embodiment shown in FIG. 2 .
  • the processing module 92 is used for recording and saving the recording data, and the recording is used for recording the second voice instruction input by the user.
  • the transceiver module 91 is used for sending the recording data of the second electronic device to the first electronic device, where the recording data of the second electronic device includes data recorded by the second electronic device of the second voice command input by the user, and the recording data is used by the first electronic device to respond to the second voice command after responding to the first voice command.
  • the transceiver module 91 is further configured to receive a voice pickup instruction sent by the first electronic device, and the voice pickup instruction is used to instruct the second electronic device to return the recording data of the second electronic device.
  • the processing module 92 is configured to record when or after the second electronic device receives the first voice instruction input by the user.
  • the processing module 92 is further configured to determine, according to the audio quality information of the first voice command received by the second electronic device and the audio quality information of the first voice command received by the first electronic device, that the first electronic device is the answering device of the voice control system.
  • the processing module 92 is further configured to, after the first electronic device responds to the first voice command and during recording by the second electronic device, receive, through the transceiver module 91, a multi-round dialogue pause instruction sent by the first electronic device, where the multi-round dialogue pause instruction is used to instruct the multi-round dialogue to temporarily stop.
  • the processing module 92 is also used to delete the saved recording data and continue recording.
  • the transceiver module 91 is further configured to send the audio quality information of the recording data of the second electronic device to the first electronic device.
  • the voice control apparatus in this embodiment of the present application can be used to perform the steps of any non-response device (such as a TV 202 or a mobile phone 203 ) in the above method embodiments.
  • the electronic device may include: a microphone 1001, one or more processors 1002, and one or more memories 1003; the above devices may be connected through one or more communication buses 1005.
  • the above-mentioned memory 1003 stores one or more computer programs 1004, the one or more processors 1002 are used to execute the one or more computer programs 1004, and the one or more computer programs 1004 include instructions that can be used to execute each step performed by any electronic device in the above method embodiments.
  • the electronic device may be any of the above-mentioned electronic devices, for example, a smart phone, a smart watch, and the like.
  • the electronic device shown in FIG. 10 may also include other devices such as a display screen, which is not limited in this embodiment of the present application. When it includes other devices, it may specifically be the electronic device shown in FIG. 2 .
  • the electronic device in this embodiment of the present application can be used to execute the steps of the electronic device in any of the above method embodiments, and the technical principles and technical effects of the electronic device can be referred to the explanations of the above method embodiments, which will not be repeated here.
  • Embodiments of the present application further provide a computer storage medium, where the computer storage medium may include computer instructions, and when the computer instructions are run on an electronic device, the electronic device is caused to perform each step performed by the electronic device in the foregoing method embodiments.
  • Embodiments of the present application further provide a computer program product which, when run on a computer, causes the computer to perform each step performed by the electronic device in the foregoing method embodiments.
  • An embodiment of the present application further provides a voice control system. The voice control system may include at least a first electronic device and a second electronic device, where the first electronic device may adopt the structure of the embodiment shown in FIG. 8 or FIG. 10, and the second electronic device may adopt the structure of the embodiment shown in FIG. 9 or FIG. 10. Correspondingly, the system may implement the technical solutions of any of the above method embodiments; the implementation principles and technical effects are similar and are not repeated here.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling, direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place, or may be distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • the processor mentioned in the above embodiments may be an integrated circuit chip, which has signal processing capability.
  • each step of the above method embodiment may be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the methods disclosed in the embodiments of the present application may be directly embodied as executed by a hardware encoding processor, or executed by a combination of hardware and software modules in the encoding processor.
  • the software module may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling, direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephone Function (AREA)
  • Selective Calling Equipment (AREA)

Abstract

A speech control method and an electronic device. The speech control method is applied to a speech control system, wherein the speech control system at least comprises a first electronic device and a second electronic device, which have a speech control function. The speech control method comprises: a first electronic device and a second electronic device respectively receiving a first speech instruction input by a user, and the first electronic device responding to the first speech instruction; the second electronic device performing recording and storing recording data, wherein the recording is used for recording a second speech instruction input by the user; the second electronic device sending the recording data of the second electronic device to the first electronic device; the first electronic device responding to the second speech instruction according to recording data of the first electronic device and/or the recording data of the second electronic device, wherein the recording data of the first electronic device comprises recording data of when the first electronic device records a second speech instruction input by the user. By means of the method, the problem of false recognition of speech control in a multi-device scenario can be solved, thereby improving the accuracy of speech control.

Description

Voice control method and electronic device
This application claims priority to Chinese patent application No. 202110130831.0, titled "Voice Control Method and Electronic Device", filed with the China Patent Office on January 29, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to computer technology, and in particular, to a voice control method and an electronic device.
Background
A voice assistant is a new type of terminal application (APP) based on speech semantic algorithms. By receiving and recognizing voice signals sent by users, it provides service functions such as interactive dialogue, information query, and device control. With the continuous development of deep learning theory and the maturity of intelligent voice hardware, voice assistant applications have become an essential software function of terminal devices such as smartphones, tablet computers, smart TVs, and smart speakers.
With the widespread adoption of terminal devices equipped with voice assistants, many users already own multiple terminal devices of the same or different types. In a scenario where the user uses multiple terminal devices concurrently, or where the user's voice interaction takes place within the effective working range of multiple terminal devices, the terminal devices can, through signal detection and negotiation among themselves, select the terminal device with the clearest sound pickup (that is, the one closest to the user) as the pickup entrance for the voice assistant application to call, which can improve the recognition accuracy of the voice assistant application. For example, the living room of the user's home has three devices: a speaker, a TV, and a mobile phone. All three devices have a voice assistant application installed, and the wake-up word of each is "Xiao E Xiao E". When the user speaks the wake-up word "Xiao E Xiao E", the voice assistant applications of the speaker, the TV, and the mobile phone detect the audio energy information of the wake-up word and select one of the three devices as the answering device. Since the speaker is closest to the user, the three devices negotiate, based on the audio energy information of the wake-up word, and select the speaker as the answering device. The speaker wakes up its own voice assistant application, and the other devices do not respond to the wake-up word, that is, they do not wake up their respective voice assistant applications. In this way, after the user goes on to speak a voice signal, only the speaker recognizes and responds to it. For example, after the user speaks the voice signal "play song 112222", the speaker recognizes the voice signal and responds, for example by outputting the voice signal "Song 112222 will be played for you".
In the above multi-device voice control process, the answering device recognizes and responds to the user's voice signal. However, due to the diversity and complexity of usage scenarios, this processing method suffers from misrecognition by the answering device, that is, the answering device may fail to accurately recognize the voice signal input by the user after the wake-up word.
Summary of the invention
The present application provides a voice control method and an electronic device, so as to solve the problem of misrecognition in voice control in a multi-device scenario and improve the accuracy of voice control.
In a first aspect, an embodiment of the present application provides a voice control method. The voice control method can be applied to a voice control system, and the voice control system can include at least a first electronic device and a second electronic device that have a voice control function. The voice control method may include: the first electronic device and the second electronic device respectively receive a first voice instruction input by a user, and the first electronic device responds to the first voice instruction. The second electronic device records and saves the recording data, where the recording is used to record a second voice instruction input by the user. The second electronic device sends the recording data of the second electronic device to the first electronic device. The first electronic device responds to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device. The recording data of the first electronic device includes data recorded by the first electronic device of the second voice instruction input by the user.
The recording by the second electronic device may start before the first electronic device responds to the first voice instruction, which decouples the selection of the answering device from the recording by the electronic devices. Regardless of whether the multiple electronic devices have yet decided on the first electronic device as the answering device, the second electronic device can record and save the second voice instruction input by the user. After the first electronic device is decided on as the answering device, the second electronic device sends its recording data to the first electronic device, and the first electronic device responds to the second voice instruction.
In this implementation, the first electronic device acts as the answering device and responds to the first voice instruction. Both the first electronic device and the second electronic device record the second voice instruction and save the recording data. The second electronic device sends its own recording data to the first electronic device, and the first electronic device responds to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device. In this implementation, the non-answering device records the voice instruction input by the user, and the answering device performs processing such as SE and ASR based on its own recording data and/or the recording data of the non-answering device. This effectively eliminates the communication delay between devices in the process of selecting the answering device, thereby solving the frame loss problem in voice control caused by that delay in multi-device scenarios. Because the answering device responds to the second voice instruction using recording data picked up collaboratively by multiple devices, the problem of the audio quality of the picked-up voice instruction affecting ASR recognition accuracy can be solved, and the accuracy of voice control is improved.
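The frame loss argument can be made concrete with a small timeline calculation. All numbers and names below are assumptions chosen for illustration: times are in milliseconds after the wake-up word ends, the user is assumed to start the second voice instruction at 200 ms, and the answering device is assumed to be decided at 600 ms.

```python
def lost_milliseconds(command_start: int, command_end: int, recording_start: int) -> int:
    """Milliseconds of the command spoken before the recording began."""
    return max(0, min(recording_start, command_end) - command_start)


# Assumed timeline, in ms after the wake-up word ends.
COMMAND_START, COMMAND_END = 200, 1200   # user speaks the second instruction
ELECTION_DONE = 600                      # answering device is decided

# Recording that only starts once the answering device is known.
late = lost_milliseconds(COMMAND_START, COMMAND_END, ELECTION_DONE)

# Recording that starts as soon as the wake-up word is received.
early = lost_milliseconds(COMMAND_START, COMMAND_END, 0)
```

Under these assumed numbers, a device that waits for the selection result misses the first 400 ms of the instruction, while a device that records from the wake-up word onward misses nothing, which is the effect the decoupling is meant to achieve.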
一种可能的设计中,该方法还可以包括:第一电子设备向第二电子设备调用拾音指令,该拾音指令用于第二电子设备返回第二电子设备的录音数据。In a possible design, the method may further include: the first electronic device invokes a voice pickup instruction to the second electronic device, where the voice pickup instruction is used by the second electronic device to return the recording data of the second electronic device.
一种可能的设计中,该第二电子设备录音,可以包括:在第二电子设备接收到用户输入的第一语音指令时或之后,第二电子设备录音。In a possible design, the recording by the second electronic device may include: recording by the second electronic device when or after the second electronic device receives the first voice instruction input by the user.
本实现方式,通过在第二电子设备接收到用户输入的第一语音指令时或之后,第二电子设备录音,即在确定应答设备之前第二电子设备开始录音,第二电子设备可以录制到用户输入的第二语音指令。这样可以有效消除选取应答设备过程中设备之间的通信时延,从而解决多设备场景中因时延导致的语音控制的丢帧问题。In this implementation manner, when the second electronic device receives the first voice command input by the user or after the second electronic device records the recording, that is, the second electronic device starts recording before determining the answering device, the second electronic device can record to the user The second voice command entered. This can effectively eliminate the communication delay between devices in the process of selecting the answering device, thereby solving the problem of frame loss in voice control caused by delay in multi-device scenarios.
In a possible design, the method may further include: when or after the first electronic device receives the first voice instruction input by the user, the first electronic device records, where the recording is used to capture the second voice instruction input by the user.
In a possible design, the first voice instruction is used to wake up the voice control function of the first electronic device and/or the second electronic device.
For ease of understanding, the first voice instruction here may be the voice instruction of step 401 in the embodiment shown in FIG. 3 below.
In a possible design, the method may further include: the first electronic device and the second electronic device each determine, according to the audio quality information of the first voice instruction that each of them received, that the first electronic device is the answering device of the voice control system.
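As an illustration of how both devices could arrive at the same answering device independently, the following minimal sketch assumes that the audio quality information is a single numeric score per device and that the devices have already exchanged these scores. The function name, the score values, and the tie-break on device ID are hypothetical and are not taken from this application.

```python
from typing import Dict


def elect_answering_device(quality_by_device: Dict[str, float]) -> str:
    """Pick the device with the highest quality score.

    Ties are broken by the lexicographically smaller device ID (an assumed
    rule), so every device that runs this function over the same scores
    reaches the same decision without further communication.
    """
    return min(quality_by_device, key=lambda dev: (-quality_by_device[dev], dev))


# Hypothetical scores for the wake-up utterance as heard by each device.
reports = {"device_1": 21.0, "device_2": 17.5}

# Each device evaluates the same reports locally, possibly in a different order.
decision_on_device_1 = elect_answering_device(dict(reports))
decision_on_device_2 = elect_answering_device(dict(reversed(list(reports.items()))))
```

With a deterministic rule of this kind, no extra message is needed to announce the result, because every device computes it from the same inputs.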
In a possible design, after the first electronic device responds to the first voice instruction and before the second voice instruction input by the user is recorded, the method may further include: while the first electronic device and the second electronic device are recording, if the first electronic device does not detect the second voice instruction input by the user within a preset time period, the first electronic device deletes the saved recording data and continues recording. The first electronic device invokes a multi-round dialogue pause instruction on the second electronic device, where the multi-round dialogue pause instruction indicates that the multi-round dialogue is temporarily stopped. The second electronic device deletes its saved recording data and continues recording.
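The timeout handling in this design can be sketched as follows. This is only an illustration under stated assumptions: the `Device` class, the `check_timeout` function, and the way elapsed time and detection are passed in are all hypothetical, and the cross-device invocation is modeled as a direct method call.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Device:
    name: str
    saved: List[bytes] = field(default_factory=list)
    recording: bool = True

    def record(self, frame: bytes) -> None:
        # Frames are saved only while the device is recording.
        if self.recording:
            self.saved.append(frame)

    def on_multi_round_pause(self) -> None:
        # Delete saved recording data; recording itself keeps running.
        self.saved.clear()


def check_timeout(first: Device, second: Device, elapsed_s: float,
                  preset_s: float, command_detected: bool) -> bool:
    """Run the pause path if no second instruction arrived within preset_s."""
    if command_detected or elapsed_s < preset_s:
        return False
    first.saved.clear()            # first device deletes its saved data
    second.on_multi_round_pause()  # stands in for the cross-device invocation
    return True


first, second = Device("first"), Device("second")
for dev in (first, second):
    dev.record(b"silence-1")
    dev.record(b"silence-2")
paused = check_timeout(first, second, elapsed_s=6.0, preset_s=5.0,
                       command_detected=False)
```

Both devices keep `recording` set after the pause, so a later second voice instruction is still captured from its first frame.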
For ease of understanding, the first voice instruction here may be the voice instruction preceding step 701 in the embodiment shown in FIG. 6 below, and the second voice instruction here may be the voice instruction of step 703 in the embodiment shown in FIG. 6 below.
In a possible design, the method may further include: the first electronic device receives, from the second electronic device, audio quality information of the recording data of the second electronic device.
This implementation can speed up the decision on the optimal sound pickup device, thereby improving the response speed of voice control.
In a possible design, the first electronic device responding to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device may include: the first electronic device determines the optimal sound pickup device in the voice control system according to the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device. When the optimal sound pickup device is the first electronic device, the first electronic device responds to the second voice instruction according to the recording data of the first electronic device, or according to the recording data of the first electronic device and the recording data of the second electronic device. When the optimal sound pickup device is the second electronic device, the first electronic device responds to the second voice instruction according to the recording data of the second electronic device, or according to the recording data of the second electronic device and the recording data of the first electronic device. The audio quality information indicates the audio quality of the recording data.
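The selection described in this design can be sketched as follows, assuming the audio quality information reduces to one numeric score per device (higher is better). The function names, the `combine` flag, and the sample values are hypothetical and serve only to illustrate the two alternatives (optimal device only, or optimal device plus the other recordings).

```python
from typing import Dict, List


def choose_optimal_pickup(quality: Dict[str, float]) -> str:
    """Return the device whose recording has the highest quality score."""
    return max(quality, key=lambda dev: quality[dev])


def recordings_for_response(quality: Dict[str, float],
                            recordings: Dict[str, bytes],
                            combine: bool = False) -> List[bytes]:
    """Return the recording(s) to feed into SE/ASR.

    The optimal device's recording always comes first; with combine=True
    the remaining recordings follow it.
    """
    best = choose_optimal_pickup(quality)
    if not combine:
        return [recordings[best]]
    others = [data for dev, data in recordings.items() if dev != best]
    return [recordings[best]] + others


quality = {"first": 12.5, "second": 18.0}       # hypothetical scores
recordings = {"first": b"rec-1", "second": b"rec-2"}
best_device = choose_optimal_pickup(quality)
single = recordings_for_response(quality, recordings)
combined = recordings_for_response(quality, recordings, combine=True)
```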
In this implementation, responding to the second voice instruction using the recording data of the optimal sound pickup device reduces the impact of noise on the accuracy of voice control.
In a possible design, the first electronic device responding to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device may include: the first electronic device responds to the second voice instruction according to the audio content information of the recording data of the first electronic device and/or the audio content information of the recording data of the second electronic device. The audio content information indicates the audio content of the recording data.
For example, when the recording data of the first electronic device contains more audio content information than the recording data of the second electronic device, the second voice instruction is responded to according to the audio content information of the recording data of the first electronic device. When the recording data of the first electronic device contains less audio content information than the recording data of the second electronic device, the second voice instruction is responded to according to the audio content information of the recording data of the second electronic device. As another example, when the audio content information of the recording data of the first electronic device and the audio content information of the recording data of the second electronic device partially overlap, the first electronic device may splice the two pieces of audio content information together and respond to the second voice instruction according to the spliced audio content information.
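A minimal sketch of this comparison and splicing, assuming the audio content information of each recording is available as a list of recognized tokens. The function names and the tie-break (prefer the first device when both lists have equal length and no overlap) are assumptions made for illustration only.

```python
from typing import List


def overlap_len(left: List[str], right: List[str]) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_content(first: List[str], second: List[str]) -> List[str]:
    """Splice overlapping content; otherwise keep the longer recording."""
    k = overlap_len(first, second)
    if k:
        return first + second[k:]
    k = overlap_len(second, first)
    if k:
        return second + first[k:]
    # No shared content: use whichever recording holds more content.
    return first if len(first) >= len(second) else second


# The first device caught the start, the second device caught the end.
spliced = merge_content(["play", "song", "11"], ["11", "2222"])
# The second device only caught the tail of what the first device recorded.
contained = merge_content(["play", "song", "11", "2222"], ["2222"])
# No overlap: the longer content wins.
longer = merge_content(["play"], ["song", "11"])
```

A real system would splice audio frames or aligned ASR hypotheses rather than plain tokens; the token lists here only make the overlap logic easy to follow.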
In this implementation, responding to the second voice instruction using recording data collected collaboratively by multiple devices avoids frame loss and improves the accuracy of voice control.
In a second aspect, an embodiment of the present application provides a voice control method. The method may be applied to a first electronic device of a voice control system, and the voice control system may further include at least a second electronic device. The voice control method may include: the first electronic device receives a first voice instruction input by a user, and the first electronic device responds to the first voice instruction. The first electronic device receives recording data of the second electronic device sent by the second electronic device, where the recording data of the second electronic device includes recording data obtained by the second electronic device recording a second voice instruction input by the user. The first electronic device responds to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device, where the recording data of the first electronic device includes recording data obtained by the first electronic device recording the second voice instruction input by the user.
In a possible design, the method may further include: the first electronic device invokes a pickup instruction on the second electronic device, where the pickup instruction instructs the second electronic device to return the recording data of the second electronic device.
In a possible design, the method may further include: when or after the first electronic device receives the first voice instruction input by the user, the first electronic device records, where the recording is used to capture the second voice instruction input by the user.
In a possible design, the first voice instruction is used to wake up the voice control function of the first electronic device and/or the second electronic device.
In a possible design, the method may further include: the first electronic device determines that the first electronic device is the answering device of the voice control system according to the audio quality information of the first voice instruction received by the first electronic device and the audio quality information of the first voice instruction received by the second electronic device.
In a possible design, after the first electronic device responds to the first voice instruction and before the second voice instruction input by the user is recorded, the method may further include: while the first electronic device is recording, if the first electronic device does not detect the second voice instruction input by the user within a preset time period, the first electronic device deletes the saved recording data and continues recording; the first electronic device invokes a multi-round dialogue pause instruction on the second electronic device, where the multi-round dialogue pause instruction indicates that the multi-round dialogue is temporarily stopped; and the second electronic device deletes its saved recording data and continues recording.
In a possible design, the method may further include: the first electronic device receives, from the second electronic device, audio quality information of the recording data of the second electronic device.
In a possible design, the first electronic device responding to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device may include: the first electronic device determines the optimal sound pickup device in the voice control system according to the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device. When the optimal sound pickup device is the first electronic device, the first electronic device responds to the second voice instruction according to the recording data of the first electronic device. When the optimal sound pickup device is the second electronic device, the first electronic device responds to the second voice instruction according to the recording data of the second electronic device, or according to the recording data of the second electronic device and the recording data of the first electronic device. The audio quality information indicates the audio quality of the recording data.
In a possible design, the first electronic device responding to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device may include: the first electronic device responds to the second voice instruction according to the audio content information of the recording data of the first electronic device and/or the audio content information of the recording data of the second electronic device. The audio content information indicates the audio content of the recording data.
In a third aspect, an embodiment of the present application provides a voice control method. The voice control method may be applied to a second electronic device of a voice control system, and the voice control system may further include at least a first electronic device. The voice control method may include: the second electronic device records and saves recording data, where the recording is used to capture a second voice instruction input by a user. The second electronic device sends the recording data of the second electronic device to the first electronic device, where the recording data of the second electronic device includes recording data obtained by the second electronic device recording the second voice instruction input by the user, and the recording data is used by the first electronic device to respond to the second voice instruction after responding to a first voice instruction.
In a possible design, the method may further include: the second electronic device receives a pickup instruction invoked by the first electronic device, where the pickup instruction instructs the second electronic device to return the recording data of the second electronic device.
In a possible design, the recording by the second electronic device may include: the second electronic device records when or after it receives the first voice instruction input by the user.
In a possible design, the method may further include: the second electronic device determines that the first electronic device is the answering device of the voice control system according to the audio quality information of the first voice instruction received by the second electronic device and the audio quality information of the first voice instruction received by the first electronic device.
In a possible design, after the first electronic device responds to the first voice instruction, the method may further include: while the second electronic device is recording, the second electronic device receives a multi-round dialogue pause instruction invoked by the first electronic device, where the multi-round dialogue pause instruction indicates that the multi-round dialogue is temporarily stopped; and the second electronic device deletes the saved recording data and continues recording.
In a possible design, the method may further include: the second electronic device sends audio quality information of the recording data of the second electronic device to the first electronic device.
In a fourth aspect, an embodiment of the present application provides a voice control apparatus that has the function of implementing the second aspect or any possible design of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function, for example, a transceiver unit or module and a processing unit or module.
In a fifth aspect, an embodiment of the present application provides a voice control apparatus that has the function of implementing the third aspect or any possible design of the third aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function, for example, a transceiver unit or module and a processing unit or module.
In a sixth aspect, an embodiment of the present application provides an electronic device, which may include: one or more processors; and one or more memories, where the one or more memories are configured to store one or more programs, and the one or more processors are configured to run the one or more programs to implement the method according to the second aspect or any possible design of the second aspect.
In a seventh aspect, an embodiment of the present application provides an electronic device, which may include: one or more processors; and one or more memories, where the one or more memories are configured to store one or more programs, and the one or more processors are configured to run the one or more programs to implement the method according to the third aspect or any possible design of the third aspect.
In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium, characterized by including a computer program that, when executed on a computer, causes the computer to perform the method according to the second aspect or any possible design of the second aspect.
In a ninth aspect, an embodiment of the present application provides a computer-readable storage medium, characterized by including a computer program that, when executed on a computer, causes the computer to perform the method according to the third aspect or any possible design of the third aspect.
In a tenth aspect, an embodiment of the present application provides a chip, characterized by including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to perform the method according to the second aspect or any possible design of the second aspect.
In an eleventh aspect, an embodiment of the present application provides a chip, characterized by including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to perform the method according to the third aspect or any possible design of the third aspect.
In a twelfth aspect, an embodiment of the present application provides a computer program product that, when run on a computer, causes the computer to perform the method according to the second aspect or any possible design of the second aspect.
In a thirteenth aspect, an embodiment of the present application provides a computer program product that, when run on a computer, causes the computer to perform the method according to the third aspect or any possible design of the third aspect.
In a fourteenth aspect, an embodiment of the present application provides a voice control system that includes at least a first electronic device and a second electronic device, each having a voice control function. The first electronic device is configured to perform the method according to the second aspect or any possible design of the second aspect. The second electronic device is configured to perform the method according to the third aspect or any possible design of the third aspect.
With the voice control method and electronic device of the embodiments of the present application, in the multi-device scenario described above, the multiple devices start recording directly without first communicating across devices, which solves the frame loss problem of voice control in multi-device scenarios and improves the accuracy of voice control. The voice instruction input by the user is then responded to using recording data collected collaboratively by the multiple devices, which effectively mitigates the effect that the audio quality of the picked-up voice instruction has on ASR recognition accuracy and further improves the accuracy of voice control.
Description of Drawings
FIG. 1 is a schematic diagram of a voice control system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of a voice control method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a multi-device voice control scenario provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another multi-device voice control scenario provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of another voice control method provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of another multi-device voice control scenario provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a voice control apparatus provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a voice control apparatus provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The terms "first", "second", and the like in the embodiments of the present application are used only to distinguish between the things described. They are not to be understood as indicating or implying relative importance, nor as indicating or implying an order. In addition, the terms "including" and "having", and any variations of them, are intended to cover a non-exclusive inclusion, for example, the inclusion of a series of steps or units. A method, system, product, or device is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
It should be understood that, in the present application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean that only A exists, that only B exists, or that both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following items" or a similar expression refers to any combination of those items, including any combination of a single item or plural items. For example, at least one of a, b, or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple.
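As a small worked illustration of the "at least one of a, b, or c" definition, the following sketch enumerates every non-empty combination of the three items. The variable names are arbitrary and chosen only for this example.

```python
from itertools import combinations

items = ["a", "b", "c"]

# Every non-empty selection of the items, from single items up to all three.
at_least_one = [
    combo
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
]
```

The result contains seven combinations, matching the seven cases listed in the paragraph above.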
Voice assistant: an application built on artificial intelligence that uses speech and semantic recognition algorithms to interact with the user through real-time question-and-answer voice exchanges, helping the user complete operations such as information query, device control, and text input. A voice assistant usually uses staged, cascaded processing and provides its service functions through a basic workflow of voice wake-up, voice front-end processing, automatic speech recognition (ASR), natural language understanding (NLU), dialog management (DM), natural language generation (NLG), and text to speech (TTS), in that order. The voice front-end processing may include, but is not limited to, speech enhancement (SE). ASR may take the speech signal after SE noise reduction as input and output a text description of the user's speech signal. ASR is the basis on which the voice assistant application accurately completes subsequent recognition and processing tasks, and the audio quality of the user speech signal input to ASR directly determines the accuracy of the ASR recognition result. The voice control method of the embodiments of the present application can ensure the accuracy and reliability of the user speech signal input to ASR, thereby improving the accuracy of the ASR recognition result so that subsequent recognition and processing tasks are completed accurately.
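The staged cascade described above can be pictured as a chain of functions, each consuming the output of the previous one. The sketch below uses placeholder stages that only record their own name; the stage identifiers and the `state` dictionary are hypothetical and do not model any real speech processing.

```python
from functools import reduce
from typing import Callable, Dict, List

State = Dict[str, object]


def make_stage(name: str) -> Callable[[State], State]:
    """Build a placeholder stage that appends its name to the trace."""
    def stage(state: State) -> State:
        trace = list(state["trace"]) + [name]
        return {**state, "trace": trace}
    return stage


# Order follows the workflow in the text: wake-up, front end (SE), ASR, NLU, DM, NLG, TTS.
STAGE_NAMES: List[str] = ["wake_up", "front_end_se", "asr", "nlu", "dm", "nlg", "tts"]
PIPELINE = [make_stage(name) for name in STAGE_NAMES]


def run_pipeline(audio: bytes) -> State:
    """Thread the state through every stage in order."""
    initial: State = {"audio": audio, "trace": []}
    return reduce(lambda state, stage: stage(state), PIPELINE, initial)


result = run_pipeline(b"\x00\x01")
```

Because each stage depends on the one before it, an error introduced early (for example, missing frames before ASR) propagates through every later stage, which is why the quality of the ASR input matters so much.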
Voice wake-up: while the screen is locked or the voice assistant is dormant, the electronic device receives and detects a specific user speech signal (that is, a wake-up word) and activates or starts the voice assistant, so that the voice assistant enters a state of waiting for speech signal input.
Acoustic echo cancellation (AEC): a voice front-end processing technique that uses sound-wave interference to cancel the noise produced by the feedback path through the air between the microphone and the speaker. It can effectively mitigate noise interference caused by audio played through the speaker or by spatial reflection of sound waves.
In multi-device voice control, multiple electronic devices negotiate with each other through communication to select an answering device, and the answering device recognizes and responds to the user's speech signal. This approach is prone to misrecognition for two reasons: audio quality and delay. Regarding audio quality, because usage scenarios are diverse and complex, the user voice instruction picked up and processed by an electronic device is inevitably disturbed by various kinds of external and internal noise, and this interference degrades the audio quality of the picked-up instruction. For example, external noise may come from an air conditioner fan or unrelated voices near the device, and internal noise may be the audio or video being played by the electronic device itself. In addition, the distance and orientation between the electronic device and the user, as well as the placement of the device and the performance of its microphone module, also affect the audio quality of the picked-up instruction. When that audio quality is poor, misrecognition occurs. Regarding delay, while multiple electronic devices negotiate to select the answering device, both the communication delay of cross-device communication and the delay of the selection itself cause frame loss, which in turn causes misrecognition. For example, because of this delay, the user may say "play song 112222" while the answering device recognizes only "2222". That is, the part "play song 11" is never received or recognized, so the answering device cannot accurately recognize and respond to the user's voice instruction.
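The frame loss in the "play song 112222" example can be reproduced with a toy timeline. The sketch below assumes hypothetical timestamps for each spoken fragment and a hypothetical negotiation delay; it compares a device that starts recording only after the answering device has been negotiated with one that starts recording as soon as the wake-up word is heard.

```python
from typing import List, Tuple

Frame = Tuple[int, str]  # (timestamp in ms after the wake-up word, spoken fragment)


def captured(frames: List[Frame], record_start_ms: int) -> List[str]:
    """Return the fragments spoken at or after the moment recording starts."""
    return [text for t, text in frames if t >= record_start_ms]


# Hypothetical timeline: the user speaks the command right after the wake-up word.
frames: List[Frame] = [(100, "play"), (200, "song"), (300, "11"), (400, "2222")]
negotiation_delay_ms = 350  # assumed time needed to select the answering device

# Recording starts only after negotiation finishes: early fragments are lost.
late_start = captured(frames, record_start_ms=negotiation_delay_ms)
# Recording starts at wake-up time, before negotiation: nothing is lost.
early_start = captured(frames, record_start_ms=0)
```

Only the late-start device loses "play song 11", which is the situation the direct-recording approach is meant to avoid.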
The voice control method of the embodiments of the present application addresses misrecognition of voice instructions in multi-device voice control by improving audio quality and/or reducing delay. Because the multiple electronic devices start recording directly without cross-device communication, the delay that would otherwise be introduced by waking up multiple devices and transferring data through communication is eliminated. This removes the effect of delay on ASR recognition accuracy, solves the frame loss problem of voice control in multi-device scenarios, and improves the accuracy of voice control. One or more of the multiple electronic devices are selected as the optimal sound pickup device, whose recording data has better audio quality than that of the other electronic devices, and the voice instruction input by the user is responded to based on the recording data of the optimal sound pickup device. Collaborative sound pickup across multiple devices mitigates the effect that the audio quality of the picked-up voice instruction has on ASR recognition accuracy and improves the accuracy of voice control.
The voice control method of the embodiments of the present application can be applied to a multi-device scenario. A multi-device scenario may be a scenario in which a user uses multiple electronic devices concurrently, or a scenario in which the user's voice interaction occurs within the effective working range of multiple electronic devices. Each of the multiple electronic devices has a voice control function, which may be provided by a voice assistant. In the multi-device scenario, after the user speaks the wake-up word and the voice command, the method of this embodiment helps ensure that the voice command input to ASR is accurate and reliable. This improves the accuracy of the ASR recognition result, so that subsequent recognition and processing tasks, and the response to the voice command, can be completed accurately. The electronic device thereby becomes more intelligent, interaction between the electronic device and the user becomes efficient and accurate, and the user experience is improved.
In the embodiments of the present application, a voice command is a command that the user inputs to the electronic device in the form of sound. The voice command causes the electronic device to provide the user with service functions such as interactive dialogue, information query, and device control. For example, the voice command may be a voice signal input by the user through the microphone of the electronic device.
In some embodiments, a voice assistant may be installed in the electronic device so that the electronic device implements the voice control function. The voice assistant is generally in a dormant state. Before using the voice control function of the electronic device, the user can wake up the voice assistant by voice. The voice signal that wakes up the voice assistant may be called a wake-up word (or wake-up voice). The wake-up word may be pre-registered in the electronic device. For example, the wake-up word may be "小E小E" ("Xiao E, Xiao E"). It can be understood that the wake-up word may also be any other word or sentence and can be set flexibly as required; the embodiments of the present application do not list the possibilities one by one.
In addition, the voice assistant may be an embedded application in the electronic device (that is, a system application of the electronic device) or a downloadable application. An embedded application is an application provided as part of the implementation of an electronic device (such as a mobile phone). A downloadable application is an application that can provide its own internet protocol multimedia subsystem (IMS) connection. A downloadable application may be pre-installed in the electronic device, or may be a third-party application that the user downloads and installs in the electronic device.
The implementations of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a voice control system according to an embodiment of the present application. The voice control system may include multiple electronic devices that satisfy one or more of the following conditions: they are connected to the same wireless access point (such as a Wi-Fi access point), they are logged in to the same account, they have been placed in the same group by the user, or the user's voice interaction occurs within the effective working range of the multiple electronic devices.
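The membership conditions above can be sketched as a simple predicate. This is a minimal illustration only: the `DeviceInfo` record, its field names, and the `in_same_system` function are hypothetical and do not appear in the application, which does not specify how these conditions are represented or checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical record of the attributes the conditions refer to."""
    name: str
    access_point: Optional[str] = None   # e.g. identifier of the Wi-Fi access point
    account: Optional[str] = None        # logged-in account
    group: Optional[str] = None          # group assigned by the user
    in_voice_range: bool = False         # user's speech is within working range


def _same(a: Optional[str], b: Optional[str]) -> bool:
    # An unset attribute never counts as a match.
    return a is not None and a == b


def in_same_system(a: DeviceInfo, b: DeviceInfo) -> bool:
    """True if the two devices satisfy at least one of the listed conditions."""
    return (
        _same(a.access_point, b.access_point)
        or _same(a.account, b.account)
        or _same(a.group, b.group)
        or (a.in_voice_range and b.in_voice_range)
    )
```

For instance, a phone and a smart speaker logged in to the same account would be treated as part of one system even if they are attached to different access points.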
As an example, the voice control system may include three electronic devices: a first electronic device 201, a second electronic device 202, and a third electronic device 203. The first electronic device 201, the second electronic device 202, and the third electronic device 203 all have a voice control function; for example, a voice assistant is installed on each of them.
In some embodiments, the first electronic device 201, the second electronic device 202, and the third electronic device 203 may use the same wake-up word to wake up the voice assistant, for example "小E小E" ("Xiao E, Xiao E").
For example, the electronic devices described in the embodiments of the present application, such as the first electronic device 201, the second electronic device 202, and the third electronic device 203, may be a mobile phone, a tablet computer, a desktop, laptop, or handheld computer, a notebook computer, a desktop computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, a media player, a television, a smart speaker, a smart watch, smart earphones, or a similar device. The embodiments of the present application do not specifically limit the form of the electronic device. For the specific structure of the electronic device, refer to the description of the embodiment corresponding to FIG. 2.
In addition, in some embodiments, the first electronic device 201, the second electronic device 202, and the third electronic device 203 may be electronic devices of the same type; for example, all three are mobile phones. In some other embodiments, they may be electronic devices of different types; for example, the first electronic device 201 is a mobile phone, the second electronic device 202 is a smart speaker, and the third electronic device 203 is a television (as shown in FIG. 1).
In the embodiments of the present application, the first electronic device 201, the second electronic device 202, and the third electronic device 203 start recording directly, without cross-device communication among them. This solves the frame loss problem of voice control in multi-device scenarios and improves the accuracy of voice control.
Each of the first electronic device 201, the second electronic device 202, and the third electronic device 203 can record without being invoked by another device (for example, a central device), which realizes a decentralized recording approach. Because this decentralized approach does not require selecting one device as the invoking device, it effectively eliminates the delay caused by communication between devices and improves the accuracy of subsequent voice control.
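The effect of start-up delay on what a device records can be illustrated with a toy model. This is a simplified sketch, not the method of the application: speech is modelled as a list of already-segmented frames, and `record_after_wake` and its `start_delay_frames` parameter are hypothetical names. A delay of zero corresponds to a device that starts recording on its own; a positive delay stands in for the time spent on cross-device negotiation.

```python
from typing import List


def record_after_wake(frames: List[str], wake_word: str,
                      start_delay_frames: int = 0) -> List[str]:
    """Return the frames captured after the wake-up word.

    Frames that arrive during the start-up delay are lost, which models
    the frame loss caused by negotiating before recording begins.
    """
    if wake_word not in frames:
        return []  # the voice assistant was never woken up
    first_recorded = frames.index(wake_word) + 1 + start_delay_frames
    return frames[first_recorded:]


# The user says the wake-up word followed by "play song 112222".
utterance = ["wake", "play", "song", "11", "22", "22"]

# Decentralized: recording starts immediately, nothing is lost.
direct = record_after_wake(utterance, "wake")

# Negotiated: three frames pass before recording starts.
delayed = record_after_wake(utterance, "wake", start_delay_frames=3)
```

With these inputs, `direct` holds the whole command while `delayed` holds only the trailing "22", "22" frames, mirroring the "play song 112222" example in which only "2222" is recognized.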
Afterwards, one or more of the first electronic device 201, the second electronic device 202, and the third electronic device 203 are selected as the optimal sound pickup device, based on one or more dimensions such as each device's device information and each device's recording data. The voice command input by the user is responded to based on the recording data of the optimal sound pickup device. Through cooperative sound pickup across multiple devices, the embodiments of the present application address the effect that the audio quality of the voice command picked up by an electronic device has on ASR recognition accuracy.
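One way such a selection could look is sketched below. The passage above does not fix a scoring rule, so this sketch makes an explicit assumption: the only dimension used is the root-mean-square (RMS) energy of each device's recording data, with higher energy treated as better pickup. The functions `rms_energy` and `select_optimal_pickup` are hypothetical names, and a real implementation would likely combine several dimensions, including device information.

```python
import math
from typing import Dict, List, Sequence


def rms_energy(samples: Sequence[float]) -> float:
    """Root-mean-square energy of a recording; 0.0 for an empty recording."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_optimal_pickup(recordings: Dict[str, Sequence[float]],
                          top_k: int = 1) -> List[str]:
    """Return the names of the top_k devices ranked by RMS energy.

    ASSUMPTION: energy is used as a stand-in for audio quality.
    """
    ranked = sorted(recordings,
                    key=lambda name: rms_energy(recordings[name]),
                    reverse=True)
    return ranked[:top_k]
```

Setting `top_k` above 1 corresponds to selecting more than one electronic device as the optimal sound pickup device.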
In some embodiments, the voice control system may further include a server 204. The server 204 can provide intelligent voice services.
Refer to FIG. 2, which is a schematic structural diagram of an electronic device according to an embodiment of the present application.
As shown in FIG. 2, the electronic device may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device. In other embodiments, the electronic device may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices, or may be integrated in one or more processors.
The controller may be the nerve center and command center of the electronic device. The controller can generate an operation control signal based on an instruction opcode and a timing signal, to control instruction fetching and instruction execution.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory can hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs the instruction or data again, it can fetch it directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device. While charging the battery 142, the charging management module 140 can also supply power to the electronic device through the power management module 141.
The power management module 141 is configured to connect the battery 142 and the charging management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In some other embodiments, the power management module 141 may be provided in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may be provided in the same device.
The wireless communication function of the electronic device may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device may be used to cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of the wireless local area network. In some other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions applied on the electronic device, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the same device as at least some modules of the processor 110.
The wireless communication module 160 can provide wireless communication solutions applied on the electronic device, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive a signal to be sent from the processor 110, perform frequency modulation and amplification on it, and convert it into electromagnetic waves for radiation through the antenna 2. For example, in some embodiments of the present application, the wireless communication module 160 may interact with other electronic devices; for instance, after detecting a voice signal that matches the wake-up word, it may send energy information of the detected voice signal to other electronic devices. The electronic device in the embodiments of the present application may communicate with other electronic devices through the mobile communication module 150 and/or the wireless communication module 160. For example, the first electronic device 201 sends an instruction for invoking sound pickup, or the like, to the second electronic device 202 through the mobile communication module 150 and/or the wireless communication module 160.
In some embodiments, the antenna 1 of the electronic device is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter opens, light passes through the lens to the photosensitive element of the camera, and the optical signal is converted into an electrical signal. The photosensitive element passes the electrical signal to the ISP, which processes it into an image visible to the naked eye. The ISP can also algorithmically optimize the noise, brightness, and skin tone of the image, and can optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. An object generates an optical image through the lens, and the image is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes it to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device may include 1 or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is used to process digital signals. In addition to digital image signals, it can process other digital signals. For example, when the electronic device selects a frequency point, the digital signal processor is used to perform a Fourier transform or similar operation on the frequency point energy.
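As a rough illustration of computing the energy at one frequency point with a Fourier transform, the following sketch evaluates a single discrete Fourier transform (DFT) coefficient and returns its squared magnitude. It is a plain-Python illustration, not the DSP implementation of the application; the function name `bin_energy` and the choice of the unnormalized DFT are assumptions.

```python
import cmath
import math
from typing import Sequence


def bin_energy(samples: Sequence[float], k: int) -> float:
    """Squared magnitude of the k-th DFT coefficient of `samples`.

    Uses the unnormalized DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N).
    """
    n_total = len(samples)
    if n_total == 0:
        return 0.0
    coeff = sum(
        x * cmath.exp(-2j * cmath.pi * k * n / n_total)
        for n, x in enumerate(samples)
    )
    return abs(coeff) ** 2


# A cosine that completes exactly 2 cycles over 8 samples concentrates
# its energy in bin 2 (and the mirrored bin 6).
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
energy_at_bin_2 = bin_energy(tone, 2)
energy_at_bin_1 = bin_energy(tone, 1)
```

Comparing `bin_energy` across candidate values of `k` is one simple way to see which frequency point carries the most energy.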
The video codec is used to compress or decompress digital video. The electronic device may support one or more video codecs, so that it can play or record video in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the way signals are transmitted between neurons in the human brain, it processes input information quickly and can also learn continuously. The NPU enables applications such as intelligent cognition on the electronic device, for example image recognition, face recognition, speech recognition, and text understanding.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example saving files such as music and videos on the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes the various functional applications and data processing of the electronic device by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required for at least one function (such as a sound playback function or an image playback function). The data storage area can store data created during use of the electronic device (such as audio data and the phone book). In addition, the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS).
The electronic device can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone jack 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and to convert analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. The user can listen to music or to a hands-free call through the speaker 170A of the electronic device.
The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device answers a call or plays a voice message, the user can listen by holding the receiver 170B close to the ear.
The microphone 170C, also called a "mic" or "sound transmitter", is used to convert sound signals into electrical signals. When making a call, sending a voice message, or triggering the electronic device to perform certain events through the voice assistant, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device may be provided with at least one microphone 170C. In other embodiments, the electronic device may be provided with two microphones 170C, which can implement noise reduction in addition to collecting sound signals. In still other embodiments, the electronic device may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and so on. For example, the electronic device in the embodiments of the present application may receive the voice command input by the user through the microphone 170C.
The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A senses pressure signals and can convert them into electrical signals. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensor 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device determines the intensity of the pressure from the change in capacitance. When a touch operation acts on the display screen 194, the electronic device detects the intensity of the touch operation by means of the pressure sensor 180A. The electronic device can also calculate the touch position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
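The threshold rule in the short message example can be sketched as follows. This is a minimal illustration that assumes a normalized intensity value; the function name, the threshold value 0.5, and the returned instruction labels are hypothetical and are not defined by the application.

```python
# Hypothetical normalized threshold; the application only names a "first pressure threshold".
FIRST_PRESSURE_THRESHOLD = 0.5


def short_message_icon_instruction(intensity: float,
                                   threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Return the instruction for a touch on the short message application icon."""
    if intensity < threshold:
        return "view_short_message"
    # Intensity is greater than or equal to the threshold.
    return "create_short_message"
```

A touch exactly at the threshold falls into the "create" branch, matching the "greater than or equal to" wording of the description.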
The gyroscope sensor 180B can be used to determine the motion posture of the electronic device. In some embodiments, the angular velocity of the electronic device about three axes (that is, the x, y, and z axes) can be determined by the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor 180B detects the angle by which the electronic device shakes, the distance that the lens module needs to compensate is calculated from that angle, and the lens counteracts the shake through reverse motion. The gyroscope sensor 180B can also be used in navigation and motion-sensing game scenarios.
The barometric pressure sensor 180C measures air pressure. In some embodiments, the electronic device calculates altitude from the air pressure value measured by the barometric pressure sensor 180C to assist positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device can use the magnetic sensor 180D to detect the opening and closing of a flip holster. In some embodiments, when the electronic device is a flip phone, the electronic device can detect the opening and closing of the flip cover by means of the magnetic sensor 180D, and then set features such as automatic unlocking upon flip opening according to the detected state of the holster or of the flip cover.
The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device in various directions (generally along three axes). When the electronic device is stationary, it can detect the magnitude and direction of gravity. It can also be used to identify the posture of the electronic device, and is applied in landscape/portrait switching, pedometers, and similar applications.
The distance sensor 180F measures distance. The electronic device can measure distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device can use the distance sensor 180F to measure distance for fast focusing.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device can determine that there is an object nearby. When insufficient reflected light is detected, the electronic device can determine that there is no object nearby. The electronic device can use the proximity light sensor 180G to detect that the user is holding the device close to the ear during a call, so that the screen is turned off automatically to save power. The proximity light sensor 180G can also be used for automatic unlocking and screen locking in holster mode and pocket mode.
The ambient light sensor 180L senses ambient light brightness. The electronic device can adaptively adjust the brightness of the display screen 194 according to the sensed ambient light brightness. The ambient light sensor 180L can also be used to adjust white balance automatically when taking pictures, and can cooperate with the proximity light sensor 180G to detect whether the electronic device is in a pocket, to prevent accidental touches.
The fingerprint sensor 180H collects fingerprints. The electronic device can use the collected fingerprint characteristics to implement fingerprint unlocking, access to an application lock, fingerprint-triggered photographing, fingerprint-based call answering, and the like.
The temperature sensor 180J detects temperature. In some embodiments, the electronic device executes a temperature handling policy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device reduces the performance of a processor located near the temperature sensor 180J, so as to lower power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device heats the battery 142 to prevent an abnormal shutdown caused by low temperature. In still other embodiments, when the temperature is lower than yet another threshold, the electronic device boosts the output voltage of the battery 142 to prevent an abnormal shutdown caused by low temperature.
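The temperature handling policy can be sketched as a simple threshold table. The description presents the three behaviors as separate embodiments; combining them in one function, as well as the function name, the action labels, and every threshold value below, are assumptions made purely for illustration.

```python
def temperature_actions(temp_c: float,
                        high_threshold: float = 45.0,    # hypothetical value
                        heat_threshold: float = 0.0,     # hypothetical value
                        boost_threshold: float = -10.0   # hypothetical value
                        ) -> list:
    """Return the list of protective actions for a reported temperature."""
    actions = []
    if temp_c > high_threshold:
        # Lower the performance of the processor near the sensor.
        actions.append("reduce_processor_performance")
    if temp_c < heat_threshold:
        # Warm the battery to avoid a low-temperature shutdown.
        actions.append("heat_battery")
    if temp_c < boost_threshold:
        # Raise the battery output voltage to avoid a low-temperature shutdown.
        actions.append("boost_battery_output_voltage")
    return actions
```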
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be provided on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen. The touch sensor 180K detects touch operations acting on or near it, and can pass a detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be provided on the surface of the electronic device at a position different from that of the display screen 194.
The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the bone that vibrates with the human vocal part. The bone conduction sensor 180M can also be placed in contact with the human pulse to receive a blood pressure pulse signal. In some embodiments, the bone conduction sensor 180M may also be provided in an earphone to form a bone conduction earphone. The audio module 170 can parse a voice signal from the vibration signal of the vocal-part bone acquired by the bone conduction sensor 180M, to implement a voice function. The application processor can parse heart rate information from the blood pressure pulse signal acquired by the bone conduction sensor 180M, to implement a heart rate detection function.
The keys 190 include a power key, volume keys, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device can receive key input and generate key signal input related to user settings and function control of the electronic device.
The motor 191 can generate vibration alerts. The motor 191 can be used for incoming-call vibration alerts and for touch vibration feedback. For example, touch operations acting on different applications (such as photographing or audio playback) may correspond to different vibration feedback effects, as may touch operations acting on different areas of the display screen 194. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.
The indicator 192 may be an indicator light. It can be used to indicate the charging state and changes in battery level, and can also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into contact with or separated from the electronic device by inserting it into or removing it from the SIM card interface 195. The electronic device may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards, of the same type or of different types, can be inserted into the same SIM card interface 195 at the same time. The SIM card interface 195 may also be compatible with different types of SIM cards and with external memory cards. The electronic device interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device uses an eSIM, that is, an embedded SIM card. The eSIM card is embedded in the electronic device and cannot be separated from it.
The methods in the following embodiments can all be implemented in an electronic device having the hardware structure described above.
In the embodiments of this application, in the multi-device scenario described above, each device starts recording directly, without first performing cross-device communication. This solves the frame-loss problem of voice control in multi-device scenarios and improves the accuracy of voice control.
Then, one or more of the electronic devices are selected as the optimal sound pickup device based on one or more dimensions, such as the device information and recording data of the electronic devices. The voice instruction input by the user is responded to based on the recording data of the optimal sound pickup device. Selecting the optimal sound pickup device means choosing, as the pickup entry called by the voice assistant, an electronic device that satisfies at least one of the following: the clearest pickup (closest to the user), the least noise interference (farthest from the noise source), or the best SE processing (best microphone noise reduction performance, or AEC support). This effectively addresses the impact that the audio quality of the picked-up voice instruction has on ASR recognition accuracy. The device information may include, but is not limited to, static attribute information or dynamic attribute information of the electronic device. The static attribute information may include, but is not limited to, the device model, the system version, and microphone capability information. The dynamic attribute information may include, but is not limited to, battery level information of the electronic device, earphone status information, microphone status information, speaker status information, and audio quality information of the recording data. The speaker status information may indicate whether the speaker of the electronic device is occupied. The audio quality information indicates how good the audio quality of the recording data is, and may take the form of one or more of sound intensity information, noise intensity information, signal-to-noise ratio information, and the like.
FIG. 3 is a schematic flowchart of a voice control method according to an embodiment of this application. This embodiment is described using the three electronic devices shown in FIG. 1 as an example: the speaker 201, the television 202, and the mobile phone 203. As shown in FIG. 3, the method of this embodiment may include the following steps.
Step 401: The speaker 201, the television 202, and the mobile phone 203 each receive a first voice instruction input by the user.
The first voice instruction is used to wake up the voice assistant of an electronic device. For example, the first voice instruction may be the wake-up word "小E小E" ("Xiao E, Xiao E") mentioned above. In this embodiment, the first voice instruction wakes up the respective voice assistants of the speaker 201, the television 202, and the mobile phone 203.
For an electronic device on which a voice assistant is installed, when no other software or hardware on the device is using the microphone to collect voice signals, the device can monitor in real time, through the microphone, whether the user inputs a voice signal. In general, a user who wants to use the voice control function of the electronic device can speak within the sound pickup range of the device so that the sound is input into the microphone. If no other software or hardware on the device is using the microphone to collect voice signals at that time, the device can detect the corresponding voice signal, such as the first voice instruction, through the microphone.
For example, with reference to FIG. 4, a user who wants to use the voice control function can say the wake-up word "小E小E" ("Xiao E, Xiao E"). If the user is speaking within the respective pickup ranges of the speaker 201, the television 202, and the mobile phone 203, and none of them has other software or hardware using the microphone to collect voice signals, then the speaker 201, the television 202, and the mobile phone 203 can each detect, through its own microphone, the first voice instruction corresponding to the wake-up word.
Step 402: In response to the first voice instruction, the speaker 201, the television 202, and the mobile phone 203 each wake up their own voice assistant and start recording.
When an electronic device detects the first voice instruction, it wakes up the voice assistant in response. In one example, after receiving the first voice instruction, the electronic device verifies it, that is, determines whether the received first voice instruction is a wake-up word registered on the electronic device. If the verification passes, the received first voice instruction is a wake-up word, and the voice assistant is woken up. If the verification fails, the received first voice instruction is not a wake-up word, and the electronic device may leave the voice assistant in its dormant state.
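The verification logic can be sketched as below. This is only an illustration: it assumes the first voice instruction has already been converted to text, whereas a real device would typically match acoustic features; the class and method names are hypothetical.

```python
class VoiceAssistant:
    """Toy model of the wake-up check; not the application's implementation."""

    def __init__(self, registered_wake_words):
        self.registered_wake_words = set(registered_wake_words)
        self.awake = False  # dormant until a registered wake-up word is verified

    def on_first_voice_instruction(self, recognized_text: str) -> bool:
        # Verification passes only if the instruction is a registered wake-up word.
        if recognized_text in self.registered_wake_words:
            self.awake = True
        # On failure the state is left unchanged (the assistant stays dormant).
        return self.awake
```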
In this embodiment, when the speaker 201, the television 202, and the mobile phone 203 each detect the first voice instruction, each wakes up its own voice assistant and starts recording. After recording starts, each device can detect through its own microphone whether the user inputs another voice instruction. When another voice instruction is detected, the device generates recording data and saves it locally.
For example, with reference to FIG. 3 and FIG. 4, after starting to record, the speaker 201, the television 202, and the mobile phone 203 each receive a second voice instruction input by the user. Suppose the second voice instruction spoken by the user is "play song 112222". The speaker 201, the television 202, and the mobile phone 203 each record the second voice instruction and generate their own recording data, whose content is "play song 112222".
It should be noted that, in one possible implementation, recording data is generated for every 0.5 s of recording. The value 0.5 may be replaced by another value, such as 0.6 or 1; the embodiments of this application do not list them one by one. When recording data is saved, new recording data may overwrite the previous recording data, or the previous recording data may be kept alongside the new recording data. The embodiments of this application use the latter case, in which both the previous and the new recording data are saved, as an example.
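The two saving strategies can be sketched with a small store of fixed-length chunks. The class name, the `overwrite` flag, and the use of raw bytes for a chunk are assumptions made for illustration.

```python
class RecordingStore:
    """Holds recording data generated once per chunk interval (0.5 s by default)."""

    def __init__(self, chunk_seconds: float = 0.5, overwrite: bool = False):
        self.chunk_seconds = chunk_seconds
        self.overwrite = overwrite
        self.chunks = []

    def add_chunk(self, chunk: bytes) -> None:
        if self.overwrite:
            # New recording data replaces the previous recording data.
            self.chunks = [chunk]
        else:
            # Previous and new recording data are both kept.
            self.chunks.append(chunk)

    def duration_seconds(self) -> float:
        return len(self.chunks) * self.chunk_seconds

    def data(self) -> bytes:
        return b"".join(self.chunks)
```

With `overwrite=False`, which corresponds to the case used as the example in this embodiment, a device asked for its recording later can return everything captured since it started recording.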
In some embodiments, the electronic device may further determine, from the recording data, the audio quality information corresponding to that recording data. In other words, the electronic device also evaluates the quality of its own recording data. As described above, the audio quality information may include one or more of sound intensity information, noise intensity information, signal-to-noise ratio information, and the like.
Taking the three electronic devices in this embodiment as an example, the speaker 201, the television 202, and the mobile phone 203 can each evaluate the quality of its own recording data and determine the corresponding audio quality information.
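One way such a quality evaluation might look is sketched below. The application does not specify how the quantities are computed; this sketch assumes non-empty lists of normalized samples, a separate noise-only segment, and mean-square power expressed in decibels. The function names and dictionary keys are hypothetical.

```python
import math

_EPS = 1e-12  # avoids log10(0) on completely silent input


def mean_power(samples):
    """Mean-square power of a non-empty sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def audio_quality_info(speech_samples, noise_samples):
    """Return sound intensity, noise intensity and SNR, all in dB."""
    speech_power = mean_power(speech_samples) + _EPS
    noise_power = mean_power(noise_samples) + _EPS
    return {
        "sound_intensity_db": 10 * math.log10(speech_power),
        "noise_intensity_db": 10 * math.log10(noise_power),
        "snr_db": 10 * math.log10(speech_power / noise_power),
    }
```

Treating the whole speech segment as signal is a simplification, since that segment also contains noise.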
Step 403: The speaker 201, the television 202, and the mobile phone 203 each perform answering device selection to determine the answering device, and the answering device plays the response voice corresponding to the first voice instruction.
The order of step 402 and step 403 is not limited by their sequence numbers; other orders are possible. For example, answering device selection may be performed at the same time as recording starts.
The answering device in this embodiment is the device that plays the response voice corresponding to a voice instruction input by the user. For example, the answering device plays the response voice corresponding to the first voice instruction, that is, the wake-up response voice, such as "I'm here". The other electronic devices, which are not the answering device, have woken up their voice assistants but do not play a response voice corresponding to the voice instruction input by the user.
The electronic device may perform answering device selection based on the audio quality information corresponding to the first voice instruction. In one possible implementation, the electronic device evaluates the quality of the first voice instruction it received, determines the corresponding audio quality information, and broadcasts that audio quality information together with its own device information. The electronic device also receives the audio quality information and device information that each of the other electronic devices broadcasts for the first voice instruction that device received. Based on the audio quality information and device information of all the electronic devices, the electronic device selects one of them as the answering device, for example the electronic device with the best audio quality.
Continuing the example in step 402, when the speaker 201 detects the first voice instruction, it can also evaluate the quality of the first voice instruction, determine the audio quality information corresponding to the first voice instruction it received, and broadcast that audio quality information together with the device information of the speaker 201. Similarly, when the television 202 detects the first voice instruction, it can evaluate the quality of the first voice instruction, determine the corresponding audio quality information, and broadcast that audio quality information together with the device information of the television 202. When the mobile phone 203 detects the first voice instruction, it can evaluate the quality of the first voice instruction, determine the corresponding audio quality information, and broadcast that audio quality information together with the device information of the mobile phone 203. In this way, the speaker 201 receives the audio quality information and device information of the television 202 and the mobile phone 203 for the first voice instruction, and selects one of the speaker 201, the television 202, and the mobile phone 203 as the answering device based on the audio quality information and device information of all three. Likewise, the television 202 receives the audio quality information and device information of the speaker 201 and the mobile phone 203, and selects one of the three devices as the answering device based on the information of all three. The mobile phone 203 receives the audio quality information and device information of the speaker 201 and the television 202, and selects one of the three devices as the answering device in the same way. Here, as an example, the speaker 201, the television 202, and the mobile phone 203 all determine that the speaker 201 is the answering device.
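Because every device ends up holding the same set of broadcast reports, all devices reach the same answer as long as each applies the same deterministic rule. The sketch below illustrates this with a single hypothetical quality figure (SNR in dB) per device and a tie-break on the device identifier; the application leaves the exact rule open, and the identifiers and values are invented for the example.

```python
def select_answering_device(reports):
    """reports maps a device identifier to the SNR (dB) it measured for the wake-up word.

    Every device runs this on the same reports, so all of them pick the same device.
    """
    # Iterating identifiers in sorted order makes ties resolve identically everywhere,
    # because max() returns the first maximal item it encounters.
    return max(sorted(reports), key=lambda device_id: reports[device_id])


# Hypothetical reports: each device's own measurement plus the two it received.
reports = {"speaker_201": 18.0, "television_202": 12.5, "mobile_phone_203": 15.0}
chosen_by_each = {device: select_answering_device(reports) for device in reports}
```

In this example every entry of `chosen_by_each` is `"speaker_201"`, mirroring the case in which all three devices determine that the speaker 201 is the answering device.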
For example, as shown in FIG. 4, the speaker 201 acts as the answering device and plays the wake-up response voice, such as "I'm here". The television 202 and the mobile phone 203 do not play the wake-up response voice, but, as described in step 402, their voice assistants are awake and they can record.
It should be noted that other information, such as the priority of each electronic device, may also be taken into account when selecting the answering device. In addition, answering device selection may be implemented in other ways, and the embodiments of this application are not limited to the manner described above. For example, the answering device from the user's previous use, or an answering device set by the user, may be used as the answering device in this embodiment.
Step 404: The speaker 201 issues a sound pickup instruction to the television 202 and to the mobile phone 203. The sound pickup instruction instructs the recipient to return its recording data.
After step 403, the speaker 201 starts to perform the distributed sound pickup task. The answering device can issue a sound pickup instruction to each of the non-answering devices; the sound pickup instruction instructs the non-answering device to return its recording data to the answering device.
Continuing the example from the preceding steps, the voice assistant of the speaker 201 can call the interface between the voice assistant of the television 202 and the voice assistant of the speaker 201 to pass the sound pickup instruction to the television 202. The voice assistant of the speaker 201 can call the interface between the voice assistant of the mobile phone 203 and the voice assistant of the speaker 201 to pass the sound pickup instruction to the mobile phone 203. The sound pickup instruction may carry identification information of the answering device, which may be the media access control (MAC) address of the answering device. For example, the sound pickup instruction may carry the identification information of the speaker 201 to instruct the television 202 to return its recording data to the speaker 201.
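A sound pickup instruction that carries the answering device's identifier might be modeled as below. The JSON encoding, the field names, and the MAC address are all assumptions for illustration; the application does not define a message format.

```python
import json


def build_pickup_instruction(answering_device_mac: str) -> str:
    """Built by the answering device; carries its MAC address as identification."""
    return json.dumps({"type": "pickup", "reply_to": answering_device_mac})


def handle_pickup_instruction(message: str, local_recording: bytes):
    """Run on a non-answering device; returns (destination, recording data) or None."""
    instruction = json.loads(message)
    if instruction.get("type") != "pickup":
        return None
    # The recording data is addressed to the device named in the instruction.
    return instruction["reply_to"], local_recording


# Hypothetical MAC address standing in for the speaker 201.
message = build_pickup_instruction("AA:BB:CC:DD:EE:01")
reply = handle_pickup_instruction(message, b"play song 112222")
```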
Step 405: The television 202 and the mobile phone 203 each send their recording data to the speaker 201.
The answering device receives the recording data sent by the non-answering devices. After sending its recording data, each non-answering device may continue recording and send new recording data to the answering device.
Continuing the example from the preceding steps, the television 202 sends its recording data to the speaker 201, and the mobile phone 203 sends its recording data to the speaker 201. The recording data may include the second voice instruction described above. For example, the content of the recording data is "play song 112222".
In one possible implementation, the speaker 201 evaluates the quality of the recording data received from the television 202 and determines the audio quality information corresponding to that recording data. The speaker 201 likewise evaluates the quality of the recording data received from the mobile phone 203 and determines the corresponding audio quality information.
In another possible implementation, the speaker 201 may also receive, from the television 202, the audio quality information corresponding to the recording data of the television 202, and receive, from the mobile phone 203, the audio quality information corresponding to the recording data of the mobile phone 203.
Step 406: Based on the audio quality information, the speaker 201 determines the optimal pickup device among the speaker 201, the television 202, and the mobile phone 203, and plays the response voice corresponding to the second voice instruction based on the recording data of the optimal pickup device.
Based on the audio quality information corresponding to the recording data of the multiple electronic devices (including itself and the non-answering devices), the answering device selects one optimal pickup device from among them. It then performs SE, ASR, and other processing on the recording data of the optimal pickup device to correctly recognize the voice instruction input by the user and respond to it accurately. Responding accurately includes playing the response voice corresponding to the voice instruction. In some embodiments, it may further include triggering the answering device or a non-answering device to execute the event corresponding to the voice instruction. The event may be playing a song, playing a video, making a phone call, and so on.
It should be noted that, in some embodiments, the speaker 201 may also send the recording data of the optimal pickup device to the server 204 shown in FIG. 1. The server 204 then performs SE, ASR, and other processing on that recording data to correctly recognize the voice instruction input by the user and respond to it accurately.
For example, as shown in FIG. 4 and FIG. 5, the user is closest to the mobile phone 203, but the user is using the hair dryer 205, whose noise degrades the pickup quality of the mobile phone 203. Based on the audio quality information of the recording data of the speaker 201, the television 202, and the mobile phone 203, the speaker 201 in this embodiment determines that the optimal pickup device among the three is the speaker 201. For example, as shown in FIG. 5, the speaker 201 may play the response voice "Song 112222 will be played for you here". The multimedia resource for song 112222 may be provided by the server 204 or the mobile phone 203.
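The selection in step 406 can be illustrated with a short sketch. It assumes that the audio quality information of each recording reduces to a single numeric score where higher is better (for instance a signal-to-noise ratio in dB). That is an assumption for illustration only, since the passage above does not fix the form of the quality information. The function name and the scores are hypothetical.

```python
def select_optimal_pickup_device(quality_by_device):
    """Return the device whose recording has the highest quality score.

    `quality_by_device` maps a device identifier to an assumed numeric
    quality score (higher is better). Ties go to the first device listed.
    """
    if not quality_by_device:
        raise ValueError("no audio quality information available")
    return max(quality_by_device, key=quality_by_device.get)


# Hair-dryer scenario: the phone is closest to the user but its recording is the noisiest.
quality = {"speaker-201": 18.5, "television-202": 12.0, "phone-203": 4.2}
best = select_optimal_pickup_device(quality)
```

With these made-up scores the speaker is chosen even though the phone is nearest, which mirrors the scenario of FIG. 4 and FIG. 5.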
Optionally, in another possible implementation, the speaker 201 may also play the response voice corresponding to the second voice instruction based on both its own recording data and the recording data of the optimal pickup device. For example, the speaker 201 may splice its own recording data with the recording data of the optimal pickup device and play the response voice corresponding to the second voice instruction based on the spliced recording data.
Optionally, after step 406, steps 404 to 406 may be performed again to process new recording data in a similar manner, so as to correctly recognize a new voice instruction input by the user and respond to it accurately.
Optionally, in some embodiments, the voice control method of the embodiments of this application may also process new recording data through the following steps.
Step 407: The speaker 201 sends a stop-recording instruction to the television 202 and to the mobile phone 203.
The answering device sends a stop-recording instruction to the non-answering devices. The stop-recording instruction instructs the receiving device to stop recording and discard its recording data.
Step 408: The television 202 and the mobile phone 203 each stop recording and discard their recording data.
The non-answering devices stop recording in response to the stop-recording instruction, which reduces power consumption.
For example, the speaker 201 sends a stop-recording instruction to the television 202 and to the mobile phone 203. The television 202 and the mobile phone 203 each stop recording and discard their recording data, for example the recording data corresponding to the second voice instruction. The speaker 201 alone then receives new voice instructions input by the user. Take the case where the third voice instruction spoken by the user is "change the song". The speaker 201 records the third voice instruction and generates recording data whose content is "change the song". The speaker 201 performs SE, ASR, and other processing on this recording data to correctly recognize the voice instruction input by the user and respond to it accurately. For example, the speaker 201 may play the response voice "OK, switching the song for you" and then play the new song.
It should be noted that this embodiment uses the case where the answering device and the optimal pickup device are both the speaker 201 as an example. The answering device and the optimal pickup device may be the same device or different devices; for example, the answering device may be the speaker 201 and the optimal pickup device may be the television 202. The embodiments of this application are not limited to the above example. When the answering device and the optimal pickup device are different devices, the answering device may retrieve the recording data of the optimal pickup device.
In some embodiments, when the voice instruction received by the answering device is an instruction to turn off the voice assistant, the answering device may stop retrieving recording data from the non-answering devices, then stop its own distributed sound pickup task and discard the recording data.
In the embodiments of this application, when multiple electronic devices each receive the first voice instruction input by the user, each of them wakes up its own voice assistant and starts recording. The first voice instruction is used to wake up the voice assistant of an electronic device. After the electronic devices negotiate and determine the answering device, the answering device may determine the optimal pickup device based on the recording data of each electronic device and play the response voice corresponding to the second voice instruction based on the recording data of the optimal pickup device. Unlike approaches in which recording starts only after a central device invokes it, in this embodiment each electronic device starts recording directly after being woken up and no longer depends on invocation by a central device, which yields a decentralized, cooperative pickup scheme. Because recording has already started before the answering device is determined, and that recording data is used for SE, ASR, and other processing, the communication delay between devices is effectively eliminated, which solves the frame-loss problem that such delay causes for voice control in multi-device scenarios.
Performing SE, ASR, and other processing on the recording data of the optimal pickup device allows the voice instruction input by the user to be recognized correctly and responded to accurately, which improves the accuracy of voice control.
Combining the wake-up and pickup processes lets audio recording start earlier, so each electronic device can evaluate the quality of its own recording data. This speeds up audio evaluation on the devices, shortens the time needed for the subsequent decision on the optimal pickup device, accelerates the processing flow of the voice control method, and improves the response speed of voice control.
It should be noted that the embodiment of FIG. 3 uses the case where a wake-up word wakes the voice assistant and starts recording as an example, and the embodiments of this application are not limited to it. The wake-up process may be omitted: recording on the electronic devices may be triggered in other ways, and the accuracy of voice control is still improved through multi-device cooperative pickup. For example, recording may be triggered when an electronic device detects a human voice, or detects the voice of a specific user; the embodiments of this application do not enumerate every such case. The implementation of the voice control method when recording is triggered without the wake-up process is similar to the embodiment shown in FIG. 3. For example, after recording starts, the answering device invokes the pickup instruction, the non-answering devices return their recording data, and the answering device determines the optimal pickup device based on the recording data of each electronic device and plays the response voice corresponding to the second voice instruction based on the recording data of the optimal pickup device. For the implementation principle and technical effect, refer to the explanation of the above embodiments.
FIG. 6 is a schematic flowchart of another voice control method provided by an embodiment of this application. This embodiment uses the three electronic devices shown in FIG. 1 (the speaker 201, the television 202, and the mobile phone 203), with the speaker 201 as the answering device, as an example. It covers invocations other than the first one after the electronic devices wake up, for example the second, third, or fourth invocation in a multi-round dialogue with the voice assistant. As shown in FIG. 6, the method of this embodiment may include:
Step 701: The speaker 201 invokes a multi-round dialogue pause instruction on the television 202 and on the mobile phone 203. The multi-round dialogue pause instruction indicates that the multi-round dialogue is temporarily stopped.
The answering device detects no new voice instruction from the user within a preset time period; that is, there is a time gap between the user's voice instructions. On detecting this gap, the answering device triggers the multi-round dialogue pause operation. The answering device may invoke a multi-round dialogue pause instruction on each non-answering device; the instruction indicates that the multi-round dialogue is temporarily stopped.
For example, the voice assistant of the speaker 201 may call the interface between the voice assistant of the television 202 and the voice assistant of the speaker 201 to pass the multi-round dialogue pause instruction to the television 202. Likewise, the voice assistant of the speaker 201 may call the interface between the voice assistant of the mobile phone 203 and the voice assistant of the speaker 201 to pass the multi-round dialogue pause instruction to the mobile phone 203. The speaker 201 deletes its previously saved recording data and keeps recording.
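The trigger for step 701 (no new voice instruction within a preset time period) can be sketched as a simple idle timer. This is a minimal sketch under stated assumptions: the `DialogueIdleTimer` class, its method names, and the injectable `clock` parameter are hypothetical, and the length of the preset period is not given in the text.

```python
import time


class DialogueIdleTimer:
    """Tracks how long it has been since the last voice instruction."""

    def __init__(self, preset_period_s, clock=time.monotonic):
        self._period = preset_period_s   # assumed preset time period, in seconds
        self._clock = clock              # injectable so the timer can be tested
        self._last_command = clock()

    def note_command(self):
        """Call whenever a new voice instruction is detected."""
        self._last_command = self._clock()

    def should_pause(self):
        """True once the preset period has elapsed with no new instruction."""
        return self._clock() - self._last_command >= self._period
```

When `should_pause()` becomes true, the answering device would invoke the multi-round dialogue pause instruction on each non-answering device. A monotonic clock is used so that wall-clock adjustments do not cause a spurious pause.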
Step 702: The television 202 and the mobile phone 203 each delete their saved recording data and keep recording.
The television 202 and the mobile phone 203 each delete the recording data captured before the multi-round dialogue pause instruction was invoked, and keep recording.
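The two instructions a non-answering device can receive behave differently: the multi-round dialogue pause instruction (step 702) drops the saved data but keeps recording, whereas the stop-recording instruction (step 408) ends recording altogether. The sketch below contrasts them. The `RecordingBuffer` class and its method names are hypothetical, and audio frames are represented as opaque byte strings.

```python
class RecordingBuffer:
    """Hypothetical per-device store for captured audio frames."""

    def __init__(self):
        self.recording = False
        self.frames = []

    def start(self):
        self.recording = True

    def append(self, frame):
        # Frames arriving while recording is stopped are ignored.
        if self.recording:
            self.frames.append(frame)

    def on_multi_round_pause(self):
        """Pause instruction: discard saved data, keep recording."""
        self.frames.clear()

    def on_stop_recording(self):
        """Stop-recording instruction: stop recording and discard data."""
        self.recording = False
        self.frames.clear()


buf = RecordingBuffer()
buf.start()
buf.append(b"frame-before-pause")
buf.on_multi_round_pause()         # old data dropped, still recording
buf.append(b"frame-after-pause")   # the next instruction is captured from a clean buffer
```

Keeping the recorder running across a pause means the next voice instruction is captured without waiting for a new invocation from the answering device.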
Step 703: The speaker 201, the television 202, and the mobile phone 203 each receive the fourth voice instruction input by the user, record it, and generate their own recording data.
In some embodiments, the speaker 201, the television 202, and the mobile phone 203 may each also evaluate the quality of their own recording data and determine the corresponding audio quality information.
For example, as shown in FIG. 7, take the case where the fourth voice instruction spoken by the user is "play movie 333333". The speaker 201, the television 202, and the mobile phone 203 each record the fourth voice instruction and generate their own recording data, whose content is "play movie 333333".
Step 704: The speaker 201 invokes a pickup instruction on the television 202 and on the mobile phone 203. The pickup instruction instructs the receiving device to return its recording data.
After step 703, the speaker 201 restarts the distributed sound pickup task. The answering device may invoke a pickup instruction on each non-answering device; the instruction tells the non-answering device to return its recording data to the answering device.
Step 705: The television 202 and the mobile phone 203 each send their recording data to the speaker 201.
Continuing the example from the preceding steps, the television 202 sends its recording data to the speaker 201, and the mobile phone 203 sends its recording data to the speaker 201. For example, the content of the recording data is "play movie 333333".
Step 706: Based on the audio quality information, the speaker 201 determines the optimal pickup device among the speaker 201, the television 202, and the mobile phone 203, and responds to the fourth voice instruction based on the recording data of the optimal pickup device.
Based on the audio quality information corresponding to the recording data of the multiple electronic devices (including itself and the non-answering devices), the answering device selects one optimal pickup device from among them. It then performs SE, ASR, and other processing on the recording data of the optimal pickup device to correctly recognize the voice instruction input by the user and respond to it accurately. Responding accurately includes playing the response voice corresponding to the voice instruction. In some embodiments, it may further include triggering the answering device or a non-answering device to execute the event corresponding to the voice instruction. The event may be playing a song, playing a video, making a phone call, and so on.
For example, as shown in FIG. 7, based on the audio quality information of the recording data of the speaker 201, the television 202, and the mobile phone 203, the speaker 201 in this embodiment determines that the optimal pickup device among the three is the speaker 201. For example, as shown in FIG. 7, the speaker 201 may play the response voice "Movie 333333 will be played on the television", and the television 202 starts playing movie 333333.
If the user later triggers a multi-round dialogue pause again, steps 701 to 706 may be performed again. The optimal pickup device may change in the process. For example, continuing the example of FIG. 7, suppose that after the television starts playing the movie, the fifth voice instruction spoken by the user is "turn the volume down". The speaker 201, the television 202, and the mobile phone 203 each record the fifth voice instruction and generate their own recording data, whose content is "turn the volume down". Then, through the flow described in the above steps, the television 202 is determined to be the optimal pickup device among the speaker 201, the television 202, and the mobile phone 203. The speaker 201 may respond to the fifth voice instruction based on the recording data of the television 202. In this way, when the user's environment changes, this embodiment can select a different device for pickup according to the recording quality. For example, after the television 202 starts playing the movie, strong self-generated noise appears in the user's home (such as the sound of the movie itself), and the audio picked up by the voice assistant of the speaker 201 is mixed with speech played by the television. Using the recording data of the speaker 201 would then cause ASR recognition errors. By dynamically invoking the television for pickup and performing echo cancellation, the voice control method of this embodiment can improve ASR recognition accuracy, respond accurately to the voice instruction input by the user, and improve the accuracy of voice control.
It should be noted that the embodiments shown in FIG. 3 and FIG. 6 use the case where the answering device selects the optimal pickup device based on the audio quality information and responds to the second voice instruction based on the recording data of the optimal pickup device as an example. Other processing is also possible. For example, the answering device may respond to the second voice instruction directly based on the received recording data, or based on the received recording data together with its own recording data. In the latter case, the answering device may splice the audio content information of the received recording data with the audio content information of its own recording data and respond to the second voice instruction based on the spliced audio content information. For example, the user speaks the voice signal "play song 112222", but the answering device recognizes only "2222", so the audio content information of its own recording data represents the voice signal "2222". The audio content information of the recording data it receives from another device represents the voice signal "play song 112". The answering device may splice the two to obtain audio content information that represents the voice signal "play song 112222".
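The splicing example above can be sketched on recognized text. Note that plain concatenation of "play song 112" and "2222" would give seven digits, so the example implies that the two fragments overlap by one character at the boundary. The sketch therefore merges the longest suffix of the earlier fragment that is also a prefix of the later one. This overlap heuristic, the function name, and the use of text rather than audio samples are all assumptions made for illustration; the text does not specify how splicing is performed.

```python
def splice_transcripts(earlier, later):
    """Join two text fragments, merging the longest suffix/prefix overlap.

    If the end of `earlier` repeats the start of `later`, the repeated
    part is kept only once; otherwise the fragments are concatenated.
    """
    max_overlap = min(len(earlier), len(later))
    for size in range(max_overlap, 0, -1):
        if earlier.endswith(later[:size]):
            return earlier + later[size:]
    return earlier + later


# Fragment heard by another device, then the fragment heard by the answering device.
spliced = splice_transcripts("play song 112", "2222")
print(spliced)  # prints "play song 112222"
```

A heuristic of this kind cannot tell a genuine overlap from a coincidental repeat, so a real system would more likely align the fragments using timestamps.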
FIG. 8 is a schematic structural diagram of a voice control apparatus according to an embodiment of this application. As shown in FIG. 8, the apparatus may be applied to an electronic device of a voice control system (such as the first electronic device 201 described above). The voice control system may further include at least a second electronic device (such as the second electronic device 202 or the third electronic device 203). The apparatus may include a transceiver module 81 and a processing module 82. For example, the transceiver module 81 may be the mobile communication module 150 and/or the wireless communication module 160 of the embodiment shown in FIG. 2, and the processing module 82 may be the processor 110 of the embodiment shown in FIG. 2.
The transceiver module 81 is configured to receive the first voice instruction input by the user, and the processing module 82 is configured to respond to the first voice instruction. The transceiver module 81 is further configured to receive the recording data of the second electronic device sent by the second electronic device, where that recording data includes the second electronic device's recording of the second voice instruction input by the user. The processing module 82 is further configured to respond to the second voice instruction based on the recording data of the first electronic device and/or the recording data of the second electronic device, where the recording data of the first electronic device includes the first electronic device's recording of the second voice instruction input by the user.
In some embodiments, the transceiver module 81 is further configured to invoke a pickup instruction on the second electronic device, where the pickup instruction is used to have the second electronic device return its recording data.
In some embodiments, the processing module 82 is further configured to record when or after the first electronic device receives the first voice instruction input by the user, where the recording is used to capture the second voice instruction input by the user.
In some embodiments, the first voice instruction is used to wake up the voice control function of the first electronic device and/or the second electronic device.
In some embodiments, the processing module 82 is further configured to determine that the first electronic device is the answering device of the voice control system, based on the audio quality information of the first voice instruction as received by the first electronic device and the audio quality information of the first voice instruction as received by the second electronic device.
In some embodiments, the processing module 82 is further configured to, after the first electronic device responds to the first voice instruction and before the second voice instruction input by the user is recorded, delete the saved recording data and continue recording if no second voice instruction input by the user is detected within a preset time period while the first electronic device is recording. The transceiver module 81 is further configured to invoke a multi-round dialogue pause instruction on the second electronic device, where the multi-round dialogue pause instruction indicates that the multi-round dialogue is temporarily stopped.
In some embodiments, the transceiver module 81 is further configured to receive, from the second electronic device, the audio quality information of the recording data of the second electronic device.
In some embodiments, the processing module 82 is configured to determine the optimal pickup device in the voice control system based on the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device. When the optimal pickup device is the first electronic device, the processing module 82 responds to the second voice instruction based on the recording data of the first electronic device. When the optimal pickup device is the second electronic device, it responds to the second voice instruction based on the recording data of the second electronic device, or based on the recording data of both the second electronic device and the first electronic device. The audio quality information indicates the audio quality of the recording data.
In some embodiments, the processing module 82 is configured to respond to the second voice instruction based on the audio content information of the recording data of the first electronic device and/or the audio content information of the recording data of the second electronic device. The audio content information represents the audio content of the recording data.
The voice control apparatus of this embodiment may be used to perform the steps of the answering device (such as the speaker 201) in the above method embodiments. For its technical principle and technical effect, refer to the explanation of the above method embodiments; details are not repeated here.
FIG. 9 is a schematic structural diagram of a voice control apparatus according to an embodiment of this application. As shown in FIG. 9, the apparatus may be applied to an electronic device of a voice control system (such as the second electronic device 202 or the third electronic device 203). The voice control system may further include at least a first electronic device (such as the first electronic device 201). The apparatus may include a transceiver module 91 and a processing module 92. For example, the transceiver module 91 may be the mobile communication module 150 and/or the wireless communication module 160 of the embodiment shown in FIG. 2, and the processing module 92 may be the processor 110 of the embodiment shown in FIG. 2.
处理模块92用于录音,并保存录音数据,录音用于录制用户输入的第二语音指令。收发模块91用于向第一电子设备发送第二电子设备的录音数据,第二电子设备的录音数据包括第二电子设备录制用户输入的第二语音指令的录音数据,录音数据用于第一电子设备在应答第一语音指令之后,应答第二语音指令。The processing module 92 is used for recording and saving the recording data, and the recording is used for recording the second voice instruction input by the user. The transceiver module 91 is used for sending the recording data of the second electronic device to the first electronic device, the recording data of the second electronic device includes the recording data of the second electronic device recording the second voice command input by the user, and the recording data is used for the first electronic device. After responding to the first voice command, the device responds to the second voice command.
In a possible design, the transceiver module 91 is further configured to receive a sound pickup instruction invoked by the first electronic device, where the sound pickup instruction instructs the second electronic device to return the recording data of the second electronic device.
In a possible design, the processing module 92 is configured to record when or after the second electronic device receives the first voice instruction input by the user.
In a possible design, the processing module 92 is further configured to determine, according to the audio quality information of the first voice instruction received by the second electronic device and the audio quality information of the first voice instruction received by the first electronic device, that the first electronic device is the answering device of the voice control system.
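Because each device makes this determination on its own, every device has to reach the same result from the same quality information. A minimal sketch of one way to do that is shown below; the function name, the scalar quality score (higher is better), and the tie-break on the device identifier are assumptions for illustration, not details given in the text.

```python
def elect_answering_device(quality_by_device: dict[str, float]) -> str:
    """Pick the answering device from per-device quality scores.

    quality_by_device maps a hypothetical device identifier to the assumed
    audio quality score of the first voice instruction as that device heard it.
    """
    if not quality_by_device:
        raise ValueError("at least one device is required")
    # Highest score wins. Equal scores fall back to the smallest identifier,
    # so that every device running this function elects the same device.
    return min(quality_by_device, key=lambda dev: (-quality_by_device[dev], dev))
```

A device that is not elected would then act as a non-answering device: it keeps recording and returns its recording data when asked.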
In a possible design, the processing module 92 is further configured to, after the first electronic device responds to the first voice instruction and while the second electronic device is recording, receive through the transceiver module 91 a multi-round dialogue pause instruction invoked by the second electronic device, where the multi-round dialogue pause instruction indicates that the multi-round dialogue is temporarily stopped. The processing module 92 is further configured to delete the saved recording data and continue recording.
In a possible design, the transceiver module 91 is further configured to send the audio quality information of the recording data of the second electronic device to the first electronic device.
The voice control apparatus of this embodiment of the present application can be used to perform the steps of any non-answering device (for example, the television 202 or the mobile phone 203) in the above method embodiments. For its technical principles and technical effects, refer to the explanation of the above method embodiments; details are not repeated here.
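The recording behaviour of the non-answering device in the designs above can be sketched as a small buffer with two handlers. The class and method names are hypothetical, audio is modelled as byte chunks pushed in by some capture loop that is not shown, and transport between devices is left out entirely.

```python
class PickupRecorder:
    """Hypothetical recording buffer of a non-answering device."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []
        self.recording = False

    def start(self) -> None:
        # Called when or after the first voice instruction is received.
        self.recording = True

    def on_audio(self, chunk: bytes) -> None:
        # Save audio only while recording is active.
        if self.recording:
            self._chunks.append(chunk)

    def on_pickup_instruction(self) -> bytes:
        # Return the saved recording data to the answering device.
        return b"".join(self._chunks)

    def on_pause_instruction(self) -> None:
        # Multi-round dialogue pause: delete the saved recording data.
        # The recording flag is left unchanged, so recording continues.
        self._chunks.clear()
```

Clearing the buffer without stopping the recorder means stale audio is discarded, while a later second voice instruction is still captured from its start.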
Other embodiments of the present application further provide an electronic device configured to perform the methods of the electronic device in the above method embodiments. As shown in FIG. 10, the electronic device may include a microphone 1001, one or more processors 1002, and one or more memories 1003, which may be connected through one or more communication buses 1005. The memory 1003 stores one or more computer programs 1004, and the one or more processors 1002 are configured to execute the one or more computer programs 1004. The one or more computer programs 1004 include instructions that can be used to perform the steps performed by any electronic device in the above method embodiments. The electronic device may be any of the forms of electronic device described above, for example, a smartphone or a smart watch.
Of course, the electronic device shown in FIG. 10 may also include other components such as a display screen, which is not limited in the embodiments of the present application. When it includes other components, it may specifically be the electronic device shown in FIG. 2.
The electronic device of this embodiment of the present application can be used to perform the steps of the electronic device in any of the above method embodiments. For its technical principles and technical effects, refer to the explanation of the above method embodiments; details are not repeated here.
Other embodiments of the present application further provide a computer storage medium. The computer storage medium may include computer instructions that, when run on an electronic device, cause the electronic device to perform the steps performed by the electronic device in the above method embodiments.
Other embodiments of the present application further provide a computer program product that, when run on a computer, causes the computer to perform the steps performed by the electronic device in the above method embodiments.
An embodiment of the present application further provides a voice control system. The voice control system may include at least a first electronic device and a second electronic device, where the first electronic device may adopt the structure of the embodiment shown in FIG. 8 or FIG. 10, and the second electronic device may adopt the structure of the embodiment shown in FIG. 9 or FIG. 10. Correspondingly, they can carry out the technical solution of any of the above method embodiments. The implementation principles and technical effects are similar and are not repeated here.
From the description of the above embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and a component shown as a unit may be one physical unit or multiple physical units, that is, it may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The processor mentioned in the above embodiments may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware encoding processor, or performed by a combination of hardware and software modules in the encoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
The memory mentioned in the above embodiments may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and a component shown as a unit may or may not be a physical unit, that is, it may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (29)

  1. A voice control method, characterized in that it is applied to a voice control system, the voice control system comprising at least a first electronic device and a second electronic device that have a voice control function, the method comprising:
    the first electronic device and the second electronic device each receiving a first voice instruction input by a user, and the first electronic device responding to the first voice instruction;
    the second electronic device recording and saving recording data, the recording being used to record a second voice instruction input by the user;
    the second electronic device sending the recording data of the second electronic device to the first electronic device; and
    the first electronic device responding to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device;
    wherein the recording data of the first electronic device comprises recording data in which the first electronic device recorded the second voice instruction input by the user.
  2. The method according to claim 1, characterized in that the method further comprises:
    the first electronic device invoking a sound pickup instruction to the second electronic device, the sound pickup instruction being used for the second electronic device to return the recording data of the second electronic device.
  3. The method according to claim 1 or 2, characterized in that the second electronic device recording comprises:
    the second electronic device recording when or after the second electronic device receives the first voice instruction input by the user.
  4. The method according to claim 3, characterized in that the method further comprises:
    the first electronic device recording when or after the first electronic device receives the first voice instruction input by the user, the recording being used to record the second voice instruction input by the user.
  5. The method according to any one of claims 1 to 4, characterized in that the first voice instruction is used to wake up the voice control function of the first electronic device and/or the second electronic device.
  6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    the first electronic device and the second electronic device each determining, according to the audio quality information of the first voice instruction that each of them received, that the first electronic device is the answering device of the voice control system.
  7. The method according to any one of claims 1 to 6, characterized in that after the first electronic device responds to the first voice instruction and before the second voice instruction input by the user is recorded, the method further comprises:
    during recording by the first electronic device and the second electronic device, the first electronic device not detecting the second voice instruction input by the user within a preset time period, and the first electronic device deleting the saved recording data and continuing to record; the first electronic device invoking a multi-round dialogue pause instruction to the second electronic device, the multi-round dialogue pause instruction being used to indicate that the multi-round dialogue is temporarily stopped; and the second electronic device deleting the saved recording data and continuing to record.
  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    the first electronic device receiving the audio quality information of the recording data of the second electronic device sent by the second electronic device.
  9. The method according to any one of claims 1 to 8, characterized in that the first electronic device responding to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device comprises:
    the first electronic device determining an optimal sound pickup device in the voice control system according to the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device;
    when the optimal sound pickup device is the first electronic device, the first electronic device responding to the second voice instruction according to the recording data of the first electronic device, or according to the recording data of the first electronic device and the recording data of the second electronic device; and
    when the optimal sound pickup device is the second electronic device, the first electronic device responding to the second voice instruction according to the recording data of the second electronic device, or according to the recording data of the second electronic device and the recording data of the first electronic device;
    wherein the audio quality information is used to indicate the audio quality of the recording data.
  10. The method according to any one of claims 1 to 8, characterized in that the first electronic device responding to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device comprises:
    the first electronic device responding to the second voice instruction according to the audio content information of the recording data of the first electronic device and/or the audio content information of the recording data of the second electronic device;
    wherein the audio content information is used to represent the audio content of the recording data.
  11. A voice control method, characterized in that it is applied to a first electronic device of a voice control system, the voice control system further comprising at least a second electronic device, the method comprising:
    the first electronic device receiving a first voice instruction input by a user, and the first electronic device responding to the first voice instruction;
    the first electronic device receiving the recording data of the second electronic device sent by the second electronic device, the recording data of the second electronic device comprising recording data in which the second electronic device recorded a second voice instruction input by the user; and
    the first electronic device responding to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device, the recording data of the first electronic device comprising recording data in which the first electronic device recorded the second voice instruction input by the user.
  12. The method according to claim 11, characterized in that the method further comprises:
    the first electronic device invoking a sound pickup instruction to the second electronic device, the sound pickup instruction being used for the second electronic device to return the recording data of the second electronic device.
  13. The method according to claim 12, characterized in that the method further comprises:
    the first electronic device recording when or after the first electronic device receives the first voice instruction input by the user, the recording being used to record the second voice instruction input by the user.
  14. The method according to any one of claims 11 to 13, characterized in that the first voice instruction is used to wake up the voice control function of the first electronic device and/or the second electronic device.
  15. The method according to any one of claims 11 to 14, characterized in that the method further comprises:
    the first electronic device determining, according to the audio quality information of the first voice instruction received by the first electronic device and the audio quality information of the first voice instruction received by the second electronic device, that the first electronic device is the answering device of the voice control system.
  16. The method according to any one of claims 11 to 15, characterized in that after the first electronic device responds to the first voice instruction and before the second voice instruction input by the user is recorded, the method further comprises:
    during recording by the first electronic device, the first electronic device not detecting the second voice instruction input by the user within a preset time period, and the first electronic device deleting the saved recording data and continuing to record; and the first electronic device invoking a multi-round dialogue pause instruction to the second electronic device, the multi-round dialogue pause instruction being used to indicate that the multi-round dialogue is temporarily stopped.
  17. The method according to any one of claims 11 to 16, characterized in that the method further comprises:
    the first electronic device receiving the audio quality information of the recording data of the second electronic device sent by the second electronic device.
  18. The method according to any one of claims 11 to 17, characterized in that the first electronic device responding to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device comprises:
    the first electronic device determining an optimal sound pickup device in the voice control system according to the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device;
    when the optimal sound pickup device is the first electronic device, the first electronic device responding to the second voice instruction according to the recording data of the first electronic device; and
    when the optimal sound pickup device is the second electronic device, the first electronic device responding to the second voice instruction according to the recording data of the second electronic device, or according to the recording data of the second electronic device and the recording data of the first electronic device;
    wherein the audio quality information is used to indicate the audio quality of the recording data.
  19. The method according to any one of claims 11 to 17, characterized in that the first electronic device responding to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device comprises:
    the first electronic device responding to the second voice instruction according to the audio content information of the recording data of the first electronic device and/or the audio content information of the recording data of the second electronic device;
    wherein the audio content information is used to represent the audio content of the recording data.
  20. A voice control method, characterized in that it is applied to a second electronic device of a voice control system, the voice control system further comprising at least a first electronic device, the method comprising:
    the second electronic device recording and saving recording data, the recording being used to record a second voice instruction input by a user; and
    the second electronic device sending the recording data of the second electronic device to the first electronic device, the recording data of the second electronic device comprising recording data in which the second electronic device recorded the second voice instruction input by the user, the recording data being used for the first electronic device to respond to the second voice instruction after responding to a first voice instruction.
  21. The method according to claim 20, characterized in that the method further comprises:
    the second electronic device receiving a sound pickup instruction invoked by the first electronic device, the sound pickup instruction being used for the second electronic device to return the recording data of the second electronic device.
  22. The method according to claim 20 or 21, characterized in that the second electronic device recording comprises:
    the second electronic device recording when or after the second electronic device receives the first voice instruction input by the user.
  23. The method according to any one of claims 20 to 22, characterized in that the method further comprises:
    the second electronic device determining, according to the audio quality information of the first voice instruction received by the second electronic device and the audio quality information of the first voice instruction received by the first electronic device, that the first electronic device is the answering device of the voice control system.
  24. The method according to any one of claims 20 to 23, characterized in that after the first electronic device responds to the first voice instruction, the method further comprises:
    during recording by the second electronic device, the second electronic device receiving a multi-round dialogue pause instruction invoked by the second electronic device, the multi-round dialogue pause instruction being used to indicate that the multi-round dialogue is temporarily stopped; and the second electronic device deleting the saved recording data and continuing to record.
  25. The method according to any one of claims 20 to 24, characterized in that the method further comprises:
    the second electronic device sending the audio quality information of the recording data of the second electronic device to the first electronic device.
  26. 一种电子设备,其特征在于,包括:一个或多个处理器和存储器;An electronic device, comprising: one or more processors and memories;
    所述存储器与所述一个或多个处理器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,当所述一个或多个处理器执行所述计算机指令时,所述电子设备执行如权利要求11至19中任一项所述的语音控制方法,或者,所述电子设备执行如权利要求20至25任一项所述的语音控制方法。The memory is coupled to the one or more processors for storing computer program code comprising computer instructions that, when executed by the one or more processors, The electronic device executes the voice control method according to any one of claims 11 to 19, or the electronic device executes the voice control method according to any one of claims 20 to 25.
  27. 一种计算机存储介质,其特征在于,包括计算机指令,当所述计算机指令在电子设备上运行时,使得所述电子设备执行如权利要求11至19中任一项所述的语音控制方法,或者,使得所述电子设备执行如权利要求20至25任一项所述的语音控制方法。A computer storage medium, characterized by comprising computer instructions, which, when the computer instructions are executed on an electronic device, cause the electronic device to execute the voice control method according to any one of claims 11 to 19, or , so that the electronic device executes the voice control method according to any one of claims 20 to 25.
  28. 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求11至19中任一项所述的语音控制方法,或者,使得所述计算机执行如权利要求20至25任一项所述的语音控制方法。A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to execute the voice control method according to any one of claims 11 to 19, or, the computer is caused to A voice control method as claimed in any one of claims 20 to 25 is performed.
  29. 一种语音控制系统,其特征在于,所述语音控制系统至少包括具备语音控制功能的第一电子设备和第二电子设备,所述语音控制系统用于执行如权利要求1至10任一项所述的语音控制方法。A voice control system, characterized in that, the voice control system at least includes a first electronic device and a second electronic device with a voice control function, and the voice control system is used to perform the method as claimed in any one of claims 1 to 10. described voice control method.
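
The second-device behaviour recited in claims 21 to 24 can be illustrated with a minimal sketch. It is not the applicant's implementation: the class `SecondDevice`, its method names, the byte-string audio frames, and the rule that the higher quality score wins are all assumptions introduced here for illustration only. The claims do not specify how audio quality is measured or compared.

```python
from dataclasses import dataclass, field


@dataclass
class SecondDevice:
    """Hypothetical model of the second electronic device (claims 21 to 24)."""
    name: str
    recording: bool = False
    saved_frames: list = field(default_factory=list)

    def on_first_voice_instruction(self) -> None:
        # Claim 22: start recording when or after the first voice instruction arrives.
        self.recording = True

    def capture(self, frame: bytes) -> None:
        # Audio is saved only while recording is active.
        if self.recording:
            self.saved_frames.append(frame)

    def on_dialogue_pause(self) -> None:
        # Claim 24: delete the saved recording data and continue recording.
        self.saved_frames.clear()

    def on_pickup_instruction(self) -> bytes:
        # Claim 21: return this device's recording data to the caller.
        return b"".join(self.saved_frames)


def choose_answering_device(quality_by_device: dict) -> str:
    """Claim 23 (sketch): pick the answering device from audio quality scores.

    Assumption: a larger score means better audio quality.
    """
    return max(quality_by_device, key=quality_by_device.get)


if __name__ == "__main__":
    speaker = SecondDevice("speaker")
    speaker.on_first_voice_instruction()
    speaker.capture(b"old")
    speaker.on_dialogue_pause()      # earlier audio is dropped
    speaker.capture(b"new")          # recording continues
    print(speaker.on_pickup_instruction())
    print(choose_answering_device({"phone": 0.9, "speaker": 0.4}))
```

In this sketch the pause instruction clears the buffer without stopping capture, so a later sound pickup instruction returns only audio recorded after the pause.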
PCT/CN2021/142083 2021-01-29 2021-12-28 Speech control method, and electronic device WO2022161077A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110130831.0 2021-01-29
CN202110130831.0A CN114822525A (en) 2021-01-29 2021-01-29 Voice control method and electronic equipment

Publications (1)

Publication Number Publication Date
WO2022161077A1 (en) 2022-08-04

Family

ID=82526078

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142083 WO2022161077A1 (en) 2021-01-29 2021-12-28 Speech control method, and electronic device

Country Status (2)

Country Link
CN (1) CN114822525A (en)
WO (1) WO2022161077A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117873418A (en) * 2022-10-11 2024-04-12 华为技术有限公司 Recording control method, electronic equipment and medium
CN116682465B (en) * 2022-10-31 2024-09-27 荣耀终端有限公司 Method for recording content and electronic equipment
CN117707404A (en) * 2023-05-31 2024-03-15 荣耀终端有限公司 Scene processing method, electronic device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160148615A1 (en) * 2014-11-26 2016-05-26 Samsung Electronics Co., Ltd. Method and electronic device for voice recognition
CN107622652A (en) * 2016-07-15 2018-01-23 青岛海尔智能技术研发有限公司 The sound control method and appliance control system of appliance system
CN108228699A (en) * 2016-12-22 2018-06-29 谷歌有限责任公司 Collaborative phonetic controller
US20180228006A1 (en) * 2017-02-07 2018-08-09 Lutron Electronics Co., Inc. Audio-Based Load Control System
CN111326151A (en) * 2018-12-14 2020-06-23 上海诺基亚贝尔股份有限公司 Apparatus, method and computer-readable medium for voice interaction
CN111369994A (en) * 2020-03-16 2020-07-03 维沃移动通信有限公司 Voice processing method and electronic equipment
CN112002319A (en) * 2020-08-05 2020-11-27 海尔优家智能科技(北京)有限公司 Voice recognition method and device of intelligent equipment

Also Published As

Publication number Publication date
CN114822525A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
WO2021000876A1 (en) Voice control method, electronic equipment and system
CN110784830B (en) Data processing method, Bluetooth module, electronic device and readable storage medium
US11843716B2 (en) Translation method and electronic device
JP2022541207A (en) Voice activated method and electronic device
CN111369988A (en) Voice awakening method and electronic equipment
WO2022161077A1 (en) Speech control method, and electronic device
EP3826280B1 (en) Method for generating speech control command, and terminal
WO2020073288A1 (en) Method for triggering electronic device to execute function and electronic device
CN112119641B (en) Method and device for realizing automatic translation through multiple TWS (true wireless stereo) earphones connected in forwarding mode
WO2021052139A1 (en) Gesture input method and electronic device
WO2022022319A1 (en) Image processing method, electronic device, image processing system and chip system
WO2021000817A1 (en) Ambient sound processing method and related device
WO2020051852A1 (en) Method for recording and displaying information in communication process, and terminals
CN115589051B (en) Charging method and terminal equipment
CN113728295A (en) Screen control method, device, equipment and storage medium
US12136896B2 (en) Method and apparatus for adjusting vibration waveform of linear motor
WO2020078267A1 (en) Method and device for voice data processing in online translation process
CN114120987B (en) Voice wake-up method, electronic equipment and chip system
WO2022007757A1 (en) Cross-device voiceprint registration method, electronic device and storage medium
CN113380240B (en) Voice interaction method and electronic equipment
CN115731923A (en) Command word response method, control equipment and device
WO2023216922A1 (en) Target device selection identification method, terminal device, system and storage medium
WO2022143048A1 (en) Dialogue task management method and apparatus, and electronic device
WO2024055881A1 (en) Clock synchronization method, electronic device, system, and storage medium
WO2024114493A1 (en) Human-machine interaction method and apparatus

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21922666; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 21922666; Country of ref document: EP; Kind code of ref document: A1)