CN110827818A - Control method, device, equipment and storage medium of intelligent voice equipment - Google Patents
- Publication number
- CN110827818A (application number CN201911138882.7A)
- Authority
- CN
- China
- Prior art keywords
- intelligent voice
- user
- intelligent
- voice
- voice device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention provides a control method and apparatus for an intelligent voice device, an electronic device, and a storage medium. The method comprises: receiving a voice signal of a user in the space where a first intelligent voice device is located; performing perception processing on the space according to the voice signal to determine the intelligent voice devices in the space and the positional relationship between each intelligent voice device and the user; and, when the space further contains at least one second intelligent voice device, determining, from the first intelligent voice device and the at least one second intelligent voice device and according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space, and triggering the target intelligent voice device into the awake state to respond to the user's voice signal. The method and apparatus enable an intelligent response to the user's voice signal in a complex environment containing multiple intelligent voice devices, thereby improving the user experience.
Description
Technical Field
The present invention relates to artificial intelligence technology, and in particular to a control method and apparatus for an intelligent voice device, an electronic device, and a storage medium.
Background
Artificial Intelligence (AI) is a comprehensive branch of computer science that studies the design principles and implementation methods of intelligent machines so that machines can perceive, reason, and make decisions. AI is a broad discipline spanning many fields, such as natural language processing and machine learning/deep learning; as the technology develops, it will be applied in ever more fields and deliver increasing value.
With the development of computer technology, the intelligent voice device has become one of the important applications of artificial intelligence. Through intelligent conversation and instant question answering, it can interact naturally with users and help them solve various problems: it can answer a user's questions and also fulfill a user's requests. For example, if the user asks to play song XX, the intelligent voice device plays song XX for the user.
However, as more and more intelligent voice devices connect to a Voice Service (VS), multiple intelligent voice devices often coexist in the same scene (the same home or even the same room). In this situation, if a user wakes up an intelligent voice device and issues a voice request, several devices may respond to and answer the request simultaneously, which greatly degrades the user experience.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for controlling an intelligent voice device, an electronic device, and a storage medium, which can implement an intelligent response to a voice signal of a user in a complex environment of multiple intelligent voice devices, thereby improving a user experience.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a control method of intelligent voice equipment, which comprises the following steps:
receiving voice signals of users in a space where first intelligent voice equipment is located;
sensing the space according to the voice signal to determine intelligent voice equipment in the space and a position relation between the intelligent voice equipment and the user;
when the space further comprises at least one second intelligent voice device, determining a target intelligent voice device meeting the use scene of the space in the first intelligent voice device and the at least one second intelligent voice device according to the position relation, and
and triggering the target intelligent voice equipment to be in a wake-up state so as to respond to the voice signal of the user.
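The four steps above can be sketched as a minimal control loop. This is an illustrative sketch, not the patent's implementation: the `Device` class, the function names, and the nearest-device selection rule are our assumptions (the claims also weigh movement direction, bound accounts, and current wake state).

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    distance: float      # sensed distance to the user, in metres
    awake: bool = False

def select_target(devices):
    # A minimal stand-in for "satisfies the usage scenario of the space":
    # pick the device nearest to the user.
    return min(devices, key=lambda d: d.distance)

def handle_voice_signal(devices):
    target = select_target(devices)
    for d in devices:
        d.awake = (d is target)   # only the target is triggered awake
    return target
```

With two sensed devices, only the nearer one ends up awake, which is the behavior the claims aim for in a multi-device scene.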
In the foregoing technical solution, the determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from the first intelligent voice device and the at least one second intelligent voice device includes:
identifying the user corresponding to the voice signal based on the voiceprint features of the user's voice signal;
when the user corresponding to an account bound to the first intelligent voice device and the at least one second intelligent voice device is determined to be the user corresponding to the voice signal, determining the intelligent voice devices corresponding to that account as the wake-able intelligent voice devices;
and determining, from the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
The embodiment of the invention provides a control device of intelligent voice equipment, which comprises:
the receiving module is used for receiving voice signals of users in the space where the first intelligent voice equipment is located;
the perception module is used for perceiving the space according to the voice signal so as to determine intelligent voice equipment in the space and a position relation between the intelligent voice equipment and the user;
a processing module, configured to determine, when the space further includes at least one second intelligent voice device, a target intelligent voice device that satisfies the usage scenario of the space from the first intelligent voice device and the at least one second intelligent voice device according to the positional relationship, and
and the triggering module is used for triggering the target intelligent voice equipment to be in a wake-up state so as to respond to the voice signal of the user.
In the foregoing technical solution, the sensing module is further configured to execute the following processing for a voice signal of the user received by any intelligent voice device in the space:
analyzing the user's voice signal as received by the intelligent voice device from multiple directions, to obtain the energy value of the signal in each direction;
determining the direction corresponding to the maximum energy value as the direction of the user relative to the intelligent voice device, and
determining, from the relation by which the energy value of a voice signal attenuates with distance and from the attenuation of the maximum energy value relative to the energy value of the user's reference voice signal, the distance corresponding to that attenuation as the distance between the intelligent voice device and the user.
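A sketch of this direction-and-distance estimate. The patent does not commit to a specific attenuation relation, so the free-field inverse-square law (E = E_ref / d²) and the `reference_energy` parameter used here are illustrative assumptions:

```python
import math

def locate_user(energy_by_direction, reference_energy):
    """energy_by_direction: mapping of bearing -> received energy value."""
    # Direction: the bearing with the maximum received energy.
    direction = max(energy_by_direction, key=energy_by_direction.get)
    e_max = energy_by_direction[direction]
    # Distance: invert the assumed attenuation law E = E_ref / d**2.
    distance = math.sqrt(reference_energy / e_max)
    return direction, distance
```

For instance, `locate_user({"north": 0.25, "east": 1.0}, 4.0)` yields a bearing of `"east"` and a distance of 2.0 under the assumed law.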
In the foregoing technical solution, the sensing module is further configured to execute the following processing for a voice signal of the user received by any intelligent voice device in the space:
analyzing the voice signal to obtain a first distance between the intelligent voice equipment and the user and a first direction of the user relative to the intelligent voice equipment;
responding to the received voice signal, and detecting obstacles in the space to obtain a second distance between the intelligent voice equipment and the user;
carrying out obstacle recognition on the space to obtain a second direction of the user relative to the intelligent voice equipment;
when the distance difference value between the first distance and the second distance is greater than a distance error threshold value and/or the direction error between the first direction and the second direction is greater than a direction error threshold value, determining the weighted value of the first distance and the second distance as the distance between the intelligent voice device and the user, and determining the average value of the first direction and the second direction as the direction of the user relative to the intelligent voice device.
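The fusion step above can be sketched as follows; the error thresholds and the equal weights are illustrative placeholders, not values stated in the patent:

```python
def fuse_estimates(d1, dir1, d2, dir2,
                   dist_err=0.5, dir_err=15.0, w1=0.5, w2=0.5):
    # d1/dir1: distance and direction from acoustic analysis.
    # d2/dir2: distance and direction from obstacle detection.
    if abs(d1 - d2) > dist_err or abs(dir1 - dir2) > dir_err:
        # Estimates disagree: weighted distance, averaged direction.
        return w1 * d1 + w2 * d2, (dir1 + dir2) / 2.0
    return d1, dir1  # estimates agree; keep the acoustic estimate
```

When the two sensors disagree (e.g. 1.0 m vs 3.0 m), the fused distance is their weighted combination rather than either raw reading.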
In the foregoing technical solution, the processing module is further configured to perform the following processing on the positional relationship obtained by perceptual processing of a voice signal received by any intelligent voice device in the space:
when the time that the position relation is kept unchanged exceeds a time threshold, determining that the user is in a static state;
and according to the distance between the intelligent voice equipment and the user included in the position relationship, determining the intelligent voice equipment with the minimum distance from the user in the space as target intelligent voice equipment.
In the foregoing technical solution, the processing module is further configured to perform the following processing on the positional relationship obtained by perceptual processing of a voice signal received by any intelligent voice device in the space:
when the position relation changes, determining that the user is in a motion state;
determining the direction of change of the direction as the moving direction of the user relative to the intelligent voice equipment according to the direction of the user relative to the intelligent voice equipment, wherein the direction of change of the direction is included in the position relation;
multiplying the reciprocal of the distance between the user and the intelligent voice device included in the positional relationship by the movement-direction vector, to obtain the matching degree of the positional relationship between the intelligent voice device and the user;
determining the intelligent voice equipment with the highest matching degree as target intelligent voice equipment in the first intelligent voice equipment and the at least one second intelligent voice equipment;
when the direction of the user relative to the intelligent voice equipment changes to be close to the intelligent voice equipment, the moving direction value is positive, and when the direction of the user relative to the intelligent voice equipment changes to be far away from the intelligent voice equipment, the moving direction value is negative.
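The moving-user matching degree described above can be sketched as (1 / distance) × movement sign. The ±1 encoding of the movement direction follows the claim; the function names and the tuple layout are our illustrative choices:

```python
def matching_degree(distance, approaching):
    # +1 while the user moves toward the device, -1 while moving away.
    sign = 1.0 if approaching else -1.0
    return sign / distance

def pick_target(devices):
    # devices: iterable of (name, distance, approaching) tuples.
    return max(devices, key=lambda d: matching_degree(d[1], d[2]))[0]
```

Note the effect of the sign: a farther device the user is walking toward can outscore a nearer device the user is walking away from.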
In the above technical solution, the processing module is further configured to determine an intelligent voice device in an awake state in the first intelligent voice device and the at least one second intelligent voice device;
when the distance between the intelligent voice equipment in the awakening state and the user does not exceed the critical distance, determining that the intelligent voice equipment in the awakening state is the target intelligent voice equipment;
the critical distance is the maximum distance at which the user and the intelligent voice device can still correctly perceive each other's voice signals.
In the foregoing technical solution, the processing module is further configured to determine that there is an intelligent voice device interacting with the user in the first intelligent voice device and the at least one second intelligent voice device, and
and when the distance between the intelligent voice device and the user does not exceed the critical distance, determining that the intelligent voice device interacting with the user is a target intelligent voice device.
In the above technical solution, the processing module is further configured to determine a change trend of a positional relationship between the intelligent voice device in the awake state and the user before receiving the voice signal;
when it is determined that the intelligent voice device in the awakening state exceeds the critical distance according to the change trend of the position relationship, determining the intelligent voice device with the highest matching degree with the position relationship between the intelligent voice device and the user as a target intelligent voice device in the first intelligent voice device and the at least one second intelligent voice device;
the triggering module is further used for triggering the intelligent voice equipment in the awakening state to be in a standby state and awakening the target intelligent voice equipment in real time when the intelligent voice equipment with the highest position relation matching degree with the user is determined to be the target intelligent voice equipment.
In the above technical solution, the processing module is further configured to determine a change trend of a positional relationship between the intelligent voice device in an awake state and the user before receiving the voice signal;
when it is determined that the intelligent voice device in the awakening state exceeds the critical distance within a preset time according to the change trend of the position relationship, determining the intelligent voice device with the highest matching degree with the position relationship between the intelligent voice device and the user as a target intelligent voice device in the first intelligent voice device and the at least one second intelligent voice device;
the triggering module is further configured to wake up the target smart voice device in advance before the smart voice device in the wake-up state does not exceed the critical distance.
In the above technical solution, the processing module is further configured to determine a change trend of a positional relationship between the intelligent voice device in an awake state and the user before receiving the voice signal;
and when the intelligent voice equipment in the awakening state does not exceed the critical distance according to the change trend of the position relationship, determining that the intelligent voice equipment in the awakening state is the target intelligent voice equipment.
In the above technical solution, the processing module is further configured to acquire historical data of the intelligent voice device in the awake state before receiving the voice signal, and predict a predicted usage duration of the intelligent voice device in the awake state through an artificial intelligence model in combination with a change trend, a usage duration, and an awake frequency of the location relationship of the intelligent voice device that has been awoken;
determining, in the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device with the highest matching degree with the position relationship between the users as a target intelligent voice device;
the triggering module is also used for waking up the target intelligent voice equipment in real time when the expected use time reaches the preset time; or,
and waking up the target intelligent voice equipment in advance before the expected use duration is reached.
In the above technical solution, the apparatus further includes:
the switching module is used for triggering intelligent voice equipment which is out of the target intelligent voice equipment and is in an awakening state to be switched to a standby state in real time; or,
or waiting for a preset period and, for each intelligent voice device other than the target that is in the awake state, determining the change trend of its positional relationship with the user within that period, and triggering that awake device to switch to the standby state when it is determined that it will exceed the critical distance within the preset period.
In the above technical solution, the apparatus further includes:
and a response module, configured to trigger the target intelligent voice device to respond to the last voice signal again when the target intelligent voice device is not the same as the intelligent voice device that was in the awake state before the voice signal was received, and the distance between that awake device and the user exceeded the critical distance while it was last responding to the user's voice signal.
In the above technical solution, the processing module is further configured to, when an account bound to the first intelligent voice device and the at least one second intelligent voice device corresponds to a plurality of intelligent voice devices, determine, from among those devices, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
In the above technical solution, the processing module is further configured to identify the user corresponding to the voice signal based on the voiceprint features of the user's voice signal;
when the user corresponding to an account bound to the first intelligent voice device and the at least one second intelligent voice device is determined to be the user corresponding to the voice signal, determine the intelligent voice devices corresponding to that account as the wake-able intelligent voice devices;
and determine, from the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
An embodiment of the present invention provides an intelligent voice device, including:
a memory for storing executable instructions;
and the processor is used for realizing the control method of the intelligent voice equipment provided by the embodiment of the invention when the executable instructions stored in the memory are executed.
The embodiment of the invention provides a server for controlling intelligent voice equipment, which comprises:
a memory for storing executable instructions;
and the processor is used for realizing the control method of the intelligent voice equipment provided by the embodiment of the invention when the executable instructions stored in the memory are executed.
The embodiment of the invention provides a storage medium, which stores executable instructions and is used for causing a processor to execute so as to realize the control method of the intelligent voice equipment provided by the embodiment of the invention.
The embodiment of the invention has the following beneficial effects:
the target intelligent voice equipment meeting the use scene of the space is determined in the first intelligent voice equipment and the at least one second intelligent voice equipment according to the position relation, and the target intelligent voice equipment is triggered to respond to the voice signal of the user, so that the situation that the intelligent voice equipment in the same scene responds to the voice signal of the user is avoided, and the experience of the user is improved.
Drawings
Fig. 1 is a schematic diagram of an optional application scenario 10 of a control method for an intelligent voice device according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device 500 according to an embodiment of the present invention;
Figs. 3A-3C are schematic flowcharts of a control method for an intelligent voice device according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a control method of an intelligent voice device according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a user waking up an intelligent device according to an embodiment of the present invention;
fig. 6 is a schematic view of an application scenario of an intelligent speech device according to an embodiment of the present invention;
fig. 7 is a schematic view of an application scenario in which an intelligent voice device interacts with a cloud end according to an embodiment of the present invention;
fig. 8 is a waveform diagram of voice data uploaded to the cloud by the intelligent voice device 1 according to the embodiment of the present invention;
fig. 9 is a frequency spectrum diagram of voice data uploaded to the cloud by the intelligent voice device 1 according to the embodiment of the present invention;
fig. 10 is a waveform diagram of voice data uploaded to the cloud by the smart voice device 2 according to the embodiment of the present invention;
fig. 11 is a frequency spectrum diagram of voice data uploaded to the cloud by the intelligent voice device 2 according to the embodiment of the present invention;
fig. 12 is a schematic view of another application scenario in which an intelligent voice device interacts with a cloud end according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
In the description that follows, the terms "first", "second", and the like are intended only to distinguish similar objects and do not denote a particular order. It should be understood that "first", "second", and the like may be interchanged, where permissible, so that the embodiments of the invention described herein can be practiced in an order other than that illustrated or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.
1) Voice assistant: an intelligent terminal application that helps users solve various problems, mainly everyday problems, through the intelligent interaction of smart conversation and instant question answering.
2) Cloud: also known as a cloud platform; a software platform that uses application virtualization technology and integrates functions such as software search, download, use, management, and backup. Through the platform, common software can be packaged in an independent virtualized environment so that applications are not coupled to the system, achieving the goal of "green" (portable, trace-free) software use.
3) Energy value: the larger the energy value of the voice data received by an intelligent voice device, the clearer the user's voice information that the device has captured, i.e. the closer the device is to the user. The energy value can be characterized by a waveform diagram and a spectrogram: the larger the amplitude of the waveform in the waveform diagram, the larger the energy value of the voice data (the energy value is proportional to the waveform amplitude); and the more active the high-frequency region of the spectrogram, the larger the energy value of the voice data (the energy value is proportional to the activity of the high-frequency region).
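As a rough sketch of the amplitude-energy relation, the per-frame energy of PCM samples can be computed as a mean square; the framing and normalization details here are our assumptions, not the patent's:

```python
def frame_energy(samples):
    # Mean-square energy of one frame of PCM samples: doubling the
    # waveform amplitude quadruples the energy value.
    return sum(s * s for s in samples) / len(samples)
```

This matches the proportionality described above: a frame at amplitude 1.0 has four times the energy of the same frame at amplitude 0.5.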
4) Voice recognition: the process by which a machine converts speech signals into corresponding text or commands through recognition and understanding.
In order to solve at least the above technical problems of the related art, embodiments of the present invention provide a control method and apparatus for an intelligent voice device, an electronic device, and a storage medium, which put only the target intelligent voice device into the awake state to respond to the user's voice signal and prevent multiple intelligent voice devices in the same scene from all responding, thereby improving the user experience. The following describes an exemplary application of the electronic device provided by the embodiments of the present invention. The electronic device implementing the intelligent voice device control scheme may be a server, for example a server deployed in the cloud, which determines, from the voice signals provided by a first intelligent voice device and at least one second intelligent voice device in the same space, the positional relationship between each intelligent voice device and the user; determines from those devices, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space; and triggers the target intelligent voice device into the awake state to respond to the user's voice signal.
The electronic device implementing the intelligent voice device control scheme provided by the embodiments of the present invention may also be a notebook computer, a tablet computer, a desktop computer, a mobile device (e.g. a mobile phone or a personal digital assistant), or any of various user terminals (intelligent voice devices) having an intelligent voice function. For example, a first intelligent voice device that is a handheld terminal determines the positional relationship between each intelligent voice device and the user from the voice signal it receives and the user's voice signals provided by at least one second intelligent voice device in the same space; determines from the first intelligent voice device and the at least one second intelligent voice device, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space; and triggers the target intelligent voice device into the awake state to respond to the user's voice signal.
Referring to fig. 1, fig. 1 is a schematic diagram of an optional application scenario 10 of the control method for an intelligent voice device according to an embodiment of the present invention, where a terminal 200 (illustratively, an intelligent voice device 200-1, an intelligent voice device 200-2, and an intelligent voice device 200-3) is connected to a server 100 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 200 may be used to receive a voice signal of a user, for example, when the user sends the voice signal, the terminal automatically collects the voice signal of the user.
In some embodiments, the terminal 200 locally performs the control method of the smart voice device according to the embodiment of the present invention: it perceptually processes the space according to the user's voice signal to determine the smart voice devices included in the space and their position relationships with the user, determines among the first smart voice device and the at least one second smart voice device a target smart voice device satisfying the usage scenario of the space according to those position relationships, and triggers the target smart voice device into the wake-up state to respond to the user's voice signal. For example, a voice assistant is installed on the smart voice device 200-1 (the first smart voice device). After the user emits a voice signal, the smart voice device 200-1 collects it and also receives the user's voice signal as collected by the smart voice device 200-2 and the smart voice device 200-3 (the second smart voice devices). It then determines the position relationship between each smart voice device (the first and the second) and the user from the voice signals, determines according to those position relationships a target smart voice device (any one of the first and second smart voice devices) satisfying the usage scenario of the space, and triggers the target smart voice device into the awake state through the voice assistant so as to respond to the user's voice signal.
The terminal 200 can also transmit the user's voice signal to the server 100 through the network 300 and invoke the intelligent voice device control function provided by the server 100, with the server 100 performing control through the control method of the intelligent voice device provided by the embodiment of the present invention. For example, a voice assistant is installed on the terminal 200 (an intelligent voice device); after the user emits a voice signal, the terminal 200 collects it through the voice assistant and transmits it to the server 100 through the network 300. The server 100 determines the intelligent voice devices and their location relationships with the user based on the user's voice signal, determines the target intelligent voice device satisfying the usage scenario of the space according to those location relationships, and sends a control instruction to the target intelligent voice device, triggering it into the awake state so that it responds to the user's voice signal through the voice assistant.
Continuing to describe the structure of the electronic device implementing the intelligent voice device control scheme provided by the embodiment of the present invention, referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device 500 provided by the embodiment of the present invention, where the electronic device 500 shown in fig. 2 includes: at least one processor 510, memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 are coupled together by a bus system 540. It is understood that the bus system 540 is used to enable communications among the components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 2.
The Processor 510 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 530 includes one or more output devices 531 enabling presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 530 also includes one or more input devices 532, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 550 may comprise volatile memory or nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 550 described in connection with embodiments of the invention is intended to comprise any suitable type of memory. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.
In some embodiments, memory 550 can store data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 552 for communicating with other computing devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
a display module 553 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
an input processing module 554 to detect one or more user inputs or interactions from one of the one or more input devices 532 and to translate the detected inputs or interactions.
In some embodiments, the control apparatus of the smart voice device provided in the embodiments of the present invention may be implemented by combining software and hardware. As an example, it may be a processor in the form of a hardware decoding processor programmed to execute the control method of the smart voice device provided by the embodiments of the present invention; for example, the hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
In other embodiments, the control apparatus of the smart voice device provided by the embodiment of the present invention may be implemented in a software manner, and fig. 2 illustrates the control apparatus 555 of the smart voice device stored in the memory 550, which may be software in the form of programs and plug-ins, and includes a series of modules, including a receiving module 5551, a sensing module 5552, a processing module 5553, a triggering module 5554, a switching module 5555, and a response module 5556; the receiving module 5551, the sensing module 5552, the processing module 5553, the triggering module 5554, the switching module 5555, and the response module 5556 are used to implement the control method of the intelligent voice device provided by the embodiment of the invention.
In the following, the control method of the intelligent voice device provided by the embodiment of the present invention is described by taking the first intelligent voice device as an execution subject, in combination with the exemplary application and implementation of the intelligent voice device provided by the embodiment of the present invention. Referring to fig. 3A, fig. 3A is a schematic flowchart of a control method of an intelligent voice device according to an embodiment of the present invention, and is described with reference to the steps shown in fig. 3A.
In step 101, a voice signal of a user in a space where a first smart voice device is located is received.
After the user emits a voice signal, for example the wake-up word "ABAB", the first intelligent voice device may collect the user's voice signal and transmit the collected signal via local-area-network broadcast or another short-range communication method. The first intelligent voice device may also receive the user's voice signal as collected by a second intelligent voice device in the space where the first intelligent voice device is located (that is, any intelligent voice device in the space other than the first; there may be one or more such devices).
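The local-area-network broadcast described above might be sketched as follows. The UDP port, the JSON payload layout, and the field names are illustrative assumptions; the embodiment only specifies that the collected signal is shared together with an identifier that uniquely identifies the device:

```python
import json
import socket

BROADCAST_PORT = 50000  # assumed port; not specified by the embodiment


def build_payload(device_id, awake, energy_db):
    # The payload carries the device identifier that uniquely identifies
    # the sender, its wake-up state identifier, and the measured energy
    # of the user's voice signal.
    return json.dumps({
        "device_id": device_id,
        "awake": awake,
        "energy_db": energy_db,
    }).encode("utf-8")


def broadcast_voice_signal(device_id, awake, energy_db):
    # Share the collected signal's metadata with the other intelligent
    # voice devices in the same space via a UDP broadcast on the LAN.
    payload = build_payload(device_id, awake, energy_db)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.sendto(payload, ("255.255.255.255", BROADCAST_PORT))
    sock.close()
```

In practice the raw audio samples (or extracted features) would also be carried in the payload; only the metadata is shown here.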
In step 102, the space is perceptually processed according to the voice signal to determine the intelligent voice device included in the space and the position relationship with the user.
After the first intelligent voice device receives the user's voice signals, that is, the signal it collected itself and the signals sent by the second intelligent voice devices, it perceptually processes the space according to those signals and determines the intelligent voice devices in the space and their position relationships with the user, namely the position relationship between the first intelligent voice device and the user and that between each second intelligent voice device and the user.
Referring to fig. 3B, fig. 3B is an optional flowchart provided in an embodiment of the present invention, and in some embodiments, fig. 3B illustrates that step 102 in fig. 3A may be implemented by step 1021 to step 1023 illustrated in fig. 3B.
The method for perceptually processing the space according to the voice signal to determine the intelligent voice equipment in the space and the position relation between the intelligent voice equipment and the user comprises the following steps: for a voice signal of a user received by any intelligent voice device in a space, the following processing is executed:
in step 1021, the speech signals of the user received by the intelligent speech device from multiple directions are analyzed to obtain energy values of the speech signals of the user received by the intelligent speech device from multiple directions.
In step 1022, the direction corresponding to the maximum energy value is determined as the direction of the user relative to the intelligent voice device.
In step 1023, according to the relation that the energy value of the voice signal attenuates with the distance and the attenuation value of the maximum energy value relative to the energy value of the reference voice signal of the user, the distance corresponding to the attenuation value is determined as the distance between the intelligent voice device and the user.
After the first intelligent voice device receives the user's voice signals sent by the other intelligent voice devices (each carrying a device identifier that uniquely identifies the sender), it performs direction and distance recognition on the voice signal received by each intelligent voice device (the first and the second). An intelligent voice device can be provided with a multi-directional microphone array for receiving the user's voice signals from multiple directions, so the signals received from the various directions can be analyzed to obtain their energy values. These energy values differ by direction, and the direction in which the intelligent voice device is closer to the user carries the larger energy value; the direction corresponding to the maximum energy value is therefore determined to be the direction of the user relative to the intelligent voice device. Once that direction is determined, the distance corresponding to the attenuation value can be determined as the distance between the intelligent voice device and the user, according to the relation by which a voice signal's energy attenuates with distance and the attenuation of the maximum energy value relative to the energy value of the user's reference voice signal. The energy value of the user's reference voice signal is either a fixed value or a standard energy value of the user's voice signal detected in real time by other, accurate voice detection devices.
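Steps 1021 to 1023 can be sketched as follows. The reference energy value and the free-field attenuation model (roughly 6 dB of attenuation per doubling of distance, i.e. 20·log10 of the distance ratio) are assumptions for illustration; the embodiment only states that the energy value attenuates with distance:

```python
# Assumed energy of the user's reference voice signal at 1 m (dB)
REFERENCE_ENERGY_DB = 60.0


def estimate_direction_and_distance(energies_db):
    """energies_db: mapping of direction (degrees) -> received energy (dB)."""
    # Steps 1021/1022: the direction whose channel received the maximum
    # energy is taken as the user's direction relative to the device.
    direction = max(energies_db, key=energies_db.get)
    max_energy = energies_db[direction]
    # Step 1023: invert the attenuation-with-distance relation.
    # Free-field assumption: attenuation = 20 * log10(distance / 1 m)
    attenuation = REFERENCE_ENERGY_DB - max_energy
    distance_m = 10 ** (attenuation / 20.0)
    return direction, distance_m
```

For example, if the 90° channel measures 54 dB against a 60 dB reference, the user is placed at 90° and roughly 2 m away under these assumptions.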
In some embodiments, perceptually processing the space according to the voice signal to determine the intelligent voice devices included in the space and their location relationships with the user includes executing the following processing for the user's voice signal received by any intelligent voice device in the space: parsing the voice signal to obtain a first distance between the intelligent voice device and the user and a first direction of the user relative to the intelligent voice device; in response to the received voice signal, performing obstacle detection on the space to obtain a second distance between the intelligent voice device and the user; performing obstacle recognition on the space to obtain a second direction of the user relative to the intelligent voice device; and, when the distance difference between the first distance and the second distance is greater than a distance error threshold and/or the direction error between the first direction and the second direction is greater than a direction error threshold, determining the weighted value of the first and second distances as the distance between the intelligent voice device and the user, and the average of the first and second directions as the direction of the user relative to the intelligent voice device.
After the first intelligent voice device receives the user's voice signals sent by the other intelligent voice devices (each carrying a device identifier that uniquely identifies the sender), it performs direction and distance recognition on the voice signal received by each intelligent voice device (the first and the second). Since a distance and direction determined from the voice signal alone may be inaccurate, they may also be determined in other ways, and the two sets of results may be combined to obtain a final, more accurate distance between the intelligent voice device and the user and direction of the user relative to the intelligent voice device.
First, the user's voice signals received by the intelligent voice device from multiple directions are analyzed to obtain their energy values; the direction corresponding to the maximum energy value is determined as the first direction of the user relative to the intelligent voice device, and, from the relation by which a voice signal's energy attenuates with distance together with the attenuation of the maximum energy value relative to the energy of the user's reference voice signal, the distance corresponding to that attenuation is determined as the first distance between the intelligent voice device and the user. Then, in response to the received voice signal, other devices may be triggered to perform obstacle detection on the space to obtain a second distance between the intelligent voice device and the user. These other devices may be sound-wave detection devices (such as ultrasonic detection), image acquisition and analysis devices (such as a camera recognizing a human silhouette), biological-signal detection devices (such as infrared detection), and the like, used for detecting the distance between the intelligent voice device and the user.
For example, the sound-wave detection device can emit sound waves, receive the waves reflected by obstacles, and determine the distance between the intelligent voice device and the user from the waves' round-trip time; the image acquisition and analysis device can capture an image of the obstacles in the current space, recognize the user via an image-recognition method, and determine the distance between the intelligent voice device and the user; and the biological-signal detection device can detect biological signals, for example detecting the user in the current space and determining the distance accordingly. These other devices can be integrated into the intelligent voice device or be independent devices that the intelligent voice device can sense and use. When obstacle detection yields the second distance between the intelligent voice device and the user, obstacle recognition can likewise be performed on the space through the other devices to obtain a second direction of the user relative to the intelligent voice device, again via sound-wave detection devices (such as ultrasonic detection), image acquisition and analysis devices (such as a camera recognizing a human silhouette), biological-signal detection devices (such as infrared detection), and the like, used for detecting the direction of the user relative to the intelligent voice device.
For example, the sound wave detection device can send out sound waves, receive the sound waves reflected back by the obstacle, and determine the direction of the user relative to the intelligent voice device according to the direction of the return sound waves; the image acquisition and analysis equipment can acquire the image of the obstacle in the current space, identify the user according to an image identification method and determine the direction of the user relative to the intelligent voice equipment; the bio-signal detection device may detect bio-signals, for example, detect a user in the current space, and determine the direction of the user relative to the smart voice device based on the detected user.
After the first distance between the smart voice device and the user and the first direction of the user relative to the smart voice device have been determined through voice-signal parsing (the first method), and the second distance and second direction through other devices (the second method), then, when the distance difference between the first distance and the second distance is greater than the distance error threshold, and/or the direction error between the first direction and the second direction is greater than the direction error threshold, the weighted value of the first and second distances is determined as the distance between the smart voice device and the user, and the average of the first and second directions as the direction of the user relative to the smart voice device. The first and second methods are thereby fused, improving the accuracy of the position relationship between the smart voice device and the user. The weights of the first and second distances may be set according to the specific situation: when the first method is to be emphasized, a higher weight may be set for the first distance and a lower weight for the second. For example, with a weight of 0.6 for the first distance and 0.4 for the second, the distance between the smart voice device and the user is 0.6 × first distance + 0.4 × second distance.
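The fusion rule above can be sketched as follows. The error thresholds are assumed values, and the 0.6/0.4 weights come from the example in the text:

```python
def fuse_position(first_distance, second_distance,
                  first_direction, second_direction,
                  w_first=0.6, w_second=0.4,
                  dist_threshold=0.5, dir_threshold=15.0):
    """Fuse the voice-signal estimate (first) with the obstacle-detection
    estimate (second). Distances in metres, directions in degrees."""
    if (abs(first_distance - second_distance) > dist_threshold
            or abs(first_direction - second_direction) > dir_threshold):
        # The two methods disagree beyond the error thresholds: use the
        # weighted distance and the average direction, as described above.
        distance = w_first * first_distance + w_second * second_distance
        direction = (first_direction + second_direction) / 2.0
    else:
        # The two methods agree closely: keep the voice-signal estimate.
        distance, direction = first_distance, first_direction
    return distance, direction
```

The behaviour when the two methods agree is not spelled out in the text; keeping the first estimate in that case is an assumption of this sketch.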
It should be noted that the other devices may be configured to sense continuously, so that they can respond to received voice signals in real time; alternatively, they can be set to turn on and off periodically, which serves the goal of saving power.
Of course, the distance between the intelligent voice device and the user and the direction of the user relative to the intelligent voice device may also be determined by the second method alone, without considering the first method: that is, only obstacle detection on the space, performed in response to the received voice signal, is used to obtain the second distance between the intelligent voice device and the user, and only obstacle recognition on the space to obtain the second direction of the user relative to the intelligent voice device.
In step 103, when at least one second smart voice device is further included in the space, a target smart voice device satisfying the usage scenario of the space is determined in the first smart voice device and the at least one second smart voice device according to the location relationship.
After the position relationship (direction and distance) between each intelligent voice device and the user has been determined: if only the first intelligent voice device exists in the space, the first intelligent voice device is determined to be the target intelligent voice device; if the space further includes at least one second intelligent voice device, the target intelligent voice device satisfying the usage scenario of the space is determined among the first intelligent voice device and the at least one second intelligent voice device according to the position relationships. In this way a single target intelligent voice device responds to the user's voice signal, and the situation in which multiple intelligent voice devices in the same space respond simultaneously is avoided.
Referring to fig. 3B, fig. 3B is an optional flowchart schematic diagram provided in an embodiment of the present invention, and in some embodiments, fig. 3B shows that step 103 in fig. 3A may be implemented by steps 1031 to 1032 shown in fig. 3B.
Determining, according to the location relationship, a target intelligent voice device satisfying the usage scenario of the space among the first intelligent voice device and the at least one second intelligent voice device includes: performing the following processing on the position relationship obtained by perceptually processing the voice signal received by each intelligent voice device in the space:
in step 1031, when the time during which the positional relationship remains unchanged exceeds the time threshold, it is determined that the user is in a stationary state.
In step 1032, the intelligent voice device with the minimum distance to the user in the space is determined as the target intelligent voice device according to the distance between the intelligent voice device and the user included in the position relationship.
After the first intelligent voice device determines the position relationship between the intelligent voice devices (the first intelligent voice device and the second intelligent voice device) and the user, the target intelligent voice device can be determined according to the position relationship between any one of the intelligent voice devices (the first intelligent voice device and the second intelligent voice device) and the user. When the position relationship between any intelligent voice device and the user is kept unchanged within a preset time length, the user can be determined to be in a static state, and at the moment, the target intelligent voice device can be determined according to the distance between the intelligent voice device and the user, namely, the intelligent voice device with the minimum distance to the user in the space is determined to be the target intelligent voice device.
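Steps 1031 and 1032 might be implemented as in the following sketch; the 3-second time threshold is an assumed value standing in for the preset duration mentioned above:

```python
TIME_THRESHOLD_S = 3.0  # assumed duration the position must stay unchanged


def pick_target_when_stationary(distances, unchanged_since, now):
    """distances: mapping of device id -> distance to the user (metres).
    unchanged_since: timestamp at which the position relationship last
    changed; now: the current timestamp (same clock)."""
    # Step 1031: the user counts as stationary only once the position
    # relationship has stayed unchanged longer than the time threshold.
    if now - unchanged_since <= TIME_THRESHOLD_S:
        return None  # user not yet confirmed to be in a static state
    # Step 1032: the device with the minimum distance to the user in the
    # space is determined to be the target intelligent voice device.
    return min(distances, key=distances.get)
```

Returning `None` while the user is not yet confirmed stationary is a convention of this sketch; a real implementation would fall through to the motion-state logic described below.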
In some embodiments, determining a target smart voice device that satisfies the usage scenario of the space among the first smart voice device and the at least one second smart voice device according to the location relationship includes performing the following processing on the position relationship obtained from the voice signal received by each smart voice device in the space: when the position relationship changes, determining that the user is in a motion state; determining, from the direction of the user relative to the smart voice device included in the position relationship, the direction of change as the moving direction of the user relative to the smart voice device; multiplying the reciprocal of the distance between the user and the smart voice device included in the position relationship by the moving-direction vector to obtain the matching degree of the position relationship between the smart voice device and the user; and determining, among the first smart voice device and the at least one second smart voice device, the device with the highest matching degree as the target smart voice device. The moving-direction value is positive when the user's direction relative to the smart voice device is changing so as to approach it, and negative when changing so as to move away from it.
After the first intelligent voice device determines the position relationship between the intelligent voice devices (the first and the second) and the user, the target intelligent voice device can be determined according to the position relationship between any of those devices and the user. Typically, the user may be in a motion state while emitting the voice signal, for example closer to the first intelligent voice device before speaking and farther from it afterwards. Thus, in order to determine the appropriate target intelligent voice device to respond to the user's voice signal, it may first be determined whether the user is in motion: when the position relationship between an intelligent voice device and the user changes, for example when the duration of the change exceeds a time threshold, the user is determined to be in a motion state. At this point the target intelligent voice device can be determined from both the distance between each intelligent voice device and the user and the direction of the user relative to it. The position relationship comprises that distance and that direction, and the direction of change can be determined as the moving direction of the user relative to the intelligent voice device.
Multiplying the reciprocal of the distance by the moving-direction vector yields the matching degree of the position relationship between the intelligent voice device and the user; an intelligent voice device with a higher matching degree better satisfies the user's requirement. Therefore, the intelligent voice device with the highest matching degree among the first intelligent voice device and the at least one second intelligent voice device is determined as the target intelligent voice device, as the one best able to satisfy the user's requirement.
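The matching degree described above is the reciprocal of the distance multiplied by the moving-direction value (positive when the user is approaching the device, negative when moving away). A minimal sketch:

```python
def matching_degree(distance, approaching):
    # Moving-direction value: +1 when the user's direction relative to
    # the device is changing toward it, -1 when changing away from it.
    direction_value = 1.0 if approaching else -1.0
    # Matching degree = (1 / distance) * moving-direction value.
    return (1.0 / distance) * direction_value


def pick_target_when_moving(devices):
    """devices: mapping of device id -> (distance_m, approaching_flag).
    Returns the device with the highest matching degree."""
    return max(devices, key=lambda d: matching_degree(*devices[d]))
```

A nearby device the user is walking away from scores negatively, so a farther device the user is walking toward can win, which matches the intent described above.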
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: determining intelligent voice equipment in an awakening state in the first intelligent voice equipment and the at least one second intelligent voice equipment; when the distance between the intelligent voice equipment in the awakening state and the user does not exceed the critical distance, determining the intelligent voice equipment in the awakening state as target intelligent voice equipment; the critical distance is the maximum distance between the user and the intelligent voice device when the user and the intelligent voice device can correctly perceive the voice signal sent by the other party.
When it is determined that the space further includes at least one second intelligent voice device, the voice signal the first intelligent voice device receives from a second intelligent voice device may carry a wake-up state identifier indicating whether that second device is in the awake state; likewise, when the first intelligent voice device broadcasts a voice signal, it carries its own wake-up state identifier. The intelligent voice devices in the awake state among the first intelligent voice device and the at least one second intelligent voice device can therefore be determined from the wake-up state identifier of each device. When the distance between an awake intelligent voice device and the user does not exceed the critical distance, the awake device can still perceive the voice signal emitted by the user, and the user can still perceive the voice signal output by the awake device. In order to preserve the continuity of the user experience, the awake intelligent voice device may then be determined to be the target intelligent voice device: even if it is not the intelligent voice device closest to the user, as long as it can satisfy the user experience, it may continue to respond to the user's voice signal.
For example, suppose two intelligent voice devices are relatively close to the user and the relatively farther one is in the awake state. Since the voice signal the user perceives from that device is not noticeably weaker, the farther device can continue to respond to the user's voice signal; the intelligent voice device closest to the user is only awakened once the distance to the currently used intelligent voice device exceeds the critical distance.
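The awake-state preference can be sketched as follows; the fallback of waking the nearest device once no awake device remains within the critical distance follows the example above:

```python
def pick_awake_target(devices, critical_distance):
    """devices: list of dicts with keys 'id', 'distance' (metres), 'awake'.
    critical_distance: maximum distance at which the user and the device
    can still correctly perceive each other's voice signals."""
    # Prefer a device that is already awake and whose distance to the
    # user does not exceed the critical distance, so that the current
    # interaction stays continuous.
    for dev in devices:
        if dev["awake"] and dev["distance"] <= critical_distance:
            return dev["id"]
    # Otherwise, wake the intelligent voice device closest to the user.
    return min(devices, key=lambda d: d["distance"])["id"]
```

When several devices are awake within the critical distance, this sketch keeps the first one found; the text does not specify a tie-breaking rule.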
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: and when the intelligent voice equipment which is interacting with the user exists in the first intelligent voice equipment and the at least one second intelligent voice equipment and the distance between the first intelligent voice equipment and the user does not exceed the critical distance, determining the intelligent voice equipment which is interacting with the user as the target intelligent voice equipment.
When the space is determined to further comprise at least one second intelligent voice device, the voice signal that the first intelligent voice device receives from a second intelligent voice device may carry an interaction state identifier, which indicates whether that second intelligent voice device is in a state of interacting with the user; likewise, when the first intelligent voice device broadcasts a voice signal, it carries its own interaction state identifier in the signal. From the interaction state identifier of any intelligent voice device, the device that is interacting with the user among the first intelligent voice device and the at least one second intelligent voice device can be determined. When the distance between that interacting device and the user does not exceed the critical distance, the interacting device can still perceive the voice signal sent by the user, and the user can still perceive the voice signal sent by the interacting device. To improve the continuity of the user experience, the interacting device may therefore be determined as the target intelligent voice device: even if it is not the device closest to the user, as long as it can satisfy the user experience, it may continue to respond to the user's voice signal.
For example, if two smart voice devices are closer to the user while a relatively distant device is in the state of interacting with the user, the distant device continues to respond to the user's voice signal, which avoids the delay of switching the response to another device mid-conversation; only when the distant device exceeds the critical distance is the smart voice device closest to the user awakened.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: determining a change trend of a position relationship between the intelligent voice device in an awakening state and the user before the voice signal is received; and when the intelligent voice equipment in the awakening state is determined to exceed the critical distance according to the change trend of the position relationship, the intelligent voice equipment with the highest matching degree with the position relationship between the intelligent voice equipment and the user is determined to be the target intelligent voice equipment in the first intelligent voice equipment and the at least one second intelligent voice equipment.
After the first intelligent voice device determines the position relationship between each intelligent voice device (the first intelligent voice device and the second intelligent voice devices) and the user, the target intelligent voice device can be determined from these position relationships. First, the change trend of the position relationship between the user and the device that was in the awake state before the voice signal was received is determined. When the change trend indicates that the awake device will not exceed the critical distance, the awake device is determined as the target intelligent voice device. When the change trend indicates that the awake device will exceed the critical distance, the awake device may no longer meet the user's needs; therefore, among the first intelligent voice device and the at least one second intelligent voice device, the device whose position relationship with the user has the highest matching degree is determined as the target intelligent voice device, according to the determination method of the matching degree.
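The patent does not specify how the change trend of the position relationship is computed; a deliberately simple possibility is to extrapolate the most recent per-interval change in the device-to-user distance, as in the following sketch (all names and the sampling scheme are assumptions for illustration).

```python
def will_exceed(distances, critical_distance, horizon=1):
    """distances: recent device-to-user distance samples taken at equal
    intervals, oldest first. Extrapolates the most recent per-interval
    change `horizon` steps ahead and reports whether the projected
    distance crosses the critical distance.
    """
    if len(distances) < 2:
        return False  # no trend can be derived from a single sample
    step = distances[-1] - distances[-2]  # latest per-interval change
    return distances[-1] + horizon * step > critical_distance


# User walking away: 2.0 -> 2.6 -> 3.3 m, projected 4.0 m next interval.
print(will_exceed([2.0, 2.6, 3.3], 3.5))  # True
# User approaching: the trend projects back inside the critical distance.
print(will_exceed([3.0, 2.5], 3.5))       # False
```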
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: determining a change trend of a position relationship between the intelligent voice device in an awakening state and the user before the voice signal is received; when the fact that the intelligent voice equipment in the awakening state exceeds the critical distance within the preset time length is determined according to the change trend of the position relation, the intelligent voice equipment with the highest matching degree with the position relation between the intelligent voice equipment and the user is determined to be the target intelligent voice equipment in the first intelligent voice equipment and the at least one second intelligent voice equipment.
After the first intelligent voice device determines the position relationship between each intelligent voice device (the first intelligent voice device and the second intelligent voice devices) and the user, the target intelligent voice device can be determined from these position relationships. First, the change trend of the position relationship between the user and the device that was in the awake state before the voice signal was received is determined. When the change trend indicates that the awake device will exceed the critical distance within the preset time period, the awake device may not be able to meet the user's needs; therefore, among the first intelligent voice device and the at least one second intelligent voice device, the device whose position relationship with the user has the highest matching degree is determined as the target intelligent voice device.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: determining a change trend of a position relationship between the intelligent voice device in an awakening state and the user before the voice signal is received; and when the intelligent voice equipment in the awakening state does not exceed the critical distance according to the change trend of the position relationship, determining the intelligent voice equipment in the awakening state as target intelligent voice equipment.
When the change trend of the position relationship indicates that the awake intelligent voice device will not exceed the critical distance, or will not exceed it within the preset time period, the awake device can meet the user's needs; to improve the continuity of the user experience, the awake device may be determined as the target intelligent voice device.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: acquiring historical data of the intelligent voice device that was in the awake state before the voice signal was received, and predicting the expected usage duration of that device through an artificial intelligence model, in combination with the change trend of the position relationship of the awakened device, its usage duration, and its number of wake-ups; and determining, among the first intelligent voice device and the at least one second intelligent voice device, the device with the highest matching degree of the position relationship with the user as the target intelligent voice device.
Before determining the target intelligent voice device, historical data of the device that was in the awake state before the voice signal was received is acquired, and the expected usage duration of that device is predicted through an artificial intelligence model, combining the change trend of its position relationship, its usage duration, and its number of wake-ups. The relationships are monotonic: the closer the user moves toward the awakened device, the longer its expected usage duration; the longer the awakened device has been in use, the longer the expected usage duration; and the more times the device has been awakened, the longer the expected usage duration. Then, among the first intelligent voice device and the at least one second intelligent voice device, the device whose position relationship with the user has the highest matching degree is determined as the target intelligent voice device, so that when the expected usage duration elapses, the target device is triggered into the awake state and responds to the user's voice signal.
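The patent only states that the model's prediction is monotonic in the three inputs; as a stand-in for the artificial intelligence model, the sketch below uses a toy linear score with arbitrary illustrative weights (none of the weights or names come from the patent).

```python
def predict_usage_duration(approach_rate, usage_minutes, wake_count,
                           weights=(5.0, 0.2, 1.0)):
    """Toy linear stand-in for the AI model described in the text.

    approach_rate: positive when the user is moving toward the device (m/min);
    usage_minutes: how long the device has been in use;
    wake_count: how many times it has been awakened.
    All three relations are monotonically increasing, matching the text.
    The weights are arbitrary illustrative values.
    """
    w_rate, w_use, w_wake = weights
    score = w_rate * approach_rate + w_use * usage_minutes + w_wake * wake_count
    return max(score, 0.0)  # predicted minutes of further use, never negative
```

A device the user is approaching, that has been used longer and woken more often, gets a longer predicted duration than one with weaker signals on all three inputs.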
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: when the account number is determined to correspond to the plurality of intelligent voice devices based on the user account number bound by the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device with the highest matching degree with the position relation between the intelligent voice devices and the user is determined to be the target intelligent voice device.
After the first intelligent voice device determines the position relationship between each intelligent voice device (the first intelligent voice device and the second intelligent voice devices) and the user, the target intelligent voice device can be determined from these position relationships. When, based on the user account bound to the first intelligent voice device and the at least one second intelligent voice device, the account is determined to correspond to multiple intelligent voice devices, a target device must be selected from among them to respond to the user's voice signal. Therefore, among these devices, the device with the highest matching degree of the position relationship with the user can be determined as the target intelligent voice device, according to the determination method of the matching degree.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: identifying the user corresponding to the voice signal based on the voiceprint features of the user's voice signal; when the user corresponding to an account is determined to be the user corresponding to the voice signal, based on the user account bound to the first smart voice device and the at least one second smart voice device, determining the smart voice devices corresponding to that account as the awakenable smart voice devices; and determining, among the first smart voice device and the at least one second smart voice device, the device with the highest matching degree of the position relationship with the user as the target smart voice device.
After the first intelligent voice device determines the position relationship between each intelligent voice device (the first intelligent voice device and the second intelligent voice devices) and the user, the target intelligent voice device can be determined from these position relationships. The user corresponding to the voice signal is identified from the voiceprint features of the signal. When, based on the user account bound to the first intelligent voice device and the at least one second intelligent voice device, the user corresponding to an account is determined to be the user corresponding to the voice signal, the devices bound to that account can be determined as the awakenable intelligent voice devices, and the target device must then be selected from among these awakenable devices to respond to the user's voice signal. Therefore, among the awakenable devices, the device with the highest matching degree of the position relationship with the user can be determined as the target intelligent voice device, according to the determination method of the matching degree.
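The voiceprint-to-account filtering step can be sketched as follows; the data shapes (dicts mapping device to account and account to user) and all names are illustrative assumptions, and the voiceprint recognition itself is assumed to have already produced a speaker id.

```python
def wakeable_devices(bindings, accounts, speaker_id):
    """bindings: device name -> bound account id;
    accounts: account id -> user id;
    speaker_id: the user identified from the voiceprint of the voice signal.

    Returns the devices whose bound account belongs to the recognized
    speaker; only these are candidates for the target device.
    """
    return [dev for dev, acct in bindings.items()
            if accounts.get(acct) == speaker_id]


bindings = {"kitchen": "acc1", "study": "acc2", "hall": "acc1"}
accounts = {"acc1": "alice", "acc2": "bob"}
print(wakeable_devices(bindings, accounts, "alice"))  # ['kitchen', 'hall']
```

The matching-degree comparison from the text would then be applied only over the returned list.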
In step 104, the target smart voice device is triggered to be in a wake-up state in response to the voice signal of the user.
After the target intelligent voice device is determined, the first intelligent voice device may trigger it to be in the awake state and to respond to the user's voice signal; the target device may also already have been in the awake state before being triggered. The first intelligent voice device can trigger the target device into the awake state through local area network broadcasting or another short-range communication mode, so that it responds to the user's voice signal.
In some embodiments, when the change trend of the position relationship between the user and the intelligent voice device that was in the awake state before receiving the voice signal indicates that the awake device will exceed the critical distance, and the device with the highest matching degree of the position relationship with the user is determined as the target intelligent voice device among the first intelligent voice device and the at least one second intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: triggering the device that was in the awake state to enter the standby state, and awakening the target intelligent voice device in real time.
When it is determined that the awake intelligent voice device will exceed the critical distance, the awake device can be triggered into the standby state while the target intelligent voice device is awakened in real time. This prevents the previous device from remaining in the awake state, which both saves power and avoids having multiple intelligent voice devices in the awake state at once, which would degrade the user experience.
In some embodiments, when the change trend of the position relationship between the user and the intelligent voice device that was in the awake state before receiving the voice signal indicates that the awake device will exceed the critical distance within a preset time period, and the device with the highest matching degree of the position relationship with the user is determined as the target intelligent voice device among the first intelligent voice device and the at least one second intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: awakening the target intelligent voice device in advance, before the awake device has exceeded the critical distance.
When it is determined that the awake intelligent voice device will exceed the critical distance within the preset time period, the target intelligent voice device can be awakened in advance, before the awake device has actually exceeded the critical distance. This achieves a seamless hand-over between intelligent voice devices and avoids the wake-up delay that would occur if the target device were only awakened after the previous device had already gone out of range.
In some embodiments, when historical data of the intelligent voice device that was in the awake state before receiving the voice signal is obtained, the expected usage duration of that device is predicted through an artificial intelligence model in combination with the change trend of its position relationship, its usage duration, and its number of wake-ups, and the device with the highest matching degree of the position relationship with the user is determined as the target intelligent voice device among the first intelligent voice device and the at least one second intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: awakening the target intelligent voice device in real time when the expected usage duration elapses; or, alternatively, awakening the target intelligent voice device in advance, before the expected usage duration elapses.
After the expected usage duration of the awake intelligent voice device is predicted through the artificial intelligence model, the target intelligent voice device can be awakened in real time when the expected usage duration elapses, which saves power. Alternatively, the target device can be awakened in advance, before the expected usage duration elapses, achieving a seamless hand-over between intelligent voice devices and avoiding the wake-up delay that would occur if the target device were only awakened once the duration had already elapsed.
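The two wake-up timing policies can be expressed as a small helper; the function name, the margin parameter, and its value are assumptions for illustration, not terms from the patent.

```python
def wake_delay(expected_remaining_s, pre_wake_margin_s, pre_wake):
    """Seconds to wait before waking the target device.

    pre_wake=False: wake exactly when the expected usage duration elapses
    (saves power on the target device).
    pre_wake=True: wake pre_wake_margin_s earlier, trading a little power
    for a seamless hand-over with no response gap.
    """
    delay = expected_remaining_s - pre_wake_margin_s if pre_wake else expected_remaining_s
    return max(delay, 0.0)  # never schedule a wake in the past


print(wake_delay(120, 10, pre_wake=False))  # 120
print(wake_delay(120, 10, pre_wake=True))   # 110
```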
In some embodiments, after triggering the target intelligent voice device to be in the awake state, the method further includes: triggering the awake intelligent voice devices other than the target intelligent voice device to switch to the standby state in real time; or waiting for a preset time period, determining, for each awake device other than the target intelligent voice device, the change trend of its position relationship with the user during that period, and triggering the awake device to switch to the standby state when it is determined that it will exceed the critical distance within the preset time period.
To avoid multiple intelligent voice devices being in the awake state in the same space, the awake devices other than the target intelligent voice device can be triggered to switch to the standby state in real time. Alternatively, to avoid switching devices back and forth between states, the system may wait for a preset time period after the target device enters the awake state; during that period, the change trend of the position relationship with the user is determined for each awake device other than the target, and an awake device is triggered to switch to the standby state only when it is determined that it will exceed the critical distance within the preset time period.
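The two standby policies above can be sketched side by side; the data shapes (a projected-distance map standing in for the change trend) and all names are illustrative assumptions.

```python
def devices_to_standby(awake_devices, target, projected_distance,
                       critical_distance, defer):
    """awake_devices: names of devices currently awake;
    target: the newly selected target device;
    projected_distance: device name -> distance to the user projected at the
    end of the waiting period (a stand-in for the change trend);
    defer=False: switch every non-target awake device immediately;
    defer=True: switch only those projected past the critical distance.
    """
    others = [d for d in awake_devices if d != target]
    if not defer:
        return others  # immediate policy: no other device stays awake
    return [d for d in others if projected_distance[d] > critical_distance]


awake = ["kitchen", "hall", "study"]
projected = {"kitchen": 4.0, "hall": 1.0}
print(devices_to_standby(awake, "study", projected, 3.0, defer=False))
print(devices_to_standby(awake, "study", projected, 3.0, defer=True))
```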
Referring to fig. 3C, fig. 3C is an optional flowchart provided by an embodiment of the present invention. In some embodiments, fig. 3C shows step 105: when the target intelligent voice device is not the same device as the one that was in the awake state before the voice signal was received, and the distance between that previously awake device and the user exceeded the critical distance while it was responding to the user's last voice signal, the target intelligent voice device is triggered to respond to that last voice signal again.
This avoids the user missing the response to their voice signal. When the target intelligent voice device is not the same device as the one in the awake state before the voice signal was received, and the distance between the previously awake device and the user exceeded the critical distance during its last response, the user may not have perceived the content of that response; therefore, the target intelligent voice device can be triggered to respond to the last voice signal again, so that no response is missed.
The following describes a control method of an intelligent voice device according to an embodiment of the present invention, taking as an example the case where the control scheme is implemented cooperatively by the intelligent voice devices and a server. Referring to fig. 4, fig. 4 is a schematic flowchart of a control method of an intelligent voice device according to an embodiment of the present invention, described with reference to the steps shown in fig. 4.
In step 201, a first smart voice device and at least one second smart voice device receive a voice signal of a user.
When the user utters a voice signal, such as the wake-up word "ABAB," the first smart voice device and the at least one second smart voice device may capture the user's voice signal.
In step 202, the first smart voice device and the at least one second smart voice device send the voice signal of the user to the server.
In step 203, the server receives the voice signals of the user sent to the server by the first intelligent voice device and the at least one second intelligent voice device.
In step 204, the server performs perceptual processing on the space according to the voice signal to determine the intelligent voice device included in the space and the position relationship with the user.
After the server receives the user's voice signals from the first intelligent voice device and the at least one second intelligent voice device, it can perform perception processing on the space according to the voice signals, so as to determine the intelligent voice devices included in the space and their position relationship with the user.
In some embodiments, perceptually processing a space according to a speech signal to determine a location relationship between a smart speech device included in the space and a user includes: for a voice signal of a user received by any intelligent voice device in a space, the following processing is executed:
analyzing the voice signal of the user received by the intelligent voice device from multiple directions to obtain the energy value of the signal in each direction; determining the direction corresponding to the maximum energy value as the direction of the user relative to the intelligent voice device; and, based on the relationship by which the energy value of a voice signal attenuates with distance, and on the attenuation of the maximum energy value relative to the energy value of the user's reference voice signal, determining the distance corresponding to that attenuation as the distance between the intelligent voice device and the user.
After the server receives the user's voice signals sent by the first and second intelligent voice devices (each carrying a device identifier that uniquely identifies the device), it performs direction and distance recognition on the voice signal received by each intelligent voice device (the first intelligent voice device and the second intelligent voice devices). Each intelligent voice device can be provided with a multi-directional microphone array for receiving the user's voice signal from multiple directions; the device obtains the energy values of the voice signal received from each direction and sends them to the server, and the server performs the direction and distance recognition based on those energy values.
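A minimal sketch of the direction and distance recognition described above follows. The patent only says that signal energy attenuates with distance; the free-field inverse-square law with a 1 m reference used here is an assumption, as are all names and the example numbers.

```python
import math


def locate_user(direction_energies, reference_energy):
    """direction_energies: angle (degrees) -> received energy of the user's
    voice signal in that direction, as produced by the microphone array.
    reference_energy: energy the same voice would have at a 1 m reference
    distance (an assumed calibration, not specified in the patent).

    The direction with the largest energy is taken as the user's bearing;
    the distance is derived from the attenuation of that peak energy,
    assuming free-field inverse-square decay: E(d) = E_ref / d^2.
    """
    bearing = max(direction_energies, key=direction_energies.get)
    peak = direction_energies[bearing]
    distance = math.sqrt(reference_energy / peak)
    return bearing, distance


# Peak energy arrives from 90 degrees; attenuation 16 -> 4 implies 2 m away.
print(locate_user({0: 1.0, 90: 4.0, 180: 0.5}, reference_energy=16.0))
```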
In some embodiments, perceptually processing the space according to the voice signal to determine the intelligent voice devices included in the space and their position relationship with the user includes executing the following processing for the voice signal of the user received by any intelligent voice device in the space: analyzing the voice signal to obtain a first distance between the intelligent voice device and the user and a first direction of the user relative to the device; in response to receiving the voice signal, performing obstacle detection in the space to obtain a second distance between the device and the user, and performing obstacle recognition on the space to obtain a second direction of the user relative to the device; and, when the difference between the first and second distances is greater than a distance error threshold and/or the error between the first and second directions is greater than a direction error threshold, determining a weighted value of the first and second distances as the distance between the device and the user, and the average of the first and second directions as the direction of the user relative to the device.
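The fusion rule above can be sketched as follows. The equal 50/50 weights are illustrative (the patent does not fix them), and the behavior when the two estimates agree within tolerance is not specified in the text; keeping the speech-based estimate in that case is one plausible reading, marked as such in the code.

```python
def fuse_estimates(d1, dir1, d2, dir2,
                   dist_err_threshold, dir_err_threshold,
                   w1=0.5, w2=0.5):
    """Combine a speech-analysis estimate (d1 meters, dir1 degrees) with an
    obstacle-detection estimate (d2, dir2), per the rule in the text:
    when the estimates disagree beyond a threshold, return the weighted
    distance and the mean direction. The 50/50 weights are illustrative.
    """
    if abs(d1 - d2) > dist_err_threshold or abs(dir1 - dir2) > dir_err_threshold:
        return w1 * d1 + w2 * d2, (dir1 + dir2) / 2.0
    # Agreement within tolerance: the text leaves this case unspecified;
    # keeping the speech-based estimate is one plausible reading.
    return d1, dir1


# Disagreeing estimates (2 m @ 30 deg vs 4 m @ 90 deg) are fused.
print(fuse_estimates(2.0, 30.0, 4.0, 90.0, 0.5, 10.0))  # (3.0, 60.0)
```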
In step 205, when the space further includes at least one second smart voice device, the server determines a target smart voice device satisfying the usage scenario of the space from among the first smart voice device and the at least one second smart voice device according to the location relationship.
After the server determines the position relationship (direction and distance) between each intelligent voice device and the user: when only the first intelligent voice device exists in the space, the first intelligent voice device is determined as the target intelligent voice device; when the space further includes at least one second intelligent voice device, the target intelligent voice device satisfying the usage scenario of the space is determined from the first intelligent voice device and the at least one second intelligent voice device according to the position relationship. The target device then responds to the user's voice signal, avoiding the situation where multiple intelligent voice devices respond to the user simultaneously in the same space.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes performing the following processing on the position relationship obtained by perceptually processing the voice signal received by any intelligent voice device in the space: when the time for which the position relationship remains unchanged exceeds a time threshold, determining that the user is in a static state; and, according to the distance between each intelligent voice device and the user included in the position relationship, determining the device in the space with the smallest distance to the user as the target intelligent voice device.
After the server determines the position relationship between each intelligent voice device (the first intelligent voice device and the second intelligent voice devices) and the user, the target intelligent voice device can be determined according to the position relationship between any of these devices and the user.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes performing the following processing on the position relationship obtained by perceptually processing the voice signal received by any intelligent voice device in the space: when the position relationship changes, determining that the user is in a motion state; determining, from the direction of the user relative to the intelligent voice device included in the position relationship, the direction of change as the moving direction of the user relative to the device; multiplying the reciprocal of the distance between the user and the device, included in the position relationship, by the moving-direction vector to obtain the matching degree of the position relationship between the device and the user; and determining, among the first intelligent voice device and the at least one second intelligent voice device, the device with the highest matching degree as the target intelligent voice device. The moving-direction value is positive when the direction of the user relative to the device changes toward the device, and negative when it changes away from the device.
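The matching-degree formula just described (reciprocal of distance multiplied by a signed moving-direction value) can be sketched directly; the function and field names are illustrative.

```python
def matching_degree(distance, approaching):
    """Matching degree from the text: the reciprocal of the device-to-user
    distance multiplied by the moving-direction value, which is +1 when the
    user is moving toward the device and -1 when moving away.
    """
    direction = 1.0 if approaching else -1.0
    return direction / distance


def pick_target(devices):
    """devices: list of (name, distance_m, approaching). The device with the
    highest matching degree is selected as the target."""
    return max(devices, key=lambda d: matching_degree(d[1], d[2]))[0]


# A farther device being approached beats a nearer device being left behind.
print(pick_target([("study", 2.0, True), ("hall", 1.0, False)]))  # study
```

Note how the sign makes any device the user approaches outrank every device the user is walking away from, regardless of current distance.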
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: determining intelligent voice equipment in an awakening state in the first intelligent voice equipment and the at least one second intelligent voice equipment; when the distance between the intelligent voice equipment in the awakening state and the user does not exceed the critical distance, determining the intelligent voice equipment in the awakening state as target intelligent voice equipment; the critical distance is the maximum distance between the user and the intelligent voice device when the user and the intelligent voice device can correctly perceive the voice signal sent by the other party.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: and when the intelligent voice equipment which is interacting with the user exists in the first intelligent voice equipment and the at least one second intelligent voice equipment and the distance between the first intelligent voice equipment and the user does not exceed the critical distance, determining the intelligent voice equipment which is interacting with the user as the target intelligent voice equipment.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: determining a change trend of a position relationship between the intelligent voice device in an awakening state and the user before the voice signal is received; and when the intelligent voice equipment in the awakening state is determined to exceed the critical distance according to the change trend of the position relationship, the intelligent voice equipment with the highest matching degree with the position relationship between the intelligent voice equipment and the user is determined to be the target intelligent voice equipment in the first intelligent voice equipment and the at least one second intelligent voice equipment.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: determining a change trend of a position relationship between the intelligent voice device in an awakening state and the user before the voice signal is received; when the fact that the intelligent voice equipment in the awakening state exceeds the critical distance within the preset time length is determined according to the change trend of the position relation, the intelligent voice equipment with the highest matching degree with the position relation between the intelligent voice equipment and the user is determined to be the target intelligent voice equipment in the first intelligent voice equipment and the at least one second intelligent voice equipment.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: determining a change trend of a position relationship between the intelligent voice device in an awakening state and the user before the voice signal is received; and when the intelligent voice equipment in the awakening state does not exceed the critical distance according to the change trend of the position relationship, determining the intelligent voice equipment in the awakening state as target intelligent voice equipment.
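The trend-based decisions in the embodiments above can be sketched as a linear extrapolation of recent distance samples (the sampling interval and the linear model are assumptions; the patent does not specify how the change trend is computed):

```python
def will_exceed_critical(distances, sample_interval, critical_distance, horizon):
    """distances: recent user-device distance samples in metres, oldest first.
    Returns True if extrapolating the latest rate of change predicts the
    distance will exceed critical_distance within `horizon` seconds."""
    if len(distances) < 2:
        return False  # no trend can be computed from a single sample
    rate = (distances[-1] - distances[-2]) / sample_interval  # metres/second
    predicted = distances[-1] + rate * horizon
    return predicted > critical_distance
```

A user moving outward at 1 m/s from 2 m would be predicted to cross a 5 m critical distance within a 4 s preset time period, so the target device could be pre-awakened.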
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: acquiring historical data of the intelligent voice device that is in the awake state before the voice signal is received, and predicting the expected usage duration of that device through an artificial intelligence model in combination with the change trend of its position relationship, its usage duration, and its number of wake-ups; and determining, in the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device with the highest degree of matching with the position relationship with the user as the target intelligent voice device.
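The patent leaves the artificial intelligence model unspecified; as a minimal stand-in, a linear scorer over the three named inputs could look like this (the weights and bias below are purely illustrative):

```python
def predict_expected_usage(trend_slope, usage_seconds, wake_count,
                           weights=(-40.0, 0.1, 20.0), bias=120.0):
    """Toy linear stand-in for the patent's unspecified AI model.
    trend_slope: rate at which the user-device distance is changing (m/s);
    usage_seconds: cumulative usage of the awake device;
    wake_count: how many times the device has been woken up.
    Returns a non-negative expected usage duration in seconds."""
    features = (trend_slope, usage_seconds, wake_count)
    score = bias + sum(w * f for w, f in zip(weights, features))
    return max(0.0, score)
```

A real implementation would learn such weights from the historical data mentioned above rather than hard-code them.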
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: when it is determined, based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, that one account corresponds to multiple intelligent voice devices, determining, among those intelligent voice devices, the intelligent voice device with the highest degree of matching with the position relationship with the user as the target intelligent voice device.
In some embodiments, determining a target smart voice device that satisfies a usage scenario of a space in a first smart voice device and at least one second smart voice device according to a location relationship includes: identifying the user corresponding to the voice signal based on the voiceprint features of the user's voice signal; when it is determined, based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, that the user corresponding to an account is the user corresponding to the voice signal, determining the intelligent voice devices corresponding to that account as wakeable intelligent voice devices; and determining, in the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device with the highest degree of matching with the position relationship with the user as the target intelligent voice device.
In step 206, the server triggers the target smart voice device to be in a wake state in response to the user's voice signal.
After the server determines the target intelligent voice device, a wake-up instruction can be sent to the target intelligent voice device according to the address of the target intelligent voice device, and the target intelligent voice device receives the wake-up instruction, enters a wake-up state, and responds to the voice signal of the user, so that the target intelligent voice device is triggered to be in the wake-up state to respond to the voice signal of the user.
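A minimal sketch of the server-side dispatch described above (the JSON wire format, field names, and transport are assumptions; the patent only states that a wake-up instruction is sent to the target device's address):

```python
import json

def build_wake_instruction(target_address: str, request_id: str) -> str:
    # Hypothetical payload: instructs the device at target_address to enter
    # the awake state and respond to the pending user voice request.
    return json.dumps({
        "address": target_address,
        "command": "WAKE_UP",
        "request_id": request_id,
    })
```

On receipt, the target device would enter the wake-up state and answer the user's voice signal; the actual message schema is implementation-defined.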
In some embodiments, when the change trend of the position relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received indicates that this device will exceed the critical distance, and the intelligent voice device with the highest degree of matching with the position relationship with the user has been determined, in the first intelligent voice device and the at least one second intelligent voice device, as the target intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: triggering the intelligent voice device that was in the awake state to enter the standby state, and waking up the target intelligent voice device in real time.
In some embodiments, when the change trend of the position relationship indicates that the intelligent voice device in the awake state will exceed the critical distance within a preset time period, and the intelligent voice device with the highest degree of matching with the position relationship with the user has been determined, in the first intelligent voice device and the at least one second intelligent voice device, as the target intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: waking up the target intelligent voice device in advance, before the intelligent voice device in the awake state exceeds the critical distance.
In some embodiments, when historical data of the intelligent voice device in the awake state before the voice signal is received has been obtained, its expected usage duration has been predicted by an artificial intelligence model in combination with the change trend of its position relationship, its usage duration, and its number of wake-ups, and the intelligent voice device with the highest degree of matching with the position relationship with the user has been determined, in the first intelligent voice device and the at least one second intelligent voice device, as the target intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: waking up the target intelligent voice device in real time when the expected usage duration is reached; or, waking up the target intelligent voice device in advance, before the expected usage duration is reached.
In some embodiments, after triggering the target smart voice device to be in the awake state, the method further comprises: triggering, in real time, the intelligent voice devices in the awake state other than the target intelligent voice device to switch to the standby state; or, waiting for a preset time period, determining, for each intelligent voice device in the awake state other than the target intelligent voice device, the change trend of its position relationship with the user within the preset time period, and triggering that device to switch to the standby state when it is determined that it will exceed the critical distance within the preset time period.
In some embodiments, after triggering the target smart voice device to be in the awake state, the method further comprises: when the target intelligent voice device is not the same device as the intelligent voice device that was in the awake state before the voice signal was received, and the distance between that previously awake device and the user exceeded the critical distance while it was responding to the user's last voice signal, triggering the target intelligent voice device to respond to that last voice signal again.
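The replay condition above amounts to a simple two-part predicate (a sketch; the parameter names are illustrative):

```python
def needs_replay(target_id, prev_awake_id, prev_awake_distance, critical_distance):
    # Replay the last reply only when the newly selected target is a different
    # device AND the previously awake device had drifted beyond the critical
    # distance while answering, so the user may not have heard the reply.
    return (target_id != prev_awake_id
            and prev_awake_distance > critical_distance)
```

If either condition fails, the previous reply is assumed to have been perceivable and nothing is repeated.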
So far, the control method of the intelligent voice device according to the embodiment of the present invention has been described in conjunction with exemplary applications in which the electronic device provided by the embodiment of the present invention is implemented as an intelligent voice device and as a server. The following continues to describe how the modules of the control apparatus 555 of the intelligent voice device provided by the embodiment of the present invention cooperate to control the intelligent voice device.
The receiving module 5551 is configured to receive a voice signal of a user in a space where the first smart voice device is located;
a perception module 5552, configured to perform perception processing on the space according to the voice signal to determine an intelligent voice device included in the space and a location relationship between the intelligent voice device and the user;
a processing module 5553, configured to, when the space further includes at least one second smart voice device, determine, according to the location relationship, a target smart voice device that satisfies a usage scenario of the space among the first smart voice device and the at least one second smart voice device; and
a triggering module 5554, configured to trigger the target smart voice device to be in a wake-up state in response to the voice signal of the user.
In the above technical solution, the sensing module 5552 is further configured to execute the following processing on the voice signal of the user received by any intelligent voice device in the space: analyzing and processing the voice signals of the user received by the intelligent voice equipment from multiple directions to obtain energy values of the voice signals of the user received by the intelligent voice equipment from multiple directions; determining the direction corresponding to the maximum energy value as the direction of the user relative to the intelligent voice equipment, and determining the distance corresponding to the attenuation value as the distance between the intelligent voice equipment and the user according to the relation that the energy value of the voice signal is attenuated along with the distance and the attenuation value of the maximum energy value relative to the energy value of the reference voice signal of the user.
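This perception step can be sketched as follows, assuming beamformed energy values per direction and a free-field inverse-square attenuation law (the patent only states that energy decays with distance; the exact attenuation model and the reference calibration are assumptions):

```python
import math

def locate_user(beam_energies, reference_energy, reference_distance=1.0):
    """beam_energies: mapping of azimuth (degrees) -> received energy.
    reference_energy: energy of the user's reference voice signal measured
    at reference_distance metres. Returns (direction, distance)."""
    # The direction with the maximum energy is taken as the user's direction.
    direction = max(beam_energies, key=beam_energies.get)
    max_energy = beam_energies[direction]
    # Inverse-square model: e = e_ref * (d_ref / d)^2
    #                   =>  d = d_ref * sqrt(e_ref / e)
    distance = reference_distance * math.sqrt(reference_energy / max_energy)
    return direction, distance
```

With a 4.0 reference energy at 1 m, a beam reading of 1.0 would place the user about 2 m away in that beam's direction.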
In the above technical solution, the sensing module 5552 is further configured to execute the following processing on the voice signal of the user received by any intelligent voice device in the space: analyzing the voice signal to obtain a first distance between the intelligent voice equipment and the user and a first direction of the user relative to the intelligent voice equipment; responding to the received voice signal, and detecting obstacles in the space to obtain a second distance between the intelligent voice equipment and the user; carrying out obstacle recognition on the space to obtain a second direction of the user relative to the intelligent voice equipment; when the distance difference value between the first distance and the second distance is greater than a distance error threshold value and/or the direction error between the first direction and the second direction is greater than a direction error threshold value, determining the weighted value of the first distance and the second distance as the distance between the intelligent voice device and the user, and determining the average value of the first direction and the second direction as the direction of the user relative to the intelligent voice device.
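The fusion of the voice-based and obstacle-detection estimates can be sketched as below; the error thresholds and the equal weights are illustrative assumptions, and keeping the voice-based estimate when the two sources agree is also an assumption (the patent only specifies the disagreement case):

```python
def fuse_estimates(voice_dist, obstacle_dist, voice_dir, obstacle_dir,
                   dist_error_threshold=0.5, dir_error_threshold=15.0,
                   voice_weight=0.5, obstacle_weight=0.5):
    """Distances in metres, directions in degrees.
    Returns the fused (distance, direction) of the user."""
    disagree = (abs(voice_dist - obstacle_dist) > dist_error_threshold
                or abs(voice_dir - obstacle_dir) > dir_error_threshold)
    if disagree:
        # Weighted distance and averaged direction, as described above.
        distance = voice_weight * voice_dist + obstacle_weight * obstacle_dist
        direction = (voice_dir + obstacle_dir) / 2.0
        return distance, direction
    return voice_dist, voice_dir
```

The weights could instead reflect per-sensor confidence; the patent leaves them unspecified.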
In the foregoing technical solution, the processing module 5553 is further configured to perform the following processing on a position relationship obtained by performing sensing processing on a voice signal received by any intelligent device in the space: when the time that the position relation is kept unchanged exceeds a time threshold, determining that the user is in a static state; and according to the distance between the intelligent voice equipment and the user included in the position relationship, determining the intelligent voice equipment with the minimum distance from the user in the space as target intelligent voice equipment.
In the foregoing technical solution, the processing module 5553 is further configured to perform the following processing on a position relationship obtained by performing sensing processing on a voice signal received by any intelligent device in the space: when the position relation changes, determining that the user is in a motion state; determining the direction of change of the direction as the moving direction of the user relative to the intelligent voice equipment according to the direction of the user relative to the intelligent voice equipment, wherein the direction of change of the direction is included in the position relation; according to the distance between the user and the intelligent equipment in the position relationship, multiplying the reciprocal of the distance by the vector of the moving direction to obtain the matching degree of the position relationship between the intelligent voice equipment and the user; determining the intelligent voice equipment with the highest matching degree as target intelligent voice equipment in the first intelligent voice equipment and the at least one second intelligent voice equipment; when the direction of the user relative to the intelligent voice equipment changes to be close to the intelligent voice equipment, the moving direction value is positive, and when the direction of the user relative to the intelligent voice equipment changes to be far away from the intelligent voice equipment, the moving direction value is negative.
In the above technical solution, the processing module 5553 is further configured to determine an intelligent voice device in an awake state in the first intelligent voice device and the at least one second intelligent voice device; when the distance between the intelligent voice equipment in the awakening state and the user does not exceed the critical distance, determining that the intelligent voice equipment in the awakening state is the target intelligent voice equipment; the critical distance is the maximum distance when the user and the intelligent voice equipment can correctly perceive the voice signal sent by the other party.
In the foregoing technical solution, the processing module 5553 is further configured to determine that the intelligent voice device interacting with the user is a target intelligent voice device when the intelligent voice device interacting with the user exists in the first intelligent voice device and the at least one second intelligent voice device and a distance between the first intelligent voice device and the user does not exceed a critical distance.
In the above technical solution, the processing module 5553 is further configured to determine a variation trend of a position relationship between the intelligent voice device in the awake state before receiving the voice signal and the user; when it is determined that the intelligent voice device in the awakening state exceeds the critical distance according to the change trend of the position relationship, determining the intelligent voice device with the highest matching degree with the position relationship between the intelligent voice device and the user as a target intelligent voice device in the first intelligent voice device and the at least one second intelligent voice device;
the triggering module 5554 is further configured to, when it is determined that the intelligent voice device with the highest degree of matching with the position relationship of the user is the target intelligent voice device, trigger the intelligent voice device in the wake-up state to be in a standby state, and wake up the target intelligent voice device in real time.
In the above technical solution, the processing module 5553 is further configured to determine a variation trend of a position relationship between the intelligent voice device in the wake-up state and the user before receiving the voice signal; when it is determined that the intelligent voice device in the awakening state exceeds the critical distance within a preset time according to the change trend of the position relationship, determining the intelligent voice device with the highest matching degree with the position relationship between the intelligent voice device and the user as a target intelligent voice device in the first intelligent voice device and the at least one second intelligent voice device;
the triggering module 5554 is further configured to wake up the target smart voice device in advance, before the smart voice device in the wake-up state exceeds the critical distance.
In the above technical solution, the processing module 5553 is further configured to determine a variation trend of a position relationship between the intelligent voice device in the wake-up state and the user before receiving the voice signal; and when the intelligent voice equipment in the awakening state does not exceed the critical distance according to the change trend of the position relationship, determining that the intelligent voice equipment in the awakening state is the target intelligent voice equipment.
In the above technical solution, the processing module 5553 is further configured to obtain historical data of the intelligent voice device in the awake state before receiving the voice signal, and predict, by using an artificial intelligence model, the expected usage duration of the intelligent voice device in the awake state in combination with the change trend of the location relationship, the usage duration, and the number of wake-ups of the intelligent voice device that has been woken up; and determine, in the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device with the highest degree of matching with the location relationship with the user as the target intelligent voice device; the triggering module is further configured to wake up the target intelligent voice device in real time when the expected usage duration is reached; or wake up the target intelligent voice device in advance, before the expected usage duration is reached.
In the above technical solution, the apparatus further includes:
a switching module 5555, configured to trigger an intelligent voice device, other than the target intelligent voice device, in an awake state to switch to a standby state in real time; or,
and waiting for a preset time period, determining the change trend of the position relation between the target intelligent voice equipment and the user aiming at the intelligent voice equipment in the awakening state except the target intelligent voice equipment in the preset time period, and triggering the intelligent voice equipment in the awakening state to be switched to the standby state when the intelligent voice equipment in the awakening state is determined to exceed the critical distance in the preset time period.
In the above technical solution, the control device 555 of the intelligent voice device further includes:
a response module 5556, configured to trigger the target smart voice device to re-respond to the last voice signal when the target smart voice device is not the same device as the smart voice device in the wake-up state before receiving the voice signal and a distance between the smart voice device in the wake-up state and the user exceeds a critical distance in a process that the smart voice device in the wake-up state last responds to the voice signal of the user.
In the foregoing technical solution, the processing module 5553 is further configured to, when it is determined based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device that one account corresponds to multiple intelligent voice devices, determine, among the multiple intelligent voice devices, the intelligent voice device with the highest degree of matching with the position relationship with the user as the target intelligent voice device.
In the above technical solution, the processing module 5553 is further configured to identify a user corresponding to the voice signal of the user based on a voiceprint feature of the voice signal of the user; when the user corresponding to the account is determined to be the user corresponding to the voice signal based on the user account bound by the first intelligent voice device and the at least one second intelligent voice device, determining the intelligent voice device corresponding to the account to be the awakenable intelligent voice device; and determining the intelligent voice equipment with the highest matching degree with the position relation between the users as target intelligent voice equipment in the first intelligent voice equipment and the at least one second intelligent voice equipment.
Embodiments of the present invention also provide a storage medium storing executable instructions which, when executed by a processor, cause the processor to execute a control method of an intelligent voice device provided by an embodiment of the present invention, for example, the control method of the intelligent voice device shown in fig. 3A to 3C, or the control method of the intelligent voice device shown in fig. 4.
In some embodiments, the storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In the following, an exemplary application of the embodiments of the present invention in a practical application scenario will be described.
Automatic Speech Recognition (ASR) technology can satisfy user requirements well in a scenario with a single intelligent voice device, but the user experience degrades in scenarios where multiple intelligent voice devices coexist.
As intelligent voice devices become more common, situations arise in which multiple intelligent voice devices exist in the same scene (in the same home, or even in the same room). In such a situation, if the user wakes up the intelligent voice devices and initiates a voice request, multiple intelligent voice devices respond to the user's voice request simultaneously, which greatly degrades the user experience.
In order to solve the above problem, an embodiment of the present invention provides a method for controlling an intelligent voice device: a single-device response method based on spatial perception (VSSP). The method determines the device physically closest to the user according to the energy of the user's voice received by each intelligent voice device, combined with the dimension of the account system. Under the above situation, even if the user initiates a voice request to the intelligent voice devices, only the intelligent voice device closest to the user gives a response; the other intelligent voice devices farther from the user do not respond to the user's request and automatically enter the standby state to wait for the next wake-up, avoiding a confusion of overlapping replies. The method is therefore well suited to scenes where multiple intelligent voice devices coexist.
Fig. 5 is a schematic diagram of a user waking up intelligent voice devices according to an embodiment of the present invention. As shown in fig. 5, the intelligent voice device 1 and the intelligent voice device 2 are in the same environment, with the intelligent voice device 1 physically closer to the user than the intelligent voice device 2. When the user speaks the "ABAB" wake-up word, both the intelligent voice device 1 and the intelligent voice device 2 are woken up and wait to respond to the user's voice request.
Fig. 6 is a schematic view of an application scenario of the intelligent voice device according to the embodiment of the present invention. As shown in fig. 6, when the user initiates an actual voice request, for example, "what is the weather today", both the intelligent voice device 1 and the intelligent voice device 2 receive the user's voice request. At this time, since the intelligent voice device 1 is closer to the user than the intelligent voice device 2, only the intelligent voice device 1 (the target intelligent voice device) replies to the user's voice request and broadcasts a reply, for example, "Shenzhen is sunny today, the temperature …".
Fig. 7 is a schematic view of an application scenario in which the intelligent voice devices interact with the cloud. As shown in fig. 7, in the application scenario of fig. 6, after receiving the user's voice request, the intelligent voice device 1 and the intelligent voice device 2 each send the user's voice request (voice data) to the cloud. The cloud receives the request from the intelligent voice device 1 and the request from the intelligent voice device 2; both carry the same voice stream, "what is the weather today". The cloud then judges whether the intelligent voice device 1 and the intelligent voice device 2 are logged into the same account; if so, the two devices very likely belong to the same user. Moreover, if the cloud receives the request of the intelligent voice device 1 at nearly the same time as the request of the intelligent voice device 2, the probability that the two devices are in the same environment is very high. The cloud then performs VSSP processing: it compares the energy value of the voice data uploaded by the intelligent voice device 1 with that uploaded by the intelligent voice device 2, and judges from the energy values that the intelligent voice device 1 is closer to the user. The cloud therefore issues a broadcast instruction to the intelligent voice device 1 (the target intelligent voice device), and the intelligent voice device 1 broadcasts a reply according to the broadcast instruction, for example, "Shenzhen is sunny today, the temperature …".
Meanwhile, the cloud sends a standby instruction to the intelligent voice device 2, so that the intelligent voice device 2 enters a standby state to wait for the next awakening of the user.
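The cloud-side selection just described can be sketched as follows, using RMS amplitude as a proxy for the energy of each uploaded stream (the instruction names and data layout are illustrative assumptions):

```python
def rms(samples):
    # Root-mean-square amplitude as a proxy for the energy of the upload.
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def vssp_select(uploads):
    """uploads: {device_id: PCM samples of the same utterance}.
    The device that heard the user loudest is taken to be closest; it gets
    the broadcast instruction, all other devices get a standby instruction."""
    target = max(uploads, key=lambda dev: rms(uploads[dev]))
    instructions = {dev: ("BROADCAST" if dev == target else "STANDBY")
                    for dev in uploads}
    return target, instructions
```

In the fig. 7 scenario, the louder upload from device 1 would yield a broadcast instruction for device 1 and a standby instruction for device 2.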
Fig. 8 is a waveform diagram of the voice data uploaded to the cloud by the intelligent voice device 1 according to the embodiment of the present invention; fig. 9 is a spectrum diagram of that voice data; fig. 10 is a waveform diagram of the voice data uploaded to the cloud by the intelligent voice device 2; and fig. 11 is the corresponding spectrum diagram. As can be seen from the waveform diagrams in fig. 8 and fig. 10, in the application scenario of fig. 7, after "what is the weather today" is uploaded to the cloud, the cloud can determine through voice analysis that the waveform energy value in fig. 8 is much larger than that in fig. 10. As can be seen from the spectra in fig. 9 and fig. 11, over the same time span the spectral energy value in fig. 9 is much larger than that in fig. 11; in particular, as shown by the boxed regions, the high-frequency region of the voice data in fig. 9 is more active than that in fig. 11. Therefore, as can be seen from fig. 8 to 11, since the energy value of the voice data received by the intelligent voice device 1 at the same time is greater than that received by the intelligent voice device 2, the intelligent voice device 1 is physically closer to the user than the intelligent voice device 2; the VSSP method is triggered, and the cloud controls only the intelligent voice device 1 to respond to the user's voice request.
Fig. 12 is a schematic diagram of another application scenario in which intelligent voice devices interact with the cloud. As shown in fig. 12, intelligent voice device 1 and intelligent voice device 2 share the same login account, while intelligent voice device 3 is logged in to a different account, and the distances from the devices to the user satisfy: intelligent voice device 3 > intelligent voice device 2 > intelligent voice device 1. In this case, the VSSP method takes effect only between intelligent voice device 1 and intelligent voice device 2: intelligent voice device 1 (the target intelligent voice device) and intelligent voice device 3 both respond to the user's request, while intelligent voice device 2 automatically enters the standby state because of the VSSP method. The method is therefore applicable to scenarios in which multiple intelligent voice devices coexist in the same space but belong to different users (for example, public places such as offices): intelligent voice devices that are farther away but belong to other users are not rendered unusable merely because they share the space, achieving the effect of shared use of intelligent voice devices.
To verify the effect achieved by the embodiment of the present invention, an intelligent voice device adopting the VSSP method (the Jingle smart screen) was compared with existing intelligent voice devices (Degree-at-home and Tiantian fairy). In the same scenario, two Jingle smart screens, two Degree-at-home devices, and two Tiantian fairy devices were set up respectively, each pair logged in to the same account. The comparison results are shown in table 1:
TABLE 1
Product type | Device 1, 1 meter from the user | Device 2, 3 meters from the user |
---|---|---|
Degree-at-home | Responds | Responds |
Tiantian fairy | Responds | Responds |
Jingle smart screen | Responds | Does not respond |
As can be seen from table 1, in the same application scenario, when two Degree-at-home devices or two Tiantian fairy devices coexist, both devices wake up simultaneously, respond to the user's voice request, and reply at the same time, causing a sound-confusion phenomenon. In the same two-device scenario, the Jingle smart screens with the VSSP method enabled are likewise both awakened, but only the closest device (device 1, which is 1 meter from the user) replies to the user's voice request, giving a better user experience.
When multiple intelligent voice devices are in the same local area network and several of them (intelligent voice device 1, intelligent voice device 2, ...) receive the user's voice request, the cloud is not needed: intelligent voice device 1 can send its encrypted account information and the energy value of its voice data within the local area network, receive the encrypted account information and energy values sent by the other intelligent voice devices (intelligent voice device 2, ...), and compare them locally with its own. If the received account information is consistent with its own account information and the energy value of the received voice data is larger than that of its own voice data, intelligent voice device 1 automatically enters the standby state.
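The cloud-free LAN variant above can be sketched as follows. This is a hedged illustration under stated assumptions: the patent does not specify the encryption scheme or transport, so a SHA-256 digest stands in for the "encrypted account information" and a plain list of messages stands in for the LAN broadcast; `account_token` and `should_standby` are illustrative names.

```python
# Minimal sketch of the cloud-free LAN arbitration: a device yields
# (enters standby) if a peer on the same account reports a higher
# voice-data energy value. Encryption and networking are stubbed out.
import hashlib

def account_token(account: str) -> str:
    # Stand-in for the "encrypted account information" in the text.
    return hashlib.sha256(account.encode()).hexdigest()

def should_standby(own_account, own_energy, received_messages):
    """received_messages: iterable of (account_token, energy) from peers."""
    own_token = account_token(own_account)
    for token, energy in received_messages:
        # Same account and a louder peer: yield and enter standby.
        if token == own_token and energy > own_energy:
            return True
    return False

peers = [(account_token("alice"), 0.82)]
# Device 2 (energy 0.31, same account) yields; a different account does not.
```

Each device runs the same local comparison, so the nearest device — the one with the largest energy value — is the only one that does not stand by.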
In addition, the intelligent voice device can judge which device best meets the user's needs in physical space from other dimensions of the voice signal received on the user side. For example: 1) the position relationship (i.e. at least one of direction and distance) of the voice signal can be considered. When the user is static, the distance dimension is given priority, and the target intelligent voice device that responds to the user's voice signal is determined by distance; when the user is moving, the moving direction and the distance can be considered together, for example by weighting the matching degrees of the moving direction and the distance, to determine the target intelligent voice device. 2) Besides using the energy value of the voice signal to represent the distance between the user and the intelligent voice device, devices such as infrared sensors, ultrasonic devices, and cameras can be used to perceive the position relationship; these can be integrated in the intelligent voice device or be independent devices whose readings the intelligent voice device uses. 3) When multiple awakened intelligent voice devices exist, they can be filtered according to the user's identity information, i.e. only the awakened devices bound to the same user account are considered. 4) If the current user is already interacting with an intelligent voice device (for example, the user speaks while walking), the target intelligent voice device is determined on the premise of keeping that interaction uninterrupted.
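The selection rule in point 1) above, combined with the matching-degree formula of claim 5 (the reciprocal of the distance multiplied by a signed moving direction, positive when approaching and negative when receding), can be sketched as follows. The function names and the example distances are illustrative assumptions, not from the patent.

```python
# Hedged sketch of target-device selection: a static user is served by
# the nearest device; for a moving user, the device with the highest
# matching degree (1/distance times signed moving direction) wins.

def matching_degree(distance_m: float, moving_direction: int) -> float:
    # moving_direction: +1 if the user is approaching this device,
    # -1 if the user is moving away (claim 5's sign convention).
    return (1.0 / distance_m) * moving_direction

def pick_target(devices, user_moving: bool):
    """devices: dict of name -> (distance_m, moving_direction)."""
    if not user_moving:
        # Static user: smallest distance wins (claim 4).
        return min(devices, key=lambda n: devices[n][0])
    return max(devices, key=lambda n: matching_degree(*devices[n]))

devices = {"smart-1": (1.0, -1), "smart-2": (3.0, +1)}
# Static user: smart-1 (nearest). Moving user walking toward smart-2:
# smart-2 wins despite being farther, since smart-1's degree is negative.
```

A receding device always scores negatively, so a farther device the user is walking toward is preferred over a nearer one the user is leaving — which is the behavior claims 4 and 5 together describe.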
In summary, in the embodiment of the present invention, the target intelligent voice device that satisfies the usage scenario of the space is determined among the first intelligent voice device and the at least one second intelligent voice device according to the position relationship, so that multiple intelligent voice devices in the same scenario are prevented from responding simultaneously to the user's voice signal, improving the user's experience.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.
Claims (15)
1. A control method of an intelligent voice device is characterized by comprising the following steps:
receiving voice signals of users in a space where first intelligent voice equipment is located;
sensing the space according to the voice signal to determine intelligent voice equipment in the space and a position relation between the intelligent voice equipment and the user;
when the space further comprises at least one second intelligent voice device, determining a target intelligent voice device meeting the use scene of the space in the first intelligent voice device and the at least one second intelligent voice device according to the position relation, and
triggering the target intelligent voice device to be in a wake-up state so as to respond to the voice signal of the user.
2. The method according to claim 1, wherein the performing perception processing on the space according to the voice signal to determine the position relationship between the intelligent voice device included in the space and the user comprises:
for the voice signal of the user received by any intelligent voice device in the space, executing the following processing:
analyzing and processing the voice signals of the user received by the intelligent voice equipment from multiple directions to obtain energy values of the voice signals of the user received by the intelligent voice equipment from multiple directions;
determining the direction corresponding to the maximum energy value as the direction of the user relative to the intelligent voice equipment, and
determining, according to the relation that the energy value of a voice signal attenuates with distance and the attenuation value of the maximum energy value relative to the energy value of the user's reference voice signal, the distance corresponding to the attenuation value as the distance between the intelligent voice device and the user.
3. The method according to claim 1, wherein the performing perception processing on the space according to the voice signal to determine the position relationship between the intelligent voice device included in the space and the user comprises:
for the voice signal of the user received by any intelligent voice device in the space, executing the following processing:
analyzing the voice signal to obtain a first distance between the intelligent voice equipment and the user and a first direction of the user relative to the intelligent voice equipment;
responding to the received voice signal, and detecting obstacles in the space to obtain a second distance between the intelligent voice equipment and the user;
carrying out obstacle recognition on the space to obtain a second direction of the user relative to the intelligent voice equipment;
when the distance difference value between the first distance and the second distance is greater than a distance error threshold value and/or the direction error between the first direction and the second direction is greater than a direction error threshold value, determining the weighted value of the first distance and the second distance as the distance between the intelligent voice device and the user, and determining the average value of the first direction and the second direction as the direction of the user relative to the intelligent voice device.
4. The method according to claim 1, wherein the determining, in the first smart voice device and the at least one second smart voice device, a target smart voice device that satisfies the usage scenario of the space according to the location relationship comprises:
and executing the following processing on the position relation obtained by carrying out perception processing on the voice signal received by any intelligent equipment in the space:
when the time that the position relation is kept unchanged exceeds a time threshold, determining that the user is in a static state;
and according to the distance between the intelligent voice equipment and the user included in the position relationship, determining the intelligent voice equipment with the minimum distance from the user in the space as target intelligent voice equipment.
5. The method according to claim 1, wherein the determining, in the first smart voice device and the at least one second smart voice device, a target smart voice device that satisfies the usage scenario of the space according to the location relationship comprises:
and executing the following processing on the position relation obtained by carrying out perception processing on the voice signal received by any intelligent equipment in the space:
when the position relation changes, determining that the user is in a motion state;
determining the direction of change of the direction as the moving direction of the user relative to the intelligent voice equipment according to the direction of the user relative to the intelligent voice equipment, wherein the direction of change of the direction is included in the position relation;
according to the distance between the user and the intelligent equipment in the position relationship, multiplying the reciprocal of the distance by the vector of the moving direction to obtain the matching degree of the position relationship between the intelligent voice equipment and the user;
determining the intelligent voice equipment with the highest matching degree as target intelligent voice equipment in the first intelligent voice equipment and the at least one second intelligent voice equipment;
when the direction of the user relative to the intelligent voice equipment changes to be close to the intelligent voice equipment, the moving direction value is positive, and when the direction of the user relative to the intelligent voice equipment changes to be far away from the intelligent voice equipment, the moving direction value is negative.
6. The method according to claim 1, wherein the determining, in the first smart voice device and the at least one second smart voice device, a target smart voice device that satisfies the usage scenario of the space according to the location relationship comprises:
determining an intelligent voice device in an awakening state in the first intelligent voice device and the at least one second intelligent voice device;
when the distance between the intelligent voice equipment in the awakening state and the user does not exceed the critical distance, determining that the intelligent voice equipment in the awakening state is the target intelligent voice equipment;
the critical distance is the maximum distance when the user and the intelligent voice equipment can correctly perceive the voice signal sent by the other party.
7. The method according to claim 1, wherein the determining, in the first smart voice device and the at least one second smart voice device, a target smart voice device that satisfies the usage scenario of the space according to the location relationship comprises:
when there is an intelligent voice device among the first intelligent voice device and the at least one second intelligent voice device that is interacting with the user, and
the distance between the intelligent voice device and the user does not exceed the critical distance, determining that the intelligent voice device interacting with the user is the target intelligent voice device.
8. The method according to claim 1, wherein the determining, in the first smart voice device and the at least one second smart voice device, a target smart voice device that satisfies the usage scenario of the space according to the location relationship comprises:
determining a trend of change of a position relationship between the smart voice device in an awake state and the user before the voice signal is received;
when it is determined that the intelligent voice device in the awakening state exceeds the critical distance according to the change trend of the position relationship, determining the intelligent voice device with the highest matching degree with the position relationship between the intelligent voice device and the user as a target intelligent voice device in the first intelligent voice device and the at least one second intelligent voice device;
the triggering the target smart voice device to be in a wake-up state includes:
when the intelligent voice equipment with the highest matching degree with the position relation of the user is determined to be the target intelligent voice equipment, triggering the intelligent voice equipment in the awakening state to be in the standby state, and awakening the target intelligent voice equipment in real time.
9. The method according to claim 1, wherein the determining, in the first smart voice device and the at least one second smart voice device, a target smart voice device that satisfies the usage scenario of the space according to the location relationship comprises:
determining a trend of change of a position relationship between the smart voice device in an awake state and the user before the voice signal is received;
when it is determined that the intelligent voice device in the awakening state exceeds the critical distance within a preset time according to the change trend of the position relationship, determining the intelligent voice device with the highest matching degree with the position relationship between the intelligent voice device and the user as a target intelligent voice device in the first intelligent voice device and the at least one second intelligent voice device;
the triggering the target smart voice device to be in a wake-up state includes:
and before the intelligent voice equipment in the awakening state does not exceed the critical distance, awakening the target intelligent voice equipment in advance.
10. The method of claim 1, wherein after the triggering the target smart voice device to be in an awake state, the method further comprises:
triggering intelligent voice equipment which is out of the target intelligent voice equipment and is in an awakening state to be switched to a standby state in real time; or,
and waiting for a preset time period, determining the change trend of the position relation between the target intelligent voice equipment and the user aiming at the intelligent voice equipment in the awakening state except the target intelligent voice equipment in the preset time period, and triggering the intelligent voice equipment in the awakening state to be switched to the standby state when the intelligent voice equipment in the awakening state is determined to exceed the critical distance in the preset time period.
11. The method according to any one of claims 1-10, wherein determining a target smart voice device that satisfies a usage scenario of the space among the first smart voice device and the at least one second smart voice device according to the location relationship comprises:
and when the account number is determined to correspond to a plurality of intelligent voice devices based on the user account number bound by the first intelligent voice device and the at least one second intelligent voice device, determining the intelligent voice device with the highest matching degree with the position relation between the users as a target intelligent voice device in the plurality of intelligent voice devices.
12. A control apparatus of an intelligent voice device, the apparatus comprising:
the receiving module is used for receiving voice signals of users in the space where the first intelligent voice equipment is located;
the perception module is used for perceiving the space according to the voice signal so as to determine intelligent voice equipment in the space and a position relation between the intelligent voice equipment and the user;
a processing module, configured to determine, when the space further includes at least one second smart voice device, a target smart voice device that satisfies a usage scenario of the space among the first smart voice device and the at least one second smart voice device according to the location relationship, and
the triggering module is used for triggering the target intelligent voice device to be in a wake-up state so as to respond to the voice signal of the user.
13. An intelligent speech device, comprising:
a memory for storing executable instructions;
a processor for implementing the method of controlling an intelligent speech device according to any one of claims 1 to 11 when executing the executable instructions stored in the memory.
14. A server for controlling a smart voice device, comprising:
a memory for storing executable instructions;
a processor for implementing the method of controlling an intelligent speech device according to any one of claims 1 to 11 when executing the executable instructions stored in the memory.
15. A storage medium storing executable instructions for causing a processor to implement the method of controlling an intelligent speech device according to any one of claims 1 to 11 when executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911138882.7A CN110827818B (en) | 2019-11-20 | 2019-11-20 | Control method, device, equipment and storage medium of intelligent voice equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110827818A true CN110827818A (en) | 2020-02-21 |
CN110827818B CN110827818B (en) | 2024-04-09 |
Family
ID=69557156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911138882.7A Active CN110827818B (en) | 2019-11-20 | 2019-11-20 | Control method, device, equipment and storage medium of intelligent voice equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110827818B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110125504A1 (en) * | 2009-11-24 | 2011-05-26 | Samsung Electronics Co., Ltd. | Mobile device and method and computer-readable medium controlling same |
CN109215663A (en) * | 2018-10-11 | 2019-01-15 | 北京小米移动软件有限公司 | Equipment awakening method and device |
CN109729109A (en) * | 2017-10-27 | 2019-05-07 | 腾讯科技(深圳)有限公司 | Transmission method and device, storage medium, the electronic device of voice |
CN110060680A (en) * | 2019-04-25 | 2019-07-26 | Oppo广东移动通信有限公司 | Electronic equipment exchange method, device, electronic equipment and storage medium |
CN110364161A (en) * | 2019-08-22 | 2019-10-22 | 北京小米智能科技有限公司 | Method, electronic equipment, medium and the system of voice responsive signal |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113381922B (en) * | 2020-03-09 | 2024-02-27 | 阿尔派株式会社 | Electronic device and information reproduction control method |
CN113381922A (en) * | 2020-03-09 | 2021-09-10 | 阿尔派株式会社 | Electronic device and information reproduction control method |
CN111538249A (en) * | 2020-04-26 | 2020-08-14 | 云知声智能科技股份有限公司 | Control method, device, equipment and storage medium of distributed terminal |
CN111538249B (en) * | 2020-04-26 | 2023-05-26 | 云知声智能科技股份有限公司 | Control method, device, equipment and storage medium of distributed terminal |
CN111857849A (en) * | 2020-07-20 | 2020-10-30 | 北京小米松果电子有限公司 | Wake-up processing method and device, electronic equipment and storage medium |
CN111933137A (en) * | 2020-08-19 | 2020-11-13 | Oppo广东移动通信有限公司 | Voice wake-up test method and device, computer readable medium and electronic device |
CN111933137B (en) * | 2020-08-19 | 2024-04-16 | Oppo广东移动通信有限公司 | Voice wake-up test method and device, computer readable medium and electronic equipment |
CN112201236A (en) * | 2020-09-22 | 2021-01-08 | 北京小米松果电子有限公司 | Terminal awakening method and device and computer readable storage medium |
CN112201236B (en) * | 2020-09-22 | 2024-03-19 | 北京小米松果电子有限公司 | Terminal awakening method and device and computer readable storage medium |
CN112652304B (en) * | 2020-12-02 | 2022-02-01 | 北京百度网讯科技有限公司 | Voice interaction method and device of intelligent equipment and electronic equipment |
CN112652304A (en) * | 2020-12-02 | 2021-04-13 | 北京百度网讯科技有限公司 | Voice interaction method and device of intelligent equipment and electronic equipment |
CN112750433A (en) * | 2020-12-09 | 2021-05-04 | 珠海格力电器股份有限公司 | Voice control method and device |
CN112885344A (en) * | 2021-01-08 | 2021-06-01 | 深圳市艾特智能科技有限公司 | Offline voice distributed control method, system, storage medium and equipment |
CN112992140A (en) * | 2021-02-18 | 2021-06-18 | 珠海格力电器股份有限公司 | Control method, device and equipment of intelligent equipment and storage medium |
CN115086094A (en) * | 2021-03-10 | 2022-09-20 | Oppo广东移动通信有限公司 | Device selection method and related device |
CN115086094B (en) * | 2021-03-10 | 2024-01-12 | Oppo广东移动通信有限公司 | Equipment selection method and related device |
CN115086096A (en) * | 2021-03-15 | 2022-09-20 | Oppo广东移动通信有限公司 | Method, apparatus, device and storage medium for responding control voice |
WO2022193884A1 (en) * | 2021-03-15 | 2022-09-22 | Oppo广东移动通信有限公司 | Method and apparatus for responding to control voice, and device, storage medium and program product |
CN113096656A (en) * | 2021-03-30 | 2021-07-09 | 深圳创维-Rgb电子有限公司 | Terminal device awakening method and device and computer device |
CN113362823A (en) * | 2021-06-08 | 2021-09-07 | 深圳市同行者科技有限公司 | Multi-terminal response method, device, equipment and storage medium of household appliance |
CN113608449A (en) * | 2021-08-18 | 2021-11-05 | 四川启睿克科技有限公司 | Voice equipment positioning system and automatic positioning method under intelligent home scene |
CN113608449B (en) * | 2021-08-18 | 2023-09-15 | 四川启睿克科技有限公司 | Speech equipment positioning system and automatic positioning method in smart home scene |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110827818A (en) | Control method, device, equipment and storage medium of intelligent voice equipment | |
CN111223497B (en) | Nearby wake-up method and device for terminal, computing equipment and storage medium | |
US11056108B2 (en) | Interactive method and device | |
KR101510860B1 (en) | Service Method and Server for Providing Application Comprehended User Intention | |
EP3866054B1 (en) | Smart device control method and apparatus, and storage medium | |
CN110853619B (en) | Man-machine interaction method, control device, controlled device and storage medium | |
CN112130918B (en) | Intelligent device awakening method, device and system and intelligent device | |
KR20180083587A (en) | Electronic device and operating method thereof | |
US10579726B2 (en) | Method and device for generating natural language expression by using framework | |
CN112489413B (en) | Control method and system of remote controller, storage medium and electronic equipment | |
CN110738994A (en) | Control method, device, robot and system for smart homes | |
CN110767225A (en) | Voice interaction method, device and system | |
WO2018087971A1 (en) | Movable body control apparatus and movable body control program | |
CN111862965A (en) | Awakening processing method and device, intelligent sound box and electronic equipment | |
CN112739507A (en) | Interactive communication implementation method, equipment and storage medium | |
CN113593544A (en) | Device control method and apparatus, storage medium, and electronic apparatus | |
CN109377993A (en) | Intelligent voice system and its voice awakening method and intelligent sound equipment | |
CN111933149A (en) | Voice interaction method, wearable device, terminal and voice interaction system | |
CN115810356A (en) | Voice control method, device, storage medium and electronic equipment | |
CN110602197A (en) | Internet of things control device and method and electronic equipment | |
CN113393865B (en) | Power consumption control, mode configuration and VAD method, apparatus and storage medium | |
CN116582382B (en) | Intelligent device control method and device, storage medium and electronic device | |
CN118591839A (en) | Terminal equipment and voice awakening method | |
CN115019798A (en) | Control method and device of voice recognition equipment, electronic equipment and storage medium | |
CN114815635A (en) | Computer readable storage medium, intelligent panel and voice interaction method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 40022521 Country of ref document: HK |
|
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |