CN110691300B - Audio playing device and method for providing information - Google Patents

Audio playing device and method for providing information

Info

Publication number
CN110691300B
CN110691300B (application CN201910865114.5A)
Authority
CN
China
Prior art keywords
sound information
information
user
environmental sound
trigger event
Prior art date
Legal status
Active
Application number
CN201910865114.5A
Other languages
Chinese (zh)
Other versions
CN110691300A (en)
Inventor
梁文昭
Current Assignee
Lianshang Xinchang Network Technology Co Ltd
Original Assignee
Lianshang Xinchang Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Lianshang Xinchang Network Technology Co Ltd filed Critical Lianshang Xinchang Network Technology Co Ltd
Priority to CN201910865114.5A
Publication of CN110691300A
Application granted
Publication of CN110691300B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00: Circuits for transducers, loudspeakers or microphones
    • H04R 2430/00: Signal processing covered by H04R, not provided for in its groups
    • H04R 2430/01: Aspects of volume control, not necessarily automatic, in sound systems

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The purpose of the present application is to provide an audio playing device and a method for providing information. The audio playing device plays current audio based on a first output level and collects environmental sound information; in response to detecting a control trigger event related to the environmental sound information, the device provides the user with text information corresponding to the environmental sound information, or with the environmental sound information itself. In this way, when someone nearby mentions a topic of interest, the user can react in time and obtain the preceding conversation content, improving team communication efficiency and the user experience.

Description

Audio playing device and method for providing information
Technical Field
The present application relates to the field of communications, and more particularly, to a technique for providing information.
Background
With the improvement of living standard, people have higher and higher requirements on music playing equipment, for example, people can obtain immersive listening experience by wearing listening equipment such as earphones and earplugs; in some cases, people also wear earphones and earplugs to isolate outside noise. In addition, listening device manufacturers will also improve the ambient sound blocking performance (passive noise reduction) of listening devices as much as possible, and even add an active noise reduction function for the ambient sound to the listening devices, so as to improve the user experience.
Disclosure of Invention
An object of the present application is to provide an audio playing device and a method for providing information.
According to one aspect of the present application, a method for providing information is provided, applied to an audio playing device. The method comprises the following steps:
playing the current audio based on the first output level;
collecting environmental sound information; and
in response to detecting a control trigger event related to the environmental sound information, providing text information corresponding to the environmental sound information to a user; or,
providing the environmental sound information to a user in response to detecting a control trigger event related to the environmental sound information.
According to another aspect of the present application, an audio playing device is provided, wherein the device comprises:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the operations of the above-described method.
The present application also provides a computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform the operations of the above-described method.
According to another aspect of the present application, an audio playing device is further provided, wherein the device comprises:
a first module for playing current audio based on a first output level;
a second module for collecting environmental sound information; and
a third module (module 310 below) for providing text information corresponding to the environmental sound information to a user in response to detecting a control trigger event related to the environmental sound information; or,
an alternative third module (module 320 below) for providing the environmental sound information to a user in response to detecting a control trigger event related to the environmental sound information.
As earphones, earbuds, and the like become better at blocking outside sound, a user enjoying music may not hear nearby people speaking to them, miss topics of interest, and so have a poorer, less enjoyable experience. In view of this, the present application provides an audio playing device and a method for providing information, which collect environmental sound, detect a specific control trigger event based on that sound, and, when the event is detected, provide the user with the environmental sound from a recent period of time or with text corresponding to it, so that when someone nearby mentions a topic of interest the user can react in time and obtain the preceding conversation content, improving team communication efficiency and user experience.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1a and FIG. 1b are flow charts of a method for providing information according to an embodiment of the present application;
FIGS. 2a to 2c each show an implementation scenario of an embodiment of the present application;
FIG. 3 illustrates the relationship between the time at which a control trigger event occurs and the collection time of the corresponding environmental sound information;
fig. 4a and fig. 4b respectively show functional modules of an audio playing device in an embodiment of the present application;
FIG. 5 illustrates functional modules of an exemplary system for various embodiments of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory, in a computer-readable medium. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PCM), programmable random access memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, and magnetic tape or other magnetic or non-magnetic storage devices, any of which may be used to store information accessible by a computing device.
The device referred to in this application includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user device includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (e.g., through a touch panel), such as a smartphone or a tablet computer, and may run any operating system, such as Android or iOS. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of network servers, or a cloud of servers; here, the cloud consists of a large number of computers or network servers based on cloud computing, a kind of distributed computing in which one virtual supercomputer is composed of a collection of loosely coupled computers. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network, and the like. Preferably, the device may also be a program running on the user device, the network device, or a device formed by integrating the user device with the network device, the touch terminal, or the network device with the touch terminal through a network.
Of course, those skilled in the art will appreciate that the foregoing is by way of example only, and that other existing or future devices, which may be suitable for use in the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
As earphones, earbuds, and the like become better at blocking outside sound, a user enjoying music may not hear nearby people speaking to them, miss topics of interest, and so have a poorer, less enjoyable experience. In view of the above, the present application provides an audio playing device and a method for providing information, described in detail below with reference to the accompanying drawings.
The following describes specific embodiments of the present application in detail, based on an audio playing device. In some embodiments the audio playing device is a mobile phone, a tablet computer, a personal computer, or another electronic product; in some embodiments it includes a main body portion for data processing, a sound collection unit for collecting ambient sound (e.g., a microphone and its peripheral circuitry), and optionally an audio output unit (e.g., a headset, earbuds, or a speaker). In some embodiments the sound collection unit is attached to the audio output unit, e.g., one or several microphones arranged on the audio output unit. Those skilled in the art will appreciate that existing and future audio playing devices, if applicable to the present application, are included within the scope of protection of the present application and are hereby incorporated by reference. For example, without limitation, the audio output unit and/or the sound collection unit may be disposed inside the main body, connected to the main body by a cable, or in communication with the main body over a protocol such as Bluetooth or Wi-Fi.
According to one aspect of the present application, a method for providing information is provided, applied to an audio playing device. The method comprises steps S100, S200 and S310 (see fig. 1a), or steps S100, S200 and S320 (see fig. 1b). In some embodiments the method is implemented in the scenario shown in fig. 2a, where the audio playing device 10 outputs audio through a pair of headphones and collects the ambient sound at the user's position through a microphone (e.g., the first microphone 201 in fig. 2a).
Specifically, in step S100, the audio playing device plays current audio (e.g., a piece of music) based on a first output level. For example, the first output level corresponds to an output volume of the audio output unit, e.g., 60 dB.
In step S200, the audio playback apparatus collects ambient sound information through a microphone. In some embodiments, the microphone is embedded in the audio playback device, and in other embodiments, the microphone is coupled to the audio playback device in a wired or wireless manner. In particular, in some embodiments, the microphone is attached to an external audio output unit (e.g., the headset), for example, the microphone is fixed to the headset. It will be understood by those skilled in the art that the above-described microphone arrangements are merely exemplary and not limiting with respect to the embodiments of the present application, and that other existing or future microphone arrangements, as may be appropriate for the present application, are also within the scope of the present application and are hereby incorporated by reference.
In step S310, in response to detecting a control trigger event related to the environmental sound information, the audio playing device provides text information corresponding to the environmental sound information to a user. The text information corresponding to the environmental sound information is obtained by the audio playing device executing the voice recognition operation locally, or the audio playing device sends the environmental sound information to the corresponding network device (such as a cloud server) and the network device executes the voice recognition operation. In step S320, the audio playing device provides the saved ambient sound information to the user in response to detecting the control trigger event related to the ambient sound information, for example, the saved ambient sound information is output through the audio output unit (e.g., the earphone) currently used by the user.
To reduce the effect of environmental noise on recognition precision and improve the accuracy of speech recognition, in some embodiments the audio playing device collects environmental sound information through a low-pass filter, which removes part of the white noise in the environment.
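By way of illustration only, the following Python sketch shows one way such a filtering step could be realized; the 16 kHz sample rate, 4 kHz cutoff, and fourth-order Butterworth design are assumptions for the example, not values taken from this application.

```python
# Hypothetical sketch of the low-pass filtering step; sample rate,
# cutoff, and filter order are illustrative assumptions only.
import numpy as np
from scipy.signal import butter, lfilter

SAMPLE_RATE = 16_000  # Hz, assumed microphone sample rate
CUTOFF_HZ = 4_000     # keeps the main speech band, attenuates hiss above it

def low_pass(frame: np.ndarray) -> np.ndarray:
    """Attenuate content above CUTOFF_HZ before speech recognition."""
    b, a = butter(4, CUTOFF_HZ / (SAMPLE_RATE / 2), btype="low")
    return lfilter(b, a, frame)
```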
In some embodiments, the method further comprises step S400 (not shown). In step S400, after collecting the environmental sound information, the audio playing device saves it for subsequent processing; in some embodiments, the collected environmental sound information is stored in a cache of the audio playing device (e.g., a region set aside in memory). Accordingly, in response to detecting the control trigger event, the audio playing device provides the user with text information corresponding to the saved environmental sound information, or with the saved environmental sound information itself.
To save storage space and make it more efficient for the user to listen to the saved environmental sound information, optionally in step S400 the audio playing device saves the environmental sound information based on a preset time window, i.e., it records the environmental sound cyclically. Specifically, in some embodiments the audio playing device continuously records audio fragments of equal length (e.g., 5 s) in sequence and deletes the oldest fragment (moves it out of the time window) once the total duration exceeds the preset window length (e.g., 20 s); the device then provides the user with text information corresponding to all currently saved fragments, or with the fragments themselves.
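A minimal sketch of this cyclic recording scheme follows, assuming Python with NumPy; the fragment and window lengths match the examples above (5 s and 20 s), while the sample rate is an assumption.

```python
from collections import deque

import numpy as np

SAMPLE_RATE = 16_000   # Hz, assumed
FRAGMENT_SECONDS = 5   # length of each recorded fragment (example above)
WINDOW_SECONDS = 20    # preset time window length (example above)

class AmbientSoundWindow:
    """Keeps only the most recent WINDOW_SECONDS of ambient sound."""

    def __init__(self) -> None:
        # A deque with maxlen evicts the oldest fragment automatically,
        # i.e. moves it out of the time window.
        self._fragments = deque(maxlen=WINDOW_SECONDS // FRAGMENT_SECONDS)

    def push(self, fragment: np.ndarray) -> None:
        self._fragments.append(fragment)

    def snapshot(self) -> np.ndarray:
        """All currently saved fragments, oldest first."""
        if not self._fragments:
            return np.empty(0, dtype=np.float32)
        return np.concatenate(list(self._fragments))
```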
In some embodiments, the method further comprises step S500 (not shown). In step S500, in response to detecting the control trigger event, the audio playing device plays the current audio based on a second output level, or pauses playing the current audio, so as to reduce the interference caused by playback and let the user focus on the information the device provides. The volume at which the current audio is played based on the second output level differs from the volume based on the first output level; in some exemplary embodiments the volume corresponding to the second output level is lower than that corresponding to the first output level, e.g., 30 dB (or another smaller value), and in particular, in some embodiments the output volume is reduced to 0 dB.
When the current audio is played at the second output level, to help the user resume listening after the conversation or discussion ends and thereby improve the user experience, in some embodiments the method further includes step S600 (not shown). In step S600, the audio playing device plays the current audio based on the first output level in response to detecting a recovery trigger event. If the device lowered the volume to the second output level after detecting the control trigger event, it restores playback at the first output level, e.g., the output volume returns from 30 dB (or another smaller value) to 60 dB. If the device paused playback after detecting the control trigger event, it continues playing the current audio at the first output level from the playback breakpoint or near it, e.g., at an output volume of 60 dB.
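The following sketch shows one way steps S500 and S600 could fit together. The player object and its methods are hypothetical placeholders, not any real device API; the 60 dB and 30 dB values are the examples given above.

```python
class OutputLevelController:
    """Hypothetical duck/restore logic for steps S500 and S600."""

    FIRST_LEVEL_DB = 60   # normal playback volume (example above)
    SECOND_LEVEL_DB = 30  # ducked volume (example above)

    def __init__(self, player) -> None:
        # `player` is an assumed object exposing set_volume_db(),
        # pause(), and resume(); no real API is implied here.
        self.player = player

    def on_control_trigger(self, pause_instead: bool = False) -> None:
        if pause_instead:
            self.player.pause()  # breakpoint position remembered by player
        else:
            self.player.set_volume_db(self.SECOND_LEVEL_DB)

    def on_recovery_trigger(self) -> None:
        # Resume from (or near) the breakpoint if paused, then restore level.
        self.player.resume()
        self.player.set_volume_db(self.FIRST_LEVEL_DB)
```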
To facilitate user operation, in some embodiments the recovery trigger event comprises any one of the following:
a user performs a level restoration operation, for example by pressing a physical or virtual key on the audio playing device, issuing a voice instruction, or making a somatosensory input (e.g., the audio playing device detects a gesture or shaking motion of the user through a camera; or it detects shaking or flipping of the device itself through a built-in gyroscope; or an external device, such as the above-mentioned audio output unit, detects a somatosensory action or a key operation and sends the result to the main body of the audio playing device);
the user does not respond within a preset duration, for example, the user does not speak for a preset duration (e.g., 10 seconds), does not operate any physical or virtual key, and does not perform any somatosensory operation.
In some embodiments, in step S310, in response to detecting a control trigger event related to the environmental sound information, the audio playing device provides text viewing prompt information and provides the user with text information corresponding to the environmental sound information. For example, the text viewing prompt information is provided as voice (speech, a preset ring tone, etc.) or a text push notification, to tell the user that the dialog or discussion content is available in text form, so that the user can focus on the dialog or discussion more quickly, improving the user experience.
In some embodiments, the control trigger event described above includes at least any one of the following (a minimal detection sketch follows the list):
-the volume of the ambient sound information is larger than a preset volume threshold;
-the volume of the ambient sound information increases over time, e.g. the speaker gradually approaches the user or the speaker increases the volume;
-a sound property of the ambient sound information fulfils a preset property condition, e.g. a human voice is detected in the ambient sound information, or a sound matching a preset frequency or voiceprint (e.g., a specific person's voice) is detected in it.
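The sketch below checks the two volume-based conditions above over a short history of audio frames; the thresholds are illustrative assumptions, and detecting a human voice or a specific voiceprint would require a separate model not shown here.

```python
import numpy as np

def rms_db(frame: np.ndarray) -> float:
    """RMS level in dB (assumes float samples in [-1, 1])."""
    rms = float(np.sqrt(np.mean(frame ** 2))) + 1e-12
    return 20.0 * np.log10(rms)

def volume_trigger(frames: list, threshold_db: float = -30.0,
                   rise_db: float = 6.0) -> bool:
    """True if the latest frame exceeds the threshold, or the level has
    risen by rise_db across the frames (a speaker approaching or
    speaking louder). Both thresholds are illustrative assumptions."""
    levels = [rms_db(f) for f in frames]
    return levels[-1] > threshold_db or (levels[-1] - levels[0]) > rise_db
```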
In some embodiments, referring to fig. 2b, the audio playing device 10 collects the environmental sound through a first microphone 201 and a second microphone 202, and detects the control trigger event based on the sound information collected by each of them, which reduces misjudgment and enables the specific logic described below. Specifically, the audio playing device acquires first environmental sound information from the first microphone 201 and second environmental sound information from the second microphone 202. Accordingly, when the device provides text information after detecting the control trigger event, in step S310 it provides the user with text information corresponding to the environmental sound information in response to detecting a control trigger event related to the first environmental sound information and the second environmental sound information; when the device provides the environmental sound itself, in step S320 it provides the environmental sound information to the user in response to detecting a control trigger event related to the first environmental sound information and the second environmental sound information.
In some embodiments, when the audio playing device 10 collects the ambient sound through the first microphone 201 and the second microphone 202, the method further includes steps S700 and S800 (both not shown). In step S700, the audio playing device determines sound source orientation information based on the first environmental sound information and the second environmental sound information; subsequently, in step S800, it indicates the sound source direction to the user based on that information. The direction is provided, for example, by voice broadcast through the audio output unit, or by an indicator symbol or text presented on the device's screen, so that the user can quickly locate the sound source and focus on the conversation or discussion sooner, improving the user experience.
Specifically, in some embodiments the sound source direction is determined from the volumes at which the first microphone 201 and the second microphone 202 pick up the same environmental sound (for example, if the frequencies of the sounds collected by the two microphones are the same or close, the sounds are judged to be the same environmental sound), or from the time difference with which the two microphones receive that sound. Consider the case where the first microphone 201 and the second microphone 202 are arranged on the two sides of the user's head (for example, on the two earphone units of a headset worn by the user) and the device uses the time difference; referring to fig. 2c, when the distances from the speaker to the first microphone 201 and to the second microphone 202 differ by d, the environmental sound reaches the two microphones at different times, and that time difference can be used to calculate d. From the time difference it can be determined whether the sound source lies roughly to the user's left or right. In the orientation shown in fig. 2c, if the environmental sound reaches the first microphone 201 before the second microphone 202, the sound source is roughly on the user's left, and the user can visually search for the speaker in the left half of the space rather than the whole space, and so focus on the conversation or discussion more quickly. Further, on the plane containing the two microphones, the set of points whose distances to the two microphones differ by d is one branch of a hyperbola whose foci are the microphone positions; the sound source direction can therefore be narrowed to the range between the asymptotes of that hyperbola, which is determined by the microphone positions and the distance difference d, further improving the efficiency with which the user finds the speaker.
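By way of illustration, the sketch below estimates the inter-microphone delay by cross-correlation and makes the coarse left/right decision described above; the sample rate, microphone spacing, and the far-field (plane-wave) bearing approximation are assumptions for the example.

```python
import numpy as np

SAMPLE_RATE = 16_000    # Hz, assumed
SPEED_OF_SOUND = 343.0  # m/s at room temperature
MIC_SPACING = 0.18      # m, assumed ear-to-ear microphone distance

def tdoa_direction(left: np.ndarray, right: np.ndarray):
    """Estimate the inter-microphone delay by cross-correlation, derive
    the path-length difference d, and make the coarse left/right call."""
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)  # >0: left mic hears it later
    path_diff = (lag / SAMPLE_RATE) * SPEED_OF_SOUND  # the difference d
    # lag == 0 means directly ahead or behind; treated as "left" for brevity.
    side = "right" if lag > 0 else "left"
    # Far-field approximation: bearing measured from the microphone axis;
    # this corresponds to the asymptote direction of the hyperbola above.
    cos_theta = np.clip(path_diff / MIC_SPACING, -1.0, 1.0)
    bearing_deg = float(np.degrees(np.arccos(cos_theta)))
    return side, path_diff, bearing_deg
```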
It should be understood that the above embodiment of determining the sound source direction from the time difference is merely an example and in no way limits the present application. In other embodiments the sound source direction may be determined by other means. For example, although the two ears are close together, the blocking effect of the user's head produces a level difference between the same environmental sound received at the first microphone 201 and at the second microphone 202, and the sound source lies on the louder side. As another example, since a sound wave has different phases at different positions in space, the phase difference of the received vibrations can also be used to identify the sound source direction. As yet another example, a sound wave arriving from one side of the user diffracts around the user to reach the other side; because diffraction depends on the ratio of wavelength to obstacle size, the higher-frequency components are attenuated more for the same obstacle, so the microphones on the two sides receive different timbres. In addition, in some embodiments these cues are combined to obtain a more accurate sound source direction. Those skilled in the art should understand that these ways of determining the sound source direction are only examples and do not limit the present application; other existing or future ways of determining the sound source direction based on the first and second environmental sound information, if applicable to the present application, are also included within its scope of protection and are incorporated by reference.
In some embodiments, to improve the positioning accuracy of the sound source direction, in step S700 the audio playing device tracks the first environmental sound information and the second environmental sound information to determine the sound source orientation information. For example, the device first collects the first and second environmental sound information with the first microphone 201 and the second microphone 202 and determines first orientation information of the sound source in any of the ways above; while the user turns their head based on that first orientation information, the device keeps collecting sound from both microphones and continuously determines second orientation information of the sound source; the device then determines the sound source orientation from the first and second orientation information, e.g., as the intersection of the spatial ranges they cover, which improves the measurement accuracy of the sound source direction.
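One simple realization of this refinement, sketched under the assumption that each measurement yields an angular interval in a common reference frame (360° wraparound is ignored for brevity):

```python
from typing import Optional, Tuple

def refine_bearing(first: Tuple[float, float],
                   second: Tuple[float, float]) -> Optional[Tuple[float, float]]:
    """Intersect the angular ranges (degrees) measured before and after
    the user turns their head; the overlap is the refined source bearing.
    Returns None if the two measurements do not overlap."""
    lo, hi = max(first[0], second[0]), min(first[1], second[1])
    return (lo, hi) if lo <= hi else None
```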
In some embodiments, the method further comprises step S900 (not shown). In step S900, the audio playing device obtains its attitude information, which characterizes the angular state (attitude) of the device in space, including (but not limited to) pitch and roll; for example, the attitude information indicates whether the device is in landscape or portrait orientation. In some embodiments the attitude information is obtained from a built-in sensing device such as a gyroscope or a gravity sensor. Then, in step S800, the audio playing device presents the orientation of the sound source relative to itself based on the attitude information and the sound source orientation information, thereby indicating the sound source direction to the user, e.g., visually indicating it at the correct angle in either landscape or portrait orientation; refer to fig. 2c.
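As a hypothetical sketch of how the attitude information could be applied, the function below rotates a world-frame source bearing into the device frame by the device's yaw, so an on-screen indicator points correctly in either orientation; reducing the attitude to a single yaw angle is an assumption made for the example.

```python
def device_relative_bearing(source_bearing_deg: float,
                            device_yaw_deg: float) -> float:
    """Bearing of the sound source in the device's own frame, so an
    on-screen arrow points correctly in landscape or portrait.
    Both angles are assumed measured in the same horizontal plane."""
    return (source_bearing_deg - device_yaw_deg) % 360.0
```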
When the sound source orientation information is determined from the first and second environmental sound information, in order to avoid unnecessary disturbance and only detect whether someone is speaking to the user, or discussing a topic of interest, in a spatial direction the user cares about, in some embodiments the control trigger event includes: the sound source direction detected by the audio playing device satisfies a preset direction condition. For example, the detected sound source direction falls within a spatial range specified in advance by the user, or the approximate range of the detected direction intersects that spatial range (a minimal sketch of this check follows the list below). On this basis, the control trigger event optionally further includes any one of the following, to keep the system from triggering by mistake through oversensitivity:
-the volume of the first ambient sound information is larger than a preset first volume threshold;
-the volume of the second ambient sound information is larger than a preset second volume threshold;
-the volume of the first ambient sound information increases over time;
-the volume of the second ambient sound information increases over time;
-the difference between the volumes of the first and second ambient sound information decreases over time, e.g. the sound source gradually turns to face the user.
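A minimal check of the preset direction condition, assuming bearings expressed in degrees and a user-specified angular range without wraparound; the range shown is illustrative.

```python
def direction_trigger(bearing_deg: float,
                      preferred_range: tuple = (270.0, 360.0)) -> bool:
    """True if the estimated source bearing falls inside the spatial
    range the user specified in advance (range shown is illustrative)."""
    lo, hi = preferred_range
    return lo <= bearing_deg <= hi
```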
Of course, the sound source direction meeting a preset direction condition is not necessarily a prerequisite for the control trigger event; accordingly, in some embodiments, where the first and second environmental sound information are collected with the first microphone 201 and the second microphone 202, the control trigger event includes any of:
-the volume of the first ambient sound information is larger than a preset first volume threshold;
-the volume of the second ambient sound information is larger than a preset second volume threshold;
-the volume of the first ambient sound information increases over time;
-the volume of the second ambient sound information increases over time;
-the difference between the volumes of the first and second ambient sound information decreases over time, e.g. the sound source gradually turns to face the user.
In some embodiments, the control trigger event comprises: the environmental sound information includes predetermined keyword information, so that when someone nearby mentions a corresponding keyword (e.g., a topic the user is interested in), the user is notified in time to listen or join the discussion; accordingly, the audio playing device provides the environmental sound information, or the text corresponding to it, to the user in response to detecting that the environmental sound information includes the predetermined keyword information. In particular, if the preset keywords include the stop-announcement vocabulary used on vehicles, the user can be reminded in time that they have reached their stop. For example, when the keywords contain the user's name, the user can react quickly when someone calls or talks about them; when the keywords contain a station name the user has preset (such as a subway station), the user can get off in time; when the keywords include a stop forecast (such as the "next station" phrase in an announcement), the user can prepare to get off as the vehicle approaches the stop; and so on. In some embodiments, the audio playing device first obtains text information corresponding to the environmental sound information (e.g., it converts the speech to text locally, or sends the environmental sound information to the cloud and receives the converted text) and checks whether the text contains the preset keywords; alternatively, the device can send the environmental sound information to the cloud, have the cloud detect whether it contains the preset keywords, and receive the result. In some embodiments, the predetermined keyword information is entered by the user in advance on the device; in other embodiments it is delivered to the device by a cloud server (e.g., the device initiates a synchronization operation, or the cloud server pushes the keywords to the device).
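A minimal sketch of the keyword check on the recognized text; the keyword set is illustrative, and the speech recognition itself (local or cloud) is outside the sketch.

```python
PRESET_KEYWORDS = {"next station", "project review", "Alice"}  # illustrative

def keyword_trigger(transcript: str, keywords=PRESET_KEYWORDS) -> bool:
    """True if the recognized text of the ambient sound contains any
    preset keyword (the user's name, a stop announcement, a topic, ...)."""
    text = transcript.lower()
    return any(keyword.lower() in text for keyword in keywords)
```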
To let the user join an ongoing conversation as soon as possible, or to let the user listen carefully to details in the recorded ambient sound, in some embodiments, in step S320 the audio playing device provides the ambient sound information to the user based on an audio playback rate. For example, the device plays the ambient sound information faster (e.g., at 2x speed) to save listening time so that the user can join the discussion sooner, or slower (e.g., at 0.5x speed) to expose more sound detail so that the user misses no important information. Referring to fig. 3, the audio playing device records the ambient sound continuously, and the time t0 corresponding to the control trigger event is included in the collection time T of the corresponding ambient sound information. In other words, before the control trigger event occurs at time t0, the audio playing device has already recorded a portion of ambient sound information, and after t0 it continues recording for a further period. The content provided to the user therefore covers both a period before the control trigger event and a period after it, so the user does not miss the subsequent conversation while listening to the earlier recording or reading the text. Combined with the adjustable playback rate, this provides a continuous conversation experience: the user can smoothly join the discussion without worrying about missing details.
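The sketch below assembles the recording around the trigger time t0, the part already in the time window plus what keeps arriving afterwards, and applies a playback rate; the index-skipping resampling is a deliberate simplification that shifts pitch, whereas a real device would more likely use a time-stretching algorithm.

```python
import numpy as np

def assemble_trigger_clip(pre_trigger: np.ndarray,
                          post_trigger: np.ndarray,
                          rate: float = 2.0) -> np.ndarray:
    """Join the audio saved before t0 with the audio recorded after t0,
    then resample for faster (rate > 1) or slower (rate < 1) playback.
    Index skipping/duplication is a naive stand-in for time-stretching."""
    clip = np.concatenate([pre_trigger, post_trigger])
    indices = np.arange(0.0, len(clip), rate).astype(int)
    return clip[indices]
```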
It should be understood by those skilled in the art that the above-described embodiments are merely exemplary and not restrictive of the present application, and that other corresponding embodiments, which are currently or later become known, may be applied to the present application and are included within the scope of the present application and are incorporated herein by reference.
In this application, a microphone may have different directivities in different embodiments. For example, the microphones used to implement the present application may be omnidirectional microphones, or, to locate the sound source more precisely, unidirectional microphones, bidirectional microphones, or microphone arrays; common unidirectional microphones include cardioid and hypercardioid microphones.
Corresponding to the method, according to another aspect of the present application, the present application further provides an audio playing device. The audio playing device includes a first module 100, a second module 200, and a third module 310, referring to fig. 4a, where the first module 100, the second module 200, and the third module 310 are respectively configured to perform operations of step S100, step S200, and step S310 in the embodiment corresponding to fig. 1a, and for a specific implementation, reference is made to the related embodiments, which are not repeated herein; alternatively, the audio playing device includes a first module 100, a second module 200, and a third module 320, referring to fig. 4b, where the first module 100, the second module 200, and the third module 320 are respectively configured to perform operations of step S100, step S200, and step S320 in the embodiment corresponding to fig. 1b, and for a specific implementation, reference is made to the related embodiments, which are not repeated herein.
In some embodiments, the audio playing device further includes a fourth module 400 (not shown), where the fourth module 400 is configured to perform the step S400 in the above embodiments, and for the specific implementation, reference is made to the above related embodiments, which are not described herein again.
In some embodiments, the audio playing device further includes a fifth module 500 (not shown), where the fifth module 500 is configured to perform the step S500 in the foregoing embodiments, and for the specific implementation, reference is made to the foregoing related embodiments, which are not described herein again.
In some embodiments, the audio playing device further includes a sixth module 600 (not shown), where the sixth module 600 is configured to execute step S600 in the foregoing embodiments, and please refer to the foregoing related embodiments for detailed implementation, which is not described herein again.
In some embodiments, the audio playing device further includes a seventh module 700 (not shown), where the seventh module 700 is configured to perform step S700 in the foregoing embodiments, and for the specific implementation, reference is made to the foregoing related embodiments, which are not described herein again.
In some embodiments, the audio playing device further includes an eighth module 800 (not shown), where the eighth module 800 is configured to perform step S800 in the foregoing embodiments, and please refer to the foregoing related embodiments for detailed description, which is not repeated herein.
In some embodiments, the audio playing device further includes a ninth module 900 (not shown), where the ninth module 900 is configured to perform step S900 in the foregoing embodiments, and please refer to the foregoing related embodiments for detailed implementation, which is not described herein again.
Some specific embodiments of the present application are detailed above. It should be understood that the above embodiments are only examples and are not intended to limit the specific embodiments of the present application.
The present application also provides a computer readable storage medium having stored thereon computer code which, when executed, performs a method as in any one of the preceding.
The present application also provides a computer program product, which when executed by a computer device, performs the method of any of the preceding claims.
The present application further provides a computer device, comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.
FIG. 5 illustrates an exemplary system that can be used to implement the various embodiments described in this application.
As shown in fig. 5, in some embodiments, the system 1000 can be implemented as any one of the audio playback devices in the embodiments described herein. In some embodiments, system 1000 may include one or more computer-readable media (e.g., system memory or NVM/storage 1020) having instructions and one or more processors (e.g., processor(s) 1005) coupled with the one or more computer-readable media and configured to execute the instructions to implement modules to perform the actions described herein.
For one embodiment, system control module 1010 may include any suitable interface controllers to provide any suitable interface to at least one of the processor(s) 1005 and/or to any suitable device or component in communication with system control module 1010.
The system control module 1010 may include a memory controller module 1030 to provide an interface to the system memory 1015. Memory controller module 1030 may be a hardware module, a software module, and/or a firmware module.
System memory 1015 may be used to load and store data and/or instructions, for example, for system 1000. For one embodiment, system memory 1015 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, system memory 1015 may include double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 1010 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 1020 and communication interface(s) 1025.
For example, NVM/storage 1020 may be used to store data and/or instructions. NVM/storage 1020 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk drive(s) (HDD (s)), one or more Compact Disc (CD) drive(s), and/or one or more Digital Versatile Disc (DVD) drive (s)).
NVM/storage 1020 may include storage resources that are physically part of a device on which system 1000 is installed or may be accessed by the device and not necessarily part of the device. For example, NVM/storage 1020 may be accessed over a network via communication interface(s) 1025.
Communication interface(s) 1025 may provide an interface for system 1000 to communicate over one or more networks and/or with any other suitable device. System 1000 may communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 1005 may be packaged together with logic for one or more controller(s) of the system control module 1010, e.g., memory controller module 1030. For one embodiment, at least one of the processor(s) 1005 may be packaged together with logic for one or more controller(s) of the system control module 1010 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1005 may be integrated on the same die with logic for one or more controller(s) of the system control module 1010. For one embodiment, at least one of the processor(s) 1005 may be integrated on the same die with logic of one or more controllers of the system control module 1010 to form a system on a chip (SoC).
In various embodiments, system 1000 may be, but is not limited to being: a server, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 1000 may have more or fewer components and/or different architectures. For example, in some embodiments, system 1000 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
Additionally, some portions of the present application may be applied as a computer program product, such as computer program instructions, which, when executed by a computer, may invoke or provide the method and/or solution according to the present application through the operation of the computer. Those skilled in the art will appreciate that the form in which the computer program instructions reside on a computer-readable medium includes, but is not limited to, source files, executable files, installation package files, and the like, and that the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Computer-readable media herein can be any available computer-readable storage media or communication media that can be accessed by a computer.
Communication media includes media whereby communication signals, including, for example, computer readable instructions, data structures, program modules, or other data, are transmitted from one system to another. Communication media may include conductive transmission media such as cables and wires (e.g., fiber optics, coaxial, etc.) and wireless (non-conductive transmission) media capable of propagating energy waves such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules or other data may be embodied in a modulated data signal, such as a carrier wave or similar mechanism that is embodied in a wireless medium, such as part of spread-spectrum techniques, for example. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM); and non-volatile memory such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); and magnetic and optical storage devices (hard disk, tape, CD, DVD); or other now known media or later developed that can store computer-readable information/data for use by a computer system.
An embodiment according to the present application herein comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or solution according to embodiments of the present application as described above.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it will be obvious that the term "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (19)

1. A method for providing information, applied to an audio playing device, wherein the method comprises:
playing the current audio based on the first output level;
collecting environmental sound information, wherein the environmental sound information comprises first environmental sound information and second environmental sound information, the first environmental sound information is collected based on a first microphone, the second environmental sound information is collected based on a second microphone, and the first microphone and the second microphone are respectively arranged on earphone units on two sides of an earphone worn by a user;
detecting, based on the first environmental sound information and the second environmental sound information, whether there is a control trigger event related to the first environmental sound information and the second environmental sound information, wherein the control trigger event comprises the difference between the volumes of the first environmental sound information and the second environmental sound information decreasing over time, the decrease being used to determine that a sound source is turning to face the user, and the control trigger event being used to determine whether to provide the user with the environmental sound within a period of time or with text information corresponding to the environmental sound;
and in response to detecting a control trigger event related to the first environmental sound information and the second environmental sound information, providing text information corresponding to the environmental sound information to a user or providing the environmental sound information to the user.
2. The method of claim 1, wherein the method further comprises:
saving the environmental sound information;
the step of providing the text information corresponding to the environmental sound information to the user in response to detecting the control trigger event related to the environmental sound information includes:
providing text information corresponding to the saved environmental sound information to a user in response to detecting a control trigger event related to the environmental sound information;
the step of providing the ambient sound information to a user in response to detecting a control trigger event with respect to the ambient sound information, comprises:
providing the saved ambient sound information to a user in response to detecting a control trigger event with respect to the ambient sound information.
3. The method of claim 2, wherein the step of saving the ambient sound information comprises:
and saving the environmental sound information based on a preset time window.
4. The method of claim 1, wherein the method further comprises:
playing the current audio based on a second output level, wherein a volume of playing the current audio based on the second output level is different from a volume of playing the current audio based on the first output level; or,
pausing the playing of the current audio.
5. The method of claim 4, wherein the method further comprises:
in response to detecting a recovery trigger event, playing the current audio based on the first output level.
6. The method of claim 5, wherein the recovery trigger event comprises any one of:
the user executes the level recovery operation;
the user does not respond within a preset time length.
7. The method according to any one of claims 1 to 6, wherein the step of providing text information corresponding to the environmental sound information to the user in response to detecting a control trigger event related to the environmental sound information comprises:
in response to detecting a control trigger event related to the environmental sound information, providing text viewing prompt information and providing the text information corresponding to the environmental sound information to the user.
8. The method of any of claims 1-6, wherein the control trigger event comprises at least any one of:
the volume of the environmental sound information is greater than a preset volume threshold;
the volume of the environmental sound information increases over time;
the sound attribute of the environmental sound information meets a preset attribute condition.
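The three alternative conditions of claim 8 are simple per-frame tests. A hypothetical sketch, with assumed thresholds, reading the unspecified "sound attribute" condition as a voiceprint match:

VOLUME_THRESHOLD_DB = 65.0   # assumed preset volume threshold

def exceeds_threshold(volume_db: float) -> bool:
    return volume_db > VOLUME_THRESHOLD_DB

def volume_increasing(history_db: list, min_rise_db: float = 3.0) -> bool:
    # "Increases over time": the newest frame is clearly louder than the oldest.
    return len(history_db) >= 2 and history_db[-1] - history_db[0] >= min_rise_db

def attribute_matches(voiceprint_id: str, known_voiceprints: set) -> bool:
    # One possible attribute condition: the speaker's voiceprint (extracted
    # upstream, not shown) matches a stored profile such as a teammate's.
    return voiceprint_id in known_voiceprints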
9. The method of claim 1, wherein the method further comprises:
determining sound source orientation information based on the first environmental sound information and the second environmental sound information;
and indicating the orientation of the sound source to the user based on the sound source orientation information.
10. The method of claim 9, wherein the step of determining the sound source orientation information based on the first environmental sound information and the second environmental sound information comprises:
tracking the first environmental sound information and the second environmental sound information to determine the sound source orientation information.
11. The method of claim 9, wherein, before the step of indicating the orientation of the sound source to the user based on the sound source orientation information, the method further comprises:
acquiring attitude information of the audio playing device;
and the step of indicating the orientation of the sound source to the user based on the sound source orientation information comprises:
presenting the orientation of the sound source relative to the audio playing device based on the attitude information and the sound source orientation information, thereby indicating the orientation of the sound source to the user.
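Claims 9 to 11 can be pictured with a deliberately crude model, sketched below: estimate a bearing from the inter-microphone level difference with a linear mapping (the patent does not specify its localization method), then use the device's yaw from the attitude information to re-express that bearing in the playback device's frame. All constants and names are assumptions.

MAX_ILD_DB = 20.0   # assumed level gap for a source fully to one side

def bearing_from_level_difference(left_db: float, right_db: float) -> float:
    """Coarse azimuth in degrees: 0 = straight ahead, +90 = fully right."""
    ild = max(-MAX_ILD_DB, min(MAX_ILD_DB, right_db - left_db))
    return 90.0 * ild / MAX_ILD_DB

def bearing_relative_to_device(source_bearing_deg: float,
                               device_yaw_deg: float) -> float:
    # Claim 11: attitude information converts the source bearing (assumed
    # expressed in a shared reference frame) into the device's own frame,
    # wrapped to [-180, 180).
    return (source_bearing_deg - device_yaw_deg + 180.0) % 360.0 - 180.0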
12. The method of claim 9, wherein the control trigger event comprises:
the sound source orientation information meets a preset orientation condition.
13. The method of claim 12, wherein the control trigger event further comprises any one of:
the volume of the first environmental sound information is greater than a preset first volume threshold;
the volume of the second environmental sound information is greater than a preset second volume threshold;
the volume of the first environmental sound information increases over time;
the volume of the second environmental sound information increases over time;
the difference between the volumes of the first environmental sound information and the second environmental sound information decreases over time.
14. The method of claim 1, wherein the control trigger event further comprises any one of:
the volume of the first environmental sound information is greater than a preset first volume threshold;
the volume of the second environmental sound information is greater than a preset second volume threshold.
15. The method of claim 1, wherein the control trigger event comprises:
the environmental sound information includes predetermined keyword information.
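A hypothetical reading of claim 15: the saved environmental sound is transcribed (speech-to-text not shown) and the transcript is scanned for preset keywords. The keyword list and matching rule below are illustrative.

PRESET_KEYWORDS = {"standup", "deadline", "release"}

def contains_keyword(transcript: str) -> bool:
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    return not PRESET_KEYWORDS.isdisjoint(words)

# e.g. contains_keyword("Has the release date moved?") -> True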
16. The method of claim 1, wherein the step of providing the environmental sound information to the user in response to detecting a control trigger event related to the environmental sound information comprises:
in response to detecting a control trigger event related to the environmental sound information, providing the environmental sound information to the user based on an audio playback rate.
17. The method of claim 16, wherein the time corresponding to the control trigger event falls within the collection period of the environmental sound information.
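Claims 16 and 17 pair a replay rate with the requirement that the trigger moment lie inside the collection window of the saved sound. A minimal sketch with hypothetical names, assuming the frames come from the ring buffer sketched earlier:

def replay(frames: list, frame_seconds: float, window_start: float,
           trigger_time: float, rate: float = 1.5):
    """Yield (playback_offset_seconds, frame) for a sped-up replay,
    after checking that the trigger time falls within the window."""
    window_end = window_start + len(frames) * frame_seconds
    if not (window_start <= trigger_time <= window_end):
        raise ValueError("trigger event outside the collected window")
    for i, frame in enumerate(frames):
        yield (i * frame_seconds / rate, frame)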
18. An audio playback apparatus, wherein the apparatus comprises:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform operations according to the method of any one of claims 1 to 17.
19. A computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform the operations of the method according to any one of claims 1 to 17.
CN201910865114.5A 2019-09-12 2019-09-12 Audio playing device and method for providing information Active CN110691300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910865114.5A CN110691300B (en) 2019-09-12 2019-09-12 Audio playing device and method for providing information

Publications (2)

Publication Number Publication Date
CN110691300A CN110691300A (en) 2020-01-14
CN110691300B (en) 2022-07-19

Family

ID=69109057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910865114.5A Active CN110691300B (en) 2019-09-12 2019-09-12 Audio playing device and method for providing information

Country Status (1)

Country Link
CN (1) CN110691300B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111326159B * 2020-03-10 2023-07-25 Suning Cloud Computing Co., Ltd. Voice recognition method, device and system
CN111800700B * 2020-07-23 2022-04-22 Jiangsu Zimi Electronic Technology Co., Ltd. Method and device for prompting object in environment, earphone equipment and storage medium
CN112562671A * 2020-12-10 2021-03-26 Shanghai Leiang Cloud Intelligent Technology Co., Ltd. Voice control method and device for service robot
CN113194383A * 2021-04-29 2021-07-30 Goertek Technology Co., Ltd. Sound playing method and device, electronic equipment and readable storage medium
CN114708866A * 2022-02-24 2022-07-05 Weifang Goertek Electronics Co., Ltd. Head-mounted display device control method and device, head-mounted display device and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103946733A * 2011-11-14 2014-07-23 Google Inc. Displaying sound indications on a wearable computing system
CN105103457A * 2013-03-28 2015-11-25 Samsung Electronics Co., Ltd. Portable terminal, hearing aid, and method of indicating positions of sound sources in the portable terminal
CN105810219A * 2016-03-11 2016-07-27 Yulong Computer Telecommunication Scientific (Shenzhen) Co., Ltd. Multimedia file playing method and playing system, and audio terminal
CN106205628A * 2015-05-06 2016-12-07 Xiaomi Technology Co., Ltd. Acoustic signal optimization method and device
CN107105367A * 2017-05-24 2017-08-29 Vivo Mobile Communication Co., Ltd. Acoustic signal processing method and terminal
CN108206976A * 2018-01-12 2018-06-26 Hefei Lingxi Intelligent Technology Co., Ltd. Method for selectively playing a voice signal and user terminal
CN108391206A * 2018-03-30 2018-08-10 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Signal processing method, device, terminal, earphone and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180150276A1 (en) * 2016-11-29 2018-05-31 Spotify Ab System and method for enabling communication of ambient sound as an audio stream
CN107122161B * 2017-04-27 2019-12-27 Vivo Mobile Communication Co., Ltd. Audio data playing control method and terminal
CN107318067A * 2017-05-24 2017-11-03 Guangdong Genius Technology Co., Ltd. Audio directional playing method and device of terminal equipment
CN107799117A * 2017-10-18 2018-03-13 Zhuoyun Technology (Shenzhen) Co., Ltd. Method and apparatus for identifying key information to control audio output, and audio device
CN108521621B * 2018-03-30 2020-01-10 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Signal processing method, device, terminal, earphone and readable storage medium

Similar Documents

Publication Publication Date Title
CN110691300B (en) Audio playing device and method for providing information
US10923129B2 (en) Method for processing signals, terminal device, and non-transitory readable storage medium
US10817251B2 (en) Dynamic capability demonstration in wearable audio device
US9007871B2 (en) Passive proximity detection
US9620116B2 (en) Performing automated voice operations based on sensor data reflecting sound vibration conditions and motion conditions
US11482237B2 (en) Method and terminal for reconstructing speech signal, and computer storage medium
CN111696570B (en) Voice signal processing method, device, equipment and storage medium
WO2019033987A1 (en) Prompting method and apparatus, storage medium, and terminal
US10922044B2 (en) Wearable audio device capability demonstration
EP2693721B1 (en) Audio output apparatus
CN107122161B (en) Audio data playing control method and terminal
US9806795B2 (en) Automated earpiece cache management
US20110136479A1 (en) Mobile terminal and method of controlling the same
CN110708630B (en) Method, device and equipment for controlling earphone and storage medium
KR102133004B1 (en) Method and device that automatically adjust the volume depending on the situation
WO2022111579A1 (en) Voice wakeup method and electronic device
JP2023542968A (en) Hearing enhancement and wearable systems with localized feedback
CN110719545B (en) Audio playing device and method for playing audio
CN111984222A (en) Method and device for adjusting volume, electronic equipment and readable storage medium
CN106302974B (en) information processing method and electronic equipment
US10937445B2 (en) Providing alerts for events
CN108769364A (en) Call control method, device, mobile terminal and computer-readable medium
CN106506803A (en) Control method of mobile equipment and mobile equipment
WO2019183904A1 (en) Method for automatically identifying different human voices in audio
CN109144462A (en) Sounding control method, device, electronic device and computer-readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant