CN107799125A - Speech recognition method, mobile terminal and computer-readable storage medium - Google Patents

Info

Publication number
CN107799125A
Authority
CN
China
Prior art keywords
information
lip
voice communication
noise
mobile terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711097133.5A
Other languages
Chinese (zh)
Inventor
刘康飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN201711097133.5A
Publication of CN107799125A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a speech recognition method, a mobile terminal and a computer-readable storage medium. The method includes: during voice communication, detecting whether the noise intensity in the current environment meets a predetermined condition; if the noise intensity meets the predetermined condition, capturing lip feature images and recognizing lip-reading information from the lip feature images; and converting the lip-reading information into voice information and/or text information and sending it to the receiving end of the voice communication. The invention can ensure that, in a noisy environment, the receiving end of the voice communication receives accurate voice information and/or text information, thereby improving communication quality.

Description

A speech recognition method, mobile terminal and computer-readable storage medium
Technical field
The present invention relates to the field of communication technology, and in particular to a speech recognition method, a mobile terminal and a computer-readable storage medium.
Background
With the rapid development of mobile terminals, users demand more and more functions from them. A mobile terminal can carry out voice communication, such as voice calls, video calls and sending voice messages, to exchange information between people or between a person and a machine. The mobile terminal sends the voice information that its microphone captures while the user speaks to the receiving end of the voice communication, thereby realizing the voice communication.
However, when a mobile terminal carries out voice communication in a noisy environment, interference from the noise degrades the communication quality.
Summary of the invention
The present invention provides a speech recognition method, a mobile terminal and a computer-readable storage medium, to solve the prior-art problem that communication quality declines when a mobile terminal is in a noisy environment.
In a first aspect, an embodiment of the present invention provides a speech recognition method applied to a mobile terminal, the method including:
during voice communication, detecting whether the noise intensity in the current environment meets a predetermined condition;
if the noise intensity meets the predetermined condition, capturing lip feature images and recognizing lip-reading information from the lip feature images;
converting the lip-reading information into voice information and/or text information, and sending it to the receiving end of the voice communication.
In a second aspect, an embodiment of the present invention further provides a mobile terminal, the mobile terminal including:
a detection module, configured to detect, during voice communication, whether the noise intensity in the current environment meets a predetermined condition;
an identification module, configured to capture lip feature images if the noise intensity meets the predetermined condition, and to recognize lip-reading information from the lip feature images;
a sending module, configured to convert the lip-reading information into voice information and/or text information and send it to the receiving end of the voice communication.
In a third aspect, an embodiment of the present invention further provides a mobile terminal including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the speech recognition method described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the speech recognition method described above.
The embodiments of the present invention can ensure that, in a noisy environment, the receiving end of the voice communication receives accurate voice information and/or text information, thereby improving communication quality.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a first flowchart of the speech recognition method according to an embodiment of the present invention;
Fig. 2 is a second flowchart of the speech recognition method according to an embodiment of the present invention;
Fig. 3 is a third flowchart of the speech recognition method according to an embodiment of the present invention;
Fig. 4 is a first block diagram of the mobile terminal according to an embodiment of the present invention;
Fig. 5 is a second block diagram of the mobile terminal according to an embodiment of the present invention;
Fig. 6 is a hardware structure diagram of the mobile terminal according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, an embodiment of the present invention provides a speech recognition method applied to a mobile terminal, the method including:
Step 11: during voice communication, detect whether the noise intensity in the current environment meets a predetermined condition.
In this embodiment, when the user carries out voice communication using the mobile terminal, whether the noise intensity in the current environment meets the predetermined condition may be detected in either of two ways. The detected loudness value of the noise in the current environment may be taken as the noise intensity; if that loudness value is greater than a first preset threshold, the noise intensity is determined to meet the predetermined condition. Alternatively, the loudness difference between a first loudness value, corresponding to the noise in the current environment, and a second loudness value, corresponding to the voice the user inputs during the voice communication, may be detected; if the loudness difference is less than a second preset threshold, the noise intensity in the current environment is determined to meet the predetermined condition. The loudness values may be measured in decibels.
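The two alternative checks described for step 11 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of an RMS level in dB relative to full scale, the sign convention of the loudness difference (speech minus noise) and the example threshold values are all assumptions introduced here.

```python
import math
from typing import Sequence


def level_db(samples: Sequence[float], eps: float = 1e-12) -> float:
    """RMS level in dB relative to full scale (assumes samples in [-1.0, 1.0])."""
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, eps))


def exceeds_first_threshold(noise_db: float, first_threshold_db: float) -> bool:
    """First check: the noise loudness alone is above the first preset threshold."""
    return noise_db > first_threshold_db


def below_second_threshold(noise_db: float, speech_db: float,
                           second_threshold_db: float) -> bool:
    """Second check: the speech is not sufficiently louder than the noise.

    The difference is taken as speech minus noise, which is an assumption;
    the source only says the loudness difference is less than a threshold.
    """
    return (speech_db - noise_db) < second_threshold_db


# Hypothetical values: noise at -10 dB, speech at -5 dB.
print(exceeds_first_threshold(-10.0, first_threshold_db=-20.0))
print(below_second_threshold(-10.0, -5.0, second_threshold_db=10.0))
```

Either check alone is enough to decide that the predetermined condition is met; the source presents them as alternatives rather than as a combined test.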
The modes of voice communication include voice calls, video calls, sending voice SMS messages, and sending voice messages through an instant messaging application. It should be noted that voice communication may also take other forms, which the present invention does not limit.
Step 12: if the noise intensity meets the predetermined condition, capture lip feature images and recognize lip-reading information from the lip feature images.
In this embodiment, if the noise intensity meets the predetermined condition, the noise in the current environment is affecting the communication quality of the voice communication. The mobile terminal therefore switches to lip-reading recognition mode, captures lip feature images of the user who is carrying out the voice communication with the mobile terminal, and recognizes from those images the lip-reading information produced while the user inputs voice information. This ensures that the sending end of the voice communication can capture the user's lip-reading information as the input of the voice communication, keeps ambient noise from affecting the voice communication, and thereby safeguards its communication quality.
Step 13: convert the lip-reading information into voice information and/or text information, and send it to the receiving end of the voice communication.
In this embodiment, the lip-reading information recognized from the user's lip feature images is sent to the receiving end of the voice communication as voice and/or text. For example, if the user is making a voice call with the mobile terminal, the recognized lip-reading information is converted into voice information and sent to the receiving end, so that the receiving end receives clear voice information and communication quality improves. If the user is sending a voice message through an instant messaging application on the mobile terminal, the recognized lip-reading information is converted into voice information, or into text information, and sent to the receiving end. This avoids the loss of input efficiency that occurs when ambient noise forces the user to switch from voice input to text input while sending a voice message, which improves sending efficiency, ensures that the receiving end receives clear voice information or the corresponding text information, and in turn improves communication quality.
In the above solution, when the user carries out voice communication using the mobile terminal, the noise intensity in the current environment is detected. When the noise intensity is found to meet the predetermined condition, lip feature images are captured, lip-reading information is recognized from them, and the lip-reading information is converted into voice information and/or text information and sent to the receiving end of the voice communication, so that ambient noise does not affect the voice communication. When the mobile terminal carries out voice communication in a noisy environment, this solution ensures that the receiving end receives clear voice information and/or the text information corresponding to that voice information, which improves the quality of the voice communication and benefits the user experience.
In addition, when the noise intensity is found to meet the predetermined condition, this solution switches directly to lip-reading recognition mode and recognizes the user's lip movements during voice input, which avoids the loss of communication efficiency that manually switching the recognition mode would cause.
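The overall flow of steps 11 to 13 can be sketched as follows. The sketch makes several assumptions that are not in the source: lip-reading information is represented as a plain string, the capture, recognition and speech-synthesis stages are passed in as hypothetical callables, and the microphone audio is forwarded unchanged when the predetermined condition is not met.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class Outgoing:
    kind: str      # "mic_audio", "speech" or "text" (labels are illustrative)
    payload: Any


def process_turn(
    noise_condition_met: bool,
    mic_audio: Any,
    capture_lip_images: Callable[[], Sequence[Any]],
    recognize_lip_reading: Callable[[Sequence[Any]], str],
    synthesize_speech: Callable[[str], Any],
    want_speech: bool = True,
    want_text: bool = False,
) -> List[Outgoing]:
    """Return what would be sent to the receiving end for one input turn."""
    if not noise_condition_met:
        # Assumed fallback: the ordinary microphone path.
        return [Outgoing("mic_audio", mic_audio)]

    # Step 12: capture lip feature images and recognize lip-reading information.
    lip_info = recognize_lip_reading(capture_lip_images())

    # Step 13: convert to voice information and/or text information.
    out: List[Outgoing] = []
    if want_speech:
        out.append(Outgoing("speech", synthesize_speech(lip_info)))
    if want_text:
        out.append(Outgoing("text", lip_info))
    return out
```

For a voice call one would presumably request only the speech output, while for an instant-messaging voice message either or both outputs could be requested, in line with the examples given for step 13.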
Referring to Fig. 2, an embodiment of the present invention further provides a speech recognition method applied to a mobile terminal, the method including:
Step 21: during voice communication, obtain the audio information captured by the microphone of the mobile terminal.
In this embodiment, if the mobile terminal is in a noisy environment, the audio information captured by the microphone includes the voice information produced while the user inputs voice, as well as noise information, that is, everything in the mobile terminal's current environment other than that voice information.
Step 22: parse the audio information and extract the noise information from it.
In this embodiment, the audio information is parsed, for example by distinguishing the voice information from the noise information according to how the sound wave of the audio information varies, and the noise information is then extracted from the audio information.
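The source does not specify how the waveform variation is used to separate voice from noise. As one possible stand-in, the sketch below splits the audio into fixed-length frames and treats frames whose energy is well below the loudest frame as noise. The frame length, the energy ratio and the function names are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def extract_noise(samples: Sequence[float], frame_len: int = 4,
                  ratio: float = 0.25) -> List[float]:
    """Collect samples from low-energy frames, assumed here to be noise only.

    A frame counts as noise when its energy is at most `ratio` times the
    energy of the loudest frame (a simple heuristic, not the patent's method).
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    cutoff = ratio * max(energies)
    noise: List[float] = []
    for idx, energy in enumerate(energies):
        if energy <= cutoff:
            start = idx * frame_len
            noise.extend(samples[start:start + frame_len])
    return noise


# Quiet frame, loud (speech-like) frame, quiet frame.
audio = [0.01] * 4 + [0.8] * 4 + [0.02] * 4
print(extract_noise(audio))
```

A production system would more likely use a proper voice activity detector; the point here is only that the extracted noise segment is what the later threshold check operates on.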
Step 23: detect whether the noise intensity corresponding to the noise information meets the predetermined condition.
Specifically, this step includes: detecting whether the noise intensity corresponding to the noise information is greater than a preset threshold; and, if it is, determining that the noise intensity meets the predetermined condition. For example, the maximum amplitude in the waveform of the noise information may be used to judge whether the noise intensity meets the predetermined condition: if that amplitude is greater than a preset amplitude threshold, the noise intensity is determined to meet the predetermined condition. To keep ambient noise from affecting communication quality, the mobile terminal then switches to lip-reading recognition mode while voice communication is carried out in the noisy environment and recognizes the lip-reading information produced while the user inputs voice information, which ensures that the receiving end of the voice communication receives clear voice information and improves the quality of the voice communication.
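The amplitude-based example for step 23 can be sketched as follows. The function names and the threshold value are hypothetical; the source only states that the maximum amplitude of the noise waveform is compared with a preset amplitude threshold.

```python
from typing import Sequence


def peak_amplitude(noise: Sequence[float]) -> float:
    """Largest absolute sample value in the noise waveform (0.0 if empty)."""
    return max((abs(s) for s in noise), default=0.0)


def noise_meets_condition(noise: Sequence[float],
                          amplitude_threshold: float) -> bool:
    """True when the peak noise amplitude exceeds the preset threshold."""
    return peak_amplitude(noise) > amplitude_threshold


# Hypothetical threshold of 0.1 on samples normalized to [-1.0, 1.0].
print(noise_meets_condition([0.01, -0.3, 0.2], amplitude_threshold=0.1))
```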
Step 24: if the noise intensity meets the predetermined condition, capture lip feature images and recognize lip-reading information from the lip feature images.
In this embodiment, if the noise intensity meets the predetermined condition, the noise in the current environment is affecting the communication quality of the voice communication. The mobile terminal therefore switches to lip-reading recognition mode, captures lip feature images of the user who is carrying out the voice communication with the mobile terminal, and recognizes from those images the lip-reading information produced while the user inputs voice information. This keeps ambient noise from affecting the voice communication and thereby safeguards its communication quality.
Step 25: convert the lip-reading information into voice information and/or text information, and send it to the receiving end of the voice communication.
In the above solution, when the user carries out voice communication using the mobile terminal, the audio information captured by the microphone of the mobile terminal is obtained and the noise information is extracted from it. The noise intensity of the noise information is detected; when it is found to meet the predetermined condition, lip feature images are captured, lip-reading information is recognized from them, and the lip-reading information is converted into voice information and/or text information and sent to the receiving end of the voice communication, so that ambient noise does not affect the voice communication. When the mobile terminal carries out voice communication in a noisy environment, this solution ensures that the receiving end receives clear voice information and/or the text information corresponding to that voice information, which improves the quality of the voice communication and benefits the user experience.
Referring to Fig. 3, an embodiment of the present invention further provides a speech recognition method applied to a mobile terminal, the method including:
Step 31: during voice communication, detect whether the noise intensity in the current environment meets a predetermined condition.
In this embodiment, when the user carries out voice communication using the mobile terminal, whether the noise intensity in the current environment meets the predetermined condition may be detected in either of two ways. The detected loudness value of the noise in the current environment may be taken as the noise intensity; if that loudness value is greater than a first preset threshold, the noise intensity is determined to meet the predetermined condition. Alternatively, the loudness difference between a first loudness value, corresponding to the noise in the current environment, and a second loudness value, corresponding to the voice the user inputs during the voice communication, may be detected; if the loudness difference is less than a second preset threshold, the noise intensity in the current environment is determined to meet the predetermined condition. The loudness values may be measured in decibels.
The modes of voice communication include voice calls, video calls, sending voice SMS messages, and sending voice messages through an instant messaging application. It should be noted that voice communication may also take other forms, which the present invention does not limit.
Step 32: if the noise intensity meets the predetermined condition, prompt the user to choose whether to carry out the voice communication in lip-reading recognition mode.
In this embodiment, if the noise intensity meets the predetermined condition, the noise in the current environment is affecting the communication quality of the voice communication, so the user is prompted to choose whether to carry out the voice communication in lip-reading recognition mode. Specifically, the prompt may be given by displaying a prompt box containing the prompt information, or by playing the prompt information as a voice announcement.
Step 33: if a trigger instruction is received by which the user confirms the use of lip-reading recognition mode for the voice communication, capture lip feature images and recognize lip-reading information from the lip feature images.
In this embodiment, in response to the trigger instruction by which the user confirms the use of lip-reading recognition mode, the mobile terminal switches to lip-reading recognition mode and captures lip feature images of the user, so that the lip-reading information input by the user during the voice communication can be recognized from those images. Asking for confirmation avoids disrupting the voice communication when the user's surroundings make lip-reading recognition mode inconvenient, which safeguards both the communication quality and the success rate of the voice communication.
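The confirmation gate of steps 32 and 33 can be sketched as follows. The `ask_user` callable is a hypothetical stand-in for the prompt box or voice announcement, and the prompt wording is invented for illustration.

```python
from typing import Callable

PROMPT = "Ambient noise is high. Use lip-reading mode for this call?"


def should_enter_lip_reading_mode(noise_condition_met: bool,
                                  ask_user: Callable[[str], bool]) -> bool:
    """Switch to lip-reading mode only when noise is high AND the user agrees.

    The user is not prompted at all when the noise condition is not met.
    """
    if not noise_condition_met:
        return False
    return ask_user(PROMPT)


# Example with a stub that always confirms.
print(should_enter_lip_reading_mode(True, lambda message: True))
```

Keeping the prompt behind the noise check means the user is interrupted only when switching modes could actually help.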
Step 34: convert the lip-reading information into voice information and/or text information, and send it to the receiving end of the voice communication.
In the above solution, when the user carries out voice communication using the mobile terminal, the noise intensity in the current environment is detected, and when it is found to meet the predetermined condition, the user is prompted to choose whether to carry out the voice communication in lip-reading recognition mode. If a trigger instruction is received by which the user confirms the use of lip-reading recognition mode, lip feature images are captured, the lip-reading information input by the user during the voice communication is recognized from them, and the lip-reading information is converted into voice information and/or text information and sent to the receiving end of the voice communication, so that ambient noise does not affect the voice communication. When the mobile terminal carries out voice communication in a noisy environment, this solution ensures that the receiving end receives clear voice information and/or the text information corresponding to that voice information, which improves the quality of the voice communication and benefits the user experience.
In addition, because this solution switches the mobile terminal to lip-reading recognition mode only after the user's trigger instruction confirms it, it avoids disrupting the voice communication when the user's surroundings make lip-reading recognition mode inconvenient, which safeguards both the communication quality and the success rate of the voice communication.
Further, the step of capturing lip feature images and recognizing lip-reading information from the lip feature images includes: starting the camera of the mobile terminal; obtaining the lip feature images, captured by the camera, of the user carrying out the voice communication; and recognizing, from the lip feature images, the lip-reading information produced while the user carries out the voice communication.
Specifically, the camera of the mobile terminal is started, the image information captured by the camera is obtained, and it is detected whether the image information contains a lip feature image of the user. If it does, the condition for the mobile terminal to perform lip-reading recognition is met, and the lip-reading information produced while the user carries out the voice communication is recognized from the captured lip feature images, so that ambient noise does not affect the voice communication.
If the image information contains no lip feature image of the user, or only a partial one, the mobile terminal cannot recognize the user's lip-reading information from the captured images. In that case the user is prompted to adjust the capture angle of the camera so that a complete lip feature image can be captured and the lip-reading information produced during the voice communication can be recognized, which keeps ambient noise from affecting the voice communication. The prompt to adjust the capture angle may be given through a prompt box, through a voice announcement, or in some other way, which the present invention does not limit.
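The capture-and-check loop described above can be sketched as follows. The three coverage states, the retry limit and the callables for capturing a frame, classifying lip coverage and prompting the user are all assumptions introduced for illustration; the source does not say how lip coverage is detected or how many times the user is prompted.

```python
from enum import Enum
from typing import Callable, Optional, TypeVar

Frame = TypeVar("Frame")

ADJUST_PROMPT = "Please adjust the camera angle so that your lips are fully visible."


class LipCoverage(Enum):
    NONE = "none"        # no lip feature image in the frame
    PARTIAL = "partial"  # only part of the lips is visible
    FULL = "full"        # complete lip feature image


def acquire_lip_frame(
    capture_frame: Callable[[], Frame],
    classify: Callable[[Frame], LipCoverage],
    prompt_user: Callable[[str], None],
    max_attempts: int = 3,
) -> Optional[Frame]:
    """Return a frame with a complete lip image, prompting the user otherwise.

    Returns None if no usable frame is obtained within `max_attempts`
    (the retry limit is an assumption, not part of the source).
    """
    for _ in range(max_attempts):
        frame = capture_frame()
        if classify(frame) is LipCoverage.FULL:
            return frame
        prompt_user(ADJUST_PROMPT)
    return None
```

Only a frame classified as `FULL` is passed on to lip-reading recognition; missing and partial lip images are handled the same way, matching the description above.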
It should be noted that the step described above, of capturing lip feature images and recognizing lip-reading information from the lip feature images, applies to each of the foregoing embodiments.
Referring to Fig. 4 and Fig. 5, an embodiment of the present invention further provides a mobile terminal. The mobile terminal 400 includes:
a detection module 410, configured to detect, during voice communication, whether the noise intensity in the current environment meets a predetermined condition;
an identification module 420, configured to capture lip feature images if the noise intensity meets the predetermined condition, and to recognize lip-reading information from the lip feature images;
a sending module 430, configured to convert the lip-reading information into voice information and/or text information and send it to the receiving end of the voice communication.
The detection module 410 includes:
an acquisition submodule 411, configured to obtain, during voice communication, the audio information captured by the microphone of the mobile terminal;
an extraction submodule 412, configured to parse the audio information and extract the noise information from it;
a detection submodule 413, configured to detect whether the noise intensity corresponding to the noise information meets the predetermined condition.
The detection submodule 413 includes:
a detection unit 4131, configured to detect whether the noise intensity corresponding to the noise information is greater than a preset threshold;
a processing unit 4132, configured to determine, if the noise intensity corresponding to the noise information is greater than the preset threshold, that the noise intensity corresponding to the noise information meets the predetermined condition.
The identification module 420 includes:
a prompting submodule 421, configured to prompt the user, if the noise intensity meets the predetermined condition, to choose whether to carry out the voice communication in lip-reading recognition mode;
an identification submodule 422, configured to capture lip feature images, if a trigger instruction is received by which the user confirms the use of lip-reading recognition mode for the voice communication, and to recognize lip-reading information from the lip feature images.
The identification submodule 422 includes:
a starting unit 4221, configured to start the camera of the mobile terminal;
an obtaining unit 4222, configured to obtain the lip feature images, captured by the camera, of the user carrying out the voice communication;
a recognition unit 4223, configured to recognize, from the lip feature images, the lip-reading information produced while the user carries out the voice communication.
The mobile terminal provided by this embodiment of the present invention can implement each process implemented by the mobile terminal in the method embodiments of Fig. 1 to Fig. 3; to avoid repetition, the details are not repeated here.
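The module hierarchy of Fig. 4 and Fig. 5 can be mirrored in code roughly as follows. The class names follow the module names in the text, with reference numerals in comments; the callable fields, the peak-amplitude intensity measure and the `on_voice_input` method are assumptions made so that the sketch is self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class DetectionModule:  # 410
    acquire_audio: Callable[[], Sequence[float]]                  # 411
    extract_noise: Callable[[Sequence[float]], Sequence[float]]   # 412
    threshold: float                                              # used by 413

    def noise_meets_condition(self) -> bool:                      # 4131 / 4132
        noise = self.extract_noise(self.acquire_audio())
        intensity = max((abs(s) for s in noise), default=0.0)
        return intensity > self.threshold


@dataclass
class IdentificationModule:  # 420
    confirm_with_user: Callable[[], bool]          # 421
    start_camera: Callable[[], None]               # 4221
    get_lip_images: Callable[[], Sequence[Any]]    # 4222
    recognize: Callable[[Sequence[Any]], str]      # 4223

    def identify(self) -> Optional[str]:
        if not self.confirm_with_user():
            return None
        self.start_camera()
        return self.recognize(self.get_lip_images())


@dataclass
class SendingModule:  # 430
    convert: Callable[[str], Any]
    send: Callable[[Any], None]

    def dispatch(self, lip_info: str) -> None:
        self.send(self.convert(lip_info))


@dataclass
class MobileTerminal:  # 400
    detection: DetectionModule
    identification: IdentificationModule
    sending: SendingModule

    def on_voice_input(self) -> bool:
        """Return True if lip-reading output was sent to the receiving end."""
        if not self.detection.noise_meets_condition():
            return False
        lip_info = self.identification.identify()
        if lip_info is None:
            return False
        self.sending.dispatch(lip_info)
        return True
```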
Mobile terminal 400 in such scheme, when user carries out voice communication using mobile terminal, by current Noise intensity in environment is detected, and when detecting that noise intensity meets predetermined condition, gathers lip characteristic image, and Lip reading information is identified according to the lip characteristic image, and the lip reading information is converted into voice messaging and/or text information Send to the receiving terminal of voice communication, avoid the influence of noise voice communication in environment.The program is in noisy in mobile terminal Environment in carry out voice communication when, ensure voice communication receiving terminal can receive clearly voice messaging and/or with institute Text information corresponding to voice messaging is stated, so as to improve the quality of voice communication, and then is advantageous to be lifted the experience effect of user.
Fig. 6 is a kind of hardware architecture diagram for the mobile terminal for realizing each embodiment of the present invention.
The mobile terminal 600 includes but is not limited to:It is radio frequency unit 601, mixed-media network modules mixed-media 602, audio output unit 603, defeated Enter unit 604, sensor 605, display unit 606, user input unit 607, interface unit 608, memory 609, processor The part such as 610 and power supply 611.It will be understood by those skilled in the art that the mobile terminal structure shown in Fig. 6 is not formed Restriction to mobile terminal, mobile terminal can be included than illustrating more or less parts, either combine some parts or Different part arrangements.In embodiments of the present invention, mobile terminal include but is not limited to mobile phone, tablet personal computer, notebook computer, Palm PC, car-mounted terminal, wearable device and pedometer etc..
Wherein, processor 610, in voice communication course, whether the noise intensity detected in current environment to meet Predetermined condition;If the noise intensity meets predetermined condition, lip characteristic image is gathered, and according to the lip characteristic image Identify lip reading information;The lip reading information is converted into voice messaging and/or text information, and sent to the reception of voice communication End.
With the mobile terminal 600 of the above scheme, when a user conducts voice communication using the mobile terminal, the noise intensity in the current environment is detected. When the noise intensity is found to meet a predetermined condition, a lip feature image is captured, lip reading information is recognized from the lip feature image, and the lip reading information is converted into voice information and/or text information and sent to the receiving end of the voice communication, so that noise in the environment does not affect the voice communication. When the mobile terminal is used for voice communication in a noisy environment, this scheme ensures that the receiving end of the voice communication can receive clear voice information and/or text information corresponding to the voice information, which improves the quality of voice communication and in turn the user experience.
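The three steps carried out by the processor 610 can be sketched as a single function. This is only an illustrative outline under stated assumptions: the names `handle_call_frame` and `OutgoingMessage`, the decibel inputs, and the three callables standing in for the camera, the lip reading recognizer, and the speech synthesizer are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class OutgoingMessage:
    """What is sent to the receiving end of the voice communication."""
    text: Optional[str] = None      # text information, if requested
    speech: Optional[bytes] = None  # synthesized voice information, if requested


def handle_call_frame(
    noise_level_db: float,
    threshold_db: float,
    capture_lip_images: Callable[[], List[bytes]],        # hypothetical camera hook
    recognize_lip_reading: Callable[[List[bytes]], str],  # hypothetical recognizer
    synthesize_speech: Callable[[str], bytes],            # hypothetical synthesizer
    send_text: bool = True,
    send_speech: bool = True,
) -> Optional[OutgoingMessage]:
    """Return a message built from lip reading, or None if the noise is acceptable."""
    # Step 1: check whether the noise intensity meets the predetermined condition.
    if noise_level_db <= threshold_db:
        return None  # quiet enough: the ordinary microphone path is used

    # Step 2: capture lip feature images and recognize lip reading information.
    frames = capture_lip_images()
    words = recognize_lip_reading(frames)

    # Step 3: convert to voice and/or text information for the receiving end.
    return OutgoingMessage(
        text=words if send_text else None,
        speech=synthesize_speech(words) if send_speech else None,
    )
```

Passing the camera, recognizer, and synthesizer in as callables keeps the sketch independent of any particular device API, which the source does not specify.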
It should be understood that, in the embodiments of the present invention, the radio frequency unit 601 may be used to receive and send signals while sending and receiving information or during a call. Specifically, it receives downlink data from a base station and passes the data to the processor 610 for processing, and it sends uplink data to the base station. Generally, the radio frequency unit 601 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 601 may communicate with networks and other devices through a wireless communication system.
The mobile terminal provides the user with wireless broadband Internet access through the network module 602, for example helping the user send and receive e-mail, browse web pages, and access streaming media.
The audio output unit 603 may convert audio data received by the radio frequency unit 601 or the network module 602, or stored in the memory 609, into an audio signal and output it as sound. The audio output unit 603 may also provide audio output related to a specific function performed by the mobile terminal 600 (for example, a call signal reception sound or a message reception sound). The audio output unit 603 includes a speaker, a buzzer, a receiver, and the like.
The input unit 604 is used to receive audio or video signals. The input unit 604 may include a graphics processing unit (GPU) 6041 and a microphone 6042. The graphics processor 6041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The processed image frames may be displayed on the display unit 606. Image frames processed by the graphics processor 6041 may be stored in the memory 609 (or another storage medium) or sent via the radio frequency unit 601 or the network module 602. The microphone 6042 may receive sound and process it into audio data. In telephone call mode, the processed audio data may be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 601 and then output.
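The audio data produced by the microphone 6042 is the natural input for the noise check that the method performs during a call. The patent does not say how noise is separated from speech or what the threshold is, so the following is a minimal sketch under assumptions: 16-bit PCM samples, 160-sample frames, a heuristic that treats the quietest frames as background noise, and a -30 dBFS threshold are all illustrative choices, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def frame_levels_dbfs(samples: Sequence[int], frame_size: int = 160) -> List[float]:
    """Split 16-bit PCM samples into frames and return each frame's RMS level in dBFS."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        # Clamp to avoid log10(0) on digital silence.
        levels.append(20.0 * math.log10(max(rms, 1.0) / 32768.0))
    return levels


def estimate_noise_dbfs(levels: Sequence[float], quantile: float = 0.2) -> float:
    """Assumed heuristic: the quietest frames (speech pauses) approximate the noise."""
    ordered = sorted(levels)
    index = min(int(len(ordered) * quantile), len(ordered) - 1)
    return ordered[index]


def noise_meets_condition(samples: Sequence[int], threshold_dbfs: float = -30.0) -> bool:
    """Return True when the estimated noise intensity is greater than the threshold."""
    levels = frame_levels_dbfs(samples)
    if not levels:
        return False  # not enough audio to decide
    return estimate_noise_dbfs(levels) > threshold_dbfs
```

A production system would more likely use a voice activity detector or spectral noise estimation; the quantile heuristic is used here only because it fits in a few lines.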
The mobile terminal 600 also includes at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 6061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 6061 and/or the backlight when the mobile terminal 600 is moved to the ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used to recognize the posture of the mobile terminal (for example, portrait/landscape switching, related games, and magnetometer posture calibration) and for vibration-recognition functions (for example, a pedometer or tap detection). The sensor 605 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described here.
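As an illustration of how a static accelerometer reading can drive portrait/landscape switching, the sketch below classifies the posture from the direction of gravity. The axis convention, the function name `screen_orientation`, and the 0.8 "lying flat" ratio are assumptions made for this example, not details from the patent.

```python
import math


def screen_orientation(ax: float, ay: float, az: float, flat_ratio: float = 0.8) -> str:
    """Classify device posture from a static accelerometer reading in m/s^2.

    Assumed axis convention: x points to the right edge of the screen,
    y to the top edge, and z out of the screen.
    """
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude == 0.0:
        return "unknown"  # no gravity reading available
    # If gravity lies mostly along z, the device is lying flat.
    if abs(az) / magnitude > flat_ratio:
        return "flat"
    # Otherwise the axis carrying more of gravity decides the orientation.
    return "portrait" if abs(ay) >= abs(ax) else "landscape"
```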
The display unit 606 is used to display information input by the user or information provided to the user. The display unit 606 may include a display panel 6061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The user input unit 607 may be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the mobile terminal. Specifically, the user input unit 607 includes a touch panel 6071 and other input devices 6072. The touch panel 6071, also called a touch screen, collects the user's touch operations on or near it (for example, operations performed on or near the touch panel 6071 with a finger, a stylus, or any other suitable object or accessory). The touch panel 6071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and passes the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 610, and receives and executes commands sent by the processor 610. The touch panel 6071 may be implemented as a resistive, capacitive, infrared, or surface-acoustic-wave type, among others. In addition to the touch panel 6071, the user input unit 607 may include other input devices 6072. Specifically, the other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, and a joystick, which are not described here.
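The conversion from touch information to contact coordinates performed by the touch controller can be pictured as a linear mapping from raw sensor readings to pixels. This is a simplified sketch: the raw range of 200 to 3900, the 1080 x 1920 screen size, and the function name are hypothetical values chosen for illustration, and real controllers typically also apply calibration and filtering.

```python
from typing import Tuple


def raw_to_contact_coordinates(
    raw_x: int,
    raw_y: int,
    raw_min: int = 200,     # assumed lowest usable raw reading
    raw_max: int = 3900,    # assumed highest usable raw reading
    width_px: int = 1080,   # assumed screen width
    height_px: int = 1920,  # assumed screen height
) -> Tuple[int, int]:
    """Map raw touch readings to pixel coordinates with clamping."""

    def scale(raw: int, size_px: int) -> int:
        # Clamp out-of-range readings to the usable raw range.
        clamped = min(max(raw, raw_min), raw_max)
        fraction = (clamped - raw_min) / (raw_max - raw_min)
        # Truncate to the pixel index in [0, size_px - 1].
        return int(fraction * (size_px - 1))

    return scale(raw_x, width_px), scale(raw_y, height_px)
```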
Further, the touch panel 6071 may be overlaid on the display panel 6061. When the touch panel 6071 detects a touch operation on or near it, it passes the operation to the processor 610 to determine the type of touch event, and the processor 610 then provides corresponding visual output on the display panel 6061 according to the type of touch event. Although in Fig. 6 the touch panel 6071 and the display panel 6061 are shown as two separate components that implement the input and output functions of the mobile terminal, in some embodiments the touch panel 6071 and the display panel 6061 may be integrated to implement those functions; this is not limited here.
The interface unit 608 is the interface through which an external device connects to the mobile terminal 600. For example, the external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 608 may be used to receive input (for example, data or power) from an external device and pass the received input to one or more elements within the mobile terminal 600, or may be used to transfer data between the mobile terminal 600 and an external device.
The memory 609 may be used to store software programs and various data. The memory 609 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function). The data storage area may store data created through use of the mobile phone (such as audio data or a phone book). In addition, the memory 609 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The processor 610 is the control center of the mobile terminal. It connects the various parts of the entire mobile terminal through various interfaces and lines, and it performs the functions of the mobile terminal and processes data by running or executing the software programs and/or modules stored in the memory 609 and calling the data stored in the memory 609, thereby monitoring the mobile terminal as a whole. The processor 610 may include one or more processing units. Preferably, the processor 610 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 610.
The mobile terminal 600 may also include a power supply 611 (such as a battery) that supplies power to the components. Preferably, the power supply 611 may be logically connected to the processor 610 through a power management system, so that charging, discharging, and power-consumption management are handled through the power management system.
In addition, the mobile terminal 600 includes some functional modules that are not shown, which are not described here.
Preferably, an embodiment of the present invention also provides a mobile terminal that includes a processor 610, a memory 609, and a computer program stored on the memory 609 and executable on the processor 610. When executed by the processor 610, the computer program implements each process of the above audio recognition method embodiments and achieves the same technical effect; to avoid repetition, details are not repeated here.
An embodiment of the present invention also provides a computer-readable recording medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above audio recognition method embodiments and achieves the same technical effect; to avoid repetition, details are not repeated here. The computer-readable recording medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described above, which are merely illustrative and not restrictive. Under the teaching of the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (10)

1. An audio recognition method, applied to a mobile terminal, characterized in that the method comprises:
during voice communication, detecting whether the noise intensity in the current environment meets a predetermined condition;
if the noise intensity meets the predetermined condition, capturing a lip feature image and recognizing lip reading information from the lip feature image;
converting the lip reading information into voice information and/or text information, and sending it to the receiving end of the voice communication.
2. The audio recognition method according to claim 1, characterized in that the step of detecting, during voice communication, whether the noise intensity in the current environment meets a predetermined condition comprises:
during voice communication, obtaining audio information collected by a microphone of the mobile terminal;
parsing the audio information and extracting noise information from the audio information;
detecting whether the noise intensity corresponding to the noise information meets the predetermined condition.
3. The audio recognition method according to claim 2, characterized in that the step of detecting whether the noise intensity corresponding to the noise information meets the predetermined condition comprises:
detecting whether the noise intensity corresponding to the noise information is greater than a predetermined threshold;
if the noise intensity corresponding to the noise information is greater than the predetermined threshold, determining that the noise intensity corresponding to the noise information meets the predetermined condition.
4. The audio recognition method according to claim 1, characterized in that the step of capturing a lip feature image and recognizing lip reading information from the lip feature image if the noise intensity meets the predetermined condition comprises:
if the noise intensity meets the predetermined condition, prompting the user as to whether to conduct the voice communication in lip reading recognition mode;
if a trigger instruction by which the user confirms conducting the voice communication in lip reading recognition mode is obtained, capturing a lip feature image and recognizing lip reading information from the lip feature image.
5. A mobile terminal, characterized in that the mobile terminal comprises:
a detection module, configured to detect, during voice communication, whether the noise intensity in the current environment meets a predetermined condition;
an identification module, configured to capture a lip feature image and recognize lip reading information from the lip feature image if the noise intensity meets the predetermined condition;
a sending module, configured to convert the lip reading information into voice information and/or text information and send it to the receiving end of the voice communication.
6. The mobile terminal according to claim 5, characterized in that the detection module comprises:
an acquisition submodule, configured to obtain, during voice communication, audio information collected by a microphone of the mobile terminal;
an extraction submodule, configured to parse the audio information and extract noise information from the audio information;
a detection submodule, configured to detect whether the noise intensity corresponding to the noise information meets the predetermined condition.
7. The mobile terminal according to claim 6, characterized in that the detection submodule comprises:
a detection unit, configured to detect whether the noise intensity corresponding to the noise information is greater than a predetermined threshold;
a processing unit, configured to determine that the noise intensity corresponding to the noise information meets the predetermined condition if the noise intensity corresponding to the noise information is greater than the predetermined threshold.
8. The mobile terminal according to claim 5, characterized in that the identification module comprises:
a prompting submodule, configured to prompt the user as to whether to conduct the voice communication in lip reading recognition mode if the noise intensity meets the predetermined condition;
an identification submodule, configured to capture a lip feature image and recognize lip reading information from the lip feature image if a trigger instruction by which the user confirms conducting the voice communication in lip reading recognition mode is obtained.
9. A mobile terminal, characterized by comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the audio recognition method according to any one of claims 1 to 4.
10. A computer-readable recording medium, characterized in that a computer program is stored on the computer-readable recording medium, and the computer program, when executed by a processor, implements the steps of the audio recognition method according to any one of claims 1 to 4.
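The confirmation step recited in claims 4 and 8, where lip reading recognition mode is entered only after the user agrees, can be sketched as a small decision function. The enum `CallMode`, the function `choose_call_mode`, the `ask_user` callback, and the prompt wording are hypothetical names introduced for illustration; the claims do not prescribe any particular user interface.

```python
from enum import Enum
from typing import Callable


class CallMode(Enum):
    MICROPHONE = "microphone"    # ordinary voice path
    LIP_READING = "lip_reading"  # lip reading recognition mode


def choose_call_mode(
    noise_meets_condition: bool,
    ask_user: Callable[[str], bool],  # hypothetical prompt; returns True on confirmation
) -> CallMode:
    """Switch to lip reading mode only if the noise condition holds and the user confirms."""
    if not noise_meets_condition:
        return CallMode.MICROPHONE  # no prompt is shown in a quiet environment
    confirmed = ask_user("The environment is noisy. Use lip reading mode for this call?")
    return CallMode.LIP_READING if confirmed else CallMode.MICROPHONE
```

Keeping the prompt behind a callback means the same logic can be driven by a dialog, a hardware key, or a test stub.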
CN201711097133.5A 2017-11-09 2017-11-09 A kind of audio recognition method, mobile terminal and computer-readable recording medium Pending CN107799125A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711097133.5A CN107799125A (en) 2017-11-09 2017-11-09 A kind of audio recognition method, mobile terminal and computer-readable recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711097133.5A CN107799125A (en) 2017-11-09 2017-11-09 A kind of audio recognition method, mobile terminal and computer-readable recording medium

Publications (1)

Publication Number Publication Date
CN107799125A (en) 2018-03-13

Family

ID=61549500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711097133.5A Pending CN107799125A (en) 2017-11-09 2017-11-09 A kind of audio recognition method, mobile terminal and computer-readable recording medium

Country Status (1)

Country Link
CN (1) CN107799125A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5313522A (en) * 1991-08-23 1994-05-17 Slager Robert P Apparatus for generating from an audio signal a moving visual lip image from which a speech content of the signal can be comprehended by a lipreader
CN201986001U (en) * 2010-12-31 2011-09-21 上海华勤通讯技术有限公司 Mouth shape identification input mobile terminal
CN102932212A (en) * 2012-10-12 2013-02-13 华南理工大学 Intelligent household control system based on multichannel interaction manner
CN105825167A (en) * 2016-01-29 2016-08-03 维沃移动通信有限公司 Method for enhancing lip language recognition rate and mobile terminal
CN107293300A (en) * 2017-08-01 2017-10-24 珠海市魅族科技有限公司 Audio recognition method and device, computer installation and readable storage medium storing program for executing

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108538283B (en) * 2018-03-15 2020-06-26 上海电力学院 Method for converting lip image characteristics into voice coding parameters
CN108538283A (en) * 2018-03-15 2018-09-14 上海电力学院 A kind of conversion method by lip characteristics of image to speech coding parameters
CN108648745A (en) * 2018-03-15 2018-10-12 上海电力学院 A kind of conversion method by lip image sequence to speech coding parameters
CN108648745B (en) * 2018-03-15 2020-09-01 上海电力学院 Method for converting lip image sequence into voice coding parameter
CN108537207A (en) * 2018-04-24 2018-09-14 Oppo广东移动通信有限公司 Lip reading recognition methods, device, storage medium and mobile terminal
CN110568922B (en) * 2018-06-06 2023-06-30 奥迪股份公司 Method for recognizing input
CN110568922A (en) * 2018-06-06 2019-12-13 奥迪股份公司 Method for recognizing input
CN108831472A (en) * 2018-06-27 2018-11-16 中山大学肿瘤防治中心 A kind of artificial intelligence sonification system and vocal technique based on lip reading identification
CN109214820A (en) * 2018-07-06 2019-01-15 厦门快商通信息技术有限公司 A kind of trade company's cash collecting system and method based on audio-video combination
CN109214820B (en) * 2018-07-06 2021-12-21 厦门快商通信息技术有限公司 Merchant money collection system and method based on audio and video combination
CN109145088A (en) * 2018-08-10 2019-01-04 广东小天才科技有限公司 Searching method based on family education machine and family education machine
CN113168227A (en) * 2018-12-14 2021-07-23 三星电子株式会社 Method of performing function of electronic device and electronic device using the same
CN111326152A (en) * 2018-12-17 2020-06-23 南京人工智能高等研究院有限公司 Voice control method and device
CN110213431A (en) * 2019-04-30 2019-09-06 维沃移动通信有限公司 Message method and mobile terminal
CN110096154A (en) * 2019-05-08 2019-08-06 北京百度网讯科技有限公司 For handling the method and device of information
CN111063354B (en) * 2019-10-30 2022-03-25 云知声智能科技股份有限公司 Man-machine interaction method and device
CN111063354A (en) * 2019-10-30 2020-04-24 云知声智能科技股份有限公司 Man-machine interaction method and device
CN111045639A (en) * 2019-12-11 2020-04-21 深圳追一科技有限公司 Voice input method, device, electronic equipment and storage medium
CN111045639B (en) * 2019-12-11 2021-06-22 深圳追一科技有限公司 Voice input method, device, electronic equipment and storage medium
CN111625094A (en) * 2020-05-25 2020-09-04 北京百度网讯科技有限公司 Interaction method and device for intelligent rearview mirror, electronic equipment and storage medium
WO2021223765A1 (en) * 2020-06-01 2021-11-11 青岛海尔洗衣机有限公司 Voice recognition method, voice recognition system and electrical device
CN112634924B (en) * 2020-12-14 2024-01-09 深圳市沃特沃德信息有限公司 Noise filtering method and device based on voice call and computer equipment
CN112634924A (en) * 2020-12-14 2021-04-09 深圳市沃特沃德股份有限公司 Noise filtering method and device based on voice call and computer equipment
WO2023006033A1 (en) * 2021-07-29 2023-02-02 华为技术有限公司 Speech interaction method, electronic device, and medium
CN113689858B (en) * 2021-08-20 2024-01-05 广东美的厨房电器制造有限公司 Control method and device of cooking equipment, electronic equipment and storage medium
CN113689858A (en) * 2021-08-20 2021-11-23 广东美的厨房电器制造有限公司 Control method and device of cooking equipment, electronic equipment and storage medium
CN114023351A (en) * 2021-12-17 2022-02-08 广东讯飞启明科技发展有限公司 Speech enhancement method and system based on noisy environment
WO2024178096A1 (en) * 2023-02-21 2024-08-29 Meta Platforms Technologies, Llc Speech reconstruction system for multimedia files
CN116721661A (en) * 2023-08-10 2023-09-08 深圳中检实验室技术有限公司 Man-machine interaction management system for intelligent safe biological cabinet
CN116721661B (en) * 2023-08-10 2023-10-31 深圳中检实验室技术有限公司 Man-machine interaction management system for intelligent safe biological cabinet
CN117251095A (en) * 2023-09-12 2023-12-19 深圳市驿格科技有限公司 Data input method and system for PDA
CN117251095B (en) * 2023-09-12 2024-05-17 深圳市驿格科技有限公司 Data input method and system for PDA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180313)