CN114528851A - Reply statement determination method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN114528851A (application CN202210148787.0A / CN202210148787A)
- Authority
- CN
- China
- Prior art keywords
- intention
- user
- pinyin
- text
- voice information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F40/35: Handling natural language data; semantic analysis; discourse or dialogue representation
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/22: Pattern recognition; matching criteria, e.g. proximity measures
- G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F40/279: Handling natural language data; natural language analysis; recognition of textual entities
- G10L15/02: Speech recognition; feature extraction for speech recognition; selection of recognition unit
- G10L15/26: Speech recognition; speech to text systems
- G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application relates to the technical field of artificial intelligence, and in particular discloses a reply sentence determination method and apparatus, an electronic device, and a storage medium. The reply sentence determination method comprises the following steps: acquiring voice information of a user at the current moment and parsing the voice information to obtain a text; performing feature extraction on the text to obtain a feature X; acquiring the number of samples in a feature library; when the number of samples is less than or equal to a first threshold, sending the voice information to a human agent and receiving the human agent's intention analysis result for the voice information to obtain an intention A; performing secondary confirmation processing on the user according to the intention A; when the secondary confirmation processing passes, combining the intention A and the feature X, storing the combined result in the feature library as a sample, and generating a reply sentence according to the intention A to reply to the user; and when the secondary confirmation processing fails, generating rejection information and sending it to the user.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a reply sentence determination method and apparatus, an electronic device, and a storage medium.
Background
With the advent of the intelligent era, intelligent dialogue systems have been widely applied in the customer service field, replacing traditional human agents for simple exchanges with users, such as scene navigation, service navigation, and simple information queries, so as to reduce enterprises' labor costs. An existing intelligent dialogue system starts the business process corresponding to a user's intention only after it has accurately obtained that intention. For example: the intelligent dialogue system recognizes from the user's voice information that the current intention is "car scratch paint repair claim consultation"; it then matches the corresponding flow chart and dialogue library according to that intention, generates a reply sentence for the user's current voice input, starts the dialogue service, and helps the customer solve the problem. A traditional intelligent dialogue system therefore depends on accurate recognition of the user's intention: when the intention cannot be recognized accurately, the system has no answer to give and can only output a rejection to the user and ask the user to explain again, or simply hang up, resulting in a poor user experience.
The conventional solution is therefore to accumulate a large number of dialogue samples and train the model of the intelligent dialogue system, so as to improve the accuracy of intention recognition for the user's voice input. Clearly, with this approach, the larger the number of samples, the higher the accuracy of the resulting model.
In summary, the conventional method requires a great deal of time to accumulate raw data. For a new business field, however, no large set of samples has been accumulated, since the business has only just opened; at the same time, the intelligent dialogue system is needed immediately to handle a large volume of user consultations, leaving no time to collect samples. The conventional method therefore cannot be applied to building an intelligent dialogue system in a new field, and an intelligent dialogue scheme that can be used directly with few or even zero samples, while accumulating samples at the same time, is urgently needed.
Disclosure of Invention
To solve the above problems in the prior art, embodiments of the present application provide a reply sentence determination method and apparatus, an electronic device, and a storage medium, which can be used directly with few or even zero samples while accumulating samples at the same time.
In a first aspect, an embodiment of the present application provides a reply statement determination method, including:
acquiring voice information of a user at the current moment, and parsing the voice information to obtain a text;
performing feature extraction on the text to obtain a feature X;
acquiring the number of samples in a feature library;
when the number of samples is less than or equal to a first threshold, sending the voice information to a human agent, and receiving the human agent's intention analysis result for the voice information to obtain an intention A;
performing secondary confirmation processing on the user according to the intention A;
when the secondary confirmation processing passes, combining the intention A and the feature X, storing the combined result in the feature library as a sample, and generating a reply sentence according to the intention A to reply to the user;
and when the secondary confirmation processing fails, generating rejection information and sending it to the user.
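The routing logic of the first aspect can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names (`agent_intent`, `user_confirms`), the stand-in feature extraction, and the threshold value are all assumptions, since the patent names the steps but leaves them unspecified.

```python
FIRST_THRESHOLD = 2  # assumed value; the patent does not fix the first threshold

def determine_reply(text, feature_library, agent_intent, user_confirms):
    """Sketch of the first-aspect flow; helper callables are hypothetical."""
    feature_x = tuple(sorted(set(text.split())))  # stand-in for feature extraction
    if len(feature_library) <= FIRST_THRESHOLD:
        intention_a = agent_intent(text)          # defer intent analysis to a human agent
        if user_confirms(intention_a):            # secondary confirmation with the user
            feature_library.append((intention_a, feature_x))  # accumulate a sample
            return f"reply for intent: {intention_a}"
        return "rejection: intent not confirmed"
    return "reply via trained model"              # high-sample branch, not detailed here

library = []
reply = determine_reply("car scratch paint repair claim", library,
                        agent_intent=lambda t: "paint-repair claim consultation",
                        user_confirms=lambda i: True)
```

On the low-sample path a confirmed (intention, feature) pair is stored, so the feature library grows as the service runs, which is the sample-accumulation effect the abstract describes.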
In a second aspect, an embodiment of the present application provides a reply sentence determination apparatus, including:
an analysis module, configured to acquire voice information of a user at the current moment and parse the voice information to obtain a text;
an extraction module, configured to perform feature extraction on the text to obtain a feature X;
a processing module, configured to acquire the intention of the text to obtain an intention A, perform secondary confirmation processing on the user according to the intention A, and, when the secondary confirmation processing passes, generate a reply sentence according to the intention A to reply to the user.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor coupled to the memory, the memory for storing a computer program, the processor for executing the computer program stored in the memory to cause the electronic device to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, the computer program causing a computer to perform the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first aspect.
The implementation of the embodiment of the application has the following beneficial effects:
It can be seen that in the embodiments of the present application, when the feature library lacks sample support, the user's intention is identified by a human agent, ensuring the accuracy of intention recognition with few or zero samples. Meanwhile, secondary confirmation is performed with the customer according to the intention recognized by the human agent, further improving the correctness of the intention; the confirmed intention is then combined with the features extracted from the user's voice, and the combination is stored as a sample. Thus, even with few or zero samples, the correct operation of the intelligent dialogue system is ensured, the repeated rejections or hang-ups caused by inaccurate intention recognition are avoided, and the user experience is improved. At the same time, secondary confirmation guarantees the correctness of the intention, so correct samples are accumulated in the course of service, which further improves the accuracy and efficiency of subsequent training.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a reply statement determination method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for parsing the voice information of a user at the current moment to obtain a text, according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a method for obtaining a feature X by performing feature extraction on a text in accordance with an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a method for calculating similarity according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a method for performing secondary confirmation processing on a user according to an intention A, according to an embodiment of the present application;
fig. 6 is a schematic hardware structure diagram of a reply statement determination apparatus according to an embodiment of the present application;
fig. 7 is a block diagram illustrating functional modules of a reply statement determination apparatus according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application are within the scope of protection of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
Firstly, it should be noted that the reply sentence determination method provided by the present application can be applied to remote intelligent voice service conversation, offline intelligent robot scene navigation, intelligent business handling machine business guidance and other scenes. In this embodiment, a remote intelligent voice customer service dialogue scene will be taken as an example to explain the reply sentence determination method provided by the present application, and the reply sentence determination method in other scenes is similar to the reply sentence determination method in the remote intelligent voice customer service dialogue scene, and is not described herein again.
Next, it should be noted that the embodiments disclosed in the present application may acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Referring to fig. 1, fig. 1 is a schematic flow chart of a reply statement determination method according to an embodiment of the present application. The reply statement determination method comprises the following steps:
101: and acquiring the voice information of the user at the current moment, and analyzing the voice information to obtain a text.
Specifically, after the user establishes communication with the intelligent customer service robot through a communication device, the intelligent customer service robot can receive the voice information input by the user through the communication device and then parse the voice information to obtain a text. In an alternative embodiment, for example in an offline intelligent robot scene navigation scenario, the intelligent robot may collect the user's voice information through a voice collection device such as a microphone installed in the robot. In other words, any method in the art for acquiring speech in real time can be applied to this embodiment; that is, this embodiment does not limit how the speech is acquired in real time.
Meanwhile, in this embodiment, a method for obtaining a text by analyzing voice information of a user at a current time is provided, as shown in fig. 2, the method includes:
201: and audio extraction is carried out on the voice information to obtain a pinyin text.
In this embodiment, audio extraction may be performed on the user's voice information at the current moment to obtain corresponding audio features, which are then further analyzed and decomposed to obtain the corresponding pinyin text. For example, if the user's voice information at the current moment is "I want to handle the car scratch paint repair claim", audio extraction on the voice information yields the pinyin text: "woxiangyaobanliqicheguahenbuqilipei".
202: and segmenting the pinyin text to obtain at least one sub-pinyin.
In this embodiment, each of the at least one sub-pinyin identifies one syllable of the pronunciation. Following the example of "I want to handle the car scratch paint repair claim", after the pinyin text "woxiangyaobanliqicheguahenbuqilipei" is obtained, it may be divided according to the pinyin composition rules to obtain at least one sub-pinyin. Specifically, the initials and finals in the pinyin text are recognized first, the pinyin text is split into single initials and single finals, and these are then combined according to the pinyin composition rules to obtain the at least one sub-pinyin. When splitting the pinyin text, the first initial and the first final are identified first. To ensure that the first final is identified correctly, the pinyin letters following it are also examined: if the letter after a final is itself followed by another initial, that letter is part of the preceding final. For example, "hanghang" could be decomposed as [h, ang, h, ang] or as [h, an, g, h, an, g]; examining the third and fourth elements of [h, an, g, h, an, g], the third element [g] is followed by the initial [h] rather than a final, so [g] belongs to the preceding final, giving "hang". Thus, for the pinyin text "woxiangyaobanliqicheguahenbuqilipei", after initial and final recognition the following string of elements can be obtained: [w, o, x, i, ang, y, ao, b, an, l, i, q, i, ch, e, g, u, a, h, en, b, u, q, i, l, i, p, ei]. The elements are then recognized and combined according to the pinyin composition rules, scanning forward from the first element: whenever the next initial is recognized, the preceding elements are combined into one sub-pinyin, and recognition continues until the last element has been processed.
After splitting and combining the initials and finals of the string in this example, the at least one sub-pinyin [wo, xiang, yao, ban, li, qi, che, gua, hen, bu, qi, li, pei] is obtained.
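The splitting described above can be sketched as a greedy longest-match over a syllable inventory. This is an illustrative simplification: the syllable set below covers only this example sentence, whereas a real system would use the full Mandarin syllable table plus the initial/final disambiguation rule described for "hanghang".

```python
# Hand-picked inventory covering only this example; not the full pinyin table.
SYLLABLES = {"wo", "xiang", "yao", "ban", "li", "qi", "che", "gua",
             "hen", "bu", "pei"}

def split_pinyin(s):
    """Greedy longest-match syllable split of a pinyin string."""
    out, i = [], 0
    while i < len(s):
        # try the longest candidate first so "xiang" beats "xi" + "ang"
        for j in range(min(len(s), i + 6), i, -1):
            if s[i:j] in SYLLABLES:
                out.append(s[i:j])
                i = j
                break
        else:
            raise ValueError(f"no syllable matches at position {i}")
    return out

print(split_pinyin("woxiangyaobanliqicheguahenbuqilipei"))
```

Greedy matching alone cannot resolve every ambiguity (the "hang" vs "han"+"g" case), which is why the text adds the look-ahead rule on the letter following a final.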
203: and acquiring the application scene of the voice information of the user at the current moment.
In this embodiment, the customer service number dialed when the user communicates with the intelligent customer service robot may be obtained and matched against a preset customer service number classification table to obtain the application scenario corresponding to that number. Different customer service numbers in the table may be preset to correspond to different application scenarios, for example: the customer service number "10087" corresponds to a "maintenance" application scenario, and the customer service number "10089" corresponds to an "insurance" application scenario. Following the example of "I want to handle the car scratch paint repair claim", when the number currently dialed by the user is found to be "10089", it is matched against the preset customer service number classification table, and the application scenario corresponding to this number is determined to be the "insurance" scenario.
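The number-to-scenario lookup amounts to a small classification table. A minimal sketch, using the example numbers from the text (they are illustrations, not real hotlines) and an assumed fallback for unlisted numbers:

```python
# Preset customer service number classification table (example values only).
SCENARIO_TABLE = {
    "10087": "maintenance",
    "10089": "insurance",
}

def application_scenario(dialed_number):
    # fallback scenario for unlisted numbers is an assumption of this sketch
    return SCENARIO_TABLE.get(dialed_number, "general")

print(application_scenario("10089"))  # insurance
```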
204: and determining a preset word bank corresponding to the application scene according to the application scene of the voice information of the user at the current moment.
In the embodiment, the word bank is screened through the application scene, so that the number of candidate words can be reduced, and the confirmation efficiency of the text is improved. And the semantics of the candidate words accord with the application scene of the voice information, so that the accuracy of the subsequently generated text can be improved.
205: and matching each sub-pinyin in a preset word bank to obtain at least one group of first words corresponding to at least one sub-pinyin one to one.
In this embodiment, each sub-pinyin is compared with the pinyin of the words in the screened preset lexicon: the semantics of the sub-pinyin are identified and matched against the semantics of the words in the preset lexicon, and the words with high matching degree are selected as the group of first words corresponding to that sub-pinyin. Illustratively, among the sub-pinyins [wo, xiang, yao, ban, li, qi, che, gua, hen, bu, qi, li, pei], take "li" as a detailed example. In the preset lexicon, the group of first words corresponding to the sub-pinyin "li" may be: "reason", "benefit", or "inner".
206: and determining a target word in the first word group corresponding to each sub-pinyin according to the sub-pinyin adjacent to each sub-pinyin to obtain at least one target word corresponding to at least one sub-pinyin one to one.
In this embodiment, phrases can be obtained by combining each sub-pinyin's candidate words with the candidate words of the sub-pinyin adjacent to it on the left and/or right. The semantics of each phrase are matched against the application scenario of the user's voice information at the current moment, and the target word that best fits the application scenario is screened out of the group of first words. Illustratively, in the preset lexicon, the group of first words corresponding to the sub-pinyin "li" includes "reason", "benefit", and "inner", and the group of first words corresponding to the adjacent sub-pinyin "pei" on its right includes "claim", "match", and "accompany". Combining them yields nine candidate phrases, one for each pairing of a "li" candidate with a "pei" candidate. Matching these phrases against the application scenario "insurance", the phrase "claim settlement" (the combination of "reason" and "claim") has the highest matching degree, so the first word "reason" is taken as the target word corresponding to the sub-pinyin "li".
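The neighbour-combination selection above can be sketched as follows. The scoring function here is a stand-in: a real system would score phrase-to-scenario semantic match, whereas this sketch uses an invented lookup table where only "reason"+"claim" fits the "insurance" scenario.

```python
def pick_target(cands_left, cands_right, scenario_score):
    """Pick the target word for the left syllable: pair every left candidate
    with every right-neighbour candidate and keep the best-scoring pairing."""
    best = max(((l, r) for l in cands_left for r in cands_right),
               key=lambda pair: scenario_score(pair[0] + pair[1]))
    return best[0]

# Hypothetical scores: only the "claim settlement" phrase fits "insurance".
scores = {"reasonclaim": 95}
target = pick_target(["reason", "benefit", "inner"],
                     ["claim", "match", "accompany"],
                     lambda phrase: scores.get(phrase, 0))
```

With these assumed scores, `target` comes out as "reason", matching the worked example in the text.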
207: and arranging at least one target word according to the arrangement sequence of at least one sub-pinyin in the pinyin text to obtain a text.
In this embodiment, after the target word corresponding to each sub-pinyin is obtained through the screening in step 206, the target words may be arranged according to the order of their corresponding sub-pinyins in the pinyin text to obtain the text. Specifically, following the example of "I want to handle the car scratch paint repair claim": through the preceding series of operations, the target words corresponding to the sub-pinyins [wo, xiang, yao, ban, li, qi, che, gua, hen, bu, qi, li, pei] are obtained, and arranging these target words in the order of the sub-pinyins in the pinyin text yields the text "I want to handle the car scratch paint repair claim".
102: and extracting the characteristics of the text to obtain the characteristics X.
In the present embodiment, there is provided a method for obtaining a feature X by performing feature extraction on a text, as shown in fig. 3, the method including:
301: and performing word splitting processing on the text to obtain at least one keyword.
In this embodiment, at least one candidate field may be obtained by recognizing separator characters in the text and replacing each recognized separator with a space. The separator characters may be set in advance, including but not limited to: verbs, nouns, punctuation, special symbols, and the like. Each of the at least one candidate field is then subjected to forward maximum matching against a general segmentation dictionary, and each successfully matched word in the dictionary is taken as a candidate word corresponding to that field. Finally, the obtained candidate words are screened to obtain at least one keyword.
For example, the candidate words may be screened by comparing their semantics. Specifically: the text "I want to handle the car scratch paint repair claim" is segmented, using verbs and nouns as separators, into candidate fields. Comparing the semantics of the resulting candidate words, and taking into account the application scenario the user is currently in, the keywords are determined to be "car", "scratch", "paint repair", and "claim".
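The forward maximum matching of step 301 can be sketched as below. The dictionary is a tiny illustrative stand-in for the general segmentation dictionary, and matching is done over space-separated tokens rather than Chinese characters, since this page works with the translated example.

```python
# Tiny stand-in for the general segmentation dictionary.
DICT = {"car", "scratch", "paint repair", "claim"}

def forward_max_match(field, dictionary, max_len=3):
    """Forward maximum matching: at each position try the longest
    dictionary entry first, then shorter ones."""
    words, tokens = [], field.split()
    i = 0
    while i < len(tokens):
        for j in range(min(len(tokens), i + max_len), i, -1):
            cand = " ".join(tokens[i:j])
            if cand in dictionary:
                words.append(cand)
                i = j
                break
        else:
            i += 1  # no dictionary word starts here; skip this token
    return words

print(forward_max_match("car scratch paint repair claim", DICT))
```

Note how "paint repair" is kept as one candidate word because the longer match is tried before its single-token pieces.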
302: and calculating the relevance between any two different keywords in the at least one keyword to obtain at least one relevance.
In this embodiment, any two adjacent keywords of the at least one keyword are first combined to obtain a second word combination. The second word combination is matched against the application scenario and the matching degree is scored; when the score is greater than a fifth threshold, the two keywords are determined to be associated, and the score is taken as their association degree. Continuing the example with the keywords "car", "scratch", "paint repair", and "claim": combining "car" and "scratch" gives the second word combination "car scratch", which is matched against the application scenario "insurance" to obtain a score of 95; since this score is greater than the fifth threshold, the two words are determined to be associated, with an association degree of 95.
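Step 302 can be sketched as scoring adjacent keyword pairs and keeping those above the fifth threshold. The threshold and the scores are invented for illustration; the scoring callable stands in for the scenario-matching model.

```python
FIFTH_THRESHOLD = 60  # assumed value; the patent does not fix it

def association_degrees(keywords, score):
    """Score each adjacent keyword pair against the application scenario;
    keep pairs whose score exceeds the fifth threshold."""
    degrees = {}
    for a, b in zip(keywords, keywords[1:]):  # adjacent pairs only
        s = score(a + " " + b)
        if s > FIFTH_THRESHOLD:
            degrees[(a, b)] = s
    return degrees

# Hypothetical scenario-match scores for the example keywords.
scores = {"car scratch": 95, "scratch paint repair": 80, "paint repair claim": 90}
deg = association_degrees(["car", "scratch", "paint repair", "claim"],
                          lambda p: scores.get(p, 0))
```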
303: and constructing a keyword map according to the at least one association degree and the at least one keyword.
In this embodiment, a fully connected graph is first established with the keywords as vertices; then, according to the association degrees, the edges whose association degree is lower than a fourth threshold are deleted from the fully connected graph to generate the keyword graph corresponding to the keywords.
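The graph construction in this step can be sketched as follows, assuming the pairwise association degrees have already been computed as in step 302 (the keyword names and threshold value used below are illustrative):

```python
from itertools import combinations

def build_keyword_graph(keywords, association, threshold):
    """Start from a fully connected graph over the keywords, then drop
    every edge whose association degree is below the threshold."""
    edges = {}
    for a, b in combinations(keywords, 2):
        score = association.get((a, b), association.get((b, a), 0))
        if score >= threshold:       # keep only sufficiently associated pairs
            edges[(a, b)] = score
    return edges
```

For example, with `association = {("car", "scratch"): 95, ("scratch", "claim"): 40}` and a fourth threshold of 60, only the "car"–"scratch" edge survives.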
304: and performing graph embedding processing on each keyword in the at least one keyword according to the keyword graph to obtain at least one first graph vector, wherein the at least one first graph vector corresponds to the at least one keyword one to one.
In this embodiment, each keyword is first determined as a graph node. A co-occurrence relation structure graph is then constructed from these graph nodes. Finally, a DeepWalk model is called to perform graph embedding processing on the co-occurrence relation structure graph and output the first graph vector corresponding to each keyword.
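The walk-generation half of a DeepWalk-style embedding can be sketched as below; a full implementation would additionally feed these walks to a skip-gram model to learn the first graph vectors, which is omitted here:

```python
import random

def generate_walks(edges, num_walks=10, walk_len=5, seed=0):
    """Generate truncated random walks over the keyword graph; DeepWalk
    treats these walks as 'sentences' for a skip-gram model."""
    rng = random.Random(seed)
    adj = {}
    for a, b in edges:               # build an undirected adjacency list
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    walks = []
    for _ in range(num_walks):
        for node in adj:             # start one walk from every node
            walk = [node]
            while len(walk) < walk_len:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks
```

Every consecutive pair in a generated walk is an edge of the keyword graph, so the skip-gram contexts reflect graph neighbourhoods.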
305: and performing word embedding processing on each keyword to obtain at least one first word vector corresponding to at least one keyword one to one.
306: and for each graph vector in the at least one first graph vector, calculating the average vector of the graph vector and the word vector corresponding to it to obtain at least one first vector, wherein the at least one first vector is in one-to-one correspondence with the at least one keyword.
In this embodiment, each first graph vector is added to the word vector of the corresponding keyword, and the average is then taken to obtain the average vector, so that at least one first vector corresponding to the at least one keyword is obtained. Illustratively, if the word vector of the keyword "car" is (1,2) and its graph vector is (5,6), the summed vector is (6,8), and the average vector is calculated as (3,4), which serves as the first vector of the keyword "car".
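The averaging described above amounts to an element-wise mean of the two vectors, as in this minimal sketch:

```python
def fuse(graph_vec, word_vec):
    """Element-wise average of a keyword's graph vector and word vector."""
    return tuple((g + w) / 2 for g, w in zip(graph_vec, word_vec))
```

For the keyword "car", `fuse((5, 6), (1, 2))` reproduces the (3, 4) first vector from the example.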
307: and splicing the at least one first vector according to the sequence of the at least one keyword in the text to obtain the characteristic X.
In this embodiment, following the above example "I want to handle the automobile scratch paint repair claim", the keyword "car" corresponds to the first vector A, the keyword "scratch" to the first vector B, the keyword "paint repair" to the first vector C, and the keyword "claim" to the first vector D. These are arranged in the order they occur in the original text, that is, "car", "scratch", "paint repair", "claim". The first vectors are spliced vertically from top to bottom to obtain a vector P as the feature X of the text "automobile scratch paint repair claim". Specifically, the vector P can be represented by formula (1): P = [A; B; C; D], i.e. the vertical concatenation of the first vectors in text order.
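The splicing in step 307 is a plain concatenation of the first vectors in text order; a minimal sketch:

```python
def splice(first_vectors):
    """Concatenate the per-keyword first vectors, in text order, into
    the feature X of the whole text."""
    feature_x = []
    for vec in first_vectors:
        feature_x.extend(vec)
    return feature_x
```

With two-dimensional first vectors, `splice([[1, 2], [3, 4]])` yields the four-dimensional feature `[1, 2, 3, 4]`.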
103: and acquiring the number of samples of the samples in the feature library.
104: and when the number of samples is less than or equal to a first threshold, sending the voice information to a human agent, and receiving the human agent's intention analysis result for the voice information to obtain an intention A.
In this embodiment, when the number of samples is less than or equal to the first threshold, there are not enough samples in the database to support the intelligent customer service robot in accurately recognizing the user's voice information. Therefore, the user's voice information at the current moment is sent to a human agent, and the intention A corresponding to the voice information is determined through the human agent's recognition.
Meanwhile, in this embodiment, when the number of samples is greater than the first threshold, a certain number of samples already exists in the database as support; although this is not yet enough to train an intention recognition model of sufficiently high accuracy, the number of sample features for each intention is enough to support feature comparison. Based on this, when recognizing the intention of the user's voice information, similarity calculation processing may be performed between the feature X and each of N samples in the feature library to obtain N similarities corresponding one to one to the N samples, where N is an integer greater than or equal to 1. The maximum of the N similarities is then determined as a target similarity, and when the target similarity is greater than a second threshold, the intention B corresponding to the target similarity is taken as the intention A of the user's voice information at the current moment.
Illustratively, when the number of samples in the feature library reaches a certain number M (the first threshold), after features are extracted from the user's voice information at the current moment, similarity calculation may be performed between the extracted features and each feature collected in the feature library to obtain a similarity S. Specifically, if the feature library includes J intentions and each intention corresponds to 500 samples, the similarity calculation yields J × 500 similarities. The maximum similarity S_max can then be found among the J × 500 similarities and compared with the preset second threshold. When S_max is greater than the second threshold, the intention to which the sample with the maximum similarity S_max belongs is taken as the intention of the user's voice information at the current moment.
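The feature-comparison stage described above can be sketched as a scan over the feature library; the similarity function is passed in as a parameter, and the sample data in the usage below is illustrative:

```python
def recognize_by_similarity(feature_x, feature_library, second_threshold, sim):
    """Scan all (intention, feature) samples, pick the most similar one,
    and return its intention only if the similarity clears the threshold."""
    best_intent, best_sim = None, -1.0
    for intent, sample in feature_library:
        s = sim(feature_x, sample)
        if s > best_sim:
            best_intent, best_sim = intent, s
    if best_sim > second_threshold:
        return best_intent
    return None                      # rejection: fall back to a human agent
```

With a plain dot-product similarity and a toy library of two intentions, a feature close to the "claim" sample is recognized, while a feature far from every sample is rejected.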
Based on this, in the present embodiment, there is provided a method of calculating a similarity, as shown in fig. 4, the method including:
401: and calculating the product of the feature X and the feature vector corresponding to each sample to obtain a vector product F.
402: and calculating the product of the modulus of the characteristic X and the modulus of the characteristic vector corresponding to each sample to obtain the length product E of the modulus of the characteristic X and the characteristic vector corresponding to each sample.
403: and calculating the sum of the product E of the length of the modulus of the characteristic X and the characteristic vector corresponding to each sample and a constant C to obtain the length sum G.
In this embodiment, the constant C may be an integer greater than or equal to 1; it is used to prevent the length product of the modulus of the vector of the feature X and the modulus of the sample feature vector from being 0, which would make the formula invalid.
404: and acquiring the ratio of the vector product F to the length sum G, and taking the ratio as the similarity between the feature X and each sample.
Specifically, the similarity can be expressed by formula (2):

S = (a · b) / (|a| · |b| + C)

wherein S is the similarity, a is the vector of the feature X, b is the feature vector corresponding to each sample, |a| and |b| are their moduli, and C is the constant.

Further, the modulus |a| of the vector of the feature X can be represented by formula (3):

|a| = √(V_1² + V_2² + … + V_d²)

wherein V_1 to V_d are the elements in the vector of the feature X. Similarly, the modulus |b| of the feature vector corresponding to each sample can be represented by formula (4):

|b| = √(X_1² + X_2² + … + X_d²)

wherein X_1 to X_d are the elements in the feature vector corresponding to each sample.
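Steps 401 to 404 can be sketched together in plain Python (this is an illustrative sketch of the formulas above, not the patent's exact implementation):

```python
import math

def similarity(a, b, c=1):
    """Steps 401-404: vector product F, length product E, length sum
    G = E + C, and similarity S = F / G. The constant C keeps the
    denominator non-zero even for zero-length vectors."""
    f = sum(x * y for x, y in zip(a, b))            # 401: vector product F
    e = math.sqrt(sum(x * x for x in a)) * \
        math.sqrt(sum(y * y for y in b))            # 402: length product E
    g = e + c                                       # 403: length sum G
    return f / g                                    # 404: similarity S
```

Note that with C > 0 this differs from plain cosine similarity: identical vectors score slightly below 1, e.g. `similarity([3, 4], [3, 4])` is 25/26, and a zero vector safely yields 0 instead of a division error.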
in this embodiment, when the number of samples is greater than the third threshold, it is described that the number of samples accumulated in the feature library is sufficient to support the training of the model, so as to obtain an accurate intent recognition model. Therefore, at this stage, the accumulated samples in the sample library can be input into the initial model for training to obtain a classification model, and then the features X are input into the classification model to obtain the intention a. Wherein the third threshold is greater than the first threshold.
Specifically, in this embodiment, the initial model may adopt a Natural Language Understanding (NLU) model. The NLU model is essentially a feature extraction network plus a classifier: features are extracted by the feature extraction network and input into the classifier for classification; the output labels are the scores of all candidate intentions; the result with the highest score that is also greater than a preset threshold T is selected as the final result, and if no score exceeds the threshold, a rejection (intention cannot be identified) is output. Assuming a total of N business scenarios, there are N intentions (A_1, A_2, A_3, …, A_N). In the early stage, data is collected in practice until the features X collected for each intention reach a certain number M, for example more than 500 features per intention, yielding the feature library (a feature library formed by collecting M features for each intention).
In this embodiment, after a certain amount of data has been collected, that is, when the number of samples is greater than the third threshold, conventional NLU model training may be performed. The NLU model outputs the user intention and additionally outputs the extracted feature X. If a rejection is identified, the words spoken by the user are pushed to an agent, who judges them and gives the intention A, and this data is stored to obtain a bad-case feature library. In this embodiment, NLU model training can be started with any number of samples (M samples are randomly selected when the number of samples is greater than M), which relieves the pressure on human agents during the period before the next NLU model is trained and brought online.
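The selection rule of the NLU classifier described above (take the highest-scoring intention if it exceeds the preset threshold T, otherwise output a rejection) can be sketched as:

```python
def select_intent(scores, threshold_t):
    """Pick the highest-scoring intention; if even the best score does
    not exceed the preset threshold T, output a rejection (None)."""
    best_intent = max(scores, key=scores.get)
    if scores[best_intent] > threshold_t:
        return best_intent
    return None                      # rejection: route to a human agent
```

A rejection (`None`) triggers the fallback path in which the user's utterance is pushed to an agent.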
105: determining whether the secondary confirmation processing performed on the user according to the intention A passes; when the secondary confirmation processing passes, jumping to step 106, and when it does not pass, jumping to step 107.
In this embodiment, the secondary confirmation confirms with the user whether the recognized intention A is accurate before the subsequent service is performed, so as to ensure recognition accuracy. This embodiment provides a method for performing secondary confirmation processing of the intention A with the user, as shown in fig. 5, the method including:
501: a confirmation statement is generated from intent a.
In this embodiment, for example, the intention A "car scratch paint repair claim" may be converted into a question sentence similar to "Is your requirement a car scratch paint repair claim?", which serves as the confirmation statement used to confirm with the user whether the recognized intention A is correct.
502: and sending a confirmation statement to the user and receiving feedback information of the user.
In this embodiment, the confirmation sentence is sent to the user, the voice fed back by the user is received, and the intention fed back by the user is recognized from it. Alternatively, in an online scenario, the two choices "yes" and "no" can be displayed to the user through a display device for the user to confirm.
503: when the feedback information is yes, it is determined that the secondary confirmation processing has passed.
504: and when the feedback information is negative, judging that the secondary confirmation processing is not passed.
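Steps 501 to 504 can be sketched as one small function; the wording of the confirmation question and the "yes"/"no" feedback strings are illustrative assumptions:

```python
def secondary_confirmation(intent_a, ask_user):
    """Steps 501-504: build a confirmation sentence from intention A,
    send it to the user, and map the yes/no feedback to pass/fail."""
    question = f"Is your requirement: {intent_a}?"   # 501: confirmation statement
    feedback = ask_user(question)                    # 502: send and receive
    return feedback == "yes"                         # 503/504: pass or fail
```

`ask_user` stands in for whichever channel (voice or on-screen buttons) collects the feedback.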
106: combining the intention A and the characteristics X, storing a combination result as a sample into a characteristic library, and generating a reply sentence according to the intention A so as to reply to the user.
In this embodiment, the accurate intention A obtained through the user's secondary confirmation is combined with the extracted feature X to obtain an accurate sample with which to fill the feature library, so that the samples in the feature library accumulate while the system works. Moreover, because of the secondary confirmation, the resulting sample is essentially a positive sample whose intention is correct. Illustratively, the accurate intention A is combined with the corresponding feature X in the form (A, X) and stored as a sample in the feature library.
107: and generating rejection information and sending the rejection information to the user.
In summary, according to the reply statement determination method provided by the present application: when the number of samples in the feature library is less than or equal to the first threshold, the feature library lacks sample support, and the intention is identified through the cooperation of a human agent and the feature extraction network and output to the user. The client secondarily confirms the identified intention, and the subsequent service is performed after user feedback is obtained. When the user feedback is a confirmation, the user's voice information and the corresponding intention are put into the feature library together to accumulate data. When the number of samples is greater than the first threshold and less than or equal to the third threshold, the feature library has a certain sample basis and the feature extraction network is still used to identify the intention of the user's voice information: at this stage, similarity calculation is performed between the extracted feature and the existing sample features, and the intention corresponding to the maximum similarity is obtained. Secondary confirmation is again required, the subsequent service is performed after user feedback is obtained, and when the user feedback is a confirmation, the user's voice information and the corresponding intention are put into the feature library together to accumulate data. When the number of samples is greater than the third threshold, the samples in the feature library are sufficient, conventional intention identification is performed, and the intention and the corresponding feature are output; secondary confirmation is performed, and the subsequent service is performed after user feedback is obtained.
When the user feedback is a confirmation, the user's voice information and the corresponding intention are put into the feature library together to accumulate data. Through this series of processes, a training process of the intelligent customer service robot from start to finish is realized. This not only relieves the pressure on human agents, but also solves the early cold-start problem, strengthens the reliability of the intelligent customer service robot system, reduces enterprise cost, and improves the experience of inbound customers.
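The three-stage strategy summarized above reduces to a simple routing rule on the sample count; the threshold values in the usage below are illustrative:

```python
def route_intent(n_samples, t1, t3):
    """Choose the recognition strategy from the feature-library size:
    human agent below/at the first threshold, feature comparison in
    between, trained NLU model above the third threshold (t3 > t1)."""
    if n_samples <= t1:
        return "human_agent"
    if n_samples <= t3:
        return "feature_comparison"
    return "nlu_model"
```

This makes explicit how the system gradually hands recognition over from humans to the model as data accumulates.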
Referring to fig. 6, fig. 6 is a schematic diagram of a hardware structure of a reply statement determination apparatus according to an embodiment of the present disclosure. The reply sentence determination apparatus 600 comprises at least one processor 601, a communication line 602, a memory 603 and at least one communication interface 604.
In this embodiment, the processor 601 may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the present disclosure.
The communication line 602, which may include a path, carries information between the aforementioned components.
The communication interface 604 may be any transceiver or other device (e.g., an antenna, etc.) for communicating with other devices or communication networks, such as an ethernet, RAN, Wireless Local Area Network (WLAN), etc.
The memory 603 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
In this embodiment, the memory 603 may be independent and connected to the processor 601 through the communication line 602. The memory 603 may also be integrated with the processor 601. The memory 603 provided in the embodiments of the present application may generally have a nonvolatile property. The memory 603 is used for storing computer-executable instructions for executing the present application, and is controlled by the processor 601 to execute the instructions. The processor 601 is configured to execute computer-executable instructions stored in the memory 603, thereby implementing the methods provided in the embodiments described below.
In alternative embodiments, computer-executable instructions may also be referred to as application code, which is not specifically limited in this application.
In alternative embodiments, processor 601 may include one or more CPUs, such as CPU0 and CPU1 of FIG. 6.
In an alternative embodiment, the reply sentence determination apparatus 600 may include a plurality of processors, such as the processor 601 and the processor 607 in fig. 6. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In an optional embodiment, if the reply statement determination apparatus 600 is a server, it may be, for example, an independent server, or a cloud server providing basic cloud computing services such as cloud service, cloud database, cloud computing, cloud function, cloud storage, web service, cloud communication, middleware service, domain name service, security service, content delivery network (CDN), and big data and artificial intelligence platforms. The reply sentence determination apparatus 600 may further include an output device 605 and an input device 606. The output device 605 communicates with the processor 601 and may display information in a variety of ways. For example, the output device 605 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like. The input device 606 communicates with the processor 601 and may receive user input in a variety of ways. For example, the input device 606 may be a mouse, a keyboard, a touch screen device, a sensing device, or the like.
The reply sentence determination apparatus 600 described above may be a general-purpose device or a special-purpose device. This embodiment does not limit the type of the reply sentence determination apparatus 600.
Referring to fig. 7, fig. 7 is a block diagram illustrating functional modules of a reply statement determination apparatus according to an embodiment of the present disclosure. As shown in fig. 7, the reply sentence determination apparatus includes:
the analysis module 701 is used for acquiring voice information of a user at the current moment and analyzing the voice information to obtain a text;
an extraction module 702, configured to perform feature extraction on the text to obtain a feature X;
the processing module 703 is configured to obtain the number of samples in the feature library, send the voice information to the human agent when the number of samples is less than or equal to a first threshold, receive an intention analysis result of the human agent on the voice information, obtain an intention a, perform secondary confirmation processing on the user according to the intention a, generate a reply statement according to the intention a when the secondary confirmation processing is passed, and reply the user.
In an embodiment of the present invention, in performing the secondary confirmation processing on the user according to the intention A, the processing module 703 is specifically configured to:
generating a confirmation statement according to the intention A;
sending a confirmation statement to a user and receiving feedback information of the user;
when the feedback information is yes, judging that the secondary confirmation processing is passed;
and when the feedback information is negative, judging that the secondary confirmation processing is not passed.
In the embodiment of the present invention, in analyzing the voice information of the user at the current time to obtain the text, the analyzing module 701 is specifically configured to:
performing audio extraction on the voice information to obtain a pinyin text;
dividing the pinyin text to obtain at least one sub-pinyin, wherein each sub-pinyin in the at least one sub-pinyin is used for identifying a syllable in pronunciation;
acquiring an application scene of voice information of a user at the current moment;
determining a preset word bank corresponding to an application scene according to the application scene of the voice information of the user at the current moment;
matching each sub-pinyin in a preset word bank to obtain at least one group of first words, wherein the at least one group of first words is in one-to-one correspondence with at least one sub-pinyin;
determining a target word in a first word group corresponding to each sub-pinyin according to the adjacent sub-pinyin of each sub-pinyin to obtain at least one target word, wherein the at least one target word is in one-to-one correspondence with the at least one sub-pinyin;
and arranging at least one target word according to the arrangement sequence of at least one sub-pinyin in the pinyin text to obtain a text.
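The pinyin-to-text procedure the analysis module performs can be roughly sketched as follows; the lexicon, the phrase set, and the neighbour-based tie-breaking rule are simplified illustrative assumptions:

```python
def pick_target_words(sub_pinyins, lexicon, phrase_set):
    """For each sub-pinyin, choose among its candidate words the one
    that forms a known phrase with a neighbour's candidate; otherwise
    fall back to the first candidate."""
    words = []
    for i, py in enumerate(sub_pinyins):
        candidates = lexicon.get(py, [py])
        chosen = candidates[0]
        for cand in candidates:
            neighbours = []
            if i + 1 < len(sub_pinyins):          # next sub-pinyin's candidates
                neighbours += lexicon.get(sub_pinyins[i + 1], [])
            if i > 0:                             # previous sub-pinyin's candidates
                neighbours += lexicon.get(sub_pinyins[i - 1], [])
            if any(cand + n in phrase_set or n + cand in phrase_set
                   for n in neighbours):
                chosen = cand
                break
        words.append(chosen)
    return "".join(words)             # arrange in pinyin order to get the text
```

For instance, given the sub-pinyins "qi", "che" and a scenario phrase set containing 汽车 ("car"), the ambiguous "qi" resolves to 汽 because of its neighbour.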
In the embodiment of the present invention, in extracting features of a text to obtain a feature X, the extracting module 702 is specifically configured to:
performing word splitting processing on the text to obtain at least one keyword;
calculating the association degree between any two different keywords in the at least one keyword to obtain at least one association degree;
constructing a keyword map according to at least one relevance and at least one keyword;
performing graph embedding processing on each keyword in at least one keyword according to the keyword graph to obtain at least one first graph vector, wherein the at least one first graph vector corresponds to the at least one keyword one to one;
performing word embedding processing on each keyword to obtain at least one first word vector, wherein the at least one first word vector corresponds to the at least one keyword one to one;
for each graph vector in the at least one first graph vector, calculating an average vector of the graph vector and the word vector corresponding to it to obtain at least one first vector, wherein the at least one first vector is in one-to-one correspondence with the at least one keyword;
and splicing the at least one first vector according to the sequence of the at least one keyword in the text to obtain the characteristic X.
In the embodiment of the present invention, when the number of samples is greater than the first threshold, the processing module 703 is specifically configured to:
performing similarity calculation processing on the feature X and each sample in N samples in a feature library to obtain N similarities, wherein the N similarities correspond to the N samples one by one, and N is an integer greater than or equal to 1;
determining target similarity in the N similarities, wherein the target similarity is the maximum similarity in the N similarities;
and when the target similarity is larger than a second threshold value, acquiring an intention B corresponding to the target similarity, and taking the intention B as an intention A.
In the embodiment of the present invention, in terms of calculating the similarity, the processing module 703 is specifically configured to:
calculating the product of the feature X and the feature vector corresponding to each sample to obtain a vector product F;
calculating the product of the modulus of the characteristic X and the modulus of the characteristic vector corresponding to each sample to obtain the length product E of the modulus of the characteristic X and the characteristic vector corresponding to each sample;
calculating the length product E of the modulus of the characteristic X and the characteristic vector corresponding to each sample and the sum of a constant C to obtain a length sum G, wherein the constant C is an integer greater than or equal to 1;
and acquiring the ratio of the vector product F to the length sum G, and taking the ratio as the similarity between the feature X and each sample.
Specifically, the similarity between the feature X and the feature vector corresponding to each sample can be represented by the formula S = (a · b) / (|a| · |b| + C), wherein S is the similarity, a is the vector of the feature X, b is the feature vector corresponding to each sample, |a| and |b| are their moduli, and C is the constant.
In the embodiment of the present invention, when the number of samples is greater than the third threshold, the processing module 703 is specifically configured to:
inputting samples in a sample library into an initial model for training to obtain a classification model, wherein a third threshold value is larger than a first threshold value;
and inputting the characteristic X into a classification model to obtain the intention A.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 8, the electronic device 800 includes a transceiver 801, a processor 802, and a memory 803, which are connected to each other by a bus 804. The memory 803 is used to store a computer program and data, and can transfer the stored data to the processor 802.
The processor 802 is configured to read the computer program in the memory 803 to perform the following operations:
acquiring voice information of a user at the current moment, and analyzing the voice information to obtain a text;
performing feature extraction on the text to obtain a feature X;
acquiring the number of samples in a feature library;
when the number of samples is less than or equal to a first threshold, sending the voice information to a human agent, and receiving the human agent's intention analysis result for the voice information to obtain an intention A;
performing secondary confirmation processing on the user according to the intention A;
when the secondary confirmation processing is passed, combining the intention A and the characteristics X, storing a combined result serving as a sample into a characteristic library, and generating a reply sentence according to the intention A to reply the user;
and when the secondary confirmation processing is failed, generating rejection information and sending the rejection information to the user.
In an embodiment of the present invention, in performing the secondary confirmation processing on the user according to the intention A, the processor 802 is specifically configured to perform the following operations:
generating a confirmation statement according to the intention A;
sending a confirmation statement to a user and receiving feedback information of the user;
when the feedback information is yes, judging that the secondary confirmation processing is passed;
and when the feedback information is negative, judging that the secondary confirmation processing is not passed.
In the embodiment of the present invention, in analyzing the voice information of the user at the current time to obtain the text, the processor 802 is specifically configured to perform the following operations:
performing audio extraction on the voice information to obtain a pinyin text;
dividing the pinyin text to obtain at least one sub-pinyin, wherein each sub-pinyin in the at least one sub-pinyin is used for identifying a syllable in pronunciation;
acquiring an application scene of voice information of a user at the current moment;
determining a preset word bank corresponding to an application scene according to the application scene of the voice information of the user at the current moment;
matching each sub-pinyin in a preset word bank to obtain at least one group of first words, wherein the at least one group of first words is in one-to-one correspondence with at least one sub-pinyin;
determining a target word in a first word group corresponding to each sub-pinyin according to the adjacent sub-pinyin of each sub-pinyin to obtain at least one target word, wherein the at least one target word is in one-to-one correspondence with the at least one sub-pinyin;
and arranging at least one target word according to the arrangement sequence of at least one sub-pinyin in the pinyin text to obtain a text.
In an embodiment of the present invention, in terms of extracting features of a text to obtain a feature X, the processor 802 is specifically configured to perform the following operations:
performing word splitting processing on the text to obtain at least one keyword;
calculating the association degree between any two different keywords in the at least one keyword to obtain at least one association degree;
constructing a keyword map according to at least one relevance and at least one keyword;
performing graph embedding processing on each keyword in at least one keyword according to the keyword graph to obtain at least one first graph vector, wherein the at least one first graph vector corresponds to the at least one keyword one to one;
performing word embedding processing on each keyword to obtain at least one first word vector, wherein the at least one first word vector corresponds to the at least one keyword one to one;
for each graph vector in the at least one first graph vector, calculating an average vector of the graph vector and the word vector corresponding to it to obtain at least one first vector, wherein the at least one first vector is in one-to-one correspondence with the at least one keyword;
and splicing the at least one first vector according to the sequence of the at least one keyword in the text to obtain the characteristic X.
In an embodiment of the present invention, when the number of samples is greater than the first threshold, the processor 802 is specifically configured to perform the following operations:
performing similarity calculation processing on the feature X and each sample in N samples in a feature library to obtain N similarities, wherein the N similarities are in one-to-one correspondence with the N samples, and N is an integer greater than or equal to 1;
determining target similarity in the N similarities, wherein the target similarity is the maximum similarity in the N similarities;
and when the target similarity is larger than a second threshold value, acquiring an intention B corresponding to the target similarity, and taking the intention B as an intention A.
In the embodiment of the present invention, in terms of calculating the similarity, the processor 802 is specifically configured to perform the following operations:
calculating the product of the feature X and the feature vector corresponding to each sample to obtain a vector product F;
calculating the product of the modulus of the feature X and the modulus of the feature vector corresponding to each sample to obtain a length product E;
calculating the sum of the length product E and a constant C to obtain a length sum G, wherein the constant C is an integer greater than or equal to 1;
and acquiring the ratio of the vector product F to the length sum G, and taking the ratio as the similarity between the feature X and each sample.
Specifically, the similarity between the feature X and the feature vector corresponding to each sample may be represented by the formula S = (a · b) / (|a| · |b| + C), wherein S is the similarity, a is the feature X, b is the feature vector corresponding to each sample, and C is the constant greater than or equal to 1.
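Putting the vector product F, length product E, and length sum G together, the similarity amounts to a cosine similarity damped by the constant C. A minimal sketch, assuming NumPy arrays for the feature X and the per-sample feature vectors:

```python
import numpy as np

C = 1  # the constant C, an integer >= 1 per the description

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Damped cosine similarity: S = (a . b) / (|a| * |b| + C)."""
    f = float(np.dot(a, b))                           # vector product F
    e = float(np.linalg.norm(a) * np.linalg.norm(b))  # length product E
    g = e + C                                         # length sum G
    return f / g

a = np.array([1.0, 0.0])
b = np.array([1.0, 0.0])
print(similarity(a, b))  # 0.5  (F = 1, E = 1, G = 2)
```

The constant C keeps the denominator nonzero and slightly dampens scores for short vectors, relative to plain cosine similarity.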
In an embodiment of the present invention, when the number of samples is greater than the third threshold, the processor 802 is specifically configured to perform the following operations:
inputting the samples in the feature library into an initial model for training to obtain a classification model, wherein the third threshold is greater than the first threshold;
and inputting the feature X into the classification model to obtain the intention A.
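The classification-model branch can be illustrated with a deliberately simple stand-in; a nearest-centroid classifier here takes the place of whatever initial model is actually trained, and the sample vectors and intent labels are invented:

```python
import numpy as np

# Toy feature library: feature vectors paired with their confirmed intents.
samples = [
    (np.array([1.0, 0.0]), "repair"),
    (np.array([0.9, 0.1]), "repair"),
    (np.array([0.0, 1.0]), "paint"),
    (np.array([0.1, 0.9]), "paint"),
]

# "Training": compute one centroid per intent label.
centroids = {}
for label in {lab for _, lab in samples}:
    vecs = [v for v, lab in samples if lab == label]
    centroids[label] = np.mean(vecs, axis=0)

def predict(feature_x: np.ndarray) -> str:
    """Return the intent whose centroid is closest to the feature X."""
    return min(centroids, key=lambda lab: np.linalg.norm(feature_x - centroids[lab]))

print(predict(np.array([0.95, 0.05])))  # repair
```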
It should be understood that the reply sentence determination apparatus in the present application may be a smart phone (e.g., an Android phone, an iOS phone, a Windows phone, etc.), a tablet computer, a palm computer, a notebook computer, a mobile Internet device (MID), a robot, a wearable device, or the like. The listed devices are merely examples and are not exhaustive; the reply sentence determination apparatus includes, but is not limited to, the devices above. In practical applications, the reply sentence determination apparatus may further include an intelligent vehicle-mounted terminal, computer equipment, and the like.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software in combination with a hardware platform. With this understanding, the part of the technical solution of the present invention that contributes to the prior art can be embodied, in whole or in part, in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which can be a personal computer, a server, a network device, etc.) to execute the methods of the embodiments or some parts of the embodiments of the present invention.
Therefore, the present application embodiment also provides a computer readable storage medium, which stores a computer program, and the computer program is executed by a processor to implement part or all of the steps of any one of the reply sentence determination methods described in the above method embodiments. For example, the storage medium may include a hard disk, a floppy disk, an optical disk, a magnetic tape, a magnetic disk, a flash memory, and the like.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the reply sentence determination methods as described in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all alternative embodiments and that the acts and modules referred to are not necessarily required by the application.
In the above embodiments, the description of each embodiment has its own emphasis, and for parts not described in detail in a certain embodiment, reference may be made to the description of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, indirect coupling or communication connection between devices or units, and may be in an electrical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer readable memory. Based on such understanding, the part of the technical solution of the present application that contributes to the prior art may be embodied, in whole or in part, in the form of a software product, which is stored in a memory and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present application. The aforementioned memory includes: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, and the memory may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the methods and their core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (10)
1. A method for reply sentence determination, the method comprising:
acquiring voice information of a user at the current moment, and analyzing the voice information to obtain a text;
performing feature extraction on the text to obtain a feature X;
acquiring the number of samples in a feature library;
when the number of samples is smaller than or equal to a first threshold value, sending the voice information to a human agent, and receiving an intention analysis result of the human agent on the voice information to obtain an intention A;
performing secondary confirmation processing on the user according to the intention A;
when the secondary confirmation processing is passed, combining the intention A and the feature X, storing a combined result serving as a sample into the feature library, and generating a reply statement according to the intention A to reply to the user;
and when the secondary confirmation processing is failed, generating rejection information and sending the rejection information to the user.
2. The method according to claim 1, wherein performing a secondary confirmation process to the user according to the intention a comprises:
generating a confirmation statement according to the intention A;
sending the confirmation statement to the user and receiving feedback information of the user;
when the feedback information indicates confirmation, judging that the secondary confirmation processing is passed;
and when the feedback information indicates denial, judging that the secondary confirmation processing is not passed.
3. The method of claim 1, wherein obtaining the voice information of the user at the current moment, and analyzing the voice information to obtain a text, comprises:
performing audio extraction on the voice information to obtain a pinyin text;
dividing the pinyin text to obtain at least one sub-pinyin, wherein each sub-pinyin in the at least one sub-pinyin is used for identifying a syllable in pronunciation;
acquiring an application scene of the voice information of the user at the current moment;
determining a preset word bank corresponding to the application scene according to the application scene of the voice information of the user at the current moment;
matching each sub-pinyin in the preset word bank to obtain at least one group of first words, wherein the at least one group of first words is in one-to-one correspondence with the at least one sub-pinyin;
determining a target word in a first word group corresponding to each sub-pinyin according to the sub-pinyin adjacent to each sub-pinyin to obtain at least one target word, wherein the at least one target word is in one-to-one correspondence with the at least one sub-pinyin;
and arranging the at least one target word according to the arrangement sequence of the at least one sub-pinyin in the pinyin text to obtain the text.
4. The method of claim 1, wherein the extracting the feature of the text to obtain the feature X comprises:
performing word splitting processing on the text to obtain at least one keyword;
calculating the association degree between any two different keywords in the at least one keyword to obtain at least one association degree;
constructing a keyword graph according to the at least one association degree and the at least one keyword;
performing graph embedding processing on each keyword in the at least one keyword according to the keyword graph to obtain at least one first graph vector, wherein the at least one first graph vector corresponds to the at least one keyword one to one;
performing word embedding processing on each keyword to obtain at least one first word vector, wherein the at least one first word vector corresponds to the at least one keyword one to one;
for each first graph vector in the at least one first graph vector, calculating an average vector of the first graph vector and the first word vector corresponding to the first graph vector to obtain at least one first vector, wherein the at least one first vector is in one-to-one correspondence with the at least one keyword;
and splicing the at least one first vector according to the sequence of the at least one keyword in the text to obtain the feature X.
5. The method of claim 1, wherein when the number of samples is greater than the first threshold, the method further comprises:
performing similarity calculation processing on the feature X and each sample in N samples in the feature library to obtain N similarities, wherein the N similarities are in one-to-one correspondence with the N samples, and N is an integer greater than or equal to 1;
determining a target similarity among the N similarities, wherein the target similarity is the largest similarity among the N similarities;
and when the target similarity is larger than a second threshold value, acquiring an intention B corresponding to the target similarity, and taking the intention B as the intention A.
6. The method according to claim 5, wherein the performing similarity calculation processing on the feature X and each of the N samples in the feature library comprises:
calculating the product of the feature X and the feature vector corresponding to each sample to obtain a vector product F;
calculating the product of the modulus of the feature X and the modulus of the feature vector corresponding to each sample to obtain a length product E;
calculating the sum of the length product E and a constant C to obtain a length sum G, wherein the constant C is an integer greater than or equal to 1;
and acquiring the ratio of the vector product F to the length sum G, and taking the ratio as the similarity between the feature X and each sample.
7. The method of claim 1, wherein when the number of samples is greater than a third threshold, the method further comprises:
inputting the samples in the feature library into an initial model for training to obtain a classification model, wherein the third threshold is greater than the first threshold;
and inputting the feature X into the classification model to obtain the intention A.
8. An apparatus for determining a reply sentence, the apparatus comprising:
the analysis module is used for acquiring the voice information of the user at the current moment and analyzing the voice information to obtain a text;
the extraction module is used for performing feature extraction on the text to obtain a feature X;
the processing module is used for acquiring the number of samples in the feature library, sending the voice information to a human agent when the number of samples is smaller than or equal to a first threshold value, receiving an intention analysis result of the human agent on the voice information to obtain an intention A, performing secondary confirmation processing on the user according to the intention A, and, when the secondary confirmation processing is passed, generating a reply sentence according to the intention A to reply to the user.
9. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the one or more programs including instructions for performing the steps in the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210148787.0A CN114528851B (en) | 2022-02-17 | 2022-02-17 | Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114528851A true CN114528851A (en) | 2022-05-24 |
CN114528851B CN114528851B (en) | 2023-07-25 |
Family
ID=81622667
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210148787.0A Active CN114528851B (en) | 2022-02-17 | 2022-02-17 | Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114528851B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115952271A (en) * | 2023-03-09 | 2023-04-11 | 杭州心识宇宙科技有限公司 | Method, device, storage medium and electronic equipment for generating dialogue information |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018032213A (en) * | 2016-08-24 | 2018-03-01 | シャープ株式会社 | Information processor, information processing system, information processing method and program |
CN108804612A (en) * | 2018-05-30 | 2018-11-13 | 武汉烽火普天信息技术有限公司 | A kind of text sentiment classification method based on counter propagation neural network model |
CN111563164A (en) * | 2020-05-07 | 2020-08-21 | 成都信息工程大学 | Specific target emotion classification method based on graph neural network |
CN112365894A (en) * | 2020-11-09 | 2021-02-12 | 平安普惠企业管理有限公司 | AI-based composite voice interaction method and device and computer equipment |
CN112417102A (en) * | 2020-11-26 | 2021-02-26 | 中国科学院自动化研究所 | Voice query method, device, server and readable storage medium |
CN112632244A (en) * | 2020-12-18 | 2021-04-09 | 平安普惠企业管理有限公司 | Man-machine conversation optimization method and device, computer equipment and storage medium |
CN113377928A (en) * | 2021-08-11 | 2021-09-10 | 明品云(北京)数据科技有限公司 | Text recommendation method, system, device and medium |
CN113378545A (en) * | 2021-06-08 | 2021-09-10 | 北京邮电大学 | Aspect level emotion analysis method and device, electronic equipment and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115952271A (en) * | 2023-03-09 | 2023-04-11 | 杭州心识宇宙科技有限公司 | Method, device, storage medium and electronic equipment for generating dialogue information |
CN115952271B (en) * | 2023-03-09 | 2023-06-27 | 杭州心识宇宙科技有限公司 | Method and device for generating dialogue information, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN114528851B (en) | 2023-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110096570B (en) | Intention identification method and device applied to intelligent customer service robot | |
CN110444198B (en) | Retrieval method, retrieval device, computer equipment and storage medium | |
US10176804B2 (en) | Analyzing textual data | |
CN108847241B (en) | Method for recognizing conference voice as text, electronic device and storage medium | |
CN105931644B (en) | A kind of audio recognition method and mobile terminal | |
CN111783471B (en) | Semantic recognition method, device, equipment and storage medium for natural language | |
US6763331B2 (en) | Sentence recognition apparatus, sentence recognition method, program, and medium | |
JPWO2008023470A1 (en) | SENTENCE UNIT SEARCH METHOD, SENTENCE UNIT SEARCH DEVICE, COMPUTER PROGRAM, RECORDING MEDIUM, AND DOCUMENT STORAGE DEVICE | |
CN112699645B (en) | Corpus labeling method, apparatus and device | |
CN110503956B (en) | Voice recognition method, device, medium and electronic equipment | |
CN111144102B (en) | Method and device for identifying entity in statement and electronic equipment | |
CN110717021B (en) | Input text acquisition and related device in artificial intelligence interview | |
CN115982376A (en) | Method and apparatus for training models based on text, multimodal data and knowledge | |
CN111368066B (en) | Method, apparatus and computer readable storage medium for obtaining dialogue abstract | |
WO2023045186A1 (en) | Intention recognition method and apparatus, and electronic device and storage medium | |
CN112307183B (en) | Search data identification method, apparatus, electronic device and computer storage medium | |
CN114528851B (en) | Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium | |
CN113946668A (en) | Semantic processing method, system and device based on edge node and storage medium | |
CN114020886A (en) | Speech intention recognition method, device, equipment and storage medium | |
CN116522905B (en) | Text error correction method, apparatus, device, readable storage medium, and program product | |
CN114218356B (en) | Semantic recognition method, device, equipment and storage medium based on artificial intelligence | |
CN114118049B (en) | Information acquisition method, device, electronic equipment and storage medium | |
CN109344388A (en) | Spam comment identification method and device and computer readable storage medium | |
CN114281969A (en) | Reply sentence recommendation method and device, electronic equipment and storage medium | |
CN116049370A (en) | Information query method and training method and device of information generation model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||