CN113962215A - Text error correction method, device and equipment based on artificial intelligence and storage medium - Google Patents

Text error correction method, device and equipment based on artificial intelligence and storage medium

Info

Publication number
CN113962215A
Authority
CN
China
Prior art keywords
text
result
word segmentation
error correction
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111215901.9A
Other languages
Chinese (zh)
Inventor
郭丹丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN202111215901.9A
Publication of CN113962215A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/232: Orthographic correction, e.g. spell checking or vowelisation
    • G06F 40/237: Lexical tools
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and discloses a text error correction method based on artificial intelligence, which comprises the following steps: performing word segmentation processing on an acquired text to be detected based on a pre-trained word segmentation model to acquire a corresponding word segmentation result; acquiring the words in the word segmentation result and the probability value of each word in the text to be detected in the corresponding sentence of the text to be detected; determining a suspected error position candidate set of the text to be detected based on the probability values; acquiring, based on a pre-constructed dictionary, a candidate result corresponding to each error in the suspected error position candidate set, and determining an error correction candidate set corresponding to the candidate results; acquiring the perplexity of the candidate results in the error correction candidate set in the corresponding sentences, and determining the error correction result corresponding to each error based on the perplexity; and correcting the text to be detected based on the error correction results. The invention can improve the accuracy of text error correction.

Description

Text error correction method, device and equipment based on artificial intelligence and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a text error correction method and apparatus based on artificial intelligence, an electronic device, and a computer-readable storage medium.
Background
At present, Chinese texts inevitably contain various errors, for example errors caused by visually similar characters, homophones, dialects and the like. In a conventional speech recognition scheme, for instance, the recognition result often fails to express the real intention of the client because of objective factors such as a dialect accent in the client's speech or interference from external environmental noise, which may introduce errors into the recognized text.
For the above problems, it is necessary to perform error checking and correction on the corresponding text to improve the accuracy of intention understanding, thereby improving the user experience.
Existing text error correction schemes mainly generate a plurality of candidate texts for the text to be corrected through a rule-based model or a statistics-based model, and then screen out the most reasonable text from the candidates. However, text error correction based on such rule-based or statistics-based models has low accuracy and cannot handle the various error forms, such as fuzzy sounds, spoken language and dialects, at the same time, and therefore cannot meet users' current requirements on the text error correction function.
Therefore, a text error correction method is needed to achieve efficient, comprehensive and accurate error finding and correction effects by considering various types of text errors.
Disclosure of Invention
The invention provides a text error correction method and device based on artificial intelligence, electronic equipment and a computer readable storage medium, and mainly aims to improve the efficiency and accuracy of text error correction.
In order to achieve the above object, the present invention provides a text error correction method based on artificial intelligence, which comprises: performing word segmentation processing on the acquired text to be detected based on a pre-trained word segmentation model to acquire a corresponding word segmentation result;
acquiring words in the word segmentation result and the probability value of each word in the text to be detected in a corresponding sentence of the text to be detected;
determining a suspected error position candidate set of the text to be detected based on the probability value;
based on a pre-constructed dictionary, acquiring a candidate result corresponding to each error in the suspected error position candidate set, and determining an error correction candidate set corresponding to the candidate result;
acquiring the perplexity of the candidate results in the error correction candidate set in the corresponding sentences, and determining the error correction result corresponding to each error based on the perplexity;
and correcting the error of the text to be detected based on the error correction result.
In addition, an optional technical solution is that the step of performing word segmentation processing on the acquired text to be detected based on the pre-trained word segmentation model to acquire a corresponding word segmentation result includes:
acquiring a training set corpus, and training the initialized N-gram model based on the training set corpus to acquire a trained word segmentation model;
performing word segmentation processing on the text to be detected once based on the word segmentation model, and acquiring a corresponding first word segmentation result;
performing secondary word segmentation processing on the first word segmentation result based on a forward maximum matching word segmentation method to obtain a corresponding second word segmentation result; performing secondary word segmentation processing on the first word segmentation result based on a backward maximum matching word segmentation method to obtain a corresponding third word segmentation result;
and selecting a target text from the second word segmentation result and the third word segmentation result as the word segmentation result based on a preset rule.
In addition, an optional technical solution is that the step of obtaining the words in the word segmentation result and the probability value of each word in the text to be detected in the corresponding sentence of the text to be detected includes:
acquiring each character in a text to be detected, and determining a corresponding character set;
merging the word segmentation result and the character set to determine a target set;
and acquiring the probability values of all elements in the target set in the corresponding sentences.
In addition, an optional technical solution is that the pre-constructed dictionary includes a fuzzy sound dictionary and a shape-like word dictionary, and the step of determining the error correction candidate set corresponding to the candidate result includes:
converting the characters and/or words at the error positions into target pinyin;
searching fuzzy sounds or similar sounds corresponding to the target pinyin in the fuzzy sound dictionary to form a first candidate result; and, at the same time,
splitting the initial consonant and the final of the target pinyin to obtain the split target initial consonant and target final;
searching fuzzy sounds or similar sounds corresponding to the target initial consonant and the target final in the fuzzy sound dictionary to form a second candidate result;
searching all the similar words corresponding to the errors in the similar word dictionary to form a third candidate result;
forming the error correction candidate set based on the first candidate result, the second candidate result, and the third candidate result.
In addition, an optional technical solution is that, before obtaining the perplexity of the candidate results in the error correction candidate set in the corresponding sentence, the method further includes:
and preliminarily screening the candidate results in the error correction candidate set based on a pre-trained screening model to determine a target candidate set.
In addition, an optional technical solution is that the preliminary screening of the candidate results in the candidate set of correction candidates based on the pre-trained screening model to determine the target candidate set includes:
training a logistic regression model based on the obtained training data;
predicting the result in the error correction candidate set based on the logistic regression model, and acquiring a corresponding prediction score;
and filtering out, based on a preset range, the candidate results whose prediction scores fall below that range, so as to determine the target candidate set.
In addition, an optional technical solution is that the perplexity is obtained by the formula:
PP(s) = \left( \prod_{i=1}^{N} p(w_i) \right)^{-\frac{1}{N}}
wherein w represents a character or word in the candidate result, i represents the position index of w in the corresponding sentence, s represents the sentence after the candidate result has been substituted in, N represents the total number of characters and words in the sentence, and p represents the probability value of w in the corresponding sentence.
In order to solve the above problems, the present invention further provides an artificial intelligence-based text error correction apparatus, comprising:
the word segmentation result acquisition unit is used for carrying out word segmentation processing on the acquired text to be detected based on the pre-trained word segmentation model so as to acquire a corresponding word segmentation result;
a probability value obtaining unit, configured to obtain words in the word segmentation result and a probability value of each word in the text to be detected in a corresponding sentence of the text to be detected;
a suspected error position candidate set determining unit, configured to determine a suspected error position candidate set of the text to be detected based on the probability value;
the error correction candidate set determining unit is used for acquiring a candidate result corresponding to each error in the suspected error position candidate set based on a pre-constructed dictionary and determining an error correction candidate set corresponding to the candidate result;
an error correction result determination unit configured to acquire a perplexity of the candidate results in the error correction candidate set in the corresponding sentence, and determine an error correction result corresponding to each error based on the perplexity;
and the text error correction unit is used for correcting the error of the text to be detected based on the error correction result.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
and the processor executes the instructions stored in the memory to realize the artificial intelligence based text error correction method.
In order to solve the above problem, the present invention further provides a computer-readable storage medium having at least one instruction stored therein, where the at least one instruction is executed by a processor in an electronic device to implement the artificial intelligence based text error correction method described above.
The method determines a suspected error position candidate set of the text to be detected; obtains, based on a pre-constructed dictionary, the candidate result corresponding to each error in the suspected error position candidate set and determines the corresponding error correction candidate set; and then obtains the perplexity of the candidate results in the error correction candidate set in the corresponding sentences and determines the error correction result for each error based on the perplexity. Text errors are thereby detected and located with both character granularity and word granularity comprehensively considered, and the influence of factors such as dialects, accents and shape-similar characters is taken into account during error correction, which improves the accuracy of error correction, further improves the accuracy of later intention recognition, and makes the scheme applicable to various types of text error correction processes.
Drawings
FIG. 1 is a schematic flow chart of a text error correction method based on artificial intelligence according to an embodiment of the present invention;
FIG. 2 is a block diagram of an artificial intelligence-based text correction apparatus according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an internal structure of an electronic device implementing an artificial intelligence-based text error correction method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to solve various problems existing in the existing text error correction, the invention provides a text error correction method based on artificial intelligence, which can perform error detection from two dimensions of word granularity and word granularity, can give consideration to various error reasons such as dialect, accent, fuzzy sound, similar words and the like, improves the accuracy of error detection on a text, further improves the accuracy of intention identification and the experience effect of a user, and is suitable for various text error correction processes.
The embodiment of the invention can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The invention provides a text error correction method based on artificial intelligence. Referring to fig. 1, a schematic flow chart of a text error correction method based on artificial intelligence according to an embodiment of the present invention is shown. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the text error correction method based on artificial intelligence mainly includes the following steps:
s100: and performing word segmentation processing on the acquired text to be detected based on the pre-trained word segmentation model to acquire a corresponding word segmentation result.
Wherein, the step S100 may further include:
s110: acquiring a training set corpus, and training the initialized N-gram model based on the training set corpus to acquire a trained word segmentation model;
s120: performing word segmentation processing on the text to be detected once based on the word segmentation model, and acquiring a corresponding first word segmentation result;
s130: performing secondary word segmentation processing on the first word segmentation result based on a forward maximum matching word segmentation method to obtain a corresponding second word segmentation result; and performing secondary word segmentation processing on the first word segmentation result based on a backward maximum matching word segmentation method to obtain a corresponding third word segmentation result.
Specifically, the first segmentation result may be understood as a first segmentation text obtained after performing first segmentation processing on a text to be detected, and then performing second segmentation processing on the first segmentation text to obtain a corresponding second segmentation result and a corresponding third segmentation result, that is, the second segmentation text and the third segmentation text.
The forward maximum matching word segmentation method mainly comprises the following steps: 1. if the length of the sentence to be segmented is greater than the maximum word length in the word list, intercept n segmented words from the beginning of the current sentence until the total length of these n words is greater than or equal to the maximum word length in the word list; 2. if the word formed by merging the n words is in the word list, output the merged word as a segmentation result; otherwise, check whether the word formed by the first n-1 words, or by the first n-1 words together with the first k characters of the nth word, is in the word list, while also checking that the remaining part of the nth word is in the word list and is not a single character, until a merged word satisfying these conditions is found; 3. output the merged word, and repeat the process with the remaining part of the sentence as the new sentence to be segmented.
Similarly, the backward maximum matching method works in the same way as the forward maximum matching method except in direction: it mainly intercepts n segmented words from the end of the sentence and then judges them step by step.
S140: and selecting a target text from the second word segmentation result and the third word segmentation result as the word segmentation result based on a preset rule.
In this step, the preset rule may include: when the expected word length of the second word segmentation result differs from that of the third word segmentation result, selecting the text with the larger expected word length as the word segmentation result; or, when the expected word lengths are the same, selecting the text with the smaller word-length difference from the second and third word segmentation results as the word segmentation result.
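As an illustration of steps S130-S140, the following Python sketch shows character-level forward and backward maximum matching together with one reading of the selection rule (prefer the larger expected word length, break ties by the smaller word count). It is only a sketch: the patent's own secondary segmentation merges words of the first segmentation result rather than raw characters, and the vocab and max_word_len inputs are assumptions.

def forward_max_match(sentence, vocab, max_word_len=4):
    # Greedily cut the longest vocabulary word from the left end.
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_word_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocab:  # single characters always match
                words.append(piece)
                i += size
                break
    return words

def backward_max_match(sentence, vocab, max_word_len=4):
    # Same idea, but cut the longest vocabulary word from the right end.
    words, j = [], len(sentence)
    while j > 0:
        for size in range(min(max_word_len, j), 0, -1):
            piece = sentence[j - size:j]
            if size == 1 or piece in vocab:
                words.insert(0, piece)
                j -= size
                break
    return words

def bidirectional_segment(sentence, vocab, max_word_len=4):
    if not sentence:
        return []
    fwd = forward_max_match(sentence, vocab, max_word_len)
    bwd = backward_max_match(sentence, vocab, max_word_len)
    avg = lambda ws: sum(len(w) for w in ws) / len(ws)
    if avg(fwd) != avg(bwd):                          # different expected word length:
        return fwd if avg(fwd) > avg(bwd) else bwd    # keep the larger one
    return fwd if len(fwd) <= len(bwd) else bwd       # tie-break: fewer words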
After the word segmentation processing in step S100, a word segmentation result corresponding to the text to be detected is obtained. This result can then be combined with the individual characters of the text to be detected to obtain the probability, in the corresponding sentence, of each segmented word and of each character, and the suspected error positions can be screened out accordingly. Error checking is thus performed from the two angles of character granularity and word granularity, which prevents errors from being missed and improves the accuracy of later error correction.
It should be noted that, in order to improve the accuracy of word segmentation, a preprocessing step for the text to be detected may be performed before it is processed. For example, the preprocessing may include: filtering special characters and emoticons out of the text to be detected, and then adding identifiers such as CLS and SEP at the beginning and end of each sentence, so as to mark and distinguish sentences and facilitate the later perplexity calculation.
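A minimal preprocessing sketch along the lines just described; the character filter and the [CLS]/[SEP] marker strings are illustrative assumptions rather than values fixed by the patent.

import re

def preprocess(text, start_tag="[CLS]", end_tag="[SEP]"):
    # Drop emoticons and special symbols (here: anything outside CJK characters,
    # Latin letters, digits and common punctuation), then mark each sentence so
    # that perplexity can later be computed per sentence.
    cleaned = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9，。！？、,.!?]", "", text)
    sentences = [s for s in re.split(r"[。！？!?]", cleaned) if s]
    return [start_tag + s + end_tag for s in sentences]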
S200: and acquiring words in the word segmentation result and the probability value of each word in the text to be detected in the corresponding sentence of the text to be detected.
The step of obtaining the words in the word segmentation result and the probability value of each word in the text to be detected in the corresponding sentence of the text to be detected may further include:
s210: acquiring each character in a text to be detected, and determining a corresponding character set;
s220: merging the word segmentation result and the character set to determine a target set;
s230: and acquiring the probability values of all elements in the target set in the corresponding sentences.
When the probability values of all the elements in the corresponding sentences are determined, whether errors exist in the corresponding positions of the elements can be preliminarily judged according to the probability values of the elements.
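Steps S210-S230 amount to taking the union of word-granularity and character-granularity elements and then looking up their probabilities; a small sketch, where the prob_in_sentence helper (for example backed by the trained N-gram model) is assumed:

def build_target_set(text, segmented_words):
    # Union of the segmentation result (word granularity) and the
    # individual characters of the text (character granularity).
    return set(segmented_words) | set(text)

def element_probabilities(target_set, sentence, prob_in_sentence):
    # prob_in_sentence(element, sentence) -> probability of the element
    # in its sentence, e.g. obtained from the trained N-gram model.
    return {e: prob_in_sentence(e, sentence) for e in target_set}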
S300: and determining a suspected error position candidate set of the text to be detected based on the probability value.
In the two steps, words in the word segmentation result and the probability value of each word of the text to be detected in the corresponding sentence can be obtained based on the trained N-gram language model, when the corresponding probability value is smaller than a preset threshold value, the word or the word at the current position can be judged to belong to the suspected error position, and the final suspected error position candidate set can be obtained according to all the suspected error positions in the text to be detected.
Specifically, the N-gram language model is a probability-based judgment model; its input may be the words of the word segmentation result and the ordered sequence of characters, and its output is the probability of each corresponding character and word. Suppose the sentence T consists of the character or word sequence w_1, w_2, w_3, \dots, w_n; the joint probability output by the N-gram language model can then be expressed as: P(T) = p(w_1 w_2 \cdots w_n) = p(w_1)\, p(w_2 \mid w_1)\, p(w_3 \mid w_1 w_2) \cdots p(w_n \mid w_1 w_2 \cdots w_{n-1}). It can be seen that the conditional probability of occurrence of each character and word in the sentence T can be obtained by counting over a preset corpus.
As a specific example, for an n-gram language model, the probability of the character or word w_i can be expressed as: p(w_i \mid w_{i-n+1}, \dots, w_{i-1}) = C(w_{i-n+1}, \dots, w_i) / C(w_{i-n+1}, \dots, w_{i-1}), where C(w_{i-n+1}, \dots, w_i) denotes the number of times (or the frequency with which) the string w_{i-n+1}, \dots, w_i occurs in the preset corpus. In addition, the preset threshold can be set according to the application scenario or requirements, or an existing empirical value can be used; during application, a sample text and a preset threshold can be set, error checking can be performed on the sample text, and the preset threshold can then be adjusted according to the obtained error-check results and the actual error positions in the sample text, so as to ensure the accuracy of the error checking.
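A sketch of how such count-based probabilities can be used to flag suspected error positions (step S300); the counts mapping is assumed to hold the corpus frequency of every n-gram and (n-1)-gram, and the threshold value is illustrative only.

def ngram_prob(tokens, i, counts, n=2):
    # p(w_i | w_{i-n+1} ... w_{i-1}) estimated from corpus counts, with a small
    # floor so unseen n-grams (or an empty history) are near zero, not zero.
    history = tuple(tokens[max(0, i - n + 1):i])
    numer = counts.get(history + (tokens[i],), 0)
    denom = counts.get(history, 0)
    return numer / denom if denom else 1e-8

def suspected_positions(tokens, counts, threshold=1e-4, n=2):
    # Positions whose probability falls below the preset threshold form
    # the suspected error position candidate set.
    return [i for i in range(len(tokens))
            if ngram_prob(tokens, i, counts, n) < threshold]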
S400: and acquiring a candidate result corresponding to each error in the suspected error position candidate set based on a pre-constructed dictionary, and determining an error correction candidate set corresponding to the candidate result.
Wherein, the dictionary can comprise a fuzzy sound dictionary and a shape-similar character dictionary. The fuzzy sound dictionary can be constructed according to pinyin and fuzzy sound rules; for example, corresponding fuzzy sounds or similar sounds can be collected according to the habits of dialect accents in different regions, so that, for instance, n and l, b and f, an and ang, and en and eng can be regarded as mutually confusable fuzzy sounds. In addition, the constructed shape-similar character dictionary mainly contains statistically collected characters whose written forms are easily confused with one another, so that errors arising from multiple sources can be taken into account and the error correction accuracy improved.
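The two dictionaries can be represented as simple lookup tables; the fuzzy-sound entries below follow the examples given (n/l, b/f, an/ang, en/eng), while the shape-similar entries are placeholders to be filled from statistics.

# Fuzzy-sound dictionary: maps an initial or final (or a whole syllable)
# to the sounds it is commonly confused with in dialect accents.
FUZZY_SOUND = {
    "n": ["l"], "l": ["n"],
    "b": ["f"], "f": ["b"],
    "an": ["ang"], "ang": ["an"],
    "en": ["eng"], "eng": ["en"],
}

# Shape-similar character dictionary: maps a character to statistically
# collected characters whose written form is easily confused with it.
SHAPE_SIMILAR = {
    # "character": ["similar character 1", "similar character 2"], ...
}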
As a specific example, the step S400 may further include the following:
s410: converting the characters and/or words at the error positions into target pinyin;
wherein, because the elements in the target set include words and words, the error positions in the corresponding suspected error position candidate set may be words or words; after the error position is determined, the characters and/or words at the position can be converted into a pinyin format to determine the corresponding target pinyin.
In addition, the whole sentence with the error position can be converted into a pinyin form, fuzzy sound or similar characters are searched for a target pinyin (error position) in the pinyin form sentence based on a pre-constructed dictionary, and then replacement processing can be carried out according to the query result to determine a final candidate set.
S420: searching fuzzy sound or similar sound corresponding to the target pinyin in the fuzzy sound dictionary to form a first candidate result; at the same time, the user can select the desired position,
s430: splitting the initial consonant and the final of the target pinyin to obtain the split target initial consonant and target final;
s440: searching fuzzy sounds or similar sounds corresponding to the target initial consonant and the target final in the fuzzy sound dictionary to form a second candidate result;
further, the method includes step S450, which may be executed simultaneously with the above steps: searching all the similar words corresponding to the errors in the similar word dictionary to form a third candidate result;
s460: forming the error correction candidate set based on the first candidate result, the second candidate result, and the third candidate result.
The first candidate result, the second candidate result and the third candidate result represent multiple possible corrections for each error, obtained from different angles and different starting points, and the correct correction for the error will be among these results; it is therefore necessary to judge and screen the possibilities in the error correction candidate set one by one until the final error correction result is determined.
In other words, the final error correction candidate set can be formed by taking the union set of the three candidate results, and then each result in the error correction candidate set is verified, so that the optimal error correction result can be screened out, and the error correction processing is performed on the corresponding error position according to the optimal error correction result.
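Pulling steps S410-S460 together, a sketch of candidate generation for a single suspected error character; the helpers to_pinyin (returning the initial and final of a character) and pinyin_to_chars (a table from a pinyin syllable back to characters) are assumptions, since the patent does not fix how the pinyin conversion is implemented.

def generate_candidates(error_char, to_pinyin, pinyin_to_chars,
                        fuzzy_dict, shape_dict):
    initial, final = to_pinyin(error_char)          # S410: convert to pinyin
    candidates = set()

    # S420: fuzzy or similar sounds for the whole target pinyin.
    for alt in fuzzy_dict.get(initial + final, []):
        candidates.update(pinyin_to_chars.get(alt, []))

    # S430-S440: split into initial and final, then swap each independently.
    for alt_initial in [initial] + fuzzy_dict.get(initial, []):
        for alt_final in [final] + fuzzy_dict.get(final, []):
            candidates.update(pinyin_to_chars.get(alt_initial + alt_final, []))

    # S450: shape-similar characters for the erroneous character.
    candidates.update(shape_dict.get(error_char, []))

    # S460: the union of the three groups forms the error-correction candidate
    # set; the original character itself is not kept as a candidate.
    candidates.discard(error_char)
    return candidates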
S500: acquiring the perplexity of the candidate results in the error correction candidate set in the corresponding sentences, and determining the error correction result corresponding to each error based on the perplexity.
Further, before performing step S500, the method further includes: and preliminarily screening the candidate results in the error correction candidate set based on a pre-trained screening model, determining a target candidate set, and then acquiring the perplexity of the candidate results in the target candidate set in the corresponding sentences.
Specifically, the preliminary screening of the candidate results in the error correction candidate set based on the pre-trained screening model, and the process of determining the target candidate set may further include:
s510: training a logistic regression model based on the obtained training data;
s520: predicting the result in the error correction candidate set based on the logistic regression model, and acquiring a corresponding prediction score;
s530: filtering out, based on a preset range, the candidate results whose prediction scores fall below that range, so as to determine the target candidate set.
In this process, clearly unsuitable candidates in the error correction candidate set are removed mainly by the logistic regression model, and the candidate results with high prediction scores are retained, which reduces the subsequent computational load and completes the preliminary screening of the error correction candidate set.
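A sketch of this preliminary screening (steps S510-S530) using scikit-learn's LogisticRegression as one possible implementation; the patent only specifies a logistic regression model, and the featurize function and score threshold are assumptions.

from sklearn.linear_model import LogisticRegression

def train_filter(X_train, y_train):
    # S510: train the screening model on labelled candidate features.
    return LogisticRegression(max_iter=1000).fit(X_train, y_train)

def prefilter(candidates, model, featurize, min_score=0.5):
    # S520-S530: score each candidate and keep only those whose predicted
    # score reaches the preset range, forming the target candidate set.
    X = [featurize(c) for c in candidates]
    scores = model.predict_proba(X)[:, 1]   # probability of the positive class
    return [c for c, s in zip(candidates, scores) if s >= min_score]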
Further, each candidate result in the target candidate set is substituted into the corresponding sentence, the perplexity of the sentence after each substitution is obtained, and the candidate result with the minimum perplexity is then selected as the final error correction result; the corresponding error position is replaced according to this error correction result, and the correct, error-corrected text is obtained.
Specifically, the formula for computing the perplexity can be expressed as:
PP(s) = \left( \prod_{i=1}^{N} p(w_i) \right)^{-\frac{1}{N}}
wherein w represents a character or word in the candidate result, i represents the position index of w in the corresponding sentence, s represents the sentence after the candidate result has been substituted in, N represents the total number of characters and words in the sentence, and p represents the probability value of w in the corresponding sentence.
Through this formula, the perplexity of the erroneous sentence with each candidate result substituted in can be obtained, and the candidate result with the minimum perplexity value is then selected as the final error correction result.
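A sketch of the perplexity computation and the final selection (steps S500-S600), evaluated in log space for numerical stability; the prob helper returning the probability of a token in its sentence context is assumed (for example, the N-gram probability above).

import math

def perplexity(tokens, prob):
    # PP(s) = (prod_i p(w_i))^(-1/N), computed in log space.
    n = len(tokens)
    log_p = sum(math.log(max(prob(tokens, i), 1e-12)) for i in range(n))
    return math.exp(-log_p / n)

def best_correction(tokens, position, candidates, prob):
    # Substitute each candidate at the suspected error position and keep
    # the one whose sentence has the lowest perplexity.
    def score(candidate):
        replaced = tokens[:position] + [candidate] + tokens[position + 1:]
        return perplexity(replaced, prob)
    return min(candidates, key=score)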
S600: and correcting the error of the text to be detected based on the error correction result.
Through the steps, a corresponding error correction result can be obtained for each error position, the errors of the corresponding position are replaced according to all the error correction results, the error detection and correction process of the text to be detected can be completed, and the corrected text is formed so as to facilitate subsequent operations such as intention identification.
According to the text error correction method based on artificial intelligence, text errors can be comprehensively considered from the two aspects of character granularity and word granularity, the text errors are detected and positioned, meanwhile, the influences of various factors such as dialects, accents, similar characters and the like can be considered in the error correction process, the error correction accuracy is improved, the later intention recognition accuracy is further improved, and the method is applicable to various text error correction processes.
Fig. 2 is a functional block diagram of the text error correction device based on artificial intelligence according to the present invention.
The artificial intelligence based text error correction apparatus 200 according to the present invention can be installed in an electronic device. According to the implemented functions, the artificial intelligence based text correction apparatus may include: a word segmentation result acquisition unit 210, a probability value acquisition unit 220, a suspected error position candidate set determination unit 230, an error correction candidate set determination unit 240, an error correction result determination unit 250, and a text error correction unit 260. The unit of the present invention, which may also be referred to as a module, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the word segmentation result obtaining unit 210 is configured to perform word segmentation processing on the obtained text to be detected based on the pre-trained word segmentation model to obtain a corresponding word segmentation result.
Wherein, the unit 210 may further include:
the word segmentation model training module is used for acquiring a training set corpus and training the initialized N-gram model based on the training set corpus to acquire a trained word segmentation model;
the first word segmentation result acquisition module is used for carrying out word segmentation processing on the text to be detected once based on the word segmentation model and acquiring a corresponding first word segmentation result;
the second and third word segmentation result acquisition modules are used for carrying out secondary word segmentation processing on the first word segmentation result based on a forward maximum matching word segmentation method to acquire a corresponding second word segmentation result; and performing secondary word segmentation processing on the first word segmentation result based on a backward maximum matching word segmentation method to obtain a corresponding third word segmentation result.
Specifically, the first segmentation result may be understood as a first segmentation text obtained after performing first segmentation processing on a text to be detected, and then performing second segmentation processing on the first segmentation text to obtain a corresponding second segmentation result and a corresponding third segmentation result, that is, the second segmentation text and the third segmentation text.
The forward maximum matching word segmentation method mainly comprises the following steps: 1. if the length of the sentence to be segmented is greater than the maximum word length in the word list, intercept n segmented words from the beginning of the current sentence until the total length of these n words is greater than or equal to the maximum word length in the word list; 2. if the word formed by merging the n words is in the word list, output the merged word as a segmentation result; otherwise, check whether the word formed by the first n-1 words, or by the first n-1 words together with the first k characters of the nth word, is in the word list, while also checking that the remaining part of the nth word is in the word list and is not a single character, until a merged word satisfying these conditions is found; 3. output the merged word, and repeat the process with the remaining part of the sentence as the new sentence to be segmented.
Similarly, the backward maximum matching method works in the same way as the forward maximum matching method except in direction: it mainly intercepts n segmented words from the end of the sentence and then judges them step by step.
And the word segmentation result acquisition module is used for selecting a target text from the second word segmentation result and the third word segmentation result as the word segmentation result based on a preset rule.
In this module, the preset rule may include: when the expected word length of the second word segmentation result differs from that of the third word segmentation result, selecting the text with the larger expected word length as the word segmentation result; or, when the expected word lengths are the same, selecting the text with the smaller word-length difference from the second and third word segmentation results as the word segmentation result.
After the word segmentation processing of the word segmentation result obtaining unit 210, a word segmentation result corresponding to the text to be detected is obtained. This result can then be combined with the individual characters of the text to be detected to obtain the probability, in the corresponding sentence, of each segmented word and of each character, and the suspected error positions can be screened out accordingly. Error checking is thus performed from the two angles of character granularity and word granularity, which prevents errors from being missed and improves the accuracy of later error correction.
It should be noted that, in order to improve the accuracy of word segmentation, a preprocessing step for the text to be detected may be performed before it is processed. For example, the preprocessing may include: filtering special characters and emoticons out of the text to be detected, and then adding identifiers such as CLS and SEP at the beginning and end of each sentence, so as to mark and distinguish sentences and facilitate the later perplexity calculation.
A probability value obtaining unit 220, configured to obtain words in the word segmentation result and a probability value of each word in the text to be detected in a corresponding sentence of the text to be detected.
The unit for obtaining the word in the word segmentation result and the probability value of each word in the text to be detected in the corresponding sentence of the text to be detected may further include:
the character set determining module is used for acquiring each character in the text to be detected and determining a corresponding character set;
the target set determining module is used for performing union processing on the word segmentation result and the word set so as to determine a target set;
and the probability value acquisition module is used for acquiring the probability values of all elements in the target set in the corresponding sentences.
When the probability values of all the elements in the corresponding sentences are determined, whether errors exist in the corresponding positions of the elements can be preliminarily judged according to the probability values of the elements.
A suspected error position candidate set determining unit 230, configured to determine a suspected error position candidate set of the text to be detected based on the probability value.
In the two units, words in the word segmentation result and the probability value of each word of the text to be detected in the corresponding sentence can be obtained based on the trained N-gram language model, when the corresponding probability value is smaller than a preset threshold value, the word or the word at the current position can be judged to belong to the suspected error position, and the final suspected error position candidate set can be obtained according to all the suspected error positions in the text to be detected.
Specifically, the N-gram language model is a probability-based judgment model; its input may be the words of the word segmentation result and the ordered sequence of characters, and its output is the probability of each corresponding character and word. Suppose the sentence T consists of the character or word sequence w_1, w_2, w_3, \dots, w_n; the joint probability output by the N-gram language model can then be expressed as: P(T) = p(w_1 w_2 \cdots w_n) = p(w_1)\, p(w_2 \mid w_1)\, p(w_3 \mid w_1 w_2) \cdots p(w_n \mid w_1 w_2 \cdots w_{n-1}). It can be seen that the conditional probability of occurrence of each character and word in the sentence T can be obtained by counting over a preset corpus.
As a specific example, for an n-gram language model, the probability of the character or word w_i can be expressed as: p(w_i \mid w_{i-n+1}, \dots, w_{i-1}) = C(w_{i-n+1}, \dots, w_i) / C(w_{i-n+1}, \dots, w_{i-1}), where C(w_{i-n+1}, \dots, w_i) denotes the number of times (or the frequency with which) the string w_{i-n+1}, \dots, w_i occurs in the preset corpus. In addition, the preset threshold can be set according to the application scenario or requirements, or an existing empirical value can be used; during application, a sample text and a preset threshold can be set, error checking can be performed on the sample text, and the preset threshold can then be adjusted according to the obtained error-check results and the actual error positions in the sample text, so as to ensure the accuracy of the error checking.
An error correction candidate set determining unit 240, configured to obtain a candidate result corresponding to each error in the suspected error location candidate set based on a pre-constructed dictionary, and determine an error correction candidate set corresponding to the candidate result.
Wherein, the dictionary can comprise a fuzzy sound dictionary and a shape-similar character dictionary. The fuzzy sound dictionary can be constructed according to pinyin and fuzzy sound rules; for example, corresponding fuzzy sounds or similar sounds can be collected according to the habits of dialect accents in different regions, so that, for instance, n and l, b and f, an and ang, and en and eng can be regarded as mutually confusable fuzzy sounds. In addition, the constructed shape-similar character dictionary mainly contains statistically collected characters whose written forms are easily confused with one another, so that errors arising from multiple sources can be taken into account and the error correction accuracy improved.
As a specific example, the error correction candidate set determining unit 240 may further include the following:
the target pinyin conversion module is used for converting the characters and/or words at each error position into target pinyin;
wherein, because the elements in the target set include words and words, the error positions in the corresponding suspected error position candidate set may be words or words; after the error position is determined, the characters and/or words at the position can be converted into a pinyin format to determine the corresponding target pinyin.
In addition, the whole sentence with the error position can be converted into a pinyin form, fuzzy sound or similar characters are searched for a target pinyin (error position) in the pinyin form sentence based on a pre-constructed dictionary, and then replacement processing can be carried out according to the query result to determine a final candidate set.
A first candidate result forming module, configured to search the fuzzy sound dictionary for fuzzy sounds or similar sounds corresponding to the target pinyin to form a first candidate result; and, at the same time,
the pinyin splitting module is used for splitting the initial consonants and the final consonants of the target pinyin to obtain the split target initial consonants and target final consonants;
a second candidate result forming module, configured to search, in the fuzzy sound dictionary, for a fuzzy sound or a similar sound corresponding to the target initial and the target final to form a second candidate result;
further, the method includes step S450, which may be executed simultaneously with the above steps: searching all the similar words corresponding to the errors in the similar word dictionary to form a third candidate result;
an error correction candidate set forming module, configured to form the error correction candidate set based on the first candidate result, the second candidate result, and the third candidate result.
The first candidate result, the second candidate result and the third candidate result represent multiple possible corrections for each error, obtained from different angles and different starting points, and the correct correction for the error will be among these results; it is therefore necessary to judge and screen the possibilities in the error correction candidate set one by one until the final error correction result is determined.
In other words, the final error correction candidate set can be formed by taking the union set of the three candidate results, and then each result in the error correction candidate set is verified, so that the optimal error correction result can be screened out, and the error correction processing is performed on the corresponding error position according to the optimal error correction result.
An error correction result determination unit 250 configured to acquire a perplexity of the candidate results in the error correction candidate set in the corresponding sentence, and determine an error correction result corresponding to each error based on the perplexity.
Wherein before executing the unit, the method further comprises: and preliminarily screening the candidate results in the error correction candidate set based on a pre-trained screening model, determining a target candidate set, and then acquiring the perplexity of the candidate results in the target candidate set in the corresponding sentences.
Specifically, the preliminary screening of the candidate results in the error correction candidate set based on the pre-trained screening model, and the process of determining the target candidate set may further include:
1. training a logistic regression model based on the obtained training data;
2. predicting the result in the error correction candidate set based on the logistic regression model, and acquiring a corresponding prediction score;
3. and filtering out, based on a preset range, the candidate results whose prediction scores fall below that range, so as to determine the target candidate set.
In this process, clearly unsuitable candidates in the error correction candidate set are removed mainly by the logistic regression model, and the candidate results with high prediction scores are retained, which reduces the subsequent computational load and completes the preliminary screening of the error correction candidate set.
Further, each candidate result in the target candidate set is substituted into the corresponding sentence, the perplexity of the sentence after each substitution is obtained, and the candidate result with the minimum perplexity is then selected as the final error correction result; the corresponding error position is replaced according to this error correction result, and the correct, error-corrected text is obtained.
Specifically, the formula for computing the perplexity can be expressed as:
PP(s) = \left( \prod_{i=1}^{N} p(w_i) \right)^{-\frac{1}{N}}
wherein w represents a character or word in the candidate result, i represents the position index of w in the corresponding sentence, s represents the sentence after the candidate result has been substituted in, N represents the total number of characters and words in the sentence, and p represents the probability value of w in the corresponding sentence.
Through this formula, the perplexity of the erroneous sentence with each candidate result substituted in can be obtained, and the candidate result with the minimum perplexity value is then selected as the final error correction result.
And the text error correction unit 260 is used for correcting the error of the text to be detected based on the error correction result.
After the processing of the units, a corresponding error correction result can be obtained for each error position, and the errors of the corresponding position are replaced according to all the error correction results, so that the error detection and correction process of the text to be detected can be completed, and the corrected text is formed, so that the subsequent operations such as intention identification and the like can be facilitated.
Fig. 3 is a schematic structural diagram of an electronic device for implementing an artificial intelligence-based text error correction method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as an artificial intelligence based text correction program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of an artificial intelligence based text correction program, etc., but also for temporarily storing data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the whole electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (e.g., artificial intelligence based text error correction programs, etc.) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The artificial intelligence based text correction program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, enable:
performing word segmentation processing on the acquired text to be detected based on a pre-trained word segmentation model to acquire a corresponding word segmentation result;
acquiring words in the word segmentation result and the probability value of each word in the text to be detected in a corresponding sentence of the text to be detected;
determining a suspected error position candidate set of the text to be detected based on the probability value;
based on a pre-constructed dictionary, acquiring a candidate result corresponding to each error in the suspected error position candidate set, and determining an error correction candidate set corresponding to the candidate result;
acquiring the perplexity of the candidate results in the error correction candidate set in the corresponding sentences, and determining the error correction result corresponding to each error based on the perplexity;
and correcting the error of the text to be detected based on the error correction result.
In addition, an optional technical solution is that the step of performing word segmentation processing on the acquired text to be detected based on the pre-trained word segmentation model to acquire a corresponding word segmentation result includes:
acquiring a training set corpus, and training the initialized N-gram model based on the training set corpus to acquire a trained word segmentation model;
performing word segmentation processing on the text to be detected once based on the word segmentation model, and acquiring a corresponding first word segmentation result;
performing secondary word segmentation processing on the first word segmentation result based on a forward maximum matching word segmentation method to obtain a corresponding second word segmentation result; performing secondary word segmentation processing on the first word segmentation result based on a backward maximum matching word segmentation method to obtain a corresponding third word segmentation result;
and selecting a target text from the second word segmentation result and the third word segmentation result as the word segmentation result based on a preset rule.
In addition, an optional technical solution is that the step of obtaining the words in the word segmentation result and the probability value of each word in the text to be detected in the corresponding sentence of the text to be detected includes:
acquiring each character in a text to be detected, and determining a corresponding character set;
merging the word segmentation result and the character set to determine a target set;
and acquiring the probability values of all elements in the target set in the corresponding sentences.
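As a sketch only, the merging of the character set with the word segmentation result and the subsequent probability lookup might be implemented as follows; the toy frequency counts and the probability threshold used to derive the suspected-error-position candidate set are stand-ins for the trained N-gram model and for whatever criterion the probability-based determination actually applies.

from collections import Counter

def build_target_set(text, seg_tokens):
    """Union of every character in the text with the word segmentation tokens."""
    return set(text) | set(seg_tokens)

def element_probability(element, corpus_counts, total):
    """Toy probability of an element: its relative frequency in the corpus.
    A real system would query the trained N-gram language model instead."""
    return corpus_counts.get(element, 0) / max(total, 1)

def suspect_positions(text, seg_tokens, corpus_counts, total, threshold=1e-3):
    """Elements whose probability falls below the (assumed) threshold form the
    suspected-error-position candidate set."""
    probs = {e: element_probability(e, corpus_counts, total)
             for e in build_target_set(text, seg_tokens)}
    return {e for e, p in probs.items() if p < threshold}, probs

# toy corpus statistics standing in for the training-set corpus
corpus_counts = Counter("文本纠错方法文本检测方法")        # character-level counts
corpus_counts.update({"文本": 2, "纠错": 1, "方法": 2})     # word-level counts as well
total = sum(corpus_counts.values())
suspects, probabilities = suspect_positions("文夲纠错", ["文夲", "纠错"], corpus_counts, total)
# suspects now contains the misspelled "夲"/"文夲", whose corpus probability is zero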
In addition, an optional technical solution is that the pre-constructed dictionary includes a fuzzy sound dictionary and a shape-similar character dictionary, and the step of determining the error correction candidate set corresponding to the candidate result includes:
converting the characters and/or words at the error positions into target pinyin;
searching fuzzy sounds or similar sounds corresponding to the target pinyin in the fuzzy sound dictionary to form a first candidate result; and, at the same time,
splitting the initial consonant and the final of the target pinyin to obtain the split target initial consonant and target final;
searching fuzzy sounds or similar sounds corresponding to the target initial consonant and the target final in the fuzzy sound dictionary to form a second candidate result;
searching the shape-similar character dictionary for all the shape-similar characters corresponding to the error to form a third candidate result;
forming the error correction candidate set based on the first candidate result, the second candidate result, and the third candidate result.
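A minimal sketch of this candidate-generation step is given below; the pinyin table, the fuzzy-sound map, the shape-similar character map and the rough initial/final splitter are all hand-written stand-ins introduced purely for illustration, not the dictionaries actually disclosed.

# Candidate generation from a fuzzy sound dictionary and a shape-similar
# character dictionary. All three tables are tiny illustrative stand-ins.
PINYIN = {"是": "shi", "四": "si", "十": "shi", "们": "men", "门": "men"}
FUZZY_SOUND = {"shi": ["shi", "si"], "si": ["si", "shi"],   # whole-syllable confusions
               "sh": ["sh", "s"], "s": ["s", "sh"],         # initial-level confusions
               "en": ["en", "eng"], "eng": ["eng", "en"]}   # final-level confusions
SHAPE_SIMILAR = {"人": ["入", "八"], "末": ["未"], "夲": ["本"]}

INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_initial_final(pinyin):
    """Very rough split of a pinyin syllable into initial and final."""
    for initial in INITIALS:                 # multi-letter initials are listed first
        if pinyin.startswith(initial):
            return initial, pinyin[len(initial):]
    return "", pinyin

def build_candidates(char):
    pinyin = PINYIN.get(char, "")
    first = set(FUZZY_SOUND.get(pinyin, []))                 # fuzzy sounds of the whole pinyin
    initial, final = split_initial_final(pinyin)
    second = {i + f                                          # recombine fuzzy initial + final
              for i in FUZZY_SOUND.get(initial, [initial])
              for f in FUZZY_SOUND.get(final, [final])}
    third = set(SHAPE_SIMILAR.get(char, []))                 # shape-similar characters
    return first | second | third                            # the error correction candidate set

candidates = build_candidates("是")   # {'shi', 'si'} plus any shape-similar entries

In a real system the pinyin candidates would then be mapped back to concrete characters or words before ranking.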
In addition, an optional technical solution is that, before obtaining the perplexity of the candidate results in the error correction candidate set in the corresponding sentences, the method further includes:
and preliminarily screening the candidate results in the error correction candidate set based on a pre-trained screening model to determine a target candidate set.
In addition, an optional technical solution is that the step of preliminarily screening the candidate results in the error correction candidate set based on the pre-trained screening model to determine the target candidate set includes:
training a logistic regression model based on the obtained training data;
predicting the result in the error correction candidate set based on the logistic regression model, and acquiring a corresponding prediction score;
and filtering out, based on a preset range, the candidate results whose prediction scores are smaller than the preset range, so as to determine the target candidate set.
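For the pre-screening step, a minimal sketch using scikit-learn's LogisticRegression is shown below; the two features (candidate probability in the sentence and a pinyin edit distance) and the 0.3 cutoff are assumptions introduced for illustration, since the text does not specify the training features or the preset range.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: [probability of the candidate in the sentence,
# pinyin edit distance to the original]; label 1 = plausible correction.
X_train = np.array([[0.02, 0], [0.15, 1], [0.00, 3], [0.30, 0], [0.01, 2], [0.25, 1]])
y_train = np.array([1, 1, 0, 1, 0, 1])

screening_model = LogisticRegression().fit(X_train, y_train)

def screen_candidates(candidate_features, threshold=0.3):
    """Keep the indices of candidates whose prediction score reaches the preset range."""
    scores = screening_model.predict_proba(candidate_features)[:, 1]
    return [i for i, score in enumerate(scores) if score >= threshold]

kept_indices = screen_candidates(np.array([[0.20, 1], [0.00, 3]]))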
In addition, an optional technical solution is that the perplexity is obtained by the following formula:
PPL(s) = (p(w_1) × p(w_2) × ... × p(w_N))^(-1/N)
wherein w represents a character or word in the candidate result, i represents the serial number of w in the corresponding sentence, s represents the sentence after the candidate result has been substituted in, N represents the number of all characters or words in the sentence, and p represents the probability value of w in the corresponding sentence.
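By way of illustration, this computation can be transcribed into Python as follows, using the equivalent log form for numerical stability; the per-token probabilities in the example are made up, and in practice they would come from the trained N-gram language model applied to the sentence with the candidate substituted in.

import math

def perplexity(token_probs):
    """Perplexity of a candidate sentence given the probability p(w_i) of each
    of its N characters/words: the geometric-mean inverse probability."""
    n = len(token_probs)
    log_sum = sum(math.log(max(p, 1e-12)) for p in token_probs)  # guard against zero
    return math.exp(-log_sum / n)

# toy per-token probabilities for two candidate replacements of the same sentence
candidate_a = [0.12, 0.30, 0.25, 0.18]
candidate_b = [0.12, 0.02, 0.25, 0.18]
best = min((candidate_a, candidate_b), key=perplexity)   # the lower perplexity wins

The candidate whose substituted sentence yields the lowest perplexity is then taken as the error correction result for that position.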
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A text error correction method based on artificial intelligence, which is characterized by comprising the following steps:
performing word segmentation processing on the acquired text to be detected based on a pre-trained word segmentation model to acquire a corresponding word segmentation result;
acquiring words in the word segmentation result and the probability value of each word in the text to be detected in a corresponding sentence of the text to be detected;
determining a suspected error position candidate set of the text to be detected based on the probability value;
based on a pre-constructed dictionary, acquiring a candidate result corresponding to each error in the suspected error position candidate set, and determining an error correction candidate set corresponding to the candidate result;
acquiring the perplexity of the candidate results in the error correction candidate set in the corresponding sentences, and determining the error correction result corresponding to each error based on the perplexity;
and correcting the error of the text to be detected based on the error correction result.
2. The artificial intelligence based text error correction method according to claim 1, wherein the step of performing word segmentation processing on the obtained text to be detected based on the pre-trained word segmentation model to obtain the corresponding word segmentation result comprises:
acquiring a training set corpus, and training the initialized N-gram model based on the training set corpus to acquire a trained word segmentation model;
performing word segmentation processing on the text to be detected once based on the word segmentation model, and acquiring a corresponding first word segmentation result;
performing secondary word segmentation processing on the first word segmentation result based on a forward maximum matching word segmentation method to obtain a corresponding second word segmentation result; performing secondary word segmentation processing on the first word segmentation result based on a backward maximum matching word segmentation method to obtain a corresponding third word segmentation result;
and selecting a target text from the second word segmentation result and the third word segmentation result as the word segmentation result based on a preset rule.
3. The artificial intelligence based text error correction method according to claim 1, wherein the step of obtaining the words in the word segmentation result and the probability value of each word in the text to be detected in the corresponding sentence of the text to be detected comprises:
acquiring each character in a text to be detected, and determining a corresponding character set;
merging the word segmentation result and the character set to determine a target set;
and acquiring the probability values of all elements in the target set in the corresponding sentences.
4. The artificial intelligence based text error correction method of claim 1, wherein the pre-constructed dictionary comprises a fuzzy sound dictionary and a shape-similar character dictionary, and the step of determining the error correction candidate set corresponding to the candidate result comprises:
converting the characters and/or words at the error positions into target pinyin;
searching fuzzy sounds or similar sounds corresponding to the target pinyin in the fuzzy sound dictionary to form a first candidate result; and, at the same time,
splitting the initial consonant and the final of the target pinyin to obtain the split target initial consonant and target final;
searching fuzzy sounds or similar sounds corresponding to the target initial consonant and the target final in the fuzzy sound dictionary to form a second candidate result;
searching the shape-similar character dictionary for all the shape-similar characters corresponding to the error to form a third candidate result;
forming the error correction candidate set based on the first candidate result, the second candidate result, and the third candidate result.
5. The artificial intelligence based text error correction method of any one of claims 1 to 4, further comprising, before obtaining a perplexity of candidate results in the error correction candidate set in corresponding sentences:
and preliminarily screening the candidate results in the error correction candidate set based on a pre-trained screening model to determine a target candidate set.
6. The artificial intelligence based text correction method of claim 5, wherein the step of preliminarily screening the candidate results in the error correction candidate set based on the pre-trained screening model to determine the target candidate set comprises:
training a logistic regression model based on the obtained training data;
predicting the result in the error correction candidate set based on the logistic regression model, and acquiring a corresponding prediction score;
and filtering out, based on a preset range, the candidate results whose prediction scores are smaller than the preset range, so as to determine the target candidate set.
7. The artificial intelligence based text error correction method of claim 1, wherein the perplexity is obtained by the following formula:
PPL(s) = (p(w_1) × p(w_2) × ... × p(w_N))^(-1/N)
wherein w represents a character or word in the candidate result, i represents the serial number of w in the corresponding sentence, s represents the sentence after the candidate result has been substituted in, N represents the number of all characters or words in the sentence, and p represents the probability value of w in the corresponding sentence.
8. An artificial intelligence based text correction apparatus, the apparatus comprising:
the word segmentation result acquisition unit is used for carrying out word segmentation processing on the acquired text to be detected based on the pre-trained word segmentation model so as to acquire a corresponding word segmentation result;
a probability value obtaining unit, configured to obtain words in the word segmentation result and a probability value of each word in the text to be detected in a corresponding sentence of the text to be detected;
a suspected error position candidate set determining unit, configured to determine a suspected error position candidate set of the text to be detected based on the probability value;
the error correction candidate set determining unit is used for acquiring a candidate result corresponding to each error in the suspected error position candidate set based on a pre-constructed dictionary and determining an error correction candidate set corresponding to the candidate result;
an error correction result determination unit configured to acquire a perplexity of the candidate results in the error correction candidate set in the corresponding sentence, and determine an error correction result corresponding to each error based on the perplexity;
and the text error correction unit is used for correcting the error of the text to be detected based on the error correction result.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps in the artificial intelligence based text correction method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the steps in the artificial intelligence based text correction method according to any one of claims 1 to 7.
CN202111215901.9A 2021-10-19 2021-10-19 Text error correction method, device and equipment based on artificial intelligence and storage medium Pending CN113962215A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111215901.9A CN113962215A (en) 2021-10-19 2021-10-19 Text error correction method, device and equipment based on artificial intelligence and storage medium

Publications (1)

Publication Number Publication Date
CN113962215A true CN113962215A (en) 2022-01-21

Family

ID=79465316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111215901.9A Pending CN113962215A (en) 2021-10-19 2021-10-19 Text error correction method, device and equipment based on artificial intelligence and storage medium

Country Status (1)

Country Link
CN (1) CN113962215A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019085779A1 (en) * 2017-11-01 2019-05-09 阿里巴巴集团控股有限公司 Machine processing and text correction method and device, computing equipment and storage media
CN110457688A (en) * 2019-07-23 2019-11-15 广州视源电子科技股份有限公司 Error correction processing method and device, storage medium and processor
CN113283233A (en) * 2021-05-28 2021-08-20 平安科技(深圳)有限公司 Text error correction method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492396A (en) * 2022-02-17 2022-05-13 重庆长安汽车股份有限公司 Text error correction method for automobile proper nouns and readable storage medium
CN116522905A (en) * 2023-07-03 2023-08-01 腾讯科技(深圳)有限公司 Text error correction method, apparatus, device, readable storage medium, and program product
CN116522905B (en) * 2023-07-03 2024-03-19 腾讯科技(深圳)有限公司 Text error correction method, apparatus, device, readable storage medium, and program product

Similar Documents

Publication Publication Date Title
CN111310443B (en) Text error correction method and system
CN107210035B (en) Generation of language understanding systems and methods
CN112016304A (en) Text error correction method and device, electronic equipment and storage medium
JP5901001B1 (en) Method and device for acoustic language model training
CN112016310A (en) Text error correction method, system, device and readable storage medium
CN112149406A (en) Chinese text error correction method and system
CN111344779A (en) Training and/or determining responsive actions for natural language input using coder models
CN107688803B (en) Method and device for verifying recognition result in character recognition
CN111310440B (en) Text error correction method, device and system
KR20200031154A (en) In-depth context-based grammatical error correction using artificial neural networks
CN112185348A (en) Multilingual voice recognition method and device and electronic equipment
CN113642316A (en) Chinese text error correction method and device, electronic equipment and storage medium
TWI567569B (en) Natural language processing systems, natural language processing methods, and natural language processing programs
CN113326702B (en) Semantic recognition method, semantic recognition device, electronic equipment and storage medium
CN113657098B (en) Text error correction method, device, equipment and storage medium
WO2023065633A1 (en) Abnormal semantic truncation detection method and apparatus, and device and medium
CN112579733A (en) Rule matching method, rule matching device, storage medium and electronic equipment
CN113962215A (en) Text error correction method, device and equipment based on artificial intelligence and storage medium
CN113360001A (en) Input text processing method and device, electronic equipment and storage medium
CN112101032A (en) Named entity identification and error correction method based on self-distillation
CN113807973A (en) Text error correction method and device, electronic equipment and computer readable storage medium
CN110837730B (en) Method and device for determining unknown entity vocabulary
CN112988962A (en) Text error correction method and device, electronic equipment and storage medium
CN115309994A (en) Location search method, electronic device, and storage medium
US11599569B2 (en) Information processing device, information processing system, and computer program product for converting a causal relationship into a generalized expression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination