CN117648923B - Chinese spelling error correction method suitable for medical context - Google Patents


Info

Publication number
CN117648923B
CN117648923B (application CN202410120343.5A)
Authority
CN
China
Prior art keywords
sentence
corrected
chinese
chinese characters
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410120343.5A
Other languages
Chinese (zh)
Other versions
CN117648923A (en)
Inventor
高敏
陈恩红
刘昌春
蒋浚哲
张凯
王慕秋
李京秀
宋雪莉
丁蓓蓓
张梦云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Provincial Hospital First Affiliated Hospital Of Ustc
Original Assignee
Anhui Provincial Hospital First Affiliated Hospital Of Ustc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Provincial Hospital First Affiliated Hospital Of Ustc
Priority to CN202410120343.5A
Publication of CN117648923A
Application granted
Publication of CN117648923B
Legal status: Active
Anticipated expiration
Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/232: Orthographic correction, e.g. spell checking or vowelisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/166: Editing, e.g. inserting or deleting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/216: Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to the field of artificial intelligence, in particular to a Chinese spelling error correction method suitable for medical contexts, which comprises the steps of: converting a sentence into a Chinese character label sequence and inputting it into a BERT pre-trained Chinese language model to obtain context information features, then linearly transforming the context information features so that their dimension is aligned with the vocabulary; calculating the normalized confidence of the top k candidates at each position, obtaining the confidence of each position's top k candidates; calculating the visual and phonetic similarity between each of the top k candidate characters at each position and the input character, and weighting the two to obtain the similarity; fusing similarity and confidence to calculate the comprehensive weight of the top k candidates at each position; and taking the Chinese character with the highest comprehensive weight at each position as the corrected character. The invention solves the problem of similar-character errors by modeling the visual similarity and the phonetic similarity of Chinese characters.

Description

Chinese spelling error correction method suitable for medical context
Technical Field
The invention relates to the field of artificial intelligence, in particular to a Chinese spelling error correction method suitable for medical contexts.
Background
With China's population growth and aging, the burden on medical staff has increased greatly: doctors must devote more time to seeing patients and cannot concentrate on other work, such as writing medical records and issuing prescriptions. This heavy workload greatly raises the probability that doctors make spelling errors at work, causing deviations in information transmission and even accidents. For example, a misspelled drug name may lead a patient to receive the wrong drug; a careless error in a disease name may cause misdiagnosis; errors in recording surgical procedures can also severely affect the therapeutic outcome. An automatic spelling-correction system can find wrong characters in text and propose modifications, helping medical staff reduce spelling errors, improving the accuracy of medical records, and saving doctors' time so that more energy can be devoted to treatment.
Some researchers have attempted spelling correction in medical settings with deep-learning methods. These methods mainly construct neural network models that learn and understand complex patterns of language (context, grammatical structure, semantic meaning, and so on) from large amounts of training data. An encoder encodes the input misspelled text into a fixed-length vector that captures the important information in the text, and a decoder then generates the correctly spelled text from this vector. During training, the neural network model continuously adjusts its internal parameters by comparing the generated text with the ground-truth correct text, making the generated text increasingly similar to the correct text.
Such neural network models rely on local context for prediction, because their design makes it difficult to handle long-range dependencies; the models may not adequately understand phrases or sentences whose meaning is enriched by the wider context, and thus cannot correct context-dependent errors. For example, if "hyperthyroidism" in a condition description is miswritten as "hypothyroidism", the correct disease name could be inferred from the specific symptoms in the description, but the model cannot correct the spelling error because it does not sufficiently understand the relevance and semantics of the context.
On the other hand, consider spelling errors involving similar Chinese characters, for example "sinus rhythm" (窦性心律) miswritten as "sinus heart rate" (窦性心率): 律 and 率 share the same pronunciation, and both 心率 (heart rate) and 心律 (heart rhythm) have valid meanings of their own. Existing deep-learning neural network models have difficulty handling such complex nonlinear relationships, so they may fail to predict correctly when faced with visually similar or phonetically similar spelling errors.
Disclosure of Invention
In order to solve the problems, the invention provides a Chinese spelling error correction method suitable for medical contexts.
The method comprises the following steps:
Step one: split the sentence to be corrected X into units of single Chinese characters, the i-th character being x_i, 1 ≤ i ≤ n, where n is the number of characters; map each of the n characters through a vocabulary to obtain a numeric sequence, add the [CLS] tag before the sequence and the [SEP] tag after it, and obtain the Chinese character label sequence T of the sentence to be corrected.
Step two: input the Chinese character label sequence T into the BERT pre-trained Chinese language model to obtain the context information feature H; convert the dimension of the context information feature H to the vocabulary size to obtain the confidence prediction P.
Step three: define the part of the confidence prediction P corresponding to the character x_i as the Chinese character confidence prediction P_i; sort all values of P_i from large to small and select the top k values as the candidate Chinese character probability set at the i-th position of the sentence to be corrected, then normalize this set, the normalized confidence of the j-th candidate character at the i-th position being c_{i,j}.
Step four: based on the edit distance algorithm, calculate the phonetic similarity SP_{i,j} between the i-th character x_i of the sentence to be corrected and the j-th candidate character at the i-th position.
Step five: based on the edit distance algorithm, calculate the visual similarity SV_{i,j} between the i-th character x_i of the sentence to be corrected and the j-th candidate character at the i-th position.
Step six: from the phonetic similarity SP_{i,j} and the visual similarity SV_{i,j}, calculate the similarity S_{i,j} between x_i and the j-th candidate character at the i-th position; from the similarity S_{i,j} and the normalized confidence c_{i,j}, calculate the comprehensive weight W_{i,j} of the j-th candidate character at the i-th position; and from the comprehensive weights W_{i,j}, determine the corrected character y_i at the i-th position of the sentence to be corrected.
Further, inputting the Chinese character label sequence T into the BERT pre-trained Chinese language model in step two to obtain the context information feature H specifically refers to computing
H = BERT(T),
where BERT(·) represents the feature-extraction operation of the BERT pre-trained Chinese language model.
Further, converting the dimension of the context information feature H in step two to obtain the confidence prediction P specifically refers to performing the dimension conversion
P = Linear(H),
where Linear(·) represents a linear transformation; the last dimension of the confidence prediction P equals the vocabulary size, 21128.
Further, the normalized confidence c_{i,j} in step three is calculated as
c_{i,j} = p_{i,j} / Σ_{m=1}^{k} p_{i,m},
where p_{i,m} represents the m-th largest value when the vector of the Chinese character confidence prediction P_i is sorted from large to small.
Further, step four specifically comprises: the pinyin sequence of each Chinese character is composed of its pinyin and a tone code; define the pinyin sequence of the i-th character x_i of the sentence to be corrected as py(x_i); based on the edit distance algorithm, the phonetic similarity between x_i and the j-th candidate character at the i-th position is calculated as
SP_{i,j} = 1 - ED(py(w_{i,j}), py(x_i)) / max(|py(w_{i,j})|, |py(x_i)|),
where idx_{i,j} represents the vocabulary index of the j-th candidate character at the i-th position, decode(·) represents the decoding function that converts a vocabulary index into the corresponding Chinese character, w_{i,j} = decode(idx_{i,j}) represents the j-th candidate character at the i-th position, py(w_{i,j}) represents its pinyin sequence, ED(·, ·) represents the edit-distance function, |·| represents the length of a sequence, and max(·, ·) represents the maximizing function.
Further, step five specifically comprises: define the ideographic description sequence of the i-th character x_i of the sentence to be corrected as ids(x_i); based on the edit distance algorithm, the visual similarity between x_i and the j-th candidate character at the i-th position is calculated as
SV_{i,j} = 1 - ED(ids(w_{i,j}), ids(x_i)) / max(|ids(w_{i,j})|, |ids(x_i)|),
where ids(w_{i,j}) represents the ideographic description sequence of the j-th candidate character w_{i,j} = decode(idx_{i,j}) at the i-th position, idx_{i,j} represents the vocabulary index of that candidate, decode(·) represents the decoding function that converts a vocabulary index into the corresponding Chinese character, ED(·, ·) represents the edit-distance function, |·| represents the length of a sequence, and max(·, ·) represents the maximizing function.
Further, the ideographic description sequence specifically refers to the following:
splitting each Chinese character into units of single characters to obtain its internal character-forming components, and, for Chinese characters that cannot be completely split into single characters, combining the remaining strokes with the nearest single character as one internal character-forming component;
continuing to split each internal character-forming component in the stroke order of the Chinese writing rules until individual strokes are obtained;
constructing, according to the splitting order, a tree-structured ideographic description tree of the Chinese character, whose root node is the structural-information code describing the relative positions of the internal character-forming components obtained by the first split, whose leaf nodes are the stroke codes of single strokes, and whose intermediate nodes are structural-information codes describing the relative positions of internal character-forming components or strokes;
the ideographic description sequence of the Chinese character is the sequence obtained by traversing the ideographic description tree.
Further, traversing the ideographic description tree specifically refers to traversing it in pre-order.
Further, step six specifically comprises: calculating the similarity between the i-th character x_i of the sentence to be corrected and the j-th candidate character at the i-th position as
S_{i,j} = λ · SP_{i,j} + (1 - λ) · SV_{i,j},
where λ is an adjustment factor balancing the phonetic similarity and the visual similarity;
combining the similarity and the normalized confidence to obtain the comprehensive weight W_{i,j} of the j-th candidate character at the i-th position;
the corrected character at the i-th position of the sentence to be corrected is then
y_i = decode(index(max_j W_{i,j})),
where max_j(·) represents the function selecting the maximum value among the comprehensive weights of the candidates, index(·) represents the function converting a comprehensive weight into its vocabulary index, and decode(·) represents the decoding function converting a vocabulary index into the corresponding Chinese character.
One or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
The spelling-correction method provided by the invention is based on context confidence and Chinese character similarity. Introducing the BERT pre-trained Chinese language model brings in the background knowledge acquired during pre-training, and the input sentence is encoded on that basis, so the current context features are integrated and the problem that characters which are valid in isolation but unsuitable in context are hard to identify is solved. Meanwhile, modeling the character structure (visual similarity) and the pronunciation (phonetic similarity) of Chinese characters helps the model recognize similar wrongly written characters, solving the problem of similar-character errors.
Drawings
FIG. 1 is a schematic diagram of two internal character-forming components in a left-right relationship according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of two internal character-forming components in a top-bottom relationship according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of three internal character-forming components in a left-to-right relationship according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of three internal character-forming components in a top-to-bottom relationship according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of two internal character-forming components in an outside-inside (full surround) relationship according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of two internal character-forming components in a three-sided surround open at the bottom according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of two internal character-forming components in a three-sided surround open at the top according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of two internal character-forming components in a three-sided surround open at the right according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of two internal character-forming components in a two-sided surround from upper-left according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of two internal character-forming components in a two-sided surround from upper-right according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of two internal character-forming components in a two-sided surround from lower-left according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of two internal character-forming components in a partial-overlap relationship according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of the ideographic description tree of the Chinese character 由 according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the drawings and specific embodiments. Before the technical solutions of the embodiments are described in detail, the terms involved are explained; in this specification, components with the same names or the same reference numerals represent similar or identical structures and are limited for illustrative purposes only.
The invention corrects sentences input by the user and outputs the most probable correction. Specifically, the input sentence is first segmented into single Chinese characters; special identifiers are then added to obtain the Chinese character label sequence of the sentence to be corrected. This sequence is input into the BERT pre-trained Chinese language model to obtain the context information features given by the model, and the features are linearly transformed so that their dimension aligns with the vocabulary. The normalized confidence of the top k candidates at each position is calculated, giving the confidence of each position's top k candidates; the visual and phonetic similarity between each of the top k candidate characters and the input character is calculated, and the two are weighted to obtain the similarity of each of the top k candidates; similarity and confidence are fused to compute the comprehensive weight of the top k candidates at each position; and the Chinese character with the highest comprehensive weight at each position is taken as the corrected character.
The method provided by the invention specifically comprises the following steps:
1. Word segmentation of sentences
For the BERT pre-trained Chinese language model to process sentences composed of characters, the sentences must first be segmented.
Split the sentence to be corrected X in units of Chinese characters, and map each of the n characters obtained through the vocabulary to get the sequence (t_1, ..., t_n), where n represents the number of Chinese characters in the sentence to be corrected, x_i represents the i-th character, and t_i represents the numeric label obtained by mapping x_i. The vocabulary is a table that maps Chinese characters to numbers; for the vocabulary of the BERT pre-trained Chinese language model, the numbers range from 0 to 21127.
Based on the input-format requirement of the BERT pre-trained Chinese language model, the [CLS] tag is added before the sequence (t_1, ..., t_n) and the [SEP] tag after it, giving the Chinese character label sequence of the sentence to be corrected, T = ([CLS], t_1, ..., t_n, [SEP]).
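The segmentation and tagging step can be sketched as follows. This is a minimal illustration: the four-entry vocabulary and the [CLS]/[SEP]/[UNK] ID values are toy stand-ins for BERT's 21128-entry Chinese vocabulary and its special-token IDs.

```python
# Sketch of step one: split a sentence into characters, map them through a
# vocabulary, and wrap the result with [CLS]/[SEP] tags.
# toy_vocab and the special-token IDs are illustrative stand-ins.

def build_label_sequence(sentence, vocab, cls_id=101, sep_id=102, unk_id=100):
    """Return the Chinese character label sequence for a sentence."""
    chars = list(sentence)                  # character-level segmentation
    token_ids = [vocab.get(ch, unk_id) for ch in chars]
    return [cls_id] + token_ids + [sep_id]  # prepend [CLS], append [SEP]

toy_vocab = {"心": 2552, "率": 3837, "正": 2972, "常": 2382}
seq = build_label_sequence("心率正常", toy_vocab)
```

In a real system the vocabulary and the special-token IDs would come from the published BERT Chinese vocabulary file rather than a hand-written dictionary.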
2. Acquiring features containing contextual information
The BERT pre-trained Chinese language model can model contextual semantic information. Inputting the Chinese character label sequence T into the BERT pre-trained Chinese language model yields the context information feature
H = BERT(T),
where BERT(·) represents the feature-extraction operation of the BERT pre-trained Chinese language model; each row of the context information feature H is a vector of dimension d, where d is the output dimension of the BERT pre-trained Chinese language model.
To calculate in subsequent steps the probability of the possible Chinese characters at each position of the sentence to be corrected, the context information feature H must undergo dimension conversion, giving the confidence prediction
P = Linear(H),
where Linear(·) represents a linear transformation; the last dimension of the confidence prediction P equals the vocabulary size, 21128.
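The dimension conversion from the feature dimension d to the vocabulary size can be illustrated with NumPy. The BERT encoder is stood in for here by random features, so only the shapes, not the values, are meaningful.

```python
import numpy as np

# Sketch of step two's dimension conversion: a feature matrix H of shape
# (n+2) x d (n characters plus [CLS]/[SEP]) is mapped by a linear layer
# (weights W, bias b) to a confidence prediction P of shape (n+2) x |V|.
# H stands in for BERT(T); the random values are placeholders.

rng = np.random.default_rng(0)
n, d, vocab_size = 4, 768, 21128

H = rng.standard_normal((n + 2, d))            # stand-in for BERT output
W = rng.standard_normal((d, vocab_size)) * 0.01
b = np.zeros(vocab_size)

P = H @ W + b                                   # P = Linear(H)
```

Each row of P now has one score per vocabulary entry, which is what the top-k candidate selection in the next step operates on.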
3. Calculating confidence
Define the part of the confidence prediction P corresponding to the i-th character x_i of the sentence to be corrected as the Chinese character confidence prediction P_i. P_i is a vector of length 21128; each value in the vector represents the probability that the Chinese character at the corresponding index of the vocabulary of the BERT pre-trained Chinese language model is the i-th character of the sentence.
Sort all values of the Chinese character confidence prediction P_i from large to small and select the top k values to form the candidate Chinese character probability set at the i-th position of the sentence to be corrected. The characters corresponding to this set are the k characters that the BERT pre-trained Chinese language model predicts as most likely for the i-th position, and the character corresponding to the j-th value of the set is the j-th candidate character at the i-th position. Normalizing the set, the normalized confidence of the j-th candidate character at the i-th position is
c_{i,j} = p_{i,j} / Σ_{m=1}^{k} p_{i,m},
where p_{i,m} represents the m-th largest value when the vector P_i is sorted from large to small.
When calculating the normalized confidence, the invention considers only the k largest values of the Chinese character confidence prediction P_i as candidates, not all values over the vocabulary. The purpose of this design is to widen the confidence gaps: the values of the leading candidates are relatively close to one another, and if normalization were performed over all values in the vocabulary, the normalized confidences computed for the candidates would be too close.
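The top-k normalization can be sketched as follows, with a toy five-entry "vocabulary" standing in for the real 21128-entry score vector of one position:

```python
import numpy as np

def normalized_confidence(scores, k):
    """Top-k normalization: c_{i,j} = p_{i,j} / sum of the k largest values.

    scores: one position's confidence prediction over the vocabulary.
    Returns (vocabulary indices of the top-k candidates, normalized confidences).
    """
    idx = np.argsort(scores)[::-1][:k]   # indices of the k largest values
    top = scores[idx]
    return idx, top / top.sum()

scores = np.array([0.05, 0.40, 0.30, 0.20, 0.05])
idx, conf = normalized_confidence(scores, k=3)
```

Note how the three leading scores (0.40, 0.30, 0.20) are renormalized among themselves; dividing instead by the full vocabulary sum would leave their confidences nearly indistinguishable, which is exactly the gap-widening effect described above.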
4. Calculating phonetic similarity
The pronunciation of a Chinese character can be directly represented by its pinyin and tone. The invention judges the phonetic similarity of Chinese characters by converting them into their pinyin sequences, where the pinyin sequence of each character is defined as its pinyin followed by a tone code; in this embodiment the tone code is a digit. For example, the pinyin sequence of the Chinese character 医 is "yi1", where "1" encodes the first tone. Define the pinyin sequence of the i-th character x_i of the sentence to be corrected as py(x_i).
Based on the edit distance algorithm, the phonetic similarity between x_i and the j-th candidate character at the i-th position is calculated as
SP_{i,j} = 1 - ED(py(w_{i,j}), py(x_i)) / max(|py(w_{i,j})|, |py(x_i)|),
where idx_{i,j} represents the vocabulary index of the j-th candidate character at the i-th position, decode(·) represents the decoding function that converts a vocabulary index into the corresponding Chinese character, w_{i,j} = decode(idx_{i,j}) represents the j-th candidate character at the i-th position, py(w_{i,j}) represents its pinyin sequence, ED(·, ·) represents the edit-distance function, |·| represents the length of a sequence, and max(·, ·) represents the maximizing function.
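A sketch of the edit-distance-based phonetic similarity follows. The Levenshtein implementation is a plain dynamic program; the pinyin strings are supplied by hand (using the common "v"-for-ü convention), whereas a real system would look them up from a pinyin table.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (single-row DP)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (a[i - 1] != b[j - 1]))   # substitution/match
            prev = cur
    return dp[n]

def phonetic_similarity(py_a, py_b):
    """SP = 1 - ED(py_a, py_b) / max(|py_a|, |py_b|)."""
    return 1 - edit_distance(py_a, py_b) / max(len(py_a), len(py_b))

# 率 (lv4) vs 律 (lv4): identical pinyin sequences, maximal similarity
sim_same = phonetic_similarity("lv4", "lv4")
# lv4 vs de5: completely different sequences of the same length
sim_diff = phonetic_similarity("lv4", "de5")
```

The similarity is 1 for identical pronunciations and falls toward 0 as the pinyin sequences diverge, matching the formula above.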
5. Calculating visual similarity
The invention adopts the ideographic description sequence to represent the visual information of a Chinese character. An ideographic description sequence contains structural-information codes describing the relative positions of the character-forming components inside the character, together with stroke codes, with the strokes ordered according to the Chinese writing rules. Because it captures both structural information and stroke information, each ideographic description sequence corresponds one-to-one with the Chinese character it describes.
The invention describes the relative positions of character-forming components with twelve structural-information codes, thereby accurately representing the structure of a character. FIG. 1 shows two internal character-forming components in a left-right relationship; FIG. 2 shows two components in a top-bottom relationship; FIG. 3 shows three components in a left-to-right relationship; FIG. 4 shows three components in a top-to-bottom relationship; FIG. 5 shows two components in an outside-inside (full surround) relationship; FIG. 6 shows two components in a three-sided surround open at the bottom; FIG. 7, a three-sided surround open at the top; FIG. 8, a three-sided surround open at the right; FIG. 9 shows two components in a two-sided surround from upper-left; FIG. 10, a two-sided surround from upper-right; FIG. 11, a two-sided surround from lower-left; and FIG. 12 shows two components in a partial-overlap relationship.
For Chinese characters that cannot be completely split into single characters, the remaining strokes are combined with the nearest single character (the single character closest to the remaining strokes in writing position) as one internal character-forming component. Each internal character-forming component is then split further in the stroke order of the Chinese writing rules until individual strokes are obtained. According to the splitting order, a tree-structured ideographic description tree of the character is constructed: its root node is the structural-information code describing the relative positions of the components obtained by the first split, its leaf nodes are the stroke codes of single strokes, and its intermediate nodes are structural-information codes describing the relative positions of components or strokes. The ideographic description sequence of the character is the sequence obtained by traversing this tree.
In this embodiment, the ideographic description tree is traversed in pre-order. Fig. 13 shows the ideographic description tree of the Chinese character 由. Splitting 由 for the first time yields a first internal character-forming component 冂 and a second internal character-forming component 土, which stand in the partial-overlap relationship of FIG. 12, so the root node of the ideographic description tree of 由 is the structural-information code for the partial-overlap relationship of FIG. 12. Splitting the first component 冂 yields two single strokes in the left-right relationship of FIG. 1; splitting the second component 土 yields a third internal character-forming component 十 and the stroke 一, which stand in the top-bottom relationship of FIG. 2; splitting the third component 十 yields the strokes 一 and 丨, which stand in the partial-overlap relationship of FIG. 12. At this point 由 is completely split into individual strokes, giving the ideographic description tree shown in fig. 13; the upper part of fig. 13 shows the sequence obtained by pre-order traversal of that tree.
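The construction and pre-order traversal of the ideographic description tree of 由 can be sketched as follows. The structure-code names (OVERLAP, LEFT_RIGHT, TOP_BOTTOM) and the stroke labels are illustrative stand-ins for the patent's twelve structural-information codes and its stroke encodings.

```python
# Sketch of an ideographic description tree and its pre-order traversal.
# Labels are illustrative, not the patent's actual encodings.

class Node:
    def __init__(self, label, children=()):
        self.label, self.children = label, list(children)

def preorder(node):
    """Pre-order traversal: visit the root first, then children left to right."""
    seq = [node.label]
    for child in node.children:
        seq.extend(preorder(child))
    return seq

# 由 splits into 冂 and 土 (overlap); 冂 into two strokes (left-right);
# 土 into 十 and 一 (top-bottom); 十 into 一 and 丨 (overlap).
tree = Node("OVERLAP", [
    Node("LEFT_RIGHT", [Node("丨"), Node("𠃌")]),
    Node("TOP_BOTTOM", [Node("OVERLAP", [Node("一"), Node("丨")]), Node("一")]),
])
ids_sequence = preorder(tree)
```

Traversing the root's structure code first and each subtree in order reproduces the flat ideographic description sequence shown in the upper part of fig. 13.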
Define the ideographic description sequence of the i-th Chinese character x_i in the sentence to be corrected as ids_i. Based on the edit distance algorithm, the visual similarity vs_{i,j} between the i-th Chinese character x_i in the sentence to be corrected and the j-th candidate Chinese character at the i-th position in the sentence to be corrected is calculated as:

vs_{i,j} = 1 − ED(ids_i, ids_{i,j}) / max(|ids_i|, |ids_{i,j}|)

wherein ids_{i,j} represents the ideographic description sequence of the j-th candidate Chinese character at the i-th position in the sentence to be corrected, ED(·, ·) represents the edit distance calculation function, |·| represents the sequence length, and max(·, ·) represents the maximizing function.
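A minimal sketch of this edit-distance-based visual similarity follows. The Levenshtein distance is computed by standard dynamic programming and normalized by the longer sequence length; the IDS strings used in the example are illustrative, not the patent's actual encoding.

```python
# Edit-distance-based visual similarity between two ideographic description
# sequences: vs = 1 - ED(a, b) / max(|a|, |b|).

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over prefixes."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                          # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j                          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def visual_similarity(ids_a, ids_b):
    """1 minus the edit distance normalized by the longer sequence."""
    return 1.0 - edit_distance(ids_a, ids_b) / max(len(ids_a), len(ids_b))

print(visual_similarity("⿻⿰丨𠃌⿱⿻一丨一", "⿻⿰丨𠃌⿱⿻一丨一"))  # identical -> 1.0
```

Identical sequences score 1.0, and sequences sharing no symbols score 0.0, so characters with similar stroke structure receive high visual similarity.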
6. Sentence correction
The similarity sim_{i,j} between the i-th Chinese character x_i in the sentence to be corrected and the j-th candidate Chinese character at the i-th position in the sentence to be corrected is calculated as:

sim_{i,j} = λ · ps_{i,j} + (1 − λ) · vs_{i,j}

wherein λ is an adjustment factor balancing the speech similarity and the visual similarity.

Combining the similarity sim_{i,j} and the normalized confidence p̂_{i,j} yields the comprehensive weight w_{i,j} of the j-th candidate Chinese character at the i-th position in the sentence to be corrected.

The corrected Chinese character y_i at the i-th position in the sentence to be corrected is then:

y_i = D(argmax_j w_{i,j})

wherein argmax_j represents the function that selects, from the comprehensive weights w_{i,j}, the vocabulary index of the one with the largest value, and D(·) represents the decoding function that converts a vocabulary index into the corresponding Chinese character.
The corrected Chinese character y_i is the character at the i-th position of the corrected sentence. It may be the same as or different from the original Chinese character x_i: if y_i equals x_i, the i-th position of the corrected sentence is unmodified; if y_i differs from x_i, the i-th position of the sentence to be corrected has been modified.
Concatenating the corrected Chinese characters y_i in order yields the corrected sentence.
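The selection step above can be sketched for a single position as follows. The product sim · p̂ used for the comprehensive weight is an assumption for illustration (the patent states only that the similarity and the normalized confidence are combined), and the candidate characters and scores are invented.

```python
# Sketch of final selection at one position: combine similarity and
# normalized confidence into a comprehensive weight, then take the argmax.
# The product combination and all numbers below are illustrative assumptions.

candidates = [            # (character, similarity sim, normalized confidence)
    ("药", 0.90, 0.55),
    ("乐", 0.40, 0.30),
    ("月", 0.20, 0.15),
]

weights = [sim * conf for _, sim, conf in candidates]      # comprehensive weights
best = max(range(len(candidates)), key=lambda j: weights[j])  # argmax over j
corrected_char = candidates[best][0]                       # decode index -> char
print(corrected_char)  # "药"
```

Because the weight multiplies model confidence by phonetic/visual plausibility, a candidate must both fit the context and resemble the original character to win, which is what makes the scheme robust for near-homophone and near-homograph medical-term errors.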
The above embodiments merely illustrate preferred embodiments of the present invention and do not limit its scope. Various modifications and improvements made to the technical solution of the present invention by those skilled in the art, without departing from its design spirit, shall fall within the protection scope defined by the claims of the present invention.

Claims (9)

1. A method of Chinese spelling error correction for medical contexts, comprising the steps of:
step one: dividing the sentence to be corrected into units of single Chinese characters to obtain m Chinese characters, the i-th Chinese character being x_i, i = 1, 2, …, m; mapping each of the m Chinese characters through a word list to obtain an index sequence, adding a sentence-start tag before the sequence and a sentence-end tag after the sequence, and obtaining the Chinese character label sequence T of the sentence to be corrected;
step two: inputting the Chinese character label sequence T into a BERT pre-trained Chinese language model to obtain a context information feature H, and performing dimension conversion on the context information feature H to obtain a confidence prediction P;
step three: defining the confidence prediction in P corresponding to the Chinese character x_i as the Chinese character confidence prediction P_i; selecting from P_i the n largest values as the candidate Chinese character probability set at the i-th position in the sentence to be corrected, and normalizing the set, wherein the normalized confidence of the j-th candidate Chinese character at the i-th position in the sentence to be corrected is p̂_{i,j};
step four: calculating, based on the edit distance algorithm, the speech similarity ps_{i,j} between the i-th Chinese character x_i in the sentence to be corrected and the j-th candidate Chinese character at the i-th position in the sentence to be corrected;
step five: calculating, based on the edit distance algorithm, the visual similarity vs_{i,j} between the i-th Chinese character x_i in the sentence to be corrected and the j-th candidate Chinese character at the i-th position in the sentence to be corrected;
step six: calculating, based on the speech similarity ps_{i,j} and the visual similarity vs_{i,j}, the similarity sim_{i,j} between the Chinese character x_i and the j-th candidate Chinese character at the i-th position in the sentence to be corrected; calculating, based on the similarity sim_{i,j} and the normalized confidence p̂_{i,j}, the comprehensive weight w_{i,j} of the j-th candidate Chinese character at the i-th position in the sentence to be corrected; and calculating, according to the comprehensive weight w_{i,j}, the corrected Chinese character y_i at the i-th position in the sentence to be corrected.
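Step one of the claim above can be sketched as follows. The tiny word list and the tag names "[CLS]" and "[SEP]" are assumptions for illustration — the patent does not name its start/end tags, though these are the conventional BERT tokens.

```python
# Sketch of step one: split the sentence into single characters, map each
# through a word list to a vocabulary index, and wrap the sequence with
# start/end tags. The vocabulary and tag names below are illustrative.

vocab = {"[CLS]": 0, "[SEP]": 1, "头": 2, "痛": 3, "疼": 4}

def label_sequence(sentence):
    chars = list(sentence)                  # unit: single Chinese character
    ids = [vocab[c] for c in chars]         # word-list mapping
    return [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]

print(label_sequence("头痛"))  # [0, 2, 3, 1]
```

The resulting integer sequence is what step two feeds to the BERT pre-trained Chinese language model.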
2. The method of Chinese spelling error correction for medical contexts of claim 1, wherein in step two, inputting the Chinese character label sequence T into the BERT pre-trained Chinese language model to obtain the context information feature H specifically means calculating:

H = BERT(T)

wherein BERT(·) represents the feature extraction operation of the BERT pre-trained Chinese language model.
3. The method of Chinese spelling error correction for medical contexts of claim 1, wherein in step two, performing dimension conversion on the context information feature H to obtain the confidence prediction P specifically means calculating:

P = Linear(H)

wherein Linear(·) represents a linear transformation operation, and the confidence prediction P has, at each character position, the dimension of the vocabulary.
4. The method of Chinese spelling error correction for medical contexts of claim 1, wherein the normalized confidence p̂_{i,j} in step three is calculated as:

p̂_{i,j} = p_{i,j} / Σ_{k=1}^{n} p_{i,k}

wherein p_{i,j} represents the j-th largest value when the vector of the Chinese character confidence prediction P_i is sorted from large to small.
5. The method of Chinese spelling error correction for medical contexts of claim 1, wherein step four specifically comprises: the pinyin sequence of each Chinese character is formed from the pinyin and tone codes of that Chinese character; the pinyin sequence of the i-th Chinese character x_i in the sentence to be corrected is defined as py_i; the speech similarity ps_{i,j} between x_i and the j-th candidate Chinese character at the i-th position in the sentence to be corrected is calculated based on the edit distance algorithm as:

ps_{i,j} = 1 − ED(py_i, py_{i,j}) / max(|py_i|, |py_{i,j}|)

wherein idx_{i,j} represents the vocabulary index of the j-th candidate Chinese character at the i-th position in the sentence to be corrected, D(·) represents the decoding function that converts a vocabulary index into the corresponding Chinese character, c_{i,j} = D(idx_{i,j}) represents the j-th candidate Chinese character at the i-th position, py_{i,j} represents the pinyin sequence of c_{i,j}, ED(·, ·) represents the edit distance calculation function, |·| represents the sequence length, and max(·, ·) represents the maximizing function.
6. The method of Chinese spelling error correction for medical contexts of claim 1, wherein step five specifically comprises: the ideographic description sequence of the i-th Chinese character x_i in the sentence to be corrected is defined as ids_i; the visual similarity vs_{i,j} between x_i and the j-th candidate Chinese character at the i-th position in the sentence to be corrected is calculated based on the edit distance algorithm as:

vs_{i,j} = 1 − ED(ids_i, ids_{i,j}) / max(|ids_i|, |ids_{i,j}|)

wherein idx_{i,j} represents the vocabulary index of the j-th candidate Chinese character at the i-th position in the sentence to be corrected, D(·) represents the decoding function that converts a vocabulary index into the corresponding Chinese character, c_{i,j} = D(idx_{i,j}) represents the j-th candidate Chinese character at the i-th position, ids_{i,j} represents the ideographic description sequence of c_{i,j}, ED(·, ·) represents the edit distance calculation function, |·| represents the sequence length, and max(·, ·) represents the maximizing function.
7. The method of Chinese spelling error correction for medical contexts of claim 6, wherein the ideographic description sequence is constructed as follows:

splitting each Chinese character in units of single characters to obtain internal character-forming components; for a Chinese character that cannot be completely split into single characters, combining the remaining strokes with the nearest single character into one internal character-forming component;

continuing to split each internal character-forming component in the order of the Chinese character writing rules until individual strokes are obtained;

constructing, according to the splitting order, a tree-structured ideographic description tree of the Chinese character, wherein the root node of the ideographic description tree is the structural information code describing the relative positions of the internal character-forming components obtained by the first split, the leaf nodes are the stroke codes of individual strokes, and the intermediate nodes are the structural information codes describing the relative positions of internal character-forming components or strokes;

the ideographic description sequence of the Chinese character is the sequence obtained by traversing the ideographic description tree.
8. The method of Chinese spelling error correction for medical contexts of claim 7, wherein traversing the ideographic description tree specifically means traversing the ideographic description tree in preorder.
9. The method of Chinese spelling error correction for medical contexts of claim 1, wherein step six comprises: calculating the similarity sim_{i,j} between the i-th Chinese character x_i in the sentence to be corrected and the j-th candidate Chinese character at the i-th position in the sentence to be corrected:

sim_{i,j} = λ · ps_{i,j} + (1 − λ) · vs_{i,j}

wherein λ is an adjustment factor for adjusting the speech similarity and the visual similarity;

combining the similarity sim_{i,j} and the normalized confidence p̂_{i,j} to obtain the comprehensive weight w_{i,j} of the j-th candidate Chinese character at the i-th position in the sentence to be corrected;

the corrected Chinese character y_i at the i-th position in the sentence to be corrected is then:

y_i = D(argmax_j w_{i,j})

wherein argmax_j represents the function that selects the vocabulary index of the maximum comprehensive weight, and D(·) represents the decoding function that converts the vocabulary index into the corresponding Chinese character.
CN202410120343.5A 2024-01-29 2024-01-29 Chinese spelling error correction method suitable for medical context Active CN117648923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410120343.5A CN117648923B (en) 2024-01-29 2024-01-29 Chinese spelling error correction method suitable for medical context


Publications (2)

Publication Number Publication Date
CN117648923A (en) 2024-03-05
CN117648923B (en) 2024-05-10

Family

ID=90045479


Country Status (1)

Country Link
CN (1) CN117648923B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310443A (en) * 2020-02-12 2020-06-19 新华智云科技有限公司 Text error correction method and system
CN112530597A (en) * 2020-11-26 2021-03-19 山东健康医疗大数据有限公司 Data table classification method, device and medium based on Bert character model
CN113657098A (en) * 2021-08-24 2021-11-16 平安科技(深圳)有限公司 Text error correction method, device, equipment and storage medium
CN113673228A (en) * 2021-09-01 2021-11-19 阿里巴巴达摩院(杭州)科技有限公司 Text error correction method, text error correction device, computer storage medium and computer program product
CN113935317A (en) * 2021-09-26 2022-01-14 平安科技(深圳)有限公司 Text error correction method and device, electronic equipment and storage medium
CN114881006A (en) * 2022-03-30 2022-08-09 医渡云(北京)技术有限公司 Medical text error correction method and device, storage medium and electronic equipment
CN115081430A (en) * 2022-05-24 2022-09-20 中国科学院自动化研究所 Chinese spelling error detection and correction method and device, electronic equipment and storage medium
CN115114919A (en) * 2021-03-19 2022-09-27 富士通株式会社 Method and device for presenting prompt information and storage medium
CN115862040A (en) * 2022-12-12 2023-03-28 杭州恒生聚源信息技术有限公司 Text error correction method and device, computer equipment and readable storage medium
CN116522905A (en) * 2023-07-03 2023-08-01 腾讯科技(深圳)有限公司 Text error correction method, apparatus, device, readable storage medium, and program product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11593560B2 (en) * 2020-10-21 2023-02-28 Beijing Wodong Tianjun Information Technology Co., Ltd. System and method for relation extraction with adaptive thresholding and localized context pooling



Similar Documents

Publication Publication Date Title
CN110489760A (en) Based on deep neural network text auto-collation and device
CN111767718B (en) Chinese grammar error correction method based on weakened grammar error feature representation
CN113779972B (en) Speech recognition error correction method, system, device and storage medium
CN106776548A (en) A kind of method and apparatus of the Similarity Measure of text
CN111985234B (en) Voice text error correction method
CN113283236A (en) Entity disambiguation method in complex Chinese text
CN114386399A (en) Text error correction method and device
CN116910272B (en) Academic knowledge graph completion method based on pre-training model T5
CN111428104A (en) Epilepsy auxiliary medical intelligent question-answering method based on viewpoint type reading understanding
JP2020030367A (en) Voice recognition result formatted model learning device and its program
CN117648923B (en) Chinese spelling error correction method suitable for medical context
CN114372140A (en) Layered conference abstract generation model training method, generation method and device
CN114863948A CTC-Attention architecture-based reference text related pronunciation error detection model
CN114548053A (en) Text comparison learning error correction system, method and device based on editing method
CN114511084A (en) Answer extraction method and system for automatic question-answering system for enhancing question-answering interaction information
US20240346950A1 (en) Speaking practice system with redundant pronunciation correction
CN111046663B (en) Intelligent correction method for Chinese form
CN106339367B (en) A kind of Mongolian auto-correction method
US11817079B1 (en) GAN-based speech synthesis model and training method
CN117852528A (en) Error correction method and system of large language model fusing rich semantic information
BE1022627B1 (en) Method and device for automatically generating feedback
CN111274826A (en) Semantic information fusion-based low-frequency word translation method
CN116956944A (en) Endangered language translation model method integrating syntactic information
CN113486160B (en) Dialogue method and system based on cross-language knowledge
CN110955768B (en) Question-answering system answer generation method based on syntactic analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant