CN114385791A - Text expansion method, device, equipment and storage medium based on artificial intelligence - Google Patents
Text expansion method, device, equipment and storage medium based on artificial intelligence
- Publication number
- CN114385791A (application CN202210040654.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- keyword
- expanded
- initial
- tuple
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a text expansion method, device, equipment and storage medium based on artificial intelligence. The method comprises the following steps: extracting an initial keyword tuple from a pre-obtained text to be expanded; processing the initial keyword tuple based on a preset rule to obtain a plurality of target keyword tuples that differ from the initial keyword tuple; inputting the target keyword tuples one by one into a pre-trained text generation model to generate a plurality of directional texts, wherein the text generation model is trained on historical text data; calculating the semantic similarity between each directional text and the text to be expanded; and removing the directional texts whose semantic similarity is lower than a preset similarity threshold. Because the initial keyword tuple extracted from the text to be expanded is processed into a plurality of target keyword tuples, and the directional texts are generated from those target keyword tuples, the text is expanded in an orderly and directed manner.
Description
Technical Field
The present application relates to the field of text expansion based on artificial intelligence, and in particular, to a text expansion method, apparatus, device and storage medium based on artificial intelligence.
Background
With the rapid development of intelligent terminals and network technologies, people are increasingly accustomed to using intelligent terminals to meet various needs. In a human-computer interaction scenario, semantic parsing is an essential link: it analyzes the speech input by a user, identifies the user's intention, and converts the speech into a structured data format that a machine can understand. Semantic parsing is often implemented with a deep learning model.
To parse semantics accurately, modern deep learning models usually need to be trained on a large amount of labeled data, and labeling the training samples consumes considerable manpower and resources. One feasible approach at present is to generate text automatically with an autoregressive model in order to produce a large number of training texts. However, the content generated this way cannot be predicted, and training a deep learning model on unpredictable text may lead to a poor training effect and ultimately lower model accuracy.
Disclosure of Invention
The application provides a text expansion method, a device, equipment and a storage medium based on artificial intelligence, which are used for solving the problem that texts expanded by the existing text expansion method cannot be predicted.
In order to solve the technical problem, the application adopts a technical scheme that: a text expansion method based on artificial intelligence is provided, which comprises the following steps: extracting an initial keyword tuple from a pre-obtained text to be expanded; processing the initial keyword tuple based on a preset rule to obtain a plurality of target keyword tuples which are different from the initial keyword tuple; respectively inputting the plurality of target keyword tuples into a pre-trained text generation model to generate a plurality of directional texts, wherein the text generation model is obtained by training according to historical text data; respectively calculating the semantic similarity between each directional text and the text to be expanded; and removing the directional texts whose semantic similarity is lower than a preset similarity threshold.
As a further improvement of the present invention, the step of pre-training the text generation model comprises: acquiring a training sample text and an initial training keyword tuple corresponding to the training sample text; processing the initial training keyword tuple based on a preset rule to obtain a plurality of target training keyword tuples which are different from the initial training keyword tuple; respectively inputting the plurality of target training keyword tuples into a text generation model to be trained to generate a plurality of training directional texts; and updating the text generation model by back-propagation according to the plurality of training directional texts, the training sample text and a preset loss function.
As a further improvement of the present invention, the processing of the initial keyword tuple based on a preset rule to obtain a plurality of target keyword tuples different from the initial keyword tuple includes: replacing a random number of the keywords in the initial keyword tuple with synonyms, and/or deleting a random number of the keywords, and/or randomly shuffling the order of the keywords, to obtain a plurality of target keyword tuples.
As a further improvement of the invention, the method for extracting the initial keyword tuple from the pre-obtained text to be expanded comprises the following steps: performing word segmentation on a text to be expanded by using a pre-constructed word segmentation device to obtain a plurality of candidate words and attributes of each candidate word; according to the attributes, respectively scoring the candidate words by using a preset scoring algorithm to obtain scoring results; and sorting the candidate words in a descending order according to the scoring result, and selecting a preset number of candidate words arranged in the front to construct an initial keyword tuple.
As a further improvement of the invention, the method for extracting the initial keyword tuple from the pre-obtained text to be expanded comprises the following steps: performing word segmentation on a text to be expanded by using a pre-constructed word segmentation device to obtain a plurality of candidate words; and filtering the relation words in the candidate words by using a pre-constructed relation word library, and constructing an initial keyword tuple by using the remaining candidate words.
As a further improvement of the present invention, after extracting the initial keyword tuple from the pre-obtained text to be expanded, the method further includes: when the number of the keywords in the initial keyword tuple is lower than a preset number threshold, inquiring the expanded keywords of each keyword in the initial keyword tuple from a preset expanded keyword knowledge base; the expanded keyword is added to the initial keyword tuple.
As a further improvement of the present invention, before adding the expanded keyword to the initial keyword tuple, the method further includes: inputting the expanded keywords and keywords in the initial keyword tuples into a text generation model to obtain an expanded text; calculating the semantic similarity between the expanded text and the text to be expanded; judging whether the semantic similarity is higher than a preset similarity threshold value or not; if so, the step of adding the expanded keyword to the initial keyword tuple is allowed to be performed.
In order to solve the above technical problem, another technical solution adopted by the present application is: provided is an artificial intelligence-based text augmentation apparatus including: an extraction module for extracting an initial keyword tuple from a pre-acquired text to be expanded; a processing module for processing the initial keyword tuple based on a preset rule to obtain a plurality of target keyword tuples which are different from the initial keyword tuple; a generating module for respectively inputting the target keyword tuples into a pre-trained text generation model to generate a plurality of directional texts, the text generation model being obtained by training according to historical text data; a calculation module for respectively calculating the semantic similarity between each directional text and the text to be expanded; and a removing module for removing the directional texts whose semantic similarity is lower than a preset similarity threshold.
In order to solve the above technical problem, the present application adopts another technical solution that: there is provided a computer device comprising a processor, a memory coupled to the processor, the memory having stored therein program instructions which, when executed by the processor, cause the processor to perform the steps of any of the artificial intelligence based text augmentation methods described above.
In order to solve the above technical problem, the present application adopts another technical solution that: there is provided a storage medium storing program instructions capable of implementing the artificial intelligence based text augmentation method described above.
The beneficial effect of this application is as follows. The text expansion method based on artificial intelligence extracts an initial keyword tuple consisting of the keywords in the text to be expanded, processes the initial keyword tuple according to a preset rule to obtain a plurality of target keyword tuples, and generates a plurality of expanded directional texts from the target keyword tuples. Because the target keyword tuples differ from the initial keyword tuple, the directional texts generated from them are not identical to the text to be expanded. Because the target keyword tuples still share most of their keywords with the initial keyword tuple, the generated directional texts remain close to the text to be expanded. The directional texts are then filtered according to their similarity to the text to be expanded, and those with a large semantic deviation are deleted. The expanded text is thereby controlled rather than generated arbitrarily.
Drawings
FIG. 1 is a flowchart illustrating a text augmentation method based on artificial intelligence according to a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating a text augmentation method based on artificial intelligence according to a second embodiment of the present invention;
FIG. 3 is a functional block diagram of an artificial intelligence-based text augmentation apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly and specifically limited otherwise. All directional indications (such as up, down, left, right, front, rear, and so on) in the embodiments of the present application are only used to explain the relative positional relationship between the components, their movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indication changes accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may optionally include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
FIG. 1 is a flowchart illustrating a text augmentation method based on artificial intelligence according to a first embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in FIG. 1 if the results are substantially the same. As shown in FIG. 1, the method comprises the steps of:
Step S101: extracting an initial keyword tuple from a pre-acquired text to be expanded.
It should be understood that keywords are words that express the central content of a document and largely reflect the semantics that a passage of text is meant to convey. In this embodiment, to expand the text to be expanded in a directed manner, the keyword tuple extracted from that text is processed so that the processed tuple yields a directional text that is not identical to the text to be expanded. Because the processed tuple shares most of its keywords with the original tuple, the semantics of the generated text stay close to those of the text to be expanded, which achieves directional control.
Specifically, in step S101, before the text to be expanded is expanded, all keywords are extracted from it and combined into one initial keyword tuple. When the method is used to expand the sample texts for training a deep learning model, the text to be expanded is a labeled training sample.
In this embodiment, keywords are extracted in an unsupervised manner in order to avoid the cost of manually labeling a training set. An unsupervised method needs no manually labeled training set; it extracts keywords by finding the more important words in the text, so it is faster and, unlike supervised keyword extraction, incurs no labeling cost. Therefore, in some embodiments, step S101 specifically includes:
1. Performing word segmentation on the text to be expanded by using a pre-constructed word segmentation device to obtain a plurality of candidate words and the attributes of each candidate word.
Specifically, after the text that the user wishes to expand is obtained, it may be segmented by using an NLP algorithm or by using a feature template extraction algorithm, so as to obtain a plurality of candidate words and the attributes of each candidate word. It should be understood that the words extracted from the text to be expanded include both necessary and unnecessary words. The unnecessary words carry no special semantics in the text and may serve only to keep the text coherent; such words do not need to take part in the keyword processing of the present application.
2. According to the attributes, scoring each candidate word by using a preset scoring algorithm to obtain scoring results.
It should be noted that the preset scoring algorithm is one of the TF-IDF algorithm, the TextRank algorithm, and the LDA algorithm. The TF-IDF algorithm is a keyword extraction algorithm based on statistical features, which extracts the keywords of a document from the statistical information of the words in the document. The TextRank algorithm is a keyword extraction algorithm based on a word graph model: it first constructs a language network graph of the document, then analyzes the graph to find the words or phrases that play an important role in it, and takes those words or phrases as the keywords of the document. The LDA algorithm is a keyword extraction algorithm based on a topic model, which extracts keywords mainly by using the topic distribution properties of the topic model.
3. Sorting the candidate words in descending order of score, and selecting a preset number of top-ranked candidate words to construct the initial keyword tuple.
Specifically, after the scores of all candidate words are obtained, the candidate words are sorted from the highest score to the lowest, a preset number of top-ranked candidate words are selected as the final keywords, and these keywords are used to construct the initial keyword tuple.
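The scoring-based extraction described in sub-steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the implementation of the patent: it assumes TF-IDF as the scoring algorithm, uses a regular-expression tokenizer as a stand-in for the word segmentation device, and the names `tokenize`, `extract_initial_tuple`, `reference_docs` and `top_k` are hypothetical. The selected keywords are returned in the order in which they occur in the text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Stand-in for the word segmentation device: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_initial_tuple(text, reference_docs, top_k=4):
    """Score candidate words with TF-IDF and keep the top_k as the keyword tuple."""
    words = tokenize(text)
    if not words:
        return ()
    term_counts = Counter(words)
    doc_sets = [set(tokenize(doc)) for doc in reference_docs]
    n_docs = len(doc_sets)

    scores = {}
    for word, count in term_counts.items():
        doc_freq = sum(1 for doc in doc_sets if word in doc)
        # Smoothed inverse document frequency (an assumption of this sketch).
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[word] = (count / len(words)) * idf

    # Descending sort by score; ties keep first-occurrence order.
    ranked = sorted(scores, key=lambda w: scores[w], reverse=True)
    selected = set(ranked[:top_k])

    # Emit the selected keywords in the order they appear in the text.
    ordered = []
    for word in words:
        if word in selected and word not in ordered:
            ordered.append(word)
    return tuple(ordered)
```

Words that are common across the reference documents receive a low inverse document frequency and therefore fall out of the top-ranked candidates.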
Alternatively, in some embodiments, step S101 specifically includes:
1. Performing word segmentation on the text to be expanded by using a pre-constructed word segmentation device to obtain a plurality of candidate words.
2. Filtering out the relation words among the candidate words by using a pre-constructed relation word library, and constructing the initial keyword tuple from the remaining candidate words.
Specifically, a piece of text mainly consists of keywords and of relation words that link the keywords together into sentences, for example words such as "is" and "a". These relation words carry no specific semantics, so they are filtered out and the remaining candidate words form the initial keyword tuple.
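The relation-word filtering above can be sketched as a simple lookup. The contents of `RELATION_WORDS` and the function name `filter_relation_words` are assumptions made for illustration; the patent only states that a pre-constructed relation word library is used.

```python
# Hypothetical relation word library; a real one would be far larger.
RELATION_WORDS = frozenset({"is", "a", "an", "the", "in", "of", "and"})


def filter_relation_words(candidates, relation_words=RELATION_WORDS):
    """Drop relation words and keep the remaining candidates in their original order."""
    return tuple(w for w in candidates if w.lower() not in relation_words)
```

For instance, the candidates `["Shenzhen", "is", "a", "prosperous", "city", "in", "the", "south"]` reduce to `("Shenzhen", "prosperous", "city", "south")` with this library.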
Step S102: processing the initial keyword tuple based on a preset rule to obtain a plurality of target keyword tuples which are different from the initial keyword tuple.
It should be understood that the text generation model generates text according to the keywords it receives, so different keywords lead to different generated texts. Therefore, in this embodiment, after the initial keyword tuple of the text to be expanded is obtained, the keywords in it are processed according to the preset rule to obtain a plurality of target keyword tuples which are different from the initial keyword tuple. Each round of processing produces one target keyword tuple, so processing the initial keyword tuple several times produces several target keyword tuples.
Further, in some embodiments, the preset rule may be synonym replacement of the keywords. Because synonyms are semantically very close, the text generated from a keyword tuple after synonym replacement is also semantically very close to the text to be expanded. Therefore, the step of processing the initial keyword tuple based on the preset rule to obtain a plurality of target keyword tuples different from the initial keyword tuple specifically includes:
replacing a random number of the keywords in the initial keyword tuple with synonyms to obtain a plurality of target keyword tuples.
For example, taking the text to be expanded "Shenzhen is a prosperous city in the south" as an example, the extracted initial keyword tuple is [Shenzhen, south, prosperous, city]. Replacing "prosperous" with a synonym such as "flourishing" gives the target keyword tuple [Shenzhen, south, flourishing, city]. Replacing the keyword "south" with one of its synonyms as well, either on its own or together with the first replacement, gives two further target keyword tuples. In this way, synonym replacement yields multiple target keyword tuples.
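A minimal sketch of replacing a random number of keywords with synonyms is given below. The synonym table `SYNONYMS` and the function name `replace_synonyms` are hypothetical; in practice the synonyms would come from a thesaurus or similar resource, which the patent does not specify.

```python
import random

# Hypothetical synonym table used only for illustration.
SYNONYMS = {
    "prosperous": ["flourishing", "thriving"],
    "south": ["southern region"],
}


def replace_synonyms(keywords, synonyms, rng):
    """Replace a random number (at least one) of the replaceable keywords."""
    replaceable = [i for i, w in enumerate(keywords) if synonyms.get(w)]
    if not replaceable:
        return tuple(keywords)
    n_replace = rng.randint(1, len(replaceable))
    result = list(keywords)
    for i in rng.sample(replaceable, n_replace):
        result[i] = rng.choice(synonyms[keywords[i]])
    return tuple(result)
```

Calling the function repeatedly with the same `random.Random` instance produces different target keyword tuples from one initial keyword tuple; keywords without an entry in the table are left unchanged.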
Further, in some embodiments, the preset rule may be deletion of keywords. Therefore, the step of processing the initial keyword tuple based on the preset rule to obtain a plurality of target keyword tuples different from the initial keyword tuple specifically includes:
deleting a random number of the keywords in the initial keyword tuple to obtain a plurality of target keyword tuples.
For example, again taking the text to be expanded "Shenzhen is a prosperous city in the south" as an example, with the initial keyword tuple [Shenzhen, south, prosperous, city], deleting the keyword "south" gives the target keyword tuple [Shenzhen, prosperous, city], and deleting the keyword "prosperous" gives the target keyword tuple [Shenzhen, south, city].
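Random deletion can be sketched in a few lines. The function name `delete_keywords` and the `min_keep` parameter, which keeps at least one keyword so the tuple never becomes empty, are assumptions of this sketch rather than details taken from the patent.

```python
import random


def delete_keywords(keywords, rng, min_keep=1):
    """Delete a random number (at least one) of keywords, keeping at least min_keep."""
    max_delete = len(keywords) - min_keep
    if max_delete < 1:
        return tuple(keywords)
    n_delete = rng.randint(1, max_delete)
    dropped = set(rng.sample(range(len(keywords)), n_delete))
    # The surviving keywords keep their original relative order.
    return tuple(w for i, w in enumerate(keywords) if i not in dropped)
```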
Further, in some embodiments, the preset rule may be random shuffling of the order of the keywords. Therefore, the step of processing the initial keyword tuple based on the preset rule to obtain a plurality of target keyword tuples different from the initial keyword tuple specifically includes:
randomly shuffling the order of the keywords in the initial keyword tuple to obtain a plurality of target keyword tuples.
It should be understood that, in this embodiment, when the initial keyword tuple is extracted from the text to be expanded, the keywords are placed into the tuple in the order in which they appear in the text. For example, again taking the text to be expanded "Shenzhen is a prosperous city in the south", the extracted initial keyword tuple is [Shenzhen, south, prosperous, city], so the order of the keywords is Shenzhen, then south, then prosperous, then city. The order is then shuffled; for example, exchanging the positions of "Shenzhen" and "city" gives the target keyword tuple [city, south, prosperous, Shenzhen].
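A sketch of the order-shuffling rule follows. Because a random shuffle can occasionally return the original order, this version retries a bounded number of times and then falls back to a single swap; the retry logic, the `max_tries` parameter and the function name `shuffle_keywords` are assumptions added so that the result always differs from the initial tuple.

```python
import random


def shuffle_keywords(keywords, rng, max_tries=10):
    """Return the keywords in a random order that differs from the original order."""
    original = tuple(keywords)
    if len(set(original)) < 2:
        return original  # No different ordering exists.
    for _ in range(max_tries):
        shuffled = list(original)
        rng.shuffle(shuffled)
        if tuple(shuffled) != original:
            return tuple(shuffled)
    # Fallback: swap the first keyword with the first keyword that differs from it.
    j = next(i for i, w in enumerate(original) if w != original[0])
    swapped = list(original)
    swapped[0], swapped[j] = swapped[j], swapped[0]
    return tuple(swapped)
```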
Furthermore, in some embodiments, the three ways of processing the keywords (synonym replacement, keyword deletion, and shuffling of the keyword order) can be applied either individually or in combination. A target keyword tuple can be obtained in either case, and whether the rules are applied individually or in combination is not limited.
It should be understood that replacing, deleting, or reordering the keywords are the ways of processing the keywords given in this embodiment, and other ways of processing the keywords that do not greatly change the semantics of the text to be expanded all fall within the protection scope of the present invention.
Step S103: respectively inputting the target keyword tuples into a pre-trained text generation model to generate a plurality of directional texts, wherein the text generation model is obtained by training according to historical text data.
In step S103, it should be noted that the text generation model is obtained by pre-training. After the plurality of target keyword tuples are obtained, they are input into the text generation model one by one, and for each target keyword tuple the model generates one directional text. For example, when the target keyword tuple is [Shenzhen, south, prosperous, city], the generated directional text is "Shenzhen is a prosperous city in the south", and when the target keyword tuple is [Shenzhen, south, city], the generated directional text is "Shenzhen is a city in the south".
Further, the pre-training of the text generation model includes:
1. Acquiring a training sample text and an initial training keyword tuple corresponding to the training sample text.
2. Processing the initial training keyword tuple based on a preset rule to obtain a plurality of target training keyword tuples which are different from the initial training keyword tuple.
Specifically, the preset rule may be synonym replacement of the keywords, keyword deletion, shuffling of the keyword order, and the like.
3. Respectively inputting the target training keyword tuples into the text generation model to be trained so as to generate a plurality of training directional texts.
4. Updating the text generation model by back-propagation according to the plurality of training directional texts, the training sample text and a preset loss function.
For example, the target training keyword tuples may be [Shenzhen, south, prosperous, city], [Shenzhen, south, city] and [city, south, prosperous, Shenzhen], and the training sample text is "Shenzhen is a prosperous city in the south". The three tuples are input one by one into the text generation model to be trained to generate the corresponding training directional texts, and the text generation model is then updated by back-propagation according to the training directional texts, the training sample text "Shenzhen is a prosperous city in the south", and the preset loss function.
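The data-preparation part of this training procedure can be sketched as below: each perturbed keyword tuple is paired with the unchanged training sample text, which serves as the generation target. The sketch stops short of the model update itself, since the patent does not name the model architecture or the loss function; the helpers `drop_one` and `shuffle` and the function `build_training_pairs` are hypothetical.

```python
import random


def drop_one(keywords, rng):
    """Delete one randomly chosen keyword (no-op for tuples shorter than two)."""
    if len(keywords) < 2:
        return tuple(keywords)
    i = rng.randrange(len(keywords))
    return tuple(keywords[:i] + keywords[i + 1:])


def shuffle(keywords, rng):
    """Return the keywords in a random order (may equal the original order)."""
    items = list(keywords)
    rng.shuffle(items)
    return tuple(items)


def build_training_pairs(sample_text, initial_tuple, perturbations, n_variants, rng):
    """Pair up to n_variants distinct perturbed tuples with the sample text."""
    initial = tuple(initial_tuple)
    pairs, seen = [], set()
    for _ in range(n_variants * 5):  # Bounded number of attempts.
        if len(pairs) >= n_variants:
            break
        target = rng.choice(perturbations)(initial, rng)
        if target != initial and target not in seen:
            seen.add(target)
            pairs.append((target, sample_text))
    return pairs
```

Each resulting pair would then be fed to the model to be trained: the keyword tuple as input, and the sample text as the reference against which the preset loss function compares the generated training directional text.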
Step S104: respectively calculating the semantic similarity between each directional text and the text to be expanded.
In step S104, after the directional text is obtained, the directional text and the text to be expanded are both converted into vector representations, the cosine distance between the two vector representations is calculated, and the semantic similarity between the directional text and the text to be expanded is determined from the cosine distance.
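The similarity calculation can be illustrated with a small sketch. The patent does not say how texts are converted into vectors, so this example assumes a simple bag-of-words count vector in place of a learned sentence embedding; `embed` and `cosine_similarity` are hypothetical names. The cosine distance is one minus the cosine similarity computed here.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy vector representation: word counts (a stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With count vectors the score ranges from 0 (no shared words) to 1 (identical word distributions). A learned encoder would capture synonyms that this toy representation treats as unrelated.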
Step S105: removing the directional texts whose semantic similarity is lower than a preset similarity threshold.
In step S105, after the semantic similarity is calculated, the directional texts whose semantic similarity is lower than the preset similarity threshold are deleted. The similarity threshold is set in advance. Deleting these texts avoids keeping texts that deviate strongly in meaning from the text to be expanded and ensures that text generation remains controllable.
The text expansion method based on artificial intelligence of the first embodiment of the invention extracts an initial keyword tuple consisting of the keywords in the text to be expanded, processes the initial keyword tuple according to the preset rule to obtain a plurality of target keyword tuples, and then generates a plurality of expanded directional texts from the target keyword tuples. Because the target keyword tuples differ from the initial keyword tuple, the directional texts generated from them are not identical to the text to be expanded. Because the target keyword tuples still share most of their keywords with the initial keyword tuple, the generated directional texts remain semantically close to the text to be expanded. The directional texts are then filtered according to their similarity to the text to be expanded, and those with a large semantic deviation are deleted. The expanded text is thereby controlled rather than generated arbitrarily.
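The filtering in step S105 amounts to a threshold comparison, sketched below. The function name `filter_by_similarity` and the example threshold are assumptions; the patent only states that the threshold is preset.

```python
def filter_by_similarity(scored_texts, threshold):
    """Keep directional texts whose similarity is not lower than the threshold.

    scored_texts is an iterable of (text, similarity) pairs.
    """
    return [text for text, similarity in scored_texts if similarity >= threshold]
```

For example, with a hypothetical threshold of 0.8, the pairs `("Shenzhen is a city in the south", 0.93)` and `("The city has a port", 0.41)` would leave only the first text.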
FIG. 2 is a flowchart illustrating a text augmentation method based on artificial intelligence according to a second embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in FIG. 2 if the results are substantially the same. As shown in FIG. 2, the method comprises the steps of:
Step S201: extracting an initial keyword tuple from a pre-acquired text to be expanded.
In this embodiment, step S201 in fig. 2 is similar to step S101 in fig. 1, and for brevity, is not described herein again.
Step S202: when the number of keywords in the initial keyword tuple is lower than a preset number threshold, querying the expanded keywords of each keyword in the initial keyword tuple from a preset expanded keyword knowledge base.
It should be noted that the expanded keyword knowledge base is set in advance. Specifically, pairs of related feature words may be mined by using the Apriori algorithm, and the expanded keyword knowledge base is constructed from these pairs.
In step S202, after the initial keyword tuple is obtained, if the number of keywords in it is lower than the preset number threshold, the text to be expanded is a short text. Few keywords can be extracted from a short text, so processing even one of them changes the semantics considerably. Therefore, in this embodiment, when the number of keywords in the initial keyword tuple is lower than the preset number threshold, the corresponding expanded keywords are queried from the preset expanded keyword knowledge base and used to extend the content of the text to be expanded.
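Mining related word pairs in the Apriori style can be sketched as follows. This simplified version covers only itemsets of size one and two and applies the Apriori pruning principle, namely that a pair can be frequent only if both of its members are frequent. The function name `mine_keyword_pairs`, the `min_support` parameter and the dictionary layout of the knowledge base are assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations


def mine_keyword_pairs(transactions, min_support):
    """Build {keyword: set of related keywords} from frequently co-occurring pairs.

    transactions is a list of keyword lists, e.g. one list per historical text.
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}

    # Apriori pruning: only count pairs made of individually frequent items.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    knowledge_base = {}
    for (a, b), count in pair_counts.items():
        if count / n >= min_support:
            knowledge_base.setdefault(a, set()).add(b)
            knowledge_base.setdefault(b, set()).add(a)
    return knowledge_base
```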
Step S203: the expanded keyword is added to the initial keyword tuple.
In step S203, after the expanded keyword is queried, the expanded keyword is added to the initial keyword tuple to expand the initial keyword tuple, so that the number of keywords in the initial keyword tuple exceeds a preset number threshold.
Further, to prevent the expansion from causing a large semantic deviation from the text to be expanded, step S203 further includes:
1. inputting the expanded keywords and keywords in the initial keyword tuples into a text generation model to obtain an expanded text;
2. calculating the semantic similarity between the expanded text and the text to be expanded;
3. judging whether the semantic similarity is higher than a preset similarity threshold value or not;
4. if so, the step of adding the expanded keyword to the initial keyword tuple is allowed to be performed.
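The four sub-steps above, together with the short-text check of step S202, might be sketched as follows. The text generation model and the similarity measure are passed in as callables; in the example they are replaced by trivial stand-ins (string joining and token-set Jaccard overlap), which are assumptions for illustration only, as are all names and threshold values.

```python
from __future__ import annotations


def jaccard(text_a: str, text_b: str) -> float:
    """Token-set overlap; an assumed stand-in for semantic similarity."""
    a, b = set(text_a.split()), set(text_b.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def expand_short_tuple(keywords, text, knowledge_base, min_keywords,
                       generate, similarity, sim_threshold):
    """Add expanded keywords to a short tuple only if the semantic check passes."""
    if len(keywords) >= min_keywords:
        return list(keywords)  # not a short text, so no expansion is needed
    # Query the knowledge base for each keyword (step S202).
    expanded = []
    for kw in keywords:
        for candidate in sorted(knowledge_base.get(kw, ())):
            if candidate not in keywords and candidate not in expanded:
                expanded.append(candidate)
    if not expanded:
        return list(keywords)
    # Sub-steps 1-3: generate a trial text and compare it with the original.
    trial_text = generate(list(keywords) + expanded)
    if similarity(trial_text, text) > sim_threshold:
        # Sub-step 4: the addition is allowed.
        return list(keywords) + expanded
    return list(keywords)


text = "bank loan"
kb = {"bank": {"credit"}, "loan": {"interest"}}
accepted = expand_short_tuple(["bank", "loan"], text, kb, 3, " ".join, jaccard, 0.4)
rejected = expand_short_tuple(["bank", "loan"], text, kb, 3, " ".join, jaccard, 0.6)
```

This sketch accepts or rejects all expanded keywords together and does not check that the resulting tuple exceeds the number threshold; a fuller version could test candidates one at a time.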
Specifically, after the expanded keywords are obtained, an expanded text of the text to be expanded is generated from the expanded keywords and the keywords in the initial keyword tuple, and the semantic similarity between the expanded text and the text to be expanded is calculated. Only when the semantic similarity is higher than a preset similarity threshold are the expanded keywords allowed to be added to the initial keyword tuple, which avoids a large semantic deviation after the text to be expanded is expanded.
Step S204: processing the initial keyword tuple based on a preset rule to obtain a plurality of target keyword tuples which are different from the initial keyword tuple.
In this embodiment, step S204 in fig. 2 is similar to step S102 in fig. 1, and for brevity, is not described herein again.
Step S205: inputting the plurality of target keyword tuples respectively into a pre-trained text generation model to generate a plurality of directional texts, wherein the text generation model is trained on historical text data.
In this embodiment, step S205 in fig. 2 is similar to step S103 in fig. 1, and for brevity, is not described herein again.
Step S206: calculating the semantic similarity between each directional text and the text to be expanded.
In this embodiment, step S206 in fig. 2 is similar to step S104 in fig. 1, and for brevity, is not described herein again.
Step S207: removing the directional texts whose semantic similarity is lower than a preset similarity threshold.
In this embodiment, step S207 in fig. 2 is similar to step S105 in fig. 1, and for brevity, is not described herein again.
Building on the first embodiment, the artificial intelligence based text expansion method of the second embodiment determines whether the text to be expanded is a short text from the number of keywords in the initial keyword tuple. If it is, the initial keyword tuple of the short text is expanded, which reduces the effect of subsequent keyword processing on the semantics. The semantics of the expansion are also checked when the short text is expanded, which avoids an excessive semantic deviation from the original text to be expanded.
FIG. 3 is a functional block diagram of an artificial intelligence-based text augmentation apparatus according to an embodiment of the present invention. As shown in fig. 3, the artificial intelligence based text augmentation apparatus 30 includes an extraction module 31, a processing module 32, a generation module 33, a calculation module 34, and a culling module 35.
The extraction module 31 is configured to extract an initial keyword tuple from a pre-obtained text to be expanded;
a processing module 32, configured to process the initial keyword tuple based on a preset rule to obtain a plurality of target keyword tuples that are different from the initial keyword tuple;
the generating module 33 is configured to input the multiple target keyword tuples into a pre-trained text generating model respectively to generate multiple directional texts, where the text generating model is obtained by training according to historical text data;
a calculating module 34, configured to calculate semantic similarity between each directional text and the text to be expanded;
and the eliminating module 35 is configured to eliminate the directional text with the semantic similarity lower than the preset similarity threshold.
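One way to picture how the five modules cooperate is as pluggable callables wired together in order. The sketch below is illustrative only: the class name, the field names, and the stand-in implementations used in the example (whitespace splitting, leave-one-out perturbation, string joining, Jaccard overlap) are assumptions, not components described in the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

KeywordTuple = Tuple[str, ...]


@dataclass
class TextAugmentationApparatus:
    extract: Callable[[str], KeywordTuple]                     # extraction module 31
    process: Callable[[KeywordTuple], Sequence[KeywordTuple]]  # processing module 32
    generate: Callable[[KeywordTuple], str]                    # generation module 33
    similarity: Callable[[str, str], float]                    # calculation module 34
    threshold: float                                           # used by culling module 35

    def augment(self, text: str) -> List[str]:
        initial = self.extract(text)
        # Target tuples must differ from the initial tuple.
        targets = [t for t in self.process(initial) if t != initial]
        directional = [self.generate(t) for t in targets]
        scored = [(d, self.similarity(d, text)) for d in directional]
        # Culling: drop texts whose similarity is below the threshold.
        return [d for d, s in scored if s >= self.threshold]


def leave_one_out(kws: KeywordTuple) -> List[KeywordTuple]:
    """Hypothetical preset rule: drop each keyword in turn."""
    variants = [kws[:i] + kws[i + 1:] for i in range(len(kws))]
    variants.append(kws[:1])  # a heavily truncated tuple, expected to be culled
    return variants


def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


apparatus = TextAugmentationApparatus(
    extract=lambda text: tuple(text.split()),
    process=leave_one_out,
    generate=" ".join,
    similarity=jaccard,
    threshold=0.5,
)
result = apparatus.augment("banks approve small loans")
```

Each stand-in can be swapped for a real component (a keyword extractor, a trained generation model, an embedding-based similarity) without changing `augment`.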
Optionally, the artificial intelligence based text augmentation apparatus 30 further includes a training module, configured to train the text generation model in advance, where the operation of the training module to train the text generation model specifically includes: acquiring a training sample text and an initial training keyword tuple corresponding to the training sample text; processing the initial training keyword tuple based on the preset rule to obtain a plurality of target training keyword tuples which are different from the initial training keyword tuple; inputting the plurality of target training keyword tuples respectively into the text generation model to be trained to generate a plurality of training directional texts; and updating the text generation model by back-propagation according to the plurality of training directional texts, the training sample text, and a preset loss function.
Optionally, the operation performed by the processing module 32 to process the initial keyword tuple based on a preset rule to obtain a plurality of target keyword tuples that are different from the initial keyword tuple may also be: performing, on the keywords in the initial keyword tuple, synonym replacement of a random number of keywords and/or deletion of a random number of keywords and/or random shuffling of their order, to obtain the plurality of target keyword tuples.
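The three perturbations named above (random synonym replacement, random deletion, random reordering) can be sketched as follows. The synonym dictionary, the seed, the retry limit, and the choice to always keep at least one keyword are assumptions made for the example.

```python
import random


def perturb_keywords(keywords, synonyms, rng):
    """Apply random synonym replacement, random deletion, and a random shuffle."""
    result = list(keywords)
    # Replace a random number of the keywords that have a known synonym.
    replaceable = [i for i, kw in enumerate(result) if synonyms.get(kw)]
    for i in rng.sample(replaceable, rng.randint(0, len(replaceable))):
        result[i] = rng.choice(synonyms[result[i]])
    # Delete a random number of keywords, always keeping at least one (assumption).
    n_delete = rng.randint(0, len(result) - 1) if result else 0
    for i in sorted(rng.sample(range(len(result)), n_delete), reverse=True):
        del result[i]
    # Randomly reorder what is left.
    rng.shuffle(result)
    return tuple(result)


def make_target_tuples(keywords, synonyms, n_targets, seed=0, max_attempts=1000):
    """Collect distinct perturbed tuples that differ from the initial tuple."""
    rng = random.Random(seed)
    initial = tuple(keywords)
    targets = []
    for _ in range(max_attempts):
        if len(targets) == n_targets:
            break
        candidate = perturb_keywords(initial, synonyms, rng)
        if candidate != initial and candidate not in targets:
            targets.append(candidate)
    return targets


initial_tuple = ("bank", "loan", "rate", "approve")
synonym_table = {"loan": ["credit"], "approve": ["grant"]}  # hypothetical synonyms
target_tuples = make_target_tuples(initial_tuple, synonym_table, n_targets=5)
```

Because a perturbation can by chance reproduce the initial tuple, the sketch rejects such candidates and retries, so every returned tuple differs from the initial one.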
Optionally, the operation performed by the extracting module 31 to extract the initial keyword tuple from the pre-obtained text to be expanded may also be: performing word segmentation on a text to be expanded by using a pre-constructed word segmentation device to obtain a plurality of candidate words and attributes of each candidate word; according to the attributes, respectively scoring the candidate words by using a preset scoring algorithm to obtain scoring results; and sorting the candidate words in a descending order according to the scoring result, and selecting a preset number of candidate words arranged in the front to construct an initial keyword tuple.
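A minimal sketch of score-based selection follows. It assumes that each candidate word carries a part-of-speech tag and a frequency as its attributes and that the score is the part-of-speech weight multiplied by the frequency; the text does not name the attributes or the scoring algorithm, so these, the weights, and the function name are all hypothetical.

```python
def select_keywords(candidates, pos_weights, top_k):
    """Score candidates from their attributes, sort descending, keep the top_k."""
    scored = [
        (word, pos_weights.get(attrs["pos"], 0.0) * attrs["freq"])
        for word, attrs in candidates
    ]
    # Python's sort is stable, so tied candidates keep their input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return tuple(word for word, _ in scored[:top_k])


# Output of a hypothetical word segmenter: (word, attributes) pairs.
candidates = [
    ("bank", {"pos": "noun", "freq": 2}),
    ("quickly", {"pos": "adverb", "freq": 1}),
    ("approved", {"pos": "verb", "freq": 1}),
    ("loan", {"pos": "noun", "freq": 1}),
]
pos_weights = {"noun": 1.0, "verb": 0.8, "adverb": 0.2}
initial_tuple = select_keywords(candidates, pos_weights, top_k=3)
```

With these weights the low-scoring adverb is dropped and the tuple is ordered by score rather than by position in the text.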
Optionally, the operation performed by the extracting module 31 to extract the initial keyword tuple from the pre-obtained text to be expanded may also be: performing word segmentation on a text to be expanded by using a pre-constructed word segmentation device to obtain a plurality of candidate words; and filtering the relation words in the candidate words by using a pre-constructed relation word library, and constructing an initial keyword tuple by using the remaining candidate words.
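The filtering variant can be sketched in a few lines. Whitespace splitting stands in for the pre-constructed word segmenter (a real segmenter would be needed for Chinese text), and the small set of relation words is a hypothetical example of a relation word library.

```python
def extract_by_filtering(text, relation_words, segment=str.split):
    """Segment the text and drop every candidate found in the relation word library."""
    candidates = segment(text.lower())
    return tuple(w for w in candidates if w not in relation_words)


# Hypothetical relation word library (connectives that carry no topic content).
RELATION_WORDS = frozenset({"because", "so", "and", "but", "although", "therefore"})

initial_tuple = extract_by_filtering(
    "Rates rose because inflation increased", RELATION_WORDS
)
```

Unlike the scoring variant, this keeps every remaining candidate, so the size of the tuple depends on the length of the text.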
Optionally, after the extracting module 31 performs the operation of extracting an initial keyword tuple from the pre-obtained text to be augmented, the extracting module is further configured to: when the number of keywords in the initial keyword tuple is lower than a preset number threshold, query a preset expanded keyword knowledge base for the expanded keywords of each keyword in the initial keyword tuple; and add the expanded keywords to the initial keyword tuple.
Optionally, before the extracting module 31 performs the operation of adding the expanded keyword to the initial keyword tuple, it is further configured to: inputting the expanded keywords and keywords in the initial keyword tuples into a text generation model to obtain an expanded text; calculating the semantic similarity between the expanded text and the text to be expanded; judging whether the semantic similarity is higher than a preset similarity threshold value or not; if so, the step of adding the expanded keyword to the initial keyword tuple is allowed to be performed.
For other details of the technical solution for implementing each module in the text expansion apparatus based on artificial intelligence in the foregoing embodiment, reference may be made to the description in the text expansion method based on artificial intelligence in the foregoing embodiment, and details are not described here again.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the parts that are the same or similar among the embodiments may be referred to one another. Because the apparatus embodiment is essentially similar to the method embodiment, its description is brief; for the relevant points, refer to the corresponding description of the method embodiment.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 4, the computer device 40 includes a processor 41 and a memory 42 coupled to the processor 41, wherein the memory 42 stores program instructions, and the program instructions, when executed by the processor 41, cause the processor 41 to execute the steps of the artificial intelligence based text augmentation method according to any one of the above embodiments.
The processor 41 may also be referred to as a CPU (Central Processing Unit). The processor 41 may be an integrated circuit chip having signal processing capabilities. The processor 41 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The storage medium of the embodiment of the present invention stores program instructions 51 capable of implementing all the methods described above. The program instructions 51 may be stored in the storage medium in the form of a software product and include several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, or computer equipment such as a computer, a server, a mobile phone, or a tablet.
In the several embodiments provided in the present application, it should be understood that the disclosed computer apparatus, device and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit. The above are merely embodiments of the present disclosure and are not intended to limit its scope; all modifications, equivalent structures, and equivalent process transformations made using the contents of the specification and drawings of the present disclosure, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of protection of the present disclosure.
Claims (10)
1. A text expansion method based on artificial intelligence is characterized by comprising the following steps:
extracting an initial keyword tuple from a pre-obtained text to be expanded;
processing the initial keyword tuple based on a preset rule to obtain a plurality of target keyword tuples which are different from the initial keyword tuple;
respectively inputting the target keyword tuples into a pre-trained text generation model to generate a plurality of directional texts, wherein the text generation model is obtained by training according to historical text data;
respectively calculating semantic similarity between each directional text and the text to be expanded;
and eliminating the directional texts whose semantic similarity is lower than a preset similarity threshold.
2. The artificial intelligence based text augmentation method of claim 1, wherein the step of pre-training the text generation model comprises:
acquiring a training sample text and an initial training keyword tuple corresponding to the training sample text;
processing the initial training keyword tuples based on the preset rules to obtain a plurality of target training keyword tuples which are different from the initial training keyword tuples;
inputting the plurality of target training keyword tuples respectively into the text generation model to be trained to generate a plurality of training directional texts;
and updating the text generation model by back-propagation according to the plurality of training directional texts, the training sample text, and a preset loss function.
3. The artificial intelligence based text augmentation method of claim 1, wherein the processing the initial keyword tuple based on preset rules to obtain a plurality of target keyword tuples that are different from the initial keyword tuple comprises:
performing, on the keywords in the initial keyword tuple, synonym replacement of a random number of keywords and/or deletion of a random number of keywords and/or random shuffling of their order, to obtain the plurality of target keyword tuples.
4. The artificial intelligence based text augmentation method of claim 1, wherein the extracting of the initial keyword tuple from the pre-obtained text to be augmented comprises:
utilizing a pre-constructed word segmentation device to segment the text to be expanded to obtain a plurality of candidate words and the attribute of each candidate word;
according to the attributes, respectively scoring the candidate words by using a preset scoring algorithm to obtain scoring results;
and sorting the candidate words in a descending order according to the scoring result, and selecting a preset number of candidate words arranged in the front to construct the initial keyword tuple.
5. The artificial intelligence based text augmentation method of claim 1, wherein the extracting of the initial keyword tuple from the pre-obtained text to be augmented comprises:
utilizing a pre-constructed word segmentation device to segment the text to be expanded to obtain a plurality of candidate words;
and filtering the relation words in the candidate words by using a pre-constructed relation word library, and constructing the initial keyword tuple by using the remaining candidate words.
6. The artificial intelligence based text augmentation method of claim 1, wherein after extracting an initial keyword tuple from a pre-obtained text to be augmented, the method further comprises:
when the number of keywords in the initial keyword tuple is lower than a preset number threshold, querying a preset expanded keyword knowledge base for the expanded keywords of each keyword in the initial keyword tuple;
adding the expanded keyword to the initial keyword tuple.
7. The artificial intelligence based text augmentation method of claim 6, wherein the adding the expanded keyword to the initial keyword tuple is preceded by:
inputting the expanded keywords and the keywords in the initial keyword tuples into the text generation model to obtain an expanded text;
calculating the semantic similarity between the expanded text and the text to be expanded;
judging whether the semantic similarity is higher than the preset similarity threshold;
if yes, allowing the step of adding the expanded keyword to the initial keyword tuple to be executed.
8. An artificial intelligence based text augmentation apparatus, comprising:
the extraction module is used for extracting an initial keyword tuple from a pre-acquired text to be expanded;
the processing module is used for processing the initial keyword tuple based on a preset rule to obtain a plurality of target keyword tuples which are different from the initial keyword tuple;
the generating module is used for respectively inputting the target keyword tuples into a pre-trained text generating model to generate a plurality of directional texts, and the text generating model is obtained by training according to historical text data;
the calculation module is used for respectively calculating the semantic similarity between each directional text and the text to be expanded;
and the removing module is used for removing the directional texts whose semantic similarity is lower than a preset similarity threshold.
9. A computer device comprising a processor, a memory coupled to the processor, the memory having stored therein program instructions that, when executed by the processor, cause the processor to perform the steps of the artificial intelligence based text augmentation method of any one of claims 1-7.
10. A storage medium storing program instructions capable of implementing the artificial intelligence based text augmentation method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210040654.1A CN114385791A (en) | 2022-01-14 | 2022-01-14 | Text expansion method, device, equipment and storage medium based on artificial intelligence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210040654.1A CN114385791A (en) | 2022-01-14 | 2022-01-14 | Text expansion method, device, equipment and storage medium based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114385791A true CN114385791A (en) | 2022-04-22 |
Family
ID=81201633
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210040654.1A Pending CN114385791A (en) | 2022-01-14 | 2022-01-14 | Text expansion method, device, equipment and storage medium based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114385791A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117391191A (en) * | 2023-10-25 | 2024-01-12 | 山东高速信息集团有限公司 | Knowledge graph expansion method, equipment and medium for expressway emergency field |
WO2024011813A1 (en) * | 2022-07-15 | 2024-01-18 | 山东海量信息技术研究院 | Text expansion method and apparatus, device, and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109635273B (en) | Text keyword extraction method, device, equipment and storage medium | |
CN112101041B (en) | Entity relationship extraction method, device, equipment and medium based on semantic similarity | |
CN109388795B (en) | Named entity recognition method, language recognition method and system | |
CN106776544B (en) | Character relation recognition method and device and word segmentation method | |
CN108287858B (en) | Semantic extraction method and device for natural language | |
CN109344240B (en) | Data processing method, server and electronic equipment | |
CN110569354B (en) | Barrage emotion analysis method and device | |
US20140032207A1 (en) | Information Classification Based on Product Recognition | |
CN108538294B (en) | Voice interaction method and device | |
CN112215008A (en) | Entity recognition method and device based on semantic understanding, computer equipment and medium | |
CN111143571B (en) | Entity labeling model training method, entity labeling method and device | |
CN115017303A (en) | Method, computing device and medium for enterprise risk assessment based on news text | |
CN114385791A (en) | Text expansion method, device, equipment and storage medium based on artificial intelligence | |
CN109117477B (en) | Chinese field-oriented non-classification relation extraction method, device, equipment and medium | |
Jang et al. | A novel density-based clustering method using word embedding features for dialogue intention recognition | |
CN113850080A (en) | Rhyme word recommendation method, device, equipment and storage medium | |
CN115186080A (en) | Intelligent question-answering data processing method, system, computer equipment and medium | |
CN110874408A (en) | Model training method, text recognition device and computing equipment | |
CN110969005A (en) | Method and device for determining similarity between entity corpora | |
CN111428487B (en) | Model training method, lyric generation method, device, electronic equipment and medium | |
CN110990451A (en) | Data mining method, device and equipment based on sentence embedding and storage device | |
CN108021609B (en) | Text emotion classification method and device, computer equipment and storage medium | |
CN109947932B (en) | Push information classification method and system | |
CN110162615A (en) | A kind of intelligent answer method, apparatus, electronic equipment and storage medium | |
CN113609287A (en) | Text abstract generation method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||