CN115186065A - Target word retrieval method and device - Google Patents


Info

Publication number
CN115186065A
Authority
CN
China
Prior art keywords
phrase
search
query
determining
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210842766.9A
Other languages
Chinese (zh)
Inventor
綦红镀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202210842766.9A
Publication of CN115186065A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a target word retrieval method and device, which can be applied in the field of artificial intelligence. The method comprises: obtaining an original search phrase; determining a candidate set according to the original search phrase; and querying each query phrase in the candidate set and determining the query result corresponding to each query phrase. By introducing latent semantics, the method extracts the latent semantic information contained in the corpus of the search library and, starting from the original query, constructs a candidate set of semantically similar queries. This addresses the lack of semantic matching capability in existing full-text search and improves the intelligence and user experience of full-text search.

Description

Target word retrieval method and device
Technical Field
The application relates to the technical field of computers, in particular to a target word retrieval method and device.
Background
Full-text retrieval searches for arbitrary content in an entire book or article stored in a database. It can return information about chapters, sections, paragraphs, sentences, words and so on as required, much as if a label had been attached to every word in the book, and it supports various kinds of statistics and analysis. However, the TFIDF algorithm cannot uncover deep semantic relations among words, so a traditional ES search engine cannot handle the case where several words share one meaning; it remains at the level of keyword search and cannot offer semantic-level search to the user. For example, when a user searches for "automobile", a traditional full-text search returns only records containing the word "automobile", although records containing the word "car" may be what the user actually wants.
That is, in current target word search methods, the search result is usually limited by the literal wording of the request sentence input by the user, and the real intention behind that sentence is not captured. Such retrieval methods have poor recall and poor precision.
Disclosure of Invention
In view of this, the embodiment of the present application provides a method and an apparatus for retrieving a target word, which aim to implement accurate full-text retrieval of the target word.
In a first aspect, an embodiment of the present application provides a method for retrieving a target word, where the method includes:
acquiring an original search phrase;
determining a candidate set according to the original search phrase, wherein the candidate set comprises the original search phrase and a plurality of first search phrases, and the first search phrases are phrases semantically similar to the original search phrase;
and querying each query phrase in the candidate set, and determining a query result corresponding to each query phrase.
Optionally, the determining a candidate set according to the original search phrase includes:
acquiring a latent semantic calculation model;
determining a plurality of first search phrases through the latent semantic calculation model and a first rule, wherein the first search phrases are phrases semantically similar to the original search phrase; the first rule is used for determining the number of the first search phrases;
merging the original search phrase with the plurality of first search phrases to form a candidate set.
Optionally, the determining the query result corresponding to each query phrase includes:
determining a text search record according to the query phrase, wherein the text search record comprises text related terms;
and determining the query result according to a second rule, wherein the second rule is used for determining the number of the text related terms in the query result.
Optionally, after determining the query result corresponding to each query phrase, the method further includes:
and combining the query results corresponding to each query phrase, and taking the combined results as a final query result set.
Optionally, the latent semantic calculation model is a latent semantic analysis model or a word vector model.
In a second aspect, an embodiment of the present application provides an apparatus for retrieving a target word, where the apparatus includes:
the original retrieval phrase acquisition module is used for acquiring an original retrieval phrase;
a candidate set determining module, configured to determine a candidate set according to the original search phrase; the candidate set comprises an original search phrase and a plurality of first search phrases, wherein the first search phrases are phrases which are similar to the original search phrase in semantic meaning;
and the query result determining module is used for querying each query phrase in the candidate set and determining the query result corresponding to each query phrase.
Optionally, the candidate set determining module includes:
the calculation model acquisition module is used for acquiring a latent semantic calculation model;
a first search phrase determination module, configured to determine a plurality of first search phrases according to the latent semantic calculation model and a first rule, where the first search phrases are phrases semantically similar to the original search phrase; the first rule is used for determining the number of the first search phrase;
and the candidate set forming module is used for combining the original search phrase and the plurality of first search phrases to form a candidate set.
Optionally, the query result determining module includes:
the text search record determining module is used for determining text search records according to the query phrases, wherein the text search records comprise text related terms;
and the query result determining module is used for determining the query result according to a second rule, and the second rule is used for determining the number of the text related terms in the query result.
Optionally, the apparatus further comprises:
and the merging module is used for merging the query results corresponding to each query phrase, and taking the merged results as a final query result set.
Optionally, the latent semantic calculation model is a latent semantic analysis model or a word vector model.
The embodiment of the application provides a target word retrieval method and device. When the method is executed, an original search phrase is obtained; a candidate set is determined according to the original search phrase; and each query phrase in the candidate set is queried to determine the query result corresponding to each query phrase. Thus, after the user inputs the target search phrase, the system analyzes it to obtain semantically similar phrases and then uses the original search phrase together with these similar phrases as search conditions to obtain a plurality of search results. The target word is thereby retrieved comprehensively: the search result combines keyword search with semantic search, which improves the recall and precision of conventional full-text search.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow diagram of a method for retrieving a target word provided by an embodiment of the present application;
FIG. 2 is another flow diagram of a method for retrieving a target word provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a target word retrieval device provided by an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As mentioned above, current full-text search engines, such as the underlying layer of ES search, use the TFIDF algorithm to calculate relevance. However, the inventor finds that this approach is limited: the TFIDF algorithm cannot uncover deep semantic relations among words, so the retrieval method cannot capture the real intention behind the sentence input by the user, and it suffers from poor recall and low precision.
In order to solve this problem, the embodiment of the application provides a method and a device for retrieving a target word. When the method is executed, an original search phrase is obtained; a candidate set is determined according to the original search phrase; and each query phrase in the candidate set is queried to determine the query result corresponding to each query phrase. Thus, after the user inputs the target search phrase, the system analyzes it to obtain semantically similar phrases and then uses the original search phrase together with these similar phrases as search conditions to obtain a plurality of search results. The search result therefore combines keyword search with semantic search, which improves the recall and precision of conventional full-text search.
The method provided by the embodiment of the application is executed by a search engine and a background server; for example, the background server comprises a retrieval system with a retrieval function and an integration function. After obtaining the semantic calculation model for the search library, the retrieval system analyzes the original search phrase to obtain semantically similar phrases, inputs each query object into a search engine such as ES (Elasticsearch) for retrieval, and forms the query results according to relevance. The background server may be a single server device or a server cluster composed of a plurality of servers.
The following describes a method for retrieving a target word provided by the present application, by using an embodiment. Referring to fig. 1, fig. 1 is a flowchart of a method for retrieving a target word according to an embodiment of the present application, including:
S101: obtaining the original search phrase.
The original search phrase is an initial search target phrase. In a specific application scenario, the original search phrase may be input by a user, or may be set by the system according to a query requirement.
S102: determining a candidate set from the original search phrase.
The candidate set comprises an original search phrase and a plurality of first search phrases, wherein the first search phrases are phrases semantically similar to the original search phrase.
Suppose the original search phrase is "big data investigation". The latent semantic model calculates the 2 records in the search library that are semantically most similar to the phrase to be searched, namely "hadoop investigation" and "spark learning"; these two records are the first search phrases. The system then places the original search phrase and the first search phrases obtained in this way into a candidate set. In an actual application scenario, the system can attach a different mark to each first search phrase according to its degree of relevance to the original search phrase, so that search results of different similarity can be distinguished in the subsequent process.
How the candidate set is determined according to the original search phrase is described in detail below and is not repeated here.
S103: querying each query phrase in the candidate set, and determining a query result corresponding to each query phrase.
The query phrases are the phrases in the candidate set, queried one by one. The original search phrase and the plurality of first search phrases are taken out of the candidate set in sequence, each is used in turn as the query phrase, and the query result corresponding to each query phrase is determined.
In an actual application scenario, the system may merge the query results corresponding to each query phrase to form a final query result set.
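Steps S101 to S103 can be sketched as a small pipeline. This is only an illustrative sketch: the function `retrieve` and its parameters `similar_phrases` and `search` are hypothetical names, not part of the application, and the two stub functions stand in for the latent semantic model and the full-text search engine.

```python
from typing import Callable, Dict, List

def retrieve(original: str,
             similar_phrases: Callable[[str, int], List[str]],
             search: Callable[[str], List[str]],
             n_similar: int = 2) -> Dict[str, List[str]]:
    """Build the candidate set (S102) and query every phrase in it (S103)."""
    candidates = [original]                       # the original phrase always comes first
    for phrase in similar_phrases(original, n_similar):
        if phrase not in candidates:              # avoid querying the same phrase twice
            candidates.append(phrase)
    # One query result per query phrase, keyed by the phrase that produced it.
    return {phrase: search(phrase) for phrase in candidates}

# Hypothetical stand-ins for the semantic model and the search engine.
def fake_similar(phrase: str, n: int) -> List[str]:
    return ["hadoop investigation", "spark learning"][:n]

def fake_search(phrase: str) -> List[str]:
    return [f"{phrase} (record {i})" for i in (1, 2)]

results = retrieve("big data investigation", fake_similar, fake_search)
for query_phrase, records in results.items():
    print(query_phrase, "->", records)
```

Passing the model and the engine in as callables keeps the flow independent of any particular implementation, which matches the statement that either an LSA model or a word vector model may be used.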
The following describes in detail a method for retrieving a target word provided in an embodiment of the present application. Referring to fig. 2, fig. 2 is another schematic flow chart of the target word retrieval method provided by an embodiment of the present application. The specific process is as follows:
S201: obtaining the original search phrase.
The system obtains the original search phrase to be searched.
S202: acquiring a latent semantic calculation model.
The latent semantic model is calculated offline from the corpus in the search library. The model can be an LSA model or a Word2vec model and is not limited herein. The following steps take LSA as an example and specifically include:
The document set is analyzed to establish a word-document matrix A. Let A be an m × n text data matrix (n << m), indicating that the corpus contains m words and n documents. Each row of A represents a word and each column represents a document; typically, an element of the word-document matrix is the number of occurrences of the word in the document.
Singular value decomposition (SVD) is applied to the word-document matrix, the decomposed matrices are reduced in dimension, and the latent semantic space (LSA) model is constructed from the reduced matrices. The formula is as follows:
$$A_{m \times n} = U_{m \times k} \, \Sigma_{k \times k} \, V^{T}_{k \times n}$$
In the formula, $A_{m \times n}$ is the m × n text data matrix, which the formula decomposes into the product of three matrices: $U_{m \times k}$ is the word-topic matrix, $\Sigma_{k \times k}$ is the diagonal matrix of singular values, and $V^{T}_{k \times n}$ is the topic-document matrix. A decomposes into k singular values, where k can be read as the number of topics. After sorting, the r largest singular values are kept, and the value of r can be calculated according to the following formula:
$$\frac{P_r}{P} \geq 95\%$$
In the formula, $P_r = \sum_{i=1}^{r} \sigma_i^2$ is the sum of squares of the r largest singular values of the diagonal matrix and $P = \sum_{i=1}^{k} \sigma_i^2$ is the sum of squares of all its singular values. The calculated r retains more than 95% of the information content of the original matrix, and r is far less than k.
Thus
$$A_{m \times n} \approx U_{m \times r} \, \Sigma_{r \times r} \, V^{T}_{r \times n}$$
approximates the matrix A. U is the word-topic matrix: each column represents one latent semantic, whose meaning is formed by combining the m words with different weights, and each row represents a word. Because the columns of U are mutually independent, the r latent semantics span a semantic space; the larger an entry of U, the more relevant the corresponding word is to that latent semantic, so the correlation between words and word senses can be seen through $U_{m \times r}$. Each row of $V^{T}$ represents a category of topics, and each non-zero element represents the relevance of a topic to a document, so the relevance between documents and topics can be seen through $V^{T}_{r \times n}$. $\Sigma V^{T}$ is a topic-document matrix in which each column represents a document mapped into the semantic space. Each singular value in $\Sigma$ indicates the importance of the corresponding latent semantic, and the matrix $\Sigma$ represents the correlation between the article topics and the keywords.
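The decomposition and the choice of r can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function name `build_lsa`, the `energy` parameter and the toy count matrix are illustrative and do not come from the application; r is taken as the smallest value whose retained squared singular values reach the 95% threshold.

```python
import numpy as np

def build_lsa(A, energy=0.95):
    """Truncated SVD of a word-by-document count matrix A (m words x n documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)        # P_r / P for r = 1..k
    r = int(np.searchsorted(ratio, energy)) + 1       # smallest r with P_r / P >= energy
    r = min(r, len(s))                                # guard against rounding at the top end
    return U[:, :r], s[:r], Vt[:r, :]

# Toy corpus (assumed data): 5 words (rows) x 4 documents (columns), entries are counts.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 1, 1, 1],
], dtype=float)

U_r, s_r, Vt_r = build_lsa(A)
A_approx = U_r @ np.diag(s_r) @ Vt_r   # rank-r approximation of A
print("kept singular values:", len(s_r))
print("approximation error:", np.linalg.norm(A - A_approx))
```

On a corpus this small r is not much smaller than k; the reduction only pays off on a realistic word-document matrix.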
S203: a plurality of first search terms is determined from the latent semantic computation model and a first rule.
Wherein the first rule is used for determining the number of the first search phrase. In a specific application scenario, the first rule may be set by the user, or may be set by the system according to the query requirement.
In some possible implementations, a user may select or remove the semantically similar search records provided by the system according to personal search requirements, or may combine different first search phrases with the original search phrase as required. For example, when the user deletes the second most semantically similar search phrase, the third most similar one automatically fills the vacancy. The first search phrases can also be set by the user, who either selects other semantically similar phrases found by the system or enters the content of a first search phrase directly.
For example, when the first rule indicates that the number of first search phrases is 2, in an actual application scenario the system uses the latent semantic model to calculate the 2 records in the search library that are semantically most similar to the phrase to be searched. Suppose the query is query = "big data investigation", and the 2 most similar records calculated through the latent semantic model are similar_top_2 = {"hadoop investigation", "spark learning"}. Then "hadoop investigation" and "spark learning" are the first search phrases corresponding to the current first rule.
Here, Hadoop and Spark serve as the representative marks of the two first search phrases. In practical applications, the system can use these representative marks to distinguish first search phrases of different similarity, and in some possible implementations the system can set and adaptively modify the representative mark of a first search phrase.
Both Hadoop and Spark are big data frameworks. Spark is a fast, general-purpose computing engine designed for large-scale data processing. Hadoop is a distributed system infrastructure developed by the Apache Foundation that processes data in a reliable, efficient and scalable manner. During application, the data retrieved by the system can be further processed with these two big data frameworks.
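The first rule and the user-driven removal described above can be sketched as follows. The helper `pick_first_phrases` is a hypothetical name, and the third ranked phrase, "flink tutorial", is invented purely to show how the next most similar phrase fills the vacancy.

```python
def pick_first_phrases(ranked, n, excluded=()):
    """Return the first n phrases of a similarity-ranked list, skipping excluded ones."""
    excluded = set(excluded)
    return [phrase for phrase in ranked if phrase not in excluded][:n]

# Phrases ordered from most to least similar to the original search phrase (assumed data).
ranked = ["hadoop investigation", "spark learning", "flink tutorial"]

print(pick_first_phrases(ranked, n=2))
# The user removes the second most similar phrase; the third one moves up.
print(pick_first_phrases(ranked, n=2, excluded={"spark learning"}))
```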
As a further optimization, in S203, the N records in the search library that are semantically most similar to the phrase to be searched are calculated through the latent semantic model, which specifically includes:
For a given query, a pseudo-document is constructed from the words $A_q$ contained in the query: $V_q = A_q U \Sigma^{-1}$. The cosine similarity between the pseudo-document and each document vector in V is then calculated to obtain the N documents most similar to the given query. Let $V_t$ be the document vector corresponding to the t-th column of $V^{T}$. The cosine similarity between the pseudo-document vector $V_q$ and $V_t$ is calculated by the formula:
$$\cos(V_q, V_t) = \frac{V_q \cdot V_t}{|V_q| \, |V_t|}$$
In the formula, $V_q$ and $V_t$ are the vector representations of the query text and of the document corresponding to the t-th column in the semantic space matrix, $|V_q|$ and $|V_t|$ are the norms of the vectors $V_q$ and $V_t$ respectively, and $\cos(V_q, V_t)$ is the cosine similarity between the query vector and the document vector.
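The pseudo-document construction and the cosine ranking can be sketched with NumPy as follows. This is a sketch under assumptions: the function name `top_n_similar`, the toy matrix and the fixed dimension r = 2 are illustrative choices, and the query is represented as a vector of word counts over the same vocabulary as A.

```python
import numpy as np

def top_n_similar(A, q, n_top, r):
    """Indices of the n_top documents closest to query q in an r-dimensional LSA space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_r, s_r = U[:, :r], s[:r]
    V_r = Vt[:r, :].T                       # one row per document in the semantic space
    v_q = (q @ U_r) / s_r                   # pseudo-document: V_q = A_q U Sigma^{-1}
    norms = np.linalg.norm(V_r, axis=1) * np.linalg.norm(v_q)
    cos = (V_r @ v_q) / np.where(norms == 0, 1.0, norms)   # avoid division by zero
    order = np.argsort(-cos)[:n_top]        # highest cosine similarity first
    return order.tolist(), cos

# Toy corpus (assumed data): 5 words (rows) x 4 documents (columns).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 1, 1, 1],
], dtype=float)

q = np.array([0, 0, 1, 1, 0], dtype=float)  # query containing the third and fourth words
top_docs, sims = top_n_similar(A, q, n_top=2, r=2)
print("most similar documents:", top_docs)
```

A useful sanity check is that a query identical to one of the document columns of A maps onto that document's own vector, so its cosine similarity with that document is 1.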
S204: merging the original search phrase with the plurality of first search phrases to form a candidate set.
The original search phrase is merged with the plurality of first search phrases corresponding to the first rule to form a candidate set. For example, merging the original search phrase "big data investigation" with the first search phrases "hadoop investigation" and "spark learning" gives candidate_list = {"big data investigation", "hadoop investigation", "spark learning"}.
S205: querying each query phrase in the candidate set, and determining text search records according to the query phrases.
The text search record comprises a plurality of text related terms, which are the terms found by the query that are relevant to the query phrase. The text search record is a collection of these related terms.
Specifically, the system takes the phrases to be queried out of the candidate set one by one and determines the text search record corresponding to the current query phrase. For example, if the query phrase is the original search phrase, that is, query1 = "big data investigation", the text search record for the current query phrase may contain "big data development status investigation", "big data related component research", "big data and artificial intelligence relationship", "big data development prospect exploration" and "application expansion of big data". The text search record is the full-text search result for the current query phrase; that is, if the full text contains N items directly related to the current query phrase, the text search record may contain N items.
In some possible implementations, the system ranks the text related terms by their relevance to the query phrase. In other words, while acquiring the text related terms, the system sorts them by relevance to form an ordered text search record. In such a record the text related terms are ordered by relevance: taking the query phrase "big data investigation" as the reference, "big data development status investigation" is more relevant than "big data related component research", and relevance decreases further down the list.
In other possible implementations, the system retrieves the full text based only on the query phrase and does not sort the text related terms obtained during retrieval, so the resulting text search record carries no relevance order. To extract text related terms from such a record by relevance, once the system has obtained the number required by the second rule, it can either sort the text related terms in the record or set a relevance threshold and filter the terms directly, thereby obtaining the number of text related terms required by the second rule. That is, the system selects from the unordered text search record those text related terms whose relevance meets the standard.
S206: and determining the query result according to a second rule.
The second rule is used for determining the number of the text related terms in the query result. In a specific application scenario, the second rule may be set by the user, or may be set by the system according to the query requirement.
For example, when the second rule indicates that the number of text related terms is 3, in an actual application scenario the system takes the 3 text records with the highest relevance to form the query result. Suppose the text search record corresponding to the current query phrase is "big data development status investigation", "big data related component research", "big data and artificial intelligence relationship", "big data development prospect exploration", "application expansion of big data". Following the selection rule for text related terms described in step S205, the 3 most relevant text records are selected to form the query result, giving the result set result1 = {"big data development status investigation", "big data related component research", "big data and artificial intelligence relationship"}.
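The second rule can be sketched as a selection over scored records. The function `apply_second_rule` and the relevance scores below are hypothetical; in practice the scores would come from the search engine. The sketch covers both options mentioned in S205: sorting and taking the top k, or filtering by a relevance threshold.

```python
def apply_second_rule(scored_records, k=None, min_score=None):
    """Drop records below min_score (if given), sort by relevance, keep the top k (if given)."""
    kept = [(text, score) for text, score in scored_records
            if min_score is None or score >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)   # most relevant first
    if k is not None:
        kept = kept[:k]
    return [text for text, _ in kept]

# Unordered text search record with assumed relevance scores.
records = [
    ("big data and artificial intelligence relationship", 0.7),
    ("application expansion of big data", 0.5),
    ("big data development status investigation", 0.9),
    ("big data development prospect exploration", 0.6),
    ("big data related component research", 0.8),
]

result1 = apply_second_rule(records, k=3)
print(result1)
```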
S207: and combining the query results corresponding to each query phrase, and taking the combined results as a final query result set.
According to the step S206, query results corresponding to a plurality of query texts in the candidate set are determined.
For example, the query phrases are taken out of the candidate set in sequence and input into a full-text search engine such as ES for retrieval, and the 3 text records with the highest relevance are extracted to form each query result. With the query phrases query1 = "big data investigation", query2 = "hadoop investigation" and query3 = "spark learning", the corresponding result sets are result1 = {"big data development status investigation", "big data related component research", "big data and artificial intelligence relationship"}, result2 = {"hadoop investigation", "hadoop technical investigation", "hadoop fast entry"} and result3 = {"spark learning", "spark learning note", "spark basic course"}. In the current step, the query results corresponding to each query phrase are combined to form the search result set: result_list = {"big data development status investigation", "big data related component research", "big data and artificial intelligence relationship", "hadoop investigation", "hadoop technical investigation", "hadoop fast entry", "spark learning", "spark learning note", "spark basic course"}.
In an actual application scenario, the query results may be displayed by category on the user-side search interface. For example, to distinguish the original search phrase from first search phrases of different similarity, the query result of the original search phrase is placed in the first row, followed by the query results of the first search phrases in rows ordered by decreasing similarity. Alternatively, the system can assign different colors to different phrases to distinguish them on the display interface.
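The merging in S207 can be sketched as an ordered union of the per-query results. The function `merge_results` is a hypothetical name; dropping duplicate records and remembering which query phrase first returned each record are assumptions added for illustration (the latter could support the classified display of results).

```python
def merge_results(results_by_query):
    """Concatenate per-query results in query order, keeping the first copy of each record."""
    merged, source = [], {}
    for query, records in results_by_query.items():
        for record in records:
            if record not in source:        # skip records already returned by an earlier query
                source[record] = query
                merged.append(record)
    return merged, source

results_by_query = {
    "big data investigation": [
        "big data development status investigation",
        "big data related component research",
        "big data and artificial intelligence relationship",
    ],
    "hadoop investigation": [
        "hadoop investigation", "hadoop technical investigation", "hadoop fast entry",
    ],
    "spark learning": [
        "spark learning", "spark learning note", "spark basic course",
    ],
}

result_list, source = merge_results(results_by_query)
for record in result_list:
    print(f"{record}  (from: {source[record]})")
```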
The foregoing provides some specific implementation manners of a retrieval method based on latent semantic analysis for the embodiments of the present application, and based on this, the present application also provides a corresponding apparatus. The device provided by the embodiment of the present application will be described in terms of functional modularity.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a target word retrieval device according to an embodiment of the present application.
In this embodiment, the apparatus may include:
an original search phrase obtaining module 300, configured to obtain an original search phrase;
a candidate set determining module 301, configured to determine a candidate set according to the original search phrase; the candidate set comprises an original search phrase and a plurality of first search phrases, wherein the first search phrases are phrases which are similar to the original search phrase in semantic meaning;
a query result determining module 302, configured to query each query phrase in the candidate set from the target text, and determine a query result corresponding to each query phrase.
Optionally, the candidate set determining module includes:
the calculation model acquisition module is used for acquiring a latent semantic calculation model;
a first search phrase determination module, configured to determine a plurality of first search phrases according to the latent semantic calculation model and a first rule, where the first search phrases are phrases semantically similar to the original search phrase; the first rule is used for determining the number of the first search phrase;
and the candidate set forming module is used for combining the original search phrase and the plurality of first search phrases to form a candidate set.
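One possible reading of the candidate set determination is sketched below, assuming the latent semantic calculation model is a word vector model that maps each phrase to a vector, that similarity is measured by cosine similarity, and that the first rule is a simple cap `top_n` on the number of first search phrases. All three are assumptions made for illustration; the toy vectors in the usage note are invented.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_candidate_set(
    original: str, vectors: dict[str, list[float]], top_n: int
) -> list[str]:
    """Return the original phrase followed by its top_n most similar phrases."""
    query_vec = vectors[original]  # assumes the original phrase is in the model
    scored = [
        (cosine(query_vec, vec), phrase)
        for phrase, vec in vectors.items()
        if phrase != original
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    first_phrases = [phrase for _, phrase in scored[:top_n]]  # first rule: keep top_n
    return [original] + first_phrases
```

For example, with the toy vectors `{"deposit": [1.0, 0.0], "savings": [0.9, 0.1], "loan": [0.0, 1.0], "credit": [0.2, 0.8]}` and `top_n=2`, the candidate set for `"deposit"` is `["deposit", "savings", "credit"]`.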
Optionally, the query result determining module includes:
the text search record determining module is used for determining text search records according to the query phrases, wherein the text search records comprise text related terms;
and the query result determining module is used for determining the query result according to a second rule, and the second rule is used for determining the number of the text related terms in the query result.
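The two sub-steps above might look like the following sketch. It assumes the target text is available as a list of lines, that a text search record is the list of lines containing the query phrase, and that the second rule is a cap `max_entries` on how many entries are kept; the function and parameter names are hypothetical.

```python
def query_phrase(phrase: str, target_lines: list[str], max_entries: int) -> list[str]:
    """Find lines of the target text that contain the phrase, capped by the second rule."""
    # Text search record: every line of the target text that mentions the phrase.
    record = [line for line in target_lines if phrase in line]
    # Second rule (assumed): keep at most max_entries entries in the query result.
    return record[:max_entries]
```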
Optionally, the apparatus further comprises:
and the merging module is used for merging the query results corresponding to each query phrase, and taking the merged results as a final query result set.
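A minimal sketch of the merging step follows. The application only states that the per-phrase results are merged; dropping duplicates while preserving first-seen order is an assumption added here, and `merge_results` is a hypothetical name.

```python
def merge_results(results_by_phrase: dict[str, list[str]]) -> list[str]:
    """Merge per-phrase query results into one final result set, in first-seen order."""
    merged: list[str] = []
    seen: set[str] = set()
    for results in results_by_phrase.values():
        for item in results:
            if item not in seen:  # assumed: an entry found by several phrases is kept once
                seen.add(item)
                merged.append(item)
    return merged
```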
Optionally, the latent semantic calculation model is a latent semantic analysis model or a word vector model.
It should be noted that the target word retrieval method and device provided by the invention can be used in the field of artificial intelligence. The above description is only an example, and does not limit the application field of the target word retrieval method and apparatus provided by the present invention.
The above provides a detailed description of a method and apparatus for retrieving a target word. The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief; for relevant details, refer to the description of the method. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method for retrieving a target word, the method comprising:
acquiring an original search phrase;
determining a candidate set according to the original search phrase; the candidate set comprises the original search phrase and a plurality of first search phrases which are semantically similar to the original search phrase;
and querying each query phrase in the candidate set, and determining a query result corresponding to each query phrase.
2. The method of claim 1, wherein said determining a candidate set from said original search phrase comprises:
acquiring a latent semantic calculation model;
determining a plurality of first search phrases through the latent semantic calculation model and a first rule, wherein the first search phrases are phrases semantically similar to the original search phrase; the first rule is used for determining the number of first search phrases;
merging the original search phrase with the plurality of first search phrases to form a candidate set.
3. The method for retrieving a target word according to claim 1, wherein the determining the query result corresponding to each query phrase comprises:
determining a text search record according to the query phrase, wherein the text search record comprises text related terms;
and determining the query result according to a second rule, wherein the second rule is used for determining the number of the text related terms in the query result.
4. The method for retrieving a target word according to claim 1, wherein after determining the query result corresponding to each query phrase, the method further comprises:
and combining the query results corresponding to each query phrase, and taking the combined results as a final query result set.
5. The method for retrieving a target word according to claim 2, wherein the latent semantic calculation model is a latent semantic analysis model or a word vector model.
6. An apparatus for retrieving a target word, the apparatus comprising:
an original search phrase obtaining module, configured to obtain an original search phrase;
a candidate set determining module, configured to determine a candidate set according to the original search phrase; the candidate set comprises an original search phrase and a plurality of first search phrases, wherein the first search phrases are phrases which are similar to the original search phrase in semantic meaning;
and the query result determining module is used for querying each query phrase in the candidate set and determining a query result corresponding to each query phrase.
7. The apparatus of claim 6, wherein the candidate set determination module comprises:
a calculation model obtaining module, configured to obtain a latent semantic calculation model;
a first search phrase determination module, configured to determine a plurality of first search phrases according to the latent semantic calculation model and a first rule, where the first search phrases are phrases semantically similar to the original search phrase; the first rule is used for determining the number of first search phrases;
and the candidate set forming module is used for combining the original search phrase and the plurality of first search phrases to form a candidate set.
8. The apparatus of claim 6, wherein the query result determination module comprises:
the text search record determining module is used for determining text search records according to the query phrases, wherein the text search records comprise text related terms;
and the query result determining module is used for determining the query result according to a second rule, and the second rule is used for determining the number of the text related terms in the query result.
9. The apparatus of claim 6, further comprising:
and the merging module is used for merging the query results corresponding to each query phrase, and taking the merged results as a final query result set.
10. The apparatus of claim 7, wherein the latent semantic calculation model is a latent semantic analysis model or a word vector model.
CN202210842766.9A 2022-07-18 2022-07-18 Target word retrieval method and device Pending CN115186065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210842766.9A CN115186065A (en) 2022-07-18 2022-07-18 Target word retrieval method and device

Publications (1)

Publication Number Publication Date
CN115186065A true CN115186065A (en) 2022-10-14

Family

ID=83518402

Country Status (1)

Country Link
CN (1) CN115186065A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination