CN118114675B - Medical named entity recognition method and device based on large language model - Google Patents
- Publication number
- CN118114675B CN118114675B CN202410533245.4A CN202410533245A CN118114675B CN 118114675 B CN118114675 B CN 118114675B CN 202410533245 A CN202410533245 A CN 202410533245A CN 118114675 B CN118114675 B CN 118114675B
- Authority
- CN
- China
- Prior art keywords
- entity
- named entity
- text
- target
- language model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/041—Abduction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
Abstract
One or more embodiments of the present application provide a method and apparatus for recognizing medical named entities based on a large language model. The method comprises: under the guidance of each of a plurality of different first-type prompt texts, performing named entity recognition on an original text with the large language model, based on a candidate entity category set, to obtain a named entity recognition result; determining, based on the named entity recognition results, each target named entity in the original text and at least one candidate entity category corresponding to each target named entity, and converting the target named entity and its candidate categories into at least one viewpoint corresponding to the target named entity, where a viewpoint indicates the entity category corresponding to a named entity; acquiring knowledge text related to the definition of the target named entity; extracting, with the large language model, the arguments corresponding to each viewpoint from the knowledge text, and evaluating the accuracy of each viewpoint based on those arguments; and determining the candidate entity category indicated by the target viewpoint with the highest accuracy as the entity category corresponding to the target named entity.
Description
Technical Field
One or more embodiments of the present application relate to the field of artificial intelligence technology, and in particular, to a method and apparatus for identifying medical named entities based on a large language model.
Background
Named Entity Recognition (NER) is a fundamental technology in the field of natural language processing, whose main objective is to automatically recognize and classify named entities with specific meaning in text. These entities are typically proper nouns, including names of people, places, and organizations, time expressions, numbers, monetary values, percentages, and so on.
With the development of deep learning technology, named entity recognition based on deep learning has made remarkable progress. A large language model (Large Language Model, LLM) is a deep learning model that captures rich language features through training on large-scale corpora and therefore has strong language understanding capability. How to leverage this capability to improve the accuracy of named entity recognition has accordingly become a topic of wide concern.
Disclosure of Invention
One or more embodiments of the present application provide the following technical solutions:
the application provides a named entity recognition method based on a large language model, which comprises the following steps:
Inputting an original text to be identified, a preset candidate entity class set and each first type of prompt text in a plurality of different first types of prompt texts into a large language model, and carrying out named entity identification on the original text based on the candidate entity class set under the guidance of each first type of prompt text by the large language model to obtain named entity identification results corresponding to each first type of prompt text;
Determining, based on the named entity recognition results, each target named entity in the original text and at least one candidate entity category, in the candidate entity category set, corresponding to each target named entity, and converting the target named entity and its corresponding at least one candidate entity category into at least one viewpoint corresponding to the target named entity; wherein a viewpoint is used for indicating the entity category corresponding to a named entity;
Acquiring knowledge texts related to the definition of the target named entity;
Inputting the at least one viewpoint and the knowledge text into a large language model, extracting arguments corresponding to each viewpoint in the at least one viewpoint from the knowledge text by the large language model, and further evaluating the accuracy of each viewpoint based on the arguments to obtain a target viewpoint with highest accuracy in the at least one viewpoint;
And determining the candidate entity category indicated by the target viewpoint as the entity category corresponding to the target named entity.
The application also provides a named entity recognition device based on the large language model, which comprises:
the recognition module is used for inputting an original text to be recognized, a preset candidate entity class set and each first type of prompt text in a plurality of different first types of prompt texts into a large language model, and carrying out named entity recognition on the original text based on the candidate entity class set under the guidance of each first type of prompt text by the large language model to obtain a named entity recognition result corresponding to each first type of prompt text;
The conversion module is used for determining, based on the named entity recognition results, each target named entity in the original text and at least one candidate entity category, in the candidate entity category set, corresponding to each target named entity, and converting the target named entity and its corresponding at least one candidate entity category into at least one viewpoint corresponding to the target named entity; wherein a viewpoint is used for indicating the entity category corresponding to a named entity;
The acquisition module is used for acquiring knowledge text related to the definition of the target named entity;
an evaluation module for inputting the at least one viewpoint and the knowledge text into a large language model, extracting, from the knowledge text, arguments corresponding to each of the at least one viewpoint by the large language model, and further evaluating the accuracy of each viewpoint based on the arguments, thereby obtaining a target viewpoint with the highest accuracy of the at least one viewpoint;
And the determining module is used for determining the candidate entity category indicated by the target viewpoint as the entity category corresponding to the target named entity.
The present application also provides an electronic device including:
A processor;
a memory for storing processor-executable instructions;
wherein the processor implements the steps of the method as described in any of the preceding claims by executing the executable instructions.
The application also provides a computer readable storage medium having stored thereon computer instructions which when executed by a processor perform the steps of the method as claimed in any of the preceding claims.
In this technical solution, an original text to be recognized, a preset candidate entity category set, and each of a plurality of different prompt texts can be input into a large language model. Under the guidance of each prompt text, the large language model performs named entity recognition on the original text based on the candidate entity category set, producing a named entity recognition result for each prompt text; from these results, each named entity in the original text and its at least one corresponding candidate entity category can be determined. For each target named entity in the original text, the target named entity and its at least one candidate entity category can be converted into at least one viewpoint indicating the entity category of that named entity. The at least one viewpoint, together with acquired knowledge text related to the definition of the target named entity, can then be input into the large language model, which extracts from the knowledge text the arguments corresponding to each viewpoint and evaluates the accuracy of each viewpoint based on those arguments, yielding the viewpoint with the highest accuracy; the candidate entity category indicated by that viewpoint is determined as the entity category corresponding to the target named entity.
In this manner, after a preliminary named entity recognition task identifies a named entity and its at least one candidate entity category from the text to be recognized, the large language model can evaluate the correctness of each candidate category through a simulated debate mechanism, and the category evaluated as most correct is finally determined as the entity category of that named entity. This improves the accuracy of named entity recognition and helps ensure that the finally determined entity category satisfies the evaluation criteria set according to requirements. In addition, guiding the large language model to perform the preliminary named entity recognition task under several different prompting strategies improves the model's self-consistency on the task, ensuring the consistency and stability of the recognition results.
Drawings
The drawings that are required for use in the description of the exemplary embodiments will be described below, in which:
FIG. 1 is a schematic diagram of a named entity recognition process based on a large language model according to an exemplary embodiment of the present application.
FIG. 2 is a flow chart of a named entity recognition method based on a large language model, according to an exemplary embodiment of the application.
FIG. 3 is a schematic diagram illustrating a named entity recognition process according to an exemplary embodiment of the present application.
FIG. 4 is a schematic diagram of a debate-based deductive evaluation process according to an exemplary embodiment of the present application.
Fig. 5 is a schematic structural view of an apparatus according to an exemplary embodiment of the present application.
FIG. 6 is a block diagram of a named entity recognition device based on a large language model, according to an exemplary embodiment of the application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments are not representative of all implementations consistent with one or more embodiments of the application. Rather, they are merely examples consistent with aspects of one or more embodiments of the present application.
It should be noted that in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described. In some other embodiments, the method may include more or fewer steps than described herein. Furthermore, individual steps described in this disclosure may be broken down into multiple steps in other embodiments; while various steps described in this application may be combined into a single step in other embodiments.
Large language models refer to deep learning models trained using large amounts of text data that may be used to generate natural language text or to understand the meaning of natural language text. The large language model can process various natural language tasks such as text classification, named entity recognition, question and answer, dialogue and the like, and is an important path to artificial intelligence.
In the field of natural language processing, large-scale text datasets are commonly referred to as corpora (Corpus). The corpus may contain various types of text data, such as: literature, academic papers, legal documents, news stories, daily conversations, emails, internet forum posts, etc. Through learning text data in a corpus, a large language model can acquire and understand rules and modes of natural language, and further effective processing and generation of human language are achieved.
Large language models typically adopt the Transformer architecture; that is, they are generally deep learning models built on Transformer. A Transformer-based deep learning model is a neural network model that has excelled in natural language processing and related fields.
The Transformer is a neural network model for sequence-to-sequence (Sequence-to-Sequence) modeling. Because it does not rely on a recurrent structure, training and inference can be parallelized, which speeds up model processing. A Transformer-based deep learning model typically uses multiple layers of Transformer encoders to extract features from an input sequence and a Transformer decoder to convert the extracted features into an output sequence. Such models also typically employ a self-attention mechanism (Self-Attention Mechanism) to capture long-range dependencies in the input sequence, together with residual connections (Residual Connection) and normalization (e.g., layer normalization) to accelerate training and improve model performance.
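The scaled dot-product self-attention mentioned above can be sketched in a few lines. This is a minimal illustrative version that uses identity Q/K/V projections for brevity (a real Transformer layer learns separate projection matrices); it is not the patent's model, only the standard mechanism the text describes.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over a sequence of token vectors.

    x: (seq_len, d_model). Q, K, V are identity projections here for
    brevity; a trained Transformer layer learns separate weight matrices.
    """
    d = x.shape[-1]
    q, k, v = x, x, x                                 # illustrative projections
    scores = q @ k.T / np.sqrt(d)                     # pairwise similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over key positions
    return weights @ v                                # each output mixes all positions
```

Because each output row is a convex combination of the value rows, every position can attend to every other position in one step, which is how long-range dependencies are captured without recurrence.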
The pre-training model is a large language model that is pre-trained on large-scale unlabeled text data. The pre-trained model is a generic model that is not designed and optimized for a particular task. In order to adapt the pre-trained model to specific application scenarios and task requirements, fine tuning is required to improve the performance of the model on specific tasks. The large language model finally put into use is usually a model for performing supervised learning based on tagged text data with further fine tuning based on a pre-trained model. Pretraining and fine tuning are complementary processes, the pretraining enables the model to have extensive language understanding capability, and the fine tuning enables the model to become more specialized and accurate on specific tasks.
That is, the training process of large language models can be divided into two phases: pre-training (Pre-training) and Fine-tuning (Fine-tuning). In the pre-training stage, the method can adopt an unsupervised learning (such as self-supervised learning) mode to perform pre-training on a large-scale and unlabeled text data set (such as network encyclopedia, network articles, books and the like), particularly can predict missing parts or next words according to the context, learn statistical rules and language structures such as semantics, syntax and the like, minimize prediction loss through back propagation and optimization algorithms (such as gradient descent method), iteratively update model parameters and gradually improve the understanding ability of the model to the language. In the fine tuning stage, corresponding supervised learning tasks (such as text classification, named entity identification, question-answering system, dialogue system and the like) can be selected according to specific application scenes and task requirements, and task-specific text data sets are prepared, so that a pre-trained model can be used as a fine tuning starting point, fine tuning training can be performed on the task-specific text data sets in a supervised learning mode, the task can be specifically executed based on the text data sets, loss of performance of the model in the process of processing the specific task is minimized through a back propagation and optimization algorithm (such as a gradient descent method), and model parameters are updated iteratively, so that the performance of the model on the specific task is gradually improved.
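Both the pre-training and fine-tuning stages described above reduce, at their core, to the same loop: compute the loss gradient and step the parameters against it. The toy sketch below shows that loop on a one-parameter quadratic loss (the loss and learning rate are hypothetical stand-ins, not anything specified by the patent).

```python
import numpy as np

def gradient_descent(grad_fn, w0: float, lr: float = 0.1, steps: int = 100) -> float:
    """Generic gradient-descent loop of the kind used (at vastly larger
    scale) in both pre-training and fine-tuning: repeatedly move the
    parameters a small step against the loss gradient."""
    w = float(w0)
    for _ in range(steps):
        w = w - lr * grad_fn(w)   # iteratively update model parameters
    return w

# Toy "loss": L(w) = (w - 3)^2, with gradient 2(w - 3); minimum at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3.0), w0=0.0)
```

In a real LLM the gradient is computed by back-propagation over billions of parameters and the update rule is typically a variant such as Adam, but the structure of the iteration is the same.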
The pre-trained large language model is generally referred to as the base model, and the fine-tuned large language model as the serving model. The language understanding capability learned during the pre-training and fine-tuning stages enables a large language model, when facing complex problems or tasks, to perform logical inference, knowledge reasoning, and problem solving by understanding, analyzing, and synthesizing text information; this capability is generally referred to as the reasoning capability of the large language model.
Large language models typically perform specific tasks or generate specific text under the direction of prompt text (which may be referred to as a Prompt). The prompt text is an initial text or text segment provided to the large language model, intended to elicit a corresponding output. Through the prompt text, the large language model can be explicitly told what task it is expected to perform, for example: answering a question, simulating a dialogue, composing an article, or translating text. At the same time, the prompt text may provide the necessary background information and context, so that the large language model understands the logic, style, theme, or standpoint that should be followed when generating content. In addition, the prompt text may also elicit the model's inherent knowledge reserves or specific language capabilities, such as: interpreting complex concepts, citing regulations, or mimicking a specific writer's style.
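A first-type prompt text of the kind described later is essentially a task-description template. The sketch below builds one such prompt; the exact wording is hypothetical (the patent only specifies that the prompt carries a task description for named entity recognition over a fixed candidate category set).

```python
def build_ner_prompt(original_text: str, candidate_categories: list[str]) -> str:
    """Assemble one hypothetical "first-type" prompt: a task description
    telling the LLM to recognize named entities in the original text and
    assign each a category from the preset candidate set."""
    cats = ", ".join(candidate_categories)
    return (
        f"Identify every medical named entity in the text below. "
        f"For each entity, assign exactly one category from: {cats}.\n"
        f"Text: {original_text}"
    )
```

Varying the task description across several such templates is what produces the "plurality of different first-type prompt texts" used in the recognition step.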
The present application provides a technical solution for named entity recognition based on a large language model. In this solution, an original text to be recognized, a preset candidate entity category set, and each of a plurality of different prompt texts can be input into the large language model. Under the guidance of each prompt text, the large language model performs named entity recognition on the original text based on the candidate entity category set, producing a named entity recognition result for each prompt text; from these results, each named entity in the original text and its at least one corresponding candidate entity category can be determined. For each target named entity in the original text, the target named entity and its at least one candidate entity category can be converted into at least one viewpoint indicating the entity category of that named entity. The at least one viewpoint, together with acquired knowledge text related to the definition of the target named entity, can then be input into the large language model, which extracts from the knowledge text the arguments corresponding to each viewpoint and evaluates the accuracy of each viewpoint based on those arguments, yielding the viewpoint with the highest accuracy; the candidate entity category indicated by that viewpoint is determined as the entity category corresponding to the target named entity.
In this manner, after a preliminary named entity recognition task identifies a named entity and its at least one candidate entity category from the text to be recognized, the large language model can evaluate the correctness of each candidate category through a simulated debate mechanism, and the category evaluated as most correct is finally determined as the entity category of that named entity. This improves the accuracy of named entity recognition and helps ensure that the finally determined entity category satisfies the evaluation criteria set according to requirements. In addition, guiding the large language model to perform the preliminary named entity recognition task under several different prompting strategies improves the model's self-consistency on the task, ensuring the consistency and stability of the recognition results.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a named entity recognition procedure based on a large language model according to an exemplary embodiment of the present application.
In this embodiment, a plurality of entity categories may be preset and used as candidate entity categories in the named entity recognition task, that is, entity categories to which the named entity recognized in the recognition process may belong, and the plurality of candidate entity categories form a candidate entity category set.
Under the condition that a text (which can be called an original text) to be subjected to named entity recognition is obtained, each first-type prompt text in the original text, the candidate entity class set and a plurality of different prompt texts (which can be called first-type prompt texts) can be input into a large language model, and the large language model performs named entity recognition on the original text based on the candidate entity class set under the guidance of each first-type prompt text to obtain named entity recognition results corresponding to each first-type prompt text. For any one first-type prompt text, the named entity recognition result obtained by performing the named entity recognition task under the guidance of the first-type prompt text by the large language model may include each named entity recognized from the original text and a candidate entity category in the candidate entity category set corresponding to each named entity, that is, a candidate entity category hit by each named entity in the candidate entity category set. At this time, the first type of prompt text may be essentially a prompt text template for triggering the above-mentioned large language model to perform a task of identifying a named entity, i.e., the content of the first type of prompt text may be a task description for the task of identifying a named entity.
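The step above — running the same recognition task once per first-type prompt and pooling the per-prompt results into entity-to-candidate-category mappings — can be sketched as follows. The `llm` callable stands in for the large language model invocation, and the stubbed responses are invented for illustration only.

```python
from collections import defaultdict
from typing import Callable

def recognize_with_prompts(llm: Callable[[str], dict],
                           prompts: list[str]) -> dict[str, set[str]]:
    """Run NER once per first-type prompt and merge the results:
    entity -> set of candidate categories assigned under any prompt."""
    merged: dict[str, set[str]] = defaultdict(set)
    for prompt in prompts:
        for entity, category in llm(prompt).items():
            merged[entity].add(category)
    return dict(merged)

# Stubbed model responses: the two prompts disagree on "dysuria".
responses = iter([{"dysuria": "symptom", "BPH": "disease"},
                  {"dysuria": "disease", "BPH": "disease"}])
result = recognize_with_prompts(lambda _p: next(responses),
                                ["prompt A", "prompt B"])
```

Entities on which the prompts disagree end up with more than one candidate category, which is exactly what the later viewpoint-conversion and debate steps are designed to resolve.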
When the named entity recognition results corresponding to the respective first-type prompt texts are obtained, each named entity in the original text, and at least one candidate entity category in the candidate entity category set corresponding to each named entity, may be determined based on those results.
Under the condition that each named entity and at least one corresponding candidate entity category in the original text are determined, each named entity can be sequentially used as a target named entity, so that the target named entity and at least one corresponding candidate entity category can be converted into at least one view corresponding to the target named entity. Wherein a perspective may be used to indicate the entity class corresponding to a named entity, i.e., a perspective may indicate that a named entity belongs to a entity class.
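The conversion just described — turning a target named entity and its candidate categories into viewpoint statements, each asserting one category — can be sketched as below. The statement phrasing is hypothetical; the patent only requires that a viewpoint indicate that a named entity belongs to an entity category.

```python
def entity_to_viewpoints(entity: str, categories: set[str]) -> list[str]:
    """Convert a target named entity and its at least one candidate
    entity category into one viewpoint per candidate category."""
    return [f"'{entity}' is a {c} entity" for c in sorted(categories)]
```

Each viewpoint then becomes one side in the debate-based evaluation that follows.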
For the target named entity described above, knowledge text related to the definition of the target named entity may be obtained. Such knowledge text can supply the arguments corresponding to the above-described viewpoints, i.e. a series of statements or reasons supporting or refuting each viewpoint.
In the case where at least one viewpoint corresponding to the above-described target named entity and knowledge text related to the definition of the target named entity are obtained, the at least one viewpoint and the knowledge text may be input into the large language model; the large language model extracts from the knowledge text the arguments corresponding to each viewpoint and, based on those arguments, evaluates the accuracy of each viewpoint to obtain the viewpoint with the highest accuracy (which may be referred to as the target viewpoint). This evaluation process is a debate-based deductive evaluation (Debate-Based Deductive Evaluation) process.
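The debate-based evaluation step can be sketched as follows. Here the LLM's argument-extraction and accuracy-scoring are abstracted into a single `score_fn` callable; the keyword-overlap stub used in the example is a toy stand-in invented for illustration, not the patent's actual scoring method.

```python
from typing import Callable

def debate_evaluate(viewpoints: list[str], knowledge_text: str,
                    score_fn: Callable[[str, str], float]) -> str:
    """Debate-based deductive evaluation (sketch): score each viewpoint's
    accuracy against the knowledge text and return the highest-scoring
    (target) viewpoint."""
    scored = [(score_fn(v, knowledge_text), v) for v in viewpoints]
    return max(scored)[1]

# Toy scorer: count viewpoint words that appear in the knowledge text.
knowledge = "Dysuria means painful urination; it is a symptom."
score = lambda v, k: sum(w in k.lower() for w in v.lower().split())
best = debate_evaluate(["'dysuria' is a symptom entity",
                        "'dysuria' is a disease entity"], knowledge, score)
```

The candidate entity category named by the returned target viewpoint is then taken as the entity category of the target named entity, as the next paragraph describes.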
When the target viewpoint is determined, the candidate entity class indicated by the target viewpoint may be determined as the entity class corresponding to the target named entity. Thus, the entity class corresponding to each named entity identified from the original text can be finally determined through deductive evaluation based on debate.
The following describes in detail a named entity recognition procedure based on a large language model as shown in fig. 1.
Referring to fig. 2, fig. 2 is a flowchart illustrating a named entity recognition method based on a large language model according to an exemplary embodiment of the present application.
In this embodiment, the named entity recognition method based on the large language model may be applied to a server. The server may be a single independent physical host, or a server cluster formed by a plurality of independent physical hosts; or the server may be a virtual server, cloud server, or the like carried by a host cluster. Alternatively, the method may be applied to electronic devices with sufficient computing capacity, such as tablet computers, notebook computers, desktop computers, personal computers (Personal Computer, PC), and palm computers (Personal Digital Assistants, PDA).
As shown in fig. 2, the named entity recognition method based on the large language model may include the following steps:
step 202: inputting an original text to be recognized, a preset candidate entity class set, and each first-type prompt text among a plurality of different first-type prompt texts into a large language model; the large language model performs named entity recognition on the original text based on the candidate entity class set under the guidance of each first-type prompt text, obtaining a named entity recognition result corresponding to each first-type prompt text.
In this embodiment, in conjunction with the schematic diagram of the named entity recognition flow shown in fig. 3, when performing named entity recognition it is generally necessary to determine in advance the entity categories to which recognized named entities may belong. These entity categories constitute the target categories of the named entity recognition task, i.e. the entity category labels that the model needs to distinguish and annotate during recognition. Therefore, a plurality of entity categories may be preset as candidate entity categories in the named entity recognition task, i.e. entity categories to which a named entity recognized during recognition may belong, and these candidate entity categories form the candidate entity class set.
Under the condition that a text to be subjected to named entity recognition (which may be called the original text) is obtained, the original text, the candidate entity class set, and each prompt text among a plurality of different prompt texts (which may be called first-type prompt texts) may be input into a large language model, and the large language model performs named entity recognition on the original text based on the candidate entity class set under the guidance of each first-type prompt text, obtaining named entity recognition results corresponding to each first-type prompt text. For any one first-type prompt text, the named entity recognition result obtained by the large language model performing the named entity recognition task under its guidance may include each named entity recognized from the original text and the candidate entity category in the candidate entity class set corresponding to each named entity, that is, the candidate entity category that each named entity hits in the candidate entity class set.
It should be noted that, the first type of prompt text may be a prompt text template for triggering the large language model to execute the task of identifying the named entity, that is, the content of the first type of prompt text may be a task description for the task of identifying the named entity. In a specific implementation, for any one of the plurality of different first-type prompt texts, a prompt text for triggering the large language model to perform named entity recognition on the original text based on the candidate entity class set can be further constructed based on the original text, the candidate entity class set and the first-type prompt text, and the constructed prompt text is input into the large language model, so that the large language model can perform named entity recognition on the original text based on the candidate entity class set under the guidance of the first-type prompt text, and a named entity recognition result corresponding to the first-type prompt text is obtained.
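The prompt construction described above can be sketched as follows. This is a minimal illustration only: `build_ner_prompt`, the template wording, and the commented-out `call_llm` client are hypothetical names and assumptions, not anything fixed by the present application.

```python
# Hedged sketch of constructing a prompt from a first-type prompt text (task
# description template), the original text, and the candidate entity class set.
# All names and wording are illustrative assumptions.
def build_ner_prompt(template: str, original_text: str, candidate_classes: list[str]) -> str:
    """Fill a first-type prompt template with the original text and candidate classes."""
    return template.format(classes=", ".join(candidate_classes), text=original_text)

# One possible first-type prompt template (a task description for the NER task).
template_a = (
    "Identify every named entity in the text below and assign each one a "
    "category from this set: {classes}.\nText: {text}"
)

prompt = build_ner_prompt(
    template_a,
    "Patient presents with prostatic hyperplasia.",
    ["disease", "pathology", "symptom", "etiology"],
)
# The constructed prompt would then be sent to the large language model, e.g.:
# result = call_llm(prompt)  # hypothetical LLM client, not defined here
```

Using several different templates in place of `template_a` yields the plurality of first-type prompt texts, each producing its own recognition result.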
At this time, the large language model may refer to a service model of the large language model. In practical application, the large language model can be built, and a mode of unsupervised learning is adopted to pretrain on a large-scale and label-free text data set so as to obtain a basic model of the large language model; further, named entity recognition can be used as a supervised learning task in fine tuning training, and a text data set specific to the named entity recognition task is prepared, so that a basic model of the large language model can be used as a starting point of fine tuning, and fine tuning training is performed on the text data set specific to the named entity recognition task in a supervised learning mode, so that a service model of the large language model is obtained.
It should be noted that, under the guidance of different prompt texts, the large language model may perform the named entity recognition task on the same text to be recognized and obtain recognition results that are the same or different. Therefore, for any named entity recognized from the original text, the recognition results corresponding to the respective first-type prompt texts may or may not include the same candidate entity category for that named entity. In addition, the same named entity may appear multiple times in the original text, and, influenced by context, the large language model may recognize the same or different entity categories for those occurrences. In either case, all of the different candidate entity categories that the recognition results associate with the named entity may be determined to be candidate entity categories corresponding to that named entity.
In some embodiments, the original text may be medically relevant text, such as: medical professional books, medical academic papers, disease diagnosis guidelines, medical reports of patients, clinical diagnosis records of doctors, and the like. The named entity recognition task executed for the original text can be a medical named entity recognition task.
Medical named entity recognition is one of the key technologies in the medical health field of natural language processing, with the aim of automatically identifying and classifying specific medical entities from unstructured medical text, for example: diseases, symptoms, drugs, and therapeutic procedures, and the like.
Accordingly, the candidate entity class may be an entity class corresponding to a medical named entity.
In some embodiments, the named entity recognition may be zero-shot named entity recognition (Zero-Shot NER, ZS-NER).

Zero-shot named entity recognition is a special form of named entity recognition that aims to identify entity categories that do not appear in the training data, without providing any annotated examples for these new categories. Conventional named entity recognition typically relies on a large amount of training data with entity class labels, whereas in zero-shot named entity recognition the model acquires the ability to recognize named entities belonging to new entity classes merely by understanding known entity classes and some form of class description (e.g., text description, class attributes, relationship graph, etc.), without seeing examples of the particular entity class. In this way, the time and labor costs of data annotation can be reduced.
In some embodiments, the original text may include at least one text paragraph, and overly long text may affect the model performance of the large language model. Therefore, when the large language model performs named entity recognition on the original text based on the candidate entity class set under the guidance of each first-type prompt text, specifically, each text paragraph included in the original text, the candidate entity class set, and the first-type prompt text may be input into the large language model, and the large language model performs named entity recognition on each text paragraph under the guidance of the first-type prompt text, thereby obtaining the named entity recognition result corresponding to that first-type prompt text.
Step 204: determining at least one candidate entity category in each target named entity and the corresponding candidate entity category set in the original text based on the named entity identification result, and converting the target named entity and the corresponding at least one candidate entity category into at least one view corresponding to the target named entity; wherein the views are used to indicate entity categories corresponding to named entities.
In this embodiment, continuing with the schematic diagram of the named entity recognition flow shown in fig. 3, when the named entity recognition results corresponding to the respective first-type prompt texts have been obtained, each named entity in the original text and at least one candidate entity category in the corresponding candidate entity class set may be determined based on those recognition results.
Specifically, the named entity recognition results may be integrated to determine, for each named entity in the original text, all of the different candidate entity categories associated with it across the recognition results, and these different candidate entity categories may be determined as the at least one candidate entity category corresponding to that named entity.
For example, assume that the named entity recognition result obtained by performing the named entity recognition task under the guidance of the first type prompt text a by the above large language model is "named entity: prostatic hyperplasia; entity class: disease, named entity recognition results obtained by executing the named entity recognition task under the guidance of the first type prompt text B are named entities: prostatic hyperplasia; entity class: a pathology "may then determine that one of the named entities in the original text is prostatic hyperplasia and determine that at least one candidate entity class corresponding to the named entity includes a disease and a pathology.
Under the condition that each named entity in the original text and its corresponding at least one candidate entity category have been determined, each named entity may be taken in turn (the order is not particularly limited; for example, traversal order may be used) as the target named entity, so that the target named entity and its at least one candidate entity category can be converted into at least one viewpoint corresponding to the target named entity. A viewpoint is used to indicate the entity category corresponding to a named entity, i.e., a viewpoint indicates that a named entity belongs to a certain entity category.
For example, assuming that the target named entity is prostatic hyperplasia and its at least one candidate entity class is determined to include disease and pathology, the target named entity and the two entity classes may be converted into two viewpoints whose contents are: "prostatic hyperplasia is a disease" and "prostatic hyperplasia is a pathology".
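The conversion of an entity and its candidate classes into viewpoints can be sketched as follows; the statement wording is an illustrative assumption, since the application fixes no concrete phrasing.

```python
# Hedged sketch: convert a named entity and its candidate entity classes into
# viewpoint statements ("<entity> is a <class>"). Wording is illustrative.
def to_viewpoints(entity: str, candidate_classes: list[str]) -> list[str]:
    """Build one viewpoint per candidate entity class for the target named entity."""
    return [f"{entity} is a {cls}" for cls in candidate_classes]

views = to_viewpoints("prostatic hyperplasia", ["disease", "pathology"])
# views == ["prostatic hyperplasia is a disease", "prostatic hyperplasia is a pathology"]
```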
In some embodiments, when determining each target named entity in the original text and at least one candidate entity category in the corresponding candidate entity category set based on the named entity recognition results, the number of times each target named entity in the original text is recognized as each candidate entity category in the candidate entity category set may be determined based on the named entity recognition results. For example, when the named entity recognition results are integrated, not only different candidate entity categories corresponding to the named entities in the original text, which are included in the named entity recognition results, may be determined, but also the number of times that the named entities are recognized as the candidate entity categories may be counted.
Subsequently, a preset number of candidate entity categories with the highest recognition counts (i.e., the top N, where N is the preset number) may be determined as the at least one candidate entity category corresponding to the target named entity.
For example, assuming that the target named entity is prostatic hyperplasia and its candidate entity classes are determined to include disease, pathology, and etiology, the counts show it was recognized as a disease 10 times, as a pathology 5 times, and as an etiology 3 times; assuming the preset number is 2, then since 10 > 5 > 3, the at least one candidate entity class corresponding to the target named entity may be determined to include disease and pathology.
In some embodiments, when determining each target named entity in the original text and at least one candidate entity category in the corresponding candidate entity category set based on the named entity recognition results, similarly to the foregoing, the number of times each target named entity in the original text is recognized as each candidate entity category in the candidate entity category set may be determined based on the named entity recognition results.
However, in this case, the candidate entity classes whose recognition counts reach a preset threshold may be determined as the at least one candidate entity class corresponding to the above-described target named entity.
For example, assuming that the target named entity is prostatic hyperplasia and its candidate entity classes are determined to include disease, pathology, and etiology, the counts show it was recognized as a disease 10 times, as a pathology 5 times, and as an etiology 3 times; assuming the threshold is 4, then since 10 > 5 > 4 > 3, the at least one candidate entity class corresponding to the target named entity may be determined to include disease and pathology.
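The two selection strategies above (top-N and threshold) can be sketched together as follows, using the counts from the examples; "reach the threshold" is assumed to mean a count greater than or equal to the threshold.

```python
from collections import Counter

# Hedged sketch of the two candidate-class selection strategies described above.
def select_top_n(counts: Counter, n: int) -> list[str]:
    """Keep the N candidate classes the entity was most often recognized as."""
    return [cls for cls, _ in counts.most_common(n)]

def select_by_threshold(counts: Counter, threshold: int) -> list[str]:
    """Keep every candidate class whose recognition count reaches the threshold."""
    return [cls for cls, c in counts.items() if c >= threshold]

counts = Counter({"disease": 10, "pathology": 5, "etiology": 3})
top2 = select_top_n(counts, 2)            # ["disease", "pathology"]
above4 = select_by_threshold(counts, 4)   # ["disease", "pathology"]
```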
In some embodiments, to accommodate the manner of achieving named entity recognition by performing a debate-based deductive evaluation provided by the present application, the candidate entity classes in the above-described candidate entity class set may be partitioned into debatable candidate entity classes and non-debatable candidate entity classes.
In practical applications, the same named entity is allowed to be recognized as several different non-debatable candidate entity classes, but the debatable candidate entity classes are mutually exclusive, i.e., the same named entity is not allowed to be finally recognized as more than one debatable candidate entity class; by performing a debate-based deductive evaluation over the debatable candidate entity classes, a unique debatable candidate entity class is finally determined for the named entity according to the evaluation result. For example, the candidate entity classes in the candidate entity class set may include: surgery, pathology, location, disease, treatment item, etiology, and symptom; among these, pathology, location, disease, treatment item, and symptom may be classified as debatable candidate entity classes, and surgery and etiology as non-debatable candidate entity classes.
It should be noted that, for different named entities, the division into debatable and non-debatable candidate entity categories may be the same or different.
In this case, when determining each target named entity in the original text and the at least one candidate entity category corresponding to it based on the named entity recognition results, and converting them into at least one viewpoint, specifically, each target named entity and the at least one debatable candidate entity category corresponding to it in the candidate entity class set may be determined based on the named entity recognition results, and each target named entity together with its at least one debatable candidate entity category may be converted into the at least one viewpoint corresponding to that target named entity.
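Restricting viewpoint construction to the debatable candidate classes can be sketched as a simple filter; the example partition follows the category lists above and is otherwise an illustrative assumption.

```python
# Hedged sketch: only debatable candidate classes enter the debate; the example
# partition mirrors the one given in the text and is otherwise an assumption.
DEBATABLE = {"pathology", "location", "disease", "treatment item", "symptom"}
NON_DEBATABLE = {"surgery", "etiology"}

def debatable_only(candidates: list[str]) -> list[str]:
    """Retain only the debatable candidate classes before building viewpoints."""
    return [c for c in candidates if c in DEBATABLE]

kept = debatable_only(["disease", "pathology", "etiology"])
# kept == ["disease", "pathology"]; "etiology" is non-debatable and skips the debate
```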
Step 206: and acquiring knowledge text related to the definition of the target named entity.
In this embodiment, for the above-described target named entity, knowledge text related to the definition of the target named entity may be acquired. Wherein knowledge text related to the definition of the target named entity may provide arguments corresponding to the above-described point of view, i.e. a series of statements or reasons for supporting or refuting the point of view.
In practical applications, knowledge text may refer to literal material that is dedicated to conveying, documenting, or setting forth some specialized knowledge, information, facts, concepts, principles, rules, experience, insights, and the like. Knowledge text has a clear intellectual purpose, aimed at education, instruction, reference or study, to enhance understanding of a certain topic or field.
In some embodiments, when acquiring knowledge text related to the definition of the target named entity, the target named entity may be input into a large language model, and the knowledge text may be generated by the large language model. For example, using a question-answering system based on a large language model, a query text (which may be referred to as a Query) for querying knowledge related to the definition of the target named entity may be constructed based on the target named entity, and the question-answering system performs reasoning on the query text to generate an answer text (which may be referred to as an Answer) corresponding to it; this answer text is the knowledge text related to the definition of the target named entity generated by the large language model.
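Constructing the Query text can be sketched as follows; the question wording and the commented-out `qa_system` call are illustrative assumptions, since the application fixes no concrete phrasing or API.

```python
# Hedged sketch: build a Query text asking for definition knowledge about the
# target named entity. Wording and the qa_system name are assumptions.
def build_definition_query(entity: str) -> str:
    """Construct a Query text for knowledge related to the entity's definition."""
    return f"What is the definition of {entity}, and what kind of concept is it?"

query = build_definition_query("prostatic hyperplasia")
# The query would be answered by the LLM-based question-answering system, e.g.:
# knowledge_text = qa_system.answer(query)  # hypothetical QA interface
```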
At this time, the large language model may refer to a service model of the large language model. In practical application, the large language model can be built, and a mode of unsupervised learning is adopted to pretrain on a large-scale and label-free text data set so as to obtain a basic model of the large language model; further, the question-answering system can be used as a supervised learning task in the fine tuning training, and a text data set specific to the question-answering system task is prepared, so that a basic model of the large language model can be used as a starting point of fine tuning, and the fine tuning training is performed on the text data set specific to the question-answering system task in a supervised learning mode, so that a service model of the large language model is obtained.
It should be noted that the large language model in step 202 and the large language model in step 206 may be the same large language model or different large language models.
In some embodiments, knowledge text related to the definition of the target named entity described above may be obtained in conjunction with retrieval augmentation (Retrieval Augmentation) techniques.
Retrieval enhancement is a machine learning and natural language processing technique that is mainly used to improve the performance of models, especially in generative tasks. The basic idea is to combine large-scale external knowledge bases or text data with models, and during model prediction or generation, retrieve relevant information from these external sources in real time to assist the model in making more accurate, comprehensive answers or decisions.
Step 208: inputting the at least one viewpoint and the knowledge text into a large language model, extracting arguments corresponding to each viewpoint in the at least one viewpoint from the knowledge text by the large language model, and further evaluating the accuracy of each viewpoint based on the arguments to obtain a target viewpoint with highest accuracy in the at least one viewpoint.
In this embodiment, in combination with the schematic diagram of the debate-based deductive evaluation flow shown in fig. 4, when at least one viewpoint corresponding to the above-mentioned target named entity and knowledge text related to the definition of the target named entity have been obtained, the at least one viewpoint and the knowledge text may be input into a large language model; the large language model extracts, from the knowledge text, the arguments corresponding to each of the at least one viewpoint, and then evaluates the accuracy of each viewpoint based on those arguments, so as to obtain the viewpoint with the highest accuracy among the at least one viewpoint (which may be referred to as the target viewpoint).
In practical applications, the above large language model may also determine that all views are inaccurate after the accuracy evaluation of each view, i.e., the arguments of any view are insufficient to support the view. In this case, the evaluation result output by the large language model may be "the most accurate point of view is: 0 "indicates that there is no view with the highest accuracy.
It should be noted that the above evaluation process is a debate-based deductive evaluation process.

Debate-based deductive evaluation is a method that uses a debate process to evaluate the rationality, effectiveness, or merits of an argument, opinion, decision, or scheme. It uses the form of debate to let two or more sides fully discuss, question, and refute a topic, so as to promote deep thinking, reveal potential problems, strengthen logical rigor, and finally form a comprehensive judgment on the topic.
It should be noted that a second-type prompt text may essentially be a prompt text template for triggering the large language model to perform the debate-based deductive evaluation task, i.e., the content of a second-type prompt text may be a task description for the debate-based deductive evaluation task. In a specific implementation, first, a prompt text for triggering the large language model to extract the arguments corresponding to each of the at least one viewpoint from the knowledge text may be constructed based on the at least one viewpoint and the knowledge text, and input into the large language model, so that the large language model extracts those arguments; then, based on the at least one viewpoint and the arguments, a further prompt text for triggering the large language model to evaluate the accuracy of each viewpoint based on the arguments may be constructed and input into the large language model, so that the large language model evaluates the accuracy of each viewpoint based on the arguments, obtaining the above-described target viewpoint.
When the above-mentioned large language model extracts the arguments corresponding to the viewpoints from the knowledge text, it may generate an argument by taking a text segment from the knowledge text and adding some context to it.
For example, assuming that the target named entity is prostatic hyperplasia, for the viewpoint "prostatic hyperplasia is a disease" corresponding to the target named entity, the contents of two arguments extracted from the knowledge text related to the definition of the target named entity may be: "prostatic hyperplasia refers to a noncancerous abnormal increase in prostate tissue in men; this increase may compress the urethra around the prostate, causing a series of symptoms such as frequent urination and urgent urination, which suggests that it is a disease" and "prostatic hyperplasia may also cause urinary tract infections and bladder problems such as urethral stricture and bladder stones, which further proves that it is a disease".
In some embodiments, by setting the prompt text, the large language model may be made to evaluate the correctness of a viewpoint based on its corresponding arguments from several angles: sufficiency, logical soundness, factual grounding, subject consistency, and comprehensiveness.

Sufficiency means that a correct viewpoint has a sufficient number of arguments supporting it.

Logical soundness means that a correct viewpoint should be logically reasonable, with no contradictions or paradoxes. It should be derivable through a clear chain of reasoning and able to withstand common-sense and logical inspection.

Factual grounding means that every argument for a correct viewpoint must be extracted from the given text, rather than being a subjective hypothesis or personal opinion.

Subject consistency means that every statement expressing a correct viewpoint must be consistent with the subject of that viewpoint.

Comprehensiveness means that a correct viewpoint should adequately take the relevant factors and aspects into account. It should not be one-sided or biased, but should be weighed and judged against a variety of factors, angles, and interests.
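Encoding the five criteria into an evaluation prompt can be sketched as follows; the criterion wording and template structure are illustrative assumptions, not a template fixed by the application.

```python
# Hedged sketch: assemble a second-type evaluation prompt that asks the model to
# judge each viewpoint against the five criteria. All wording is illustrative.
CRITERIA = [
    "Sufficiency: the viewpoint must be supported by enough arguments.",
    "Logical soundness: the reasoning must be free of contradictions.",
    "Factual grounding: every argument must come from the given knowledge text.",
    "Subject consistency: every statement must stay on the viewpoint's subject.",
    "Comprehensiveness: relevant factors and aspects must all be considered.",
]

def build_evaluation_prompt(viewpoints: list[str], arguments: dict) -> str:
    """List the criteria, then each viewpoint with its extracted arguments."""
    lines = ["Evaluate the accuracy of each viewpoint by these criteria:"]
    lines += [f"- {c}" for c in CRITERIA]
    for view in viewpoints:
        lines.append(f"Viewpoint: {view}")
        for arg in arguments.get(view, []):
            lines.append(f"  Argument: {arg}")
    lines.append("Answer with the single most accurate viewpoint, or 0 if none.")
    return "\n".join(lines)
```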
In some embodiments, to improve accuracy of the evaluation result, the accuracy of each viewpoint may be evaluated by a large language model based on these arguments and the knowledge text, to obtain the target viewpoint.
In some embodiments, when inputting the at least one viewpoint and the knowledge text into the large language model to obtain the target viewpoint, specifically, the at least one viewpoint, the knowledge text, and each prompt text among a plurality of different prompt texts (which may be referred to as second-type prompt texts) may be input into the large language model; under the guidance of each second-type prompt text, the large language model extracts, from the knowledge text, the arguments corresponding to each of the at least one viewpoint, and then evaluates the accuracy of each viewpoint based on those arguments, obtaining an evaluation result corresponding to each second-type prompt text.
In a specific implementation, first, a prompt text for triggering the large language model to extract the arguments corresponding to each of the at least one viewpoint from the knowledge text may be constructed based on the at least one viewpoint and the knowledge text, and input into the large language model, so that the large language model extracts those arguments; then, for any one of the plurality of different second-type prompt texts, a further prompt text for triggering the large language model to evaluate the accuracy of each viewpoint based on the arguments may be constructed based on the at least one viewpoint, the arguments, and that second-type prompt text, and input into the large language model, so that the large language model evaluates the accuracy of each viewpoint under the guidance of that second-type prompt text, obtaining the evaluation result corresponding to it.
It should be noted that, under the guidance of different prompt texts, the large language model may perform the debate-based deductive evaluation task on the same viewpoints and arguments and obtain evaluation results that are the same or different. Therefore, when the evaluation results corresponding to the respective second-type prompt texts have been obtained, the number of times each of the at least one viewpoint is evaluated as the most accurate viewpoint may be determined based on those evaluation results, and the viewpoint evaluated as most accurate the greatest number of times may be determined as the target viewpoint.
In practical application, since a debate question has no fixed answer, the large language model may also obtain different evaluation results when performing the debate-based deductive evaluation task on the same viewpoints and arguments under the guidance of the same prompt text. Therefore, the at least one viewpoint and the knowledge text may be input into the large language model, and the following steps may be performed by the large language model multiple times: extract the arguments corresponding to each of the at least one viewpoint from the knowledge text, and then evaluate the accuracy of each viewpoint based on those arguments to obtain an evaluation result. In this way, a plurality of evaluation results can likewise be obtained, so that the number of times each viewpoint is evaluated as the most accurate can be determined from these results, and the viewpoint evaluated as most accurate the greatest number of times can be determined as the above-described target viewpoint.
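The majority vote over repeated evaluation results can be sketched as follows; the "0" convention follows the no-accurate-viewpoint case described earlier, and the function name is an illustrative assumption.

```python
from collections import Counter
from typing import Optional

# Hedged sketch: pick the target viewpoint by majority vote over repeated
# debate-based evaluation rounds. "0" marks a round where the model judged
# no viewpoint accurate.
def pick_target_viewpoint(evaluation_results: list[str]) -> Optional[str]:
    """Return the viewpoint most often judged most accurate, or None."""
    votes = Counter(r for r in evaluation_results if r != "0")
    if not votes:
        return None  # no viewpoint was ever judged accurate
    return votes.most_common(1)[0][0]

results = [
    "prostatic hyperplasia is a disease",
    "prostatic hyperplasia is a disease",
    "prostatic hyperplasia is a pathology",
]
target = pick_target_viewpoint(results)
# target == "prostatic hyperplasia is a disease"
```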
Step 210: and determining the candidate entity category indicated by the target viewpoint as the entity category corresponding to the target named entity.
In this embodiment, when the target viewpoint is determined, the candidate entity class indicated by the target viewpoint may be determined as the entity class corresponding to the target named entity. Thus, the entity class corresponding to each named entity identified from the original text can be finally determined through deductive evaluation based on debate.
In the above technical scheme, an original text to be recognized, a preset candidate entity class set, and each of a plurality of different prompt texts can be input into a large language model; under the guidance of each prompt text, the large language model performs named entity recognition on the original text based on the candidate entity class set, obtaining the named entity recognition result corresponding to each prompt text, so that each named entity in the original text and its at least one corresponding candidate entity class can be determined from these results. For each target named entity in the original text, the target named entity and its at least one candidate entity class can be converted into at least one viewpoint indicating the entity class corresponding to the named entity; the at least one viewpoint and the acquired knowledge text related to the definition of the target named entity can be input into a large language model, which extracts from the knowledge text the arguments corresponding to each viewpoint and then evaluates the accuracy of each viewpoint based on those arguments, obtaining the viewpoint with the highest accuracy, so that the candidate entity class indicated by that viewpoint is determined as the entity class corresponding to the target named entity.
In this manner, after the large language model performs the primary named entity recognition task to identify each named entity and its at least one corresponding entity category from the text to be recognized, the model can evaluate the correctness of each category through a simulated debate mechanism, and the entity category evaluated as most correct is finally determined as the entity category corresponding to the named entity. This improves the accuracy of named entity recognition and ensures that the finally determined entity category satisfies the plurality of evaluation criteria set as required. In addition, guiding the large language model to perform the primary named entity recognition task with different prompting strategies improves the self-consistency of the large language model on the named entity recognition task, ensuring the consistency and stability of the named entity recognition results.
The application also provides device embodiments corresponding to the foregoing method embodiments.
Referring to fig. 5, fig. 5 is a schematic diagram illustrating the structure of an apparatus according to an exemplary embodiment of the present application. At the hardware level, the apparatus includes a processor 502, an internal bus 504, a network interface 506, a memory 508, and a non-volatile storage 510, and may of course include other hardware as needed. One or more embodiments of the application may be implemented in software, for example by the processor 502 reading a corresponding computer program from the non-volatile storage 510 into the memory 508 and then running it. Of course, in addition to software implementations, one or more embodiments of the present application do not exclude other implementations, such as a logic device or a combination of software and hardware; that is, the execution subject of the following process flows is not limited to logic modules, but may also be hardware or a logic device.
Referring to fig. 6, fig. 6 is a block diagram illustrating a named entity recognition apparatus based on a large language model according to an exemplary embodiment of the present application.
The named entity recognition device based on the large language model can be applied to the equipment shown in fig. 5 to realize the technical scheme of the application. The device comprises:
A recognition module 602, configured to input an original text to be recognized, a preset candidate entity category set, and each first-type prompt text in a plurality of different first-type prompt texts into a large language model, and to perform, by the large language model under the guidance of each first-type prompt text, named entity recognition on the original text based on the candidate entity category set to obtain a named entity recognition result corresponding to each first-type prompt text;
A conversion module 604, configured to determine, based on the named entity recognition results, each target named entity in the original text and at least one corresponding candidate entity category in the candidate entity category set, and to convert the target named entity and its at least one candidate entity category into at least one viewpoint corresponding to the target named entity; wherein each viewpoint indicates an entity category corresponding to the named entity;
an obtaining module 606, configured to obtain knowledge text related to the definition of the target named entity;
An evaluation module 608, configured to input the at least one viewpoint and the knowledge text into a large language model, to extract, by the large language model, the arguments corresponding to each viewpoint from the knowledge text, and to evaluate, based on the arguments, the accuracy of each viewpoint to obtain the target viewpoint with the highest accuracy among the at least one viewpoint;
and a determining module 610, configured to determine the candidate entity category indicated by the target viewpoint as an entity category corresponding to the target named entity.
In some embodiments, the original text is medically relevant text, and the candidate entity categories are entity categories corresponding to medical named entities.
In some embodiments, the named entity recognition is zero-shot named entity recognition.
In some embodiments, the original text comprises at least one text paragraph;
the inputting of an original text to be recognized, a candidate entity category set, and each first-type prompt text in a plurality of different first-type prompt texts into a large language model, and the performing, by the large language model under the guidance of each first-type prompt text, of named entity recognition on the original text based on the candidate entity category set to obtain a named entity recognition result corresponding to each first-type prompt text, includes:
inputting each text paragraph contained in the original text to be recognized, the candidate entity category set, and each first-type prompt text in the plurality of different first-type prompt texts into the large language model, and performing, by the large language model under the guidance of each first-type prompt text, named entity recognition on each text paragraph to obtain named entity recognition results corresponding to each first-type prompt text and each text paragraph.
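A minimal sketch of this paragraph-wise variant, assuming paragraphs are separated by blank lines and that `ner_llm` is a hypothetical model call:

```python
def recognize_by_paragraph(original_text, categories, prompts, ner_llm):
    """Run the primary NER task once per (paragraph, prompt) pair, so
    long documents are processed within the model's context window."""
    results = {}
    for p_idx, paragraph in enumerate(original_text.split("\n\n")):
        for q_idx, prompt in enumerate(prompts):
            # each result is keyed by (paragraph index, prompt index)
            results[(p_idx, q_idx)] = ner_llm(prompt, paragraph, categories)
    return results
```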
In some embodiments, the determining, based on the named entity recognition results, of each target named entity in the original text and at least one corresponding candidate entity category in the candidate entity category set includes:
determining, based on the named entity recognition results, the number of times each target named entity in the original text is identified as each candidate entity category in the candidate entity category set;
and determining a preset number of the candidate entity categories with the largest identification counts as the at least one candidate entity category corresponding to the target named entity.
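As an illustration (not the patented implementation), this top-count selection can be written with `collections.Counter`; the preset number `n` is an assumed parameter:

```python
from collections import Counter

def top_n_categories(identified_categories, n):
    """identified_categories: the category assigned to one target named
    entity in each per-prompt recognition result. Keep the n categories
    with the largest identification counts as debate candidates."""
    counts = Counter(identified_categories)
    return [category for category, _ in counts.most_common(n)]
```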
In some embodiments, the determining, based on the named entity recognition results, of each target named entity in the original text and at least one corresponding candidate entity category in the candidate entity category set includes:
determining, based on the named entity recognition results, the number of times each target named entity in the original text is identified as each candidate entity category in the candidate entity category set;
and determining each candidate entity category whose identification count reaches a preset threshold as the at least one candidate entity category corresponding to the target named entity.
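The threshold variant differs only in the selection rule; a sketch under the same assumptions:

```python
from collections import Counter

def categories_above_threshold(identified_categories, threshold):
    """Keep every candidate category whose identification count across
    the per-prompt recognition results reaches the preset threshold."""
    counts = Counter(identified_categories)
    return [category for category, count in counts.items() if count >= threshold]
```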
In some embodiments, the candidate entity categories in the candidate entity category set are divided into debatable candidate entity categories and non-debatable candidate entity categories;
the determining, based on the named entity recognition results, of each target named entity in the original text and at least one corresponding candidate entity category in the candidate entity category set, and the converting of the target named entity and its at least one candidate entity category into at least one viewpoint corresponding to the target named entity, includes:
determining, based on the named entity recognition results, each target named entity in the original text and at least one corresponding debatable candidate entity category in the candidate entity category set, and converting the target named entity and its at least one debatable candidate entity category into at least one viewpoint corresponding to the target named entity.
In some embodiments, the obtaining knowledge text related to the definition of the target named entity includes:
Inputting the target named entity into a large language model, and generating knowledge text related to the definition of the target named entity by the large language model.
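One way to elicit such definitional knowledge text is a simple prompt template; the wording below is a hypothetical example, not the template used by the application:

```python
def knowledge_prompt(entity):
    """Build a hypothetical prompt asking the model to define the
    target named entity; the model's reply serves as the knowledge text."""
    return (f"Provide a concise, textbook-style definition of the medical term "
            f"'{entity}', stating what kind of concept (disease, symptom, drug, "
            f"procedure, etc.) it denotes.")
```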
In some embodiments, the inputting of the at least one viewpoint and the knowledge text into a large language model, the extracting, by the large language model, of the arguments corresponding to each viewpoint from the knowledge text, and the evaluating, based on the arguments, of the accuracy of each viewpoint to obtain the target viewpoint with the highest accuracy among the at least one viewpoint, includes:
inputting the at least one viewpoint, the knowledge text, and each second-type prompt text in a plurality of different second-type prompt texts into the large language model, extracting, by the large language model under the guidance of each second-type prompt text, the arguments corresponding to each viewpoint from the knowledge text, and evaluating, based on the arguments, the accuracy of each viewpoint to obtain an evaluation result corresponding to each second-type prompt text;
and determining, based on the evaluation results, the number of times each viewpoint is evaluated as the most accurate, and determining the viewpoint most often evaluated as the most accurate as the target viewpoint with the highest accuracy among the at least one viewpoint.
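The final self-consistency vote over the per-prompt evaluation results can be sketched as follows; this is illustrative only, and `evaluations` is assumed to hold, for each second-type prompt, the viewpoint the model judged most accurate:

```python
from collections import Counter

def pick_target_viewpoint(evaluations):
    """Return the viewpoint most often judged the most accurate across
    the debate-style evaluations run under different second-type prompts."""
    winner, _ = Counter(evaluations).most_common(1)[0]
    return winner
```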
For the device embodiments, they essentially correspond to the method embodiments, so that reference is made to the description of the method embodiments for relevant points. The apparatus embodiments described above are merely illustrative, wherein the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the technical scheme of the application.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include volatile and non-volatile, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media, or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing describes certain embodiments of the present application. Other embodiments are within the scope of the application. In some cases, the acts or steps recited in the present application may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The terminology used in the one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the application. The singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "and/or" refers to and encompasses any or all possible combinations of one or more of the associated listed items.
The description of the terms "one embodiment," "some embodiments," "example," "specific example," or "one implementation" and the like as used in connection with one or more embodiments of the present application mean that a particular feature or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. The schematic descriptions of these terms are not necessarily directed to the same embodiment. Furthermore, the particular features or characteristics described may be combined in any suitable manner in one or more embodiments of the application. Furthermore, different embodiments, as well as specific features or characteristics of different embodiments, may be combined without contradiction.
It should be understood that while the terms first, second, third, etc. may be used in one or more embodiments of the application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of the application. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."
The foregoing description of the preferred embodiment(s) of the application is not intended to limit the embodiment(s) of the application, but is to be accorded the widest scope consistent with the principles and spirit of the embodiment(s) of the application.
The user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of related data is required to comply with the relevant laws and regulations and standards of the relevant country and region, and is provided with corresponding operation entries for the user to select authorization or rejection.
Claims (12)
1. A named entity recognition method based on a large language model, the method comprising:
inputting an original text to be recognized, a preset candidate entity category set, and each first-type prompt text in a plurality of different first-type prompt texts into a large language model, and performing, by the large language model under the guidance of each first-type prompt text, named entity recognition on the original text based on the candidate entity category set to obtain a named entity recognition result corresponding to each first-type prompt text; wherein the first-type prompt text is a prompt text template for instructing the large language model to perform a named entity recognition task;
determining, based on the named entity recognition results, each target named entity in the original text and at least one corresponding candidate entity category in the candidate entity category set, and converting the target named entity and its at least one candidate entity category into at least one viewpoint corresponding to the target named entity; wherein each viewpoint indicates an entity category corresponding to the named entity;
acquiring knowledge text related to the definition of the target named entity;
inputting the at least one viewpoint and the knowledge text into a large language model, extracting, by the large language model, the arguments corresponding to each viewpoint from the knowledge text, and evaluating, based on the arguments, the accuracy of each viewpoint to obtain the target viewpoint with the highest accuracy among the at least one viewpoint;
And determining the candidate entity category indicated by the target viewpoint as the entity category corresponding to the target named entity.
2. The method of claim 1, wherein the original text is medically relevant text, and the candidate entity categories are entity categories corresponding to medical named entities.
3. The method of claim 1, wherein the named entity recognition is zero-shot named entity recognition.
4. The method of claim 1, wherein the original text comprises at least one text paragraph;
the inputting of an original text to be recognized, a candidate entity category set, and each first-type prompt text in a plurality of different first-type prompt texts into a large language model, and the performing, by the large language model under the guidance of each first-type prompt text, of named entity recognition on the original text based on the candidate entity category set to obtain a named entity recognition result corresponding to each first-type prompt text, includes:
inputting each text paragraph contained in the original text to be recognized, the candidate entity category set, and each first-type prompt text in the plurality of different first-type prompt texts into the large language model, and performing, by the large language model under the guidance of each first-type prompt text, named entity recognition on each text paragraph to obtain named entity recognition results corresponding to each first-type prompt text and each text paragraph.
5. The method of claim 1, wherein the determining, based on the named entity recognition results, of each target named entity in the original text and at least one corresponding candidate entity category in the candidate entity category set includes:
determining, based on the named entity recognition results, the number of times each target named entity in the original text is identified as each candidate entity category in the candidate entity category set;
and determining a preset number of the candidate entity categories with the largest identification counts as the at least one candidate entity category corresponding to the target named entity.
6. The method of claim 1, wherein the determining, based on the named entity recognition results, of each target named entity in the original text and at least one corresponding candidate entity category in the candidate entity category set includes:
determining, based on the named entity recognition results, the number of times each target named entity in the original text is identified as each candidate entity category in the candidate entity category set;
and determining each candidate entity category whose identification count reaches a preset threshold as the at least one candidate entity category corresponding to the target named entity.
7. The method of claim 1, wherein the candidate entity categories in the candidate entity category set are divided into debatable candidate entity categories and non-debatable candidate entity categories;
the determining, based on the named entity recognition results, of each target named entity in the original text and at least one corresponding candidate entity category in the candidate entity category set, and the converting of the target named entity and its at least one candidate entity category into at least one viewpoint corresponding to the target named entity, includes:
determining, based on the named entity recognition results, each target named entity in the original text and at least one corresponding debatable candidate entity category in the candidate entity category set, and converting the target named entity and its at least one debatable candidate entity category into at least one viewpoint corresponding to the target named entity.
8. The method of claim 1, wherein the acquiring of knowledge text related to the definition of the target named entity includes:
Inputting the target named entity into a large language model, and generating knowledge text related to the definition of the target named entity by the large language model.
9. The method of claim 1, wherein the inputting of the at least one viewpoint and the knowledge text into a large language model, the extracting, by the large language model, of the arguments corresponding to each viewpoint from the knowledge text, and the evaluating, based on the arguments, of the accuracy of each viewpoint to obtain the target viewpoint with the highest accuracy among the at least one viewpoint, includes:
inputting the at least one viewpoint, the knowledge text, and each second-type prompt text in a plurality of different second-type prompt texts into the large language model, extracting, by the large language model under the guidance of each second-type prompt text, the arguments corresponding to each viewpoint from the knowledge text, and evaluating, based on the arguments, the accuracy of each viewpoint to obtain an evaluation result corresponding to each second-type prompt text; wherein the second-type prompt text is a prompt text template for instructing the large language model to perform a debate-based deductive evaluation task;
and determining, based on the evaluation results, the number of times each viewpoint is evaluated as the most accurate, and determining the viewpoint most often evaluated as the most accurate as the target viewpoint with the highest accuracy among the at least one viewpoint.
10. A named entity recognition device based on a large language model, the device comprising:
a recognition module, configured to input an original text to be recognized, a preset candidate entity category set, and each first-type prompt text in a plurality of different first-type prompt texts into a large language model, and to perform, by the large language model under the guidance of each first-type prompt text, named entity recognition on the original text based on the candidate entity category set to obtain a named entity recognition result corresponding to each first-type prompt text; wherein the first-type prompt text is a prompt text template for instructing the large language model to perform a named entity recognition task;
a conversion module, configured to determine, based on the named entity recognition results, each target named entity in the original text and at least one corresponding candidate entity category in the candidate entity category set, and to convert the target named entity and its at least one candidate entity category into at least one viewpoint corresponding to the target named entity; wherein each viewpoint indicates an entity category corresponding to the named entity;
an acquisition module, configured to acquire knowledge text related to the definition of the target named entity;
an evaluation module, configured to input the at least one viewpoint and the knowledge text into a large language model, to extract, by the large language model, the arguments corresponding to each viewpoint from the knowledge text, and to evaluate, based on the arguments, the accuracy of each viewpoint to obtain the target viewpoint with the highest accuracy among the at least one viewpoint;
and a determining module, configured to determine the candidate entity category indicated by the target viewpoint as the entity category corresponding to the target named entity.
11. An electronic device, comprising:
A processor;
a memory for storing processor-executable instructions;
Wherein the processor is configured to implement the method of any one of claims 1 to 9 by executing the executable instructions.
12. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410533245.4A CN118114675B (en) | 2024-04-29 | 2024-04-29 | Medical named entity recognition method and device based on large language model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118114675A CN118114675A (en) | 2024-05-31 |
CN118114675B true CN118114675B (en) | 2024-07-26 |
Family
ID=91214088
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410533245.4A Active CN118114675B (en) | 2024-04-29 | 2024-04-29 | Medical named entity recognition method and device based on large language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118114675B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118658035A (en) * | 2024-08-20 | 2024-09-17 | 腾讯科技(深圳)有限公司 | Image recognition method, device, equipment, storage medium and program product |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113204969A (en) * | 2021-05-31 | 2021-08-03 | 平安科技(深圳)有限公司 | Medical named entity recognition model generation method and device and computer equipment |
CN116842951A (en) * | 2023-06-26 | 2023-10-03 | 北京云迹科技股份有限公司 | Named entity recognition method, named entity recognition device, electronic equipment and storage medium |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1497751A4 (en) * | 2002-04-05 | 2009-10-21 | At & T Corp | Method and system for detecting and extracting named entities from spontaneous communications |
CN111353310B (en) * | 2020-02-28 | 2023-08-11 | 腾讯科技(深圳)有限公司 | Named entity identification method and device based on artificial intelligence and electronic equipment |
US11983210B2 (en) * | 2020-06-16 | 2024-05-14 | Virginia Tech Intellectual Properties, Inc. | Methods and systems for generating summaries given documents with questions and answers |
CN112001179A (en) * | 2020-09-03 | 2020-11-27 | 平安科技(深圳)有限公司 | Named entity recognition method and device, electronic equipment and readable storage medium |
CN113536793A (en) * | 2020-10-14 | 2021-10-22 | 腾讯科技(深圳)有限公司 | Entity identification method, device, equipment and storage medium |
CN114580417A (en) * | 2022-03-02 | 2022-06-03 | 联想(北京)有限公司 | Named entity identification method and device, electronic equipment and readable storage medium |
CN117350291A (en) * | 2023-10-10 | 2024-01-05 | 中国平安人寿保险股份有限公司 | Electronic medical record named entity identification method, device, equipment and storage medium |
CN117407589A (en) * | 2023-10-30 | 2024-01-16 | 复旦大学 | Model generation of anti-theory points, training and reasoning method of model and evaluation standard based on large model |
CN117273003B (en) * | 2023-11-14 | 2024-03-12 | 腾讯科技(深圳)有限公司 | Text data processing method, model training method and named entity recognition method |
CN117851610A (en) * | 2024-01-09 | 2024-04-09 | 中国邮政储蓄银行股份有限公司 | Knowledge graph construction method and device based on pre-training large language model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ignatiev et al. | From contrastive to abductive explanations and back again | |
Lehr et al. | Playing with the data: what legal scholars should learn about machine learning | |
US10198431B2 (en) | Information relation generation | |
Aung et al. | Multi-triage: A multi-task learning framework for bug triage | |
Rijcken et al. | Topic modeling for interpretable text classification from EHRs | |
Karami | Fuzzy topic modeling for medical corpora | |
CN118114675B (en) | Medical named entity recognition method and device based on large language model | |
CN112132238A (en) | Method, device, equipment and readable medium for identifying private data | |
CN118133971A (en) | Medical question-answering method and device based on large language model | |
Polignano et al. | A study of Machine Learning models for Clinical Coding of Medical Reports at CodiEsp 2020. | |
Sabbah et al. | Self-admitted technical debt classification using natural language processing word embeddings | |
CN118315012B (en) | Medical examination conclusion generation method and device based on large language model | |
Yang et al. | Threshold-learned CNN for multi-label text classification of electronic health records | |
Hasan et al. | Learning structured medical information from social media | |
Prasad et al. | Dataset mention extraction and classification | |
Nawroth et al. | Emerging named entity recognition on retrieval features in an affective computing corpus | |
Moreno-Barea et al. | Clinical Text Classification in Cancer Real-World Data in Spanish | |
Noh et al. | Document retrieval for biomedical question answering with neural sentence matching | |
Sathish et al. | Enhanced sentimental analysis using visual geometry group network-based deep learning approach | |
Sweidan et al. | Word embeddings with fuzzy ontology reasoning for feature learning in aspect sentiment analysis | |
El-allaly et al. | Adverse drug reaction mentions extraction from drug labels: an experimental study | |
DeVille et al. | Text as Data: Computational Methods of Understanding Written Expression Using SAS | |
Argaw | Amharic parts-of-speech tagger using neural word embeddings as features | |
CN118132735B (en) | Medical rule base generation method and device | |
Dai | Recognising biomedical names: Challenges and solutions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||