CN116401339B - Data processing method, device, electronic equipment, medium and program product - Google Patents

Info

Publication number
CN116401339B
CN116401339B (application CN202310668213.0A)
Authority
CN
China
Prior art keywords
information
evaluation
sample
data processing
natural language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310668213.0A
Other languages
Chinese (zh)
Other versions
CN116401339A (en)
Inventor
吴甜
黄金凤
姜文斌
陆超
徐童
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310668213.0A
Publication of CN116401339A
Application granted
Publication of CN116401339B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/3331: Query processing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/0455: Auto-encoder networks; Encoder-decoder networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure provides a data processing method, a data processing apparatus, an electronic device, a medium, and a program product. It relates to the technical field of data processing, in particular to knowledge graph technology, and more specifically to performing an information verification task using a large language model. The implementation scheme is as follows: determining first information to be verified and at least one item of second information related to the first information; processing the first information and the at least one item of second information with a trained natural language generation model based on each of a plurality of evaluation dimensions to obtain evaluation information for each of the evaluation dimensions; and determining a verification result of the first information based on the first information, the at least one item of second information, and the evaluation information, wherein the verification result indicates the authenticity of the first information.

Description

Data processing method, device, electronic equipment, medium and program product
Technical Field
The present disclosure relates to the field of data processing technology, and in particular, to a knowledge graph technology, and more particularly, to a data processing method, apparatus, electronic device, computer readable storage medium, and computer program product.
Background
The fact verification task refers to retrieving relevant knowledge from a large text corpus as evidence, and using that evidence to verify the authenticity of a statement.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.
Disclosure of Invention
The present disclosure provides a data processing method, apparatus, electronic device, computer readable storage medium, and computer program product.
According to an aspect of the present disclosure, there is provided a data processing method including: determining first information to be verified and at least one item of second information related to the first information; processing the first information and the at least one item of second information with a trained natural language generation model based on each of a plurality of evaluation dimensions to obtain evaluation information for each of the evaluation dimensions; and determining a verification result of the first information based on the first information, the at least one item of second information, and the evaluation information, wherein the verification result indicates the authenticity of the first information.
According to another aspect of the present disclosure, there is provided a data processing apparatus including: an information acquisition unit configured to determine first information to be authenticated and at least one item of second information related to the first information; an evaluation unit configured to process the first information and the at least one item of second information based on respective evaluation dimensions with a trained natural language generation model to obtain evaluation information for the plurality of evaluation dimensions, respectively; and a verification unit configured to determine a verification result of the first information based on the first information, the at least one item of second information, and the evaluation information, wherein the verification result indicates authenticity of the first information.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the aforementioned method.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the aforementioned method.
According to another aspect of the disclosure, a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the aforementioned method.
According to one or more embodiments of the present disclosure, the first information and the second information may be evaluated from multiple dimensions using a trained natural language generation model, resulting in a verification result that indicates the authenticity of the first information. With this method, by relying on the general problem-solving capability of the trained natural language generation model, the first information and the second information can be evaluated from different dimensions and with the help of external knowledge, and a verification result can be given comprehensively based on the results of those evaluations.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, in accordance with an embodiment of the present disclosure;
FIG. 2 illustrates an exemplary flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 3 illustrates an exemplary block diagram of a data processing technique according to an embodiment of the present disclosure;
FIG. 4 illustrates an exemplary block diagram of a data processing apparatus according to an embodiment of the present disclosure;
fig. 5 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another element. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.
The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable execution of methods according to embodiments of the present disclosure.
In some embodiments, server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user operating client devices 101, 102, 103, 104, 105, and/or 106 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may use client devices 101, 102, 103, 104, 105, and/or 106 to enter information and obtain verification results for the information. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that the present disclosure may support any number of client devices.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head mounted displays (such as smart glasses) and other devices. The gaming system may include various handheld gaming devices, Internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. For example only, the one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a blockchain network, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.
In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of client devices 101, 102, 103, 104, 105, and/or 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and/or 106.
In some implementations, the server 120 may be a server of a distributed system or a server that incorporates a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the drawbacks of difficult management and weak service scalability found in traditional physical host and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The databases 130 may reside in various locations. For example, a database used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with the server 120 via a network-based or dedicated connection. The databases 130 may be of different types. In some embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.
In some embodiments, one or more of databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
According to the related art, fact verification methods generally adopt only simple literal or semantic matching operations and have no interpretability. Their effectiveness is limited to a certain extent, and they require training on a large-scale supervised corpus to achieve practical performance.
According to another related art, the ProoFVer verification algorithm uses a seq2seq model to generate natural-logic-based reasoning as a proof. Specifically, the technique constructs a proof by combining spans of multiple evidence sentences with the entity mentions that link those sentences. The authenticity of the statement is verified by following certain derivation rules over this proof. In this way, ProoFVer first provides better interpretability, and its effectiveness is also greatly improved compared with prior methods that perform fact verification using only evidence.
However, in the related art, information is mined only from the data itself, and training based on a large-scale supervision corpus is required.
In order to improve the decision-making effect of the fact verification task, the present disclosure provides a new data processing method.
Fig. 2 shows an exemplary flowchart of a data processing method according to an embodiment of the present disclosure.
In step S202, first information to be authenticated and at least one item of second information related to the first information are determined.
In step S204, the first information and the at least one item of second information are processed with the trained natural language generation model based on each of a plurality of evaluation dimensions to obtain evaluation information for each of the evaluation dimensions.
In step S206, a verification result of the first information is determined based on the first information, the at least one item of second information, and the evaluation information, wherein the verification result indicates the authenticity of the first information.
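The three steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `query_model` is a stand-in for any trained natural language generation model, and all names and prompt strings here are assumptions.

```python
from typing import Callable, Dict, List, Tuple

def query_model(prompt: str) -> str:
    """Placeholder for a trained natural language generation model.
    A real system would call an LLM here; this stub just echoes."""
    return "evaluation of: " + prompt[:40]

def verify_claim(
    claim: str,
    evidence: List[str],
    dimensions: List[str],
    model: Callable[[str], str] = query_model,
) -> Tuple[Dict[str, str], str]:
    # S204: obtain evaluation information for each evaluation dimension.
    evaluations = {
        dim: model(f"[{dim}] claim: {claim}; evidence: {'; '.join(evidence)}")
        for dim in dimensions
    }
    # S206: determine the verification result from the claim, the evidence,
    # and the per-dimension evaluation information.
    verdict = model(
        "verdict for claim: " + claim + " given: " + " | ".join(evaluations.values())
    )
    return evaluations, verdict
```

In use, `dimensions` would name the evaluation dimensions discussed below (text semantics, text logic, proving force, claim-evidence relationship), with one model call per dimension and a final call that aggregates them.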
With the data processing method provided by the embodiments of the present disclosure, the first information and the second information can be evaluated from multiple dimensions using the trained natural language generation model, thereby obtaining a verification result that indicates the authenticity of the first information. By relying on the general problem-solving capability of the trained natural language generation model, the first information and the second information can be evaluated from different dimensions and with the help of external knowledge, and a verification result can be given comprehensively based on the results of those evaluations.
The principles of the present disclosure will be described in detail below.
In step S202, first information to be verified and at least one item of second information related to the first information may be determined.
Here, the first information may be declaration (claim) information, and the second information may be evidence information whose content is related to the content of the declaration. For example, the first information may be the statement "perspiration is not equal to burning fat", and the second information may be the evidence "the only way fat disappears is to increase your heart rate through aerobic exercise". The first information and the second information may be read from a database storing such information, obtained from the network, or provided through user input. The embodiments of the present disclosure do not limit the specific manner of acquiring the first information and the second information.
In step S204, the first information and the at least one second information may be processed with the trained natural language generation model based on the respective evaluation dimensions to obtain evaluation information for the plurality of evaluation dimensions, respectively.
The natural language generation model may be a large language model (Large Language Model, LLM, also referred to as a large model). A large language model is a natural language generation model trained on a large-scale data set, such as a generative pre-trained Transformer model. A large-scale data set here refers to a data set whose amount of training data is at least on the order of gigabytes; a natural language generation model trained in this manner has parameters at least on the order of billions. In some examples, the natural language generation model may be a question-answering model, such as a generative dialog model. A natural language generation model trained on a large-scale data set has a certain general problem-solving capability and can understand and judge the information currently being processed with the help of external knowledge, without the need to annotate a corpus for the fact verification task and train the model on the annotated corpus.
In some embodiments, the plurality of evaluation dimensions may include one or more of the following: text semantics, text logic, proof force, and a relationship between the first information and each of the second information. By understanding and judging the contents of the first information and the second information from different evaluation dimensions, the authenticity of the first information can be comprehensively verified from a plurality of angles.
Text semantics may include an interpretation of at least one word in the first information and/or the second information. By interpreting at least one word in the first information and/or the second information and giving a corresponding definition, the model can help the reader understand the text meaning of the whole declaration information or evidence information.
The text logic may include internal consistency of the first information and/or the second information. For example, language logic, information rationality, validity, and the like of the first information and/or the second information may be evaluated to obtain an evaluation result of whether the content of the first information and/or the second information has internal consistency. The internal consistency can embody the reliability of the information and influence the verification result of the information authenticity.
The proving force may include at least one of credibility, integrity, accuracy, timeliness, bias, and integration result of the second information. By evaluating the proving power of the second information, it can be evaluated whether the second information can provide the capability for verifying the authenticity of the first information.
The relationship between the first information and each of the second information may also be evaluated. For example, the relationship between the first information and the second information may be one of implication, equivalence, conflict, and neutrality. By evaluating the relationship between the first information and each of the second information, the authenticity of the content of the first information can be evaluated based on the content of the second information.
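As a minimal sketch, the four claim-evidence relations named above can be modeled as an enumeration. Treating implication and equivalence as supporting relations is an illustrative assumption, not a rule stated in the source.

```python
from enum import Enum

class Relation(Enum):
    """The four relations between first information (claim) and
    second information (evidence) listed in the text."""
    IMPLICATION = "implication"
    EQUIVALENCE = "equivalence"
    CONFLICT = "conflict"
    NEUTRALITY = "neutrality"

def supports_claim(relation: Relation) -> bool:
    # Assumption: implication and equivalence count as support, while
    # conflict opposes the claim and neutrality is uninformative.
    return relation in (Relation.IMPLICATION, Relation.EQUIVALENCE)
```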
In some embodiments, the evaluation information for the multiple dimensions obtained in step S204 may include at least one of: knowledge information for each evaluation dimension; and preliminary verification information of the authenticity of the first information for each evaluation dimension.
In some examples, the first information to be verified and/or the second information related to the first information may be input into the natural language generation model, and the natural language generation model may be used to generate knowledge information for each evaluation dimension. The knowledge information of each evaluation dimension describes, in natural language, the specific content of the evaluation result for that dimension. In some examples, a prompt that controls the amount of information in the model's reply, such as a limit on the number of words in the answer, may be added to the input.
An example of determining the knowledge information for each dimension is given below.
For the evaluation dimension of the text semantics of the first information, the following may be input to the natural language generation model: "Please indicate which words in the following statement need to be interpreted and give a corresponding interpretation; interpreting these words helps people understand the meaning of the whole statement. Keep the answer as clear and concise as possible, within 100 words, and avoid repeating the given text. Statement: perspiration is not equal to fat burning." The reply of the natural language generation model is determined as the knowledge information of the text semantics of the first information.
For the evaluation dimension of the text logic of the first information, the following may be input to the natural language generation model: "Please evaluate the internal consistency of the following statement, including the logic of the statement and the rationality and validity of the information. When evaluating the logic of the statement, consider only the content of the statement itself, without resorting to external common-sense knowledge. When judging the rationality and validity of the information in the statement, consider giving a scientific judgment about the statement with the help of external knowledge or based on what you already know. Keep the answer as clear and concise as possible, within 100 words, and avoid repeating the given text. Statement: perspiration is not equal to fat burning." The reply of the natural language generation model is determined as the knowledge information of the text logic of the first information.
For the evaluation dimension of the proving force of the second information, the following may be input to the natural language generation model: "Please evaluate, for each piece of evidence in the following evidence set, its credibility, integrity, accuracy, timeliness, and degree of bias with respect to verifying the authenticity of the statement. Do not give relationships between different pieces of evidence and do not cross-interpret multiple pieces of evidence; judge each piece of evidence with respect to the statement alone (each piece of evidence exists only to help verify the authenticity of the statement). The judgment you give should read naturally and accord with human reasoning. Keep the answer as clear and concise as possible, within 300 words, and avoid repeating the given text. Evidence: the only way fat disappears is to increase your heart rate through aerobic exercise." The reply of the natural language generation model is determined as the knowledge information of the proving force of the second information (the credibility, integrity, accuracy, timeliness, and bias of the second information).
For the evaluation dimension of the proving force of the second information, the natural language generation model may also be given the following input: "Please integrate the evidence set shown below and give an understanding and judgment of the integrated content. Do not simply splice the evidence together during integration; the result should accord with human reasoning, and the judgment given should read naturally. Keep the answer as clear and concise as possible, within 300 words, and avoid repeating the given text. Evidence: the only way fat disappears is to increase your heart rate through aerobic exercise." The reply of the natural language generation model is determined as the knowledge information of the proving force of the second information (the integration result of the second information).
For the evaluation dimension of the relationship between the first information and each item of second information, the following may be input to the natural language generation model: "Given the statement and evidence set shown below, please evaluate the relationship between the statement and each piece of evidence in the evidence set and determine which of implication, equivalence, conflict, and neutrality it is. Please also conclude whether you think each piece of evidence supports the statement, opposes it, or provides insufficient information for verifying its authenticity. Keep the answer as clear and concise as possible, within 300 words, and avoid repeating the given text. Statement: perspiration is not equal to fat burning. Evidence: the only way fat disappears is to increase your heart rate through aerobic exercise." The reply of the natural language generation model is determined as the knowledge information of the relationship between the first information and each item of second information.
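The prompt patterns above can be collected into per-dimension templates. The wording below paraphrases the examples rather than quoting them, and the dimension keys are hypothetical names, not identifiers from the source.

```python
# Illustrative prompt templates for the evaluation dimensions described above.
PROMPT_TEMPLATES = {
    "text_semantics": (
        "Indicate which words in the following statement need interpretation "
        "and give concise definitions (within 100 words). Statement: {claim}"
    ),
    "text_logic": (
        "Evaluate the internal consistency of the following statement, "
        "including its logic, rationality, and validity (within 100 words). "
        "Statement: {claim}"
    ),
    "proving_force": (
        "For each piece of evidence, assess its credibility, integrity, "
        "accuracy, timeliness, and bias with respect to the statement "
        "(within 300 words). Statement: {claim} Evidence: {evidence}"
    ),
    "relationship": (
        "For each piece of evidence, decide whether its relation to the "
        "statement is implication, equivalence, conflict, or neutrality "
        "(within 300 words). Statement: {claim} Evidence: {evidence}"
    ),
}

def build_prompt(dimension: str, claim: str, evidence: str = "") -> str:
    """Fill in the template for one evaluation dimension."""
    return PROMPT_TEMPLATES[dimension].format(claim=claim, evidence=evidence)
```

Each filled template would be sent to the natural language generation model, and its reply kept as the knowledge information for that dimension.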
In other examples, the first information to be verified and/or the second information related to the first information may be input into a natural language generation model, and preliminary verification information on the authenticity of the first information may be generated for each evaluation dimension using the natural language generation model. In this example, the length of the evaluation result output by the natural language generation model is controlled by presenting the result in the form of options. In an example, the natural language generation model outputs preliminary verification information for verifying the authenticity of the claim from the different evaluation dimensions.
An example of determining the preliminary verification information for the respective dimensions is given below. In some implementations, knowledge information for each evaluation dimension may be input into the natural language generation model to obtain the preliminary verification information. In some examples, the knowledge information for each evaluation dimension may be generated by the natural language generation model.
For the evaluation dimension of text semantics, the following may be input into the natural language generation model: "It is known that verification of the authenticity of a claim can be subdivided into the following five labels. Completely correct: statements based on reliable evidence and facts that are widely recognized as correct. Mostly correct: statements that are primarily correct but may involve some minor uncertainty or dispute. Uncertain: statements whose correctness or incorrectness cannot be determined, typically because sufficient information or evidence is lacking. Mostly incorrect: statements that are primarily incorrect but may involve some minor correctness or dispute. Completely incorrect: statements based on reliable evidence and facts that are widely regarded as incorrect. From dimension 1, i.e., from the perspective of text understanding of the statement and the evidence respectively, which of these five labels do you consider this statement to be? Please select one answer from these five labels and do not give your interpretation process. Statement: perspiration is not equal to fat burning. Evidence: the only way fat disappears is to increase your heart rate through aerobic exercise." Based on the above input information, the natural language generation model will give preliminary verification information on the authenticity of the claim in terms of text semantics, including a selected one of the five labels. In some examples, the information input into the natural language generation model may also include knowledge information of the text semantic dimension. The input knowledge information may be user-defined or may be generated using the natural language generation model.
For the evaluation dimension of text logic, the following may be input into the natural language generation model: "It is known that verification of the authenticity of a claim can be subdivided into the following five labels. Completely correct: statements based on reliable evidence and facts that are widely recognized as correct. Mostly correct: statements that are primarily correct but may involve some minor uncertainty or dispute. Uncertain: statements whose correctness or incorrectness cannot be determined, typically because sufficient information or evidence is lacking. Mostly incorrect: statements that are primarily incorrect but may involve some minor correctness or dispute. Completely incorrect: statements based on reliable evidence and facts that are widely regarded as incorrect. From dimension 2, i.e., from the perspective of assessing the rationality and validity of the information in the statement and the evidence respectively — judging whether the statement and the evidence are scientific by means of external knowledge or the knowledge you already have — which of these five labels do you consider this statement to be? Please select one answer from these five labels and do not give your interpretation process. Statement: perspiration is not equal to fat burning. Evidence: the only way fat disappears is to increase your heart rate through aerobic exercise." Based on the above input information, the natural language generation model will give preliminary verification information on the authenticity of the claim in terms of text logic, including a selected one of the five labels. In some examples, the information input into the natural language generation model may also include knowledge information of the text logic dimension. The input knowledge information may be user-defined or may be generated using the natural language generation model.
For the evaluation dimension of the proving power of the evidence, the following may be input into the natural language generation model: "It is known that verification of the authenticity of a claim can be subdivided into the following five labels. Completely correct: statements based on reliable evidence and facts that are widely recognized as correct. Mostly correct: statements that are primarily correct but may involve some minor uncertainty or dispute. Uncertain: statements whose correctness or incorrectness cannot be determined, typically because sufficient information or evidence is lacking. Mostly incorrect: statements that are primarily incorrect but may involve some minor correctness or dispute. Completely incorrect: statements based on reliable evidence and facts that are widely regarded as incorrect. From dimension 3, i.e., integrating the information of all evidence in the evidence set without simply splicing the evidence together during integration, which of these five labels do you consider this statement to be? Please select one answer from these five labels and do not give your interpretation process. Statement: perspiration is not equal to fat burning. Evidence: the only way fat disappears is to increase your heart rate through aerobic exercise." Based on the above input information, the natural language generation model will give preliminary verification information on the authenticity of the claim in terms of the proving power of the evidence, including a selected one of the five labels. In some examples, the information input into the natural language generation model may also include knowledge information of the proving-power dimension. The input knowledge information may be user-defined or may be generated using the natural language generation model.
For the evaluation dimension of the relationship between the first information and the second information, the following may be input into the natural language generation model: "It is known that verification of the authenticity of a claim can be subdivided into the following five labels. Completely correct: statements based on reliable evidence and facts that are widely recognized as correct. Mostly correct: statements that are primarily correct but may involve some minor uncertainty or dispute. Uncertain: statements whose correctness or incorrectness cannot be determined, typically because sufficient information or evidence is lacking. Mostly incorrect: statements that are primarily incorrect but may involve some minor correctness or dispute. Completely incorrect: statements based on reliable evidence and facts that are widely regarded as incorrect. From dimension 4, i.e., from the perspective of evaluating the relationship between the statement and each piece of evidence — strong support, weak support, neutral, strong conflict, or weak conflict — which of these five labels do you consider this statement to be? Please select one answer from these five labels and do not give your interpretation process. Statement: perspiration is not equal to fat burning. Evidence: the only way fat disappears is to increase your heart rate through aerobic exercise." Based on the above input information, the natural language generation model will give preliminary verification information on the authenticity of the claim in terms of the relationship between the claim and the evidence, including a selected one of the five labels. In some examples, the information input into the natural language generation model may also include knowledge information of the dimension of the relationship between the declaration and the evidence. The input knowledge information may be user-defined or may be generated using the natural language generation model.
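The four option-style prompts above differ only in the dimension-specific question, so they can be produced by one template. The sketch below is an illustrative assumption of how such a template might look; the label strings paraphrase the five labels described above and the function name is hypothetical.

```python
# The five labels the model is asked to choose from (paraphrased).
LABELS = [
    "completely correct",
    "mostly correct",
    "uncertain",
    "mostly incorrect",
    "completely incorrect",
]

def build_label_prompt(claim, evidence, dimension_instruction):
    """Assemble the option-style prompt: pick one of five labels, no explanation."""
    label_lines = "\n".join(f"- {label}" for label in LABELS)
    return (
        "It is known that verification of the authenticity of a claim can be "
        f"subdivided into the following five labels:\n{label_lines}\n"
        f"{dimension_instruction}\n"
        "Please select one answer from these five labels and do not give your "
        "interpretation process.\n"
        f"Statement: {claim}\n"
        f"Evidence: {evidence}"
    )

# Example: the text-semantics (dimension 1) variant.
label_prompt = build_label_prompt(
    claim="Perspiration is not equal to fat burning.",
    evidence="The only way fat disappears is to raise your heart rate through aerobic exercise.",
    dimension_instruction="From dimension 1, the perspective of text understanding "
    "of the statement and the evidence, which of these five labels do you "
    "consider this statement to be?",
)
```

Because the model is constrained to answer with one label, the length of the evaluation result is controlled as described above.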
In step S206, a verification result of the first information may be determined based on the first information, the at least one item of second information, and the evaluation information, wherein the verification result indicates authenticity of the first information.
In some embodiments, information to be verified for the first information may be determined based on the first information, the at least one item of second information, and the evaluation information, and the verification result may be determined based on the information to be verified. In some examples, the information to be verified may be obtained by directly concatenating or combining the first information, the at least one item of second information, and the evaluation information. In other examples, the first information, the at least one item of second information, and the evaluation information may first be integrated, pruned, transformed, or the like, and the information to be verified may then be generated.
In some implementations, the information to be verified may be classified to obtain the verification result. The information to be verified can be input into a trained classification model, and the verification result can be determined based on the classification result of the classification model. The information to be verified can be obtained by directly splicing the first information, the at least one item of second information, and the evaluation information. The classification model may be any classification model commonly used in the art, such as logistic regression, a support vector machine, or a convolutional neural network. The classification model may be configured to output one of three classification results: "correct", "incorrect", and "uncertain". The label of the classification result may correspond to the determined verification result.
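The splice-and-classify step can be outlined as follows. This is a minimal sketch assuming a generic classifier with a scikit-learn-style `predict` interface; the classifier itself, the separator token, and all names are illustrative, and the stub stands in for a trained model.

```python
VERDICTS = ("correct", "incorrect", "uncertain")

def verify_by_classification(classifier, claim, evidence_items, evaluation_info):
    """Directly splice claim, evidence, and evaluation info, then classify."""
    to_verify = " [SEP] ".join([claim, *evidence_items, *evaluation_info])
    label = classifier.predict([to_verify])[0]
    if label not in VERDICTS:
        raise ValueError(f"unexpected label: {label!r}")
    return label

# Stub standing in for a trained three-way classification model.
class StubClassifier:
    def predict(self, texts):
        return ["incorrect"] * len(texts)

verdict = verify_by_classification(
    StubClassifier(),
    "Perspiration is equal to fat burning.",
    ["Sweat volume has no necessary relationship to the amount of fat burned."],
    ["The evidence conflicts with the claim."],
)
# verdict == "incorrect"
```

In practice the stub would be replaced by a trained model whose three output labels map directly to the verification result.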
In other implementations, the information to be verified may be input into a natural language generation model to obtain the verification result. In some examples, the information to be verified may be directly input into the natural language generation model to obtain the verification result. For example, in the case where the natural language generation model does not generate knowledge information for each evaluation dimension but only gives an evaluation result in the form of an option, the information to be verified may be directly input into the natural language generation model to obtain the verification result. In other examples, the natural language generation model may be adapted based on a small number of examples (few-shot prompting) in order to provide the natural language generation model with the ability to infer the verification result from the information to be verified. Sample information may be used to determine the content of the few-shot examples. In the case where the natural language generation model generates knowledge information for each evaluation dimension, the natural language generation model can be fine-tuned using a specific reasoning process.
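A hedged sketch of the few-shot construction mentioned above: sample (claim, evidence, evaluation, verdict) tuples are rendered as in-context examples so the model can infer a verification result for the new input. The field names and formatting are assumptions for illustration, not from the disclosure.

```python
def build_few_shot_prompt(examples, claim, evidence, evaluation):
    """Render solved samples as in-context examples, then append the open query."""
    parts = []
    for ex in examples:
        parts.append(
            f"Claim: {ex['claim']}\n"
            f"Evidence: {ex['evidence']}\n"
            f"Evaluation: {ex['evaluation']}\n"
            f"Verdict: {ex['verdict']}\n"
        )
    # The final block has no verdict; the model is expected to complete it.
    parts.append(
        f"Claim: {claim}\n"
        f"Evidence: {evidence}\n"
        f"Evaluation: {evaluation}\n"
        "Verdict:"
    )
    return "\n".join(parts)

few_shot = build_few_shot_prompt(
    examples=[{
        "claim": "Perspiration is equal to burning fat.",
        "evidence": "Weight lost through heavy sweating is water, not fat.",
        "evaluation": "The evidence conflicts with the claim.",
        "verdict": "incorrect",
    }],
    claim="Perspiration is not equal to fat burning.",
    evidence="The only way fat disappears is to raise your heart rate through aerobic exercise.",
    evaluation="The evidence supports the claim.",
)
```

The completed verdict token produced by the model would then be taken as the verification result.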
The following steps may be performed before inputting the information to be verified into the natural language generation model: determining a plurality of sample evaluation information for a plurality of evaluation dimensions for the first sample information based on the first sample information and the at least one item of second sample information; determining a sample reasoning process for verifying the first sample information based on the sample evaluation information; and inputting the sample reasoning process into the natural language generation model.
In some examples, first sample information and at least one item of second sample information related to the first sample information may be determined.
For example, the first sample information may include a sample declaration: perspiration is equal to burning fat.
The at least one second sample information may include a sample evidence set:
Evidence 1: but this is not the case; there is not necessarily a relationship between how much you perspire and the amount of fat burned;
Evidence 2: fat is calories stored in the body, and the only way to make it disappear is to increase your heart rate through aerobic exercise and put yourself in a state of caloric deficit, rather than measuring how much sweat is produced;
Evidence 3: so perspiration cannot prove that fat is burning, and heavy perspiration does not indicate a good fat-burning effect;
Evidence 4: indeed, heavy perspiration can be accompanied by weight loss, but what is lost is water in the body rather than fat.
Based on the aforementioned plurality of evaluation dimensions, a plurality of sample evaluation information for the plurality of evaluation dimensions may be determined for the first sample information based on the first sample information and the at least one item of second sample information. For example, the text of the sample declaration and the sample evidence set may be semantically understood, the credibility, integrity, accuracy, timeliness, and bias of each piece of evidence in the sample evidence set may be evaluated, and the relationship between the sample declaration and each piece of evidence in the sample evidence set may be evaluated.
A sample reasoning process for verifying the first sample information may then be determined based on the sample evaluation information. In an example, the authenticity of the sample claim can be verified through reasoning based on the following steps:
Step 1: Understand the declaration and the evidence. The statement is "perspiration is equal to fat burning", and evidence 1, 2, 3, and 4 each provide information related to the statement that refutes it;
Step 2: Judge the evidence. Based on the credibility, integrity, accuracy, timeliness, and degree of bias of the evidence, it is concluded that the four pieces of evidence have high credibility and accuracy without obvious bias;
Step 3: Analyze the relationship between the declaration and the evidence. A conflict relationship exists between the statement and evidence 1, 2, 3, and 4, all of which refute the statement;
In summary, the evidence provides information related to the statement and refutes it, and the evidence has high credibility and accuracy. We can therefore conclude that the statement "perspiration is equal to fat burning" is refuted.
The declaration, evidence set, and sample reasoning process described above may be input into the natural language generation model to fine-tune it, so that the natural language generation model learns the ability to perform authenticity-verification reasoning based on declarations, evidence, and evaluation information.
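One such fine-tuning record, pairing the sample claim and evidence set above with the three-step sample reasoning process as the training target, might be structured as follows. The field names are illustrative assumptions; the disclosure does not prescribe a serialization format.

```python
# Hypothetical structure of a single fine-tuning record; the "target_reasoning"
# field carries the sample reasoning process the model should learn to emit.
sample_record = {
    "input": {
        "claim": "Perspiration is equal to burning fat.",
        "evidence": [
            "There is not necessarily a relationship between how much you "
            "perspire and the amount of fat burned.",
            "Fat is stored calories; the only way to make it disappear is to "
            "raise your heart rate through aerobic exercise and stay in a "
            "caloric deficit.",
            "Perspiration cannot prove that fat is burning.",
            "Weight lost through heavy sweating is water, not fat.",
        ],
    },
    "target_reasoning": [
        "Step 1: Understand the claim and evidence; all four pieces of "
        "evidence provide information that refutes the claim.",
        "Step 2: Judge the evidence; it has high credibility and accuracy "
        "without obvious bias.",
        "Step 3: Analyze the relationships; every piece of evidence conflicts "
        "with the claim.",
        "Conclusion: the claim 'perspiration is equal to burning fat' is refuted.",
    ],
}
```

A corpus of such records would be fed to the model's fine-tuning procedure so that it reproduces this style of step-by-step verification reasoning.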
FIG. 3 illustrates an exemplary block diagram of a data processing technique according to an embodiment of the present disclosure.
As shown in fig. 3, in block 310, first information to be verified (claim 311) and second information for verifying the first information (proof 312) may be determined.
At block 320, input information for obtaining assessment information from a plurality of assessment dimensions may be determined. The input information can cause the trained natural language generation model to output an evaluation result according to requirements in the input information.
At block 330, the input information may be processed using a natural language generation model to obtain evaluation results 340 for each evaluation dimension. The evaluation result may be generated using step S204 described in connection with fig. 2.
At block 350, a verification result for the first information may be generated based on the first information, the second information, and the evaluation result. The verification result may be generated using step S206 described in connection with fig. 2.
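The flow through blocks 310–350 can be sketched end to end as below, with the trained model and the final verifier passed in as callables. This is a minimal sketch; the function names and the stub callables are assumptions for illustration.

```python
def verify_claim(claim, evidence, dimensions, generate, decide):
    """Run the Fig. 3 pipeline: per-dimension inputs -> evaluations -> verdict."""
    # Block 320: build one input per evaluation dimension.
    inputs = [f"{dim}\nClaim: {claim}\nEvidence: {evidence}" for dim in dimensions]
    # Block 330: obtain an evaluation result per dimension (cf. step S204).
    evaluations = [generate(text) for text in inputs]
    # Block 350: combine claim, evidence, and evaluations into a verdict (cf. S206).
    return decide(claim, evidence, evaluations)

# Stub callables standing in for the trained model and the verifier.
result = verify_claim(
    "Perspiration is not equal to fat burning.",
    "The only way fat disappears is to raise your heart rate through aerobic exercise.",
    ["text semantics", "text logic", "proving power", "claim-evidence relationship"],
    generate=lambda text: "mostly correct",
    decide=lambda c, e, evs: "correct" if evs.count("mostly correct") >= 3 else "uncertain",
)
# result == "correct"
```

The `decide` step could be either of the two variants described above: a classification model over the spliced information, or the natural language generation model itself.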
Fig. 4 illustrates an exemplary block diagram of a data processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 4, the data processing apparatus 400 may include an information acquisition unit 410, an evaluation unit 420, and a verification unit 430.
The information acquisition unit 410 may be configured to determine first information to be authenticated and at least one item of second information related to the first information.
The evaluation unit 420 may be configured to process the first information and the at least one item of second information based on a plurality of evaluation dimensions with a trained natural language generation model to obtain evaluation information for the plurality of evaluation dimensions, respectively.
The verification unit 430 may be configured to determine a verification result of the first information based on the first information, the at least one item of second information, and the evaluation information, wherein the verification result indicates authenticity of the first information.
In some embodiments, the plurality of evaluation dimensions may include one or more of the following: text semantics, text logic, proof of second information, and relationships between the first information and each of the second information.
In some embodiments, the text semantics may include an interpretation for at least one word in the first information and/or the second information.
In some embodiments, the text logic may include internal consistency of the first information and/or the second information.
In some embodiments, the proving power may include at least one of the credibility, integrity, accuracy, timeliness, bias, and multi-text integration result of the second information.
In some embodiments, the evaluation information for the plurality of evaluation dimensions may include at least one of: knowledge information for each evaluation dimension; and preliminary verification information of the authenticity of the first information for each evaluation dimension.
In some embodiments, the verification unit may be configured to: determining information to be verified for the first information based on the first information, the at least one item of second information, and the evaluation information; and determining a verification result based on the information to be verified.
In some embodiments, determining the verification result based on the information to be verified may include: classifying the information to be verified to obtain a verification result.
In some embodiments, determining the verification result based on the information to be verified may include: inputting the information to be verified into a natural language generation model to obtain a verification result.
In some embodiments, the data processing apparatus may further comprise a training unit, which may be configured to: determine first sample information and at least one item of second sample information related to the first sample information; determine a plurality of sample evaluation information for a plurality of evaluation dimensions for the first sample information based on the first sample information and the at least one item of second sample information; determine a sample reasoning process for verifying the first sample information based on the sample evaluation information; and input the sample reasoning process into the natural language generation model.
In some embodiments, the natural language generation model may be a question-answer model.
In some embodiments, the first information may be declarative information and the at least one second information may be evidence information.
It should be appreciated that the various modules or units of the apparatus 400 shown in fig. 4 may correspond to the various steps in the method 200 described with reference to fig. 2. Thus, the operations, features, and advantages described above with respect to method 200 are equally applicable to apparatus 400 and the modules and units that it comprises, and certain operations, features, and advantages are not described here in detail for brevity.
Although specific functions are discussed above with reference to specific modules, it should be noted that the functions of the various units discussed herein may be divided into multiple units and/or at least some of the functions of the multiple units may be combined into a single unit.
It should also be appreciated that various techniques may be described herein in the general context of software and hardware elements or program modules. The various elements described above with respect to fig. 4 may be implemented in hardware or in hardware in combination with software and/or firmware. For example, the units may be implemented as computer program code/instructions configured to be executed in one or more processors and stored in a computer-readable storage medium. Alternatively, these units may be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of the units 410-430 may be implemented together in a System on Chip (SoC). The SoC may include an integrated circuit chip including one or more components of a processor (e.g., a Central Processing Unit (CPU), microcontroller, microprocessor, Digital Signal Processor (DSP), etc.), memory, one or more communication interfaces, and/or other circuitry, and may optionally execute received program code and/or include embedded firmware to perform functions.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data processing method according to an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a data processing method according to an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements a data processing method according to embodiments of the present disclosure.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
Referring to fig. 5, a block diagram of an electronic device 500 that may be a server or a client of the present disclosure, which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 may also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in electronic device 500 are connected to I/O interface 505, including: an input unit 506, an output unit 507, a storage unit 508, and a communication unit 509. The input unit 506 may be any type of device capable of inputting information to the electronic device 500, the input unit 506 may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 507 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. Storage unit 508 may include, but is not limited to, magnetic disks, optical disks. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices over a computer network such as the internet and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as bluetooth devices, 802.11 devices, wiFi devices, wiMax devices, cellular communication devices, and/or the like.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of method 200 described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method 200 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but is defined only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in a different order than described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (19)

1. A data processing method, comprising:
determining first information to be verified and at least one item of second information related to the first information;
processing the first information and the at least one item of second information based on a plurality of evaluation dimensions with a trained natural language generation model to obtain respective evaluation information for each evaluation dimension, wherein the evaluation information comprises knowledge information for each evaluation dimension that describes, in a natural language manner, an evaluation result of the first information and each item of second information for that evaluation dimension, and wherein the plurality of evaluation dimensions comprises at least two of: text semantics, text logic, proof of the second information, and a relationship between the first information and each item of second information; and
determining a verification result of the first information based on the first information, the at least one item of second information, and the evaluation information, wherein the verification result indicates the authenticity of the first information,
wherein determining the verification result of the first information based on the first information, the at least one item of second information, and the evaluation information comprises:
determining information to be verified for the first information based on the first information, the at least one item of second information, and the evaluation information; and
inputting the information to be verified into the natural language generation model to obtain the verification result,
wherein, before inputting the information to be verified into the natural language generation model, the method further comprises:
determining first sample information and at least one item of second sample information related to the first sample information;
determining a plurality of pieces of sample evaluation information for the plurality of evaluation dimensions for the first sample information based on the first sample information and the at least one item of second sample information;
determining a sample reasoning process for verifying the first sample information based on the sample evaluation information; and
inputting the first sample information, the second sample information, and the sample reasoning process into the natural language generation model, so that the natural language generation model learns a capability of performing authenticity verification reasoning.
2. The data processing method of claim 1, wherein the relationship between the first information and each item of second information is one of entailment, equivalence, contradiction, and neutrality.
3. The data processing method of claim 1, wherein the text semantics include an interpretation for at least one word in the first information and/or the second information.
4. The data processing method of claim 1, wherein the text logic comprises an internal consistency of the first information and/or the second information.
5. The data processing method of claim 1, wherein the proof of the second information includes at least one of: the credibility, integrity, accuracy, timeliness, bias, or multi-text integration result of the second information.
6. The data processing method of any of claims 1-5, wherein the evaluation information for the plurality of evaluation dimensions further comprises:
preliminary verification information of the authenticity of the first information for each evaluation dimension.
7. The data processing method of any one of claims 1-5, wherein the natural language generation model is a question-answer model.
8. The data processing method according to any one of claims 1-5, wherein the first information is declarative information and the at least one item of second information is evidence information.
9. A data processing apparatus comprising:
an information acquisition unit configured to determine first information to be verified and at least one item of second information related to the first information;
an evaluation unit configured to process the first information and the at least one item of second information based on a plurality of evaluation dimensions with a trained natural language generation model to obtain respective evaluation information for each evaluation dimension, wherein the evaluation information comprises knowledge information for each evaluation dimension that describes, in a natural language manner, an evaluation result of the first information and each item of second information for that evaluation dimension, and wherein the plurality of evaluation dimensions comprises at least two of: text semantics, text logic, proof of the second information, and a relationship between the first information and each item of second information; and
a verification unit configured to determine a verification result of the first information based on the first information, the at least one item of second information, and the evaluation information, wherein the verification result indicates the authenticity of the first information,
wherein the verification unit is configured to:
determine information to be verified for the first information based on the first information, the at least one item of second information, and the evaluation information; and
input the information to be verified into the natural language generation model to obtain the verification result,
wherein the data processing apparatus further comprises a training unit configured to:
determine first sample information and at least one item of second sample information related to the first sample information;
determine a plurality of pieces of sample evaluation information for the plurality of evaluation dimensions for the first sample information based on the first sample information and the at least one item of second sample information;
determine a sample reasoning process for verifying the first sample information based on the sample evaluation information; and
input the first sample information, the second sample information, and the sample reasoning process into the natural language generation model, so that the natural language generation model learns a capability of performing authenticity verification reasoning.
10. The data processing apparatus of claim 9, wherein the relationship between the first information and each item of second information is one of entailment, equivalence, contradiction, and neutrality.
11. The data processing apparatus of claim 9, wherein the text semantics include an interpretation for at least one word in the first information and/or the second information.
12. The data processing apparatus of claim 9, wherein the text logic comprises an internal consistency of the first information and/or the second information.
13. The data processing apparatus of claim 9, wherein the proof of the second information includes at least one of: the credibility, integrity, accuracy, timeliness, bias, or multi-text integration result of the second information.
14. The data processing apparatus of any of claims 9-13, wherein the evaluation information for the plurality of evaluation dimensions further comprises:
preliminary verification information of the authenticity of the first information for each evaluation dimension.
15. The data processing apparatus of any of claims 9-13, wherein the natural language generation model is a question-answer model.
16. The data processing apparatus according to any one of claims 9-13, wherein the first information is declarative information and the at least one item of second information is evidence information.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-8.
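As a purely illustrative, non-limiting sketch of the verification flow recited in claims 1 and 9: the stub `fake_nlg_model`, the function names, the prompt format, and the dimension labels below are all invented for illustration and are not part of the claimed subject matter; a real implementation would call an actual trained natural language generation model.

```python
# Illustrative sketch only. "fake_nlg_model" is a hypothetical stand-in for
# the trained natural language generation model of claim 1.

EVALUATION_DIMENSIONS = (
    "text semantics",
    "text logic",
    "proof of the second information",
    "relationship between first and second information",
)

def fake_nlg_model(prompt: str) -> str:
    """Hypothetical model stub returning canned natural-language text."""
    if prompt.startswith("EVALUATE"):
        return "evaluation: " + prompt.splitlines()[0]
    return "verification result: the first information is supported by the evidence"

def evaluate(first_info: str, second_infos: list) -> dict:
    """Obtain natural-language evaluation information per dimension (claim 1)."""
    return {
        dim: fake_nlg_model(
            f"EVALUATE [{dim}]\nclaim: {first_info}\nevidence: {second_infos}"
        )
        for dim in EVALUATION_DIMENSIONS
    }

def verify(first_info: str, second_infos: list) -> str:
    """Assemble the information to be verified from the first information,
    the second information, and the evaluation information, then feed it
    back to the same model to obtain the verification result."""
    evaluations = evaluate(first_info, second_infos)
    to_verify = "\n".join(
        [f"claim: {first_info}"]
        + [f"evidence: {e}" for e in second_infos]
        + [f"[{dim}] {text}" for dim, text in evaluations.items()]
    )
    return fake_nlg_model("VERIFY\n" + to_verify)

print(verify("The bridge opened in 2021.",
             ["A news report dated May 2021 covered the bridge opening."]))
```

The key design point mirrored here is that a single generative model is used twice: once per evaluation dimension to produce natural-language evaluation information, and once more on the assembled information to produce the final authenticity verdict.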
CN202310668213.0A 2023-06-07 2023-06-07 Data processing method, device, electronic equipment, medium and program product Active CN116401339B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310668213.0A CN116401339B (en) 2023-06-07 2023-06-07 Data processing method, device, electronic equipment, medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310668213.0A CN116401339B (en) 2023-06-07 2023-06-07 Data processing method, device, electronic equipment, medium and program product

Publications (2)

Publication Number Publication Date
CN116401339A CN116401339A (en) 2023-07-07
CN116401339B true CN116401339B (en) 2024-09-06

Family

ID=87014605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310668213.0A Active CN116401339B (en) 2023-06-07 2023-06-07 Data processing method, device, electronic equipment, medium and program product

Country Status (1)

Country Link
CN (1) CN116401339B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116594757B (en) * 2023-07-18 2024-04-12 深圳须弥云图空间科技有限公司 Method and device for executing complex tasks by using large language model
CN116603249B (en) * 2023-07-19 2023-10-03 深圳须弥云图空间科技有限公司 Training method of large language model applied to role playing reasoning game

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927297A (en) * 2014-04-13 2014-07-16 北京工业大学 Evidence theory based Chinese microblog credibility evaluation method
CN108241727A (en) * 2017-09-01 2018-07-03 新华智云科技有限公司 News reliability evaluation method and equipment
CN113496123A (en) * 2021-06-17 2021-10-12 三峡大学 Rumor detection method, rumor detection device, electronic equipment and storage medium
CN115994902A (en) * 2023-02-20 2023-04-21 上海联影智能医疗科技有限公司 Medical image analysis method, electronic device and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9146917B2 (en) * 2011-07-15 2015-09-29 International Business Machines Corporation Validating that a user is human
US11113175B1 (en) * 2018-05-31 2021-09-07 The Ultimate Software Group, Inc. System for discovering semantic relationships in computer programs
CN111242447B (en) * 2020-01-06 2023-06-16 思创数码科技股份有限公司 Fire hazard assessment method, device, equipment and computer readable storage medium
CN111966786B (en) * 2020-07-31 2022-10-25 南京邮电大学 Microblog rumor detection method
KR20220040050A (en) * 2020-09-23 2022-03-30 삼성전자주식회사 Method and apparatus of trainning natural language processing model and computing apparatus
CN113657547B (en) * 2021-08-31 2024-05-14 平安医疗健康管理股份有限公司 Public opinion monitoring method based on natural language processing model and related equipment thereof
CN113806489A (en) * 2021-09-26 2021-12-17 北京有竹居网络技术有限公司 Method, electronic device and computer program product for dataset creation
CN114625842A (en) * 2022-03-25 2022-06-14 电子科技大学长三角研究院(衢州) False comment identification model based on structure attention enhancement mechanism
CN115222431A (en) * 2022-08-03 2022-10-21 数效(深圳)科技有限公司 Intelligent counterfeit technology and method for live broadcast health care product

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927297A (en) * 2014-04-13 2014-07-16 北京工业大学 Evidence theory based Chinese microblog credibility evaluation method
CN108241727A (en) * 2017-09-01 2018-07-03 新华智云科技有限公司 News reliability evaluation method and equipment
CN113496123A (en) * 2021-06-17 2021-10-12 三峡大学 Rumor detection method, rumor detection device, electronic equipment and storage medium
CN115994902A (en) * 2023-02-20 2023-04-21 上海联影智能医疗科技有限公司 Medical image analysis method, electronic device and storage medium

Also Published As

Publication number Publication date
CN116401339A (en) 2023-07-07

Similar Documents

Publication Publication Date Title
US10679008B2 (en) Knowledge base for analysis of text
EP4080384A1 (en) Object recommendation method and apparatus, computer device, and medium
CN116401339B (en) Data processing method, device, electronic equipment, medium and program product
CN116521841B (en) Method, device, equipment and medium for generating reply information
CN116541536B (en) Knowledge-enhanced content generation system, data generation method, device, and medium
CN116028605B (en) Logic expression generation method, model training method, device and medium
CN115862031B (en) Text processing method, neural network training method, device and equipment
CN116450917B (en) Information searching method and device, electronic equipment and medium
CN116306862B (en) Training method, device and medium for text processing neural network
CN118312598A (en) Text generation method, training method and device of text generation model
CN114065737A (en) Text processing method, device, equipment and medium
CN116842156B (en) Data generation method, device, equipment and medium
CN114861658B (en) Address information analysis method and device, equipment and medium
CN116860328B (en) Method, apparatus, device and medium for generating instruction data
CN113836939B (en) Text-based data analysis method and device
CN115713071B (en) Training method for neural network for processing text and method for processing text
CN114140851B (en) Image detection method and method for training image detection model
CN116383372B (en) Data analysis method and system based on artificial intelligence
EP4199456A1 (en) Traffic classification method and apparatus, training method and apparatus, device and medium
CN115879468B (en) Text element extraction method, device and equipment based on natural language understanding
CN117709471A (en) Method, apparatus, device and medium for interpretation analysis
CN117291191A (en) Text processing method, device, equipment and medium
CN118821675A (en) Chip verification method and device, equipment and medium
CN118195010A (en) Method, device, equipment and medium for generating reply text
CN118394900A (en) Method and device for generating reply information, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant