
CN118332304B - Method and system for evaluating artificial intelligence model

Info

Publication number: CN118332304B
Application number: CN202410758560.7A
Authority: CN
Inventors: 罗雪冰
Original assignee: Tianyun Rongchuang Data Science & Technology Beijing Co ltd
Current assignee: Tianyun Rongchuang Data Science & Technology Beijing Co ltd
Priority date: 2024-06-13
Filing date: 2024-06-13
Publication date: 2024-08-27
Anticipated expiration: 2044-06-13
Also published as: CN118332304A

Abstract

The application discloses a method and a system for evaluating an artificial intelligence model, relating to the technical field of artificial intelligence. The method for evaluating the artificial intelligence model comprises the following steps: S1: receiving an evaluation request, performing security verification on the requesting end according to the evaluation request, and generating a requesting-end security result; if the requesting-end security result is safe, executing S2; if the requesting-end security result is unsafe, sending a request risk alert; S2: receiving the model to be evaluated, performing security detection on the model to be evaluated, and generating a model detection result; if the model detection result is safe, executing S3; if the model detection result is unsafe, sending a model risk alert; S3: obtaining an execution evaluation virtual machine according to the model to be evaluated, evaluating the model to be evaluated through the execution evaluation virtual machine to obtain an evaluation result, and sending the evaluation result to the requesting end. The application can improve the security and the accuracy of artificial intelligence model evaluation.

Description

Method and system for evaluating artificial intelligence model

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a method and a system for evaluating an artificial intelligence model.

Background

Artificial intelligence (AI) is a technology that performs task processing in a brain-like manner by means of big data, network computing and deep learning. An artificial intelligence model is a model that recognizes and processes content such as images, speech and text through artificial intelligence. Model testing is an important link in the production of artificial intelligence models: it allows the strengths and weaknesses of a model to be analyzed and its performance to be confirmed. However, using an existing model test platform to test an artificial intelligence model and obtain the test results for its test indexes still has the following disadvantages:

(1) Existing model test platforms generally do not perform security verification on the requesting end before receiving the artificial intelligence model to be evaluated, so neither the security of the requesting end nor the security of the data it sends can be ensured.

(2) After receiving the artificial intelligence model to be evaluated and before evaluating it, existing model test platforms usually do not perform security verification on the model, so the security and accuracy of the model to be evaluated cannot be ensured.

(3) Existing model test platforms generally use only one group of test data to test the artificial intelligence model to be evaluated, and because the number of groups of test data is limited, the accuracy of the test result cannot be ensured.

Disclosure of Invention

The application aims to provide a method and a system for evaluating an artificial intelligence model, which can improve the security and the accuracy of artificial intelligence model evaluation.

To achieve the above object, the present application provides a method for evaluating an artificial intelligence model, comprising the following steps: S1: receiving an evaluation request, performing security verification on the requesting end according to the evaluation request, and generating a requesting-end security result, wherein the requesting-end security result is safe or unsafe; if the requesting-end security result is safe, executing S2; if the requesting-end security result is unsafe, sending a request risk alert; S2: receiving the model to be evaluated, performing security detection on the model to be evaluated, and generating a model detection result, wherein the model detection result is safe or unsafe; if the model detection result is safe, executing S3; if the model detection result is unsafe, sending a model risk alert; S3: obtaining an execution evaluation virtual machine according to the model to be evaluated, evaluating the model to be evaluated through the execution evaluation virtual machine to obtain an evaluation result, and sending the evaluation result to the requesting end; wherein the evaluation result at least comprises: an output precision value and a precision judgment result.
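The gating order of steps S1 to S3 can be illustrated with a minimal Python sketch. It is not the implementation of the application: the names `Outcome`, `handle_request` and the four callables are hypothetical, and the checks themselves are passed in as functions because their internals are described separately.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    stopped_at: str             # "S1", "S2" or "S3": the step where processing ended
    alert: Optional[str]        # alert sent to the requesting end, if any
    evaluation: Optional[dict]  # evaluation result, present only when S3 is reached


def handle_request(
    request: dict,
    verify_requester: Callable[[dict], bool],  # S1 check, True means "safe" (hypothetical)
    receive_model: Callable[[], bytes],        # called only after S1 has passed
    detect_model: Callable[[bytes], bool],     # S2 check, True means "safe" (hypothetical)
    evaluate: Callable[[bytes], dict],         # S3 evaluation on the chosen virtual machine
) -> Outcome:
    # S1: an unsafe requesting end stops the flow before any model is received.
    if not verify_requester(request):
        return Outcome("S1", "request risk alert", None)
    # S2: the model is received and checked; an unsafe model stops the flow.
    model = receive_model()
    if not detect_model(model):
        return Outcome("S2", "model risk alert", None)
    # S3: only a safe model from a safe requesting end is evaluated.
    return Outcome("S3", None, evaluate(model))
```

Passing the checks as callables keeps the sketch independent of how the security value or the virus detection is actually computed.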

As described above, the sub-steps of receiving the evaluation request, performing security verification on the requesting end according to the evaluation request, and generating the requesting-end security result are as follows: S11: receiving a login request or a registration request, and receiving the evaluation request after the login or registration is completed, wherein the evaluation request at least comprises: a request time, a request address and requesting-end information; wherein the requesting-end information at least comprises: the address of the requesting end, the device model, the current use duration, a plurality of items of real-time security data and the value of each item of real-time security data; S12: determining the current impact weight of each item of real-time security data according to the device model and the current use duration in the requesting-end information; S13: calculating a requesting-end security value according to the value of each item of real-time security data in the requesting-end information and its current impact weight; S14: judging the requesting-end security value against a preset requesting-end security threshold, and generating the requesting-end security result; if the requesting-end security value is greater than or equal to the requesting-end security threshold, the generated requesting-end security result is safe; if the requesting-end security value is smaller than the requesting-end security threshold, the generated requesting-end security result is unsafe.

As described above, the sub-steps of determining the current impact weight of each item of real-time security data according to the device model and the current use duration in the requesting-end information are as follows: S121: traversing the impact weight database according to the device model in the requesting-end information, and taking the impact weight table whose impact device model is the same as the device model in the requesting-end information as the demand weight table; S122: traversing the plurality of items of security data in the demand weight table according to the plurality of items of real-time security data in the requesting-end information, and taking the impact weight corresponding to the security data identical to each item of real-time security data as the demand impact weight; S123: analyzing the weight difference table corresponding to each demand impact weight according to the current use duration in the requesting-end information to obtain a current weight difference; S124: calculating the current impact weight according to the current weight difference and the demand impact weight, wherein current impact weight = demand impact weight + current weight difference.

As described above, the sub-steps of analyzing the weight difference table corresponding to each demand impact weight according to the current use duration in the requesting-end information to obtain the current weight difference are as follows: S1231: sorting the plurality of standard use durations and the current use duration from shortest to longest, taking each standard use duration shorter than the current use duration as a short duration, and generating a short sequence number for each short duration; taking each standard use duration longer than the current use duration as a long duration, and generating a long sequence number for each long duration; wherein a shorter short duration has a smaller short sequence number than a longer short duration, and a shorter long duration has a smaller long sequence number than a longer long duration; S1232: analyzing all the short durations to obtain a short difference; S1233: analyzing all the long durations to obtain a long difference; S1234: calculating the current weight difference according to the short difference and the long difference.

As above, the current weight difference is calculated by an expression [expression not preserved in the source text] over the following quantities: the current weight difference of a given item of real-time security data; the average weight difference corresponding to the short duration of a short sequence number; the short difference of that item of real-time security data; the duration values corresponding to the short durations of the short sequence numbers, together with the total number of short sequence numbers; the duration value of the current use duration; the average weight difference corresponding to the long duration of a long sequence number; the duration values corresponding to the long durations of the long sequence numbers, together with the total number of long sequence numbers; and the long difference of that item of real-time security data.

As above, the sub-steps of performing security detection on the model to be evaluated and generating the model detection result are as follows: S21: invoking a simulator, and receiving the model to be evaluated through the simulator; wherein the simulator is used for simulating an isolated operating environment; S22: performing virus detection on the model to be evaluated through the simulator, and generating the model detection result; if no virus signature exists in the model to be evaluated, the generated model detection result is safe; if at least one virus signature exists in the model to be evaluated, deleting the model to be evaluated, and the generated model detection result is unsafe.

As described above, the simulator is provided with a pre-trained artificial intelligence model for virus detection; virus detection is performed on the model to be evaluated through this artificial intelligence model, and the model detection result is generated.

As described above, the sub-steps of obtaining the execution evaluation virtual machine according to the model to be evaluated and evaluating the model to be evaluated through the execution evaluation virtual machine to obtain the evaluation result are as follows: S31: performing type analysis on the model to be evaluated, and determining the model type of the model to be evaluated; S32: taking the evaluation virtual machines whose type is the same as the model type as primary selection devices; S33: reading the number of tasks to be executed of each primary selection device, and selecting the primary selection device with the smallest number of tasks to be executed as the execution evaluation virtual machine; S34: evaluating the model to be evaluated by calling a plurality of items of evaluation data through the execution evaluation virtual machine to obtain the evaluation result, wherein each item of evaluation data at least comprises: evaluation input data and evaluation output data.
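Steps S32 and S33 amount to filtering the evaluation virtual machines by type and then picking the least-loaded one. The following Python sketch illustrates this under assumptions: the `EvalVM` record and its field names are hypothetical, and the behaviour when no virtual machine matches the model type is not specified in the text, so the sketch simply returns `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalVM:
    name: str
    vm_type: str        # type of model this virtual machine evaluates
    pending_tasks: int  # number of tasks to be executed


def select_execution_vm(model_type: str, vms: List[EvalVM]) -> Optional[EvalVM]:
    # S32: virtual machines whose type equals the model type are the primary selection devices.
    primary = [vm for vm in vms if vm.vm_type == model_type]
    if not primary:
        return None  # assumption: the source does not say what happens in this case
    # S33: choose the primary selection device with the fewest tasks to be executed.
    # On a tie, min() keeps the first such device in list order.
    return min(primary, key=lambda vm: vm.pending_tasks)


vms = [
    EvalVM("vm-1", "image", 4),
    EvalVM("vm-2", "image", 1),
    EvalVM("vm-3", "speech", 0),
]
chosen = select_execution_vm("image", vms)
```

Here `chosen` is `vm-2`: `vm-3` has fewer pending tasks but is excluded in S32 because its type differs from the model type.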

As described above, the sub-steps of evaluating the model to be evaluated by calling a plurality of items of evaluation data through the execution evaluation virtual machine to obtain the evaluation result are as follows: S341: randomly generating an evaluation sequence number for each item of evaluation data, wherein one item of evaluation data corresponds to one evaluation sequence number, and the evaluation sequence numbers increase in the order in which they are randomly generated; S342: taking the evaluation data with the smallest evaluation sequence number as the current evaluation data, inputting the evaluation input data of the current evaluation data into the model to be evaluated, the model to be evaluated executing the evaluation input data and outputting a current execution result, analyzing the current execution result against the evaluation output data of the current evaluation data to obtain a data similarity value, and executing S343; S343: judging the evaluation sequence number of the current evaluation data against the total number of evaluation sequence numbers; if the total number of evaluation sequence numbers is greater than the evaluation sequence number of the current evaluation data, storing the data similarity value, eliminating the evaluation sequence number of the current evaluation data, and executing S342; if the total number of evaluation sequence numbers is equal to the evaluation sequence number of the current evaluation data, storing the data similarity value, and executing S344; S344: calculating an output precision value from all the data similarity values; S345: judging the output precision value against a preset precision threshold, and generating a precision judgment result; if the output precision value is greater than or equal to the precision threshold, the generated precision judgment result is excellent; if the output precision value is smaller than the precision threshold, the generated precision judgment result is inferior; S346: generating the evaluation result according to the precision judgment result and the output precision value.
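The loop of S341 to S346 can be sketched in Python as follows. Several points are assumptions rather than statements of the application: the random evaluation sequence numbers are modelled by shuffling the evaluation data, the `similarity` function is supplied by the caller because the text does not define how similarity is measured, and the output precision value is taken as the arithmetic mean of the similarity values because the text only says it is calculated from them.

```python
import random
from statistics import mean
from typing import Any, Callable, Dict, List, Optional, Tuple


def evaluate_model(
    model: Callable[[Any], Any],
    evaluation_data: List[Tuple[Any, Any]],    # (evaluation input, evaluation output) pairs
    similarity: Callable[[Any, Any], float],   # hypothetical similarity measure in [0, 1]
    precision_threshold: float,
    rng: Optional[random.Random] = None,
) -> Dict[str, Any]:
    if not evaluation_data:
        raise ValueError("at least one item of evaluation data is required")
    rng = rng or random.Random()
    # S341: a random order stands in for randomly generated evaluation sequence numbers.
    ordered = list(evaluation_data)
    rng.shuffle(ordered)
    # S342-S343: run each item in sequence-number order and store its similarity value.
    similarities = []
    for eval_input, expected_output in ordered:
        current_result = model(eval_input)
        similarities.append(similarity(current_result, expected_output))
    # S344: aggregate the similarity values (the mean is an assumption).
    output_precision = mean(similarities)
    # S345: compare against the preset precision threshold.
    judgment = "excellent" if output_precision >= precision_threshold else "inferior"
    # S346: the evaluation result carries both values.
    return {"output_precision": output_precision, "judgment": judgment}


result = evaluate_model(
    model=lambda x: x * 2,
    evaluation_data=[(1, 2), (2, 4), (3, 7)],
    similarity=lambda got, want: 1.0 if got == want else 0.0,
    precision_threshold=0.6,
)
```

With the exact-match similarity used in this usage example, two of the three items match, so the output precision value is two thirds and the judgment against a threshold of 0.6 is "excellent".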

The application also provides a system for evaluating an artificial intelligence model, which at least comprises: a plurality of requesting ends and an artificial intelligence model evaluation center; wherein the requesting end is used for sending an evaluation request, receiving an evaluation result, receiving a request risk alert, and receiving a model risk alert; and the artificial intelligence model evaluation center is used for performing the above-described method for evaluating an artificial intelligence model.

The beneficial effects achieved by the application are as follows:

(1) According to the method and the system for evaluating an artificial intelligence model, security verification is performed on the requesting end before the artificial intelligence model to be evaluated (that is, the model to be evaluated) is received, so that the security of the requesting end, and of the data it sends, can be ensured.

(2) According to the method and the system for evaluating an artificial intelligence model, after the artificial intelligence model to be evaluated is received and before it is evaluated, security verification is performed on the model, so that the security and accuracy of the artificial intelligence model to be evaluated can be ensured.

(3) According to the method and the system for evaluating an artificial intelligence model, a plurality of items of evaluation data are used to evaluate the artificial intelligence model to be evaluated and obtain the evaluation result, and because the quantity of evaluation data is sufficient, the accuracy of the evaluation result can be ensured.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly described below. The drawings described below evidently show only some embodiments of the present application, and a person having ordinary skill in the art may obtain other drawings from them.

FIG. 1 is a schematic diagram of one embodiment of a system for evaluating an artificial intelligence model;

FIG. 2 is a flow chart of one embodiment of a method for evaluating an artificial intelligence model.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and fully below with reference to the accompanying drawings. The described embodiments are evidently only some, and not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.

As shown in FIG. 1, the present application provides a system for evaluating an artificial intelligence model, comprising at least: a plurality of requesting ends 110 and an artificial intelligence model evaluation center 120.

The requesting end 110 is used for sending an evaluation request, receiving an evaluation result, receiving a request risk alert, and receiving a model risk alert.

The artificial intelligence model evaluation center 120 is used for performing the method for evaluating an artificial intelligence model described below.

Further, the artificial intelligence model evaluation center 120 includes at least: a security verification module, a scheduling module, a plurality of simulators, a plurality of evaluation virtual machines and a storage unit.

The security verification module is used for receiving an evaluation request, performing security verification on the requesting end according to the evaluation request, and generating a requesting-end security result; if the requesting-end security result is unsafe, it sends a request risk alert.

The scheduling module is used for calling a simulator if the requesting-end security result is safe, and for selecting one of the plurality of evaluation virtual machines as the execution evaluation virtual machine if the model detection result is safe.

The simulator is used for receiving the model to be evaluated, performing security detection on the model to be evaluated, and generating a model detection result; if the model detection result is unsafe, it sends a model risk alert.

The evaluation virtual machine is used for evaluating the model to be evaluated, obtaining an evaluation result, and sending the evaluation result to the requesting end.

The storage unit is used for storing an impact weight database. The impact weight database stores a plurality of impact weight tables, and each impact weight table corresponds to one impact device model. Each impact weight table at least comprises a plurality of items of security data, each item of security data corresponding to one impact weight. Each impact weight corresponds to one weight difference table, and each weight difference table at least comprises a plurality of standard use durations, each standard use duration corresponding to one average weight difference.
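The nesting of the impact weight database (device model, then security data, then impact weight and weight difference table) can be made concrete with a small Python sketch. The class names, the use of years as the duration unit, and every numeric value below are hypothetical and serve only to show the shape of the data.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WeightDifferenceTable:
    # standard use duration (years, an assumed unit) -> average weight difference
    avg_diff_by_duration: Dict[float, float] = field(default_factory=dict)


@dataclass
class ImpactWeightEntry:
    impact_weight: float                     # impact weight of one item of security data
    difference_table: WeightDifferenceTable  # the weight difference table of that impact weight


@dataclass
class ImpactWeightTable:
    device_model: str                        # the impact device model this table belongs to
    entries: Dict[str, ImpactWeightEntry]    # security data name -> entry


# The database maps each impact device model to its impact weight table.
database: Dict[str, ImpactWeightTable] = {
    "device-A": ImpactWeightTable(
        device_model="device-A",
        entries={
            "hardware": ImpactWeightEntry(0.4, WeightDifferenceTable({1.0: 0.00, 2.0: -0.01})),
            "software": ImpactWeightEntry(0.3, WeightDifferenceTable({1.0: 0.00, 2.0: 0.01})),
            "network": ImpactWeightEntry(0.3, WeightDifferenceTable({1.0: 0.00, 2.0: 0.00})),
        },
    )
}

hardware_entry = database["device-A"].entries["hardware"]
```

Keying the tables by device model and by security data name makes the two traversals in the method simple dictionary lookups.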

As shown in FIG. 2, the present application provides a method for evaluating an artificial intelligence model, comprising the following steps:

S1: receiving an evaluation request, performing security verification on the requesting end according to the evaluation request, and generating a requesting-end security result, wherein the requesting-end security result is safe or unsafe; if the requesting-end security result is safe, executing S2; if the requesting-end security result is unsafe, sending a request risk alert.

Specifically, the request risk alert includes at least: an alert time and the requesting-end security result. The artificial intelligence model evaluation center sends the request risk alert to the requesting end.

Further, the sub-steps of receiving the evaluation request, performing security verification on the requesting end according to the evaluation request, and generating the requesting-end security result are as follows:

S11: receiving a login request or a registration request, and receiving the evaluation request after the login or registration is completed, wherein the evaluation request at least comprises: a request time, a request address and requesting-end information; wherein the requesting-end information at least comprises: the address of the requesting end, the device model, the current use duration, a plurality of items of real-time security data and the value of each item of real-time security data.

Specifically, the request time is the point in time at which the requesting end sends the evaluation request.

The request address is the position coordinates from which the requesting end sends the evaluation request.

The current use duration is the actual duration of use of the requesting end from its first operation to the present.

The real-time security data at least comprises: hardware security data, software security data, and network security data.

Specifically, the value of the hardware security data is the security value of the device hardware of the requesting end, acquired in real time when the evaluation request is sent, and indicates the current hardware security of the requesting end. The value of the hardware security data can be obtained by existing acquisition methods.

The value of the software security data is the security value of the software of the requesting end, acquired in real time when the evaluation request is sent, and indicates the current software security of the requesting end. The value of the software security data can be obtained by existing acquisition methods.

The value of the network security data is the network security value of the requesting end, acquired in real time when the evaluation request is sent, and indicates the current network security of the requesting end. The value of the network security data can be obtained by existing acquisition methods.
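The fields of the evaluation request listed in S11 can be written down as a small Python record. This is only a sketch: the class and field names, the choice of years for the use duration, the coordinate pair for the request address, and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Tuple


@dataclass
class RequesterInfo:
    address: str                       # address of the requesting end
    device_model: str
    current_use_duration: float        # in years (assumed unit)
    realtime_security: Dict[str, float]  # real-time security data name -> value


@dataclass
class EvaluationRequest:
    request_time: datetime               # when the requesting end sent the request
    request_address: Tuple[float, float]  # position coordinates of the sender (assumed lat/lon)
    requester: RequesterInfo


request = EvaluationRequest(
    request_time=datetime(2024, 6, 13, 9, 30),
    request_address=(39.9, 116.4),
    requester=RequesterInfo(
        address="203.0.113.7",           # documentation-range IP, purely illustrative
        device_model="device-A",
        current_use_duration=3.5,
        realtime_security={"hardware": 0.9, "software": 0.8, "network": 0.7},
    ),
)
```

The three keys of `realtime_security` mirror the minimum set named in the text: hardware, software and network security data.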

S12: determining the current impact weight of each item of real-time security data according to the device model and the current use duration in the requesting-end information.

Further, the sub-steps of determining the current impact weight of each item of real-time security data according to the device model and the current use duration in the requesting-end information are as follows:

S121: traversing the impact weight database according to the device model in the requesting-end information, and taking the impact weight table whose impact device model is the same as the device model in the requesting-end information as the demand weight table.

S122: traversing the plurality of items of security data in the demand weight table according to the plurality of items of real-time security data in the requesting-end information, and taking the impact weight corresponding to the security data identical to each item of real-time security data as the demand impact weight.
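S121 and S122 are two successive lookups, which the Python sketch below models with nested dictionaries. The function name, the dictionary layout and the sample weights are hypothetical; the text also does not say what happens when a device model or an item of security data is missing, so the sketch raises `KeyError` for an unknown device model and skips unknown items.

```python
from typing import Dict, Iterable


def demand_impact_weights(
    database: Dict[str, Dict[str, float]],  # device model -> {security data name -> impact weight}
    device_model: str,
    realtime_names: Iterable[str],
) -> Dict[str, float]:
    # S121: the table whose device model matches becomes the demand weight table.
    demand_table = database[device_model]
    # S122: for each real-time item, take the impact weight of the identical security data.
    return {name: demand_table[name] for name in realtime_names if name in demand_table}


database = {
    "device-A": {"hardware": 0.4, "software": 0.3, "network": 0.3},
    "device-B": {"hardware": 0.5, "software": 0.25, "network": 0.25},
}
weights = demand_impact_weights(database, "device-A", ["hardware", "network"])
```

In this usage example `weights` holds the demand impact weights for the hardware and network items of a hypothetical `device-A`.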

S123: analyzing the weight difference table corresponding to each demand impact weight according to the current use duration in the requesting-end information to obtain a current weight difference.

Further, the sub-steps of analyzing the weight difference table corresponding to each demand impact weight according to the current use duration in the requesting-end information to obtain the current weight difference are as follows:

S1231: sorting the plurality of standard use durations and the current use duration from shortest to longest, taking each standard use duration shorter than the current use duration as a short duration, and generating a short sequence number for each short duration; taking each standard use duration longer than the current use duration as a long duration, and generating a long sequence number for each long duration; wherein a shorter short duration has a smaller short sequence number than a longer short duration, and a shorter long duration has a smaller long sequence number than a longer long duration.

Specifically, the intervals between adjacent standard use durations may be the same or different; in the present application they are preferably the same.

The specific value of the interval between two adjacent standard use durations depends on the actual situation. For example, if the current use duration is 3.5 years, the short duration with short sequence number 1 is 1 year, the short duration with short sequence number 2 is 2 years, and the short duration with short sequence number 3 is 3 years; the long duration with long sequence number 1 is 4 years, the long duration with long sequence number 2 is 5 years, the long duration with long sequence number 3 is 6 years, and so on, up to the long duration with long sequence number N, which is M years.
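The partition in S1231 can be reproduced in a few lines of Python using the 3.5-year example above. The function name is hypothetical, and one detail is an assumption: the text does not say how a standard use duration exactly equal to the current use duration is treated, so this sketch leaves it out of both groups.

```python
from typing import Dict, Iterable, Tuple


def partition_durations(
    standard_durations: Iterable[float],
    current_duration: float,
) -> Tuple[Dict[int, float], Dict[int, float]]:
    ordered = sorted(standard_durations)  # shortest to longest
    short = [d for d in ordered if d < current_duration]
    long_ = [d for d in ordered if d > current_duration]
    # Sequence numbers start at 1 and grow with the duration within each group.
    short_by_number = {i: d for i, d in enumerate(short, start=1)}
    long_by_number = {i: d for i, d in enumerate(long_, start=1)}
    return short_by_number, long_by_number


short_by_number, long_by_number = partition_durations([3, 1, 2, 6, 5, 4], 3.5)
# short_by_number == {1: 1, 2: 2, 3: 3}
# long_by_number  == {1: 4, 2: 5, 3: 6}
```

The result matches the example: short sequence numbers 1 to 3 map to 1, 2 and 3 years, and long sequence numbers 1 to 3 map to 4, 5 and 6 years.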

Further, because devices differ in how long they have been used, their actual wear also differs, and therefore the average weight differences corresponding to different standard use durations also differ.

The average weight difference refers to: for the same device model and the same standard use duration, the average of the actual change values between the actual impact weights of the actual security data of a plurality of requesting ends and the impact weights corresponding to that security data in the impact weight table.

S1232: analyzing all the short durations to obtain a short difference.

Further, the short difference is calculated by an expression [expression not preserved in the source text] over the following quantities: the short difference of a given item of real-time security data; the average weight differences corresponding to the short durations of two short sequence numbers; the duration values corresponding to the short durations of those two short sequence numbers; and the total number of short sequence numbers.

Specifically, the expression uses the change in the average weight difference between the short durations of the two short sequence numbers, and the change in the duration value between those short durations. The duration value of a short duration is the value of the actual use duration corresponding to that short duration.

S1233: analyzing all the long durations to obtain a long difference.

Further, the long difference is calculated by an expression [expression not preserved in the source text] over the following quantities: the long difference of a given item of real-time security data; the average weight differences corresponding to the long durations of two long sequence numbers; the duration values corresponding to the long durations of those two long sequence numbers; and the total number of long sequence numbers.

Specifically, the expression uses the change in the average weight difference between the long durations of the two long sequence numbers, and the change in the duration value between those long durations. The duration value of a long duration is the value of the actual use duration corresponding to that long duration.

S1234: calculating the current weight difference according to the short difference and the long difference.

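Further, the current weight difference is calculated by an expression [expression not preserved in the source text] over the short difference, the long difference, the average weight differences and duration values of the short and long durations, and the duration value of the current use duration.

Specifically, the duration value of the current use duration is the value of the actual use duration corresponding to the current use duration.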

S124: calculating the current impact weight according to the current weight difference and the demand impact weight, wherein current impact weight = demand impact weight + current weight difference.

Specifically, the current weight difference may be positive or negative, depending on the actual impact.

S13: calculating the requesting-end security value according to the value of each item of real-time security data in the requesting-end information and its current impact weight.

Further, the requesting-end security value is calculated by an expression [expression not preserved in the source text] over the following quantities: the requesting-end security value; an illegal-address value, which takes one value if the address of the requesting end is verified as illegal by the third-party platform and another value if the address is verified as legal (the two values are not preserved in the source text); the current impact weight corresponding to each item of real-time security data; the value of each item of real-time security data; and the total number of items of real-time security data.

Specifically, the third-party platform is a platform for verifying the legality of the address of the requesting end, for example an official authentication platform.

S14: judging the requesting-end security value against a preset requesting-end security threshold, and generating the requesting-end security result; if the requesting-end security value is greater than or equal to the requesting-end security threshold, the generated requesting-end security result is safe; if the requesting-end security value is smaller than the requesting-end security threshold, the generated requesting-end security result is unsafe.

Specifically, the value of the requesting-end security threshold is set according to the actual situation.
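The judgment in S14 is a single comparison in which the boundary case counts as safe. A minimal Python sketch, with a hypothetical function name and hypothetical numeric values, makes the boundary explicit:

```python
def requester_security_result(security_value: float, security_threshold: float) -> str:
    # S14: a value greater than or equal to the threshold is judged safe.
    return "safe" if security_value >= security_threshold else "unsafe"


at_threshold = requester_security_result(0.8, 0.8)     # equal to the threshold -> "safe"
below_threshold = requester_security_result(0.79, 0.8)  # below the threshold -> "unsafe"
```

The threshold itself is a deployment choice; the sketch only fixes which side of the comparison the equality case falls on.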

S2: receiving the model to be evaluated, performing security detection on the model to be evaluated, and generating a model detection result, wherein the model detection result is safe or unsafe; if the model detection result is safe, executing S3; if the model detection result is unsafe, sending a model risk alert.

Specifically, the model risk alert includes at least: an alert time and the model detection result. The model to be evaluated may be an artificial intelligence model of various types. The artificial intelligence model evaluation center sends the model risk alert to the requesting end.

Further, as one embodiment, an existing virus detection model is used to perform virus detection on the model to be evaluated and generate the model detection result; if the model to be evaluated contains no virus, the generated model detection result is safe, and if the model to be evaluated contains a virus, the generated model detection result is unsafe.

Specifically, the existing virus detection model is a pre-trained artificial intelligence model for virus detection.

Further, as another embodiment, the sub-steps of performing security detection on the model to be evaluated and generating a model detection result are as follows:

S21: invoking a simulator, and receiving the model to be evaluated through the simulator; wherein the simulator is configured to simulate an isolated operating environment.

S22: performing virus detection on the model to be evaluated through the simulator, and generating a model detection result; if no virus feature code exists in the model to be evaluated, the generated model detection result is safe; if at least one virus feature code exists in the model to be evaluated, deleting the model to be evaluated, and the generated model detection result is unsafe.

Furthermore, a pre-trained artificial intelligence model for virus detection is provided in the simulator; virus detection is performed on the model to be evaluated through this artificial intelligence model, and a model detection result is generated.
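The feature-code check of S22 can be sketched as a byte-signature scan. This is a simplified illustration under stated assumptions: the signature list `VIRUS_FEATURE_CODES`, the function names, and the use of a temporary directory in place of the simulator's isolated environment are all hypothetical, and the pre-trained virus detection model mentioned above is replaced here by plain substring matching.

```python
import tempfile
from pathlib import Path

# Hypothetical virus feature codes (byte signatures). A real deployment would
# load these from a maintained signature database.
VIRUS_FEATURE_CODES = [b"EVIL_PAYLOAD", b"\xde\xad\xbe\xef"]


def detect_model(model_path: Path, feature_codes=VIRUS_FEATURE_CODES) -> str:
    """S22 sketch: return "safe" or "unsafe"; delete the model file if unsafe."""
    data = model_path.read_bytes()
    # The model is unsafe if at least one feature code occurs in its bytes.
    if any(code in data for code in feature_codes):
        model_path.unlink()  # S22: delete the model to be evaluated
        return "unsafe"
    return "safe"


def make_model_file(directory: str, name: str, content: bytes) -> Path:
    """Helper for the usage example: write a fake model file to disk."""
    path = Path(directory) / name
    path.write_bytes(content)
    return path


if __name__ == "__main__":
    # The temporary directory only stands in for the isolated environment of
    # S21; it provides no real isolation.
    with tempfile.TemporaryDirectory() as sandbox:
        clean = make_model_file(sandbox, "clean.bin", b"weights")
        infected = make_model_file(sandbox, "bad.bin", b"abcEVIL_PAYLOADxyz")
        print(detect_model(clean), detect_model(infected))
```

Reading the whole file into memory keeps the sketch short; a large model file would more likely be scanned in chunks.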

S3: acquiring an execution evaluation virtual machine according to the model to be evaluated, evaluating the model to be evaluated through the execution evaluation virtual machine to obtain an evaluation result, and sending the evaluation result to the request end; wherein the evaluation result at least comprises: an output precision value and a precision judgment result.

Further, the sub-steps of acquiring the execution evaluation virtual machine according to the model to be evaluated, evaluating the model to be evaluated by the execution evaluation virtual machine, and obtaining the evaluation result are as follows:

S31: performing type analysis on the model to be evaluated, and determining the model type of the model to be evaluated.

Specifically, artificial intelligence (AI) models come in a variety of types, each suited to different tasks and problems. The specific classification mode of the model type of the model to be evaluated therefore depends on the actual situation, wherein the classification mode at least comprises: classifying according to the purpose of the model to be evaluated and classifying according to the construction principle of the model to be evaluated.

Further, the model types at least include: neural network model classes, cluster model classes, reinforcement learning model classes, and natural language processing model classes.

Further, the model type of the model to be evaluated is determined by performing type analysis on the model to be evaluated through a pre-trained classification model.

S32: taking the evaluation virtual machines whose virtual machine type is the same as the model type as primary selection devices.

Specifically, one evaluation virtual machine corresponds to one virtual machine type. The specific classification mode of the virtual machine type of an evaluation virtual machine is determined according to the actual situation, wherein the classification mode at least comprises: classifying according to the purpose of the evaluation target and classifying according to the construction principle of the evaluation target. The evaluation target is the model to be evaluated.

Further, the virtual machine types include at least: neural network model classes, cluster model classes, reinforcement learning model classes, and natural language processing model classes.

S33: reading the number of tasks to be executed of each primary selection device, and selecting the primary selection device with the smallest number of tasks to be executed as the execution evaluation virtual machine.

Specifically, the number of tasks to be executed is the number of models to be evaluated that are queued and awaiting evaluation in the task list of the primary selection device.
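Steps S32 and S33 amount to filtering the evaluation virtual machines by type and then picking the least loaded one. A minimal sketch follows; the `EvaluationVM` class, its field names, and the type strings are assumptions made for illustration, and the model type is taken as already determined in S31.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvaluationVM:
    """Hypothetical record for one evaluation virtual machine."""
    name: str
    vm_type: str
    task_list: List[str] = field(default_factory=list)  # models queued for evaluation

    @property
    def pending_tasks(self) -> int:
        # Number of tasks to be executed = queued models in the task list.
        return len(self.task_list)


def select_execution_vm(model_type: str,
                        vms: List[EvaluationVM]) -> Optional[EvaluationVM]:
    """Return the execution evaluation virtual machine, or None if no type matches."""
    # S32: primary selection devices are the VMs whose type equals the model type.
    primary = [vm for vm in vms if vm.vm_type == model_type]
    if not primary:
        return None
    # S33: choose the device with the fewest tasks to be executed.
    # On a tie, min() keeps the first such device in the list.
    return min(primary, key=lambda vm: vm.pending_tasks)


if __name__ == "__main__":
    pool = [
        EvaluationVM("vm-a", "neural_network", ["m1", "m2"]),
        EvaluationVM("vm-b", "neural_network", ["m3"]),
        EvaluationVM("vm-c", "clustering"),
    ]
    print(select_execution_vm("neural_network", pool).name)
```

The text does not say what happens when no virtual machine matches the model type or when several are tied; returning `None` and keeping the first tied device are choices made only for this sketch.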

S34: evaluating the model to be evaluated by calling a plurality of pieces of evaluation data through the execution evaluation virtual machine to obtain an evaluation result, wherein each piece of evaluation data at least comprises: evaluation input data and evaluation output data.

Further, the sub-steps of evaluating the model to be evaluated by calling a plurality of pieces of evaluation data through the execution evaluation virtual machine to obtain an evaluation result are as follows:

S341: randomly generating an evaluation sequence number for each piece of evaluation data, wherein each piece of evaluation data corresponds to one evaluation sequence number, and the evaluation sequence numbers increase from small to large in the order in which they are generated.

Specifically, the evaluation data whose evaluation sequence number is generated first has evaluation sequence number 1; the evaluation data whose evaluation sequence number is generated second has evaluation sequence number 2; and so on, until the evaluation data whose evaluation sequence number is generated last has evaluation sequence number U.

S342: taking the evaluation data with the smallest evaluation sequence number as the current evaluation data, inputting the evaluation input data of the current evaluation data into the model to be evaluated, having the model to be evaluated process the evaluation input data and output a current execution result, analyzing the current execution result against the evaluation output data of the current evaluation data to obtain a data similarity value, and executing S343.

Specifically, the data similarity value indicates the degree to which the current execution result matches the evaluation output data. The data similarity value is obtained through a pre-trained similarity value analysis model.

S343: comparing the evaluation sequence number of the current evaluation data with the total number of evaluation sequence numbers; if the total number of evaluation sequence numbers is greater than the evaluation sequence number of the current evaluation data, storing the data similarity value, removing the evaluation sequence number of the current evaluation data, and executing S342; if the total number of evaluation sequence numbers is equal to the evaluation sequence number of the current evaluation data, storing the data similarity value and executing S344.

S344: performing a calculation on all the data similarity values to obtain an output precision value.

Further, the expression of the output precision value is:

(expression not recoverable from the source text)

wherein the terms of the expression are, in order: the output precision value; each data similarity value; and the total number of data similarity values.

S345: judging the output precision value through a preset precision threshold value, and generating a precision judgment result; if the output precision value is greater than or equal to the precision threshold value, the generated precision judgment result is excellent; if the output precision value is smaller than the precision threshold value, the generated precision judgment result is inferior.

Specifically, the specific value of the precision threshold is set according to the actual situation.

S346: generating an evaluation result according to the precision judgment result and the output precision value.
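The loop formed by S341 to S346 can be sketched as follows. Several parts are assumptions, not the method of the application: the output precision value is computed here as the arithmetic mean of the data similarity values, because the original expression is not available in the text; the pre-trained similarity value analysis model is replaced by `difflib.SequenceMatcher`; and the function names, the string-valued model interface, and the `seed` parameter are hypothetical.

```python
import random
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple


def similarity(result: str, expected: str) -> float:
    """Stand-in for the similarity value analysis model: a ratio in [0, 1]."""
    return SequenceMatcher(None, result, expected).ratio()


def evaluate(model: Callable[[str], str],
             evaluation_data: List[Tuple[str, str]],
             precision_threshold: float,
             seed: int = 0) -> Dict[str, object]:
    """Run S341-S346 over (evaluation input, evaluation output) pairs."""
    if not evaluation_data:
        raise ValueError("at least one piece of evaluation data is required")

    # S341: a random order; position 1..U in this order is the sequence number.
    ordered = list(evaluation_data)
    random.Random(seed).shuffle(ordered)

    # S342/S343: process the data in ascending sequence number, storing each
    # data similarity value until the last sequence number has been handled.
    similarities = []
    for eval_input, eval_output in ordered:
        current_result = model(eval_input)
        similarities.append(similarity(current_result, eval_output))

    # S344: ASSUMPTION - arithmetic mean of all data similarity values.
    output_precision = sum(similarities) / len(similarities)

    # S345: compare with the preset precision threshold.
    judgment = "excellent" if output_precision >= precision_threshold else "inferior"

    # S346: the evaluation result combines both items.
    return {"output_precision": output_precision, "precision_judgment": judgment}


if __name__ == "__main__":
    data = [("a", "A"), ("b", "B"), ("c", "C")]
    print(evaluate(str.upper, data, precision_threshold=0.9))
```

Shuffling the list once and iterating over it is equivalent to repeatedly taking the remaining evaluation data with the smallest sequence number, which is how S342 and S343 describe the loop.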

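The beneficial effects achieved by the application are as follows: the application can improve the evaluation security and the evaluation accuracy of the artificial intelligence model.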

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the scope of the application be interpreted as including the preferred embodiments and all alterations and modifications that fall within the scope of the application. It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the present application and the technical equivalents thereof, the present application is also intended to include such modifications and variations.

Claims

1. A method for evaluating an artificial intelligence model, comprising the steps of:

S1: receiving an evaluation request, performing security verification on the request end according to the evaluation request, and generating a request end security result, wherein the request end security result is safe or unsafe; if the request end security result is safe, executing S2; if the request end security result is unsafe, sending a request risk alarm;

S2: receiving the model to be evaluated, performing security detection on the model to be evaluated, and generating a model detection result, wherein the model detection result is safe or unsafe; if the model detection result is safe, executing S3; if the model detection result is unsafe, sending a model risk alarm;

S3: acquiring an execution evaluation virtual machine according to the model to be evaluated, evaluating the model to be evaluated through the execution evaluation virtual machine to obtain an evaluation result, and sending the evaluation result to the request end; wherein the evaluation result at least comprises: an output precision value and a precision judgment result;

wherein the sub-steps of receiving an evaluation request, performing security verification on the request end according to the evaluation request, and generating a request end security result are as follows:

S11: receiving a login request or a registration request, and receiving an evaluation request after the login or registration is completed, wherein the evaluation request at least comprises: request time, request address and request end information; wherein the request end information at least comprises: the address of the request end, the equipment model, the current use time length, a plurality of real-time safety data and the value of each real-time safety data;

S12: determining the current influence weight of each piece of real-time safety data according to the equipment model and the current use time length in the request terminal information;

S13: calculating a request end safety value according to the value of each real-time safety data in the request end information and the current influence weight;

S14: judging the request end safety value against a preset request end safety threshold, and generating a request end security result; if the request end safety value is greater than or equal to the request end safety threshold, the generated request end security result is safe; if the request end safety value is smaller than the request end safety threshold, the generated request end security result is unsafe;

the sub-steps of determining the current influence weight of each real-time security data according to the equipment model and the current use time length in the request end information are as follows:

S121: traversing the influence weight database according to the equipment model in the request end information, and taking an influence weight table with the same influence equipment model as the equipment model in the request end information as a demand weight table;

S122: traversing a plurality of security data in a demand weight table according to the plurality of real-time security data in the request end information, and taking the influence weight corresponding to the security data identical to the real-time security data as the demand influence weight;

S123: analyzing a weight difference value table corresponding to each demand influence weight according to the current use time length in the request end information to obtain a current weight difference value;

S124: calculating a current influence weight according to the current weight difference value and the demand influence weight, wherein the current influence weight=the demand influence weight+the current weight difference value;

the sub-steps of analyzing a weight difference value table corresponding to each demand influence weight according to the current use time length in the request end information to obtain the current weight difference value are as follows:

S1231: sorting a plurality of standard use time lengths together with the current use time length in order from shortest to longest; taking each standard use time length shorter than the current use time length as a short time length, and generating a short sequence number for each short time length; taking each standard use time length longer than the current use time length as a long time length, and generating a long sequence number for each long time length; wherein the short sequence number corresponding to a shorter short time length is smaller than the short sequence number corresponding to a longer short time length, and the long sequence number corresponding to a shorter long time length is smaller than the long sequence number corresponding to a longer long time length;

S1232: analyzing all the short time lengths to obtain a short difference value;

S1233: analyzing all the long time lengths to obtain a long difference value;

S1234: calculating a current weight difference value according to the short difference value and the long difference value;

the expression of the current weight difference value is as follows:

(expression not recoverable from the source text)

2. The method for evaluating an artificial intelligence model according to claim 1, characterized in that the sub-steps of performing security detection on the model to be evaluated and generating a model detection result are as follows:

S21: invoking a simulator, and receiving the model to be evaluated through the simulator; wherein the simulator is used for simulating an isolated operating environment;

3. The method for evaluating an artificial intelligence model according to claim 2, wherein an artificial intelligence model trained in advance for virus detection is provided in the simulator, virus detection is performed on the model to be evaluated by the artificial intelligence model, and a model detection result is generated.

4. The method for evaluating an artificial intelligence model according to claim 1, wherein the sub-steps of acquiring an execution evaluation virtual machine according to the model to be evaluated, and evaluating the model to be evaluated by executing the evaluation virtual machine, and obtaining the evaluation result are as follows:

S31: performing type analysis on the model to be evaluated, and determining the model type of the model to be evaluated;

S32: taking the evaluation virtual machines whose virtual machine type is the same as the model type as primary selection devices;

S33: reading the number of tasks to be executed of each primary selection device, and selecting the primary selection device with the minimum number of tasks to be executed as an execution evaluation virtual machine;

5. The method for evaluating an artificial intelligence model according to claim 4, wherein the sub-steps of evaluating the model to be evaluated by calling a plurality of pieces of evaluation data through the execution evaluation virtual machine to obtain the evaluation result are as follows:

S341: randomly generating an evaluation sequence number for each piece of evaluation data, wherein each piece of evaluation data corresponds to one evaluation sequence number, and the evaluation sequence numbers increase from small to large in the order in which they are generated;

S342: taking the evaluation data with the smallest evaluation sequence number as the current evaluation data, inputting the evaluation input data of the current evaluation data into the model to be evaluated, having the model to be evaluated process the evaluation input data and output a current execution result, analyzing the current execution result against the evaluation output data of the current evaluation data to obtain a data similarity value, and executing S343;

S343: judging the evaluation sequence number of the current evaluation data according to the total number of the evaluation sequence numbers, if the total number of the evaluation sequence numbers is larger than the evaluation sequence number of the current evaluation data, storing the data similarity value, eliminating the evaluation sequence number of the current evaluation data, and executing S342; if the total number of the evaluation numbers is equal to the evaluation number of the current evaluation data, storing the data similarity value, and executing S344;

S344: calculating all the data similarity values to obtain an output precision value;

S345: judging the output precision value through a preset precision threshold value, and generating a precision judgment result; if the output precision value is greater than or equal to the precision threshold value, the generated precision judgment result is excellent; if the output precision value is smaller than the precision threshold value, the generated precision judgment result is inferior;

6. A system for evaluating an artificial intelligence model, comprising at least: a plurality of request ends and an artificial intelligence model evaluation center;

wherein each request end is used for: sending an evaluation request; receiving an evaluation result; receiving a request risk alarm; and receiving a model risk alarm;

and the artificial intelligence model evaluation center is used for performing the method for evaluating an artificial intelligence model according to any one of claims 1 to 5.