
CN111027589B - Multi-division target detection algorithm evaluation system and method

Info

Publication number: CN111027589B
Application number: CN201911081589.1A
Authority: CN
Inventors: 罗庚; 陈英爽; 蒋爽; 徐涛
Original assignee: CHENGDU FOURIER ELECTRONIC TECHNOLOGY CO LTD; Shenzhen SDG Information Co Ltd
Current assignee: CHENGDU FOURIER ELECTRONIC TECHNOLOGY CO LTD; Shenzhen SDG Information Co Ltd
Priority date: 2019-11-07
Filing date: 2019-11-07
Publication date: 2023-04-18
Anticipated expiration: 2039-11-07
Also published as: CN111027589A

Abstract

The multi-division target detection algorithm evaluation system comprises an algorithm input module, a test set preparation module, a comprehensive scoring module, an application scene scoring module and a free scoring module, wherein the algorithm input module is used for inputting an algorithm to be tested; the test set preparation module selects a test set containing the tested target type according to the algorithm to be tested; the comprehensive scoring module weights the image quality score, the image resolution score, the target quality score and the target size score to obtain an overall comprehensive score; the application scene scoring module weights the scene image resolution score, the scene target quality score and the scene target size score to obtain an application scene score; and the free scoring module generates a corresponding test set according to parameters customized to the user's test requirements so as to complete a targeted evaluation. A multi-division target detection algorithm evaluation method is also disclosed. The system and method can give an overall comprehensive score, an application scene score and a comprehensive free score, can objectively reflect the quality of the algorithm and the areas in which it is strong or weak, and can give the algorithm designer a clear optimization direction, so that algorithm optimization is more targeted.

Description

Multi-division target detection algorithm evaluation system and method

Technical Field

The invention relates to image processing and target detection technology, and in particular to a multi-division target detection algorithm evaluation system and method.

Background

The target detection technology is a direction which is paid much attention in the field of computer vision in recent years, mainly relates to subjects such as computer vision, image processing, artificial intelligence, pattern recognition and the like, and is widely applied to aspects such as target range measurement, target monitoring, video compression, vehicle tracking, aerospace and the like.

Currently, in the evaluation of target detection algorithms, precision and recall are usually used to describe the quality of an algorithm, and mAP (mean Average Precision) and the F-Score are widely used as indexes that take both into account. However, the traditional mAP or F-Score calculation cannot quantitatively reflect in which areas an algorithm is strong or weak, and cannot give the algorithm designer a point of optimization. It is generally one-sided, insufficiently targeted and insufficiently comprehensive, and it cannot weight the evaluation toward a particular application scene. Therefore, it is necessary to develop a multi-scoring evaluation system that combines a plurality of scoring parameters and a plurality of evaluation types to provide a more comprehensive evaluation function.

Disclosure of Invention

Aiming at the above defects of the related prior art, the invention provides a multi-division target detection algorithm evaluation system and method which can give an overall comprehensive score, an application scene score and a comprehensive free score, can objectively reflect the quality of the algorithm and the areas in which it is strong or weak, and can give the algorithm designer a clear optimization direction, so that algorithm optimization is more targeted.

In order to achieve the above object, the present invention employs the following techniques:

a multi-division target detection algorithm evaluation system is characterized by comprising:

the algorithm input module is used for inputting an algorithm to be tested;

the test set preparation module is used for selecting a test set with a tested target type according to a to-be-tested algorithm;

the comprehensive scoring module is used for dividing the test set into four types of data sets, each with a plurality of different grades, according to the image quality, the image resolution, the target quality and the target size, wherein before the test set is divided according to the image quality, the test set is subjected to noise-adding processing; respectively calculating the mAP (mean Average Precision) on each type of data set to obtain an overall image quality score, an overall image resolution score, an overall target quality score and an overall target size score; and carrying out average weighting on the overall scores to obtain an overall comprehensive score;

the application scene scoring module is used for carrying out noise-adding processing on the test set, dividing the test set into data sets of different grades according to image quality, and calculating the mAP on one of the two best-grade data sets to obtain a scene image quality score; respectively generating corresponding scene data sets from the test set according to a preset image resolution parameter, a preset target quality parameter and a preset target size parameter; respectively calculating the mAP on the different scene data sets to obtain a scene image resolution score, a scene target quality score and a scene target size score; and carrying out average weighting on the scene scores to obtain an application scene score;

the free scoring module is used for generating a corresponding free data set from the test set according to a preset free test scene and corresponding preset free scene parameters, wherein the preset free test scene comprises at least one of a free image quality scene, a free image resolution scene, a free target quality scene and a free target size scene; calculating the mAP on the free data set to obtain the free score corresponding to each free test scene; when two or more free scores are obtained, carrying out average weighting on the scores to obtain a comprehensive free score; and when only one free score is obtained, taking that free score as the comprehensive free score.

A multi-division target detection algorithm evaluation method is characterized by comprising the following steps:

selecting a test set with a tested target type according to an input algorithm to be tested;

responding to the input selection of a user, and executing at least one grading method of a comprehensive grading method, an application scene grading method and a free grading method; wherein:

the comprehensive scoring method comprises the following steps:

dividing the test set into four types of data sets with a plurality of different grades according to the image quality, the image resolution, the target quality and the target size, wherein before the test set is divided according to the image quality, the test set is subjected to noise adding treatment;

respectively calculating mAP for each different class of data set to respectively obtain an overall image quality score, an overall image resolution score, an overall target quality score and an overall target size score;

carrying out average weighting on all the overall scores to obtain overall comprehensive scores;

an application scenario scoring method, comprising the steps of:

the method comprises the steps of conducting noise adding processing on a test set, dividing the test set into data sets with different grades according to image quality, and calculating mAP through one data set of two data sets with the best grade to obtain a scene image quality score;

respectively generating corresponding scene data sets from the test set according to preset image resolution parameters, target quality parameters and target size parameters;

respectively calculating mAP according to different scene data sets to respectively obtain a scene image resolution score, a scene target quality score and a scene target size score;

carrying out average weighting on each scene score to obtain an application scene score;

a free-form scoring method comprising the steps of:

generating a corresponding free data set from the test set according to a preset free test scene and corresponding preset free scene parameters, wherein the preset free test scene comprises at least one of a free image quality scene, a free image resolution scene, a free target quality scene and a free target size scene;

calculating the mAP through the free data set to obtain a free score corresponding to the free test scene;

when two or more free scores are obtained, carrying out average weighting on the scores to obtain a comprehensive free score; when only one free score is obtained, the free score is used as the comprehensive free score.

The invention has the beneficial effects that:

1. In the traditional scoring method, target detection is carried out directly on a batch of test sets to obtain prediction labels, and a score between the prediction labels and the real labels is obtained by mAP or F-Score. Because the test sets are not constrained by task, this cannot quantitatively reflect in which areas the algorithm is strong or weak, and cannot give the algorithm designer a point of optimization. Compared with the traditional method, the present method/system can give an overall comprehensive score, an application scene score and a comprehensive free score according to the evaluation requirements. The overall comprehensive score breaks down into an overall image quality score, an overall image resolution score, an overall target quality score and an overall target size score; the application scene score breaks down into a scene image quality score, a scene image resolution score, a scene target quality score and a scene target size score; and free scoring can also be performed. The quality of the algorithm and the areas in which it is strong or weak can thus be reflected objectively, and a clear optimization direction can be given to the algorithm designer, so that algorithm optimization is more targeted;

2. the overall comprehensive score objectively reflects the overall performance of the algorithm to be tested. Specifically, it is evaluated by combining image quality, image resolution, target quality and target size, wherein the image quality and the target quality are evaluated by normalized weighting, the image resolution and the target size are evaluated by average weighting, and the overall evaluation uses average weighting. Factors of different image attributes are thus considered and combined more comprehensively, and the algorithm to be tested is evaluated more comprehensively, effectively and accurately;

3. By applying scene scoring, the setting of expert/user parameters can be realized, and the effect of the algorithm to be tested in a specific application scene can be evaluated in a targeted manner;

4. through free scoring, specific scene items can be evaluated individually for a given algorithm, for example scoring the algorithm only on target size, or only on resolution, or on two or more items, so that the user can score freely according to their own needs.

Drawings

Fig. 1 is a block diagram of an evaluation system according to an embodiment of the present application.

Fig. 2 is a block diagram of the structure of the comprehensive scoring module according to an embodiment of the present application.

Fig. 3 is a block diagram of an application scenario scoring module according to an embodiment of the present application.

Fig. 4 is a block diagram of a free scoring module according to an embodiment of the present application.

Fig. 5 is a flowchart illustrating steps of an evaluation method according to an embodiment of the present application.

Fig. 6 is a flowchart illustrating steps of the comprehensive scoring method according to an embodiment of the present application.

Fig. 7 is a flowchart illustrating steps of an application scenario scoring method according to an embodiment of the present application.

Fig. 8 is a flowchart illustrating steps of the free scoring method according to an embodiment of the present application.

Detailed Description

The following detailed description of embodiments of the systems and methods of the present application is provided in conjunction with the accompanying drawings and examples.

Specifically, as an embodiment of the present application, the overall structural framework of the evaluation system is shown in fig. 1:

The system comprises an algorithm input module, a test set preparation module, a comprehensive scoring module, an application scene scoring module, a free scoring module and an output module.

The algorithm input module is connected with the test set preparation module, the test set preparation module is respectively connected with the comprehensive scoring module, the application scene scoring module and the free scoring module, and the comprehensive scoring module, the application scene scoring module and the free scoring module are respectively connected with the output module.

Specifically, the method comprises the following steps:

The algorithm input module serves as the initiating end; it is used by the user to input the algorithm to be tested and passes that algorithm to the downstream modules.

The test set preparation module is used for selecting the test set containing the tested target type according to the algorithm to be tested.

The output module is used for outputting the evaluation result, by displaying it and/or outputting a printable document containing it.

The evaluation result comprises the overall comprehensive score, the application scene score and the comprehensive free score; any one, several or all of them can be output according to the user's operation requirements.

The following describes the three scoring modules in detail:

(1) The comprehensive scoring module is used for dividing the test set into four types of data sets, each with a plurality of different grades, according to the image quality, the image resolution, the target quality and the target size, wherein before the test set is divided according to the image quality, the test set is subjected to noise-adding processing; respectively calculating the mAP (mean Average Precision) on each type of data set to obtain an overall image quality score, an overall image resolution score, an overall target quality score and an overall target size score; and carrying out average weighting on the overall scores to obtain an overall comprehensive score.

Specifically, as shown in fig. 2, the comprehensive scoring module includes:

4 data set generating units: the system comprises an overall image quality data set generating unit, an overall image resolution data set generating unit, an overall target quality data set generating unit and an overall target size data set generating unit;

5 scoring units: the system comprises an overall image quality scoring unit, an overall image resolution scoring unit, an overall target quality scoring unit, an overall target size scoring unit and an overall comprehensive scoring unit;

1 normalization weight generation unit, used for providing normalization weights to the overall image quality scoring unit and the overall target quality scoring unit. Specifically: for each grade, setting the mAP score that is regarded as representing the same degree of performance; calculating the weight corresponding to each grade from these same-degree mAP scores; and normalizing the weights to obtain the normalization weight corresponding to each grade.

As shown in fig. 2, the data set generating units are respectively connected to the corresponding scoring units, and each of the classification scoring units is respectively connected to the overall comprehensive scoring unit. And the normalization weight generation unit is respectively connected with the overall image quality grading unit and the overall target quality grading unit.

Overall image quality score:

First, the overall image quality data set generating unit adds noise to the test set, mainly by adding Gaussian noise, salt-and-pepper noise or other noise of different degrees to the images of the test set and/or smoothing them to different degrees. This expands the number of images in the test set and yields image sets with distinct noise levels. One or a combination of SSIM, MS-SSIM, IW-SSIM, FSIM and MDSI is then used as the image quality index to obtain a distance index for each image of the test set, and the test set is divided into data sets of a plurality of quality grades according to the distance index.

Then, the overall image quality scoring unit tests the algorithm to be tested on each of the data sets generated by the overall image quality data set generating unit, calculates the mAP for each, and weights the mAP scores of all grades by the normalized weights to obtain the overall image quality score.
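The noise-adding and grading step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: images are assumed to be grayscale lists of rows with integer pixel values from 0 to 255, the function names and grade boundaries are hypothetical, and the quality index (for example SSIM against the clean image) is assumed to be computed elsewhere and to lie in [0, 1].

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of a grayscale image, clamped to the 0..255 range."""
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


def add_salt_pepper_noise(image, fraction, rng):
    """Return a copy in which roughly `fraction` of the pixels become 0 or 255."""
    noisy = [row[:] for row in image]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < fraction:
                row[x] = 255 if rng.random() < 0.5 else 0
    return noisy


def grade_by_quality(quality_index, boundaries):
    """Map a quality index to a grade; grade 1 is the best.

    `boundaries` holds hypothetical lower bounds in descending order,
    so len(boundaries) + 1 grades are produced.
    """
    for grade, lower_bound in enumerate(boundaries, start=1):
        if quality_index >= lower_bound:
            return grade
    return len(boundaries) + 1


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [[128] * 8 for _ in range(8)]
noisy = add_gaussian_noise(clean, sigma=20.0, rng=rng)
speckled = add_salt_pepper_noise(clean, fraction=0.1, rng=rng)
grade = grade_by_quality(0.62, boundaries=[0.9, 0.7, 0.5, 0.3])  # third band
```

Four boundaries give five grades, matching the five quality grades used in the worked example later in the description; the boundary values themselves are placeholders.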

Overall image resolution score:

first, an overall image resolution data set generating unit divides a test set into a plurality of data sets of different levels according to an image resolution size.

And then, the overall image resolution scoring unit tests the algorithms to be tested respectively through the data sets generated by the overall image resolution data set generating unit, calculates the mAP respectively, and performs weighting processing on the mAP scores corresponding to all levels by using average weight to obtain the overall image resolution score.

Overall target quality score:

first, the overall target quality data set generating unit divides the test set into a plurality of data sets of different levels according to the target quality difference.

And then, the overall target quality scoring unit tests the algorithms to be tested respectively through the data sets generated by the overall target quality data set generating unit, calculates mAPs respectively, and performs weighting processing on the mAP scores corresponding to all levels by using the normalized weights to obtain the overall target quality scores.

Overall target size score:

first, an overall target size data set generating unit divides the test set into a plurality of data sets of different levels according to the target size.

Then, the overall target size scoring unit tests the algorithm to be tested on each of the data sets generated by the overall target size data set generating unit, calculates the mAP for each, and weights the mAP scores of all grades by average weights to obtain the overall target size score.

Overall composite score:

and carrying out average weighting processing on the total image quality score, the total image resolution score, the total target quality score and the total target size score by a total comprehensive scoring unit to obtain a total comprehensive score.
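How the four overall scores combine can be sketched as below. It is only an illustration: the function names are hypothetical, the per-grade mAP values are made-up inputs, and the normalized weights for image quality and target quality are assumed to have been produced already by the normalization weight generation unit.

```python
def weighted_sum(map_scores, weights):
    """Combine per-grade mAP scores using one precomputed weight per grade."""
    if len(map_scores) != len(weights):
        raise ValueError("one weight is required per grade")
    return sum(m * w for m, w in zip(map_scores, weights))


def average(scores):
    """Equal-weight (average) combination of scores."""
    return sum(scores) / len(scores)


def overall_scores(quality_maps, quality_weights, resolution_maps,
                   target_quality_maps, target_quality_weights, size_maps):
    """Return the four overall scores plus their equal-weight average."""
    scores = {
        # normalized weighting for the two quality-type scores
        "image_quality": weighted_sum(quality_maps, quality_weights),
        "target_quality": weighted_sum(target_quality_maps, target_quality_weights),
        # average weighting for resolution and target size
        "image_resolution": average(resolution_maps),
        "target_size": average(size_maps),
    }
    scores["overall"] = average(list(scores.values()))
    return scores


# Hypothetical per-grade mAP values, for illustration only.
result = overall_scores(
    quality_maps=[0.6, 0.8], quality_weights=[0.5, 0.5],
    resolution_maps=[0.5, 0.7, 0.9],
    target_quality_maps=[0.9, 0.6], target_quality_weights=[1 / 3, 2 / 3],
    size_maps=[0.6, 0.8],
)
```

Returning the four component scores alongside the overall value keeps visible which attribute pulls the total down, which is the stated purpose of the multi-division evaluation.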

(2) The application scene scoring module is used for carrying out noise-adding processing on the test set, dividing the test set into data sets of different grades according to the image quality, and calculating the mAP (mean Average Precision) on one of the two best-grade data sets to obtain a scene image quality score; respectively generating corresponding scene data sets from the test set according to a preset image resolution parameter, a preset target quality parameter and a preset target size parameter; respectively calculating the mAP on the different scene data sets to obtain a scene image resolution score, a scene target quality score and a scene target size score; and carrying out average weighting on the scene scores to obtain the application scene score.

Specifically, as shown in fig. 3, the application scenario scoring module includes:

1 preset unit: a scene presetting unit for responding to the input of a user to finish the presetting of the resolution parameter, the target quality parameter and the target size parameter;

4 data set generating units: a scene image quality data set generating unit, a scene image resolution data set generating unit, a scene target quality data set generating unit and a scene target size data set generating unit;

5 scoring units: the system comprises a scene image quality scoring unit, a scene image resolution scoring unit, a scene target quality scoring unit, a scene target size scoring unit and a scene comprehensive scoring unit.

As shown in fig. 3, the scene presetting unit is connected to the scene image resolution data set generating unit, the scene target quality data set generating unit and the scene target size data set generating unit, and is configured to provide the corresponding parameters input by the user to the corresponding data set generating units. The scene image quality data set generating unit is connected with the scene image quality scoring unit, the scene image resolution data set generating unit is connected with the scene image resolution scoring unit, the scene target quality data set generating unit is connected with the scene target quality scoring unit, and the scene target size data set generating unit is connected with the scene target size scoring unit. The scene comprehensive scoring unit is respectively connected with the scene image quality scoring unit, the scene image resolution scoring unit, the scene target quality scoring unit and the scene target size scoring unit.

Scene image quality scoring:

firstly, a scene image quality data set generating unit adds noise to a test set, obtains a distance index of each image of the test set according to an image quality evaluation index, divides the test set into data sets with a plurality of quality grades according to the distance index, and takes one of two data sets with the best grade as a scene image quality data set.

And then, the scene image quality scoring unit tests the algorithm to be tested through the scene image quality data set generated by the scene image quality data set generating unit, and calculates the mAP to obtain the scene image quality score.

Scene image resolution score:

first, a user inputs an image resolution parameter through a scene preset unit, which responds to the input of the user to complete the preset of the resolution parameter.

Then, the scene image resolution data set generating unit generates a scene image resolution data set corresponding to a preset resolution parameter from the test set according to the preset resolution parameter.

And then, the scene image resolution scoring unit tests the algorithm to be tested through the data set generated by the scene image resolution data set generating unit and calculates the mAP to obtain the scene image resolution score.

Scene target quality scoring:

first, a user inputs a target quality parameter through a scene presetting unit, and the scene presetting unit responds to the input of the user to complete the presetting of the target quality parameter.

Then, the scene target quality data set generating unit generates a scene target quality data set corresponding to a preset target quality parameter from the test set according to the preset target quality parameter.

And then, the scene target quality scoring unit tests the algorithm to be tested through the data set generated by the scene target quality data set generating unit and calculates the mAP to obtain the scene target quality score.

Scene target size scoring:

first, a user inputs a target size parameter through a scene presetting unit, and the scene presetting unit responds to the input of the user to complete the presetting of the target size parameter.

Then, the scene target size data set generating unit generates a scene target size data set corresponding to the preset target size parameter from the test set according to the preset target size parameter.

And then, the scene target size scoring unit tests the algorithm to be tested through the data set generated by the scene target size data set generating unit and calculates the mAP to obtain the scene target size score.

Scene comprehensive scoring:

and the scene comprehensive scoring unit is used for carrying out average weighting processing on the scene image quality score, the scene image resolution score, the scene target quality score and the scene target size score to obtain an application scene score.
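The application scene scoring flow can be sketched as follows. The sketch assumes each test sample carries precomputed metadata; the `Sample` fields, the use of closed numeric ranges as preset parameters, and the `compute_map` callable (standing in for running the algorithm to be tested and computing mAP) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    image_id: str
    quality_grade: int     # grade after noise adding, 1 = best
    resolution: int        # e.g. image height in pixels
    target_quality: float  # hypothetical index in [0, 1]
    target_size: int       # e.g. bounding-box area in pixels


Range = Tuple[float, float]


def in_range(value: float, bounds: Range) -> bool:
    low, high = bounds
    return low <= value <= high


def application_scene_score(
    samples: Sequence[Sample],
    compute_map: Callable[[Sequence[Sample]], float],
    resolution_range: Range,
    target_quality_range: Range,
    target_size_range: Range,
    quality_grade: int = 1,
) -> Dict[str, float]:
    """Build the four scene data sets, score each, and average the scores."""
    if quality_grade not in (1, 2):
        raise ValueError("the scene image quality set is one of the two best grades")
    scene_sets = {
        "image_quality": [s for s in samples if s.quality_grade == quality_grade],
        "image_resolution": [s for s in samples if in_range(s.resolution, resolution_range)],
        "target_quality": [s for s in samples if in_range(s.target_quality, target_quality_range)],
        "target_size": [s for s in samples if in_range(s.target_size, target_size_range)],
    }
    scores = {name: compute_map(subset) for name, subset in scene_sets.items()}
    scores["application_scene"] = sum(scores.values()) / len(scores)
    return scores
```

In practice `compute_map` would run the detector on the subset and compare predictions with ground-truth labels; here it is left as a parameter so that the data set selection logic stays separate from the metric.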

(3) The free scoring module is used for generating a corresponding free data set from the test set according to a preset free test scene and corresponding preset free scene parameters, wherein the preset free test scene comprises at least one of a free image quality scene, a free image resolution scene, a free target quality scene and a free target size scene; calculating the mAP on the free data set to obtain the free score corresponding to each free test scene; when two or more free scores are obtained, carrying out average weighting on the scores to obtain a comprehensive free score; and when only one free score is obtained, using that free score as the comprehensive free score.

As shown in fig. 4, the free scoring module includes a free scene presetting unit, a free scene data set generating unit, and a free scoring unit, which are connected in sequence.

When free scoring is required:

First, the user inputs at least one free scene parameter among the image quality parameter, the image resolution parameter, the target quality parameter and the target size parameter according to their own requirements; the free scene presetting unit responds to the user's input and selects, from the free image quality scene, the free image resolution scene, the free target quality scene and the free target size scene, the scene or scenes corresponding to the free scene parameters input by the user as the free test scene.

Then, the free scene data set generating unit generates a corresponding free data set from the test set according to the free test scene and the free scene parameters.

And then, the free scoring unit tests the algorithm to be tested through the free data set and calculates the mAP to obtain the free score corresponding to the free test scene. Wherein:

when two or more free scores are obtained, carrying out average weighting on the scores to obtain a comprehensive free score;

when only one free score is obtained, the free score is used as a comprehensive free score.
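The rule for combining free scores can be sketched as below. The function name and the dictionary representation are hypothetical, and the sketch reads the multi-score case as covering two or more free scores, since the only other case described is exactly one score.

```python
def composite_free_score(free_scores):
    """Combine per-scene free scores (scene name -> mAP) into one score.

    One score is returned unchanged; two or more are averaged with equal weight.
    """
    if not free_scores:
        raise ValueError("at least one free test scene is required")
    values = list(free_scores.values())
    if len(values) == 1:
        return values[0]
    return sum(values) / len(values)


# Hypothetical mAP values, for illustration only.
single = composite_free_score({"free_target_size": 0.64})
combined = composite_free_score({"free_target_size": 0.6, "free_image_resolution": 0.8})
```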

As shown in fig. 5, a flowchart of the steps of the evaluation method of the embodiment of the present application is shown.

A multi-division target detection algorithm evaluation method comprises the following steps:

(1) Selecting a test set containing the tested target type according to the input algorithm to be tested.

(2) In response to the user's input selection, executing at least one of the comprehensive scoring method, the application scene scoring method and the free scoring method.

Specifically, the detailed steps of the comprehensive scoring method are shown in fig. 6:

(1) Dividing the test set into four types of data sets, each with a plurality of different grades, according to the image quality, the image resolution, the target quality and the target size, wherein before the test set is divided according to the image quality, the test set is subjected to noise-adding processing.

(2) Respectively calculating the mAP for each type of data set to obtain an overall image quality score, an overall image resolution score, an overall target quality score and an overall target size score.

(3) Carrying out average weighting on the overall scores to obtain an overall comprehensive score.

Specifically, the detailed steps of the application scenario scoring method are shown in fig. 7:

(1) Adding noise to the test set, dividing the test set into data sets of different grades according to the image quality, and calculating the mAP on one of the two best-grade data sets to obtain a scene image quality score.

(2) Respectively generating corresponding scene data sets from the test set according to a preset image resolution parameter, a preset target quality parameter and a preset target size parameter.

(3) Respectively calculating the mAP on the different scene data sets to obtain a scene image resolution score, a scene target quality score and a scene target size score.

(4) Carrying out average weighting on the scene scores to obtain the application scene score.

Specifically, the detailed steps of the free scoring method are shown in fig. 8:

(1) Generating a corresponding free data set from the test set according to a preset free test scene and corresponding preset free scene parameters, wherein the preset free test scene comprises at least one of a free image quality scene, a free image resolution scene, a free target quality scene and a free target size scene.

(2) Calculating the mAP on the free data set to obtain the free score corresponding to each free test scene:

(2-1) when two or more free scores are obtained, carrying out average weighting on the scores to obtain a comprehensive free score;

(2-2) when only one free score is obtained, the free score is taken as the comprehensive free score.

The systems and methods of the present application are further illustrated below.

The algorithm to be tested is input through the input module.

The test set preparation module selects, from the overall test set, an original test set containing the tested target type and names it TG.

At least one of comprehensive scoring, application scene scoring and free scoring is selected for evaluation according to the user's testing/evaluation requirements. If comprehensive scoring is selected, the user does not need to set any parameters. If application scene scoring is selected, the user needs to input, through the scene presetting unit, the resolution parameter, target quality parameter and target size parameter required for the corresponding scene evaluation. If free scoring is selected, the user defines the attributes to be tested and limits the parameters to be tested: at least one of the image quality parameter, image resolution parameter, target quality parameter and target size parameter must be input, and the scenes to be tested are determined from the parameters that have a non-zero value. If the user inputs only one parameter, only the one corresponding scene is determined, and the test is completed only under that scene's parameters.
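The rule that free test scenes are determined by the parameters that have a non-zero value can be sketched as follows. The parameter keys, the scene labels as Python strings, and the function name are assumptions made for illustration; the description does not specify how parameters are represented.

```python
# Hypothetical parameter keys mapped to the four free test scenes.
FREE_SCENES = {
    "image_quality": "free image quality scene",
    "image_resolution": "free image resolution scene",
    "target_quality": "free target quality scene",
    "target_size": "free target size scene",
}


def select_free_scenes(params):
    """Return {scene label: parameter} for every parameter that is set and non-zero."""
    unknown = set(params) - set(FREE_SCENES)
    if unknown:
        raise KeyError(f"unknown free scene parameters: {sorted(unknown)}")
    selected = {
        FREE_SCENES[name]: value
        for name, value in params.items()
        if value is not None and value != 0
    }
    if not selected:
        raise ValueError("at least one non-zero free scene parameter is required")
    return selected


# Only target size is set, so only one scene is tested.
scenes = select_free_scenes({"image_quality": 0, "target_size": (32, 96)})
```

Here the target size parameter is shown as a hypothetical (minimum, maximum) range; any non-zero value is treated as selecting the corresponding scene.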

1. If comprehensive scoring is selected, the comprehensive scoring module runs and performs the following functions/steps:

(1) Overall image quality score:

Noise-adding processing is carried out on the TG test set, and 5 test sets T1 to T5 of 5 quality grades are extracted.

mAP1 is calculated from T1 and P1, mAP2 from T2 and P2, and mAP3, mAP4 and mAP5 in the same way, giving scores for the 5 grades. Finally, these are weighted by the normalized weights to obtain the quality score S1.

The normalized weights can be obtained as in the following example.

Suppose that mAP scores of 0.2, 0.4, 0.6, 0.8 and 1 for grades 1 to 5 respectively (the same-degree scores $d_i$) represent an equivalent algorithm effect on every grade. The weight of grade $i$ is then the reciprocal of its same-degree score:

$$w'_i = \frac{1}{d_i}$$

The reciprocals of the same-degree scores (0.2, 0.4, 0.6, 0.8, 1) therefore give the weights (5, 2.5, 1.66666667, 1.25, 1).

When the mAP scores of grades 1 to 5 are all 1, the total quality score should be 1, not 5. The weights therefore need to be normalized:

$$w_i = \frac{w'_i}{\sum_{j=1}^{n} w'_j}$$

Thus the weight w = (5, 2.5, 1.66666667, 1.25, 1)/11.41666667 = (0.4379562, 0.2189781, 0.1459854, 0.10948905, 0.08759124).

Finally, the products of mAP and w for each grade are summed to obtain the final score S, where n is 5 and i indexes the different grades:

$$S = \sum_{i=1}^{n} \mathrm{mAP}_i \, w_i$$

Test 1: when the mAP is (0.2, 0.4, 0.6, 0.8, 1), each grade contributes the same weighted score of 0.08759124 (same degree), and the sum is 0.4379562.

Result: substituting into the weight normalization formula gives a total score of 0.4379562. This is a mediocre score; to obtain a higher score, training on strongly noised images should be strengthened.

Test 2: when the mAP is (0.7, 0.7, 0.7, 0.7, 0.7), the total score should be 0.7.

Result: substituting into the weight normalization formula gives a total score of 0.7, which is correct.
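The weight computation and the two tests above can be reproduced with a few lines of code. This is a minimal sketch; the function names `normalized_weights` and `weighted_score` are chosen here for illustration and do not come from the original.

```python
from typing import List, Sequence


def normalized_weights(same_degree: Sequence[float]) -> List[float]:
    """Reciprocal of each same-degree score, scaled so the weights sum to 1."""
    reciprocals = [1.0 / d for d in same_degree]
    total = sum(reciprocals)
    return [r / total for r in reciprocals]


def weighted_score(maps: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of mAP_i * w_i over all grades."""
    if len(maps) != len(weights):
        raise ValueError("one mAP value is needed per grade")
    return sum(m * w for m, w in zip(maps, weights))


same_degree = [0.2, 0.4, 0.6, 0.8, 1.0]
w = normalized_weights(same_degree)
test_1 = weighted_score([0.2, 0.4, 0.6, 0.8, 1.0], w)  # Test 1 in the text
test_2 = weighted_score([0.7] * 5, w)                  # Test 2 in the text

print([round(x, 7) for x in w])
print(round(test_1, 7), round(test_2, 7))
```

Because the weights sum to 1, an algorithm with the same mAP on every grade receives exactly that mAP as its score, which is the property Test 2 checks.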

(2) Overall image resolution score

The TG test set is divided into 3 classes by resolution (large, medium and small). The mAP is calculated for each of the 3 classes, and the three values are averaged with equal weights to obtain the resolution score S2.

(3) Overall target quality score

The TG test set is selected, and its results are taken from the overall test results. The targets are divided into 2 sets according to target quality (good and poor), the mAP is calculated for each of the 2 sets, and the values are weighted with normalized weights to obtain the target quality score S3.

The normalized weights are obtained in the same way as for the overall image quality score. The same-degree mAP is set to 1 for good target quality and 0.5 for poor target quality, so the reciprocals are (1/1, 1/0.5); normalizing gives (1/1, 1/0.5)/(1/1 + 1/0.5), i.e. weights of (0.33333, 0.6666666) for good and poor quality respectively.
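Written out as a worked equation, the two-grade case looks as follows. The symbols $d$, $w$ and the subscripted mAP names are notation assumed here for illustration and are not taken from the original.

```latex
\begin{aligned}
d &= (d_{\text{good}},\ d_{\text{poor}}) = (1,\ 0.5) \\
w &= \frac{(1/1,\ 1/0.5)}{1/1 + 1/0.5} = \frac{(1,\ 2)}{3} = \left(\tfrac{1}{3},\ \tfrac{2}{3}\right) \\
S_3 &= \tfrac{1}{3}\,\mathrm{mAP}_{\text{good}} + \tfrac{2}{3}\,\mathrm{mAP}_{\text{poor}}
\end{aligned}
```

With mAP values of 1 on the good-quality set and 0.5 on the poor-quality set, each grade contributes 1/3 and S3 is 2/3.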

(4) Overall target size score

The TG test set is selected, and its results are taken from the overall test results. The targets are divided into 3 sets according to target size (large, medium and small), the mAP is calculated for each of the 3 sets, and the values are averaged with equal weights to obtain the target size score S4.

(5) Overall composite score

Scoring the algorithm on these different image attributes gives S1-S4, which are finally averaged with equal weights to obtain the total score Stotal.
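The average weighting used for S2, S4 and Stotal can be sketched as below. All mAP values are hypothetical numbers chosen for illustration, not results from the original, and S1 and S3 are assumed to have been computed already with normalized weights.

```python
from statistics import mean
from typing import Sequence


def average_weighted(scores: Sequence[float]) -> float:
    """Average weighting: every level or attribute counts equally."""
    if not scores:
        raise ValueError("at least one score is required")
    return mean(scores)


# Hypothetical values, for illustration only.
s1 = 0.62                                  # image quality score (normalized weights)
s2 = average_weighted([0.70, 0.65, 0.60])  # large / medium / small resolution
s3 = 0.58                                  # target quality score (normalized weights)
s4 = average_weighted([0.80, 0.66, 0.40])  # large / medium / small targets
s_total = average_weighted([s1, s2, s3, s4])

print(round(s2, 4), round(s4, 4), round(s_total, 4))
```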

2. If application scene scoring is selected, the application scene image quality score is computed on one of the two data sets with the highest quality grades, while every other score is computed only on the test set that matches the application environment parameters set by the expert/user. For example, depending on the image attributes of the original test set, the system divides the resolution into three levels (large, medium and small) or more, and divides the target quality into at least 2 levels (good and poor). The user selects the resolution, target quality and target size levels that match the application scene to be tested; these levels correspond to the required resolution range, target quality range and target size range.

For example, the user sets the parameters to resolution level 2, target quality level 2 and target size level 2, which correspond to the resolution range, target quality range and target size range of those levels. A score is calculated under each of these parameters and the image quality score is added; that is, pictures with the attributes given by the expert parameters are input, the score under each condition is output, and the scores are finally weighted to obtain a total score.

(1) Scenario image quality scoring

Suppose the quality grades of the test sets are ordered as T1 < T2 < T3 < T4 < T5. T4 or T5 is used for testing: mAP4 is calculated from T4 and P4, or mAP5 from T5 and P5, and the resulting score is taken as the scene image quality score S1.

(2) Scenario image resolution scoring

According to the user's requirements, the user inputs a resolution parameter, for example a medium resolution. In a specific example, resolution level 2 is set, a scene image resolution test set belonging to resolution level 2 is generated accordingly, and the mAP is calculated to obtain the scene image resolution score S2.

(3) Scenario target quality scoring

According to the target quality parameter input by the user (for example, target quality level 2), only a test set covering the target quality range of level 2 is generated and tested, and the mAP is calculated to obtain the scene target quality score S3.

(4) Scenario target size scoring

According to the target size parameter input by the user (for example, target size level 2), only a test set covering the target size range of level 2 is generated and tested, and the mAP is calculated to obtain the scene target size score S4.

(5) Application scenario scoring
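The claims describe the application scene score as the equal-weight average of the four scene scores S1 to S4. A minimal sketch of that step follows; the `Sample` record, the attribute names, and the `evaluate_map` callback (a stand-in for running the detector and computing mAP) are hypothetical and not part of the original.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One test image with hypothetical, pre-computed attribute levels."""
    name: str
    levels: Dict[str, int]


def scene_subset(samples: List[Sample], attribute: str, level: int) -> List[Sample]:
    """Keep only the samples whose attribute falls in the requested level."""
    return [s for s in samples if s.levels.get(attribute) == level]


def application_scene_score(
    samples: List[Sample],
    quality_set: List[Sample],
    preset: Dict[str, int],
    evaluate_map: Callable[[List[Sample]], float],
) -> float:
    """Average the image quality score with one score per preset attribute."""
    scores = [evaluate_map(quality_set)]  # S1 on the T4 or T5 data set
    for attribute, level in preset.items():  # S2, S3, S4
        scores.append(evaluate_map(scene_subset(samples, attribute, level)))
    return mean(scores)


samples = [
    Sample("a", {"resolution": 2, "target_quality": 2, "target_size": 2}),
    Sample("b", {"resolution": 2, "target_quality": 1, "target_size": 2}),
    Sample("c", {"resolution": 1, "target_quality": 2, "target_size": 3}),
    Sample("d", {"resolution": 3, "target_quality": 2, "target_size": 2}),
]
preset = {"resolution": 2, "target_quality": 2, "target_size": 2}


def stub_map(subset: List[Sample]) -> float:
    """Placeholder metric; a real system would run the detector and compute mAP."""
    return len(subset) / len(samples)


score = application_scene_score(samples, samples[:2], preset, stub_map)
print(score)
```

Passing the metric as a callback keeps the data selection logic separate from the detector, so the same selection code can serve any algorithm under test.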

3. For some algorithms, the user wants to test certain scene items individually, for example scoring only the target size, only the resolution, two or more items, or other items. The user can set up free scoring as desired. The system generates the corresponding free test scene/test set from the parameters input by the user. For example, if the input is pictures with small targets, the output is a small-target score.

(1) Data preparation

The user inputs the desired scoring scene in the free scene presetting unit; specifically, the user inputs the corresponding parameters, and the corresponding free data set is generated from them.

(2) Free scoring

The algorithm is tested on the free test set and scored freely, giving one or more scoring results that can be used to optimize the algorithm model. If several items are scored, the comprehensive free score is obtained by average weighting.
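The combination rule for free scores (a single score is used directly, several scores are averaged) can be sketched as follows. The function name and the dictionary keys are illustrative assumptions, and the score values are made up.

```python
from statistics import mean
from typing import Dict


def comprehensive_free_score(free_scores: Dict[str, float]) -> float:
    """One score is returned unchanged; several scores are averaged equally."""
    if not free_scores:
        raise ValueError("no free score was computed")
    values = list(free_scores.values())
    return values[0] if len(values) == 1 else mean(values)


# Hypothetical mAP results per free test scene.
single = comprehensive_free_score({"target_size": 0.4})
combined = comprehensive_free_score({"target_size": 0.4, "image_resolution": 0.6})
print(single, combined)
```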

Using the above example and scene parameters as the evaluation system/method, the yolov3_tiny version was trained on the KITTI data set, and target detection on the test set was compared as follows (in this example, the application scene image quality score is calculated with the T4 grade, and the other application scene scores are calculated with resolution level 2, target quality level 2 and target size level 2):

The system can carry out overall comprehensive scoring, application scene scoring and free scoring at the same time. From the results of the example listed in the table above:

comparing the overall comprehensive score with the traditional score shows that the algorithm to be tested performs well on noise-free images but is sensitive to noised images, large resolutions, small targets and occlusion, so training on images with these attributes should be strengthened;

comparing the application scene score with the traditional score shows that the algorithm to be tested performs well in the specific application scene (namely, on the test sets/test atlases corresponding to the preset resolution level, target quality level and target size level), and that training on noised images of the application scene needs to be strengthened;

the free scores are used for specific assessments and do not take part in the comparison.

The system/method provided by the invention can give an overall comprehensive score, an application scene score and a comprehensive free score. It can objectively reflect the quality of the algorithm and the areas in which the algorithm is strong, and it gives the algorithm designer a clear optimization direction, so that algorithm optimization is more targeted.

Claims

1. A multi-division target detection algorithm evaluation system is characterized by comprising:

the algorithm input module is used for inputting an algorithm to be tested;

a composite scoring module comprising:

the overall image quality data set generation unit is used for carrying out noise adding processing on the test set, obtaining a distance index of each image of the test set according to the image quality evaluation index, and dividing the test set into data sets with a plurality of quality grades according to the distance index;

the overall image resolution data set generating unit is used for dividing the test set into a plurality of data sets with different levels according to the image resolution;

the overall target quality data set generation unit is used for dividing the test set into a plurality of data sets of different grades according to the target quality;

the overall target size data set generating unit is used for dividing the test set into a plurality of data sets with different levels according to the target size;

the overall image quality scoring unit is used for testing the algorithms to be tested respectively through the data sets generated by the overall image quality data set generating unit, calculating mAP respectively, and weighting mAP scores corresponding to all levels by using normalized weights to obtain overall image quality scores;

the overall image resolution scoring unit is used for testing the algorithms to be tested respectively through the data sets generated by the overall image resolution data set generating unit, calculating mAP respectively, and weighting mAP scores corresponding to all levels by using average weight to obtain overall image resolution scores;

the overall target quality scoring unit is used for testing the algorithms to be tested respectively through the data sets generated by the overall target quality data set generating unit, calculating the mAP respectively, and weighting the mAP scores corresponding to all levels by utilizing the normalized weight to obtain the overall target quality score;

the overall target size scoring unit is used for testing the algorithms to be tested respectively through the data sets generated by the overall target size data set generating unit, calculating the mAP respectively, and weighting the mAP scores corresponding to all levels by using average weight to obtain the overall target size score;

the overall comprehensive scoring unit is used for carrying out average weighting processing on the overall image quality score, the overall image resolution score, the overall target quality score and the overall target size score to obtain an overall comprehensive score;

an application scenario scoring module comprising:

the scene image quality data set generation unit is used for carrying out noise adding processing on the test set, obtaining a distance index of each image of the test set according to the image quality evaluation index, dividing the test set into data sets with a plurality of quality grades according to the distance index, and taking one of the two data sets with the best grade as a scene image quality data set;

the scene image resolution data set generating unit is used for generating a scene image resolution data set corresponding to a preset resolution parameter from the test set according to the preset resolution parameter;

a scene target quality data set generating unit, configured to generate a scene target quality data set corresponding to a preset target quality parameter from the test set according to the preset target quality parameter;

a scene target size data set generating unit, configured to generate a scene target size data set corresponding to a preset target size parameter from the test set according to the preset target size parameter;

the scene image quality scoring unit is used for testing the algorithm to be tested through the scene image quality data set generated by the scene image quality data set generating unit and calculating mAP to obtain a scene image quality score;

the scene image resolution scoring unit is used for testing the algorithm to be tested through the data set generated by the scene image resolution data set generating unit and calculating the mAP to obtain a scene image resolution score;

the scene target quality scoring unit is used for testing the algorithm to be tested through the data set generated by the scene target quality data set generating unit and calculating the mAP to obtain a scene target quality score;

the scene target size scoring unit is used for testing the algorithm to be tested through the data set generated by the scene target size data set generating unit and calculating the mAP to obtain a scene target size score;

the scene comprehensive scoring unit is used for carrying out average weighting processing on the scene image quality score, the scene image resolution score, the scene target quality score and the scene target size score to obtain an application scene score;

a free-form scoring module comprising:

the free scene presetting unit is used for responding to at least one free scene parameter, among an image quality parameter, an image resolution parameter, a target quality parameter and a target size parameter, input by a user, and correspondingly selecting the scene corresponding to the free scene parameter from a free image quality scene, a free image resolution scene, a free target quality scene and a free target size scene as a free test scene;

the free scene data set generating unit is used for generating a corresponding free data set from the test set according to the free test scene and the free scene parameters;

the free scoring unit is used for testing the algorithm to be tested through the free data set and calculating the mAP to obtain a free score corresponding to each free test scene; when two or more free scores are obtained, the scores are average-weighted to obtain a comprehensive free score; when only one free score is obtained, that free score is taken as the comprehensive free score.

2. The multi-division target detection algorithm evaluation system according to claim 1, wherein the comprehensive scoring module further comprises a normalized weight generation unit for setting the same degree for the mAP scores of the different grades; acquiring the mAP scores that respectively correspond to the different grades at the same degree, and calculating the weight corresponding to each grade; and normalizing the weight corresponding to each grade to obtain the normalized weight corresponding to each grade.

3. The multi-division target detection algorithm evaluation system of claim 1, wherein the application scene scoring module further comprises a scene presetting unit for responding to user input to complete the presetting of the resolution parameter, the target quality parameter and the target size parameter.

4. The multi-division target detection algorithm evaluation system of claim 1, further comprising an output module for outputting the evaluation result, and for displaying the evaluation result and/or outputting a printable document containing the evaluation result.

5. The multi-division target detection algorithm evaluation system according to claim 4, wherein the evaluation result is at least one of a total composite score, an application scenario score, and a composite free score.

6. A multi-division target detection algorithm evaluation method is characterized by comprising the following steps:

responding to the input selection of the user, and executing at least one scoring method of a comprehensive scoring method, an application scene scoring method and a free scoring method; wherein:

the comprehensive scoring method comprises the following steps:

the method comprises the steps of conducting noise adding processing on a test set, obtaining a distance index of each image of the test set according to an image quality evaluation index, dividing the test set into data sets with a plurality of quality levels according to the distance index, testing algorithms to be tested respectively, calculating mAP respectively, and conducting weighting processing on mAP scores corresponding to all the levels by utilizing normalization weights to obtain an overall image quality score;

dividing the test set into a plurality of data sets of different levels according to the resolution of the image, respectively testing algorithms to be tested, respectively calculating mAP, and performing weighting processing on mAP scores corresponding to the levels by using average weight to obtain total image resolution scores;

dividing the test set into a plurality of data sets with different grades according to the quality difference of the target, respectively testing the algorithm to be tested, respectively calculating mAP, and weighting mAP scores corresponding to all grades by utilizing normalized weights to obtain total target quality scores;

dividing the test set into a plurality of data sets with different grades according to the target size, testing algorithms to be tested respectively, calculating mAP respectively, and weighting mAP scores corresponding to all grades by using average weight to obtain total target size scores;

carrying out average weighting processing on the total image quality score, the total image resolution score, the total target quality score and the total target size score to obtain a total comprehensive score;

the application scene scoring method comprises the following steps:

the method comprises the steps of conducting noise adding processing on a test set, obtaining a distance index of each image of the test set according to an image quality evaluation index, dividing the test set into data sets with a plurality of quality grades according to the distance index, and taking one of two data sets with the best grade as a scene image quality data set; testing the algorithm to be tested through the scene image quality data set, and calculating the mAP to obtain a scene image quality score;

generating a scene image resolution data set corresponding to a preset resolution parameter from the test set according to the preset resolution parameter; testing the algorithm to be tested through the scene image resolution data set, and calculating the mAP to obtain a scene image resolution score;

generating a scene target quality data set corresponding to a preset target quality parameter from a test set according to the preset target quality parameter; testing the algorithm to be tested through the scene target quality data set, and calculating the mAP to obtain a scene target quality score;

generating a scene target size data set corresponding to a preset target size parameter from the test set according to the preset target size parameter; testing the algorithm to be tested through a scene target size data set, and calculating the mAP to obtain a scene target size score;

carrying out average weighting processing on the scene image quality score, the scene image resolution score, the scene target quality score and the scene target size score to obtain an application scene score;

the free scoring method comprises the following steps:

responding to at least one free scene parameter of an image quality parameter, an image resolution parameter, a target quality parameter and a target size parameter input by a user, and correspondingly selecting a scene corresponding to the free scene parameter from a free image quality scene, a free image resolution scene, a free target quality scene and a free target size scene as a free test scene;

generating a corresponding free data set from the test set according to the free test scene and the free scene parameters;

testing the algorithm to be tested through the free data set, and calculating the mAP to obtain a free score corresponding to each free test scene; when two or more free scores are obtained, average-weighting the scores to obtain a comprehensive free score; when only one free score is obtained, taking that free score as the comprehensive free score.