CN111402185A - Image detection method and device


Info

Publication number
CN111402185A
Authority
CN
China
Prior art keywords
image
target object
candidate image
determining
object detection
Legal status
Granted
Application number
CN201811528544.XA
Other languages
Chinese (zh)
Other versions
CN111402185B (en)
Inventor
张修宝
沈海峰
Current Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201811528544.XA
Publication of CN111402185A
Application granted
Publication of CN111402185B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of image processing, and in particular to an image detection method and device. The method comprises: acquiring a monitoring video and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video; screening from the monitoring video at least one candidate image whose pixel information difference degree satisfies a preset condition, and intercepting target area images from the at least one candidate image; and determining, based on the feature information of the target area images corresponding to each candidate image, a target object detection model matched with that candidate image, and detecting the target object information included in each candidate image with the determined models. This method reduces the amount of computation in the target object detection process while balancing the accuracy and the efficiency of target object detection.

Description

Image detection method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image detection method and apparatus.
Background
At present, application fields such as video surveillance, security, and unmanned driving involve detecting a target object, such as a pedestrian or a vehicle, that appears in a surveillance video. When detecting a target object in a surveillance video, existing approaches generally input each frame of image of the surveillance video into a preset target object detection model to detect whether that frame includes feature information of the target object.
However, whether a target object appears in a given frame of the monitoring video, and related information such as the number and positions of target objects, are not fixed. With such a detection method, the detection efficiency may therefore be low, and the detection accuracy may also be affected.
Disclosure of Invention
In view of this, embodiments of the present application provide an image detection method and apparatus to improve efficiency and accuracy of image detection.
In a first aspect, an embodiment of the present application provides an image detection method, including:
acquiring a monitoring video, and determining the pixel information difference between any two adjacent frames of images in the monitoring video;
screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video, and intercepting a target area image from the at least one candidate image;
and determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting the target object information included in each candidate image by using the determined target object detection models.
In one possible embodiment, the feature information of the target area image includes at least one of the following information:
the number of target region images, the area ratio between the total area of the target region images and the total area of the corresponding candidate images.
In one possible embodiment, the determining, based on the feature information of the target area image corresponding to each candidate image, a target object detection model matching each candidate image includes:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
when the number of target area images corresponding to the kth candidate image is less than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number, determining that a target object detection model matched with the kth candidate image is a second target object detection model;
wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model.
In one possible embodiment, the determining, based on the feature information of the target area image corresponding to each candidate image, a target object detection model matching each candidate image includes:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
when the number of target area images corresponding to the kth candidate image is less than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than a preset area threshold value, determining that a target object detection model matched with the kth candidate image is a second target object detection model;
when the number of target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is less than or equal to the preset area threshold, determining that a target object detection model matched with the kth candidate image is a third target object detection model;
wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model, and the complexity of the second target object detection model is lower than the complexity of the third target object detection model.
In one possible implementation, the detecting, by using the determined target object detection models, target object information included in each candidate image respectively includes:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image respectively, and detecting target object information included in the kth candidate image; or,
and splicing all the target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
In a possible embodiment, the stitching the target area images cut from the kth candidate image includes:
calculating the area of each target region image intercepted from the kth candidate image;
arranging the areas of the images of the target areas in a descending order;
and splicing the target area images intercepted from the kth candidate image based on the obtained sequencing result to obtain a spliced target area image.
In a possible implementation, the target object information included in the kth candidate image includes at least one of the following information:
label information of each region image in the kth candidate image in which a target object exists;
coordinate position information obtained by mapping each region image in which a target object exists onto the corresponding image in the monitoring video.
In one possible embodiment, the coordinate position information obtained by mapping each region image in which a target object exists onto the corresponding image in the surveillance video is detected according to the following method:
taking the coordinate position of a first selected pixel point in the corresponding image of the monitoring video as a reference coordinate position, determining the relative coordinate distance between the coordinate position of a second selected pixel point in each region image in which a target object exists and the first selected pixel point;
adjusting, based on the relative coordinate distance, the coordinate position of each pixel point in each region image in which a target object exists;
and determining the adjusted coordinate position of each pixel point in each such region image as the coordinate position at which that region image is mapped onto the corresponding image in the monitoring video.
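For illustration only, the mapping described above can be sketched in Python as adding the offset of a region image within the monitoring-video frame to every local coordinate; the function and variable names below are hypothetical, and the embodiment does not fix a concrete implementation:

    def map_box_to_frame(region_box, region_origin):
        # region_box: (x1, y1, x2, y2) in the local coordinates of a region
        # image in which a target object exists.
        # region_origin: (x0, y0), the top-left pixel of that region image in
        # the corresponding monitoring-video frame, i.e. the relative
        # coordinate distance between the two selected pixel points.
        x1, y1, x2, y2 = region_box
        x0, y0 = region_origin
        # Shift every coordinate by the region's offset within the frame.
        return (x1 + x0, y1 + y0, x2 + x0, y2 + y0)

    # A box detected at (10, 5)-(40, 60) inside a region whose top-left corner
    # sits at (300, 120) in the frame maps to (310, 125)-(340, 180).
    print(map_box_to_frame((10, 5, 40, 60), (300, 120)))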
In a possible implementation manner, the determining a difference degree of pixel information between any two adjacent frames of images in the surveillance video includes:
executing a first processing process aiming at the ith frame image and the (i + 1) th frame image in the monitoring video, wherein i is a positive integer; wherein the first processing procedure comprises:
converting the ith frame image into a first gray level image, and converting the (i + 1) th frame image into a second gray level image;
subtracting the gray values of pixel points of the second gray image and the first gray image one by one to obtain a third gray image;
and determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image based on the third grayscale image.
In one possible embodiment, the determining the difference degree of the pixel information between the i +1 th frame image and the i frame image based on the third grayscale image includes:
determining first-class pixel points with the gray values larger than a first set threshold value and second-class pixel points with the gray values not larger than the first set threshold value in the third gray image;
adjusting the gray value of the first type pixel point to be a first numerical value, and adjusting the gray value of the second type pixel point to be a second numerical value to obtain a fourth gray image;
and determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image according to the number of the pixel points of which the gray values in the fourth gray image are the first values.
In a possible implementation manner, the determining the candidate images in the monitored video whose pixel information difference degrees meet a preset condition includes:
and when the pixel information difference degree between the (i + 1) th frame image and the ith frame image is determined to be larger than a set difference degree threshold value, determining a candidate image corresponding to the (i + 1) th frame image according to the (i + 1) th frame image and the fourth gray-scale image.
In a possible implementation manner, the determining, according to the i +1 th frame image and the fourth grayscale image, a candidate image corresponding to the i +1 th frame image includes:
determining a gray area image formed by pixel points of which the gray values are the first numerical values in the fourth gray image;
determining a candidate area image matched with the gray area image in the (i + 1) th frame image;
adjusting the pixel values of the pixel points of the other region images except the candidate region image in the (i + 1) th frame image to the second numerical value;
and determining the adjusted (i + 1) th frame image as a candidate image corresponding to the (i + 1) th frame image.
In a possible embodiment, the intercepting a target area image from the at least one candidate image includes:
executing a second processing procedure for a jth candidate image in the at least one candidate image, j being a positive integer; wherein the second processing procedure comprises:
determining pixel points of which the pixel values are not the second values in the jth candidate image;
and intercepting at least one target area image containing pixel points with pixel values not being second numerical values from the jth candidate image.
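For illustration only, a minimal Python sketch of this second processing procedure, assuming OpenCV connected-component analysis and a second value of 255 (white); the patent does not prescribe this particular algorithm:

    import cv2
    import numpy as np

    def intercept_target_regions(candidate_img, second_value=255):
        # Pixels whose value is not the second value carry the retained
        # candidate-region content; group them into connected regions and
        # cut one sub-image per region.
        gray = cv2.cvtColor(candidate_img, cv2.COLOR_BGR2GRAY)
        mask = (gray != second_value).astype(np.uint8)
        num, _, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
        regions = []
        for label in range(1, num):  # label 0 is the background
            x, y, w, h, _area = stats[label]
            regions.append(candidate_img[y:y + h, x:x + w].copy())
        return regions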
In a second aspect, an embodiment of the present application provides an image detection apparatus, including:
the determining module is used for acquiring a monitoring video and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video;
the screening module is used for screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video and intercepting a target area image from the at least one candidate image;
and the detection module is used for determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting the target object information included in each candidate image by using each determined target object detection model.
In one possible design, the feature information of the target area image includes at least one of the following information:
the number of target region images, the area ratio between the total area of the target region images and the total area of the corresponding candidate images.
In one possible design, the detection module, when determining the target object detection model matching each candidate image based on the feature information of the target area image corresponding to each candidate image, is specifically configured to:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
when the number of target area images corresponding to the kth candidate image is less than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number, determining that a target object detection model matched with the kth candidate image is a second target object detection model;
wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model.
In one possible design, the detection module, when determining the target object detection model matching each candidate image based on the feature information of the target area image corresponding to each candidate image, is specifically configured to:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
when the number of target area images corresponding to the kth candidate image is less than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than a preset area threshold value, determining that a target object detection model matched with the kth candidate image is a second target object detection model;
when the number of target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is less than or equal to the preset area threshold, determining that a target object detection model matched with the kth candidate image is a third target object detection model;
wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model, and the complexity of the second target object detection model is lower than the complexity of the third target object detection model.
In one possible design, when the detection module respectively detects target object information included in each candidate image by using the determined target object detection models, the detection module is specifically configured to:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image respectively, and detecting target object information included in the kth candidate image; or,
and splicing all the target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
In one possible design, when stitching the target area images captured from the kth candidate image, the detection module is specifically configured to:
calculating the area of each target region image intercepted from the kth candidate image;
arranging the areas of the images of the target areas in a descending order;
and splicing the target area images intercepted from the kth candidate image based on the obtained sequencing result to obtain a spliced target area image.
In one possible design, the target object information included in the kth candidate image includes at least one of the following information:
label information of each region image in the kth candidate image in which a target object exists;
coordinate position information obtained by mapping each region image in which a target object exists onto the corresponding image in the monitoring video.
In one possible design, the detection module detects the coordinate position information obtained by mapping each region image in which a target object exists onto the corresponding image in the surveillance video in the following manner:
taking the coordinate position of a first selected pixel point in the corresponding image of the monitoring video as a reference coordinate position, determining the relative coordinate distance between the coordinate position of a second selected pixel point in each region image in which a target object exists and the first selected pixel point;
adjusting, based on the relative coordinate distance, the coordinate position of each pixel point in each region image in which a target object exists;
and determining the adjusted coordinate position of each pixel point in each such region image as the coordinate position at which that region image is mapped onto the corresponding image in the monitoring video.
In one possible design, when determining the difference between pixel information of any two adjacent frames of images in the surveillance video, the determining module is specifically configured to:
executing a first processing process aiming at the ith frame image and the (i + 1) th frame image in the monitoring video, wherein i is a positive integer; wherein the first processing procedure comprises:
converting the ith frame image into a first gray level image, and converting the (i + 1) th frame image into a second gray level image;
subtracting the gray values of pixel points of the second gray image and the first gray image one by one to obtain a third gray image;
and determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image based on the third grayscale image.
In one possible design, when determining the difference degree of the pixel information between the i +1 th frame image and the i th frame image based on the third grayscale image, the determining module is specifically configured to:
determining first-class pixel points with the gray values larger than a first set threshold value and second-class pixel points with the gray values not larger than the first set threshold value in the third gray image;
adjusting the gray value of the first type pixel point to be a first numerical value, and adjusting the gray value of the second type pixel point to be a second numerical value to obtain a fourth gray image;
and determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image according to the number of the pixel points of which the gray values in the fourth gray image are the first values.
In one possible design, when determining the candidate images in the monitored video whose pixel information difference degree meets a preset condition, the screening module is specifically configured to:
and when the pixel information difference degree between the (i + 1) th frame image and the ith frame image is determined to be larger than a set difference degree threshold value, determining a candidate image corresponding to the (i + 1) th frame image according to the (i + 1) th frame image and the fourth gray-scale image.
In a possible design, when determining the candidate image corresponding to the i +1 th frame image according to the i +1 th frame image and the fourth grayscale image, the screening module is specifically configured to:
determining a gray area image formed by pixel points of which the gray values are the first numerical values in the fourth gray image;
determining a candidate area image matched with the gray area image in the (i + 1) th frame image;
adjusting the pixel values of the pixel points of the other region images except the candidate region image in the (i + 1) th frame image to the second numerical value;
and determining the adjusted (i + 1) th frame image as a candidate image corresponding to the (i + 1) th frame image.
In one possible design, when intercepting the target area image from the at least one candidate image, the screening module is specifically configured to:
executing a second processing procedure for a jth candidate image in the at least one candidate image, j being a positive integer; wherein the second processing procedure comprises:
determining pixel points of which the pixel values are not the second values in the jth candidate image;
and intercepting at least one target area image containing pixel points with pixel values not being second numerical values from the jth candidate image.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the image detection method of the first aspect described above, or any possible implementation of the first aspect.
In a fourth aspect, this application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the image detection method according to the first aspect, or any one of the possible implementation manners of the first aspect.
According to the image detection method and device provided by the embodiments of the present application, candidate images whose pixel information difference degree meets a preset condition are screened from the monitoring video based on the pixel information difference degree between any two adjacent frames of images; a target object detection model matched with each candidate image is then determined according to the feature information of the target area images intercepted from that candidate image; and the determined target object detection models are used to detect the target object information included in each candidate image. In this way, the target object information can be determined without detecting every frame of the monitoring video, which reduces the amount of computation in the target object detection process; and because a suitable target object detection model is selected in combination with the feature information of the target area images in which a target object may appear, detection efficiency and detection accuracy can both be improved.
In order to make the aforementioned objects, features and advantages of the embodiments of the present application more comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be regarded as limiting the scope; for those skilled in the art, other related drawings can be derived from these drawings without inventive effort.
Fig. 1 is a schematic flowchart illustrating an image detection method provided in an embodiment of the present application;
fig. 2 is a schematic flowchart illustrating a target object detection model determining method according to an embodiment of the present application;
fig. 3 is a schematic flowchart illustrating a target object detection model determining method according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a spliced target area image according to an embodiment of the present disclosure;
FIG. 5 shows a schematic flowchart of a target area image stitching method provided by an embodiment of the present application;
FIG. 6 is a flow chart illustrating a first process execution provided by an embodiment of the present application;
fig. 7 is a schematic diagram illustrating an example of a calculation process of a third grayscale image provided by an embodiment of the present application;
fig. 8 is a schematic diagram illustrating an example of a fourth grayscale image determination method according to an embodiment of the present application;
FIG. 9 is a flowchart illustrating a candidate image determination method according to an embodiment of the present application;
FIG. 10 is a diagram illustrating an example of candidate image determination provided by an embodiment of the application;
FIG. 11 is a diagram illustrating an example of candidate image determination provided by an embodiment of the application;
FIG. 12 is a flow chart illustrating a second process implementation provided by an embodiment of the present application;
FIG. 13 is a schematic diagram illustrating coordinate transformation provided by an embodiment of the present application;
FIG. 14 is a flow chart illustrating an image detection method provided by an embodiment of the present application;
FIG. 15 is a flowchart illustrating a method for training a target object detection model according to an embodiment of the present disclosure;
fig. 16 is a schematic diagram illustrating an architecture of an image detection apparatus 1600 provided in an embodiment of the present application;
fig. 17 shows a schematic structural diagram of an electronic device 170 according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The following detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
First, an application scenario to which the present application is applicable is described. The present application can be applied to scenarios such as monitoring suspects who may appear in a certain area, or counting the pedestrians or vehicles that appear in a certain area within a specified time period. In the prior art, each frame of image in a surveillance video is input into a preset target object detection model to detect whether that frame includes feature information of a target object.
In one example, when the selected target object detection model is a simple neural network model and the image features of the input image are complex, the simple model may fail to extract deep features accurately, so the accuracy of target object detection is low. For example, when an image in the surveillance video includes a large number of target objects, a simple network model cannot extract the deep features of all those objects, and the detection accuracy for such an image is therefore low.
In another example, when the selected target object detection model is a complex neural network model and the image features of the input image are relatively simple, the detection result can be determined accurately without extracting deep features; in this case, using the complex model makes target object detection inefficient. For example, when few target objects, or only one, appear in an image of the surveillance video, the detection result can be determined accurately without deep features, so detecting such images with a complex neural network model is inefficient.
It is worth noting that if some images in the monitoring video are complex, their deep features cannot be extracted by a simple neural network model, so the accuracy of target object detection is low; and if some images are simple, the detection result can be determined accurately without extracting deep features, so detection through a complex neural network model is inefficient. Therefore, the prior art cannot balance detection efficiency and detection accuracy when detecting target objects appearing in a surveillance video.
In view of the above problems, the embodiments of the present application provide an image detection method and apparatus: candidate images whose pixel information difference degree meets a preset condition are screened from the surveillance video based on the pixel information difference degree between any two adjacent frames of images; a target object detection model matched with each candidate image is then determined according to the feature information of the target area images intercepted from that candidate image; and the determined target object detection models are used to detect the target object information included in each candidate image. In this way, the target object information can be determined without detecting every frame of the monitoring video, which reduces the amount of computation in the target object detection process; and because a suitable target object detection model is selected in combination with the feature information of the target area images in which a target object may appear, detection efficiency and detection accuracy can both be improved.
The following describes the image detection method and apparatus provided in the present application in detail with reference to specific embodiments.
Example one
Referring to fig. 1, a schematic flow chart of an image detection method provided in an embodiment of the present application is shown, including the following steps:
step 101, acquiring a monitoring video, and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video.
The pixel information difference degree can be understood as the difference of pixel information between two adjacent frames of images, and when the pixel information in two adjacent frames of images in the monitored video changes, the pixel information difference degree between two adjacent frames of images is not zero.
For example, if no moving object appears in the area covered by the monitoring video, the monitored picture of every frame of image is the same, as with the surveillance video of a residential community entrance late at night; in this case the pixel information of two adjacent frames does not change, so the corresponding pixel information difference degree is zero. Conversely, if a moving object appears in the monitored area, frames with different monitored pictures exist in the video; the pixel information between two adjacent frames then changes, so the corresponding pixel information difference degree is not zero.
Step 102, screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video, and intercepting a target area image from the at least one candidate image.
In specific implementation, the preset condition may be that a difference degree of pixel information between a current frame image and a previous frame image is greater than a set difference degree threshold, and when the current frame image satisfies the condition, a candidate image of the current frame image may be determined based on the current frame image; and when the pixel information difference between the current frame image and the previous frame image is not larger than the set difference threshold, the current frame image does not contain the moving object, and whether the next frame image contains the moving object is further analyzed.
When the pixel information difference between the current frame image and the previous frame image meets the preset condition, the pixel information difference between the current frame image and the previous frame image is larger, so that the current frame image is an image with a possible target object.
However, some environmental factors can interfere. For example, when trees appear in the surveillance video and wind in the external environment blows their leaves, the pixel information of the leaf regions may change between adjacent frames even though no target object is present; such a frame may nevertheless be determined as a candidate image whose pixel information difference degree meets the preset condition. The candidate image is therefore processed further to accurately identify the images in the surveillance video in which a target object actually exists.
In the embodiment of the application, after the candidate image is determined according to the pixel information difference, in order to accurately detect the local image with changed pixel information, the target area image may be intercepted from the candidate image.
For example, suppose image A is the current frame and image B is the previous frame, and each consists of four region images numbered 1 to 4. If the pixel difference between image A and image B meets the preset condition, a candidate image corresponding to image A can be determined; if only region 1 of image A differs significantly from region 1 of image B in pixel information, a target object may exist in region 1 of image A, so region 1 of image A can be intercepted as the target area image when detecting the candidate image corresponding to image A.
Specifically, the method of intercepting the target area image from the candidate image will be described in Example two and is not repeated here.
And 103, determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting target object information included in each candidate image by using each determined target object detection model.
Wherein the feature information of the target area image comprises at least one of the following information:
(1) the number of target area images: since at least one target area image is intercepted from any candidate image, this number can be used as feature information of the target area images;
(2) the area ratio between the total area of the target area images and the total area of the corresponding candidate image.
In a possible implementation, when the feature information of the target area images includes their number, the target object detection model matched with each candidate image can be determined by the target object detection model determination method shown in fig. 2. Taking the kth candidate image in the at least one candidate image as an example, where k is a positive integer, the method includes the following steps:
step 201, acquiring the number of target area images corresponding to the kth candidate image.
Step 202, judging whether the number of target area images corresponding to the kth candidate image is larger than a preset number.
If yes, go to step 203;
if the determination result is negative, step 204 is executed.
Step 203, determining the target object detection model matched with the kth candidate image as a second target object detection model.
And 204, determining a target object detection model matched with the kth candidate image as a first target object detection model.
Wherein the complexity of the second target object detection model is higher than that of the first target object detection model. In specific implementation, the more target area images there are, the more complicated the features included in the candidate image, and the deeper the features that need to be extracted. Based on this, when the number of target area images corresponding to the kth candidate image is greater than the preset number, the target object detection model matched with the kth candidate image is determined as the second target object detection model; when the number is less than or equal to the preset number, it is determined as the first target object detection model.
In another possible embodiment, if the feature information of the target region images includes both the number of target region images and the area ratio between their total area and the total area of the corresponding candidate image, the target object detection model matched with each candidate image can be determined by the target object detection model determination method shown in fig. 3. Taking the kth candidate image in the at least one candidate image as an example, where k is a positive integer, the method includes the following steps:
step 301, acquiring the number of target region images corresponding to the kth candidate image and the area ratio between the total area of the target region images corresponding to the kth candidate image and the total area of the kth candidate image.
Step 302, determining whether the number of target area images corresponding to the kth candidate image is greater than a preset number.
If the determination result is negative, go to step 303;
if yes, go to step 304.
Step 303, determining the target object detection model matched with the kth candidate image as the first target object detection model.
And step 304, judging whether the area ratio between the total area of the target area image corresponding to the kth candidate image and the total area of the kth candidate image is larger than a preset area threshold value.
If yes, go to step 305;
if the determination result is negative, go to step 306.
And 305, determining a target object detection model matched with the kth candidate image as a second target object detection model.
And step 306, determining the target object detection model matched with the kth candidate image as a third target object detection model.
Wherein the complexity of the first target object detection model is lower than that of the second and the third target object detection models. In a specific implementation, the more target area images there are, the more complicated the features included in the candidate image, and the deeper the features that need to be extracted. Based on this, when the number of target area images is less than or equal to the preset number, the target object detection model matched with the kth candidate image is determined as the first target object detection model; when the number is greater than the preset number, it is determined as the second or the third target object detection model.
When the number of target area images is greater than the preset number, a better-matched target object detection model can be selected in the embodiment of the present application according to the area ratio between the total area of the target area images corresponding to the candidate image and the total area of the candidate image. Specifically, the smaller this area ratio, the smaller the total area of the target area images, and the more complex the target object detection model required to extract deep features and thereby detect the target object.
In one possible implementation, the complexity of the third target object detection model is higher than the complexity of the second target object detection model. When the number of target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than the preset area threshold value, determining a target object detection model matched with the kth candidate image as a second target object detection model; and when the number of the target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is smaller than or equal to the preset area threshold, determining the target object detection model matched with the kth candidate image as a third target object detection model.
Therefore, based on the above embodiment, the complexities of the target object detection models in the embodiment of the present application are, in order from high to low: the third target object detection model, the second target object detection model, and the first target object detection model.
In an example of the present application, the first target object detection model may be, for example, a MobileNet model, a ShuffleNet model, or the like; the second target object detection model may be, for example, a ResNet18 model, a ResNet34 model, or the like; the third target object detection model may be, for example, a ResNet50 model, a ResNet101 model, or the like, and the network model may be another network model in practical applications, which is not limited in the present application.
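For illustration only, the three-tier selection can be sketched in Python with torchvision classification backbones standing in for the detection models; the preset number and area threshold below are hypothetical values, not values fixed by the application:

    import torchvision.models as models

    PRESET_NUMBER = 3      # hypothetical threshold on the number of target area images
    AREA_THRESHOLD = 0.3   # hypothetical preset area threshold (ratio)

    def select_detection_model(num_regions, area_ratio):
        # Match model complexity to the candidate image, as in fig. 3.
        if num_regions <= PRESET_NUMBER:
            return models.mobilenet_v2()   # first (lowest-complexity) model
        if area_ratio > AREA_THRESHOLD:
            return models.resnet18()       # second model
        return models.resnet50()           # third (highest-complexity) model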
Specifically, the training process of the target object detection model will be described in detail in Example four and is not repeated here.
After the target object detection model matched with each candidate image is determined based on the feature information of the corresponding target area images, the models may be used to detect the target object information included in each candidate image. Taking the kth candidate image in the at least one candidate image as an example, where k is a positive integer, either of the following two ways may be used:
the first method is as follows: respectively inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image;
before each target area image is respectively input into the target object detection model matched with the kth candidate image, the images of the target area images can be adjusted to be images with the same size, then the target area images after size adjustment are sequentially input into the target object identification model, and after the target area images input each time are identified by the target object identification model, if target object information is detected, the input target area images can be labeled.
The second method comprises the following steps: and splicing all the target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
The target area images are spliced and then input into the target object detection model, so that the detection of a plurality of target area images can be realized simultaneously, and compared with the method provided by the first mode, the method improves the detection efficiency.
For example, when stitching target area images cut from the kth candidate image, reference may be made to the target area image stitching method shown in fig. 5, which includes the following steps:
step 501, calculating the area of each target region image cut out from the k-th candidate image.
Step 502, arranging the areas of the images of the target areas in the descending order.
And 503, splicing the target area images intercepted from the kth candidate image based on the obtained sequencing result to obtain a spliced target area image.
In specific splicing, the target area images can be packed according to a minimum-area principle to obtain the spliced target area image. For example, after sorting the target area images from largest to smallest, they can be spliced using a binary-tree data structure, yielding a spliced target area image as shown in fig. 4.
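For illustration only, a simplified Python sketch that sorts the target area images by area in descending order and stacks them onto one canvas; it does not reproduce the binary-tree minimum-area packing mentioned above:

    import numpy as np

    def stitch_regions(region_images, fill_value=255):
        # Sort by area, largest first, as in steps 501-503.
        ordered = sorted(region_images,
                         key=lambda r: r.shape[0] * r.shape[1], reverse=True)
        width = max(r.shape[1] for r in ordered)
        height = sum(r.shape[0] for r in ordered)
        canvas = np.full((height, width, 3), fill_value, dtype=np.uint8)
        y = 0
        for r in ordered:
            h, w = r.shape[:2]
            canvas[y:y + h, :w] = r   # place each region on its own row
            y += h
        return canvas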
According to the image detection method provided by the embodiments of the present application, candidate images whose pixel information difference degree meets a preset condition are screened from the monitoring video based on the pixel information difference degree between any two adjacent frames of images; a target object detection model matched with each candidate image is then determined according to the feature information of the target area images intercepted from that candidate image; and the determined target object detection models are used to detect the target object information included in each candidate image. In this way, the target object information can be determined without detecting every frame of the monitoring video, which reduces the amount of computation in the target object detection process; and because a suitable target object detection model is selected in combination with the feature information of the target area images in which a target object may appear, the detection efficiency and detection accuracy of the target object can both be improved.
Example two
With reference to the image detection process described in the first embodiment, the image detection method provided in the first embodiment is specifically described below.
In a specific implementation, when determining the pixel information difference degree between any two adjacent frames of images in the monitored video, a first processing procedure is executed. Taking the ith frame image and the (i + 1) th frame image in the monitored video as an example, where i is a positive integer, the execution of the first processing procedure may refer to the flowchart shown in fig. 6 and includes the following steps:
step 601, converting the ith frame image into a first gray scale image, and converting the (i + 1) th frame image into a second gray scale image.
In one possible implementation, in the case that each frame of image in the surveillance video is a color image, the ith frame of image may be converted into a first gray scale image, and the (i + 1) th frame of image may be converted into a second gray scale image, so as to determine the pixel information difference degree by comparing the difference of gray scale values between the ith frame of image and the (i + 1) th frame of image.
Step 602, subtracting the gray values of the pixel points of the second gray image and the first gray image one by one, and then obtaining a third gray image.
In specific implementation, the first grayscale image and the second grayscale image are derived from the same monitored video, so their image sizes and resolutions are the same. If both include M × N pixel points, the gray values of the two pixel points at the same position in the two images can be subtracted one by one, and the third grayscale image is obtained after the subtraction.
For example, fig. 7 illustrates the calculation of a third grayscale image. In fig. 7, image 1 is the first grayscale image and image 2 is the second grayscale image; each includes 9 pixel points, and the numbers in each image represent the gray values of its pixel points. Image 3 is the third grayscale image obtained by subtracting image 1 from image 2 pixel by pixel; it also includes 9 pixel points, whose numbers represent its gray values.
Step 603, determining the pixel information difference degree between the i +1 th frame image and the i-th frame image based on the third gray scale image.
In one example, the third grayscale image may be first subjected to binarization processing to obtain a binarized fourth grayscale image.
Specifically, the pixel points in the third gray image with the gray value larger than the first set threshold value may be determined as the first-class pixel points, and the pixel points in the third gray image with the gray value not larger than the first set threshold value may be determined as the second-class pixel points; then, adjusting the gray value of the first type of pixel points to a first numerical value, and adjusting the gray value of the second type of pixel points to a second numerical value to obtain a fourth gray image; and finally, determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image according to the number of pixel points of which the gray values in the fourth gray image are the first values. Wherein the first value is not equal to the second value.
For example, fig. 8 illustrates a method of determining the fourth grayscale image. Image 4 is a third grayscale image including 9 pixel points, where each number represents the gray value of the corresponding pixel point. With the first set threshold set to 5, image 5 shows the pixel class determined for each pixel point of image 4: a "1" indicates a first-class pixel point, and a "2" indicates a second-class pixel point.
For example, if the first value is 0 and the second value is 255, image 4 in fig. 8 can be converted into image 6. The converted image 6 is a binarized image; in practical applications, the image may also be adjusted to a non-binarized image, which is not limited in this application.
In a specific implementation, after the fourth grayscale image is obtained by adjusting the gray values, the pixel information of some pixel points may have changed without being reflected with the correct pixel values in the fourth grayscale image after calculation and conversion. The fourth grayscale image therefore needs to be refined; for example, dilation, erosion, opening and closing operations may be applied so that the fourth grayscale image becomes a grayscale image with clear edges and no holes in the middle. The processing details of the dilation, erosion, opening and closing operations are not described here.
In a possible implementation manner, when determining the pixel information difference degree between the (i+1)-th frame image and the i-th frame image according to the number of pixel points whose grayscale value is the first value in the fourth grayscale image, the ratio between that number and the total number of pixel points in the whole image may be used as the difference degree, or the number of first-value pixel points may be used directly as the difference degree.
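A minimal sketch of the ratio variant, assuming NumPy; the function name is illustrative:

```python
import numpy as np

def pixel_difference_degree(fourth, first_value=0):
    """Ratio of changed (first-value) pixels to all pixels in the fourth grayscale image."""
    changed = np.count_nonzero(fourth == first_value)
    return changed / fourth.size
```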
After the pixel information difference degree between every two adjacent frames of images in the monitored video has been calculated, the candidate images whose pixel information difference degree meets the preset condition can be determined from those difference degrees. Specifically, when the pixel information difference degree between the (i+1)-th frame image and the i-th frame image is determined to be greater than a set difference degree threshold, the candidate image corresponding to the (i+1)-th frame image can be determined from the (i+1)-th frame image and the fourth grayscale image.
In an example, the method for determining a candidate image corresponding to the i +1 th frame image may refer to a flowchart of a candidate image determination method shown in fig. 9, and includes the following steps:
Step 901, determining the grayscale region image formed by the pixel points whose grayscale values are the first value in the fourth grayscale image.
Because the grayscale value of a pixel point in the fourth grayscale image takes only two values, the first value and the second value, the pixel points can be interpreted as follows: a pixel point with the first value is one whose grayscale difference from the previous frame is larger than the first set threshold, and a pixel point with the second value is one whose grayscale difference from the previous frame is not larger than the first set threshold. The grayscale region image formed by the first-value pixel points therefore marks the image region in which the current frame image changed significantly relative to the previous frame.
Step 902, determining the candidate region image matching the grayscale region image in the (i+1)-th frame image.
In one possible implementation, the coordinate position of the grayscale region image in the fourth grayscale image may be determined first; the region image at the same coordinate position in the (i+1)-th frame image is then taken as the candidate region image.
Step 903, adjusting the pixel values of the pixel points of the images in the other areas except the candidate area image in the (i + 1) th frame image to be a second numerical value.
For example, the second value may be 255. Through the above processing, the image areas other than the candidate region image in the (i+1)-th frame image are adjusted to white, and only the pixel information of the candidate region image in the (i+1)-th frame image is retained.
Step 904, determining the adjusted (i+1)-th frame image as the candidate image corresponding to the (i+1)-th frame image.
In an example, as shown in fig. 10, image A represents the fourth grayscale image, where the first value is 0 and the second value is 255; image B represents the (i+1)-th frame image; and image C represents the candidate image. The black region in image A is the grayscale region image with grayscale value 0, and the part framed by the white line in image B is the region of the (i+1)-th frame image matching the grayscale region image. After the pixel values of the region images other than the candidate region image in the (i+1)-th frame image are adjusted to the second value (i.e., 255), the candidate image shown as image C is obtained.
Considering that the grayscale region image in the fourth grayscale image may consist of more than one part, as shown in fig. 11, the two black regions in image A of fig. 11 represent the grayscale region image of the fourth grayscale image. Two regions of the (i+1)-th frame image represented by image B then match the grayscale region image in image A, so that a candidate image comprising the two target region images shown in image C is obtained.
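A minimal sketch of steps 901 to 904, assuming NumPy; first_value = 0 and second_value = 255 follow the fig. 10 example:

```python
import numpy as np

def candidate_image(frame_i1, fourth, second_value=255):
    """Keep the frame pixels where the fourth grayscale image marks a change; blank the rest."""
    candidate = frame_i1.copy()
    unchanged = fourth == second_value   # pixel positions with no significant change
    candidate[unchanged] = second_value  # adjust them to the second value (white)
    return candidate
```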
After the target area image in the candidate image is determined in the above manner, it can be intercepted from the candidate image and identified on its own. The process of identifying the whole candidate image is thereby omitted, and the amount of calculation is reduced.
In a specific implementation, for the j-th candidate image among the determined candidate images, where j is a positive integer, the second processing procedure is executed to intercept the target area image from that candidate image; a sketch of this procedure is given after the steps below.
The second processing procedure may be as shown in fig. 12, and includes the following steps:
step 1201, determining pixel points of which the pixel values are not the second numerical values in the jth candidate image;
step 1202, at least one target area image containing pixel points with pixel values not being the second numerical value is intercepted from the jth candidate image.
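A minimal sketch of this second processing procedure follows, assuming OpenCV; using connected components to crop one bounding box per changed region is an illustrative choice, not mandated by the patent:

```python
import cv2
import numpy as np

def crop_target_regions(candidate, second_value=255):
    """Cut out a bounding-box image around each group of pixels whose value is not the second value."""
    gray = cv2.cvtColor(candidate, cv2.COLOR_BGR2GRAY)
    mask = (gray != second_value).astype(np.uint8)
    count, _, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    regions = []
    for k in range(1, count):  # label 0 is the background
        x, y, w, h = stats[k, :4]
        regions.append(candidate[y:y + h, x:x + w])
    return regions
```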
Further, when detecting whether a determined candidate image includes target object information based on the intercepted target area images, the target object information included in the j-th candidate image may be determined based on at least one target area image intercepted from the j-th candidate image and a pre-trained target object detection model. The process of selecting the target object detection model and of identifying the target area image with that model follows the method described in the first embodiment and is not repeated here.
The target object information may include at least one of the following information:
(1) the marking information of the region image in which the target object appears in the kth candidate image may be, for example, a rectangular frame for marking the region image in which the target object appears in the target region image;
(2) the coordinate position information obtained by mapping each region image in which a target object appears onto the corresponding image in the monitoring video.
In the embodiment of the application, after the labeled target object area image is obtained, the position coordinates of the target object on the (i+1)-th frame image of the monitored video can be determined based on the labeled target object area image.
In a possible implementation manner, the coordinate position of a first selected pixel point in a corresponding image in a monitoring video can be taken as a reference coordinate position, and the relative coordinate distance between the coordinate position of a second selected pixel point in each regional image of the existing target object and the first selected pixel point is determined; then, based on the relative coordinate distance, the coordinate position of each pixel point in the regional image of each existing target object is adjusted; and determining the coordinate position of each pixel point after adjustment in the regional image of each existing target object as the coordinate position of the regional image of each existing target object mapped to the corresponding image in the monitoring video.
As shown in the coordinate transformation diagram of fig. 13, the gray area on the right represents the target area image, and the (i+1)-th frame image on the left is the corresponding image in the monitored video onto which the target area image is mapped.
Taking the first selected pixel point as the point O at the upper left corner of the (i+1)-th frame image as an example, and establishing a first coordinate system on the (i+1)-th frame image with O as the origin, the position of O in the first coordinate system is (0, 0), and the coordinate position of the point A at the upper left corner of the target area image in the first coordinate system is (x0, y0).
Taking the second selected pixel point as the point A' at the upper left corner of the target area image as an example, and establishing a second coordinate system on the target area image with A' as the origin, the position of A' in the second coordinate system is (0, 0).
Since the point A' and the point A correspond to the same pixel point, the relative coordinate distance between O and A' equals the relative coordinate distance between O and A, and can therefore be determined to be (x0, y0).
Further, assuming that after the candidate image is detected, the point B'(x, y) in the target area image shown in fig. 13 is determined to be a pixel point in a region image in which the target object appears, then after B' is mapped onto the (i+1)-th frame image, the coordinate position of the resulting point B on the (i+1)-th frame image is (x + x0, y + y0).
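A minimal sketch of this coordinate mapping, assuming the target area image was cropped at offset (x0, y0) in the (i+1)-th frame image; the names are illustrative:

```python
def map_to_frame(point_in_region, region_offset):
    """Map B'(x, y) in the target area image to B(x + x0, y + y0) in the video frame."""
    x, y = point_in_region
    x0, y0 = region_offset
    return (x + x0, y + y0)
```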
In a possible embodiment, after target object information is detected in the target area image, the target object included in the target area image may be labeled in the form of a box. The coordinates of the four vertices of the box in the target area image are determined first; the coordinates of the same four vertices in the video frame image are then derived from the correspondence between the coordinate positions of the target area image and of the video frame image, so that the target object can be labeled and its position determined in the video frame image. Here, the video frame image is the corresponding image in the monitoring video onto which the target area image is mapped.
According to the above method, the current video frame image and the previous video frame image are converted into grayscale images, and the third grayscale image is determined from the pixel-by-pixel differences of the grayscale values. The grayscale value of every pixel point in the third grayscale image larger than the first set threshold is adjusted to the first value, and the grayscale value of every pixel point not larger than the threshold to the second value, which yields the fourth grayscale image. A candidate image is then determined based on the fourth grayscale image and the current video frame image; at least one target area image is intercepted from the candidate image, and the target area images are spliced into a stitched target area image. Finally, a matched target object detection model is determined based on the stitched target area image, and the target object information contained in the candidate image is determined based on the stitched target area image and the matched model. In this way, not all images in the monitoring video are detected; only partial area images of frames whose pixel information changed greatly are detected, which reduces the amount of calculation in the target object detection process, and selecting a suitable target object detection model according to the stitched target area image improves both the accuracy and the efficiency of target object detection.
EXAMPLE III
In the third embodiment of the present application, the image detection method provided in the first embodiment is described with reference to the second embodiment, taking the i-th frame and the (i+1)-th frame of a monitoring video as an example. As shown in fig. 14, the method includes the following steps:
Step 1401, determining a third grayscale image according to the i-th frame image and the (i+1)-th frame image.
Specifically, the ith frame image may be converted into a first gray scale image, the (i + 1) th frame image may be converted into a second gray scale image, and then the gray scale values of the pixel points of the first gray scale image and the second gray scale image are subtracted one by one to obtain a third gray scale image.
Step 1402, adjusting the gray value of each pixel point in the third gray image, and determining the adjusted image as a fourth gray image.
In one possible embodiment, the gray scale value of the point in the third gray scale image with the gray scale value larger than the first set threshold may be adjusted to a first value, and the gray scale value of the point in the third gray scale image with the gray scale value not larger than the first set threshold may be adjusted to a second value. Therefore, the gray value of the pixel point included in the fourth gray image has two possible values: a first value and a second value.
Step 1403, determining the pixel information difference degree between the (i+1)-th frame image and the i-th frame image according to the fourth grayscale image.
In an example of the present application, the ratio of the number of first-value pixel points in the fourth grayscale image to the total number of pixel points in the (i+1)-th frame image may be used as the pixel information difference degree.
Step 1404, determining candidate images of the i +1 th frame image according to the pixel information difference degree.
In a specific implementation, when the pixel information difference degree is greater than the set difference degree threshold, the grayscale region image formed by the pixel points whose grayscale value is the first value in the fourth grayscale image is determined; the candidate region image matching the grayscale region image in the (i+1)-th frame image is then determined; the pixel values of the pixel points outside the candidate region image in the (i+1)-th frame image are adjusted to the second value, and the adjusted (i+1)-th frame image is determined as the candidate image corresponding to the (i+1)-th frame image.
Step 1405, intercepting the target area image from the candidate image.
In one possible embodiment, pixel points whose pixel values are not the second value are extracted from the candidate image of the (i+1)-th frame image, and the image formed by those pixel points is determined as the target area image; at least one target area image may be determined in this way.
Step 1406, based on the feature information of the target area image intercepted from the candidate image, determines the target object detection model matched with the candidate image.
The method for determining the target object detection model matched with the candidate image refers to the method described in the first embodiment, and will not be described herein again.
Step 1407 detects target object information included in the candidate image using the determined target object detection model.
In an example, the at least one intercepted target area image may first be stitched, and the stitched image is then input into the pre-trained target object detection model so as to label the target area image containing the target object.
After the labeled target area image is obtained, the coordinate position of a first selected pixel point in the candidate image is determined as the reference coordinate position, and the relative coordinate distance between the coordinate position of a second selected pixel point of the target area image and the first selected pixel point is determined. The position coordinates of the pixel points in the region image in which the target object appears are then adjusted based on that relative coordinate distance, and the adjusted coordinate position of each pixel point is determined as the coordinate position at which each region image with a target object is mapped onto the (i+1)-th frame image of the monitoring video.
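The following end-to-end sketch composes the helper functions sketched earlier in this document (third_grayscale_image, fourth_grayscale_image, refine, pixel_difference_degree, candidate_image, crop_target_regions), plus the select_model and stitch_regions helpers sketched in the fifth embodiment below; detect() is an assumed model API, and the thresholds are illustrative:

```python
def detect_frame_pair(frame_i, frame_i1, models, diff_threshold=0.01):
    diff = third_grayscale_image(frame_i, frame_i1)         # step 1401
    fourth = refine(fourth_grayscale_image(diff))           # step 1402
    if pixel_difference_degree(fourth) <= diff_threshold:   # step 1403
        return None                                         # no candidate image for this pair
    cand = candidate_image(frame_i1, fourth)                # step 1404
    regions = crop_target_regions(cand)                     # step 1405
    if not regions:
        return None
    model = select_model(regions, cand.shape[0] * cand.shape[1], models)  # step 1406
    stitched = stitch_regions(regions)                      # stitching variant of step 1407
    return model.detect(stitched)                           # step 1407, assumed detect() API
```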
With the above method, not all images in the monitoring video are detected; only partial area images of frames whose pixel information changed greatly are detected, which reduces the amount of calculation in the target object detection process. Selecting a suitable target object detection model according to the stitched target area image improves both the accuracy and the efficiency of target object detection.
Example four
In the fourth embodiment of the present application, the training process of the target object detection model is described in detail. As shown in the flowchart of the training method in fig. 15, the method includes the following steps:
step 1501, a training sample image set and a verification sample image set of the target object detection model are obtained.
Specifically, the training sample image set may be a set of images each including a target object, where there may be at least one kind of target object. For example, the training sample image set may be a set of images containing target object A, images containing target object B, images containing target object C and images containing target object D. The verification sample image set is the set of the sample images in the training sample image set labeled with target object information.
Step 1502, sequentially inputting each sample image in the training sample image set into the target object detection model to obtain a training result for the training sample image set.
Step 1503, determining the accuracy of the target object detection model based on the training result of the training sample image set and the verification sample image set.
Step 1504, judging whether the accuracy is greater than a preset accuracy.
If yes, go to step 1505;
if the determination result is negative, step 1506 is executed.
Step 1505, determining that the training of the target object detection model is completed.
Step 1506, adjusting the model parameters of the target object detection model, returning to step 1501, and continuing to train the target object detection model until the accuracy of its training result is determined to be greater than the preset accuracy (a sketch of this loop is given below).
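A minimal sketch of this training loop follows; the predict(), compute_accuracy() and adjust_parameters() calls are hypothetical placeholders, since the patent does not fix a concrete model API, and the target accuracy of 0.95 is an illustrative value:

```python
def train_detector(model, training_images, verification_labels, target_accuracy=0.95):
    """Train until the accuracy against the verification sample set exceeds the preset accuracy."""
    while True:
        results = [model.predict(image) for image in training_images]  # step 1502
        accuracy = compute_accuracy(results, verification_labels)      # step 1503, hypothetical helper
        if accuracy > target_accuracy:                                 # step 1504
            return model                                               # step 1505: training completed
        model.adjust_parameters()                                      # step 1506, assumed API
```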
With this embodiment, the target object included in the target area image can be identified by means of the target area image and the pre-trained target object detection model, which avoids identifying every pixel point of the whole image and improves the efficiency of target object identification.
EXAMPLE five
Referring to fig. 16, which is a schematic architecture diagram of an image detection apparatus 1600 provided in the embodiment of the present application, the apparatus includes a determining module 1601, a screening module 1602 and a detecting module 1603. Specifically:
the determining module 1601 is configured to obtain a monitoring video, and determine a difference degree of pixel information between any two adjacent frames of images in the monitoring video;
a screening module 1602, configured to screen at least one candidate image in which the difference of the pixel information meets a preset condition from the surveillance video, and intercept a target area image from the at least one candidate image;
a detecting module 1603, configured to determine a target object detection model matching each candidate image based on the feature information of the target area image corresponding to each candidate image, and detect target object information included in each candidate image respectively by using the determined target object detection models.
In one possible design, the feature information of the target area image includes at least one of the following information:
the number of target region images, the area ratio between the total area of the target region images and the total area of the corresponding candidate images.
In one possible design, the detecting module 1603, when determining the target object detection model matching each candidate image based on the feature information of the target area image corresponding to each candidate image, is specifically configured to:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
when the number of target area images corresponding to the kth candidate image is less than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number, determining that a target object detection model matched with the kth candidate image is a second target object detection model;
wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model.
In one possible design, the detecting module 1603, when determining the target object detection model matching each candidate image based on the feature information of the target area image corresponding to each candidate image, is specifically configured to:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
when the number of target area images corresponding to the kth candidate image is less than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than a preset area threshold value, determining that a target object detection model matched with the kth candidate image is a second target object detection model;
when the number of target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is less than or equal to the preset area threshold, determining that a target object detection model matched with the kth candidate image is a third target object detection model;
wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model, and the complexity of the second target object detection model is lower than the complexity of the third target object detection model.
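A minimal sketch of this second model-selection design follows, assuming three detection models of increasing complexity are supplied by the caller; the preset number and preset area threshold are illustrative parameters:

```python
def select_model(regions, candidate_area, models, preset_number=3, preset_area_ratio=0.5):
    """Pick a detector: models = (first, second, third) in increasing order of complexity."""
    total_area = sum(r.shape[0] * r.shape[1] for r in regions)
    if len(regions) <= preset_number:
        return models[0]                                 # first target object detection model
    if total_area / candidate_area > preset_area_ratio:
        return models[1]                                 # second target object detection model
    return models[2]                                     # third target object detection model
```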
In one possible design, the detecting module 1603, when detecting the target object information included in each candidate image by using the determined target object detection models, is specifically configured to:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image respectively, and detecting target object information included in the kth candidate image; or,
and splicing all the target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
In one possible design, the detecting module 1603, when stitching the target area images cut from the kth candidate image, is specifically configured to:
calculating the area of each target region image intercepted from the kth candidate image;
arranging the areas of the images of the target areas in a descending order;
and taking the target area image ranked first as the reference area image, and sequentially splicing the remaining target area images alternately on the left and right sides of the reference area image, so as to obtain the stitched target area image for the k-th candidate image, as illustrated in the sketch below.
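A minimal sketch of this stitching rule, assuming OpenCV and NumPy; resizing every region to a common height before horizontal concatenation is an illustrative simplification:

```python
import cv2
import numpy as np

def stitch_regions(regions, height=128):
    """Sort regions by area descending; place the largest first, then alternate left and right."""
    ordered = sorted(regions, key=lambda r: r.shape[0] * r.shape[1], reverse=True)
    resized = [cv2.resize(r, (max(1, round(r.shape[1] * height / r.shape[0])), height))
               for r in ordered]
    row = [resized[0]]                   # the reference area image
    for idx, region in enumerate(resized[1:]):
        if idx % 2 == 0:
            row.insert(0, region)        # splice on the left side
        else:
            row.append(region)           # splice on the right side
    return np.hstack(row)
```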
In one possible design, the target object information included in the kth candidate image includes at least one of the following information:
the mark information of the region image of the target object exists in the kth candidate image;
and mapping the area image of each existing target object to coordinate position information on the corresponding image in the monitoring video.
In one possible design, the detecting module 1603 detects the coordinate position information of the area image of each existing target object mapped to the corresponding image in the surveillance video according to the following method:
determining the relative coordinate distance between the coordinate position of a second selected pixel point in each regional image of the existing target object and the first selected pixel point by taking the coordinate position of the first selected pixel point in the corresponding image in the monitoring video as a reference coordinate position;
based on the relative coordinate distance, adjusting the coordinate position of each pixel point in the regional image of each existing target object;
and determining the coordinate position of each pixel point after adjustment in the regional image of each existing target object as the coordinate position of the regional image of each existing target object mapped to the corresponding image in the monitoring video.
In one possible design, the determining module 1601 is specifically configured to, when determining a difference degree of pixel information between any two adjacent frames of images in the surveillance video:
executing a first processing process aiming at the ith frame image and the (i + 1) th frame image in the monitoring video, wherein i is a positive integer; wherein the first processing procedure comprises:
converting the ith frame image into a first gray level image, and converting the (i + 1) th frame image into a second gray level image;
subtracting the gray values of pixel points of the second gray image and the first gray image one by one to obtain a third gray image;
and determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image based on the third grayscale image.
In one possible design, the determining module 1601, when determining the difference degree of the pixel information between the i +1 th frame image and the i th frame image based on the third gray scale image, is specifically configured to:
determining first-class pixel points with the gray values larger than a first set threshold value and second-class pixel points with the gray values not larger than the first set threshold value in the third gray image;
adjusting the gray value of the first type pixel point to be a first numerical value, and adjusting the gray value of the second type pixel point to be a second numerical value to obtain a fourth gray image;
and determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image according to the number of the pixel points of which the gray values in the fourth gray image are the first values.
In a possible design, when determining the candidate images in the monitored video whose pixel information difference degree meets the preset condition, the screening module 1602 is specifically configured to:
and when the pixel information difference degree between the (i + 1) th frame image and the ith frame image is determined to be larger than a set difference degree threshold value, determining a candidate image corresponding to the (i + 1) th frame image according to the (i + 1) th frame image and the fourth gray-scale image.
In a possible design, the screening module 1602, when determining the candidate image corresponding to the i +1 th frame image according to the i +1 th frame image and the fourth grayscale image, is specifically configured to:
determining a gray area image formed by pixel points of which the gray values are the first numerical values in the fourth gray image;
determining a candidate area image matched with the gray area image in the (i + 1) th frame image;
adjusting the pixel values of the pixel points of the other region images except the candidate region image in the (i + 1) th frame image to the second numerical value;
and determining the adjusted (i + 1) th frame image as a candidate image corresponding to the (i + 1) th frame image.
In one possible design, the filtering module 1602, when intercepting the target area image from the at least one candidate image, is specifically configured to:
executing a second processing procedure for a jth candidate image in the at least one candidate image, j being a positive integer; wherein the second processing procedure comprises:
determining pixel points of which the pixel values are not the second values in the jth candidate image;
and intercepting at least one target area image containing pixel points with pixel values not being second numerical values from the jth candidate image.
The image detection device provided by the embodiment of the application screens out, based on the pixel information difference degree between any two adjacent frames of images in a monitoring video, the candidate images whose pixel information difference degree meets the preset condition. It then determines, based on the target area images intercepted from the candidate images and the feature information of the target area image corresponding to each candidate image, the target object detection model matched with each candidate image, and uses each determined target object detection model to detect the target object information included in the corresponding candidate image. In this way, the target object information can be determined without detecting every frame image of the monitoring video, which reduces the amount of calculation in the target object detection process; and selecting a suitable target object detection model according to the feature information of the target area images in which a target object may appear improves both the detection efficiency and the detection accuracy of the target object.
EXAMPLE six
Based on the same technical concept, an embodiment of the present application further provides an electronic device. Referring to fig. 17, which is a schematic structural diagram of an electronic device 170 provided in the embodiment of the present application, the device includes a processor 171, a memory 172 and a bus 173. The memory 172 is used for storing execution instructions and includes an internal memory 1721 and an external memory 1722; the internal memory 1721 temporarily stores operation data in the processor 171 and data exchanged with the external memory 1722, such as a hard disk, and the processor 171 exchanges data with the external memory 1722 through the internal memory 1721. When the electronic device 170 is operated, the processor 171 communicates with the memory 172 through the bus 173, so that the processor 171 executes the following instructions:
acquiring a monitoring video, and determining the pixel information difference between any two adjacent frames of images in the monitoring video;
screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video, and intercepting a target area image from the at least one candidate image;
and determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting the target object information included in each candidate image by using the determined target object detection models.
The specific processing flow of the processor 171 may refer to the description of the above method embodiment, and is not described herein again.
Based on the same technical concept, embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to perform the steps of the image detection method.
Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk, a hard disk, or the like, and when the computer program on the storage medium is executed, the image detection method can be executed to reduce the amount of calculation in the target object detection process and improve the detection efficiency and accuracy of the target object.
Based on the same technical concept, embodiments of the present application further provide a computer program product, which includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the steps of the image detection method, and specific implementation may refer to the above method embodiments, and will not be described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (28)

1. An image detection method, comprising:
acquiring a monitoring video, and determining the pixel information difference between any two adjacent frames of images in the monitoring video;
screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video, and intercepting a target area image from the at least one candidate image;
and determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting the target object information included in each candidate image by using the determined target object detection models.
2. The method of claim 1, wherein the feature information of the target area image comprises at least one of:
the number of target region images, the area ratio between the total area of the target region images and the total area of the corresponding candidate images.
3. The method of claim 2, wherein determining a target object detection model matching each candidate image based on feature information of a target area image corresponding to each candidate image comprises:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
when the number of target area images corresponding to the kth candidate image is less than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number, determining that a target object detection model matched with the kth candidate image is a second target object detection model;
wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model.
4. The method of claim 2, wherein determining a target object detection model matching each candidate image based on feature information of a target area image corresponding to each candidate image comprises:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
when the number of target area images corresponding to the kth candidate image is less than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than a preset area threshold value, determining that a target object detection model matched with the kth candidate image is a second target object detection model;
when the number of target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is less than or equal to the preset area threshold, determining that a target object detection model matched with the kth candidate image is a third target object detection model;
wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model, and the complexity of the second target object detection model is lower than the complexity of the third target object detection model.
5. The method according to any one of claims 1 to 4, wherein the detecting target object information included in each candidate image separately by using the determined target object detection models comprises:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image respectively, and detecting target object information included in the kth candidate image; or,
and splicing all the target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
6. The method of claim 5, wherein said stitching the target region images cut from the kth candidate image comprises:
calculating the area of each target region image intercepted from the kth candidate image;
arranging the areas of the images of the target areas in a descending order;
and splicing the target area images intercepted from the kth candidate image based on the obtained sequencing result to obtain a spliced target area image.
7. The method of claim 5, wherein the target object information included in the kth candidate image comprises at least one of:
the mark information of the region image of the target object exists in the kth candidate image;
and mapping the area image of each existing target object to coordinate position information on the corresponding image in the monitoring video.
8. The method of claim 7, wherein the coordinate position information that the area image of each existing target object is mapped onto the corresponding image in the surveillance video is detected according to:
determining the relative coordinate distance between the coordinate position of a second selected pixel point in each regional image of the existing target object and the first selected pixel point by taking the coordinate position of the first selected pixel point in the corresponding image in the monitoring video as a reference coordinate position;
based on the relative coordinate distance, adjusting the coordinate position of each pixel point in the regional image of each existing target object;
and determining the coordinate position of each pixel point after adjustment in the regional image of each existing target object as the coordinate position of the regional image of each existing target object mapped to the corresponding image in the monitoring video.
9. The method of claim 1, wherein the determining the difference degree of pixel information between any two adjacent frames of images in the surveillance video comprises:
executing a first processing process aiming at the ith frame image and the (i + 1) th frame image in the monitoring video, wherein i is a positive integer; wherein the first processing procedure comprises:
converting the ith frame image into a first gray level image, and converting the (i + 1) th frame image into a second gray level image;
subtracting the gray values of pixel points of the second gray image and the first gray image one by one to obtain a third gray image;
and determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image based on the third grayscale image.
10. The method of claim 9, wherein the determining a pixel information disparity between the i +1 th frame image and the i-th frame image based on the third grayscale image comprises:
determining first-class pixel points with the gray values larger than a first set threshold value and second-class pixel points with the gray values not larger than the first set threshold value in the third gray image;
adjusting the gray value of the first type pixel point to be a first numerical value, and adjusting the gray value of the second type pixel point to be a second numerical value to obtain a fourth gray image;
and determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image according to the number of the pixel points of which the gray values in the fourth gray image are the first values.
11. The method according to claim 10, wherein the determining the candidate images in the monitored video whose pixel information difference degree meets a preset condition includes:
and when the pixel information difference degree between the (i + 1) th frame image and the ith frame image is determined to be larger than a set difference degree threshold value, determining a candidate image corresponding to the (i + 1) th frame image according to the (i + 1) th frame image and the fourth gray-scale image.
12. The method of claim 11, wherein determining the candidate image corresponding to the i +1 th frame image according to the i +1 th frame image and the fourth gray scale image comprises:
determining a gray area image formed by pixel points of which the gray values are the first numerical values in the fourth gray image;
determining a candidate area image matched with the gray area image in the (i + 1) th frame image;
adjusting the pixel values of the pixel points of the other region images except the candidate region image in the (i + 1) th frame image to the second numerical value;
and determining the adjusted (i + 1) th frame image as a candidate image corresponding to the (i + 1) th frame image.
13. The method of any of claims 9 to 12, wherein said truncating the target area image from the at least one candidate image comprises:
executing a second processing procedure for a jth candidate image in the at least one candidate image, j being a positive integer; wherein the second processing procedure comprises:
determining pixel points of which the pixel values are not the second values in the jth candidate image;
and intercepting at least one target area image containing pixel points with pixel values not being second numerical values from the jth candidate image.
14. An image detection apparatus, characterized by comprising: the determining module is used for acquiring a monitoring video and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video;
the screening module is used for screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video and intercepting a target area image from the at least one candidate image;
and the detection module is used for determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting the target object information included in each candidate image by using each determined target object detection model.
15. The apparatus of claim 14, wherein the feature information of the target area image comprises at least one of:
the number of target region images, the area ratio between the total area of the target region images and the total area of the corresponding candidate images.
16. The apparatus of claim 15, wherein the detection module, when determining the target object detection model matching each candidate image based on the feature information of the target area image corresponding to each candidate image, is specifically configured to:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
when the number of target area images corresponding to the kth candidate image is less than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number, determining that a target object detection model matched with the kth candidate image is a second target object detection model;
wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model.
17. The apparatus of claim 15, wherein the detection module, when determining the target object detection model matching each candidate image based on the feature information of the target area image corresponding to each candidate image, is specifically configured to:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
when the number of target area images corresponding to the kth candidate image is less than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than a preset area threshold value, determining that a target object detection model matched with the kth candidate image is a second target object detection model;
when the number of target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is less than or equal to the preset area threshold, determining that a target object detection model matched with the kth candidate image is a third target object detection model;
wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model, and the complexity of the second target object detection model is lower than the complexity of the third target object detection model.
18. The apparatus according to any one of claims 14 to 17, wherein the detecting module, when detecting the target object information included in each candidate image respectively by using the determined target object detection models, is specifically configured to:
for a kth candidate image of the at least one candidate image, k being a positive integer, performing the following:
inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image respectively, and detecting target object information included in the kth candidate image; or,
and splicing all the target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
19. The apparatus as claimed in claim 18, wherein the detecting module, when stitching the target area images cut from the kth candidate image, is specifically configured to:
calculating the area of each target region image intercepted from the kth candidate image;
arranging the areas of the images of the target areas in a descending order;
and splicing the target area images intercepted from the kth candidate image based on the obtained sequencing result to obtain a spliced target area image.
20. The apparatus of claim 18, wherein the target object information included in the kth candidate image comprises at least one of:
the mark information of the region image of the target object exists in the kth candidate image;
and mapping the area image of each existing target object to coordinate position information on the corresponding image in the monitoring video.
21. The apparatus of claim 20, wherein the detection module detects coordinate position information that the area image of each existing target object is mapped onto a corresponding image in the surveillance video according to:
determining the relative coordinate distance between the coordinate position of a second selected pixel point in each regional image of the existing target object and the first selected pixel point by taking the coordinate position of the first selected pixel point in the corresponding image in the monitoring video as a reference coordinate position;
based on the relative coordinate distance, adjusting the coordinate position of each pixel point in the regional image of each existing target object;
and determining the coordinate position of each pixel point after adjustment in the regional image of each existing target object as the coordinate position of the regional image of each existing target object mapped to the corresponding image in the monitoring video.
22. The apparatus according to claim 14, wherein the determining module, when determining the difference degree of the pixel information between any two adjacent frames of images in the surveillance video, is specifically configured to:
executing a first processing process aiming at the ith frame image and the (i + 1) th frame image in the monitoring video, wherein i is a positive integer; wherein the first processing procedure comprises:
converting the ith frame image into a first gray level image, and converting the (i + 1) th frame image into a second gray level image;
subtracting the gray values of pixel points of the second gray image and the first gray image one by one to obtain a third gray image;
and determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image based on the third grayscale image.
23. The apparatus according to claim 22, wherein the determining module, when determining the pixel information difference between the i +1 th frame image and the i frame image based on the third grayscale image, is specifically configured to:
determining first-class pixel points with the gray values larger than a first set threshold value and second-class pixel points with the gray values not larger than the first set threshold value in the third gray image;
adjusting the gray value of the first type pixel point to be a first numerical value, and adjusting the gray value of the second type pixel point to be a second numerical value to obtain a fourth gray image;
and determining the pixel information difference degree between the (i + 1) th frame image and the ith frame image according to the number of the pixel points of which the gray values in the fourth gray image are the first values.
24. The apparatus according to claim 23, wherein, when determining the candidate images in the monitored video whose pixel information difference degree meets the preset condition, the screening module is specifically configured to:
and when the pixel information difference degree between the (i + 1) th frame image and the ith frame image is determined to be larger than a set difference degree threshold value, determining a candidate image corresponding to the (i + 1) th frame image according to the (i + 1) th frame image and the fourth gray-scale image.
25. The apparatus according to claim 24, wherein the screening module, when determining the candidate image corresponding to the i +1 th frame image according to the i +1 th frame image and the fourth grayscale image, is specifically configured to:
determining a gray area image formed by pixel points of which the gray values are the first numerical values in the fourth gray image;
determining a candidate area image matched with the gray area image in the (i + 1) th frame image;
adjusting the pixel values of the pixel points of the other region images except the candidate region image in the (i + 1) th frame image to the second numerical value;
and determining the adjusted (i + 1) th frame image as a candidate image corresponding to the (i + 1) th frame image.
26. The apparatus according to any one of claims 22 to 25, wherein the filtering module, when extracting the target area image from the at least one candidate image, is specifically configured to:
executing a second processing procedure for a jth candidate image in the at least one candidate image, j being a positive integer; wherein the second processing procedure comprises:
determining pixel points of which the pixel values are not the second values in the jth candidate image;
and intercepting at least one target area image containing pixel points with pixel values not being second numerical values from the jth candidate image.
27. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the image detection method according to any one of claims 1 to 13.
28. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the image detection method according to one of claims 1 to 13.
CN201811528544.XA 2018-12-13 2018-12-13 Image detection method and device Active CN111402185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811528544.XA CN111402185B (en) 2018-12-13 2018-12-13 Image detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811528544.XA CN111402185B (en) 2018-12-13 2018-12-13 Image detection method and device

Publications (2)

Publication Number Publication Date
CN111402185A true CN111402185A (en) 2020-07-10
CN111402185B CN111402185B (en) 2023-12-08

Family

ID=71413024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811528544.XA Active CN111402185B (en) 2018-12-13 2018-12-13 Image detection method and device

Country Status (1)

Country Link
CN (1) CN111402185B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810461A (en) * 2012-11-09 2014-05-21 浙江大华技术股份有限公司 Interference object detection method and equipment
JP2016177718A (en) * 2015-03-23 2016-10-06 富士通株式会社 Object detection apparatus, object detection method, and information processing program
CN107992873A (en) * 2017-10-12 2018-05-04 西安天和防务技术股份有限公司 Object detection method and device, storage medium, electronic equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810461A (en) * 2012-11-09 2014-05-21 浙江大华技术股份有限公司 Interference object detection method and equipment
JP2016177718A (en) * 2015-03-23 2016-10-06 富士通株式会社 Object detection apparatus, object detection method, and information processing program
CN107992873A (en) * 2017-10-12 2018-05-04 西安天和防务技术股份有限公司 Object detection method and device, storage medium, electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627534A (en) * 2021-08-11 2021-11-09 百度在线网络技术(北京)有限公司 Method and device for identifying type of dynamic image and electronic equipment
CN113505760A (en) * 2021-09-08 2021-10-15 科大讯飞(苏州)科技有限公司 Target detection method, device, related equipment and computer readable storage medium
CN113505760B (en) * 2021-09-08 2021-12-21 科大讯飞(苏州)科技有限公司 Target detection method, device, related equipment and computer readable storage medium
CN118368445A (en) * 2024-04-26 2024-07-19 北京数原数字化城市研究中心 Image detection method, device and related equipment

Also Published As

Publication number Publication date
CN111402185B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
CN111325769B (en) Target object detection method and device
CN107545239B (en) Fake plate detection method based on license plate recognition and vehicle characteristic matching
CN108875600A (en) A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO
CN112686812B (en) Bank card inclination correction detection method and device, readable storage medium and terminal
CN101142584B (en) Method for facial features detection
CN107301405A (en) Method for traffic sign detection under natural scene
CN110298297B (en) Flame identification method and device
CN110334703B (en) Ship detection and identification method in day and night image
CN110555464A (en) Vehicle color identification method based on deep learning model
EP3584742A1 (en) System and method for traffic sign recognition
CN111126393A (en) Vehicle appearance refitting judgment method and device, computer equipment and storage medium
CN114387591A (en) License plate recognition method, system, equipment and storage medium
CN111191611A (en) Deep learning-based traffic sign label identification method
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN111402185B (en) Image detection method and device
CN115953744A (en) Vehicle identification tracking method based on deep learning
Lee et al. License plate detection using local structure patterns
Choodowicz et al. Hybrid algorithm for the detection and recognition of railway signs
CN113065454B (en) High-altitude parabolic target identification and comparison method and device
CN114529906A (en) Method and system for detecting abnormity of digital instrument of power transmission equipment based on character recognition
Moseva et al. Development of a System for Fixing Road Markings in Real Time
CN117475353A (en) Video-based abnormal smoke identification method and system
CN116612272A (en) Intelligent digital detection system for image processing and detection method thereof
CN115471755A (en) Image target rapid detection method based on segmentation
CN111191575B (en) Naked flame detection method and system based on flame jumping modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant