CN117197086A - Image detection method, device, computer equipment and storage medium - Google Patents


Publication number: CN117197086A (application CN202311164877.XA)
Authority: CN (China)
Prior art keywords: convolution, image, pooling layer, detected, module
Legal status: Pending
Application number
CN202311164877.XA
Other languages
Chinese (zh)
Inventor
倪启业
李安
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311164877.XA
Publication of CN117197086A


Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to an image detection method, an image detection device, computer equipment and a storage medium, which can be applied to the technical field of artificial intelligence. The method comprises: obtaining an image to be detected; performing feature extraction on the image to be detected through a feature extraction network to obtain a target feature map of the image, wherein the feature extraction network comprises an attention convolution module and a pooling layer; and classifying the target feature map through a classification network to obtain a detection result of the image to be detected, the detection result being that the image has been tampered with or has not been tampered with. Because the attention convolution module and the pooling layer jointly extract the features of the image to be detected, the target feature map accurately reflects the characteristics of the image, making the detection result more reliable and improving the accuracy of image tampering detection.

Description

Image detection method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image detection method, apparatus, computer device, and storage medium.
Background
At present, a large amount of image data is used in many transaction and business scenarios, and if this image data is tampered with, transaction or business security is greatly affected. As technology has advanced, image processing techniques have matured and a large amount of image editing software has emerged. Such software has powerful editing functions, and images edited and tampered with by it can often pass as genuine, being indistinguishable to the naked eye. Therefore, tamper detection is required for image material.
In the prior art, a detection model is generally adopted to detect the image, and accuracy is pursued by continuously increasing the depth of the neural network. However, increasing depth alone does not reliably improve detection accuracy.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an image detection method, apparatus, computer device, and storage medium that can improve the accuracy of image tamper detection.
In a first aspect, the present application provides an image detection method. The method comprises the following steps:
acquiring an image to be detected;
extracting features of the image to be detected through a feature extraction network to obtain a target feature map of the image to be detected; wherein the feature extraction network comprises an attention convolution module and a pooling layer;
classifying the target feature images through a classification network to obtain detection results of the images to be detected; and the detection result is that the image to be detected is tampered or the image to be detected is not tampered.
In one embodiment, the feature extraction network includes at least two end-to-end convolution pooling layers, each convolution pooling layer including an attention convolution module and a pooling layer.
In one embodiment, the extracting the features of the image to be detected through the feature extraction network to obtain the target feature map of the image to be detected includes:
extracting features of the input information of each convolution pooling layer, and taking the output result of the last convolution pooling layer as the target feature map of the image to be detected; the input information of the first convolution pooling layer is the image to be detected, and the input information of any other convolution pooling layer is the output result of the preceding convolution pooling layer.
In one embodiment, performing feature extraction on the input information of a convolution pooling layer through that convolution pooling layer comprises the following steps:
aiming at each convolution pooling layer, performing feature extraction on the input information of the convolution pooling layer through an attention convolution module contained in the convolution pooling layer to obtain a basic feature map;
and compressing the basic feature map through a pooling layer contained in the convolution pooling layer to obtain an output result of the convolution pooling layer.
In one embodiment, the attention convolution module comprises a first convolution module, a second convolution module, a convolution attention module and a joint feature convolution module, the input information comprises first input information and second input information, and the base feature map comprises a first base feature map and a second base feature map;
the feature extraction is performed on the input information of the convolution pooling layer through the attention convolution module contained in the convolution pooling layer to obtain a basic feature map, and the method comprises the following steps:
performing feature extraction on the first input information of the convolution pooling layer through the first convolution module and the convolution attention module to obtain a first basic feature map;
extracting features of second input information of the convolution pooling layer through the second convolution module to obtain a first intermediate feature map;
and fusing the first basic feature map and the first intermediate feature map through the joint feature convolution module to obtain a second basic feature map.
In one embodiment, the feature extraction of the first input information of the convolution pooling layer by the first convolution module and the convolution attention module to obtain a first basic feature map includes:
extracting features of the first input information of the convolution pooling layer through the first convolution module to obtain a second intermediate feature map;
and extracting channel characteristics and space characteristics of the second intermediate characteristic diagram through the convolution attention module to obtain a first basic characteristic diagram.
In a second aspect, the present application further provides an image detection apparatus. The device comprises:
the acquisition module is used for acquiring the image to be detected;
the extraction module is used for carrying out feature extraction on the image to be detected through a feature extraction network to obtain a target feature map of the image to be detected; wherein the feature extraction network comprises an attention convolution module and a pooling layer;
the classification module is used for carrying out classification processing on the target feature images through a classification network to obtain detection results of the images to be detected; and the detection result is that the image to be detected is tampered or the image to be detected is not tampered.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring an image to be detected;
extracting the characteristics of the image to be detected through a characteristic extraction network to obtain a target characteristic diagram of the image to be detected; wherein the feature extraction network comprises an attention convolution module and a pooling layer;
classifying the target feature images through a classification network to obtain detection results of the images to be detected; and the detection result is that the image to be detected is tampered or the image to be detected is not tampered.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring an image to be detected;
extracting the characteristics of the image to be detected through a characteristic extraction network to obtain a target characteristic diagram of the image to be detected; wherein the feature extraction network comprises an attention convolution module and a pooling layer;
classifying the target feature images through a classification network to obtain detection results of the images to be detected; and the detection result is that the image to be detected is tampered or the image to be detected is not tampered.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring an image to be detected;
extracting the characteristics of the image to be detected through a characteristic extraction network to obtain a target characteristic diagram of the image to be detected; wherein the feature extraction network comprises an attention convolution module and a pooling layer;
classifying the target feature images through a classification network to obtain detection results of the images to be detected; and the detection result is that the image to be detected is tampered or the image to be detected is not tampered.
According to the image detection method, apparatus, computer device and storage medium, a feature extraction network comprising an attention convolution module and a pooling layer performs feature extraction on the image to be detected, so that the resulting target feature map accurately reflects the features of the image. The classification network's result on this target feature map is therefore more accurate, whether the image to be detected has been tampered with can be reliably determined, and the accuracy of image tampering detection is improved.
Drawings
FIG. 1 is a diagram of an application environment for an image detection method in one embodiment;
FIG. 2 is a flow chart of an image detection method according to an embodiment;
FIG. 3 is a schematic diagram of a feature extraction network in one embodiment;
FIG. 4 is a schematic diagram of the structure of an attention convolution module in one embodiment;
FIG. 5 is a flow diagram of a process for obtaining a base feature map in one embodiment;
FIG. 6 is a schematic diagram of a convolution attention module in one embodiment;
FIG. 7 is a schematic diagram of an image detection model in one embodiment;
FIG. 8 is a schematic diagram of an image detection apparatus according to an embodiment;
fig. 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The image detection method provided by the embodiment of the application is suitable for detecting whether an image has been tampered with. The method can be performed by a server, by a terminal with relatively strong computing power, or through interaction between the terminal and the server. For example, fig. 1 is an application environment diagram of the image detection method provided in an embodiment of the present application. The terminal 102 may send the image to be detected to the server 104 through a network, and the server 104 may integrate an image detection model comprising a feature extraction network and a classification network, so that the server 104 can perform tamper detection on the image to be detected through these two networks. Optionally, the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices; the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and so on, and the portable wearable devices may be smart watches, smart bracelets, headsets, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, an image detection method is provided. The method is described as applied to the server 104 in fig. 1 for illustration, and includes the following steps:
s201, acquiring an image to be detected.
In the embodiment of the present application, an image to be detected is an image that is to be checked for tampering.
Optionally, there may be multiple ways to obtain the image to be detected, which the embodiment of the present application does not limit. For example, in one implementation, when the terminal 102 initiates an image tampering detection request to the server 104, the terminal 102 sends the image to be detected to the server 104; in another implementation, when the server 104 detects an image tampering detection request, it acquires the image to be detected from a specified storage path.
S202, extracting features of the image to be detected through a feature extraction network to obtain a target feature map of the image to be detected.
Optionally, the feature extraction network is used for extracting feature information of the image to be detected. Further, the feature extraction network includes an attention convolution module and a pooling layer.
After the image to be detected is obtained, feature extraction can be performed on it through the attention convolution module in the feature extraction network, and the extracted features are compressed by the pooling layer in the feature extraction network, yielding the target feature map of the image to be detected.
S203, classifying the target feature images through a classification network to obtain detection results of the images to be detected.
Optionally, the classification network may output a classification result corresponding to the feature information according to the feature information.
For example, a classification network can be utilized to classify the target feature map to obtain a detection result of the image to be detected; further, the detection result may be that the image to be detected is tampered or that the image to be detected is not tampered. That is, the detection result is used to indicate that the image to be detected is a tampered image or an untampered image.
According to this image detection method, a feature extraction network comprising an attention convolution module and a pooling layer performs feature extraction on the image to be detected, so that the target feature map accurately reflects the features of the image. The classification network's result on the target feature map is therefore more accurate, whether the image has been tampered with can be reliably determined, and the accuracy of image tampering detection is improved.
In one embodiment, referring to fig. 3, fig. 3 is a schematic structural diagram of a feature extraction network according to an embodiment of the present application. The feature extraction network comprises at least two end-to-end convolution pooling layers, each comprising an attention convolution module and a pooling layer. In this way, the convolution pooling layers perform layer-by-layer feature extraction on the image to be detected, so that the obtained target feature map more fully reflects the features of the image, improving the accuracy of the detection result.
On the basis of fig. 3, the embodiment of the application provides an implementation manner for acquiring a target feature map of an image to be detected, which specifically can be as follows: extracting the characteristics of the input information of the convolution pooling layers, and taking the output result of the last convolution pooling layer as a target characteristic diagram of the image to be detected; the input information of the first convolution pooling layer is an image to be detected, and the input information of any other convolution pooling layer is an output result of the last convolution pooling layer of the convolution pooling layer.
The following describes in detail an example in which the feature extraction network has two convolutional pooling layers.
Specifically, the attention convolution module in the first convolution pooling layer performs feature extraction on the input image to be detected; the extracted feature information is input to the pooling layer of the first convolution pooling layer for compression, and the compressed feature information serves both as the output result of the first convolution pooling layer and as the input information of the second convolution pooling layer. The attention convolution module of the second convolution pooling layer then performs feature extraction on that output, and the extracted feature information is compressed by the pooling layer of the second convolution pooling layer to obtain its output result. Since the second convolution pooling layer is the last one in this embodiment, its output is taken as the target feature map of the image to be detected.
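The chaining above — each layer consuming the previous layer's output, with the last output taken as the target feature map — can be sketched as follows. This is a minimal NumPy illustration, with each layer's internals reduced to a toy 2×2 pooling step:

```python
import numpy as np

def run_feature_extractor(image, layers):
    """Pass the image through end-to-end convolution-pooling layers.

    `layers` is a list of callables; each consumes the previous layer's
    output, and the last layer's output is the target feature map.
    """
    x = image
    for layer in layers:
        x = layer(x)
    return x

def toy_layer(x):
    # Stand-in for a convolution-pooling layer: 2x2 max pooling only
    h, w = x.shape[0] - x.shape[0] % 2, x.shape[1] - x.shape[1] % 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2, -1).max(axis=(1, 3))

image = np.zeros((16, 16, 3))
target_map = run_feature_extractor(image, [toy_layer, toy_layer])
print(target_map.shape)  # (4, 4, 3)
```

With two layers the spatial size is halved twice, 16 → 8 → 4, matching the two-layer walkthrough above.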
In the embodiment of the application, the convolution pooling layer in the feature extraction network is used for carrying out layer-by-layer feature extraction on the image to be detected, so that the obtained target feature image of the image to be detected can reflect the features of the image, and the accuracy of the detection result of the image to be detected is improved.
In one embodiment, the convolution pooling layer in embodiments of the present application may be comprised of an attention convolution module and a pooling layer. When the input information of each convolution pooling layer is subjected to feature extraction, the input information of each convolution pooling layer can be subjected to feature extraction through an attention convolution module contained in the convolution pooling layer to obtain a basic feature map; and then compressing the basic feature map through a pooling layer contained in the convolution pooling layer to obtain an output result of the convolution pooling layer.
Optionally, in combination with the embodiment shown in fig. 3, when the feature extraction network includes two end-to-end convolution pooling layers, the attention convolution module in the first convolution pooling layer performs feature extraction on the input information of the first convolution pooling layer to obtain a basic feature map; and then compressing the basic feature map through a pooling layer contained in the first convolution pooling layer to obtain an output result of the first convolution pooling layer. Then, the output result of the first convolution pooling layer is input into a second convolution pooling layer, and the attention convolution module in the second convolution pooling layer performs feature extraction on the output information of the first convolution pooling layer to obtain a basic feature map; and then compressing the basic feature map obtained by the attention convolution module in the second convolution pooling layer through the pooling layer contained in the second convolution pooling layer to obtain an output result of the second convolution pooling layer.
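The per-layer compression performed by each pooling layer can be sketched as follows — a minimal NumPy example assuming non-overlapping 2×2 max pooling (the patent does not fix the pooling type or window size, so both are illustrative assumptions):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Compress an (H, W, C) basic feature map with non-overlapping 2x2 max pooling."""
    h, w, c = feature_map.shape
    # Trim odd edges so the map splits evenly into 2x2 windows
    h, w = h - h % 2, w - w % 2
    windows = feature_map[:h, :w, :].reshape(h // 2, 2, w // 2, 2, c)
    return windows.max(axis=(1, 3))  # max within each 2x2 window, per channel

basic_map = np.arange(32, dtype=np.float32).reshape(4, 4, 2)
compressed = max_pool_2x2(basic_map)
print(compressed.shape)  # (2, 2, 2)
```

Each output value is the maximum of a 2×2 spatial window, so the layer's output keeps the strongest responses while quartering the spatial size.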
In the embodiment of the application, the attention convolution modules in each convolution pooling layer are used for extracting the image characteristics layer by layer, so that the obtained target characteristic image of the image to be detected can reflect the characteristics of the image, and the accuracy of the detection result of the image to be detected is further improved.
In order to improve the accuracy of the image detection result, the structure of the feature extraction network is further refined. In one embodiment, referring to fig. 4, fig. 4 is a schematic structural diagram of an attention convolution module according to an embodiment of the present disclosure. The attention convolution module in the embodiment of the present application may be composed of a first convolution module, a second convolution module, a convolution attention module and a joint feature convolution module, and the connection relationship of these modules is shown in fig. 4.
Illustratively, the first convolution module may be a VGG convolution module from a VGG-16 model, whose convolution kernel parameters may adopt historically learned (pretrained) values; the second convolution module may be an ordinary convolution module whose kernel parameters are randomly initialized; the kernel parameters of the convolution attention module and the joint feature convolution module may likewise be randomly initialized.
Optionally, in the embodiment of the present application, the input information input into the convolutional pooling layer includes first input information and second input information; the obtained basic feature map comprises a first basic feature map and a second basic feature map. Furthermore, on the basis of fig. 4, with reference to fig. 5, for any convolution pooling layer, when the input information of the convolution pooling layer is subjected to feature extraction by the attention convolution module included in the convolution pooling layer, the method can be specifically implemented by the following steps:
S501, performing feature extraction on first input information of the convolution pooling layer through a first convolution module and a convolution attention module to obtain a first basic feature map.
S502, performing feature extraction on second input information of the convolution pooling layer through a second convolution module to obtain a first intermediate feature map.
S503, fusing the first basic feature map and the first intermediate feature map through a joint feature convolution module to obtain a second basic feature map.
Specifically, feature extraction may be performed on the first input information of the convolution pooling layer through the first convolution module and the convolution attention module in that layer to obtain a first basic feature map; meanwhile, feature extraction may be performed on the second input information through the second convolution module in that layer to obtain a first intermediate feature map. The first basic feature map and the first intermediate feature map are then input into the joint feature convolution module, which fuses them to obtain the second basic feature map.
It should be noted that each convolution pooling layer contains one attention convolution module, i.e., the number of convolution pooling layers equals the number of attention convolution modules. For the first convolution pooling layer, the attention convolution module is the first attention convolution module, and its first input information and second input information are the same: both are the image to be detected. For any other attention convolution module, the first and second input information differ: the first input information is the first basic feature map produced by the preceding attention convolution module, after compression by the pooling layer; and the second input information is the second basic feature map produced by the preceding attention convolution module, after compression by the pooling layer.
In the embodiment of the application, the attention convolution module in each convolution pooling layer extracts image features layer by layer, and each attention convolution module contains two branches: the first branch obtains a first basic feature map through the first convolution module and the convolution attention module, and the second branch obtains a first intermediate feature map through the second convolution module. The joint feature convolution module fuses the first basic feature map and the first intermediate feature map into a second basic feature map, so that both basic feature maps more fully reflect the features of the image, improving the accuracy of the detection result.
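The two-branch data flow described above can be sketched as follows. This is a toy NumPy illustration, not the patent's implementation: the convolution modules are stood in for by 1×1 channel-mixing convolutions, the convolution attention module by a simple channel gate, and fusion by channel concatenation plus a 1×1 convolution (the patent does not specify the fusion operator):

```python
import numpy as np

def conv1x1(x, weights):
    # Stand-in for a convolution module: a 1x1 convolution mixing channels
    return x @ weights  # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)

def channel_gate(x):
    # Stand-in for the convolution attention module: sigmoid gate per channel
    return x * (1.0 / (1.0 + np.exp(-x.mean(axis=(0, 1)))))

def attention_conv_module(first_input, second_input, w1, w2, w_joint):
    # Branch 1: first convolution module, then the convolution attention module
    first_base = channel_gate(conv1x1(first_input, w1))
    # Branch 2: second convolution module only
    first_intermediate = conv1x1(second_input, w2)
    # Joint feature convolution module: fuse the two maps into the second base map
    fused = np.concatenate([first_base, first_intermediate], axis=-1)
    second_base = conv1x1(fused, w_joint)
    return first_base, second_base

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 3))            # first layer: both inputs are the image
w1, w2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
w_joint = rng.normal(size=(8, 4))         # 4 + 4 fused channels -> 4 channels
first_base, second_base = attention_conv_module(x, x, w1, w2, w_joint)
```

Both returned maps would then be compressed by the pooling layer and fed to the next module, per the wiring described above.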
In one embodiment, to improve the accuracy of the image detection result, the structure of the convolution attention module is further refined. Referring to fig. 6, fig. 6 is a schematic structural diagram of a convolution attention module according to an embodiment of the present disclosure. The convolution attention module in the embodiment of the application can be composed of a channel attention module and a space attention module.
Alternatively, the channel attention module may use a 3-dimensional template, where the height and width are 1 and the depth is the same as the number of channels of the input profile. The spatial attention module may also use a 3-dimensional template, where the height and width are the same as the height and width of the input feature map, and the depth is 1.
Optionally, the operation of the convolution attention module can be expressed as follows:

F' = M_c(F) ⊗ F
F'' = M_s(F') ⊗ F'

wherein F represents the input feature map, M_c represents the channel attention operation, M_s represents the spatial attention operation, and ⊗ represents multiplication of corresponding pixel points.
Optionally, when the first input information of the convolution pooling layer is subjected to feature extraction through the first convolution module and the convolution attention module to obtain a first basic feature map, the first input information of the convolution pooling layer can be subjected to feature extraction through the first convolution module to obtain a second intermediate feature map; and then, extracting channel characteristics and space characteristics of the second intermediate characteristic diagram through a convolution attention module to obtain a first basic characteristic diagram.
Specifically, a first convolution module performs feature extraction on first input information of the convolution pooling layer to obtain a second intermediate feature map, then inputs the second intermediate feature map into a channel attention module, and performs corresponding pixel point multiplication operation on the feature map output by the channel attention module and the second intermediate feature map to obtain a channel feature map; and then inputting the channel feature map into a spatial attention module, carrying out corresponding pixel point multiplication operation on the feature map output by the spatial attention module and the channel feature map to obtain a spatial feature map, and taking the spatial feature map as a first basic feature map.
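The channel-then-spatial sequence just described can be sketched in NumPy. This follows the published CBAM design in spirit, but with assumed simplifications: the shared MLP shapes are arbitrary, and a 1×1 convolution stands in for the usual larger convolution in the spatial branch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f, w0, w1):
    # Channel attention: a shared two-layer MLP over the global average- and
    # max-pooled channel descriptors; the gate is a 1 x 1 x C template
    avg_desc = f.mean(axis=(0, 1))                # (C,)
    max_desc = f.max(axis=(0, 1))                 # (C,)
    gate = sigmoid(avg_desc @ w0 @ w1 + max_desc @ w0 @ w1)
    return f * gate                               # broadcast over H and W

def spatial_attention(f, conv_w):
    # Spatial attention: channel-wise average and max maps, mixed by a
    # (toy) 1x1 convolution; the gate is an H x W x 1 template
    avg_map = f.mean(axis=-1, keepdims=True)      # (H, W, 1)
    max_map = f.max(axis=-1, keepdims=True)       # (H, W, 1)
    gate = sigmoid(np.concatenate([avg_map, max_map], axis=-1) @ conv_w)
    return f * gate

def convolution_attention(f, w0, w1, conv_w):
    # Channel attention map is multiplied in first, then the spatial map
    return spatial_attention(channel_attention(f, w0, w1), conv_w)

rng = np.random.default_rng(1)
f = rng.normal(size=(8, 8, 4))
w0, w1 = rng.normal(size=(4, 2)), rng.normal(size=(2, 4))
conv_w = rng.normal(size=(2, 1))
out = convolution_attention(f, w0, w1, conv_w)
```

Both gates lie in (0, 1), so the module re-weights the second intermediate feature map rather than replacing it, which is the behaviour the two pixel-wise multiplications describe.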
In some embodiments, to improve the efficiency with which the channel attention module extracts features, the feature map may be compressed in the spatial dimension using a max pooling layer and an average pooling layer before the channel attention module extracts features; likewise, to improve the efficiency with which the spatial attention module extracts features, the feature map may be compressed in the channel dimension using a max pooling layer and an average pooling layer before the spatial attention module extracts features.
According to the embodiment of the application, a channel attention module and a spatial attention module are arranged in the convolution attention module and used to extract features from the second intermediate feature map, so that the resulting first basic feature map captures richer feature content and better reflects the features of the image; a shallower network can therefore achieve a more accurate detection result.
In one embodiment, referring to fig. 7, fig. 7 is a schematic structural diagram of an image detection model according to an embodiment of the present application. The feature extraction network in the image detection model comprises five convolution pooling layers connected end to end: five attention convolution modules, each followed by a pooling layer, with a fully connected layer summarizing the features at the end. The embodiment adopts the VGG-16 network model as the base network; the 16 in VGG-16 indicates that the network contains 16 weight layers (13 convolutional and 3 fully connected). The structure of VGG-16 is simple and regular: several convolution layers are followed by a pooling layer that compresses the image size, and the number of filters in the convolution layers follows a regular pattern, doubling from 64 to 128 and then reaching 256 and 512. The model extracts image content well in its shallow layers, so it is often used as a backbone; its main disadvantage is the very large number of parameters to train. Therefore, the embodiment of the application selects only the first five convolution-and-pooling stages of VGG-16 as the basic framework of the downsampling module, modifies the convolution layers within it, and adds an attention mechanism, yielding what is called the attention convolution module.
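As a rough shape check for the five-stage downsampler described above (assuming a 224×224 input, the standard VGG size, which the patent does not state):

```python
# Each of the five convolution-pooling stages halves height and width, while
# the filter counts follow the VGG-16 pattern 64 -> 128 -> 256 -> 512 -> 512
filters = [64, 128, 256, 512, 512]
size = 224  # assumed input resolution
shapes = []
for channels in filters:
    size //= 2
    shapes.append((size, size, channels))
print(shapes[-1])  # (7, 7, 512) -- the map summarized by the fully connected layer
```

The spatial size shrinks 224 → 112 → 56 → 28 → 14 → 7 while the channel count grows, which is the regular pattern the text attributes to VGG-16.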
It should be appreciated that, before training the image detection model, the data set needs to be cropped to a uniform size. Meanwhile, in order to further improve the robustness of the trained model, enhance its generalization capability and avoid overfitting, an offline image enhancement technique can be used to process the training set data; that is, the image data are processed before model training to form a fixed data set. The processing strategies are as follows:
Rotation variation: the images are rotated by 90, 180 and 270 degrees to simulate images taken at different shooting angles.
Color jittering: the chromaticity, saturation and contrast of the images are randomly enhanced to simulate image data under different lighting conditions.
Sharpening: the edge contours of the images are enhanced to simulate images of different definition.
When the image detection model is trained, the difference between the detection result and the real result can be calculated by adopting a loss function so as to further adjust the parameters of the model and improve the detection accuracy of the model.
Specifically, the embodiment of the application can use the binary cross entropy loss function to calculate the gap. The prediction probability output by the classification function is denoted as y', and the prediction probability of the other class is then 1 − y'. The difference is calculated as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log y_i' + (1 - y_i)\log\left(1 - y_i'\right)\right]$$

wherein L represents the difference between the predicted result and the true result, y' represents the predicted probability, y represents the true result, and N represents the number of samples.
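A minimal numeric check of the binary cross entropy loss described above; the sample labels and probabilities are illustrative only.

```python
import math

# Mean binary cross entropy over N samples, matching the loss the text uses
# to measure the gap between predicted and true results.
def bce_loss(y_true, y_pred):
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / n

y_true = [1, 0, 1]        # 1 = tampered, 0 = not tampered
y_pred = [0.9, 0.2, 0.8]  # predicted probabilities y'
print(round(bce_loss(y_true, y_pred), 4))  # → 0.1839
```

Confident predictions on the correct class drive the loss toward zero, while confident wrong predictions make it large.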
After the gap value is obtained, the parameters are adjusted through a back propagation algorithm so that the gap value is reduced. The formula of the back propagation update is as follows:

$$w' = w - \alpha \frac{\partial L}{\partial w}$$

wherein w' represents the adjusted parameter, α represents the learning rate and w represents the parameter of the current detection model; the derivative of L with respect to w is expanded by the chain rule as follows:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial y'} \cdot \frac{\partial y'}{\partial s} \cdot \frac{\partial s}{\partial w}$$

where L represents the gap value, s represents the activation function and y' represents the prediction probability.
Thus, the image detection model can adjust model parameters according to the adjusted parameters w' so as to improve the detection accuracy of the image detection model.
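The update rule above can be sketched for a hypothetical single-parameter model: with a sigmoid output trained under binary cross entropy, the gradient of L with respect to the pre-activation simplifies to y' − y, a standard identity (not stated in the text).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update(w, x, y, alpha):
    """One step of w' = w - alpha * dL/dw for a single training sample."""
    y_pred = sigmoid(w * x)      # prediction probability y'
    grad = (y_pred - y) * x      # chain rule: dL/dz * dz/dw, with dL/dz = y' - y
    return w - alpha * grad

w = 0.5
for _ in range(200):             # repeated updates shrink the gap value
    w = update(w, x=1.0, y=1.0, alpha=0.1)
print(sigmoid(w))                # prediction probability moves toward the true label 1
```

Each step moves w against the gradient, so the predicted probability is pushed toward the true result, which is exactly how the detection model reduces the gap value during training.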
The following describes an image detection method provided by the embodiment of the present application with reference to fig. 7, and the method specifically includes the following steps:
step one, an image to be detected is obtained.
Optionally, there may be a plurality of ways to obtain the image to be detected, which is not limited in the embodiment of the present application. For example, in one implementation, when the terminal 102 initiates an image tampering detection request to the server 104, the terminal 102 sends the image to be detected to the server 104; in another implementation, the server 104 acquires the image to be detected from a specified storage path when detecting that there is an image tampering detection request.
And secondly, inputting the image to be detected into the attention convolution module 1 to obtain a first basic feature map and a second basic feature map.
Inputting an image to be detected into an attention convolution module 1 in an image detection model; the attention convolution module 1 comprises a first convolution module, a second convolution module, a convolution attention module and a joint characteristic convolution module. The first convolution module in the attention convolution module 1 is used for extracting the characteristics of the image to be detected input into the attention convolution module 1 to obtain a second intermediate characteristic diagram; then inputting the second intermediate feature map into a channel attention module in the convolution attention module, and carrying out the operation of multiplying the feature map output by the channel attention module by the corresponding pixel points of the second intermediate feature map to obtain a channel feature map; and then inputting the channel feature map into a spatial attention module in the convolution attention module, carrying out corresponding pixel point multiplication operation on the feature map output by the spatial attention module and the channel feature map to obtain a spatial feature map, and taking the spatial feature map as a first basic feature map.
Likewise, the second convolution module also performs feature extraction on the image to be detected input into the attention convolution module 1 to obtain a first intermediate feature map; and then fusing the first basic feature map and the first intermediate feature map through a joint feature convolution module to obtain a second basic feature map.
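The channel and spatial attention operations described in the two steps above amount to element-wise multiplications over the feature map. The following is a simplified sketch; the pooling-and-weighting details inside each attention sub-module are assumptions for illustration, not the patent's exact design.

```python
import numpy as np

def channel_attention(features):
    """Weight each channel by a sigmoid of its global average (shape (C,))."""
    weights = 1.0 / (1.0 + np.exp(-features.mean(axis=(1, 2))))
    return features * weights[:, None, None]   # multiply corresponding pixels

def spatial_attention(features):
    """Weight each position by a sigmoid of the channel-wise mean (shape (H, W))."""
    weights = 1.0 / (1.0 + np.exp(-features.mean(axis=0)))
    return features * weights[None, :, :]      # multiply corresponding pixels

# second intermediate feature map, laid out as (channels, height, width)
second_intermediate = np.random.default_rng(1).normal(size=(4, 8, 8))
channel_map = channel_attention(second_intermediate)  # channel feature map
first_basic = spatial_attention(channel_map)          # spatial / first basic feature map
print(first_basic.shape)
```

Both multiplications preserve the feature map's shape, so the first basic feature map can be fused with the first intermediate feature map by the joint feature convolution module.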
And thirdly, inputting the first basic feature map and the second basic feature map into the pooling layer 1.
The first basic feature map and the second basic feature map are input into the pooling layer 1, and the pooling layer 1 is used for compressing the first basic feature map and the second basic feature map so as to perform the feature extraction operation of the next convolution pooling layer.
And step four, inputting a compression result of the first basic feature map compressed by the pooling layer 1 into the attention convolution module 2, and inputting a compression result of the second basic feature map compressed by the pooling layer 1 into the attention convolution module 2. The attention convolution module 2 also comprises a first convolution module, a second convolution module, a convolution attention module and a joint characteristic convolution module. The first convolution module in the attention convolution module 2 is used for carrying out feature extraction on the compression result of the first basic feature map compressed by the pooling layer 1 to obtain a third intermediate feature map; then inputting the third intermediate feature map into a channel attention module in the convolution attention module, and performing the operation of multiplying the feature map output by the channel attention module by the corresponding pixel points of the third intermediate feature map to obtain a channel feature map; and then inputting the channel feature map into a spatial attention module in the convolution attention module, carrying out corresponding pixel point multiplication operation on the feature map output by the spatial attention module and the channel feature map to obtain a spatial feature map, and taking the spatial feature map as a third basic feature map.
Similarly, the second convolution module also performs feature extraction on the compression result of the second basic feature map input into the attention convolution module 2 after being compressed by the pooling layer 1 to obtain a fourth intermediate feature map; the terminal then fuses the third basic feature map and the fourth intermediate feature map through a joint feature convolution module to obtain a fourth basic feature map.
And fifthly, inputting the third basic feature map and the fourth basic feature map into the pooling layer 2.
The terminal inputs the third basic feature map and the fourth basic feature map into the pooling layer 2, and compresses the third basic feature map and the fourth basic feature map through the pooling layer 2 so as to perform the feature extraction operation of the next convolution pooling layer.
And step six, correspondingly, carrying out feature extraction on the output result corresponding to the pooling layer 2 through the three subsequent convolution pooling layers to obtain a target feature map.
The operation of the three latter convolution pooling layers in the embodiment of the present application for extracting features of the basic feature map is the same as the operation logic of the first two convolution pooling layers described in the above embodiment, and specific operations of the three latter convolution pooling layers are not described here.
And step seven, classifying the target feature images through a classification network to obtain detection results of the images to be detected.
The classifying function of the classifying network can adopt a Sigmoid function, and the classifying network is used for classifying the target feature image to obtain a detection result of the image to be detected, wherein the detection result is used for indicating whether the image to be detected is tampered or not.
Optionally, the detection result of the image to be detected includes two categories, namely that the image is tampered and that the image is not tampered. When the target feature map is classified by the classification network, the classification function outputs a prediction probability corresponding to the classification result; this prediction probability is compared with a preset probability threshold, and the detection result is output according to the comparison result. For example, the classification function outputs the prediction probability that the image is tampered, and the probability threshold is set to 95%; in this way, when the prediction probability that the image is tampered is greater than 95%, the detection result is that the image is tampered; otherwise, the detection result is that the image is not tampered.
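The thresholding rule above can be sketched in a few lines; the 95% threshold follows the example in the text.

```python
# Map the classifier's tampering probability to one of the two detection
# results, using the example threshold of 95%.
def detect(prediction_probability, threshold=0.95):
    return "tampered" if prediction_probability > threshold else "not tampered"

print(detect(0.97))  # → tampered
print(detect(0.40))  # → not tampered
```

Because the comparison is strictly greater-than, a probability exactly at the threshold yields "not tampered"; in practice the threshold would be tuned to trade off false positives against missed tampering.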
Illustratively, the pooling layers 1-5 in the above embodiments may employ 2×2 max pooling.
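The 2×2 max pooling mentioned above, which each pooling layer uses to compress its basic feature maps, works as follows on a single-channel map:

```python
import numpy as np

# Halve each spatial dimension by taking the maximum of each 2x2 window,
# assuming even height and width (as in the 224 -> 7 downsampling path).
def max_pool_2x2(feature_map):
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.array([[1, 2, 5, 6],
                        [3, 4, 7, 8],
                        [9, 1, 2, 3],
                        [4, 5, 6, 7]], dtype=float)
print(max_pool_2x2(feature_map))  # → [[4. 8.] [9. 7.]]
```

Keeping only the maximum of each window compresses the image size while retaining the strongest responses, which is why each convolution pooling layer ends with this step.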
The specific processes of the first to seventh steps may be referred to the description of the above method embodiments, and the implementation principle and technical effects are similar, and are not repeated herein.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; the order of their execution is not necessarily sequential, and they may be performed in turn or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an image detection device for realizing the image detection method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the image detection device or devices provided below may be referred to the limitation of the image detection method hereinabove, and will not be repeated here.
In one embodiment, as shown in fig. 8, there is provided an image detection apparatus 1 including:
an acquisition module 10, configured to acquire an image to be detected;
the extracting module 20 is configured to perform feature extraction on the image to be detected through a feature extracting network to obtain a target feature map of the image to be detected; the feature extraction network comprises an attention convolution module and a pooling layer;
the classification module 30 is configured to perform classification processing on the target feature map through a classification network to obtain a detection result of the image to be detected; the detection result is that the image to be detected is tampered or the image to be detected is not tampered.
According to the image detection device, the feature extraction network comprising the attention convolution module and the pooling layer performs feature extraction on the image to be detected, so that the target feature map of the image to be detected accurately reflects the features of the image; the classification network can then classify the target feature map more accurately, so that whether the image to be detected has been tampered with can be reliably determined, improving the accuracy of image tampering detection.
In one embodiment, the feature extraction network includes at least two end-to-end convolution pooling layers, each including an attention convolution module and a pooling layer.
In one embodiment, the extraction module 20 is specifically configured to:
extracting the characteristics of the input information of each convolution pooling layer, and taking the output result of the last convolution pooling layer as a target characteristic diagram of the image to be detected; the input information of the first convolution pooling layer is the image to be detected, and the input information of any other convolution pooling layer is the output result of its preceding convolution pooling layer.
In one embodiment, the extraction module 20 specifically includes an extraction unit and a compression unit;
the extraction unit is used for extracting the characteristics of the input information of each convolution pooling layer through the attention convolution module contained in the convolution pooling layer to obtain a basic characteristic diagram;
and the compression unit is used for compressing the basic feature map through the pooling layer contained in the convolution pooling layer to obtain an output result of the convolution pooling layer.
In one embodiment, the attention convolution module includes a first convolution module, a second convolution module, a convolution attention module, and a joint feature convolution module, the input information includes first input information and second input information, and the base feature map includes a first base feature map and a second base feature map;
The extraction unit specifically comprises a first extraction subunit, a second extraction subunit and a fusion subunit;
the first extraction subunit is specifically configured to perform feature extraction on the first input information of the convolution pooling layer through a first convolution module and a convolution attention module, so as to obtain a first basic feature map;
the second extraction subunit is specifically configured to perform feature extraction on the second input information of the convolution pooling layer through a second convolution module, so as to obtain a first intermediate feature map;
the fusion subunit is specifically configured to fuse the first basic feature map and the first intermediate feature map through the joint feature convolution module, so as to obtain a second basic feature map.
In one embodiment, the first extraction subunit is specifically configured to:
extracting features of the first input information of the convolution pooling layer through a first convolution module to obtain a second intermediate feature map; and extracting channel characteristics and space characteristics of the second intermediate characteristic diagram through a convolution attention module to obtain a first basic characteristic diagram.
The respective modules in the above-described image detection apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 9. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing image data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an image detection method.
It will be appreciated by persons skilled in the art that the architecture shown in fig. 9 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring an image to be detected;
extracting features of the image to be detected through a feature extraction network to obtain a target feature map of the image to be detected; the feature extraction network comprises an attention convolution module and a pooling layer;
classifying the target feature images through a classification network to obtain detection results of the images to be detected; the detection result is that the image to be detected is tampered or the image to be detected is not tampered.
In one embodiment, a feature extraction network involved in executing a computer program by a processor includes at least two convolutional pooling layers connected end-to-end, each comprising an attention convolution module and a pooling layer.
In one embodiment, the processor when executing the computer program further performs the steps of:
extracting the characteristics of the input information of each convolution pooling layer, and taking the output result of the last convolution pooling layer as a target characteristic diagram of the image to be detected; the input information of the first convolution pooling layer is the image to be detected, and the input information of any other convolution pooling layer is the output result of its preceding convolution pooling layer.
In one embodiment, the processor when executing the computer program further performs the steps of:
aiming at each convolution pooling layer, performing feature extraction on the input information of the convolution pooling layer through an attention convolution module contained in the convolution pooling layer to obtain a basic feature map; and compressing the basic feature map through a pooling layer contained in the convolution pooling layer to obtain an output result of the convolution pooling layer.
In one embodiment, the processor when executing the computer program further performs the steps of:
the attention convolution module comprises a first convolution module, a second convolution module, a convolution attention module and a joint feature convolution module, wherein input information comprises first input information and second input information, and the basic feature map comprises a first basic feature map and a second basic feature map; performing feature extraction on the first input information of the convolution pooling layer through a first convolution module and a convolution attention module to obtain a first basic feature map; extracting features of second input information of the convolution pooling layer through a second convolution module to obtain a first intermediate feature map; and fusing the first basic feature map and the first intermediate feature map through a joint feature convolution module to obtain a second basic feature map.
In one embodiment, the processor when executing the computer program further performs the steps of:
extracting features of the first input information of the convolution pooling layer through a first convolution module to obtain a second intermediate feature map; and extracting channel characteristics and space characteristics of the second intermediate characteristic diagram through a convolution attention module to obtain a first basic characteristic diagram.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring an image to be detected;
extracting features of the image to be detected through a feature extraction network to obtain a target feature map of the image to be detected; the feature extraction network comprises an attention convolution module and a pooling layer;
classifying the target feature images through a classification network to obtain detection results of the images to be detected; the detection result is that the image to be detected is tampered or the image to be detected is not tampered.
In one embodiment, a feature extraction network involved in the execution of a computer program by a processor includes at least two end-to-end convolution pooling layers, each including an attention convolution module and a pooling layer.
In one embodiment, the computer program when executed by the processor further performs the steps of:
extracting the characteristics of the input information of each convolution pooling layer, and taking the output result of the last convolution pooling layer as a target characteristic diagram of the image to be detected; the input information of the first convolution pooling layer is the image to be detected, and the input information of any other convolution pooling layer is the output result of its preceding convolution pooling layer.
In one embodiment, the computer program when executed by the processor further performs the steps of:
aiming at each convolution pooling layer, performing feature extraction on the input information of the convolution pooling layer through an attention convolution module contained in the convolution pooling layer to obtain a basic feature map; and compressing the basic feature map through a pooling layer contained in the convolution pooling layer to obtain an output result of the convolution pooling layer.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the attention convolution module comprises a first convolution module, a second convolution module, a convolution attention module and a joint feature convolution module, wherein input information comprises first input information and second input information, and the basic feature map comprises a first basic feature map and a second basic feature map; performing feature extraction on the first input information of the convolution pooling layer through a first convolution module and a convolution attention module to obtain a first basic feature map; extracting features of second input information of the convolution pooling layer through a second convolution module to obtain a first intermediate feature map; and fusing the first basic feature map and the first intermediate feature map through a joint feature convolution module to obtain a second basic feature map.
In one embodiment, the computer program when executed by the processor further performs the steps of:
extracting features of the first input information of the convolution pooling layer through a first convolution module to obtain a second intermediate feature map; and extracting channel characteristics and space characteristics of the second intermediate characteristic diagram through a convolution attention module to obtain a first basic characteristic diagram.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
acquiring an image to be detected;
extracting features of the image to be detected through a feature extraction network to obtain a target feature map of the image to be detected; the feature extraction network comprises an attention convolution module and a pooling layer;
classifying the target feature images through a classification network to obtain detection results of the images to be detected; the detection result is that the image to be detected is tampered or the image to be detected is not tampered.
In one embodiment, a feature extraction network involved in the execution of a computer program by a processor includes at least two end-to-end convolution pooling layers, each including an attention convolution module and a pooling layer.
In one embodiment, the computer program when executed by the processor further performs the steps of:
extracting the characteristics of the input information of each convolution pooling layer, and taking the output result of the last convolution pooling layer as a target characteristic diagram of the image to be detected; the input information of the first convolution pooling layer is the image to be detected, and the input information of any other convolution pooling layer is the output result of its preceding convolution pooling layer.
In one embodiment, the computer program when executed by the processor further performs the steps of:
aiming at each convolution pooling layer, performing feature extraction on the input information of the convolution pooling layer through an attention convolution module contained in the convolution pooling layer to obtain a basic feature map; and compressing the basic feature map through a pooling layer contained in the convolution pooling layer to obtain an output result of the convolution pooling layer.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the attention convolution module comprises a first convolution module, a second convolution module, a convolution attention module and a joint feature convolution module, wherein input information comprises first input information and second input information, and the basic feature map comprises a first basic feature map and a second basic feature map; performing feature extraction on the first input information of the convolution pooling layer through a first convolution module and a convolution attention module to obtain a first basic feature map; extracting features of second input information of the convolution pooling layer through a second convolution module to obtain a first intermediate feature map; and fusing the first basic feature map and the first intermediate feature map through a joint feature convolution module to obtain a second basic feature map.
In one embodiment, the computer program when executed by the processor further performs the steps of:
extracting features of the first input information of the convolution pooling layer through a first convolution module to obtain a second intermediate feature map; and extracting channel characteristics and space characteristics of the second intermediate characteristic diagram through a convolution attention module to obtain a first basic characteristic diagram.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (10)

1. An image detection method, the method comprising:
acquiring an image to be detected;
extracting the characteristics of the image to be detected through a characteristic extraction network to obtain a target characteristic diagram of the image to be detected; wherein the feature extraction network comprises an attention convolution module and a pooling layer;
classifying the target feature images through a classification network to obtain detection results of the images to be detected; and the detection result is that the image to be detected is tampered or the image to be detected is not tampered.
2. The method of claim 1, wherein the feature extraction network comprises at least two end-to-end convolutional pooling layers, each comprising an attention convolutional module and a pooling layer.
3. The method according to claim 2, wherein the performing feature extraction on the image to be detected through a feature extraction network to obtain a target feature map of the image to be detected includes:
extracting the characteristics of its input information through each convolution pooling layer, and taking the output result of the last convolution pooling layer as a target characteristic diagram of the image to be detected; the input information of the first convolution pooling layer is the image to be detected, and the input information of any other convolution pooling layer is the output result of its preceding convolution pooling layer.
4. A method according to claim 3, wherein the feature extraction of its input information by each convolutional pooling layer comprises:
for each convolution pooling layer, performing feature extraction on the input information of the convolution pooling layer through the attention convolution module contained in the convolution pooling layer to obtain a basic feature map;
and compressing the basic feature map through the pooling layer contained in the convolution pooling layer to obtain the output result of the convolution pooling layer.
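The two-step layer of claim 4 can be sketched with hypothetical NumPy stand-ins; `attention_conv` (a 3x3 mean filter) and `pool` (2x2 max pooling) are illustrative simplifications of the attention convolution module and the pooling layer, not the patented modules:

```python
import numpy as np

def attention_conv(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the attention convolution module:
    a 3x3 mean filter producing the basic feature map."""
    h, w = x.shape
    p = np.pad(x, 1, mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def pool(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling compresses the basic feature map (claim 4)."""
    h, w = x.shape
    return x[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

base = attention_conv(np.ones((8, 8)))  # basic feature map, same size as input
out = pool(base)                        # compressed output of the layer
print(out.shape)  # (4, 4)
```

The feature-extracting step preserves spatial size; only the pooling step compresses it, halving each dimension.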
5. The method of claim 4, wherein the attention convolution module comprises a first convolution module, a second convolution module, a convolution attention module, and a joint feature convolution module; the input information comprises first input information and second input information; and the basic feature map comprises a first basic feature map and a second basic feature map;
wherein performing feature extraction on the input information of the convolution pooling layer through the attention convolution module contained in the convolution pooling layer to obtain the basic feature map comprises:
performing feature extraction on the first input information of the convolution pooling layer through the first convolution module and the convolution attention module to obtain a first basic feature map;
performing feature extraction on the second input information of the convolution pooling layer through the second convolution module to obtain a first intermediate feature map;
and fusing the first basic feature map and the first intermediate feature map through the joint feature convolution module to obtain the second basic feature map.
6. The method of claim 5, wherein performing feature extraction on the first input information of the convolution pooling layer through the first convolution module and the convolution attention module to obtain the first basic feature map comprises:
extracting features of the first input information of the convolution pooling layer through the first convolution module to obtain a second intermediate feature map;
and extracting channel features and spatial features from the second intermediate feature map through the convolution attention module to obtain the first basic feature map.
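The channel-then-spatial attention of claim 6 can be sketched in the style of CBAM-like modules; this is an assumed simplification (global-average statistics pushed through a sigmoid, no learned weights), not the patented convolution attention module:

```python
import numpy as np

def channel_attention(x: np.ndarray) -> np.ndarray:
    """Per-channel weights from global average pooling; x has shape (C, H, W)."""
    weights = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))  # shape (C,)
    return x * weights[:, None, None]

def spatial_attention(x: np.ndarray) -> np.ndarray:
    """Per-position weights from the channel-wise mean map."""
    weights = 1.0 / (1.0 + np.exp(-x.mean(axis=0)))  # shape (H, W)
    return x * weights[None, :, :]

def conv_attention_module(x: np.ndarray) -> np.ndarray:
    # Claim 6: channel features are extracted first, then spatial features.
    return spatial_attention(channel_attention(x))

x_in = np.random.rand(3, 8, 8)
out = conv_attention_module(x_in)
print(out.shape)  # (3, 8, 8): attention reweights the map, never resizes it
```

Note the invariant this makes visible: attention only reweights responses; the spatial compression is left entirely to the pooling layer of claim 4.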
7. An image detection apparatus, the apparatus comprising:
the acquisition module is used for acquiring the image to be detected;
the extraction module is used for carrying out feature extraction on the image to be detected through a feature extraction network to obtain a target feature map of the image to be detected; wherein the feature extraction network comprises an attention convolution module and a pooling layer;
the classification module is used for classifying the target feature map through a classification network to obtain a detection result of the image to be detected; wherein the detection result is that the image to be detected has been tampered with or that the image to be detected has not been tampered with.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when executing the computer program.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202311164877.XA 2023-09-11 2023-09-11 Image detection method, device, computer equipment and storage medium Pending CN117197086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311164877.XA CN117197086A (en) 2023-09-11 2023-09-11 Image detection method, device, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117197086A true CN117197086A (en) 2023-12-08

Family

ID=88983030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311164877.XA Pending CN117197086A (en) 2023-09-11 2023-09-11 Image detection method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117197086A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118587462A (en) * 2024-08-06 2024-09-03 腾讯科技(深圳)有限公司 Tamper detection method, tamper detection device, tamper detection equipment, tamper detection storage medium and tamper detection product


Similar Documents

Publication Publication Date Title
Zanardelli et al. Image forgery detection: a survey of recent deep-learning approaches
EP4156017A1 (en) Action recognition method and apparatus, and device and storage medium
CN111738244B (en) Image detection method, image detection device, computer equipment and storage medium
Chen et al. Locally GAN-generated face detection based on an improved Xception
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
WO2023201924A1 (en) Object defect detection method and apparatus, and computer device and storage medium
CN112329702B (en) Method and device for rapid face density prediction and face detection, electronic equipment and storage medium
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
CN114419406A (en) Image change detection method, training method, device and computer equipment
CN113344110B (en) Fuzzy image classification method based on super-resolution reconstruction
Cai et al. A real-time smoke detection model based on YOLO-smoke algorithm
Pintelas et al. A multi-view-CNN framework for deep representation learning in image classification
Nataraj et al. Seam carving detection and localization using two-stage deep neural networks
CN115630660B (en) Barcode positioning method and device based on convolutional neural network
CN114998814B (en) Target video generation method and device, computer equipment and storage medium
CN117058554A (en) Power equipment target detection method, model training method and device
CN117197086A (en) Image detection method, device, computer equipment and storage medium
CN116543333A (en) Target recognition method, training method, device, equipment and medium of power system
CN116703598A (en) Transaction behavior detection method, device, computer equipment and storage medium
CN115731442A (en) Image processing method, image processing device, computer equipment and storage medium
CN114998634B (en) Image processing method, image processing device, computer equipment and storage medium
CN115620013B (en) Semantic segmentation method and device, computer equipment and computer readable storage medium
CN114219808B (en) Image processing method, apparatus, device, storage medium, and computer program product
CN118470342B (en) Fire detection method and device and computer equipment
CN118658105A (en) Video understanding method, device, computer equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination