CN112819096A - Method for constructing fossil image classification model based on composite convolutional neural network - Google Patents

Method for constructing fossil image classification model based on composite convolutional neural network

Info

Publication number
CN112819096A
CN112819096A (application CN202110219351.1A; granted as CN112819096B)
Authority
CN
China
Prior art keywords
image
fossil
fossil image
constructing
classification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110219351.1A
Other languages
Chinese (zh)
Other versions
CN112819096B (en)
Inventor
张蕾
王晓宇
罗杰
卜起荣
冯筠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern University
Original Assignee
Northwestern University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern University
Priority to CN202110219351.1A
Publication of CN112819096A
Application granted
Publication of CN112819096B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for constructing a fossil image classification model based on a composite convolutional neural network, which comprises the following steps. S1: process an original fossil image to obtain a gradient image, and construct a fossil image feature extraction model. S2: the fossil image feature extraction model performs feature extraction on the original fossil image to obtain a depth feature map and on the gradient image to obtain a primary visual feature map; the depth feature map and the primary visual feature map are fused and then processed sequentially by a global average pooling layer and a fully connected layer to obtain image category probability values, and a primary fossil image classification model is constructed according to these probability values. S3: train the primary fossil image classification model. The method extracts the depth features of the original fossil image and the primary visual features of the corresponding gradient image separately, and further improves the accuracy of the fossil image classification task through feature fusion.

Description

Method for constructing fossil image classification model based on composite convolutional neural network
Technical Field
The invention relates to the technical field of image processing, and in particular to a method for constructing a fossil image classification model based on a composite convolutional neural network and to a corresponding image classification method.
Background
Microfossils have traditionally been classified manually; because microfossils are too small to be found or distinguished by the naked eye, classification is carried out manually by observation under a microscope. As archaeological excavation and related work progress, new types of microfossil are continually discovered, so the total number of sample categories keeps increasing and manual sorting and selection become ever less efficient. The whole process is tedious, the probability of error grows with time, and prolonged high-intensity observation does serious harm to the eyesight and general health of the observers.
With the development of artificial intelligence, research on deep learning has made great progress; deep learning plays an increasingly important role in daily work and has achieved ever more striking results in image classification tasks. It can learn from a small amount of sample data and automatically acquire the features most suitable for classification through the network model, with no need to select suitable features manually, while obtaining high classification accuracy. Classifying microfossil images with a convolutional neural network therefore saves a large amount of human effort, guarantees high classification accuracy and high working efficiency, and provides a useful reference for archaeological work.
Existing research techniques mainly comprise methods based on image features and methods based on deep learning. Traditional fossil image classification methods based on image features require the complex steps of feature extraction and feature optimization and have high algorithmic complexity. Existing deep-learning fossil image models extract features only from the original image, do not enhance the primary visual features from the perspective of image gradient change, and use deep networks that make the fossil image classification task prone to overfitting.
Disclosure of Invention
The invention aims to provide a method for constructing a fossil image classification model based on a composite convolutional neural network, addressing the problems of high algorithmic complexity, slow training, susceptibility to overfitting and low detection accuracy in the fossil image classification task.
To accomplish this task, the invention adopts the following technical scheme:
a method for constructing a fossil image classification model based on a composite convolutional neural network comprises the following steps:
s1: processing an original fossil image to obtain a gradient image, and constructing a fossil image feature extraction model;
S2: the fossil image feature extraction model performs feature extraction on the original fossil image to obtain a depth feature map and on the gradient image to obtain a primary visual feature map; the depth feature map and the primary visual feature map are fused and then processed sequentially by a global average pooling layer and a fully connected layer to obtain image category probability values, and a primary fossil image classification model is constructed according to the image category probability values;
s3: training a primary fossil image classification model.
Optionally, S2 further comprises:
S21: fusing the depth feature map and the primary visual feature map along the channel dimension to obtain a fused feature map;
S22: inputting the fused feature map into a global average pooling layer to output a feature vector;
S23: subjecting the feature vector to the convolution and classification of the fully connected layer to obtain image category probability values, and constructing a primary fossil image classification model according to the image category probability values.
Optionally, the original fossil image is processed with a Canny operator to obtain the gradient image;
the classification of the fully connected layer is performed by a Softmax classifier.
Optionally, S3 further comprises:
S31: constructing a target loss function;
S32: initializing the network parameters of the fossil image feature extraction model with a ResNet50 deep residual network, and initializing the network parameters of the fully connected layer according to a normal distribution;
S33: for the original fossil images in the data set to be processed, randomly selecting 80% of the original fossil images as training set images and 10% of the original fossil images as validation set images;
S34: inputting the original training set images and the preprocessed training set images into the fossil image classification model, and minimizing the target loss function value with a batch gradient descent training method to obtain the weights of the trained fossil image classification network;
S35: feeding back the target loss function value on the preprocessed validation set images; if this value is smaller than the best value obtained so far by the trained fossil image classification network, updating the weights of the fossil image classification network, and otherwise keeping the saved weights of the fossil image classification network.
Optionally, S31 further comprises:
constructing the target loss function L, expressed as:
L = -θα(1-b)^γ log(b) - (1-θ) log(b)
wherein α ∈ [0,1] is a weight factor used to adjust and balance the importance of samples of different classes; γ > 0 is a modulation factor used to adjust the weight of easily classified samples; θ is a hyperparameter used to adjust the weights of the two loss terms; and b ∈ [0,1] represents the model's estimated probability for the true label of the sample.
Optionally, the method for constructing the fossil image feature extraction model includes:
S1: establishing a stackable composite convolution residual block, and setting the number of base channels and the dilation rate of the dilated convolution of the composite convolution residual block;
S2: stacking the composite convolution residual blocks to form neural architectures of different depths, wherein the depth of the neural architecture is determined by the image data set of the task at hand;
the composite convolution residual block established in S1 comprises:
a first layer: a 1 × 1 convolution kernel that reduces the computational parameters of the model;
a second layer: a 3 × 3 convolution operation combining conventional convolution and dilated convolution, wherein the conventional convolution captures continuous structural dependencies and the dilated convolution captures structural dependencies over longer distances;
a third layer: a 1 × 1 convolution kernel that restores the number of feature maps.
Optionally, the number of base channels increases linearly with the depth of the neural architecture.
Optionally, the fossil image feature extraction model has a structure shown in table 1:
TABLE 1 fossil image feature extraction model network architecture
(The table is reproduced as an image in the original publication.)
A method of fossil image classification, comprising: preprocessing a fossil image to be detected to obtain a gradient image;
inputting the fossil image to be detected and the corresponding gradient image into the fossil image classification model constructed by the above method for constructing a fossil image classification model based on a composite convolutional neural network, and predicting its class to obtain the category of the fossil image.
A computer-readable storage medium storing computer instructions for causing a computer to execute the method for constructing a composite convolutional neural network-based fossil image classification model according to the present invention.
Compared with the prior art, the invention has the following technical characteristics:
(1) In constructing the feature extraction network, the invention proposes a stackable composite convolution residual block for stacking into neural architectures of different depths, the specific depth being determined by the image data set of each task. The module combines conventional convolution with dilated convolution: the former captures continuous structural dependencies, while the latter captures structural dependencies over longer distances, so the receptive field is increased without increasing the number of parameters.
(2) In constructing the fossil classification model, the depth features of the original fossil image and the primary visual features of the fossil gradient image, such as information on image structural components and edge textures, are extracted separately, and the accuracy of the fossil image classification task is further improved by fusing these features.
(3) Because the classification loss functions used by existing fossil image classification models cannot adequately handle the unbalanced number of samples per class or alleviate overfitting, the invention constructs a new target loss function that makes the model pay more attention to samples that are hard to classify while reducing the loss contribution of samples that are easy to classify, effectively improving the accuracy of the fossil image classification model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow chart of a method for constructing a fossil image classification model based on a composite convolutional neural network according to the present invention;
FIG. 2 is a flowchart of a method for constructing a fossil image feature extraction model according to the present invention;
FIG. 3 is a diagram illustrating the structure of a composite convolution residual block according to the present invention;
fig. 4 is a schematic diagram of the fossil image feature extraction model extracting depth features from an original fossil image according to the present invention.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
With reference to fig. 1, 2 and 3, the method for constructing a fossil image classification model based on a composite convolutional neural network of the present invention includes the following steps:
S1: process an original fossil image to obtain a gradient image, and construct a fossil image feature extraction model. The original fossil images used by the invention are in JPEG format; the gradient image referred to in the invention is an image calculated from the gray-scale image of the original fossil image with the Canny operator.
S2: the fossil image feature extraction model performs feature extraction on the original fossil image to obtain depth features and on the gradient image to obtain primary visual features; the depth features and the primary visual features are fused and then processed sequentially by a global average pooling layer and a fully connected layer to obtain image category probability values, and a primary fossil image classification model is constructed according to these probability values. The primary visual features referred to in the invention are features such as color, edge texture and structural relationships extracted from the gradient image corresponding to the original fossil image; the depth features are abstract semantic features extracted from the original fossil image. The depth features extracted at each stage of the fossil image feature extraction model are visualized in fig. 4.
S3: train the primary fossil image classification model. The training process includes the training of the fossil image feature extraction model, which is embedded in the fossil image classification model so that together they form the final fossil image classification model.
In an embodiment of the present disclosure, S2 further includes:
S21: fusing the depth feature map and the primary visual feature map along the channel dimension to obtain a fused feature map;
S22: inputting the fused feature map into a global average pooling layer to output a feature vector;
S23: subjecting the feature vector to the convolution and classification of the fully connected layer to obtain image category probability values, and constructing a primary fossil image classification model according to the image category probability values.
Specifically, the original fossil image is processed with the Canny operator to obtain the gradient image, and the classification of the fully connected layer is performed by a Softmax classifier.
In an embodiment of the present disclosure, S3 further includes:
S31: constructing a target loss function;
S32: initializing the network parameters of the fossil image feature extraction model with a ResNet50 deep residual network, and initializing the network parameters of the fully connected layer according to a normal distribution;
S33: for the original fossil images in the data set to be processed, randomly selecting 80% of the original fossil images as training set images and 10% of the original fossil images as validation set images;
S34: inputting the original training set images and the preprocessed training set images into the fossil image classification model, and minimizing the target loss function value with a batch gradient descent training method to obtain the weights of the trained fossil image classification network;
S35: feeding back the target loss function value on the preprocessed validation set images; if this value is smaller than the best value obtained so far by the trained fossil image classification network, updating the weights of the fossil image classification network, and otherwise keeping the saved weights of the fossil image classification network.
Specifically, the target loss function L is expressed as:
L = -θα(1-b)^γ log(b) - (1-θ) log(b)
wherein α ∈ [0,1] is a weight factor used to adjust and balance the importance of samples of different classes; γ > 0 is a modulation factor used to adjust the weight of easily classified samples; θ is a hyperparameter used to adjust the weights of the two loss terms; and b ∈ [0,1] represents the model's estimated probability for the true label of the sample.
With reference to fig. 2, in an embodiment of the present disclosure, a method for constructing a fossil image feature extraction model includes:
S11: establish a stackable composite convolution residual block, and set the number of base channels and the dilation rate of the dilated convolution of the composite convolution residual block; for example, the dilation rate set in the present invention is 2.
S12: stack the composite convolution residual blocks to form neural architectures of different depths, the depth of the neural architecture being determined by the image data set of the task at hand; because the resolution of the input images differs between image data sets, the depth of the network needs to be reset for each data set. For example, the neural architecture of the fossil image feature extraction model formed in the present invention is: input image → convolutional layer → max pooling layer → 8 stacked composite convolution residual blocks → output feature map.
The composite convolution residual block established in S11 comprises:
a first layer: a 1 × 1 convolution kernel that reduces the computational parameters of the model;
a second layer: a 3 × 3 convolution operation combining conventional convolution and dilated convolution, wherein the conventional convolution captures continuous structural dependencies and the dilated convolution captures structural dependencies over longer distances;
a third layer: a 1 × 1 convolution kernel that restores the number of feature maps.
In embodiments of the present disclosure, the number of base channels increases linearly with the depth of the neural architecture.
In an embodiment of the present disclosure, the fossil image feature extraction model has a structure shown in table 1.
Example 1:
the embodiment discloses a method for constructing a fossil image classification model based on a composite convolutional neural network, which comprises the following steps:
Step 1: preprocess the original fossil image. Use the Canny operator to calculate the gradient of the original fossil image: edge-texture regions of the image yield large gradient values and smoother regions yield small gradient values, giving the final gradient image.
the method specifically comprises the following steps:
Step 1.1: convert the RGB color fossil image into a gray-scale image according to the formula:
f(x,y) = 0.299R + 0.587G + 0.114B
where f(x,y) is the gray-scale image generated from the original image, x and y are the coordinates of the image pixel, and R, G, B are the values of the red, green and blue channels.
Step 1.2, in order to reduce the extraction result of the gradient image by noise as much as possible, a gaussian filter is used to denoise a gray level image f (x, y), the step is called as a smooth image, a selected gaussian function is set as G (x, y), and the smoothed image is set as H (x, y), then:
Figure BDA0002953975290000071
H(x,y)=f(x,y)*G(x,y)
where σ represents the standard deviation of a two-dimensional gaussian function, affecting the quality of the gaussian filtering. "x" is an operator that represents a convolution.
Step 1.3, calculating the amplitude and direction of the gradient by using the finite difference of the first-order partial derivatives, wherein the gradient of the image is defined as the change degree of the gray value of the pixel in the computer vision field, and the calculation of the change degree can be described as calculating the partial derivatives of the corresponding pixel along the x-axis direction and the y-axis direction in the micro-integration, then:
Figure BDA0002953975290000072
Figure BDA0002953975290000073
since the image can be regarded as a discrete matrix, the above differential function is rewritten into a discrete differential operator, and then a one-dimensional gaussian smoothing is combined on the basis to obtain a cable operator, which is also called a first-order differential operator, and the formula is as follows:
Figure BDA0002953975290000074
on the basis of the derivation, the calculation process for solving the image gradient is mathematically abstracted to pass the image to be processed through Sx,SyThe Sobel operators in the two directions carry out filtering calculation to obtain gradient graphs G in the two corresponding directionsx,GyThereby it is convenientThe gradient G and direction θ of the pixel point can be determined:
Figure BDA0002953975290000075
Figure BDA0002953975290000076
wherein G is gradient strength, theta represents gradient direction, and arctan is arctangent direction;
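As a concrete illustration of step 1.3, the sketch below computes the gradient strength G and direction θ with OpenCV's Sobel filters; the use of cv2 and NumPy, the function name gradient_magnitude_direction and the kernel size are implementation assumptions of this sketch, not details fixed by the invention.

    import cv2
    import numpy as np

    def gradient_magnitude_direction(smoothed):
        """Gradient strength G and direction theta of a smoothed gray image."""
        gx = cv2.Sobel(smoothed, cv2.CV_64F, 1, 0, ksize=3)  # filter with S_x
        gy = cv2.Sobel(smoothed, cv2.CV_64F, 0, 1, ksize=3)  # filter with S_y
        g = np.sqrt(gx ** 2 + gy ** 2)   # G = sqrt(G_x^2 + G_y^2)
        theta = np.arctan2(gy, gx)       # theta = arctan(G_y / G_x)
        return g, theta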
Step 1.4: the above steps yield gradient edges composed of many pixels, but this edge information is inaccurate and the edges are thick, so non-maximum suppression is required to obtain accurate, single-pixel-wide edge information.
The method specifically comprises the following steps:
Step 1.4.1: compare the gradient strength of the current pixel with that of the two pixels along the positive and negative gradient directions;
Step 1.4.2: if the gradient strength of the current pixel is the largest of the three, keep the pixel as an edge point; otherwise suppress it;
Step 1.5: perform edge detection and linking on the basis of step 1.4. Ideally, edge detection would process only the set of pixels lying on an edge, but in practice noise is always present and breaks edges apart, so the pixel set on an edge cannot completely and effectively describe the edge features. Threshold judgement is therefore introduced: pixels on an edge are judged against a suitably chosen threshold range.
The method specifically comprises the following steps:
Step 1.5.1: if the gradient strength of a pixel on the edge is greater than the maximum of the threshold, record the pixel as an edge point;
Step 1.5.2: if the gradient strength of a pixel on the edge is smaller than the minimum of the threshold, record the pixel as a non-edge point;
Step 1.5.3: if the pixel lies between the maximum and the minimum of the threshold, check whether it is 8-connected to a previously marked edge point; if so, mark it as an edge point, otherwise mark it as a non-edge point;
Step 1.5.4: traverse all pixels on the edges, connecting the unclosed edge points into contours, and obtain the gradient image.
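Steps 1.1 to 1.5 together form the Canny preprocessing pipeline. A compact sketch using OpenCV built-ins follows: cv2.Canny internally performs the Sobel filtering, non-maximum suppression and double-threshold edge linking of steps 1.3 to 1.5. The kernel size, σ and the two thresholds are illustrative values, not ones fixed by the invention.

    import cv2

    def fossil_gradient_image(bgr_image, low_thresh=50, high_thresh=150):
        """Turn an original fossil image into the gradient image used as input."""
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)     # step 1.1: f(x, y)
        smoothed = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.4)  # step 1.2: H(x, y)
        return cv2.Canny(smoothed, low_thresh, high_thresh)    # steps 1.3 to 1.5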
Step 2: perform feature extraction with the fossil image feature extraction model based on the composite convolutional neural network, and construct the fossil image classification model on this basis.
The method specifically comprises the following steps:
step 2.1, constructing a fossil image feature extraction model based on a composite convolutional neural network;
the method specifically comprises the following steps:
Step 2.1.1: establish a stackable composite convolution residual block used to build neural architectures of different depths by stacking; the specific depth is determined by the image data set of the task at hand. Because the resolution of the input images differs between image data sets, the depth of the network needs to be reset for each data set.
Further, as shown in fig. 3, the first layer of the main path of the composite convolution residual block is a 1 × 1 convolution kernel used to reduce the computational parameters of the model. The 3 × 3 convolution operation of the middle layer combines conventional convolution with dilated convolution: the former captures continuous structural dependencies, while the latter captures structural dependencies over longer distances, increasing the receptive field without increasing the number of parameters. The third layer uses a 1 × 1 convolution kernel to restore the number of feature maps so that the input can be added to the output, ensuring model accuracy while reducing computational parameters.
The calculation process can be defined as follows. In the first stage,
F_{l,i} = ReLU(W_l * p)
where p represents the input feature map, W_l is the weight of the convolution kernel, ReLU is the activation function, and F_{l,i} is the output feature map of model layer l, i.e. the output of the first stage of the composite convolution residual block, containing i feature maps.
In the second stage, half of the feature maps of F_{l,i} are input into the conventional convolution, whose output F_{l+1_conv,j} contains j feature maps; the other half are input into the dilated convolution, whose output F_{l+1_dconv,k} contains k feature maps:
F_{l+1_conv,j} = ReLU(W_{l+1} * F_{l,i/2})
F_{l+1_dconv,k} = ReLU(W'_{l+1} * F'_{l,i/2})
In the third stage, the output feature maps of the conventional convolution and the dilated convolution are concatenated along the channel dimension, and the result F_{l+2,j+k} contains j+k feature maps in total; the output feature map q is obtained after the final 1 × 1 convolution operation and the jump connection:
F_{l+2,j+k} = Concat(F_{l+1_conv,j}, F_{l+1_dconv,k})
q = ReLU(W_{l+3} * F_{l+2,j+k}) + W_s p
where W_s is the weight of the shortcut projection applied to the input p.
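The three stages above can be sketched as a PyTorch module as follows. The convolutions, the channel split, the dilation rate of 2 and the shortcut projection W_s come from the text; the exact placement of the activations, the optional stride and the assumption that the number of base channels is even are implementation choices of this sketch, and the invention itself does not prescribe PyTorch or these identifiers.

    import torch
    import torch.nn as nn

    class CompositeResidualBlock(nn.Module):
        """Stackable composite convolution residual block (three stages)."""

        def __init__(self, in_channels, base_channels, out_channels=None,
                     stride=1, dilation=2):
            super().__init__()
            out_channels = out_channels or in_channels
            half = base_channels // 2          # base_channels assumed even
            # Stage 1 (W_l): 1 x 1 convolution reduces the computational parameters.
            self.reduce = nn.Conv2d(in_channels, base_channels, 1, stride=stride)
            # Stage 2: half the maps through a conventional 3 x 3 convolution,
            # the other half through a 3 x 3 dilated convolution (dilation rate 2).
            self.conv = nn.Conv2d(half, half, 3, padding=1)
            self.dconv = nn.Conv2d(half, half, 3, padding=dilation, dilation=dilation)
            # Stage 3 (W_{l+3}): 1 x 1 convolution restores the number of feature maps;
            # a 1 x 1 shortcut (W_s) projects the input so it can be added to the output.
            self.restore = nn.Conv2d(base_channels, out_channels, 1)
            self.shortcut = nn.Conv2d(in_channels, out_channels, 1, stride=stride)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, p):
            f = self.relu(self.reduce(p))                 # F_{l,i}
            f1, f2 = torch.chunk(f, 2, dim=1)             # split the i maps in half
            f_conv = self.relu(self.conv(f1))             # F_{l+1_conv,j}
            f_dconv = self.relu(self.dconv(f2))           # F_{l+1_dconv,k}
            fused = torch.cat([f_conv, f_dconv], dim=1)   # F_{l+2,j+k}
            return self.relu(self.restore(fused)) + self.shortcut(p)  # q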
Step 2.1.2, setting the number of basic channels and the hole convolution expansion rate of the composite convolution residual block; then stacking the composite convolution residual blocks to determine a final feature extraction model, as shown in table 1;
the number of base channels per complex convolutional residual block in step 2.1.2 increases linearly as the network gets deeper;
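One possible assembly of the architecture stated in step 2.1 (input image → convolutional layer → max pooling layer → 8 stacked composite convolution residual blocks) is sketched below. The stem sizes, the output-channel schedule and the downsampling positions are illustrative assumptions, since the actual values are fixed in Table 1, which survives only as an image; this particular schedule maps a 224 × 224 input to 1024 feature maps of 7 × 7 pixels, consistent with step 2.2.1.

    import torch.nn as nn

    def build_feature_extractor(num_blocks=8):
        """Stem followed by 8 stacked composite convolution residual blocks."""
        layers = [
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        ]
        in_ch = 64
        for i in range(num_blocks):
            base = 64 * (i + 1)                  # base channels grow linearly with depth
            out_ch = 128 * (i + 1)               # illustrative output-channel schedule
            stride = 2 if i in (1, 3, 5) else 1  # occasional downsampling (assumption)
            layers.append(CompositeResidualBlock(in_ch, base, out_ch, stride))
            in_ch = out_ch
        return nn.Sequential(*layers)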
step 2.2, constructing a fossil image classification model;
the method specifically comprises the following steps:
Step 2.2.1: feed the original fossil image and the gradient image obtained by the preprocessing of step 1 separately into the fossil image feature extraction model based on the composite convolutional neural network constructed in step 2.1, extracting 1024 feature maps of 7 × 7 pixels containing depth features and 1024 feature maps of 7 × 7 pixels containing primary visual features;
Step 2.2.2: fuse the depth feature maps and the primary visual feature maps along the channel dimension to obtain 2048 feature maps of size 7 × 7 pixels;
Step 2.2.3: send the feature maps fused in step 2.2.2 into a global average pooling layer, which performs a global average pooling operation on each input feature map, i.e. computes the average of all pixels of each feature map and outputs one value; the 2048 feature maps thus yield 2048 values, which form a 2048-dimensional vector called the feature vector;
Step 2.2.4: finally, map the distributed feature representation obtained by global average pooling to the sample label space through the fully connected layer, i.e. convolve the output of the global average pooling with C convolution kernels of size 1 × 1 × 2048 (length 1, width 1, 2048 channels); the Softmax classifier then yields the probability of each category and hence the category to which the fossil image belongs, where C is the number of fossil image categories.
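Steps 2.2.2 to 2.2.4 can be sketched as the following head module, assuming the two extractor streams each emit 1024-channel maps as in step 2.2.1; the class name FusionClassifier is an invented identifier of this sketch.

    import torch
    import torch.nn as nn

    class FusionClassifier(nn.Module):
        def __init__(self, num_classes, channels=2048):
            super().__init__()
            self.gap = nn.AdaptiveAvgPool2d(1)               # global average pooling
            self.fc = nn.Conv2d(channels, num_classes, 1)    # C kernels of 1 x 1 x 2048

        def forward(self, depth_feats, visual_feats):
            fused = torch.cat([depth_feats, visual_feats], dim=1)  # step 2.2.2
            vec = self.gap(fused)                 # step 2.2.3: 2048-d feature vector
            logits = self.fc(vec).flatten(1)      # step 2.2.4: FC via 1 x 1 convolution
            return torch.softmax(logits, dim=1)   # image category probability values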
Step 3, training the constructed fossil image classification model;
step 3.1, constructing a target loss function;
the method specifically comprises the following steps:
Step 3.1.1: in existing research work, the cross-entropy (CE) loss function is widely used in the training of fossil image classification models:
CE = -Σ_{c=1}^{C} a_c log(b_c)
where a is the true label of the sample (one-hot encoded), b_c is the model's predicted probability for class c, and C is the number of fossil image classes. For ease of notation the formula is rewritten as
CE(b) = -log(b)
where b ∈ [0,1] represents the model's estimated probability for the true label of the sample. The CE loss reflects the difference between the predicted probability distribution and the true probability distribution; during training we want the two to be as close as possible, so the training goal of the model is to minimize the loss function. However, the CE loss can neither handle an unbalanced number of samples per class nor alleviate overfitting, so it is improved as follows.
Step 3.1.2, constructing an objective loss function L, proposing to use a loss function improved based on focal loss as the objective function, and improving the CE loss function, which can be expressed as:
L=-6α(1-b)γlog(b)-(1-θ)log(b)
Figure BDA0002953975290000104
wherein, alpha is ∈ [0,1]]Is a weighting factor for adjusting and balancing the importance, Count, of the samples of different classesmIndicates the number of samples of the m-th class, γ>0 is a modulation factor for adjusting the weight of the easily classified samples, and θ is a hyperparameter for adjusting the weights of the two loss functions.
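A minimal PyTorch sketch of this loss follows: b is gathered as the predicted probability of each sample's true class, and the two terms of L are summed directly. The inverse-frequency construction of α shown in the usage comment is an assumption, since the exact formula for α survives only as an image.

    import torch

    def objective_loss(probs, targets, alpha, gamma=2.0, theta=0.5):
        """probs: (N, C) softmax outputs; targets: (N,) class indices; alpha: (C,)."""
        b = probs.gather(1, targets.unsqueeze(1)).squeeze(1).clamp_min(1e-12)
        focal_term = -theta * alpha[targets] * (1.0 - b) ** gamma * torch.log(b)
        ce_term = -(1.0 - theta) * torch.log(b)
        return (focal_term + ce_term).mean()

    # Illustrative alpha from the class counts of Example 2 (an assumption):
    # counts = torch.tensor([1392.0, 852.0, 85.0, 25.0])
    # alpha = 1.0 - counts / counts.sum()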
Step 3.2, initializing parameters in the fossil image feature extraction model as initial parameter values by using a ResNet50 deep residual error network trained in advance on an Imagenet large-scale data set, and initializing network parameters of a full connection layer according to normal distribution;
Step 3.3: randomly select 80% of the images in the data set as the training set, 10% as the validation set and 10% as the test set; preprocess the images of the training set and validation set according to step 1 to obtain the preprocessed training set and validation set images;
Step 3.4: send the preprocessed training set images, comprising the original fossil images and their corresponding gradient images, into the constructed fossil image classification model, and minimize the target loss function value with a batch gradient descent training method, adjusting the parameters of all layers in the network to obtain the weights of the trained composite convolutional neural network.
In this embodiment the batch size of training is set to 64, the parameter update momentum to 0.9, the learning rate to 0.001 and the number of iterations (epochs) to 500; the network weights are trained with the back-propagation algorithm to obtain the weight training result, and mini-batch stochastic gradient descent drives the loss value to a minimum.
Step 3.5: after each epoch of training, feed back the loss value of the weight training result on the preprocessed validation set images: input the validation set images into the currently trained fossil image classification model and calculate their loss value; if the current loss value is smaller than that of the stored weight training result, update the model weights, otherwise keep the previously stored weight training result. After 500 iterations of training, the training of the model terminates and the optimal model result is saved.
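The schedule of steps 3.4 and 3.5 can be sketched as the training loop below, under the hyperparameters stated above (batch size 64, momentum 0.9, learning rate 0.001, 500 epochs). The model is assumed to be a wrapper taking the original image and the gradient image together; the data loaders and the α tensor are assumed to be prepared elsewhere, and objective_loss is the sketch from step 3.1.2.

    import copy
    import torch

    def train(model, train_loader, val_loader, alpha, epochs=500):
        optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
        best_loss, best_state = float("inf"), None
        for _ in range(epochs):
            model.train()
            for image, gradient, target in train_loader:
                probs = model(image, gradient)
                loss = objective_loss(probs, target, alpha)
                optimizer.zero_grad()
                loss.backward()                  # back-propagation
                optimizer.step()                 # mini-batch gradient descent step
            model.eval()
            with torch.no_grad():                # feed back the validation loss
                val_loss = sum(objective_loss(model(i, g), t, alpha).item()
                               for i, g, t in val_loader) / len(val_loader)
            if val_loss < best_loss:             # keep the best weights so far
                best_loss = val_loss
                best_state = copy.deepcopy(model.state_dict())
        model.load_state_dict(best_state)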
Step 4: classify fossil images with the fossil image classification model based on the composite convolutional neural network.
Step 4.1: preprocess the fossil image to be detected according to step 1 to obtain its gradient image;
Step 4.2: input the fossil image to be detected and its corresponding gradient image into the fossil image classification model trained in step 3 to predict its class and obtain the final category to which the fossil image belongs.
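Steps 4.1 and 4.2 amount to the inference sketch below. fossil_gradient_image is the helper sketched in step 1; replicating the single-channel gradient image to three channels so both streams can share the extractor input format, and the omission of normalization, are assumptions of this sketch.

    import cv2
    import torch

    def classify_fossil(model, bgr_image, class_names):
        gradient = fossil_gradient_image(bgr_image)            # step 4.1
        to_tensor = lambda a: torch.from_numpy(a).float().unsqueeze(0)
        rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
        image_t = to_tensor(rgb).permute(0, 3, 1, 2)           # (1, 3, H, W)
        grad_t = to_tensor(gradient).unsqueeze(1).repeat(1, 3, 1, 1)
        with torch.no_grad():
            probs = model(image_t, grad_t)                     # step 4.2
        return class_names[int(probs.argmax(dim=1))]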
Example 2:
The data set in this embodiment was provided by the geological institute of Northwestern University and comprises 2354 fossil images: 1392 of Yunnan cephalosporins, 852 of frigoria, 85 of trefoil osmidges and 25 of wuding worms. Classification accuracy, with values in [0,1], is taken as the evaluation index of model performance; the higher the value, the better the model performs.
TABLE 2 Comparison of the results between the different methods
(The table is reproduced as an image in the original publication.)
As can be seen from the results in Table 2, the performance of the present invention on this data set is higher than that of the compared fossil image classification models. To further demonstrate that the innovations proposed in the invention benefit the final result, this example compares the effects of four different configurations, as follows:
N1: contains only one sub-network, whose input is the original fossil image; the whole network is trained end-to-end with the cross-entropy loss function.
N2: contains only one sub-network, whose input is the original fossil image; the whole network is trained end-to-end with the loss function L proposed by the invention.
N3: contains only one sub-network, whose input is the gradient image corresponding to the original fossil image; the whole network is trained end-to-end with the loss function L proposed by the invention.
N4: contains two sub-networks, whose inputs are the original fossil image and its corresponding gradient image respectively; the features extracted by the two sub-networks are concatenated and fused, and the whole network is trained end-to-end with the loss function L proposed by the invention.
TABLE 3 Contrast effect of the ablation experiment
(The table is reproduced as an image in the original publication.)
As can be seen from the results in Table 3, each innovation provided by the present invention benefits the final result, further improving the performance of the fossil image classification model.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (10)

1. A method for constructing a fossil image classification model based on a composite convolutional neural network is characterized by comprising the following steps:
s1: processing an original fossil image to obtain a gradient image, and constructing a fossil image feature extraction model;
S2: the fossil image feature extraction model performs feature extraction on the original fossil image to obtain a depth feature map and on the gradient image to obtain a primary visual feature map; the depth feature map and the primary visual feature map are fused and then processed sequentially by a global average pooling layer and a fully connected layer to obtain image category probability values, and a primary fossil image classification model is constructed according to the image category probability values;
s3: training a primary fossil image classification model.
2. The method for constructing a composite convolutional neural network-based fossil image classification model as claimed in claim 1, wherein the S2 further comprises:
S21: fusing the depth feature map and the primary visual feature map along the channel dimension to obtain a fused feature map;
S22: inputting the fused feature map into a global average pooling layer to output a feature vector;
S23: subjecting the feature vector to the convolution and classification of the fully connected layer to obtain image category probability values, and constructing a primary fossil image classification model according to the image category probability values.
3. The method for constructing the fossil image classification model based on the composite convolutional neural network as claimed in claim 1 or 2, wherein a Canny operator is utilized to process an original fossil image to obtain a gradient image;
the classification of the fully connected layer is performed by a Softmax classifier.
4. The method for constructing a fossil image classification model based on a composite convolutional neural network as claimed in claim 1 or 2, wherein the S3 further comprises:
s31: constructing a target loss function;
S32: initializing the network parameters of the fossil image feature extraction model with a ResNet50 deep residual network, and initializing the network parameters of the fully connected layer according to a normal distribution;
S33: for the original fossil images in the data set to be processed, randomly selecting 80% of the original fossil images as training set images and 10% of the original fossil images as validation set images;
S34: inputting the original training set images and the preprocessed training set images into the fossil image classification model, and minimizing the target loss function value with a batch gradient descent training method to obtain the weights of the trained fossil image classification network;
S35: feeding back the target loss function value on the preprocessed validation set images; if this value is smaller than the best value obtained so far by the trained fossil image classification network, updating the weights of the fossil image classification network, and otherwise keeping the saved weights of the fossil image classification network.
5. The method for constructing a composite convolutional neural network-based fossil image classification model as claimed in claim 4, wherein the S31 further comprises:
constructing an objective loss function L, expressed as:
L = -θα(1-b)^γ log(b) - (1-θ) log(b)
wherein α ∈ [0,1] is a weight factor; γ > 0 is a modulation factor; θ is a hyperparameter; and b ∈ [0,1] represents the estimated probability.
6. The method for constructing the fossil image classification model based on the composite convolutional neural network as claimed in claim 1 or 2, wherein the method for constructing the fossil image feature extraction model comprises the following steps:
S1: establishing a stackable composite convolution residual block, and setting the number of base channels and the dilation rate of the dilated convolution of the composite convolution residual block;
S2: stacking the composite convolution residual blocks to form neural architectures of different depths, wherein the depth of the neural architecture is determined by the image data set of the task at hand;
the composite convolution residual block established in S1 comprises:
a first layer: a 1 × 1 convolution kernel that reduces the computational parameters of the model;
a second layer: a 3 × 3 convolution operation combining conventional convolution and dilated convolution, wherein the conventional convolution captures continuous structural dependencies and the dilated convolution captures structural dependencies over longer distances;
a third layer: a 1 × 1 convolution kernel that restores the number of feature maps.
7. The method for constructing a fossil image classification model based on a composite convolutional neural network as claimed in claim 6, wherein the number of base channels increases linearly with the depth of the neural architecture.
8. The method for constructing a fossil image classification model based on a composite convolutional neural network as claimed in claim 1 or 2, wherein the fossil image feature extraction model has a structure shown in table 1:
TABLE 1 fossil image feature extraction model network architecture
(Table 1 is reproduced as an image in the original publication.)
9. A method of classifying fossil images, comprising: preprocessing a fossil image to be detected to obtain a gradient image;
inputting the fossil image to be detected and the corresponding gradient image into the fossil image classification model constructed by the method for constructing a fossil image classification model based on a composite convolutional neural network according to any one of claims 1 to 8, and predicting its class to obtain the category of the fossil image.
10. A computer-readable storage medium storing computer instructions for causing a computer to perform the method of constructing a composite convolutional neural network-based fossil image classification model according to any one of claims 1 to 8.
CN202110219351.1A 2021-02-26 2021-02-26 Construction method of fossil image classification model based on composite convolutional neural network Active CN112819096B (en)

Priority Applications (1)

Application Number: CN202110219351.1A (granted as CN112819096B) · Priority Date: 2021-02-26 · Filing Date: 2021-02-26 · Title: Construction method of fossil image classification model based on composite convolutional neural network

Applications Claiming Priority (1)

Application Number: CN202110219351.1A (granted as CN112819096B) · Priority Date: 2021-02-26 · Filing Date: 2021-02-26 · Title: Construction method of fossil image classification model based on composite convolutional neural network

Publications (2)

Publication Number Publication Date
CN112819096A true CN112819096A (en) 2021-05-18
CN112819096B CN112819096B (en) 2024-01-19

Family

ID=75864159

Family Applications (1)

Application Number: CN202110219351.1A (Active, granted as CN112819096B) · Priority Date: 2021-02-26 · Filing Date: 2021-02-26 · Title: Construction method of fossil image classification model based on composite convolutional neural network

Country Status (1)

Country Link
CN (1) CN112819096B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537026A (en) * 2021-07-09 2021-10-22 上海智臻智能网络科技股份有限公司 Primitive detection method, device, equipment and medium in building plan
CN113610061A (en) * 2021-09-30 2021-11-05 国网浙江省电力有限公司电力科学研究院 Method and system for identifying unstressed conducting wire based on target detection and residual error network
CN115818166A (en) * 2022-11-15 2023-03-21 华能伊敏煤电有限责任公司 Unattended automatic control method and system for wheel hopper continuous system
CN116258658A (en) * 2023-05-11 2023-06-13 齐鲁工业大学(山东省科学院) Swin transducer-based image fusion method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160389A (en) * 2019-12-02 2020-05-15 东北石油大学 Lithology identification method based on fusion of VGG
CN111612066A (en) * 2020-05-21 2020-09-01 成都理工大学 Remote sensing image classification method based on depth fusion convolutional neural network
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system
CN111160389A (en) * 2019-12-02 2020-05-15 东北石油大学 Lithology identification method based on fusion of VGG
CN111612066A (en) * 2020-05-21 2020-09-01 成都理工大学 Remote sensing image classification method based on depth fusion convolutional neural network
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHENG GUOJIAN; GUO WENHUI; FAN PENGZHAO: "Rock image classification based on convolutional neural networks", Journal of Xi'an Shiyou University (Natural Science Edition), no. 04 *
LU GUOJUN; CHEN LIFANG: "Remote sensing image scene classification based on deep convolutional neural networks", Journal of Taiyuan Normal University (Natural Science Edition), no. 01 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537026A (en) * 2021-07-09 2021-10-22 上海智臻智能网络科技股份有限公司 Primitive detection method, device, equipment and medium in building plan
CN113610061A (en) * 2021-09-30 2021-11-05 国网浙江省电力有限公司电力科学研究院 Method and system for identifying unstressed conducting wire based on target detection and residual error network
CN115818166A (en) * 2022-11-15 2023-03-21 华能伊敏煤电有限责任公司 Unattended automatic control method and system for wheel hopper continuous system
CN115818166B (en) * 2022-11-15 2023-09-26 华能伊敏煤电有限责任公司 Unmanned automatic control method and system for continuous system of wheel bucket
CN116258658A (en) * 2023-05-11 2023-06-13 齐鲁工业大学(山东省科学院) Swin transducer-based image fusion method
CN116258658B (en) * 2023-05-11 2023-07-28 齐鲁工业大学(山东省科学院) Swin transducer-based image fusion method

Also Published As

Publication number Publication date
CN112819096B (en) 2024-01-19

Similar Documents

Publication Publication Date Title
CN112819096B (en) Construction method of fossil image classification model based on composite convolutional neural network
CN108875935B (en) Natural image target material visual characteristic mapping method based on generation countermeasure network
CN107862668A (en) A kind of cultural relic images restored method based on GNN
CN109146944B (en) Visual depth estimation method based on depth separable convolutional neural network
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN108764298B (en) Electric power image environment influence identification method based on single classifier
CN107516103B (en) Image classification method and system
CN114092697B (en) Building facade semantic segmentation method with attention fused with global and local depth features
CN109118504B (en) Image edge detection method, device and equipment based on neural network
CN111046917B (en) Object-based enhanced target detection method based on deep neural network
CN112837344A (en) Target tracking method for generating twin network based on conditional confrontation
CN110059728A (en) RGB-D image vision conspicuousness detection method based on attention model
CN110009700B (en) Convolutional neural network visual depth estimation method based on RGB (red, green and blue) graph and gradient graph
CN111339862B (en) Remote sensing scene classification method and device based on channel attention mechanism
CN109460815A (en) A kind of monocular depth estimation method
CN109872326B (en) Contour detection method based on deep reinforced network jump connection
CN112991371B (en) Automatic image coloring method and system based on coloring overflow constraint
CN109448039B (en) Monocular vision depth estimation method based on deep convolutional neural network
CN115937552A (en) Image matching method based on fusion of manual features and depth features
CN111882555B (en) Deep learning-based netting detection method, device, equipment and storage medium
Borbon et al. Coral health identification using image classification and convolutional neural networks
CN109508639A (en) Road scene semantic segmentation method based on multiple dimensioned convolutional neural networks with holes
CN111259923A (en) Multi-target detection method based on improved three-dimensional R-CNN algorithm
CN115222754A (en) Mirror image segmentation method based on knowledge distillation and antagonistic learning
CN110796716B (en) Image coloring method based on multiple residual error network and regularized transfer learning

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant