CN117437557A - Hyperspectral image classification method based on double-channel feature enhancement - Google Patents

Hyperspectral image classification method based on double-channel feature enhancement

Info

Publication number
CN117437557A
Authority
CN
China
Prior art keywords
feature
channel
hyperspectral image
feature map
spatial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210812607.4A
Other languages
Chinese (zh)
Inventor
张俊三
赵利
沈秀轩
邵奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China filed Critical China University of Petroleum East China
Priority to CN202210812607.4A priority Critical patent/CN117437557A/en
Publication of CN117437557A publication Critical patent/CN117437557A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/58Extraction of image or video features relating to hyperspectral data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a hyperspectral image classification method based on dual-channel feature enhancement (DCFE), aimed at the problem of how to more fully extract and utilize the spatial and spectral information of a hyperspectral image when training samples are limited. First, two channels are designed to capture spectral and spatial features respectively, with three-dimensional convolution used as the feature extractor in each channel. Then the feature map in the spectral channel, after a dimension-reduction operation, is fused with the feature map of the spatial channel. Finally, the feature map that fuses the spectral and spatial features is input into an attention module, and feature enhancement is realized by raising the attention paid to important information and reducing the interference of useless information. Experimental results on four hyperspectral datasets show that the method achieves good classification performance.

Description

Hyperspectral image classification method based on double-channel feature enhancement
Technical Field
The invention relates to the field of remote sensing images, in particular to a hyperspectral image classification method based on double-channel feature enhancement.
Background
A hyperspectral image (Hyperspectral Image, HSI), also called a hyperspectral remote sensing image, is a three-dimensional image captured by an aerospace vehicle carrying a hyperspectral imager. It consists of two spatial dimensions and a spectral dimension, the latter comprising tens or even hundreds of spectral bands, which gives hyperspectral images broad application prospects in fields such as land-cover analysis, water monitoring, anomaly detection and change-region detection. Hyperspectral classification techniques that are fast and highly accurate will therefore bring great progress to the development of society.
The goal of hyperspectral image classification is to assign a class label to each pixel in the image based on the sample characteristics. Early research on hyperspectral image classification proposed methods such as the Support Vector Machine (SVM), Sparse Representation Classification (SRC) and Multinomial Logistic Regression (MLR). However, these methods use only the information of the spectral dimension, while a hyperspectral image has high spatial correlation in addition to its rich spectral characteristics; feature extraction is therefore incomplete, and it is difficult to learn a highly accurate classifier when samples are few.
Deep learning excels at extracting nonlinear, hierarchical features and has made great breakthroughs in fields such as image classification, natural language processing and object detection. Hyperspectral image classification is a typical classification task and has been deeply influenced by deep learning. Chen et al. proposed extracting high-order features of hyperspectral image data with a deep learning method based on a stacked autoencoder (SAE), obtaining classification results with logistic regression. Makantasis et al. proposed jointly extracting spatial and spectral features using randomized principal component analysis (R-PCA). Chen et al. proposed a classification method based on a Deep Belief Network (DBN) and a Restricted Boltzmann Machine (RBM). Zhao et al. applied a CNN as a feature extractor for hyperspectral image classification. Zhang et al. proposed a method based on a diverse region-based convolutional neural network (DRCNN) that uses different image blocks within the target pixel's neighborhood as CNN input, effectively enhancing the input data. Lee et al. proposed a Contextual Deep Convolutional Neural Network (CDCNN) with a deeper and wider network. He et al. proposed a lightweight fused CNN algorithm, 3D-2D-1D CNN, which effectively improves the speed of data analysis while maintaining high classification accuracy. Although these methods are effective, their extraction and utilization of the spatial and spectral information of hyperspectral images is insufficient, so better classification cannot be obtained when training samples are limited.
To address the insufficient extraction and utilization of spatial and spectral information in hyperspectral image classification, a hyperspectral image classification method based on dual-channel feature enhancement (DCFE) is proposed. The method comprises a spectral channel and a spatial channel; in each channel a multi-branch 3D convolutional neural network captures the spectral or spatial features, the output features of the two channels are fused, and the features in the fused feature map are enhanced by Coordinate Attention (CA). Finally, the classification result is obtained through a fully connected layer (FC).
Disclosure of Invention
The hyperspectral image classification method based on dual-channel feature enhancement extracts features of the two dimensions of a hyperspectral image through two separate channels, uses attention to assign weights so that useful features are enhanced and useless features are suppressed, and thereby extracts and utilizes the spatial and spectral information more fully to achieve a better classification effect.
To achieve the above object, the present application provides the following solutions:
A hyperspectral image classification method based on dual-channel feature enhancement, which extracts the spatial features and spectral features of the hyperspectral image through 3D convolutions in two channels, comprises the following steps:
S1: Perform preprocessing such as pixel-point segmentation and padding on the original image, and partition the data.
S2: and extracting the spectral features and the spatial features of the hyperspectral image.
S3: and (3) performing dimension reduction on the spectral feature map extracted in the step (S2), and fusing the spectral feature map with the extracted spatial feature map.
S4: and (3) inputting the feature map which is fused with the spectral features and the spatial features in the S2 into an attention module to realize feature enhancement.
S5: the hyperspectral image is classified using a method based on dual channel feature enhancement.
Preferably, the raw hyperspectral image data is preprocessed before being input into the DCFE model: the image is first cut, centered on each labeled pixel, into cubes of size 11×11×Band (Band is the spectral dimension of the hyperspectral image), with a padding operation applied to edge pixels.
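A minimal sketch of this preprocessing step is given below, assuming a NumPy HSI cube and a ground-truth map in which 0 marks background; the function name `extract_patches` and the zero-padding strategy are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def extract_patches(hsi, gt, patch_size=11):
    """Cut an 11 x 11 x Band cube around every labeled pixel.

    hsi: (H, W, Band) hyperspectral cube; gt: (H, W) label map, 0 = background.
    Edge pixels are handled by zero-padding, as described above.
    """
    margin = patch_size // 2
    # Pad only the two spatial dimensions so edge pixels get full patches.
    padded = np.pad(hsi, ((margin, margin), (margin, margin), (0, 0)), mode="constant")
    patches, labels = [], []
    for r in range(gt.shape[0]):
        for c in range(gt.shape[1]):
            if gt[r, c] == 0:              # skip unlabeled background pixels
                continue
            patches.append(padded[r:r + patch_size, c:c + patch_size, :])
            labels.append(gt[r, c] - 1)    # labels shifted to start at 0
    return np.asarray(patches, dtype=np.float32), np.asarray(labels)
```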
Preferably, the spectral feature and spatial feature extraction method includes: in the spectral channel, the 11×11×Band image samples are first convolved by a convolution layer into a feature map of size 11×11×97, and then input into a spectral block consisting of 3 multi-branch blocks, each consisting of a convolution layer, a batch normalization layer and a Mish activation function. In the spatial channel, the 11×11×Band image samples are first convolved by a convolution layer into a feature map of size 11×11×1, and then input into a spatial block likewise consisting of 3 multi-branch blocks, each consisting of a convolution layer, a batch normalization layer and a Mish activation function.
The network that extracts the spectral and spatial features within a channel is composed of several multi-branch blocks. In each multi-branch block, the input passes through a normalization layer and an activation layer and then enters a convolution layer that uses m convolution kernels of size a×a×d to perform the 3D convolution operation; the feature map output by the convolution layer is added element by element to the feature maps of the two residual branches, and the result serves as the input of the next multi-branch block.
Preferably, the method for reducing the dimension of the spectral feature map is as follows: in the spectral channel, the output spectral feature map has size 11×11×97; a dimension-reduction operation is performed by a convolution layer with a convolution kernel of size 11×11×97, outputting a feature map with the same size as the spatial channel's feature map, and finally the spatial feature map and the spectral feature map are fused along the third dimension.
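A short PyTorch sketch of this dimension-reduction and fusion step is shown below, under two assumptions: feature maps are held in (N, C, spectral, H, W) layout, and fusion means channel-wise concatenation (which matches the 48-channel fused map described later, given 24 kernels per channel). The kernel shape shown collapses only the spectral axis.

```python
import torch
import torch.nn as nn

# Feature maps in PyTorch 3D-conv layout: (N, C, Depth=spectral, H, W).
spectral_feat = torch.randn(16, 24, 97, 11, 11)   # spectral-channel output
spatial_feat  = torch.randn(16, 24, 1, 11, 11)    # spatial-channel output

# Collapse the 97 spectral slices to 1 so both maps share the same shape.
reduce_conv = nn.Conv3d(24, 24, kernel_size=(97, 1, 1))
spectral_reduced = reduce_conv(spectral_feat)     # -> (16, 24, 1, 11, 11)

# Fuse by concatenating along the channel axis: 24 + 24 = 48 channels.
fused = torch.cat([spectral_reduced, spatial_feat], dim=1)
print(fused.shape)                                # torch.Size([16, 48, 1, 11, 11])
```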
Preferably, the feature enhancement method for the feature map is as follows: unlike the attention modules used in other methods, which can establish only a one-way relationship over the feature map, the feature map that fuses the spectral and spatial features is input into the CA module; CA embeds the spatial information of the feature map into the channel attention through a spatial-information encoding scheme, so that the relationships between the channels of the feature map can be established using spatial information.
In CA, to preserve spatial information, global pooling is first decomposed into two one-dimensional feature encoding operations: each channel is encoded along the horizontal and vertical coordinates using two pooling kernels of sizes (H, 1) and (1, W) respectively, generating a pair of direction-aware feature maps. The generated feature maps are then fused and input into a convolution layer of C/r convolution kernels of size 1×1 for dimension reduction and activation, giving an output of size C/r × 1 × (H+W); this output is split along the spatial dimension, each part undergoes its own convolution operation, and both are converted into tensors with the same number of channels as the original input. Finally, the two tensors are used as attention weights and multiplied with the original input to realize feature enhancement.
Preferably, the training method of the hyperspectral image classification method based on the dual-channel feature enhancement is as follows:
all experiments of the invention were run on a system of Intel (R) Xeon (R) 4208CPU @ 2.10GHz processor, nvidia GeForce RTX 2060Ti graphics card, all classifiers were implemented using pyrerch, batch size was set to 16, optimizer using RMSprop, learning rate initial value of 0.00008, adjusting learning rate using cosine annealing, loss function using cross entropy loss function. In order to verify the feasibility of the method, the invention performs a comparison test on four public data sets with other five hyperspectral image classification methods.
The beneficial effects of this application are:
1) The invention proposes a multi-branch structure + 3D convolution method that extracts spectral and spatial features separately. This not only extracts and utilizes the spectral and spatial information of the hyperspectral image more fully, but also alleviates the vanishing-gradient problem in deep networks, accelerates network training and convergence, prevents overfitting, and improves the quality of the results.
2) The invention proposes a method of embedding spatial information into channel attention, which can capture not only cross-channel information but also spatial information. In this way, useful information in the feature map fused from the two channels is enhanced and redundant information is suppressed.
3) The dual-channel feature enhancement network of the invention achieves state-of-the-art classification accuracy on four datasets with limited training samples.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the following brief description is given to the accompanying drawings:
FIG. 1 is a diagram of a 3D-CNN structure incorporating batch normalization
FIG. 2 is a Coordinate Attention architecture diagram
FIG. 3 is a diagram of a multi-branch structure
FIG. 4 is a diagram of a multi-branch block structure
FIG. 5 is a diagram of the DCFE network architecture
FIG. 6 is a diagram of classification results for IP datasets
FIG. 7 is a diagram of classification results for UP data sets
FIG. 8 is a diagram of classification results of SV data sets
FIG. 9 is a diagram of classification results of BS data sets
FIG. 10 is a flow chart of the DCFE network
Detailed Description
The following description of the invention is illustrative in nature and is not to be construed as limiting the scope of the invention.
FIG. 1 is a diagram of the 3D-CNN structure incorporating batch normalization. During convolution, the 3D-CNN performs the convolution operation along the width, height and channel directions, so the spectral and spatial information in a hyperspectral image sample can be extracted directly. 3D-CNN is therefore used here as the basic structure of the DCFE method. In addition, a batch normalization (BN) layer is added to each 3D-CNN layer to improve numerical stability.
As shown in FIG. 1, the input consists of $n_k$ feature maps of size $p_k \times p_k \times b_k$; after $n_{k+1}$ convolution kernels of size $a_{k+1} \times a_{k+1} \times d_{k+1}$, $n_{k+1}$ feature maps of size $p_{k+1} \times p_{k+1} \times b_{k+1}$ are generated. The $i$-th output of the $(k+1)$-th 3D-CNN layer with batch normalization is expressed as:

$$\hat{X}_{k+1}^{j} = \frac{O_{k}^{j} - E(O_{k}^{j})}{\sqrt{\mathrm{Var}(O_{k}^{j})}} \qquad (1)$$

$$O_{k+1}^{i} = R\Big(\sum_{j} \hat{X}_{k+1}^{j} * W_{k+1}^{i} + b_{k+1}^{i}\Big) \qquad (2)$$

where $\hat{X}_{k+1}^{j}$ is the $j$-th input feature map of the $(k+1)$-th layer, $O_{k}^{j}$ is the final output of the $k$-th layer, $E(\cdot)$ and $\mathrm{Var}(\cdot)$ are the expectation and variance functions, $W_{k+1}^{i}$ and $b_{k+1}^{i}$ are the weights and bias of the $(k+1)$-th layer, and $R(\cdot)$ is a nonlinear activation function in the network.
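Eqs. (1) and (2) amount to batch normalization followed by a 3D convolution and a nonlinear activation. A minimal PyTorch equivalent is sketched below; the layer sizes are illustrative, and Mish is assumed as $R(\cdot)$ since that is the activation the DCFE model uses.

```python
import torch
import torch.nn as nn

class BN3DConv(nn.Module):
    """BN -> 3D convolution -> activation, matching Eqs. (1) and (2)."""
    def __init__(self, in_ch, out_ch, kernel=(3, 3, 3)):
        super().__init__()
        self.bn = nn.BatchNorm3d(in_ch)      # Eq. (1): normalize the k-th output
        self.conv = nn.Conv3d(in_ch, out_ch, kernel, padding="same")
        self.act = nn.Mish()                 # R(.): the Mish nonlinearity

    def forward(self, x):
        return self.act(self.conv(self.bn(x)))  # Eq. (2)

x = torch.randn(16, 24, 97, 11, 11)          # (N, C, spectral, H, W), sizes illustrative
print(BN3DConv(24, 24)(x).shape)             # shape preserved by "same" padding
```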
FIG. 2 is the Coordinate Attention architecture diagram. As shown in FIG. 2, to preserve spatial information, global pooling is decomposed into two one-dimensional feature encoding operations. Specifically, given an input $X$ of size $C \times H \times W$, each channel is encoded along the horizontal and vertical coordinates using two pooling kernels of sizes $(H, 1)$ and $(1, W)$, respectively. The output of the $c$-th channel at height $h$ and the output of the $c$-th channel at width $w$ can then be expressed as:

$$z_{c}^{h}(h) = \frac{1}{W} \sum_{0 \le i < W} x_{c}(h, i) \qquad (3)$$

$$z_{c}^{w}(w) = \frac{1}{H} \sum_{0 \le j < H} x_{c}(j, w) \qquad (4)$$

where $x_c$ denotes the input. The two transforms aggregate features along the two spatial directions, generating a pair of direction-aware feature maps. The feature maps generated by Eq. (3) and Eq. (4) are fused and input into a convolution layer of $C/r$ convolution kernels of size $1 \times 1$, whose output can be expressed as:

$$f = \delta(F_{1}([z^{h}, z^{w}])) \qquad (5)$$

where $[\cdot, \cdot]$ denotes the concatenation operation along the spatial dimension, $\delta$ is a nonlinear activation function, and $r$ is the reduction ratio that controls the number of channels. $f$ is then split along the spatial dimension into $f^{h} \in \mathbb{R}^{C/r \times H}$ and $f^{w} \in \mathbb{R}^{C/r \times W}$, each of which undergoes its own convolution operation and is converted back to the same number of channels as the input $X$, yielding the outputs $g^{h}$ and $g^{w}$:

$$g^{h} = \sigma(F_{h}(f^{h})) \qquad (6)$$

$$g^{w} = \sigma(F_{w}(f^{w})) \qquad (7)$$

where $F_{h}$ and $F_{w}$ denote convolution operations and $\sigma$ denotes the Sigmoid activation function. The outputs $g^{h}$ and $g^{w}$ are then expanded and used as attention weights, and the final output can be expressed as:

$$y_{c}(i, j) = x_{c}(i, j) \times g^{h}(i) \times g^{w}(j) \qquad (8)$$

where $x_{c}$ denotes the input and $g^{h}$, $g^{w}$ the attention weights.
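A compact PyTorch sketch of Eqs. (3)-(8) on a $C \times H \times W$ input follows; the reduction ratio $r$, the BN+ReLU choice for $\delta$, and the class name are assumptions.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention following Eqs. (3)-(8); r is the reduction ratio."""
    def __init__(self, channels, r=8):
        super().__init__()
        mid = max(channels // r, 1)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (H,1) pooling, Eq. (3)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (1,W) pooling, Eq. (4)
        self.f1 = nn.Sequential(                        # F1 followed by delta, Eq. (5)
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU())
        self.fh = nn.Conv2d(mid, channels, 1)           # F_h, Eq. (6)
        self.fw = nn.Conv2d(mid, channels, 1)           # F_w, Eq. (7)

    def forward(self, x):
        n, c, h, w = x.shape
        zh = self.pool_h(x)                              # (n, c, h, 1)
        zw = self.pool_w(x).permute(0, 1, 3, 2)          # (n, c, w, 1)
        f = self.f1(torch.cat([zh, zw], dim=2))          # concat along spatial dim
        fh, fw = torch.split(f, [h, w], dim=2)           # split f back into f^h, f^w
        gh = torch.sigmoid(self.fh(fh))                  # attention over height
        gw = torch.sigmoid(self.fw(fw)).permute(0, 1, 3, 2)  # attention over width
        return x * gh * gw                               # Eq. (8)

x = torch.randn(16, 48, 11, 11)
print(CoordinateAttention(48)(x).shape)                  # torch.Size([16, 48, 11, 11])
```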
FIG. 3 is a multi-branch structure diagram. As shown in FIG. 3, $F$ denotes a hidden layer containing a convolution layer, a normalization layer and an activation layer; $G$ denotes a convolution layer with a convolution kernel of size $1 \times 1$; and the branch passing directly to the next layer is an identity branch. It is the presence of these branches that makes the extracted features finer and alleviates the vanishing-gradient problem in deep networks. The output of the $(i-1)$-th multi-branch block in the figure can be expressed as:

$$X_{i} = F_{i-1}(X_{i-1}) + G_{i-1}(X_{i-1}) + X_{i-1} \qquad (9)$$

where $F(\cdot)$ represents the convolution, normalization and activation operations and $G(\cdot)$ represents a convolution operation. The network that extracts the spectral and spatial features within a channel is composed of several such multi-branch blocks.
FIG. 4 is a diagram of the multi-branch block structure. As shown in FIG. 4, assuming $n$ feature maps of size $p \times p \times b$ are input, the input passes through the normalization layer and the activation layer and enters a convolution layer that uses $m$ convolution kernels of size $a \times a \times d$ to perform the 3D convolution operation; the feature map output by the convolution layer is added element by element to the feature maps of the two residual branches, and the result serves as the input of the next multi-branch block.
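Under these assumptions (pre-activation ordering, a 1×1×1 convolution for the $G$ branch, and equal channel counts so the three branches can be added element-wise), one multi-branch block might be sketched as follows.

```python
import torch
import torch.nn as nn

class MultiBranchBlock(nn.Module):
    """F(x) + G(x) + x, as in Eq. (9); channel counts are kept equal so the
    element-wise addition of the three branches is well defined."""
    def __init__(self, channels, kernel=(3, 3, 3)):
        super().__init__()
        self.f = nn.Sequential(                        # F: BN -> Mish -> 3D conv
            nn.BatchNorm3d(channels), nn.Mish(),
            nn.Conv3d(channels, channels, kernel, padding="same"))
        self.g = nn.Conv3d(channels, channels, 1)      # G: 1x1x1 convolution branch

    def forward(self, x):
        return self.f(x) + self.g(x) + x               # identity branch completes Eq. (9)

x = torch.randn(16, 24, 97, 11, 11)                   # n feature maps of p x p x b
print(MultiBranchBlock(24)(x).shape)                  # shape preserved
```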
FIG. 5 is the DCFE network architecture diagram. As shown in FIG. 5, the DCFE model architecture is described taking the Indian Pines (IP) dataset as an example. Indian Pines contains 145 × 145 pixels, each with 200 spectral bands, i.e., the Indian Pines dataset has size 145 × 145 × 200. The number of pixels with corresponding labels is 10249 and the remaining pixels are background; each sample is of size 11 × 11 × 200, and the number of convolution kernels in the convolution layers is fixed at 24.
In the spectral channel, the 11 × 11 × 200 image samples are first convolved by a convolution layer into a feature map of size 11 × 11 × 97 and then input into the spectral block, which consists of 3 multi-branch blocks, each comprising a convolution layer, a batch normalization layer and a Mish activation function.
In the spatial channel, the 11 × 11 × 200 image samples are first convolved by a convolution layer into a feature map of size 11 × 11 × 1 and then input into the spatial block, which likewise consists of 3 multi-branch blocks, each comprising a convolution layer, a batch normalization layer and a Mish activation function.
The output feature map of the spectral channel is fused with the feature map of the spatial channel, and the result is input into the attention module; the computed attention weights are applied to the fused feature map, a 1 × 1 × 48 feature map is obtained through a pooling operation, and the classification result is finally obtained through the fully connected layer.
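The tail of the network described in this paragraph could be sketched as follows, reusing the `CoordinateAttention` sketch given earlier; average pooling and the 16-class output (for IP) are assumptions.

```python
import torch
import torch.nn as nn

# Fused spectral+spatial map after squeezing the singleton spectral axis.
fused = torch.randn(16, 48, 11, 11)            # (N, 48, 11, 11)
enhanced = CoordinateAttention(48)(fused)      # attention weights applied, Eq. (8)
pooled = nn.AdaptiveAvgPool2d(1)(enhanced)     # -> (N, 48, 1, 1): the 1 x 1 x 48 map
logits = nn.Linear(48, 16)(pooled.flatten(1))  # fully connected layer, 16 IP classes
print(logits.shape)                            # torch.Size([16, 16])
```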
In the network structure, a suitable activation function can accelerate back-propagation and convergence. In the DCFE model, normalization and activation are performed after each convolution operation; the chosen activation function is the Mish function, a self-regularized non-monotonic activation function, rather than the traditional ReLU. The Mish function is computed as:

$$\mathrm{mish}(x) = x \times \tanh(\ln(1 + e^{x})) \qquad (10)$$

where $x$ is the input of the activation function. ReLU is a piecewise linear function that maps all negative inputs to zero, even though useful information may be contained in them. In contrast, the Mish function maps a negative input to a negative output and therefore retains some of this useful information.
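Eq. (10) can be implemented directly (recent PyTorch versions also ship it as `nn.Mish`); `softplus` computes $\ln(1 + e^{x})$ in a numerically stable way.

```python
import torch

def mish(x: torch.Tensor) -> torch.Tensor:
    # Eq. (10): x * tanh(ln(1 + e^x)).
    return x * torch.tanh(torch.nn.functional.softplus(x))

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
print(mish(x))   # negative inputs yield small negative outputs, unlike ReLU
```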
FIG. 6 is a classification result diagram for the IP dataset.
FIG. 7 is a classification result diagram for the UP dataset.
FIG. 8 is a classification result diagram for the SV dataset.
FIG. 9 is a classification result diagram for the BS dataset.

Claims (5)

1. A hyperspectral image classification method based on dual-channel feature enhancement, characterized by comprising the following steps:
S1: Perform preprocessing such as pixel-point segmentation and padding on the original image, and partition the data.
S2: and extracting the spectral features and the spatial features of the hyperspectral image.
S3: and (3) performing dimension reduction on the spectral feature map extracted in the step (S2), and fusing the spectral feature map with the extracted spatial feature map.
S4: and (3) inputting the feature map which is fused with the spectral features and the spatial features in the step (S2) into an attention module, realizing feature enhancement, and finally obtaining a classification result through a full-connection layer.
2. The hyperspectral image classification method based on dual-channel feature enhancement as claimed in claim 1, wherein the process of S1 is:
and loading original hyperspectral image data, dividing category pixel points in the image, and filling edge pixel points, wherein the size of specified clipping is 11×11. Dividing the segmented sample into a training set, a verification set and a test set, wherein the training set and a corresponding label thereof are used for updating network parameters, the verification set and the label thereof are used for monitoring a temporary model generated in a training stage, and the test set is used for evaluating an optimal model.
3. The hyperspectral image classification method based on the dual-channel feature enhancement as claimed in claim 1, wherein the process of S2 is:
First, two different 3D convolutions perform convolution operations on the hyperspectral image sample, and the two resulting feature maps are input into the spectral channel and the spatial channel respectively, where each feature extractor consists of three convolution layers. During feature extraction, if the input consists of $n_k$ feature maps of size $p_k \times p_k \times b_k$, then after $n_{k+1}$ convolution kernels of size $a_{k+1} \times a_{k+1} \times d_{k+1}$, $n_{k+1}$ feature maps of size $p_{k+1} \times p_{k+1} \times b_{k+1}$ are generated, and the $i$-th output of the $(k+1)$-th 3D-CNN layer with batch normalization is given by formulas (1) and (2):

$$\hat{X}_{k+1}^{j} = \frac{O_{k}^{j} - E(O_{k}^{j})}{\sqrt{\mathrm{Var}(O_{k}^{j})}} \qquad (1)$$

$$O_{k+1}^{i} = R\Big(\sum_{j} \hat{X}_{k+1}^{j} * W_{k+1}^{i} + b_{k+1}^{i}\Big) \qquad (2)$$

where $\hat{X}_{k+1}^{j}$ is the $j$-th input feature map of the $(k+1)$-th layer, $O_{k}^{j}$ is the final output of the $k$-th layer, $E(\cdot)$ and $\mathrm{Var}(\cdot)$ are the expectation and variance functions, $W_{k+1}^{i}$ and $b_{k+1}^{i}$ are the weights and bias of the $(k+1)$-th layer, and $R(\cdot)$ is a nonlinear activation function in the network. Finally, a spectral feature map and a spatial feature map are obtained by the convolution operations in the two channels.
4. The hyperspectral image classification method based on dual-channel feature enhancement as claimed in claim 1, wherein the process of S3 is:
and carrying out convolution operation on the extracted spectrum feature map to reduce the dimension, and then fusing the spectrum feature map with the space feature map to obtain a feature map fused with the space spectrum feature and the space feature.
5. The hyperspectral image classification method based on the dual-channel feature enhancement as claimed in claim 1, wherein the process of S4 is:
first, to be able to preserve spatial information, global pooling is decomposed into two one-dimensional feature encoding operations, specifically, given an input X, X of size c×h×w, each channel is encoded along horizontal and vertical coordinates, respectively, using two pooling kernels of sizes (H, 1) and (1, W). Thus, the output of the c-th channel of height h and the output of the c-th channel of width w can be expressed as:
wherein: x is x c Representing the input. The two transforms aggregate features along two spatial directions, respectively, to generate a pair of direction perception feature maps. Then, feature fusion is carried out on the feature graphs generated by the formula (3) and the formula (4), and then the feature graphs are input into a convolution layer of C/r convolution kernels with the size of 1 multiplied by 1, and the output of the convolution layer can be expressed as:
f=δ(F 1 ([z h ,z w ])) (5)
wherein:[·,·]representing the stitching operation along the spatial dimension, delta is a nonlinear activation function, r is the reduction rate used to control the number of channels. Then splitting f along the spatial dimension into +.>Andrespectively performing convolution operation, and converting to the same channel number as the input X to obtain an output g h And g w
g h =σ(F h (f h )) (6)
g w =σ(F w (f w )) (7)
Wherein: f (F) h And F w Represents convolution operation, sigma represents Sigmoid activation function, and then outputs g h And g w The final output, which is expanded and used as a attention weight, can be expressed as:
y c (i,j)=x c (i,j)×g h (i)×g w (j) (8)
wherein: x is x c Representing input g h And g w Representing the attention weight. And finally, inputting the feature map enhanced by the attention module into a full-connection layer to obtain a classification result.
CN202210812607.4A 2022-07-12 2022-07-12 Hyperspectral image classification method based on double-channel feature enhancement Pending CN117437557A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210812607.4A CN117437557A (en) 2022-07-12 2022-07-12 Hyperspectral image classification method based on double-channel feature enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210812607.4A CN117437557A (en) 2022-07-12 2022-07-12 Hyperspectral image classification method based on double-channel feature enhancement

Publications (1)

Publication Number Publication Date
CN117437557A true CN117437557A (en) 2024-01-23

Family

ID=89555825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210812607.4A Pending CN117437557A (en) 2022-07-12 2022-07-12 Hyperspectral image classification method based on double-channel feature enhancement

Country Status (1)

Country Link
CN (1) CN117437557A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743429A (en) * 2020-05-28 2021-12-03 中国人民解放军战略支援部队信息工程大学 Hyperspectral image classification method and device


Similar Documents

Publication Publication Date Title
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
CN113221639B (en) Micro-expression recognition method for representative AU (AU) region extraction based on multi-task learning
CN110929736B (en) Multi-feature cascading RGB-D significance target detection method
CN108154133B (en) Face portrait-photo recognition method based on asymmetric joint learning
CN113192076B (en) MRI brain tumor image segmentation method combining classification prediction and multi-scale feature extraction
CN111860683A (en) Target detection method based on feature fusion
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN113065426B (en) Gesture image feature fusion method based on channel perception
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN110826534B (en) Face key point detection method and system based on local principal component analysis
CN114463340B (en) Agile remote sensing image semantic segmentation method guided by edge information
CN117437557A (en) Hyperspectral image classification method based on double-channel feature enhancement
CN114898157A (en) Global learning device and method for hyperspectral image classification
CN116977747B (en) Small sample hyperspectral classification method based on multipath multi-scale feature twin network
CN118736226A (en) Method and system for dividing small sample dam cracks based on general dividing large model
CN117830835A (en) Satellite remote sensing image segmentation method based on deep learning
CN113095185B (en) Facial expression recognition method, device, equipment and storage medium
CN116844039A (en) Multi-attention-combined trans-scale remote sensing image cultivated land extraction method
CN112991257B (en) Heterogeneous remote sensing image change rapid detection method based on semi-supervised twin network
Wu et al. Tire defect detection based on low and high-level feature fusion
CN113705731A (en) End-to-end image template matching method based on twin network
Wang et al. Deep convolutional neural network and its application in image recognition of road safety projects
CN115471677B (en) Hyperspectral image classification method based on double-channel sparse network
CN117392392B (en) Rubber cutting line identification and generation method
CN115909045B (en) Two-stage landslide map feature intelligent recognition method based on contrast learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination