CN118506407B - Light pedestrian re-recognition method and system based on random color discarding and attention - Google Patents
- Publication number
- CN118506407B (application CN202410950010.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- attention
- pedestrian
- recognition
- lagt
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0985—Hyperparameter optimisation; Meta-learning; Learning-to-learn
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a lightweight pedestrian re-recognition method and system based on random color discarding and attention, relating to the technical field of pedestrian re-recognition. The method receives image data and preprocesses it to obtain preprocessed image data; inputs the preprocessed image data into a pre-built OSNet embedded with a cascaded self-attention module and extracts features to obtain image features; classifies the image features through a fully connected layer, mapping them onto the corresponding class labels to obtain classified image features; calculates a label-smoothed identity loss from the classified image features and optimizes a pre-established lightweight pedestrian re-recognition network model through back-propagated gradient updates to obtain an optimized model; and acquires the test set of a pedestrian re-recognition dataset and inputs it into the optimized lightweight pedestrian re-recognition network model to obtain the lightweight pedestrian re-recognition result.
Description
Technical Field
The invention relates to the technical field of pedestrian re-identification, and in particular to a lightweight pedestrian re-identification method and system based on random color discarding and attention.
Background
With the rapid development of deep learning, the field of pedestrian re-recognition has made remarkable progress. The residual network (ResNet) has been applied to pedestrian re-recognition with notable results: by introducing residual connections, ResNet alleviates the vanishing-gradient problem in deep network training and makes the network easier to optimize.
The robustness and generalization of prior-art recognition methods are limited, so data enhancement strategies such as color augmentation are widely applied to improve them. Color, as an important distinguishing characteristic of pedestrians, enriches the expression of pedestrian features and plays an important role in complex scenes such as illumination changes; a pedestrian re-identification system that integrates color information therefore has clear advantages over traditional methods in complex environments. In some cases, however, the color bias induced by color features limits the model's ability to make correct predictions in two main ways: first, color differences between images of the same pedestrian increase the possibility of false recognition; second, color bias weakens the feature differences between images of different pedestrians and reduces the discriminability of the system. To date, few studies have addressed reducing color bias to improve model robustness.
Disclosure of Invention
To solve the above-mentioned shortcomings in the background art, an object of the present invention is to provide a lightweight pedestrian re-recognition method and system based on random color discarding and attention.
In a first aspect, the object of the present invention is achieved by the following technical solution: a lightweight pedestrian re-identification method based on random color discarding and attention, comprising the following steps:
Receiving image data, and preprocessing the image data to obtain preprocessed image data;
inputting the preprocessed image data into an OSNet which is built in advance and embedded with a cascaded self-attention module, extracting features, and outputting image features;
classifying the image features through a fully connected layer, and mapping them onto corresponding class labels to obtain classified image features;
calculating a label-smoothed identity loss using the classified image features, and optimizing a pre-established lightweight pedestrian re-recognition network model through back-propagated gradient updates to obtain an optimized lightweight pedestrian re-recognition network model;
and acquiring a test set of the pedestrian re-recognition dataset and inputting it into the optimized lightweight pedestrian re-recognition network model to output the lightweight pedestrian re-recognition result.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: the preprocessing of the image data comprises the following steps:
data enhancement: executing random flipping, random erasing and an LAGT-based random color discarding strategy to finally obtain the preprocessed image data.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: the LAGT-based random color discarding strategy grays the image using an aggregate grayscale transformation (AGT), computed as:
Gray(i, j) = λ · (R(i, j) + G(i, j) + B(i, j)), 1 ≤ i ≤ H, 1 ≤ j ≤ W
where R, G and B denote the red, green and blue color channels; R(i, j), G(i, j) and B(i, j) denote the pixel values at position (i, j) of the red, green and blue channels, respectively; the weighting coefficient λ is a constant balance factor, taken as 1/3 so that the three channels are weighted uniformly; H is the image height and W is the image width.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: the implementation process of LAGT is as follows:
during data loading, a random identity sampler randomly selects P identities and K pictures for each selected pedestrian, so the training batch size is b = P × K; the batch is expressed as the set X = {(x_i, y_i)}, i = 1, …, b, where x_i represents the i-th image of the training batch and y_i represents the sample label of the i-th image. LAGT converts the original image into a gray image, randomly selects a rectangular area from the original image, and substitutes the gray values of the corresponding area of the gray image into the original image. Given an original pedestrian picture x, an aggregate grayscale transformation is performed with probability p, and the corresponding AGT image is defined as:
x_gray = AGT(x);
the area of the original image x is:
S = H × W
where H is the image height and W is the image width;
the area of the AGT rectangle is:
S_agt = rand(s_l, s_h) × S
where s_l and s_h are the minimum and maximum values of the ratio of the AGT rectangle area to the original image area;
the aspect ratio r_agt of the AGT rectangle, its height H_agt and its width W_agt are:
r_agt = rand(r_1, r_2), H_agt = sqrt(S_agt × r_agt), W_agt = sqrt(S_agt / r_agt)
where r_1 and r_2 are the minimum and maximum values of the aspect ratio of the grayscale-transformation rectangle;
a point P = (x_p, y_p) is randomly initialized in the original image x, satisfying the following conditions:
x_p + W_agt ≤ W, y_p + H_agt ≤ H
where H is the image height and W is the image width;
the selected LAGT region is:
rect = (x_p, y_p, x_p + W_agt, y_p + H_agt);
for each x_i, the LAGT region rect is selected as above, and the LAGT algorithm can finally be expressed as:
x_i' = Φ(x_i, AGT(x_i), rect) with probability p, and x_i' = x_i otherwise,
where Φ(·) assigns the pixels in the corresponding rectangle of the AGT image to the original image x_i, and x_i' is the LAGT-transformed sample.
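For reference, the LAGT strategy described above can be sketched in PyTorch as follows; the function name, the retry loop, and the default sampling bounds s_l, s_h, r_1, r_2 are illustrative assumptions (the patent fixes only the application probability, 0.6 in its experiments), not part of the claimed method:
```python
import math
import random
import torch

def lagt(img: torch.Tensor, p: float = 0.6,
         s_l: float = 0.02, s_h: float = 0.4,
         r_1: float = 0.3, r_2: float = 3.3) -> torch.Tensor:
    """LAGT sketch: img is a (3, H, W) float RGB tensor. With probability p,
    a random rectangle is replaced by its aggregate-grayscale version."""
    if random.random() > p:
        return img
    _, h, w = img.shape
    # Aggregate grayscale transformation (AGT): uniform channel weights.
    gray = img.mean(dim=0, keepdim=True).expand_as(img)
    for _ in range(100):  # retry until the sampled rectangle fits the image
        area = random.uniform(s_l, s_h) * h * w
        ratio = random.uniform(r_1, r_2)
        h_agt = int(round(math.sqrt(area * ratio)))
        w_agt = int(round(math.sqrt(area / ratio)))
        if h_agt < h and w_agt < w:
            y = random.randint(0, h - h_agt)
            x = random.randint(0, w - w_agt)
            out = img.clone()
            out[:, y:y + h_agt, x:x + w_agt] = gray[:, y:y + h_agt, x:x + w_agt]
            return out
    return img
```
In a data pipeline, such a routine would be applied per image alongside the random flipping and random erasing named in the data-enhancement step.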
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: the embedded cascaded self-attention module includes a spatial self-attention module (SSAM) and a channel self-attention module (CSAM).
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: the intermediate feature map extracted by SSAM from the preprocessed image data is F ∈ R^(C×H×W), where C is the number of feature channels and H × W is the size of the intermediate feature map; 1 × 1 convolution operations are performed on F to obtain A, B, D ∈ R^((C/r)×H×W), where r is the channel-reduction ratio; after reshaping A, B and D to R^(N×(C/r)) with N = H × W, the spatial self-attention affinity matrix S ∈ R^(N×N) is obtained, whose entries s_ji are computed as follows:
s_ji = exp(A_i · B_j) / Σ_{i=1}^{N} exp(A_i · B_j)
where s_ji represents the attention weight of the i-th spatial position on the j-th position. S is multiplied with D to embed the attention weights, and the result is superimposed pixel-wise on the original features to obtain the spatially self-attention-weighted feature map E_s:
E_s = α · Σ_{i=1}^{N} (s_ji · D_i) + F
where α is the hyperparameter adjusting the effect of SSAM; the spatially self-attention-weighted feature map E_s is then processed by CSAM.
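A minimal PyTorch sketch of SSAM consistent with the formulas above is given below; the channel-reduction ratio r = 8, reducing only the query and key branches, and treating α as a fixed constructor argument are assumptions, since the patent only states that α is a hyperparameter (set to 1 in its experiments):
```python
import torch
import torch.nn as nn

class SSAM(nn.Module):
    """Spatial self-attention module sketch following the description above."""
    def __init__(self, channels: int, reduction: int = 8, alpha: float = 1.0):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)  # A
        self.key = nn.Conv2d(channels, channels // reduction, 1)    # B
        self.value = nn.Conv2d(channels, channels, 1)               # D
        self.alpha = alpha  # hyperparameter adjusting SSAM's effect

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f.shape
        n = h * w
        q = self.query(f).view(b, -1, n).permute(0, 2, 1)  # (b, n, c/r)
        k = self.key(f).view(b, -1, n)                     # (b, c/r, n)
        s = torch.softmax(torch.bmm(q, k), dim=-1)         # (b, n, n) affinity S
        v = self.value(f).view(b, c, n)                    # (b, c, n)
        out = torch.bmm(v, s.permute(0, 2, 1)).view(b, c, h, w)
        return self.alpha * out + f                        # E_s = alpha * (S . D) + F
```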
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: for the input spatially self-attention-weighted feature map E_s, CSAM reshapes it to R^(C×N) and obtains the channel self-attention affinity matrix X ∈ R^(C×C), whose entries x_ji are computed as follows:
x_ji = exp(E_i · E_j) / Σ_{i=1}^{C} exp(E_i · E_j)
where x_ji represents the attention weight of channel i on channel j; for X, a matrix M of the same size is initialized whose values all equal the maximum of X, and the new channel self-attention affinity matrix is X' = M − X; X' is multiplied with E_s to embed the attention weights, and the result is superimposed on the corresponding position pixels of E_s to obtain the channel self-attention-weighted feature map E_c:
E_c = β · Σ_{i=1}^{C} (x'_ji · E_i) + E_s
where β is the hyperparameter adjusting the effect of CSAM;
the channel self-attention-weighted feature map E_c yields the image features after passing through the OSNet backbone network.
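Likewise, a CSAM sketch consistent with the description is shown below; applying the max-subtraction to the raw channel affinities before the softmax is an implementation assumption (a common variant of channel attention), and β mirrors the fixed hyperparameter above:
```python
import torch
import torch.nn as nn

class CSAM(nn.Module):
    """Channel self-attention module sketch with max-subtraction,
    reconstructed from the description above."""
    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.beta = beta  # hyperparameter adjusting CSAM's effect

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        b, c, h, w = e.shape
        flat = e.view(b, c, -1)                          # (b, c, n)
        energy = torch.bmm(flat, flat.permute(0, 2, 1))  # (b, c, c) channel affinity
        # Subtract the maximum so every channel is encouraged to contribute
        # complementary information (robust to outlier activations).
        energy = energy.max(dim=-1, keepdim=True).values.expand_as(energy) - energy
        attn = torch.softmax(energy, dim=-1)
        out = torch.bmm(attn, flat).view(b, c, h, w)
        return self.beta * out + e                       # E_c = beta * (X' . E) + E_s
```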
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: the process of calculating the label-smoothed identity loss L_ID using the classified image features:
L_ID = Σ_{k=1}^{N} −q_k · log(p_k)
where k indexes the pedestrian categories, N represents the number of pedestrian identities in the training set, y is the ground-truth label, p_k is the logit predicted by the network for class k, and q_k is the smoothed label.
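A sketch of the label-smoothed identity loss is given below; the smoothing constant ε = 0.1 is a conventional choice assumed here, as the patent does not state its value:
```python
import torch
import torch.nn.functional as F

def identity_loss(logits: torch.Tensor, targets: torch.Tensor,
                  epsilon: float = 0.1) -> torch.Tensor:
    """Label-smoothed identity (cross-entropy) loss sketch.
    logits: (b, N) classification scores; targets: (b,) class indices."""
    n_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    # Smoothed targets: 1 - eps + eps/N on the true class, eps/N elsewhere.
    smooth = torch.full_like(log_probs, epsilon / n_classes)
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - epsilon + epsilon / n_classes)
    return (-smooth * log_probs).sum(dim=1).mean()
```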
In a second aspect, to achieve the above object, the present invention discloses a lightweight pedestrian re-recognition system based on random color discarding and attention, comprising:
The image processing module is used for receiving the image data, preprocessing the image data and obtaining preprocessed image data;
the feature extraction module is used for inputting the preprocessed image data into the pre-built OSNet embedded with the cascaded self-attention module, extracting features, and outputting image features;
the image classification module is used for classifying the image features through the fully connected layer, mapping them onto corresponding category labels, and obtaining classified image features;
the model training module is used for calculating a label-smoothed identity loss using the classified image features, and optimizing a pre-established lightweight pedestrian re-recognition network model through back-propagated gradient updates to obtain an optimized lightweight pedestrian re-recognition network model;
The pedestrian re-recognition module is used for acquiring a test set of the pedestrian re-recognition dataset and inputting it into the optimized lightweight pedestrian re-recognition network model to output the lightweight pedestrian re-recognition result.
A terminal device comprising a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor, when loading and executing the computer program, employs the lightweight pedestrian re-recognition method based on random color discarding and attention described above.
The invention has the beneficial effects that:
the invention adopts OSNet as the backbone network of the model, greatly reducing the parameter count;
for pedestrians wearing clothes of the same color, the LAGT algorithm effectively suppresses the negative influence of color bias on the recognition effect, encourages the model to discover and attend to feature information unrelated to color, balances the weights the neural network assigns to color and non-color features, and improves the recognition effect over the Baseline;
for pedestrian pictures with complex backgrounds and occlusion, the cascaded self-attention module effectively aggregates pedestrian features and suppresses irrelevant background, so that the extracted features are finer and more discriminative, improving the recognition effect over the Baseline.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; it will be apparent to those skilled in the art that other drawings can be obtained from these drawings without inventive effort;
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is an overall frame diagram of a lightweight pedestrian re-recognition network based on random color discard and self-attention in accordance with the present invention;
FIG. 3 is an original sample and LAGT sample example of the present invention;
FIG. 4 is SSAM employed in the present invention;
FIG. 5 is a CSAM used in the present invention;
FIG. 6 is a search result of a training model using RGB images and original grayscale transformed images on a Market-1501 dataset;
FIG. 7 is an example of a pedestrian image in a green background on a dataset;
FIG. 8 is a search result of a green background pedestrian image on a Market-1501 dataset using an RGB image and an original gray scale transformed image training model;
FIG. 9 is a search result of a gray image training model using RGB images and different gray transforms on a Market-1501 dataset;
FIG. 10 is a visual comparison of pedestrian re-identification under color deviation in accordance with the present invention;
FIG. 11 is a visual comparison of pedestrian re-recognition under a complex background and occlusion in accordance with the present invention;
FIG. 12 is a schematic diagram of the system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment one:
the following description is made of the relevant terms related to the embodiments of the present application:
Grayscale transformation refers to changing the gray value of each pixel in a source image point by point according to a certain transformation relationship and target condition. Its purpose is to improve image quality and make the displayed image clearer. Grayscale transformation is a basic and direct spatial-domain method in image enhancement, and is also an important component of image digitization and display software.
The self-attention mechanism is a special attention mechanism that allows a model, when processing a sequence, to take into account the relationship of each element to all other elements. Such a mechanism helps the model better understand the context information in the sequence and thus process sequence data more accurately. (Sequence data is data whose elements exist in a particular order: each element has a specific position, and the ordering relationship between positions has an important influence on the meaning and processing of the data.)
As shown in fig. 1, the lightweight pedestrian re-recognition method based on random color discarding and attention comprises the following steps:
Receiving image data, and preprocessing the image data to obtain preprocessed image data;
The process of preprocessing the image data includes:
data enhancement: performing random flipping, random erasing and an LAGT-based random color discarding strategy to finally obtain the preprocessed image data;
The LAGT-based random color discarding strategy grays the image using an aggregate grayscale transformation (AGT), computed as:
Gray(i, j) = λ · (R(i, j) + G(i, j) + B(i, j)), 1 ≤ i ≤ H, 1 ≤ j ≤ W
where R, G and B denote the red, green and blue color channels; R(i, j), G(i, j) and B(i, j) denote the pixel values at position (i, j) of the red, green and blue channels, respectively; and the weighting coefficient λ is a constant balance factor, taken as 1/3.
The LAGT algorithm is implemented as follows: during data loading, a random identity sampler randomly selects P identities and K pictures for each selected pedestrian, so the training batch size is b = P × K. The batch is expressed as the set X = {(x_i, y_i)}, i = 1, …, b, where x_i represents the i-th image of the training batch and y_i represents its sample label. LAGT converts the original image into a gray image, randomly selects a rectangular area from the original image, and substitutes the gray values of the corresponding area of the gray image into the original image. Given an original pedestrian picture x, an aggregate grayscale transformation is performed with probability p; the corresponding AGT image is defined as:
x_gray = AGT(x).
The area of the original image x is:
S = H × W
where H is the image height and W is the image width.
The area of the AGT rectangle is:
S_agt = rand(s_l, s_h) × S
where s_l and s_h are the minimum and maximum values of the ratio of the AGT rectangle area to the original image area.
The aspect ratio r_agt of the AGT rectangle, its height H_agt and its width W_agt are:
r_agt = rand(r_1, r_2), H_agt = sqrt(S_agt × r_agt), W_agt = sqrt(S_agt / r_agt)
where r_1 and r_2 are the minimum and maximum values of the aspect ratio of the grayscale-transformation rectangle.
A point P = (x_p, y_p) is randomly initialized in the original image x, satisfying the following conditions:
x_p + W_agt ≤ W, y_p + H_agt ≤ H
where H is the image height and W is the image width.
The selected LAGT region is:
rect = (x_p, y_p, x_p + W_agt, y_p + H_agt).
For each x_i, the LAGT region rect is selected as above, and the LAGT algorithm can finally be expressed as:
x_i' = Φ(x_i, AGT(x_i), rect) with probability p, and x_i' = x_i otherwise,
where Φ(·) assigns the pixels in the corresponding rectangle of the AGT image to the original image x_i, and x_i' is the LAGT-transformed sample. The original samples and LAGT samples are shown in fig. 3.
Inputting the preprocessed image data into an OSNet which is built in advance and embedded with a cascaded self-attention module, extracting features, and outputting image features;
The embedded cascade self-attention module comprises SSAM and CSAM, as shown in figures 4 and 5 respectively;
SSAM can help the network aggregate semantically related features in space and highlight individual features, making the network more focused on details of pedestrians.
The intermediate feature map extracted by the neural network is F ∈ R^(C×H×W), where C is the number of feature channels and H × W is the size of the intermediate feature map. 1 × 1 convolution operations are performed on F to obtain A, B, D ∈ R^((C/r)×H×W); the channel dimension reduction extracts a more refined and abstract feature representation during the inference of the attention matrix, lowering the computational complexity without affecting network performance, where r is the reduction ratio. After the reshape operation on A, B and D to R^(N×(C/r)) with N = H × W, the spatial self-attention affinity matrix S ∈ R^(N×N) is obtained, whose entries s_ji are computed as follows:
s_ji = exp(A_i · B_j) / Σ_{i=1}^{N} exp(A_i · B_j)
where s_ji represents the attention weight of the i-th spatial position on the j-th position. S is multiplied with D to embed the attention weights, and the result is superimposed pixel-wise on the original features to obtain the spatially self-attention-weighted feature map E_s:
E_s = α · Σ_{i=1}^{N} (s_ji · D_i) + F
where α is the hyperparameter that adjusts the effect of SSAM.
The CSAM can help the network learn the association information among the channels, and further extract the effective characteristic representation among different channels in the image.
For the input spatially self-attention-weighted feature map E_s, after a reshape operation to R^(C×N), the channel self-attention affinity matrix X ∈ R^(C×C) is obtained, whose entries x_ji are computed as follows:
x_ji = exp(E_i · E_j) / Σ_{i=1}^{C} exp(E_i · E_j)
where x_ji represents the attention weight of channel i on channel j. For X, using a normalization method, a matrix M of the same size is initialized whose values all equal the maximum of X; the new channel self-attention affinity matrix is X' = M − X. This effectively avoids the influence of noise or outliers on the maximum activation value; by subtracting each channel's affinity from the maximum activation, the other channels are encouraged to provide complementary information, which increases the model's perceptual diversity across feature channels and improves its robustness. X' is multiplied with E_s to embed the attention weights, and the result is superimposed on the corresponding position pixels of E_s to obtain the channel self-attention-weighted feature map E_c:
E_c = β · Σ_{i=1}^{C} (x'_ji · E_i) + E_s
where β is the hyperparameter that adjusts the effect of CSAM.
After passing through the OSNet backbone network with the cascaded self-attention module, a feature map of size b × 512 × 16 × 8 is output, where b is the batch size, 512 the number of channels, 16 the feature map height, and 8 the feature map width.
The image features are classified through the fully connected layer and mapped onto the corresponding class labels to obtain classified image features;
the resulting classification score has size b × N, where N is the number of categories;
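As an illustration, the classification head can be sketched as global average pooling followed by a fully connected layer; the pooling choice is an assumption (the patent does not state it), and 751 classes corresponds to Market-1501's training identities:
```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Pooling + fully connected classifier sketch: maps the
    b x 512 x 16 x 8 backbone feature map to b x N class scores."""
    def __init__(self, in_channels: int = 512, num_classes: int = 751):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        x = self.pool(feat).flatten(1)  # (b, 512)
        return self.fc(x)               # (b, N) classification scores
```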
A label-smoothed identity loss is calculated using the classified image features, and a pre-established lightweight pedestrian re-recognition network model is optimized through back-propagated gradient updates to obtain an optimized lightweight pedestrian re-recognition network model;
the calculation process of the label-smoothed identity loss L_ID using the classified image features is:
L_ID = Σ_{k=1}^{N} −q_k · log(p_k)
where k indexes the pedestrian categories, N represents the number of pedestrian identities in the training set, y is the ground-truth label, and p_k is the logit predicted by the network for class k.
The smoothed label q_k is constructed as:
q_k = 1 − ((N − 1)/N) · ε if k = y, and q_k = ε/N otherwise,
where ε is a small constant that encourages the model not to over-trust the training set, increasing its generalization ability.
A pedestrian re-recognition dataset is acquired and input into the optimized lightweight pedestrian re-recognition network model, which outputs the lightweight pedestrian re-recognition result.
Tests were performed on the Market-1501 and DukeMTMC-reID pedestrian re-identification datasets. The AMSGrad optimizer was adopted with an initial learning rate of 0.0015, updated with a cosine annealing strategy. The batch size and weight decay were 64 and 5e-4, respectively. Training ran for 250 epochs, the LAGT probability was set to 0.6, the hyperparameters α and β controlling the influence of the SSAM and CSAM modules were set to 1, the label-smoothed ID loss was used for supervision, and pedestrian matching used the cosine distance.
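The training configuration above can be reproduced with the following PyTorch sketch; `model`, `train_loader` and `identity_loss` are assumed to be defined elsewhere, and AMSGrad is enabled via the amsgrad flag of torch.optim.Adam:
```python
import torch

# Assumes: `model` is the OSNet-based network, `train_loader` yields
# (images, labels) batches of size 64, `identity_loss` is defined above.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0015,
                             weight_decay=5e-4, amsgrad=True)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=250)

for epoch in range(250):
    for images, labels in train_loader:       # LAGT applied with p = 0.6
        logits = model(images)
        loss = identity_loss(logits, labels)  # label-smoothed ID loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```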
Specifically, the following examples are provided to further illustrate the present invention:
The experimental environment parameters are shown in table 1.
TABLE 1
In the experiments, OSNet pre-trained on ImageNet was selected as the Baseline. To verify the effect of LAGT and SAM, alone and in combination, on the network model, ablation experiments were performed on the Market-1501 and DukeMTMC-reID datasets; the results are shown in Table 2. As can be seen from Table 2, LAGT and SAM each bring some improvement over the Baseline. When both modules are added to the Baseline, LAGT weakens the negative impact of color bias on model recognition while SAM makes the model extract finer features in space and channels, so the network achieves its best recognition performance: Rank-1 and mAP on the Market-1501 dataset reach 95.5% and 87.7%, respectively, an improvement of 0.7% and 1.0% over the Baseline, and Rank-1 and mAP on the DukeMTMC-reID dataset reach 89.2% and 77.2%, respectively, an improvement of 0.8% and 0.5% over the Baseline. In addition, the network is lightweight, with only 2.8M parameters, which facilitates application and deployment.
TABLE 2
To verify the effectiveness of the present invention, the Market-1501 and DukeMTMC-reID datasets are used to compare it with advanced pedestrian re-recognition methods of recent years: PCB+RPP, which is based on local feature extraction, and HA-CNN, Mancs, AANet and IANet, which are based on attention mechanisms. In addition, some other well-performing methods such as BDB and BoT were chosen for comparison; the statistics are shown in Table 3.
As can be seen from Table 3, the Rank-1 index of the invention reaches 95.5% and 89.2% on the two datasets, and the mAP index reaches 87.7% and 77.2%, respectively. Although the invention focuses only on global feature extraction, compared with the PCB+RPP network with local feature extraction, Rank-1 and mAP increase by 2.4% and 6.7% on Market-1501 and by 6.3% and 9.7% on DukeMTMC-reID, respectively. Compared with the complex attribute-attention network AANet, Rank-1 and mAP improve by 1.6% and 4.3% on Market-1501 and by 1.5% and 2.9% on DukeMTMC-reID, respectively. Compared with the strong-baseline BoT method, which also focuses only on global features, the invention has the advantage of a small parameter count, with Rank-1 and mAP improved by 1.0% and 1.8% on Market-1501 and by 2.8% and 0.8% on DukeMTMC-reID, respectively.
TABLE 3 Table 3
Table 4 shows the comparison of the parameter counts of the proposed network and mainstream networks. The comparison shows that adding the LAGT and SAM modules on top of OSNet effectively improves the recognition performance of the model with only a small increase in parameters and computation time. Compared with ResNet networks, the model has fewer parameters, consumes fewer computing resources, shortens training time, adapts to tasks more quickly, and is more efficient in practical applications. Compared with OSNet, the invention improves pedestrian recognition accuracy and generalization while increasing the parameter count by only 0.6M, demonstrating that the designed network is well suited to lightweight deployment.
TABLE 4 Table 4
The dataset contains various complex and varying color deviations, which leave the model insufficiently robust to color changes. To address this issue, it is important to balance the weights between color features and other key discriminative features. Fig. 6 shows the retrieval results of models trained on the Market-1501 dataset using RGB images and gray images, respectively; it can be seen that retrieval is affected when a color deviation exists between the query image and the gallery image, and that the retrieval of such samples improves to a certain extent once color information is ignored.
Based on this problem, training images with local gray regions are generated through random color discarding (RCD) to improve the model's robustness to color deviation. The key step of the RCD strategy is to replace a local color region of the image with its corresponding local gray region; the grayscale conversion used is:
Gray(i, j) = 0.299 · R(i, j) + 0.587 · G(i, j) + 0.114 · B(i, j)
where R, G and B respectively represent the red, green and blue channels. As the formula shows, to reflect the sensitivity of the human eye to different colors, the red, green and blue channels are given different conversion weights. This conversion strategy matches human visual perception, but presents the following problems in pedestrian re-identification tasks based on deep-learning networks:
(1) Unlike human vision's varying sensitivity to colors, a network model has the same discriminative capability for the information in all three color channels, so a color-weight distribution strategy based on human visual perception has inherent limitations.
(2) In pedestrian datasets, according to statistics on people's clothing habits, few pedestrians wear green, and green features often appear in the background of pictures. As shown in fig. 7, trees and lawns serve as green background information in pedestrian images; when the largest conversion weight is assigned to the green channel, the model's attention to the background information increases while its generalization to the pedestrian is neglected, producing recognition bias.
For this problem, corresponding experimental verification was performed, as shown in fig. 8. It can be seen that when the deviation between the background color features and the pedestrian color features of a picture is large, particularly for green-background pictures, the large conversion weight of the green channel under the original grayscale transformation strategy increases the model's generalization to the background while neglecting the pedestrian, introducing bias and reducing the recognition effect.
Based on the above analysis, the invention designs a color discarding strategy more suitable for a learning model on the basis of RCD by changing the original grayscale transformation. The new transformation method should satisfy the following three requirements:
(1) reduce the network model's differential treatment of the three color channels, weakening the negative influence of color deviation on recognition while avoiding additional bias;
(2) after color conversion, retain the stable structure and texture characteristics of the original image, reducing distortion;
(3) the improved grayscale conversion method should not change the learning strategy, so that over-fitting during training is avoided and computing resources are saved, achieving the goals of lightness and effectiveness.
Based on the grayscale-transformation principle analyzed above, a weighted-average method is adopted to optimize the grayscale transformation strategy by carefully balancing the conversion weights assigned to the red, green and blue channels. For a pedestrian image of size H × W, the aggregate grayscale transform (AGT) formula is as follows:
Gray(i, j) = λ · (R(i, j) + G(i, j) + B(i, j)), 1 ≤ i ≤ H, 1 ≤ j ≤ W
where R, G and B represent the red, green and blue color channels, and R(i, j), G(i, j) and B(i, j) represent the pixel values at position (i, j) of the red, green and blue channels, respectively. The weight coefficient λ is a constant balance factor, selected as λ = 1/3. In the above formula, uniform weights are assigned to the three channels in the conversion from an RGB image to a gray image.
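For comparison, the two grayscale conversions discussed above can be sketched as follows; this is minimal illustrative code operating on a (3, H, W) float RGB tensor, not the patented implementation:
```python
import torch

def weighted_gray(img: torch.Tensor) -> torch.Tensor:
    """Original human-vision grayscale: Gray = 0.299R + 0.587G + 0.114B."""
    w = torch.tensor([0.299, 0.587, 0.114], device=img.device)
    return (img * w.view(3, 1, 1)).sum(dim=0)

def aggregate_gray(img: torch.Tensor) -> torch.Tensor:
    """Aggregate grayscale transform (AGT): uniform weight 1/3 per channel."""
    return img.mean(dim=0)
```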
To preliminarily verify the effectiveness of the improvement, fig. 9 shows the retrieval results on Market-1501 of models trained on the RGB input, on gray images from the original grayscale conversion, and on gray images from the average (aggregate) grayscale conversion, respectively. The results show that AGT eliminates color deviation without introducing extra bias, maintains the stable structural and texture characteristics of the RGB image, and yields a recognition effect superior to the original grayscale transformation.
Embodiment two: in a second aspect, as shown in fig. 12, in order to achieve the above object, the present invention discloses a lightweight pedestrian re-recognition system based on random color discarding and attention, comprising:
The image processing module 11 is configured to receive image data, and perform preprocessing on the image data to obtain preprocessed image data;
the feature extraction module 12 is configured to input the preprocessed image data into the pre-built OSNet embedded with the cascaded self-attention module, extract features, and output image features;
the image classification module 13 is configured to classify the image features through the fully connected layer and map them onto corresponding category labels to obtain classified image features;
the model training module 14 is configured to calculate a label-smoothed identity loss using the classified image features, and to optimize a pre-established lightweight pedestrian re-recognition network model through back-propagated gradient updates to obtain an optimized lightweight pedestrian re-recognition network model;
the pedestrian re-recognition module 15 is configured to acquire a test set of the pedestrian re-recognition dataset and input it into the optimized lightweight pedestrian re-recognition network model to output the lightweight pedestrian re-recognition result.
Based on the same inventive concept, the present invention also provides a computer apparatus comprising one or more processors and a memory for storing one or more computer programs; a program includes program instructions, and the processor is configured to execute the program instructions stored in the memory. The processor may be a central processing unit (CPU), or another general-purpose processor, digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, etc.; it is the computational and control core of the terminal for implementing one or more instructions, in particular for loading and executing one or more instructions within a computer storage medium to implement the methods described above.
It should be further noted that, based on the same inventive concept, the present invention also provides a computer storage medium having a computer program stored thereon which, when executed by a processor, performs the above method. The storage medium may take the form of any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing has shown and described the basic principles, principal features, and advantages of the present disclosure. It will be understood by those skilled in the art that the present disclosure is not limited to the embodiments described above; the foregoing embodiments and description merely illustrate the principles of the disclosure, and various changes and modifications may be made without departing from the spirit and scope of the disclosure, which is defined by the appended claims.
Claims (4)
1. A lightweight pedestrian re-identification method based on random color discarding and attention, the method comprising the steps of:
Receiving image data, and preprocessing the image data to obtain preprocessed image data;
The preprocessing of the image data comprises the following steps:
data enhancement: performing random flipping, random erasing and an LAGT-based random color discarding strategy to finally obtain preprocessed image data;
the LAGT-based random color discarding strategy grays the image using an aggregate grayscale transformation, computed as:
Gray(i, j) = λ · (R(i, j) + G(i, j) + B(i, j)), 1 ≤ i ≤ H, 1 ≤ j ≤ W
where R, G and B denote the red, green and blue color channels; R(i, j), G(i, j) and B(i, j) denote the pixel values at position (i, j) of the red, green and blue channels, respectively; the weighting coefficient λ is a constant balance factor taken as 1/3; H is the image height and W is the image width;
the implementation process of LAGT is as follows:
during data loading, a random identity sampler randomly selects P identities and K pictures for each selected pedestrian, so the training batch size is b = P × K; the batch is expressed as the set X = {(x_i, y_i)}, i = 1, …, b, where x_i represents the i-th image of the training batch and y_i represents the sample label of the i-th image; LAGT converts the original image into a gray image, randomly selects a rectangular area from the original image, and substitutes the gray values of the corresponding area of the gray image into the original image; given an original pedestrian picture x, an aggregate grayscale transformation is performed with probability p, and the corresponding AGT image is defined as:
x_gray = AGT(x);
the area of the original image x is:
S = H × W
where H is the image height and W is the image width;
the area of the AGT rectangle is:
S_agt = rand(s_l, s_h) × S
where s_l and s_h are the minimum and maximum values of the ratio of the AGT rectangle area to the original image area;
the aspect ratio r_agt of the AGT rectangle, its height H_agt and its width W_agt are:
r_agt = rand(r_1, r_2), H_agt = sqrt(S_agt × r_agt), W_agt = sqrt(S_agt / r_agt)
where r_1 and r_2 are the minimum and maximum values of the aspect ratio of the grayscale-transformation rectangle;
a point P = (x_p, y_p) is randomly initialized in the original image x, satisfying the following conditions:
x_p + W_agt ≤ W, y_p + H_agt ≤ H
where H is the image height and W is the image width;
the selected LAGT region is:
rect = (x_p, y_p, x_p + W_agt, y_p + H_agt);
for each x_i, the LAGT region rect is selected as above, and the LAGT algorithm can finally be expressed as:
x_i' = Φ(x_i, AGT(x_i), rect) with probability p, and x_i' = x_i otherwise,
where Φ(·) assigns the pixels in the corresponding rectangle of the AGT image to the original image x_i, and x_i' is the LAGT-transformed sample;
inputting the preprocessed image data into an OSNet which is built in advance and embedded with a cascaded self-attention module, extracting features, and outputting image features;
the embedded cascaded self-attention module includes a spatial self-attention module SSAM and a channel self-attention module CSAM;
the intermediate feature map extracted by SSAM from the preprocessed image data is F ∈ R^(C×H×W), where C is the number of feature channels and H × W is the size of the intermediate feature map; 1 × 1 convolution operations are performed on F to obtain A, B, D ∈ R^((C/r)×H×W), where r is the channel-reduction ratio; after reshaping A, B and D to R^(N×(C/r)) with N = H × W, the spatial self-attention affinity matrix S ∈ R^(N×N) is obtained, whose entries s_ji are computed as follows:
s_ji = exp(A_i · B_j) / Σ_{i=1}^{N} exp(A_i · B_j)
where s_ji represents the attention weight of the i-th spatial position on the j-th position; S is multiplied with D to embed the attention weights, and the result is superimposed pixel-wise on the original features to obtain the spatially self-attention-weighted feature map E_s:
E_s = α · Σ_{i=1}^{N} (s_ji · D_i) + F
where α is the hyperparameter adjusting the effect of SSAM; the spatially self-attention-weighted feature map E_s is then processed by CSAM;
for the input spatially self-attention-weighted feature map E_s, CSAM reshapes it to R^(C×N) and obtains the channel self-attention affinity matrix X ∈ R^(C×C), whose entries x_ji are computed as follows:
x_ji = exp(E_i · E_j) / Σ_{i=1}^{C} exp(E_i · E_j)
where x_ji represents the attention weight of channel i on channel j; for X, a matrix M of the same size is initialized whose values all equal the maximum of X, and the new channel self-attention affinity matrix is X' = M − X; X' is multiplied with E_s to embed the attention weights, and the result is superimposed on the corresponding position pixels of E_s to obtain the channel self-attention-weighted feature map E_c:
E_c = β · Σ_{i=1}^{C} (x'_ji · E_i) + E_s
where β is the hyperparameter adjusting the effect of CSAM;
the channel self-attention-weighted feature map E_c yields the image features after passing through the OSNet backbone network;
classifying the image features through the fully connected layer, and mapping them onto corresponding class labels to obtain classified image features;
calculating a label-smoothed identity loss using the classified image features, and optimizing a pre-established lightweight pedestrian re-recognition network model through back-propagated gradient updates to obtain an optimized lightweight pedestrian re-recognition network model;
and acquiring a test set of the pedestrian re-recognition dataset and inputting it into the optimized lightweight pedestrian re-recognition network model to output the lightweight pedestrian re-recognition result.
2. The lightweight pedestrian re-identification method based on random color discarding and attention as in claim 1, wherein the calculation process of the label-smoothed identity loss L_ID using the classified image features is:
L_ID = Σ_{k=1}^{N} −q_k · log(p_k)
where k indexes the pedestrian categories, N represents the number of pedestrian identities in the training set, y is the ground-truth label, p_k is the logit predicted by the network for class k, and q_k is the smoothed label.
3. A lightweight pedestrian re-identification system based on random color discarding and attention, comprising:
The image processing module is used for receiving the image data, preprocessing the image data and obtaining preprocessed image data;
The preprocessing of the image data comprises the following steps:
data enhancement: performing random flipping, random erasing and an LAGT-based random color discarding strategy to finally obtain preprocessed image data;
the LAGT-based random color discarding strategy grays the image using an aggregate grayscale transformation, computed as:
Gray(i, j) = λ · (R(i, j) + G(i, j) + B(i, j)), 1 ≤ i ≤ H, 1 ≤ j ≤ W
where R, G and B denote the red, green and blue color channels; R(i, j), G(i, j) and B(i, j) denote the pixel values at position (i, j) of the red, green and blue channels, respectively; the weighting coefficient λ is a constant balance factor taken as 1/3; H is the image height and W is the image width;
the implementation process of LAGT is as follows:
during data loading, a random identity sampler randomly selects P identities and K pictures for each selected pedestrian, so the training batch size is b = P × K; the batch is expressed as the set X = {(x_i, y_i)}, i = 1, …, b, where x_i represents the i-th image of the training batch and y_i represents the sample label of the i-th image; LAGT converts the original image into a gray image, randomly selects a rectangular area from the original image, and substitutes the gray values of the corresponding area of the gray image into the original image; given an original pedestrian picture x, an aggregate grayscale transformation is performed with probability p, and the corresponding AGT image is defined as:
x_gray = AGT(x);
the area of the original image x is:
S = H × W
where H is the image height and W is the image width;
the area of the AGT rectangle is:
S_agt = rand(s_l, s_h) × S
where s_l and s_h are the minimum and maximum values of the ratio of the AGT rectangle area to the original image area;
the aspect ratio r_agt of the AGT rectangle, its height H_agt and its width W_agt are:
r_agt = rand(r_1, r_2), H_agt = sqrt(S_agt × r_agt), W_agt = sqrt(S_agt / r_agt)
where r_1 and r_2 are the minimum and maximum values of the aspect ratio of the grayscale-transformation rectangle;
a point P = (x_p, y_p) is randomly initialized in the original image x, satisfying the following conditions:
x_p + W_agt ≤ W, y_p + H_agt ≤ H
where H is the image height and W is the image width;
the selected LAGT region is:
rect = (x_p, y_p, x_p + W_agt, y_p + H_agt);
for each x_i, the LAGT region rect is selected as above, and the LAGT algorithm can finally be expressed as:
x_i' = Φ(x_i, AGT(x_i), rect) with probability p, and x_i' = x_i otherwise,
where Φ(·) assigns the pixels in the corresponding rectangle of the AGT image to the original image x_i, and x_i' is the LAGT-transformed sample;
the feature extraction module is used for inputting the preprocessed image data into the pre-built OSNet embedded with the cascaded self-attention module, extracting features, and outputting image features;
the embedded cascaded self-attention module includes a spatial self-attention module SSAM and a channel self-attention module CSAM;
the intermediate feature map extracted by SSAM from the preprocessed image data is F ∈ R^(C×H×W), where C is the number of feature channels and H × W is the size of the intermediate feature map; 1 × 1 convolution operations are performed on F to obtain A, B, D ∈ R^((C/r)×H×W), where r is the channel-reduction ratio; after reshaping A, B and D to R^(N×(C/r)) with N = H × W, the spatial self-attention affinity matrix S ∈ R^(N×N) is obtained, whose entries s_ji are computed as follows:
s_ji = exp(A_i · B_j) / Σ_{i=1}^{N} exp(A_i · B_j)
where s_ji represents the attention weight of the i-th spatial position on the j-th position; S is multiplied with D to embed the attention weights, and the result is superimposed pixel-wise on the original features to obtain the spatially self-attention-weighted feature map E_s:
E_s = α · Σ_{i=1}^{N} (s_ji · D_i) + F
where α is the hyperparameter adjusting the effect of SSAM; the spatially self-attention-weighted feature map E_s is then processed by CSAM;
for the input spatially self-attention-weighted feature map E_s, CSAM reshapes it to R^(C×N) and obtains the channel self-attention affinity matrix X ∈ R^(C×C), whose entries x_ji are computed as follows:
x_ji = exp(E_i · E_j) / Σ_{i=1}^{C} exp(E_i · E_j)
where x_ji represents the attention weight of channel i on channel j; for X, a matrix M of the same size is initialized whose values all equal the maximum of X, and the new channel self-attention affinity matrix is X' = M − X; X' is multiplied with E_s to embed the attention weights, and the result is superimposed on the corresponding position pixels of E_s to obtain the channel self-attention-weighted feature map E_c:
E_c = β · Σ_{i=1}^{C} (x'_ji · E_i) + E_s
where β is the hyperparameter adjusting the effect of CSAM;
the channel self-attention-weighted feature map E_c yields the image features after passing through the OSNet backbone network;
The image classification module is used for classifying the image features through the fully connected layer and mapping them onto corresponding category labels to obtain classified image features;
the model training module is used for calculating a label-smoothed identity loss using the classified image features, and optimizing a pre-established lightweight pedestrian re-recognition network model through back-propagated gradient updates to obtain an optimized lightweight pedestrian re-recognition network model;
the pedestrian re-recognition module is used for acquiring a test set of the pedestrian re-recognition dataset and inputting it into the optimized lightweight pedestrian re-recognition network model to output the lightweight pedestrian re-recognition result.
4. A terminal device comprising a memory, a processor, and a computer program stored in the memory and capable of running on the processor, characterized in that the processor, when loading and executing the computer program, employs the lightweight pedestrian re-recognition method based on random color discarding and attention as claimed in any one of claims 1 to 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410950010.5A CN118506407B (en) | 2024-07-16 | 2024-07-16 | Light pedestrian re-recognition method and system based on random color discarding and attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410950010.5A CN118506407B (en) | 2024-07-16 | 2024-07-16 | Light pedestrian re-recognition method and system based on random color discarding and attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118506407A CN118506407A (en) | 2024-08-16 |
CN118506407B true CN118506407B (en) | 2024-09-13 |
Family
ID=92229531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410950010.5A Active CN118506407B (en) | 2024-07-16 | 2024-07-16 | Light pedestrian re-recognition method and system based on random color discarding and attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118506407B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117078967A (en) * | 2023-09-04 | 2023-11-17 | 石家庄铁道大学 | Efficient and lightweight multi-scale pedestrian re-identification method |
CN118115947A (en) * | 2024-03-07 | 2024-05-31 | 四川大学 | Cross-mode pedestrian re-identification method based on random color conversion and multi-scale feature fusion |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107273872B (en) * | 2017-07-13 | 2020-05-05 | 北京大学深圳研究生院 | Depth discrimination network model method for re-identification of pedestrians in image or video |
- 2024-07-16: Application CN202410950010.5A filed; granted as patent CN118506407B (status: active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117078967A (en) * | 2023-09-04 | 2023-11-17 | 石家庄铁道大学 | Efficient and lightweight multi-scale pedestrian re-identification method |
CN118115947A (en) * | 2024-03-07 | 2024-05-31 | 四川大学 | Cross-mode pedestrian re-identification method based on random color conversion and multi-scale feature fusion |
Also Published As
Publication number | Publication date |
---|---|
CN118506407A (en) | 2024-08-16 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |