CN117036711A - Weakly supervised semantic segmentation method based on attention adjustment
- Publication number: CN117036711A
- Application number: CN202311064941.7A
- Authority: CN (China)
- Prior art keywords: attention, block, class, semantic segmentation, model
- Prior art date: 2023-08-23
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/26—Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06N3/042—Knowledge-based neural networks; logical representations of neural networks
- G06N3/0455—Auto-encoder networks; encoder-decoder networks
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
Abstract
The invention discloses a weakly supervised semantic segmentation method based on attention adjustment, which explores the application of the Transformer to the weakly supervised semantic segmentation task. Transformer-based methods optimize the class activation map using attention, but the optimized class activation map still suffers from incomplete activation because part of the class-to-block attention is erroneous. To solve this problem, the invention provides a novel weakly supervised semantic segmentation framework in which an attention adjustment strategy is designed: the class-to-block attention is adjusted according to the block-to-block attention, and the adjusted attention can activate more of the target region. Compared with the latest methods, the method provided by the invention achieves the best results on the PASCAL VOC 2012 and MS COCO 2014 datasets.
Description
Technical Field
The invention belongs to the technical field of image segmentation, and particularly relates to a weakly supervised semantic segmentation method based on attention adjustment.
Background
Semantic segmentation is one of the fundamental and challenging tasks in computer vision; its goal is to classify each pixel in an image and assign it to a specific semantic class. Semantic segmentation is widely applied in many fields, such as image recognition, autonomous driving, medical image analysis, scene understanding and video analysis, and helps computers better understand the content of images, thereby enabling automated scene understanding and decision making. In recent years, thanks to the vigorous development of deep learning, semantic segmentation has made remarkable progress; in particular, fully supervised semantic segmentation models are widely applied and perform excellently. However, training a fully supervised semantic segmentation model usually requires large-scale pixel-level annotation, and obtaining pixel-level annotation is difficult, time-consuming and labor-intensive. To address this problem, much work has turned to weakly supervised semantic segmentation, in which the segmentation network is trained with weak labels such as bounding-box labels, point labels, scribble labels or image-level labels. Image-level labels are the most convenient to acquire and are therefore the most widely studied in weakly supervised semantic segmentation.
Although image-level annotations are very convenient to acquire, they do not provide sufficient localization supervision: they only state which object classes an image contains, without indicating where those objects are located. The development of class activation maps (CAMs) provides an efficient way to obtain location information using only image-level labels. For weakly supervised semantic segmentation with image-level labels, most existing approaches follow this procedure: 1) train a convolutional neural network (CNN) with image-level labels and generate class activation maps from it to obtain seed regions; 2) expand the seed regions under certain constraints to obtain pseudo-labels; 3) train a fully supervised semantic segmentation network using the pseudo-labels as ground truth. However, the class activation maps generated by convolutional neural networks tend to activate only local, discriminative regions while ignoring the full object extent, leading to incomplete activation. Research has shown that this is caused by an inherent characteristic of convolutional neural networks: the convolution operation can only capture short-range feature dependencies and cannot explore global feature relationships. The activated object region is therefore too small, which degrades the quality of the generated pseudo-labels and ultimately makes it difficult to obtain an ideal weakly supervised semantic segmentation result.
At present, transform has enjoyed tremendous success in many computer vision tasks, mainly due to its own attention mechanisms. The transducer's attention mechanism can model global feature relationships and overcome the above-described drawbacks of convolutional neural networks. Some researchers have started weak-supervision semantic segmentation studies using transformers, which typically use a Transformer structure to extract image features and generate class activation graphs, and then use attention to optimize the class activation graphs to obtain a more complete class activation graph. Although the existing weak supervision semantic segmentation method based on the Transformer generally uses attention to optimize the class activation graph, the class activation graph still cannot completely activate the object region after being subjected to attention optimization due to errors between the attention middle classification generated by the Transformer and the attention between the blocks.
Disclosure of Invention
The invention aims to solve the problem that the target region cannot be completely activated in weakly supervised semantic segmentation, and provides a weakly supervised semantic segmentation method based on attention adjustment.
The technical scheme is as follows: to achieve the above purpose, the invention adopts the following technical scheme: a weakly supervised semantic segmentation method based on attention adjustment, comprising the following steps:

step 1, data preparation: acquire an annotated image dataset and divide it into a training set, a validation set and a test set;

step 2, data preprocessing: apply random horizontal flipping and color jittering to the image, normalize it, randomly crop it, and take the cropped image as the input of the weakly supervised semantic segmentation model;

step 3, model construction: build the weakly supervised semantic segmentation model with DeiT-S pre-trained on ImageNet as its backbone;

step 4, model training: optimize the weakly supervised semantic segmentation model with the Adam optimizer and multi-label cross entropy as the loss function, train the model on the training set for a set number of epochs, and generate class activation maps with the trained model;

step 5, assign a class to each pixel position according to the values of the class activation map to generate pixel-level pseudo-labels, and then train the semantic segmentation network DeepLabV2 with the pixel-level pseudo-labels; input the pictures of the validation set and the test set into the trained model to obtain the final segmentation maps.
Further, the model construction in step 3 comprises:

step 3.1, constructing a weakly supervised semantic segmentation framework based on attention fusion: segment the preprocessed image into N non-overlapping blocks, construct N block tokens through linear mapping, and concatenate C class tokens with the N block tokens to obtain the input tokens of the framework;

step 3.2, feeding the input tokens into the Transformer encoding layers of the framework to obtain output tokens; then extracting the last N block tokens from the output tokens to form the output block tokens Tp_out, and applying reshape and convolution operations to Tp_out to obtain the initial class activation map Original-CAM;

step 3.3, when the input tokens pass through a Transformer encoding layer, the attention module computes the attention of the input tokens to generate Attention, calculated as

Attention = softmax(QK^T / √d_k)

where Q and K respectively denote the Query matrix and the Key matrix obtained by linear projection of the input tokens in the Transformer encoding layer, T denotes matrix transposition, and d_k denotes the scaling factor;

step 3.4, dividing Attention further into class-to-block attention A_c2p and block-to-block attention A_p2p, and then adjusting the class-to-block attention A_c2p according to the block-to-block attention A_p2p;

step 3.5, optimizing the initial class activation map using the class-to-block attention A_c2p and the block-to-block attention A_p2p.
Further, the class-to-block attention A_c2p and the block-to-block attention A_p2p are expressed as follows:

A_c2p = Attention[1:C, C+1:C+N]

A_p2p = Attention[C+1:C+N, C+1:C+N]

The attention between class c and block i is adjusted as follows:

First, according to the block-to-block attention with block i, sort all blocks in descending order of attention value and select the top p% of the sorted blocks.

Then, the attention between class c and the selected blocks is taken out and averaged to obtain the attention adjustment factor between class c and block i:

r(c,i) = (1/S) · Σ_{j∈U} A_c2p(c,j)

where r(c,i) denotes the attention adjustment factor between class c and block i in A_c2p; c ∈ {1,2,…,C}, where C is the total number of classes in the dataset; i and j denote blocks, i ∈ {1,2,…,N}, j ∈ U, where U is the set of the top p% of blocks with the greatest attention to block i and S is the number of blocks in U; A_c2p(c,j) denotes the attention between class c and block j.

The attention adjustment factor r(c,i) is then added to the attention between class c and block i:

A_c2p(c,i) = A_c2p(c,i) + α·r(c,i)

where A_c2p(c,i) denotes the attention between class c and block i, and α denotes the attention adjustment coefficient.
Further, using the class-to-block attention A_c2p and the block-to-block attention A_p2p to optimize the initial class activation map in step 3.5 comprises:

multiplying the initial class activation map Original-CAM by the class-to-block attention to obtain a preliminarily optimized adjusted class activation map;

then further optimizing by matrix multiplication between the block-to-block attention and the adjusted class activation map to obtain the final class activation map.
Further, the model training process in step 4 is as follows:

step 4.1, set the hyperparameters of the weakly supervised semantic segmentation model: the number of training epochs Epoch, the initial learning rate and the training batch size batch_size; the optimizer used in training is the Adam optimizer and the loss function is the multi-label cross entropy loss;

step 4.2, train the weakly supervised semantic segmentation model for multiple rounds, and save the parameters corresponding to the round with the highest training mIoU value;

step 4.3, after the weakly supervised semantic segmentation model is trained, load the saved best parameters into the model, input the training set data into the model, and the trained model generates complete class activation maps.
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:

The invention mainly solves the problem of incomplete activation of the class activation map in weakly supervised semantic segmentation. Taking the Transformer as the basic network structure, a simple and effective weakly supervised semantic segmentation framework is provided. In this framework, an attention adjustment strategy is first designed: the class-to-block attention is adjusted according to the block-to-block attention, which effectively reduces the probability of erroneous associations between classes and blocks. The class activation map is then optimized with the adjusted attention, so that the target region in the resulting class activation map is activated more completely and accurately, better resolving the incomplete activation problem.
Drawings
FIG. 1 is a diagram of the overall weakly supervised semantic segmentation framework based on attention fusion.

FIG. 2 shows example segmentation results on the PASCAL VOC 2012 validation set.

FIG. 3 shows example segmentation results on the MS COCO 2014 validation set.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
The invention discloses a weakly supervised semantic segmentation method based on attention adjustment, which provides a novel Transformer-based framework for the weakly supervised semantic segmentation task under image-level annotation. The overall structure of the framework is shown in FIG. 1 and mainly comprises three parts: 1) extracting features with a Transformer and generating the initial class activation map; 2) an attention adjustment module, which adjusts the class-to-block attention according to the block-to-block attention, effectively improving the accuracy of the class-to-block attention; 3) optimizing the class activation map with the attention to obtain a more complete and accurate class activation map. The method comprises the following steps:
step 1: data preparation.
The invention uses the PASCAL VOC 2012 dataset and the MS COCO 2014 dataset. The PASCAL VOC 2012 dataset has 21 categories, comprising 20 object classes and one background class; the MS COCO 2014 dataset has 81 categories, comprising 80 object classes and one background class. The PASCAL VOC 2012 dataset is divided into three parts: a training set (1464 images), a validation set (1449 images) and a test set (1456 images), where the training set is typically augmented with additional data to 10582 images. The MS COCO 2014 dataset is divided into two parts: a training set (82081 images) and a validation set (40137 images).
Step 2: and (5) preprocessing data.
And carrying out random horizontal overturn and color dithering treatment on the image, and setting the brightness, contrast and saturation values of the image to 0.3. The image was normalized using transform.normal to be 256×256 in size, and then randomly cropped using transform.random crop to be 224×224 in size. The cropped image is input into the model.
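By way of illustration only, a minimal PyTorch-style sketch of this preprocessing pipeline follows. The garbled transform names above are read as torchvision's Normalize and RandomCrop; the ImageNet normalization statistics and the transform ordering are assumptions not fixed by the text.

```python
import torchvision.transforms as T

# A sketch of the preprocessing described in step 2, assuming torchvision.
preprocess = T.Compose([
    T.Resize((256, 256)),                                         # resize before random cropping
    T.RandomHorizontalFlip(),                                     # random horizontal flip
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),  # jitter values per the text
    T.RandomCrop((224, 224)),                                     # random crop to model input size
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],                       # assumed ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])
```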
Step 3: and (5) building a model.
Step 3.1: and (3) constructing a weak supervision semantic segmentation framework based on attention fusion, segmenting the image preprocessed in the step (2) into N non-overlapping blocks, constructing N block tokens through linear mapping, and splicing C class tokens and N block tokens to obtain an input token of the framework.
Step 3.2: the input token is input to a Transfomer encoding layer in the framework to obtain an output token. The last N block tokens are then extracted from the output tokens to form an output block token Tp_out, which is subjected to a reorganization (Reshape) and convolution (Conv) operation to obtain an initial class activation map Original-CAM.
Original-CAM=Conv(Reshape(Tp_out))
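A minimal sketch of this step is given below; the convolution kernel size and the channel counts (384 for the DeiT-S token dimension, 20 for the PASCAL VOC object classes, N = 14×14 for a 224×224 input with 16×16 patches) are illustrative assumptions.

```python
import torch
import torch.nn as nn

def initial_cam(tp_out: torch.Tensor, h: int, w: int, conv: nn.Conv2d) -> torch.Tensor:
    """tp_out: [B, N, D] output block tokens, with N = h * w blocks."""
    b, n, d = tp_out.shape
    fmap = tp_out.transpose(1, 2).reshape(b, d, h, w)  # Reshape: token sequence -> feature map
    return conv(fmap)                                  # Conv: D channels -> C class activation maps

conv_head = nn.Conv2d(384, 20, kernel_size=3, padding=1)         # kernel size is an assumption
cam = initial_cam(torch.randn(2, 196, 384), 14, 14, conv_head)   # -> [2, 20, 14, 14]
```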
Step 3.3: when the input token passes through the Transfomer coding layer, the Attention module calculates the Attention of the input token to generate Attention, the shape is [ C+N, C+N ], and the calculation formula is as follows:
wherein Q, K represents a matrix array and a Key matrix obtained by linear projection of an input token when the input token passes through a transducer coding layer, T represents matrix transposition, and d k Representing the scaling factor.
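The following is a minimal single-head sketch of this computation; multi-head handling and batching are omitted.

```python
import torch
import torch.nn.functional as F

def attention_map(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """q, k: [C+N, d_k] Query/Key projections of the input tokens.
    Returns the [C+N, C+N] attention map of the encoding layer."""
    d_k = q.shape[-1]
    return F.softmax(q @ k.transpose(-2, -1) / d_k ** 0.5, dim=-1)
```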
Step 3.4: attention can be further divided into class-to-block Attention A c2p Sum block-to-block attention a p2p Wherein A is c2p =Attention[1:C,C+1:C+N],A p2p =Attention[C+1:C+N,C+1:C+N]. Then pass the attention A from block to block p2p To pay attention between classes and blocksForce A c2p And adjusting. If the attention between the class c and the block i is to be adjusted, firstly, sorting the blocks according to the order of the attention values from big to small according to the attention between the blocks, then selecting some blocks which are ranked 30% before sorting, and then calculating the attention between the blocks to obtain the attention adjustment factor between the class c and the block i:
wherein r (c, i) represents A c2p Attention regulator between class C and block i, C e {1,2, …, C } represents the total number of data set classes, i, j represents a block, i e {1,2, …, N }, j e U, U represents a set of blocks of greater attention to block i, S represents the number of blocks in U. Attention adjustment factor r (c, i) is then added to the attention between class c and block i to adjust:
A c2p (c,i)=A c2p (c,i)+α*r(c,i)
wherein A is c2p (c, i) represents the attention between class c and block i, and α represents the attention regulator coefficient.
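The attention adjustment strategy of step 3.4 can be sketched as follows. The vectorized form, the default α = 1.0 and the averaging form of r(c,i) reconstructed above are assumptions, not the exact implementation.

```python
import torch

def adjust_class_to_block_attention(a_c2p: torch.Tensor,
                                    a_p2p: torch.Tensor,
                                    p: float = 0.30,
                                    alpha: float = 1.0) -> torch.Tensor:
    """a_c2p: [C, N] class-to-block attention; a_p2p: [N, N] block-to-block attention."""
    n = a_p2p.shape[-1]
    s = max(1, int(n * p))                # S: size of the selected set U
    top = a_p2p.topk(s, dim=-1).indices   # [N, S]: for each block i, the top p% blocks (set U)
    r = a_c2p[:, top].mean(dim=-1)        # r(c, i): mean of A_c2p(c, j) over j in U(i) -> [C, N]
    return a_c2p + alpha * r              # A_c2p(c, i) + alpha * r(c, i)
```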
Step 3.5: using class-to-block attention A c2p Sum block-to-block attention a p2p To optimize the initial class activation map. The initial class activation diagram initial-CAM is multiplied by class-to-block attention to obtain a preliminary optimized adjustment class activation diagram, and then the adjustment class activation diagram is further optimized by matrix multiplication between the block-to-block attention and the adjustment class activation diagram to obtain a final class activation diagram.
Step 4: and (5) model training.
Step 4.1: setting relevant super parameters of a weak supervision semantic segmentation model, setting the model training frequency Epoch to 60, setting the model training batch batch_size to 64, setting an optimizer used during training to be an Adam optimizer, wherein the loss function is multi-label cross entropy loss, and setting the initial learning rate to 5e-4.
Step 4.2: and carrying out multi-round training on the weak supervision semantic segmentation model, and storing parameters corresponding to the best round of training result (the highest training mIoU value) by observing the training result.
Step 4.3: after the weak supervision semantic segmentation model is trained, the stored best parameters are loaded into the model, then training set data are input into the model, and the trained model can generate a complete class activation diagram.
Step 5: and (3) assigning a class to each pixel position according to the value of the class activation graph to generate a pixel-level pseudo tag, and then training the existing semantic segmentation network deep V2 by using the pixel-level pseudo tag. The pictures in the verification set and the test set are input into the trained model to obtain a final segmentation map, as shown in fig. 2 and 3, the second column is a real segmentation map, the third column is a prediction segmentation map of the invention, and the model prediction segmentation map of the invention is found to be very close to the real segmentation map.
Claims (5)
1. A weakly supervised semantic segmentation method based on attention adjustment, characterized by comprising the following steps:

step 1, data preparation: acquiring an annotated image dataset and dividing it into a training set, a validation set and a test set;

step 2, data preprocessing: applying random horizontal flipping and color jittering to the image, normalizing it, randomly cropping it, and taking the cropped image as the input of the weakly supervised semantic segmentation model;

step 3, model construction: building the weakly supervised semantic segmentation model with DeiT-S pre-trained on ImageNet as its backbone;

step 4, model training: optimizing the weakly supervised semantic segmentation model with the Adam optimizer and multi-label cross entropy as the loss function, training the model on the training set for a set number of epochs, and generating class activation maps with the trained model;

step 5, assigning a class to each pixel position according to the values of the class activation map to generate pixel-level pseudo-labels, and then training the semantic segmentation network DeepLabV2 with the pixel-level pseudo-labels; inputting the pictures of the validation set and the test set into the trained model to obtain the final segmentation maps.
2. The weakly supervised semantic segmentation method based on attention adjustment according to claim 1, wherein the model construction in step 3 comprises:

step 3.1, constructing a weakly supervised semantic segmentation framework based on attention fusion: segmenting the preprocessed image into N non-overlapping blocks, constructing N block tokens through linear mapping, and concatenating C class tokens with the N block tokens to obtain the input tokens of the framework;

step 3.2, feeding the input tokens into the Transformer encoding layers of the framework to obtain output tokens; then extracting the last N block tokens from the output tokens to form the output block tokens Tp_out, and applying reshape and convolution operations to Tp_out to obtain the initial class activation map Original-CAM;

step 3.3, when the input tokens pass through a Transformer encoding layer, the attention module computes the attention of the input tokens to generate Attention, calculated as

Attention = softmax(QK^T / √d_k)

where Q and K respectively denote the Query matrix and the Key matrix obtained by linear projection of the input tokens in the Transformer encoding layer, T denotes matrix transposition, and d_k denotes the scaling factor;

step 3.4, dividing Attention further into class-to-block attention A_c2p and block-to-block attention A_p2p, and then adjusting the class-to-block attention A_c2p according to the block-to-block attention A_p2p;

step 3.5, optimizing the initial class activation map using the class-to-block attention A_c2p and the block-to-block attention A_p2p.
3. The weakly supervised semantic segmentation method based on attention adjustment according to claim 2, wherein the class-to-block attention A_c2p and the block-to-block attention A_p2p are expressed as follows:

A_c2p = Attention[1:C, C+1:C+N]

A_p2p = Attention[C+1:C+N, C+1:C+N]

the attention between class c and block i is adjusted as follows:

first, according to the block-to-block attention with block i, sorting all blocks in descending order of attention value and selecting the top p% of the sorted blocks;

then, taking out the attention between class c and the selected blocks and averaging it to obtain the attention adjustment factor between class c and block i:

r(c,i) = (1/S) · Σ_{j∈U} A_c2p(c,j)

where r(c,i) denotes the attention adjustment factor between class c and block i in A_c2p; c ∈ {1,2,…,C}, where C is the total number of classes in the dataset; i and j denote blocks, i ∈ {1,2,…,N}, j ∈ U, where U is the set of the top p% of blocks with the greatest attention to block i and S is the number of blocks in U; A_c2p(c,j) denotes the attention between class c and block j;

the attention adjustment factor r(c,i) is then added to the attention between class c and block i:

A_c2p(c,i) = A_c2p(c,i) + α·r(c,i)

where A_c2p(c,i) denotes the attention between class c and block i, and α denotes the attention adjustment coefficient.
4. The weakly supervised semantic segmentation method based on attention adjustment according to claim 2, wherein optimizing the initial class activation map using the class-to-block attention A_c2p and the block-to-block attention A_p2p in step 3.5 comprises:

multiplying the initial class activation map Original-CAM by the class-to-block attention to obtain a preliminarily optimized adjusted class activation map;

then further optimizing by matrix multiplication between the block-to-block attention and the adjusted class activation map to obtain the final class activation map.
5. The weakly supervised semantic segmentation method based on attention adjustment according to any one of claims 1 to 4, wherein the model training process in step 4 is as follows:

step 4.1, setting the hyperparameters of the weakly supervised semantic segmentation model: the number of training epochs Epoch, the initial learning rate and the training batch size batch_size; the optimizer used in training is the Adam optimizer and the loss function is the multi-label cross entropy loss;

step 4.2, training the weakly supervised semantic segmentation model for multiple rounds and saving the parameters corresponding to the round with the highest training mIoU value;

step 4.3, after the weakly supervised semantic segmentation model is trained, loading the saved best parameters into the model, inputting the training set data into the model, and generating complete class activation maps with the trained model.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311064941.7A (CN117036711A) | 2023-08-23 | 2023-08-23 | Weakly supervised semantic segmentation method based on attention adjustment
Publications (1)

Publication Number | Publication Date
---|---
CN117036711A | 2023-11-10
Family
ID=88641034
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202311064941.7A (CN117036711A, pending) | Weakly supervised semantic segmentation method based on attention adjustment | 2023-08-23 | 2023-08-23
Country Status (1)

Country | Link
---|---
CN | CN117036711A (en)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117593517A (en) * | 2024-01-19 | 2024-02-23 | 南京信息工程大学 | Camouflage target detection method based on complementary perception cross-view fusion network |
CN117593517B (en) * | 2024-01-19 | 2024-04-16 | 南京信息工程大学 | Camouflage target detection method based on complementary perception cross-view fusion network |
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination