CN116168046B - 3D point cloud semantic segmentation method, system, medium and device under complex environment - Google Patents
3D point cloud semantic segmentation method, system, medium and device under complex environment
- Publication number
- CN116168046B CN116168046B CN202310456371.XA CN202310456371A CN116168046B CN 116168046 B CN116168046 B CN 116168046B CN 202310456371 A CN202310456371 A CN 202310456371A CN 116168046 B CN116168046 B CN 116168046B
- Authority
- CN
- China
- Prior art keywords
- point cloud
- augmentation
- semantic segmentation
- data
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 230000011218 segmentation Effects 0.000 title claims abstract description 78
- 238000000034 method Methods 0.000 title claims abstract description 37
- 230000003416 augmentation Effects 0.000 claims description 50
- 238000013507 mapping Methods 0.000 claims description 12
- 230000003321 amplification Effects 0.000 claims description 11
- 238000009877 rendering Methods 0.000 claims description 11
- 230000008569 process Effects 0.000 claims description 9
- 230000003190 augmentative effect Effects 0.000 claims description 6
- 238000004590 computer program Methods 0.000 claims description 6
- 238000010276 construction Methods 0.000 claims description 6
- 238000011176 pooling Methods 0.000 claims description 6
- 238000010606 normalization Methods 0.000 claims description 5
- 230000006870 function Effects 0.000 claims description 4
- 230000004913 activation Effects 0.000 claims description 3
- 239000011159 matrix material Substances 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 230000003993 interaction Effects 0.000 abstract description 3
- 239000003086 colorant Substances 0.000 description 4
- 238000000605 extraction Methods 0.000 description 4
- 238000012549 training Methods 0.000 description 4
- 230000008901 benefit Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000007689 inspection Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 238000011478 gradient descent method Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the technical field of point cloud semantic segmentation and provides a 3D point cloud semantic segmentation method, system, medium and device for complex environments. To relieve the problem of label dependence, the method relies on a small amount of labeled point cloud data together with a large amount of unlabeled point cloud data. Based on cross-modal learning, the spatial structural features within the 3D modality and the 2D-3D cross-modal correspondence are mined, and the similarity of 2D and 3D features is maximized in a unified semantic feature space. By means of the 2D appearance semantic information and the spatial invariance of point clouds, interaction between modalities is enhanced and the local geometric information of the point cloud can be captured more efficiently. Moreover, the end-to-end network adapts quickly to complex environments and enhances segmentation accuracy.
Description
Technical Field
The invention belongs to the technical field of point cloud semantic segmentation, and particularly relates to a 3D point cloud semantic segmentation method, system, medium and device in a complex environment.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
During inspection, a robot needs to apply point cloud segmentation to sense and identify its surrounding environment, so that intelligent inspection can replace manual work. Point cloud segmentation is a technique for separating different parts of point cloud data, thereby realizing environment perception and object recognition, and it is of great significance for intelligent robot operation. Specifically, the robot scans the surrounding environment with a laser radar or a depth camera to acquire environment point cloud data; through point cloud segmentation, obstacles, floors, walls and the like in the environment can be separated, a map model can be constructed, and surrounding objects can be separated and identified.
However, the point cloud segmentation technique faces two main challenges: (1) Complex environments: in complex environments the difficulty of point cloud segmentation increases greatly; for example, the segmentation effect may be poor for objects with complex shapes, objects with multiple colors, and occluded objects. (2) The amount of annotated point cloud data: point cloud data are huge in volume, but manual annotation is time-consuming and labor-intensive, and training a model with only a small amount of annotated point cloud data severely affects segmentation accuracy. Currently, existing point cloud segmentation models require large amounts of data and computational resources, which makes the training process difficult, and existing models have difficulty explaining their segmentation results, which limits model transparency and reliability.
Disclosure of Invention
In order to solve at least one technical problem in the background art, the invention provides a 3D point cloud semantic segmentation method and system for complex environments, based on a weakly supervised cross-modal 3D point cloud semantic segmentation approach that reduces dependence on data labels and enhances the understanding of 3D point cloud data. Based on the contrastive learning paradigm, the spatial invariance of the 3D point cloud modality is learned by learning distinguishing structural features within the 3D point cloud modality and the visual concept mapping relationship between the 2D image and the 3D point cloud.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the first aspect of the invention provides a 3D point cloud semantic segmentation method under a complex environment, comprising the following steps:
generating a 2D image according to the 3D point cloud data by random rendering;
obtaining a 3D point cloud semantic segmentation result based on the 3D point cloud data, the 2D image and the trained point cloud semantic segmentation model; the construction process of the point cloud semantic segmentation model comprises the following steps:
constructing an augmentation dictionary based on the 3D point cloud data, constructing the augmentation data based on the augmentation dictionary, and extracting 3D image features of the augmentation data through a structure encoder; extracting features of the 2D image to obtain 2D image features;
mapping the 2D image features and the 3D image features to a unified semantic feature space through cross-modal learning, and obtaining feature representation of local points in the 3D point cloud data based on 2D description by capturing the corresponding relation between the two features;
and decoding the learned 3D image features through a decoder to obtain a 3D point cloud semantic segmentation result.
Further, the constructing augmentation data based on the augmentation dictionary specifically includes:
the built augmentation dictionary comprises a plurality of augmentation steps, and each augmentation step is provided with an augmentation factor;
when the 3D point cloud data is augmented, an augmentation probability is randomly generated for each augmentation step; when the augmentation probability is greater than the augmentation factor, the augmentation step is employed, otherwise the augmentation step is not employed.
Further, the 3D image features of the augmented data are extracted by a structural encoder:
step 1: the 3D point cloud data are learned through a convolution block to obtain a first weight, the first weight is multiplied with the 3D point cloud, and a first characteristic tensor is output;
step 2: performing feature extraction on the first feature tensor through the MLP layer, and outputting a second feature tensor; based on the second feature tensor, obtaining a second weight through convolution block learning, and multiplying the second weight by the second feature tensor to obtain a third tensor;
step 3: repeating step 2 while sequentially increasing the feature dimension, and taking the final tensor obtained as the structural coding feature.
Further, the convolution block consists of a multi-layer perceptron shared at each point, a max pooling layer and fully connected layers, and outputs an affine transformation matrix whose size depends on the feature dimension of the input to the convolution block; both the multi-layer perceptron and the max pooling layer include a ReLU activation function and batch normalization operations.
Furthermore, when the point cloud semantic segmentation model is trained, a weakly supervised learning paradigm is adopted: only the part of the data that has labels is upsampled through the decoder, and then the segmentation loss is calculated.
Further, after the augmented data is obtained, the 3D features having distinctiveness in the 3D modality are learned through the contrastive learning paradigm, including: based on the contrastive loss, maximizing the similarity of the augmented data and minimizing the similarity of structural features between different point clouds.
Further, the mapping relation between the 2D image features and the 3D image features is obtained by maximizing feature similarity between the 2D image and the 3D point cloud in a feature space.
A second aspect of the present invention provides a 3D point cloud semantic segmentation system in a complex environment, comprising:
a 2D image rendering module configured to: generating a 2D image according to the 3D point cloud data by random rendering;
a semantic segmentation module configured to: obtaining a 3D point cloud semantic segmentation result based on the 3D point cloud data, the 2D image and the trained point cloud semantic segmentation model; the construction process of the point cloud semantic segmentation model comprises the following steps:
constructing an augmentation dictionary based on the 3D point cloud data, constructing the augmentation data based on the augmentation dictionary, and extracting 3D image features of the augmentation data through a structure encoder; extracting features of the 2D image to obtain 2D image features;
mapping the 2D image features and the 3D image features to a unified semantic feature space through cross-modal learning, and obtaining feature representation of local points in the 3D point cloud data based on 2D description by capturing the corresponding relation between the two features;
and decoding the learned 3D image features through a decoder to obtain a 3D point cloud semantic segmentation result.
A third aspect of the present invention provides a computer-readable storage medium.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps in a 3D point cloud semantic segmentation method in a complex environment as described above.
A fourth aspect of the invention provides an electronic device.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in a 3D point cloud semantic segmentation method in a complex environment as described above when the program is executed.
Compared with the prior art, the invention has the beneficial effects that:
1. According to the invention, based on cross-modal learning, the spatial structural features within the 3D modality and the 2D-3D cross-modal correspondence are mined, and the similarity of the 2D and 3D features is maximized in a unified semantic feature space. By means of the 2D appearance semantic information and the spatial invariance of point clouds, interaction between modalities is enhanced, the local geometric information of the point cloud can be captured more efficiently, and the end-to-end network adapts quickly to complex environments and enhances segmentation accuracy.
2. The method is based on a weak supervision learning paradigm, and the problem of label dependence is relieved by means of a small amount of marked point cloud data and a large amount of unmarked point cloud data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a network frame of a 3D point cloud semantic segmentation method in a complex environment provided by an embodiment of the present invention.
Fig. 2 is a block diagram of a structure encoder and decoder provided in an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Interpretation of the terms
A complex environment refers to a scene of objects having a variety of shapes, sizes, textures, colors, and relationships encountered in the real world. In point cloud segmentation techniques, complex environments can present many challenges to algorithms and models. The following are some specific features of the complex environment:
objects of complex shape: the physical world objects vary in shape from simple geometric shapes (e.g., cubes, spheres) to objects with complex curves and structures (e.g., vegetation, buildings). Complex shaped objects can make point cloud segmentation more difficult.
Objects of multiple colors: objects in the environment may have various colors and textures, which may make it difficult for a point cloud segmentation algorithm to distinguish between adjacent objects or to identify different portions of the same object.
Occlusion and overlap: objects in a complex environment may obscure or overlap each other, making it difficult for the point cloud segmentation algorithm to segment them correctly. In addition, occlusion can also lead to missing and incomplete information in the point cloud data.
Background noise and spurious points: background noise and stray points may be present in the point cloud data due to sensor errors, ambient light, etc. These noise points can interfere with the performance of the point cloud segmentation algorithm.
Dynamic environment: objects in a complex environment may move at different points in time, resulting in dynamic changes in point cloud data. Processing these dynamic changes is a challenge for point cloud segmentation algorithms.
Large scale and high density: point cloud data in complex environments is often characterized by large scale and high density, meaning that point cloud segmentation algorithms need to handle a large number of points and complex relationships between them. This not only increases the computational complexity, but may also lead to memory and storage problems.
Example 1
As shown in fig. 1-2, the present embodiment provides a 3D point cloud semantic segmentation method in a complex environment, including the following steps:
s1: and generating a 2D image according to the 3D point cloud data through random rendering.
Acquiring a 3D point cloud dataset:
D = {(P_i, I_i, Y_i)}_{i=1}^{N_L} ∪ {(P_j, I_j)}_{j=1}^{N_U},
where N_L denotes the number of labeled point clouds and N_U the number of unlabeled point clouds; P denotes point cloud data and I denotes the 2D image rendered from P; each point cloud P contains n data points whose feature dimension is d, the features including position and color information; H and W denote the height and width of the 2D image I; and Y denotes the label of the point cloud data.
For each point cloud P, this embodiment uses a pre-trained DISN (Deep Implicit Surface Network) to render it from a randomly selected angle and obtain the 2D image I.
S2: distinguishing structural features within the 3D modality are learned.
By utilizing the geometric invariance of point clouds in 3D space, two augmented versions of the point cloud data are constructed, and distinguishing 3D features within the 3D modality are learned based on the contrastive learning paradigm.
Specifically, an augmentation dictionary is first built, and a plurality of augmentation steps such as rotation, scaling, translation, normalization, elastic distortion and the like are included in the dictionary, wherein each augmentation step is provided with an augmentation factor with a value between 0 and 1.
For convenience, the augmentation factor of all augmentation steps is set to 0.5 in this embodiment.
When the point cloud data is augmented, an augmentation probability is randomly generated for each augmentation step; when this value is larger than the augmentation factor, the augmentation step is adopted, otherwise it is not adopted.
Thus, for each point cloud, different augmentation data may be obtained by means of a random combination of steps based on the augmentation dictionary.
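As an illustration of this augmentation procedure, the following Python sketch builds the dictionary with the operations and the 0.5 factor named in this embodiment; the concrete transform implementations (rotation axis, scale range, jitter magnitudes) are simplified assumptions rather than the patent's exact parameters.

```python
import numpy as np

AUGMENT_DICT = {
    "rotate":    0.5,   # each entry: augmentation step -> augmentation factor in [0, 1]
    "scale":     0.5,
    "translate": 0.5,
    "normalize": 0.5,
    "elastic":   0.5,
}

def rotate(p):      # random rotation about the z-axis (assumed axis)
    theta = np.random.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return np.concatenate([p[:, :3] @ rot.T, p[:, 3:]], axis=1)

def scale(p):       # random isotropic scaling of the coordinates
    return np.concatenate([p[:, :3] * np.random.uniform(0.8, 1.2), p[:, 3:]], axis=1)

def translate(p):   # random global shift
    return np.concatenate([p[:, :3] + np.random.uniform(-0.2, 0.2, 3), p[:, 3:]], axis=1)

def normalize(p):   # center the coordinates and scale them to a unit cube
    xyz = p[:, :3] - p[:, :3].mean(0)
    return np.concatenate([xyz / np.abs(xyz).max(), p[:, 3:]], axis=1)

def elastic(p):     # crude stand-in for elastic distortion: small per-point jitter
    return np.concatenate([p[:, :3] + np.random.normal(0, 0.01, p[:, :3].shape), p[:, 3:]], axis=1)

STEPS = {"rotate": rotate, "scale": scale, "translate": translate,
         "normalize": normalize, "elastic": elastic}

def augment(points):
    """points: (n, d) array, columns 0..2 are xyz, the remaining columns are colors etc."""
    out = points.copy()
    for name, factor in AUGMENT_DICT.items():
        # randomly generate an augmentation probability for this step;
        # the step is applied only when the probability exceeds the factor
        if np.random.rand() > factor:
            out = STEPS[name](out)
    return out

# two augmented versions of the same point cloud, as used by the contrastive stage:
# p1, p2 = augment(points), augment(points)
```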
In the invention, considering the problem of computational complexity, only two augmented versions of each point cloud are used.
Since point cloud data has invariance in geometric space, the data after augmentation is also similar in feature space.
The structural coding features of the two augmented data are extracted by one structure encoder. The augmented data generated from the same point cloud should be similar in feature space, while the augmented data generated from different point clouds should be as far apart as possible in feature space.
Maximizing the similarity of the two augmented versions of the same point cloud while minimizing the similarity of structural features between different point clouds can be defined based on the contrastive loss InfoNCE as:
L_struct = -log( exp(sim(z_i^1, z_i^2)/τ) / Σ_{k=1}^{K} exp(sim(z_i^1, z_k^2)/τ) ),
where sim(·,·) denotes the cosine similarity, τ is a temperature hyperparameter, z_i^1 and z_i^2 denote the structural coding features of the two augmented versions of the i-th point cloud, and K is the number of point clouds participating in training in the same batch.
This objective of maximizing the similarity of augmented versions of the same point cloud and minimizing the similarity of structural features among different point clouds is used to calculate the model gradient, and the model is optimized by the stochastic gradient descent method.
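For clarity, the contrastive objective above can be written in PyTorch roughly as follows; this is a minimal sketch of a standard InfoNCE loss using the cosine similarity, temperature τ (here `tau`) and batch size K described above, not the patent's exact implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """z1, z2: (K, C) structural features of the two augmentations of the same K point clouds."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    # cosine similarity between every pair of augmented clouds in the batch
    logits = z1 @ z2.t() / tau                      # (K, K)
    targets = torch.arange(z1.size(0), device=z1.device)
    # diagonal entries (same point cloud) are positives,
    # all other entries (different point clouds) are negatives
    return F.cross_entropy(logits, targets)
```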
The specific coding process of the structure encoder is as follows:
S2.1: The input 3D point cloud passes through a convolution block that learns a weight matrix; the weight is multiplied with the input 3D point cloud so that the point cloud is aligned, ensuring the model's invariance to specific spatial transformations, and a feature tensor is output.
S2.2: The feature tensor obtained in S2.1 is passed through the MLP layer for feature extraction, and a new feature tensor is output. From this feature tensor, a weight is learned through the convolution block; the weight is multiplied with the input feature tensor, and a two-dimensional tensor is output.
S2.3: Step S2.2 is repeated 3 times, sequentially increasing the feature dimension to [128, 512, 1024]; the final two-dimensional tensor is taken as the structural feature.
In S2.2, the convolution block is composed of a multi-layer perceptron (MLP) shared at each point, a max pooling layer and two fully connected (FC) layers, and outputs an affine transformation matrix whose size depends on the feature dimension of the input to the convolution block. Notably, all layers of the convolution block except the last include a ReLU activation function and a batch normalization (BN) operation.
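A possible realization of S2.1–S2.3 is sketched below, assuming a PointNet-style design in which the convolution block is a small network (shared MLP, max pooling, FC layers) predicting a square affine matrix; the input feature dimension of 6 (xyz plus color) and the initial width of 64 are assumptions, while the widths 128, 512 and 1024 follow the embodiment.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Point-shared MLP + max pooling + FC layers, outputting a k×k affine matrix."""
    def __init__(self, k: int):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Conv1d(k, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(1024, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, k * k),                      # last layer: no ReLU/BN
        )

    def forward(self, x):                               # x: (B, k, n)
        w = self.mlp(x).max(dim=2).values               # (B, 1024) after max pooling
        w = self.fc(w).view(-1, self.k, self.k)         # (B, k, k) affine matrix
        return torch.bmm(x.transpose(1, 2), w).transpose(1, 2)  # re-weighted features

class StructureEncoder(nn.Module):
    def __init__(self, in_dim: int = 6, dims=(64, 128, 512, 1024)):
        super().__init__()
        self.align = ConvBlock(in_dim)                  # S2.1: align the raw point cloud
        stages, prev = [], in_dim
        for d in dims:                                  # S2.2, repeated per S2.3
            stages.append(nn.ModuleDict({
                "mlp": nn.Sequential(nn.Conv1d(prev, d, 1), nn.BatchNorm1d(d), nn.ReLU()),
                "block": ConvBlock(d),
            }))
            prev = d
        self.stages = nn.ModuleList(stages)

    def forward(self, pts):                             # pts: (B, n, in_dim)
        x = self.align(pts.transpose(1, 2))             # (B, in_dim, n)
        for stage in self.stages:
            x = stage["block"](stage["mlp"](x))         # shared MLP, then learned re-weighting
        return x                                        # (B, 1024, n) structural features
```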
S3: and learning a cross-modal mapping relationship between the 3D point cloud data and the 2D image.
In S3, the 2D image is rendered from the point cloud data at a randomly selected angle and carries the planar appearance information of the point cloud, so the two naturally possess a mapping relationship.
Through inter-modality learning, 2D image features and 3D image features can be mapped to a unified semantic feature space, and by capturing correspondence between the two, a more generalized feature representation of local points in 3D point cloud data can be obtained based on the 2D description.
Because of the large attribute difference between 2D images and 3D point clouds, the invention uses the common ResNet network as the image encoder to extract features from the 2D image and obtain the 2D features. In order to learn the cross-modal mapping relationship of the point cloud data, the structural features of the point cloud are encoded at the same time to obtain the 3D feature representation of the point cloud data.
The cross-modal objective is to maximize the feature similarity between the 2D image and the 3D point cloud within the feature space. By maximizing this feature similarity, the cross-modal mapping relationship of the point cloud data can be learned: according to the spatial invariance assumption, the same part of an object, such as a sofa armrest, has features similar to those of the 3D object (3D point cloud) no matter from which angle the object is observed (i.e., in the 2D image), so the feature similarity between the two should be maximized.
The scheme has the advantage that based on the cross-modal learning, the corresponding relation between the spatial structure characteristics in the 3D mode and the 2D-3D cross-mode is mined. By maximizing the similarity of 2D and 3D features in a unified semantic feature space, interaction between modes is enhanced by means of 2D appearance semantic information and the space invariant characteristic of point clouds, local geometric information of the point clouds can be captured more efficiently, complex environments can be quickly adapted through an end-to-end network, and segmentation accuracy is enhanced.
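The cross-modal step of S3 could be sketched as follows, with a ResNet-18 image encoder, projection heads into a shared 256-dimensional space, and a contrastive loss pulling each rendered image toward its own point cloud; the specific ResNet depth, pooling and projection choices are assumptions rather than the patent's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class CrossModalHead(nn.Module):
    def __init__(self, pc_dim: int = 1024, embed_dim: int = 256):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])  # (B, 512, 1, 1)
        self.proj_2d = nn.Linear(512, embed_dim)
        self.proj_3d = nn.Linear(pc_dim, embed_dim)

    def forward(self, images, pc_features):
        # images: (B, 3, H, W); pc_features: (B, pc_dim, n) from the structure encoder
        f2d = self.proj_2d(self.image_encoder(images).flatten(1))   # (B, embed_dim)
        f3d = self.proj_3d(pc_features.max(dim=2).values)           # pooled global 3D feature
        return F.normalize(f2d, dim=1), F.normalize(f3d, dim=1)

def cross_modal_loss(f2d, f3d, tau: float = 0.07):
    # maximize the similarity of each image with its own point cloud (diagonal entries)
    logits = f2d @ f3d.t() / tau
    targets = torch.arange(f2d.size(0), device=f2d.device)
    return F.cross_entropy(logits, targets)
```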
S4: and outputting a prediction semantic segmentation result.
The learned 3D structural features are mapped back to the original point cloud size through a decoder.
Since only part of the point cloud data in the dataset has labels, only the labeled data is upsampled through the decoder, and then the segmentation loss is calculated.
The specific upsampling process of the decoder is as follows:
S4.1: The 3D structural features and the 2D appearance features are fused to form the input of the decoder, where the fusion weight is a learnable parameter. A convolution block learns a weight that is multiplied with the input features, and a two-dimensional feature tensor is output.
S4.2: The tensor obtained in S4.1 is passed through a shared MLP layer for feature extraction, and a feature tensor is output. From this feature tensor, a weight is learned through the convolution block; the weight is multiplied with the input feature tensor, and a two-dimensional tensor is output.
S4.3: Step S4.2 is repeated 3 times, sequentially adjusting the feature dimension; the final tensor is passed through a fully connected layer and a softmax function to obtain the final semantic segmentation prediction result.
In this embodiment, the segmentation loss in S4 uses a cross entropy loss.
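The decoding and weakly supervised loss of S4 can be illustrated as follows; the fusion here is a simple concatenation, the intermediate widths are assumptions, and the 20-class output matches the ScanNetv2 benchmark used in the experiment below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegDecoder(nn.Module):
    def __init__(self, in_dim: int = 1024 + 256, num_classes: int = 20):
        super().__init__()
        self.stages = nn.Sequential(
            nn.Conv1d(in_dim, 512, 1), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Conv1d(512, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
        )
        self.head = nn.Conv1d(128, num_classes, 1)      # final per-point classification layer

    def forward(self, pc_features, f2d):
        # pc_features: (B, 1024, n) structural features; f2d: (B, 256) global 2D appearance feature
        f2d = f2d.unsqueeze(2).expand(-1, -1, pc_features.size(2))
        fused = torch.cat([pc_features, f2d], dim=1)    # simple concatenation fusion (assumption)
        return self.head(self.stages(fused))            # (B, num_classes, n) logits

def weakly_supervised_seg_loss(logits, labels, has_label_mask):
    # labels: (B, n) class ids; has_label_mask: (B,) bool, True for labeled point clouds
    if not has_label_mask.any():
        return logits.new_zeros(())
    return F.cross_entropy(logits[has_label_mask], labels[has_label_mask])
```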
Throughout the segmentation framework, the unlabeled data is applied only in S2 and S3, where it is used to mine useful structural and appearance information from the point cloud data and to optimize the structure encoder of S2. The labeled data, besides also optimizing the structure encoder, is mainly used to learn the model weights of the decoder.
The above scheme has the advantage that the problem of tag dependence is alleviated by means of a small amount of marked point cloud data and a large amount of unmarked point cloud data based on a weakly supervised learning paradigm.
The scheme of the invention can be applied to instance segmentation in unmanned-driving urban scenes, but is not limited to such scenes; it can also be applied to other complex environments.
Table 1 is a simulation experiment based on the open source 3D point cloud dataset ScanNetv2, where only 20% of the labels were used in the training set and the remaining 80% of the data were treated as unlabeled data. The experiment adopts three evaluation indices: the overall classification accuracy PA (Point Accuracy), i.e., the ratio of correctly classified points to the total number of points in the point cloud; the mean classification accuracy MPA (Mean Point Accuracy), i.e., the ratio of correctly classified points of each class to all points of that class, averaged over the classes; and the mean per-class IoU value MIoU (Mean Intersection over Union).
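The three evaluation indices are standard and can be computed from a confusion matrix as in the following sketch (generic metric code, not taken from the patent):

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """pred, gt: flattened 1-D integer label arrays of equal length."""
    idx = gt * num_classes + pred
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def point_accuracy(cm):                      # PA: correctly classified points / all points
    return np.diag(cm).sum() / cm.sum()

def mean_point_accuracy(cm):                 # MPA: per-class accuracy, then averaged
    per_class = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
    return per_class.mean()

def mean_iou(cm):                            # MIoU: per-class intersection over union, averaged
    inter = np.diag(cm)
    union = cm.sum(axis=1) + cm.sum(axis=0) - inter
    return (inter / np.maximum(union, 1)).mean()
```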
Table 1 comparison of the accuracy of the invention with other algorithms
Example two
The embodiment provides a 3D point cloud semantic segmentation system under a complex environment, which comprises the following steps:
a 2D image rendering module configured to: generating a 2D image according to the 3D point cloud data by random rendering;
a semantic segmentation module configured to: obtaining a 3D point cloud semantic segmentation result based on the 3D point cloud data, the 2D image and the trained point cloud semantic segmentation model; the construction process of the point cloud semantic segmentation model comprises the following steps:
constructing an augmentation dictionary based on the 3D point cloud data, constructing the augmentation data based on the augmentation dictionary, and extracting 3D image features of the augmentation data through a structure encoder; extracting features of the 2D image to obtain 2D image features;
mapping the 2D image features and the 3D image features to a unified semantic feature space through cross-modal learning, and obtaining feature representation of local points in the 3D point cloud data based on 2D description by capturing the corresponding relation between the two features;
and decoding the learned 3D image features through a decoder to obtain a 3D point cloud semantic segmentation result.
Example III
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in a 3D point cloud semantic segmentation method under a complex environment as described above.
Example IV
The embodiment provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps in the 3D point cloud semantic segmentation method under the complex environment when executing the program.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. The 3D point cloud semantic segmentation method under the complex environment is characterized by comprising the following steps of:
generating a 2D image according to the 3D point cloud data by random rendering;
obtaining a 3D point cloud semantic segmentation result based on the 3D point cloud data, the 2D image and the trained point cloud semantic segmentation model; the construction process of the point cloud semantic segmentation model comprises the following steps:
constructing an augmentation dictionary based on the 3D point cloud data, constructing the augmentation data based on the augmentation dictionary, and extracting 3D image features of the augmentation data through a structure encoder; extracting features of the 2D image to obtain 2D image features;
mapping the 2D image features and the 3D image features to a unified semantic feature space through cross-modal learning, and obtaining feature representation of local points in the 3D point cloud data based on 2D description by capturing the corresponding relation between the two features;
decoding the learned 3D image features through a decoder to obtain a 3D point cloud semantic segmentation result;
the construction of the augmentation data based on the augmentation dictionary specifically comprises the following steps:
the built augmentation dictionary comprises a plurality of augmentation steps, and each augmentation step is provided with an augmentation factor;
when the 3D point cloud data is augmented, an augmentation probability is randomly generated for each augmentation step; when the augmentation probability is greater than the augmentation factor, the augmentation step is employed, otherwise the augmentation step is not employed;
the 3D image features of the augmented data are extracted by a structural encoder:
step 1: the 3D point cloud data are learned through a convolution block to obtain a first weight, the first weight is multiplied with the 3D point cloud, and a first characteristic tensor is output;
step 2: performing feature extraction on the first feature tensor through the multi-layer perceptron, and outputting a second feature tensor; based on the second feature tensor, obtaining a second weight through convolution block learning, and multiplying the second weight by the second feature tensor to obtain a third tensor;
step 3: repeating step 2 while sequentially increasing the feature dimension, and taking the final tensor obtained as the structural coding feature.
2. The 3D point cloud semantic segmentation method according to claim 1, wherein the convolution block consists of a multi-layer perceptron shared at each point, a max pooling layer and a fully connected layer, and outputs an affine transformation matrix whose size depends on the feature dimension of the input to the convolution block; both the multi-layer perceptron and the max pooling layer include a ReLU activation function and batch normalization operations.
3. The 3D point cloud semantic segmentation method under the complex environment according to claim 1, wherein when the point cloud semantic segmentation model is trained, only part of data with labels is up-sampled through a decoder by adopting a weak supervision learning paradigm, and then segmentation loss is calculated.
4. The method for 3D point cloud semantic segmentation in a complex environment according to claim 1, wherein learning the 3D features having distinctiveness in the 3D modality through the contrastive learning paradigm after obtaining the augmented data comprises: based on the contrastive loss, maximizing the similarity of the augmented data and minimizing the similarity of structural features between different point clouds.
5. The 3D point cloud semantic segmentation method under the complex environment according to claim 1, wherein the mapping relationship between the 2D image features and the 3D image features is obtained by maximizing feature similarity between the 2D image and the 3D point cloud in a feature space.
6. The 3D point cloud semantic segmentation system in the complex environment is realized by applying the 3D point cloud semantic segmentation method in the complex environment as claimed in claim 1, and is characterized by comprising the following steps:
a 2D image rendering module configured to: generating a 2D image according to the 3D point cloud data by random rendering;
a semantic segmentation module configured to: obtaining a 3D point cloud semantic segmentation result based on the 3D point cloud data, the 2D image and the trained point cloud semantic segmentation model; the construction process of the point cloud semantic segmentation model comprises the following steps:
constructing an augmentation dictionary based on the 3D point cloud data, constructing the augmentation data based on the augmentation dictionary, and extracting 3D image features of the augmentation data through a structure encoder; extracting features of the 2D image to obtain 2D image features;
mapping the 2D image features and the 3D image features to a unified semantic feature space through cross-modal learning, and obtaining feature representation of local points in the 3D point cloud data based on 2D description by capturing the corresponding relation between the two features;
and decoding the learned 3D image features through a decoder to obtain a 3D point cloud semantic segmentation result.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the 3D point cloud semantic segmentation method in a complex environment according to any of claims 1-5.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the 3D point cloud semantic segmentation method in a complex environment according to any of claims 1-5 when the program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310456371.XA CN116168046B (en) | 2023-04-26 | 2023-04-26 | 3D point cloud semantic segmentation method, system, medium and device under complex environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310456371.XA CN116168046B (en) | 2023-04-26 | 2023-04-26 | 3D point cloud semantic segmentation method, system, medium and device under complex environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116168046A CN116168046A (en) | 2023-05-26 |
CN116168046B true CN116168046B (en) | 2023-08-25 |
Family
ID=86420429
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310456371.XA Active CN116168046B (en) | 2023-04-26 | 2023-04-26 | 3D point cloud semantic segmentation method, system, medium and device under complex environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116168046B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116612285B (en) * | 2023-06-15 | 2024-09-20 | 重庆市测绘科学技术研究院 | Building point cloud data segmentation and point cloud data semantic segmentation method and system |
CN116740820B (en) * | 2023-08-16 | 2023-10-31 | 南京理工大学 | Single-view point cloud three-dimensional human body posture and shape estimation method based on automatic augmentation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112233124A (en) * | 2020-10-14 | 2021-01-15 | 华东交通大学 | Point cloud semantic segmentation method and system based on countermeasure learning and multi-modal learning |
CN113239749A (en) * | 2021-04-27 | 2021-08-10 | 四川大学 | Cross-domain point cloud semantic segmentation method based on multi-modal joint learning |
CN114067112A (en) * | 2021-11-06 | 2022-02-18 | 西北工业大学 | Point cloud segmentation method based on quick graph convolution |
CN114241226A (en) * | 2021-12-07 | 2022-03-25 | 电子科技大学 | Three-dimensional point cloud semantic segmentation method based on multi-neighborhood characteristics of hybrid model |
CN115601275A (en) * | 2022-09-07 | 2023-01-13 | 北京紫光展锐通信技术有限公司(Cn) | Point cloud augmentation method and device, computer readable storage medium and terminal equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11687087B2 (en) * | 2020-03-12 | 2023-06-27 | Honda Motor Co., Ltd. | Systems and methods for shared cross-modal trajectory prediction |
-
2023
- 2023-04-26 CN CN202310456371.XA patent/CN116168046B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112233124A (en) * | 2020-10-14 | 2021-01-15 | 华东交通大学 | Point cloud semantic segmentation method and system based on countermeasure learning and multi-modal learning |
CN113239749A (en) * | 2021-04-27 | 2021-08-10 | 四川大学 | Cross-domain point cloud semantic segmentation method based on multi-modal joint learning |
CN114067112A (en) * | 2021-11-06 | 2022-02-18 | 西北工业大学 | Point cloud segmentation method based on quick graph convolution |
CN114241226A (en) * | 2021-12-07 | 2022-03-25 | 电子科技大学 | Three-dimensional point cloud semantic segmentation method based on multi-neighborhood characteristics of hybrid model |
CN115601275A (en) * | 2022-09-07 | 2023-01-13 | 北京紫光展锐通信技术有限公司(Cn) | Point cloud augmentation method and device, computer readable storage medium and terminal equipment |
Non-Patent Citations (1)
Title |
---|
Detection and segmentation of unlearned objects in unknow environment; Zhang Jianhua; IEEE; full text *
Also Published As
Publication number | Publication date |
---|---|
CN116168046A (en) | 2023-05-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | A review of deep learning-based semantic segmentation for point cloud | |
Han et al. | Image-based 3D object reconstruction: State-of-the-art and trends in the deep learning era | |
Garcia-Garcia et al. | A review on deep learning techniques applied to semantic segmentation | |
CN115240121B (en) | Joint modeling method and device for enhancing local features of pedestrians | |
CN116168046B (en) | 3D point cloud semantic segmentation method, system, medium and device under complex environment | |
Zhang et al. | Deep hierarchical guidance and regularization learning for end-to-end depth estimation | |
Yuniarti et al. | A review of deep learning techniques for 3D reconstruction of 2D images | |
JP2023073231A (en) | Method and device for image processing | |
CN110852182A (en) | Depth video human body behavior recognition method based on three-dimensional space time sequence modeling | |
Cho et al. | Semantic segmentation with low light images by modified CycleGAN-based image enhancement | |
CN113345106A (en) | Three-dimensional point cloud analysis method and system based on multi-scale multi-level converter | |
CN117197727B (en) | Global space-time feature learning-based behavior detection method and system | |
Cao et al. | Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure | |
CN115147601A (en) | Urban street point cloud semantic segmentation method based on self-attention global feature enhancement | |
Li et al. | Deep learning based monocular depth prediction: Datasets, methods and applications | |
CN116912296A (en) | Point cloud registration method based on position-enhanced attention mechanism | |
CN112488117B (en) | Point cloud analysis method based on direction-induced convolution | |
CN117475228A (en) | Three-dimensional point cloud classification and segmentation method based on double-domain feature learning | |
Cao et al. | Label-efficient deep learning-based semantic segmentation of building point clouds at LOD3 level | |
Gao et al. | Semantic Segmentation of Substation Site Cloud Based on Seg-PointNet | |
CN117689887A (en) | Workpiece grabbing method, device, equipment and storage medium based on point cloud segmentation | |
Malah et al. | Generating 3D Reconstructions Using Generative Models | |
Wang et al. | PatchCNN: An explicit convolution operator for point clouds perception | |
CN114638953A (en) | Point cloud data segmentation method and device and computer readable storage medium | |
CN114998990B (en) | Method and device for identifying safety behaviors of personnel on construction site |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |