CN114444613A

CN114444613A - Object classification and object segmentation method based on 3D point cloud information

Info

Publication number: CN114444613A
Application number: CN202210127425.3A
Authority: CN
Inventors: 刘振泽; 陈金炎; 董迪锴; 吴闯; 何井全; 张渝敏; 周亚仑
Original assignee: Jilin University
Current assignee: Jilin University
Priority date: 2022-02-11
Filing date: 2022-02-11
Publication date: 2022-05-06

Abstract

The invention relates to an object classification and object segmentation method based on 3D point cloud information. It belongs to the field of 3D point cloud processing and specifically addresses the difficulty of processing unordered point clouds and of extracting point cloud features. To extract more effective information when extracting point cloud features, the invention proposes and uses a new structure that combines global features and local features. When extracting high-dimensional features, a new extraction mechanism replaces the traditional symmetric function. Extensive experiments show that the neural network designed by this method can handle a variety of point cloud tasks, including but not limited to object classification, object segmentation and scene segmentation. The beneficial effects of the invention are as follows: the neural network, named Mix-Net, is designed specifically for point cloud recognition and segmentation; it innovates in several modules, including the down-sampling algorithm and the feature extraction and fusion mechanisms, and significantly improves the accuracy of point cloud recognition and segmentation.

Description

Object classification and object segmentation method based on 3D point cloud information

Technical Field

The invention relates to the field of 3D point cloud processing, in particular to the difficulty of processing unordered point clouds and of extracting point cloud features, and provides an object classification and object segmentation method based on 3D point cloud information.

Background

3D point cloud technology is currently applied in intelligent driving, virtual reality and robotics. Its main aim is to extract effective features from a 3D point cloud to describe the category of the point cloud or its segmented parts. Two properties need attention when processing 3D point clouds: order disorder and permutation disorder. Order disorder means that a point cloud may have N different representations in its container, yet every representation still describes the same object. Permutation disorder refers to simple rotations and translations of the point cloud in the measurement space: in a standard coordinate system, the values stored in the container change with the point cloud, yet they still represent the same target object. In both cases, similar or identical features must be extracted from the multiple representations in the container to characterize the point cloud, which is a difficult point in point cloud processing. At present, the prior art only aims at extracting overall point cloud features and rarely provides an efficient end-to-end neural network. Compared with existing neural networks, the present method significantly improves accuracy in object classification and object segmentation tasks.

Disclosure of Invention

To overcome the defects of the prior art, the invention provides an object classification and object segmentation method based on 3D point cloud information. It aims to improve the accuracy of point cloud recognition and segmentation and to reduce the information loss caused by dimension reduction or voxelization, and it can be widely used with various sensing devices.

The invention is further described with reference to the accompanying drawings:

The object classification and object segmentation method based on 3D point cloud information comprises the following steps:

Step one, adding position codes to the input point cloud information. Relative position information is added to the input point cloud information, both at the first input layer and in the attention function. Extensive experiments show that adding the relative position codes improves network performance and the accuracy of the final output.

Step two, a data down-sampling algorithm. Applying a down-sampling algorithm to the input reduces the complexity of the whole network, accelerates network training and suppresses overfitting. The invention provides a new attention-based down-sampling algorithm to replace the traditional farthest point sampling algorithm.

Step three, a multi-head attention mechanism fused with relative position information. As the main feature extraction layer of the network, a multi-head attention layer fused with relative position information replaces the traditional convolutional layer. Experiments show that the multi-head attention layer improves network performance and the accuracy of the final output.

Step four, designing the structure of the point cloud encoder. The point cloud encoder extracts high-dimensional feature vectors from the input point cloud data; these feature vectors contain both global and local point cloud feature information. When the features are fused to obtain the overall point cloud feature, the invention provides a new feature fusion mechanism to replace the traditional symmetric function.

Step five, designing the structure of the point cloud decoder. The invention designs a different decoder structure for each point cloud processing task; the structures are broadly similar but differ in detail.

Step six, designing the Mix-Net network training method. This mainly comprises designing the loss function, the optimizer and the training hyperparameters for the different point cloud processing tasks and data sets.

Further, the position coding in step one is implemented as follows: because the container of the input point cloud can be stored in many ways, it is difficult for the network to obtain directly the relative position information between points (that is, between different elements of the same input). Relative position information is therefore introduced to strengthen the extraction of positional features between points. The position coding formula of the invention is as follows:

where P_i is the input data or feature, the mean term is the average of the input data or features, and MLP is a standard multi-layer perceptron. Repeated experiments show that adding position information at the first input layer and in the attention mechanism module significantly improves the performance of the whole network.
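The position coding described above can be illustrated with a minimal sketch. The formula itself is not reproduced in this text, so the sketch assumes one plausible form: each point is centered on the mean of the input, and the offset is passed through a small MLP. The function name `relative_position_encoding`, the two-layer ReLU MLP and the random weights are illustrative assumptions, not the formula of the invention.

```python
import numpy as np

def relative_position_encoding(points, w1, b1, w2, b2):
    """Assumed form: MLP(P_i - mean(P)) with a two-layer ReLU MLP (hypothetical)."""
    offsets = points - points.mean(axis=0, keepdims=True)  # position relative to the mean
    hidden = np.maximum(offsets @ w1 + b1, 0.0)             # ReLU hidden layer
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
points = rng.normal(size=(8, 3))                   # 8 points with xyz coordinates
w1, b1 = rng.normal(size=(3, 16)), np.zeros(16)    # illustrative random weights
w2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

encoded = relative_position_encoding(points, w1, b1, w2, b2)
shifted = relative_position_encoding(points + 5.0, w1, b1, w2, b2)
```

Under this assumed form, centering on the mean makes the encoding unchanged when the whole point cloud is translated, which is one way relative position information can help with translated inputs.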

Further, the main implementation process of the data down-sampling algorithm in the second step is as follows:

(1) A multi-head-attention-based algorithm replaces the traditional farthest point sampling algorithm. The features output by the convolution are passed through a multi-head attention mechanism to obtain attention scores, and selecting the k largest attention scores achieves an effect similar to collecting k points by farthest point sampling. Experiments show that this down-sampling algorithm is better suited to point clouds: it produces similar feature outputs for inputs in different orders and thus addresses the disorder of point clouds.

(2) The k largest values are matched to the corresponding points in the original point cloud input, and these points are designated as key points. The key points represent the most important parts of the point cloud; visualizing them readily shows that they form the skeleton of the point cloud.

(3) Based on the selected key points, feature points around each key point are selected using a k-means algorithm, all feature points that satisfy the k-means algorithm are gathered for feature extraction, and the multi-layer features are fused by multi-scale feature fusion. The down-sampled high-dimensional features are then output.

Further, the multi-head attention mechanism fused with relative position information in step three is mainly implemented as follows:

(1) Using the relative distance at the input improves the overall performance of the network, and extensive experiments show that adding the relative distance ρ to the attention function improves it further. The invention adds the relative distance to the attention function, with the following formula:

where Q, K and V are the query, key and value matrices obtained from their respective transformation matrices, softmax is the activation function, and ρ is the relative distance; a relative distance value is added at the transpose of K and another at V. The attention function designed by the invention extracts richer information about the elements of the input and the relations among them, and is better suited to point cloud processing.

(2) Introducing a multi-head attention mechanism. The input is passed through several independent attention layers, and the outputs of all layers are concatenated to obtain the output of the multi-head attention layer. The formula of the multi-head attention mechanism is as follows:

F_i = Attention(Q_i, K_i, V_i), i = 1, ..., h,

MultiHead(Q, K, V) = Concat(F_1, F_2, ..., F_h)·W^O,

where X denotes the input; W^Q, W^K and W^V are three different sets of linear matrices; h is the number of attention heads, with h = 4; and W^O is the output feature matrix. F_i is the output of the i-th attention head and Concat denotes concatenation. The multi-head attention mechanism divides the input into h independent attention layers and runs them in parallel.

Further, the implementation process of the structural design of the point cloud encoder in the step four is as follows:

(1) For the input, one point cloud sample is an N × d matrix, where N is the number of input points and each point is a d-dimensional vector. Commonly d is 3 or 6: when d = 3 the point cloud contains only position information, and when d = 6 it contains position and normal vector information. F_p is the output of the relative position coding layer, F_e is the output feature vector of F_p after the down-sampling layer, and F_o is the output of F_e after the 4 multi-head attention layers. The formula for the output feature vector F_o is as follows:

MSA(A) = Attention(A, A, A),

F_1 = MSA(F_e),

F_i = MSA(F_{i-1}), i = 2, 3, 4,

F_o = Concat(F_1, F_2, ..., F_4)·W^O,

where each F_i = MSA(F_{i-1}) represents the output features of the i-th attention layer, and the output dimension of each layer matches its input dimension. W^O is the weight of the linear layer and is updated during network training.

(2) To combine the extracted global feature F_global and local feature F_local, the invention designs a new feature fusion layer to replace the traditional symmetric function. The new fusion layer still handles the disorder of the point cloud and greatly reduces the feature loss caused by dimension reduction. The formula of the new fusion layer is as follows:

MCA(A, B) = Attention(A, B, B),

F_lg = MCA(F_local, F_global).

The beneficial effects of the invention are as follows: a new neural network, Mix-Net, is designed specifically for point cloud recognition and segmentation; several modules, including the down-sampling algorithm and the feature extraction and fusion mechanisms, are newly designed; and the accuracy of point cloud recognition and segmentation is significantly improved. In addition, the network is end-to-end: no auxiliary modules are needed to adjust the input, the input point cloud is processed directly, and the information loss caused by dimension reduction or voxelization is reduced. Thorough experiments on common data sets, common radar equipment and common RGB-D cameras show that the proposed algorithm can be widely applied to various sensing devices.

Drawings

FIG. 1 is a schematic diagram of the Mix-Net network.

FIG. 2 is a flow chart of the overall structure of the Mix-Net network.

FIG. 3 shows the down-sampling module of the Mix-Net network.

FIG. 4 shows the performance of the Mix-Net network in the classification task.

FIG. 5 shows the Mix-Net segmentation of an airplane model in the segmentation task; the left image is the manually labeled segmentation and the right image is the network output.

FIG. 6 shows the Mix-Net segmentation of a water cup model in the segmentation task; the left image is the manually labeled segmentation and the right image is the network output.

FIG. 7 shows the Mix-Net segmentation of a chair model in the segmentation task; the left image is the manually labeled segmentation and the right image is the network output.

Detailed Description

The embodiments of the invention are described in detail below so that the objects, technical solutions and features of the invention are easier to understand. The described embodiments are obviously only some of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from the embodiments given here without creative effort shall fall within the protection scope of the invention. The examples are given solely for illustration and are not intended to be limiting.

The invention provides an object classification and object segmentation method based on 3D point cloud information. Its main structure and flow are shown in FIGS. 1 and 2, and the complete method can be described by the following steps:

Step one, adding position codes to the input. Position coding is commonly used in natural language processing. It strengthens the physical relations among the input features, improves the overall performance of the network, and improves the accuracy of the classification and segmentation tasks. Position information must be added at the input and can generally be divided into relative position and absolute position. In point cloud processing, the input data already contains explicit three-dimensional position information. However, because the container of the input point cloud can be stored in many ways, it is difficult for the network to obtain directly the relative position information between points (between different elements of the same input). Relative position information is therefore introduced to strengthen the extraction of positional features between points. The relative position coding formula of the invention is as follows:

where P_i is the input data or feature, the mean term is the average of the input data or features, and MLP is a standard multi-layer perceptron.

Step two, a data down-sampling algorithm. The down-sampling algorithm outputs a down-sampled version of the input data or features, so that the subsequent network can focus on strong feature information; this reduces the complexity and the parameter count of the whole network and accelerates training. The overall structure is shown in FIG. 3, and the specific implementation comprises the following steps:

1. Convolution is applied to the input point cloud. Compared with the attention mechanism, a convolutional network extracts low-dimensional feature information more easily. Two convolutional layers are used, each containing a one-dimensional convolution, a batch normalization layer and a ReLU activation layer.

2. A multi-head-attention-based algorithm is used instead of the conventional farthest point sampling algorithm. The features output by the convolution are passed through a multi-head attention mechanism to obtain attention scores, and selecting the k largest attention scores achieves an effect similar to collecting k points by farthest point sampling. Experiments show that this down-sampling algorithm is better suited to point clouds: it produces similar feature outputs for inputs in different orders and thus addresses the disorder of point clouds.

3. The k largest values are matched to the corresponding points in the original point cloud input, and these points are designated as key points. The key points represent the most important parts of the point cloud; visualizing them readily shows that they form the skeleton of the point cloud.

4. Based on the selected key points, feature points around each key point are selected using a k-means algorithm, all feature points that satisfy the k-means algorithm are gathered for feature extraction, and the multi-layer features are fused by multi-scale feature fusion. The down-sampled high-dimensional features are then output.
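The selection and grouping steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module of FIG. 3: the scoring rule (a point's score is the total attention it receives in a single-head self-attention map) is an assumption, and the grouping step uses a plain nearest-neighbour query as a stand-in for the k-means-based grouping named in the text. The function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(features):
    """Assumed scoring rule: a point's score is the total attention it receives."""
    d = features.shape[1]
    weights = softmax(features @ features.T / np.sqrt(d), axis=-1)  # (N, N)
    return weights.sum(axis=0)

def downsample(points, features, k, n_neighbors):
    scores = attention_scores(features)
    key_idx = np.argsort(-scores)[:k]          # indices of the k largest scores
    key_points = points[key_idx]               # matched back to the original input
    # Stand-in grouping: the n_neighbors closest points to each key point.
    dists = np.linalg.norm(key_points[:, None, :] - points[None, :, :], axis=-1)  # (k, N)
    group_idx = np.argsort(dists, axis=1)[:, :n_neighbors]
    return key_idx, group_idx, scores

rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))        # xyz coordinates
features = rng.normal(size=(64, 16))     # stand-in for the convolution output
key_idx, group_idx, scores = downsample(points, features, k=8, n_neighbors=4)
```

Because the scores depend only on the features of each point and not on its position in the container, reordering the input reorders the scores in the same way, which is the property the text relies on for unordered input.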

Step three, a multi-head attention mechanism fused with relative position information. The specific implementation comprises the following steps:

1. The attention mechanism is widely applied in natural language processing, where it computes the degree of correlation between the words of an input sentence. Q, K and V are the query, key and value matrices obtained from the input sentence through their respective transformation matrices. For inputs F_inx and F_iny, when dx = dy the attention mechanism is self-attention; when dx ≠ dy it is cross-attention. The attention mechanism is computed as follows:

Q = F_inx·W^Q, K = F_iny·W^K, V = F_iny·W^V,

where W^Q, W^K and W^V are all linear matrices, and d_q, d_k and d_v are the dimensions of the query, key and value respectively. The query is determined by F_inx, while the key and value are determined by F_iny. The setting adopted in the invention can increase the computational efficiency of the attention mechanism.

An attention layer is based mainly on the product of Q and the corresponding K; the output is obtained by applying a specific activation function and multiplying by the corresponding V. The commonly used attention function is:

Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V,

where the attention weight is obtained by multiplying Q by the transpose of the corresponding K and dividing by sqrt(d_k), which prevents the product from becoming too large. The final attention weight is obtained with the softmax function, and the attention output is obtained by multiplying the attention weight by the corresponding V.
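The commonly used attention function described here is the standard scaled dot-product attention, softmax(Q·K^T / sqrt(d_k))·V. A minimal NumPy sketch follows; the array shapes and the function names are illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k), axis=-1)  # one row of weights per query
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 16))   # 5 queries
k = rng.normal(size=(7, 16))   # 7 keys
v = rng.normal(size=(7, 8))    # 7 values
out, weights = attention(q, k, v)
```

Each row of `weights` sums to 1, so every output row is a weighted average of the value rows.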

Using the relative distance at the input improves the overall performance of the network, and extensive experiments show that adding the relative distance ρ to the attention function improves it further. The invention adds the relative distance to the attention function, with the following formula:

In the invention, a relative distance value is added at the transpose of K and another at V. The attention function designed by the invention extracts richer information about the elements of the input and the relations among them, and is better suited to point cloud processing.
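The formula for the modified attention function is not reproduced in this text. Purely as an illustration of what "adding a relative distance value at the transpose of K and at V" could look like, the sketch below assumes the form softmax(Q·(K + ρ)^T / sqrt(d_k))·(V + ρ), with ρ given as an array of the same shape as K and V. Both the form and the shape of ρ are assumptions, and `relative_attention` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_attention(q, k, v, rho):
    """Assumed form: softmax(Q (K + rho)^T / sqrt(d_k)) (V + rho)."""
    d_k = q.shape[-1]
    weights = softmax(q @ (k + rho).T / np.sqrt(d_k), axis=-1)
    return weights @ (v + rho)

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 16))
k = rng.normal(size=(6, 16))
v = rng.normal(size=(6, 16))
rho = rng.normal(size=(6, 16))   # stand-in for an encoded relative distance

with_rho = relative_attention(q, k, v, rho)
without_rho = relative_attention(q, k, v, np.zeros_like(rho))
plain = softmax(q @ k.T / np.sqrt(16), axis=-1) @ v   # standard attention for comparison
```

With ρ set to zero, the assumed form reduces to the standard attention function, so the relative distance acts as an additive correction to both the weights and the values.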

2. Multi-head attention mechanism

A single-head attention mechanism may not extract enough features when the input vectors or features are too large. To solve this problem, the invention uses a multi-head attention mechanism, which accelerates the overall training of the network and reduces overfitting. The main idea is as follows: the input is passed through several independent attention layers, and the outputs of all layers are concatenated to obtain the output of the multi-head attention layer. The multi-head attention mechanism formula is as follows:

F_i = Attention(Q_i, K_i, V_i), i = 1, ..., h,

MultiHead(Q, K, V) = Concat(F_1, F_2, ..., F_h)·W^O,

where h is the number of attention heads, with h = 4 in the invention; W^O is the output feature matrix; F_i is the output of the i-th attention head; and W^Q, W^K and W^V are three different sets of linear matrices. Finally, the multi-head attention mechanism divides the input into h independent attention layers and runs them in parallel.
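The two multi-head formulas above can be sketched in NumPy as follows, with h = 4 as stated in the text. The per-head projection shapes, the random weights and the function name are illustrative assumptions; the relative distance term is omitted to keep the sketch short.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo):
    """F_i = Attention(Q_i, K_i, V_i); output = Concat(F_1, ..., F_h) W^O."""
    heads = []
    for wq_i, wk_i, wv_i in zip(wq, wk, wv):
        q, k, v = x @ wq_i, x @ wk_i, x @ wv_i          # per-head projections of the input X
        weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
        heads.append(weights @ v)
    return np.concatenate(heads, axis=-1) @ wo

rng = np.random.default_rng(0)
h, d_model, d_head, n = 4, 32, 8, 10
x = rng.normal(size=(n, d_model))
wq = [0.1 * rng.normal(size=(d_model, d_head)) for _ in range(h)]
wk = [0.1 * rng.normal(size=(d_model, d_head)) for _ in range(h)]
wv = [0.1 * rng.normal(size=(d_model, d_head)) for _ in range(h)]
wo = 0.1 * rng.normal(size=(h * d_head, d_model))

out = multi_head_attention(x, wq, wk, wv, wo)
perm = rng.permutation(n)
out_perm = multi_head_attention(x[perm], wq, wk, wv, wo)
```

Reordering the input rows only reorders the output rows in the same way, which is why attention layers of this kind are a natural fit for unordered point sets.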

Step four, point cloud encoder structure. The overall structure of Mix-Net is shown in FIG. 2. The aim of Mix-Net is to obtain, through the network encoder, a new high-dimensional vector that characterizes the point cloud. This vector is robust to different inputs and generalizes over rotation and translation of the input point cloud and over unordered input. It can serve as the key vector for most point cloud processing tasks. In the invention, it is defined as the point cloud feature vector, and it combines the local feature vector and the global feature vector.

The encoder designed by the invention adds relative position information to the input point cloud, obtains more salient point cloud features through a down-sampling layer, and reduces the complexity of the whole network; it uses a multi-scale feature fusion algorithm and a new down-sampling algorithm that uses attention scores as the selection index. Global and local features of the point cloud are then extracted from the down-sampled features. The dimension of the global feature vector is set to 512 and the dimension of the local feature vector to 128. The features pass through 4 multi-head attention layers whose outputs are concatenated; a residual structure, a normalization layer and a ReLU activation layer are added to each multi-head attention layer, which accelerates network convergence. This yields the high-dimensional global feature vector F_global and the local feature vector F_local.

1. For the input, one point cloud sample is an N × d matrix, where N is the number of input points and each point is a d-dimensional vector. Commonly d is 3 or 6: when d = 3 the point cloud contains only position information, and when d = 6 it contains position and normal vector information. F_p is the output of the relative position coding layer, and F_e is the output feature vector of F_p after the down-sampling layer. The formula for the output feature vector F_o is as follows:

MSA(A) = Attention(A, A, A),

F_1 = MSA(F_e),

F_i = MSA(F_{i-1}), i = 2, 3, 4,

F_o = Concat(F_1, F_2, ..., F_4)·W^O,

where each F_i = MSA(F_{i-1}) represents the output features of the i-th attention layer, and the output dimension of each layer matches its input dimension. W^O is the weight of the linear layer and is updated during network training.
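The stacking rule for the encoder can be sketched as below: four attention layers are applied in sequence, their outputs are concatenated, and a linear layer W^O produces F_o. To keep the sketch short, MSA is written without learned projections, and the residual structure, normalization and ReLU layers mentioned in the text are omitted; the shapes and random weights are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def msa(a):
    """MSA(A) = Attention(A, A, A); output has the same shape as the input."""
    weights = softmax(a @ a.T / np.sqrt(a.shape[-1]), axis=-1)
    return weights @ a

def encoder_output(f_e, w_o, num_layers=4):
    outputs, f = [], f_e
    for _ in range(num_layers):
        f = msa(f)                 # F_1 = MSA(F_e), then F_i = MSA(F_{i-1})
        outputs.append(f)
    return np.concatenate(outputs, axis=-1) @ w_o   # Concat(F_1, ..., F_4) W^O

rng = np.random.default_rng(0)
f_e = rng.normal(size=(50, 16))            # 50 down-sampled points, 16 features each
w_o = 0.1 * rng.normal(size=(4 * 16, 32))  # linear layer applied after concatenation
f_o = encoder_output(f_e, w_o)
```

Concatenating the outputs of all four layers, rather than keeping only the last one, lets the final linear layer draw on features from every depth of the stack.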

2. To combine the extracted global feature F_global and local feature F_local, the invention designs a new feature fusion layer to replace the traditional symmetric function. The new fusion layer still handles the disorder of the point cloud and greatly reduces the feature loss caused by dimension reduction. The formula of the new fusion layer is as follows:

MCA(A, B) = Attention(A, B, B),

F_lg = MCA(F_local, F_global).
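The fusion layer can be sketched as a cross-attention in which the local features supply the query and the global features supply the key and value, following Q = F_inx·W^Q, K = F_iny·W^K, V = F_iny·W^V from step three. The projection matrices, the common width of 64 and the number of rows are illustrative assumptions; the feature widths 128 and 512 are the local and global dimensions stated in the text.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mca(f_local, f_global, w_q, w_k, w_v):
    """MCA(A, B) = Attention(A, B, B): query from A, key and value from B."""
    q = f_local @ w_q
    k = f_global @ w_k
    v = f_global @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v

rng = np.random.default_rng(0)
f_local = rng.normal(size=(50, 128))
f_global = rng.normal(size=(50, 512))
w_q = 0.05 * rng.normal(size=(128, 64))
w_k = 0.05 * rng.normal(size=(512, 64))
w_v = 0.05 * rng.normal(size=(512, 64))

f_lg = mca(f_local, f_global, w_q, w_k, w_v)
perm = rng.permutation(50)
f_lg_perm = mca(f_local, f_global[perm], w_q, w_k, w_v)
```

The result does not change when the rows of `f_global` are reordered, so the fusion keeps the order-independence that a symmetric function such as max pooling provides, without collapsing the features to a single vector.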

Step five, point cloud decoder structure. Because the output dimensions differ between point cloud processing tasks, the decoder structure also differs.

1. Object classification task. The classification network is shown in the decoder part of the overall flow chart (see FIG. 2). The high-dimensional feature F_lg is converted into scores for N_c object categories (e.g., airplane, table, chair). The categories are labeled with numbers, e.g., 0 for the airplane class, 1 for the table class and 2 for the chair class, with N_c = 40 object categories in total. The invention feeds the high-dimensional feature into a preset classification decoder comprising a convolutional network, two cascaded combination layers (linear, normalization and ReLU activation) and a dropout layer to prevent overfitting. Finally, a linear function predicts the final classification scores.

The category whose index has the maximum prediction score is the category predicted by the neural network.
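The last step of the classification decoder, a linear function followed by picking the index with the maximum score, can be sketched as follows. For brevity the sketch uses only the three example labels named in the text instead of all 40 categories, and the feature vector and weights are hand-picked illustrative values.

```python
import numpy as np

# Only the three example labels from the text; the full label set has 40 entries.
CLASS_NAMES = {0: "airplane", 1: "table", 2: "chair"}

def predict_class(feature, weight, bias):
    scores = feature @ weight + bias        # linear function producing one score per class
    return int(np.argmax(scores)), scores   # index of the maximum score is the prediction

feature = np.array([0.2, 1.0, -0.3, 0.5])   # stand-in for a decoded feature vector
weight = np.zeros((4, 3))
weight[0, 0] = 1.0                          # feature 0 votes for class 0
weight[1, 2] = 1.0                          # feature 1 votes for class 2
bias = np.zeros(3)

label, scores = predict_class(feature, weight, bias)
print(label, CLASS_NAMES[label])            # prints: 2 chair
```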

2. Object segmentation task. For object segmentation, the network must assign a label to every input point, e.g., table top or table leg, so a prediction is needed for each point. Generally, each object type is divided into 2 to 4 parts, which are labeled with consecutive indices, for a total of 20 object types and 50 object segmentation parts. The invention uses the high-dimensional feature F_lg to learn the parts of a common model: the object label is converted into a one-dimensional object vector and concatenated with the high-dimensional feature F_lg. The subsequent segmentation decoder, like the classification decoder, extracts high-dimensional features mainly by convolution, decodes them with two cascaded combination layers (linear, normalization and ReLU activation), and uses a dropout layer to prevent overfitting. Finally, the segmentation score of each point is predicted.

Each point is assigned the part label with the highest score.

Step six, the Mix-Net network training method.

1. Data sets for the point cloud classification task and the point cloud segmentation task.

In the point cloud classification task, we validated the Mix-Net network on the ModelNet40 data set, in which each point cloud sample contains 10000 points and which covers 40 object classes, with 9843 training samples and 2468 test samples. During network training, data augmentation is applied to the input point cloud: random scaling with a ratio in [0.8, 1.25], random translation in the range [-0.1, 0.1], and random input dropout with a ratio in [0, 12.5%]. In the experiments we used N = 1024, D = 6 (xyz + normal), local_dim = 256, global_dim = 512 and K = 50. The result of the point cloud classification task is shown in FIG. 4; the classification accuracy on the test set reaches 92.5%.

In the point cloud segmentation task, we validated the Mix-Net network on the challenging ShapeNet data set, which includes 13998 training samples and 2874 test samples. It contains 16 object classes, each with 2 to 4 part labels, for a total of 50 part labels. The segmentation objective is to classify each input point into a part. During network training, data augmentation is applied to the input point cloud: random scaling with a ratio in [0.8, 1.25], random translation in the range [-0.1, 0.1], and random input dropout with a ratio in [0, 12.5%]. In the experiments we used N = 1024, D = 6 (xyz + normal), local_dim = 256, global_dim = 512 and K = 50. Part of the result of the point cloud segmentation task is shown in FIG. 5; the left image is the labeled point cloud and the right image is the point cloud predicted by the neural network.
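The three augmentations used for both tasks can be sketched as follows, using the ranges stated above. Several details are assumptions made for illustration: values are drawn uniformly, one scale factor is applied to the whole cloud, the translation is drawn per axis, dropped points are simply removed, and only the xyz coordinates are transformed. The function name `augment` is hypothetical.

```python
import numpy as np

def augment(xyz, rng):
    """Random scaling, translation and point dropout with the ranges given in the text."""
    scale = rng.uniform(0.8, 1.25)                 # scaling ratio in [0.8, 1.25]
    shift = rng.uniform(-0.1, 0.1, size=3)         # translation in [-0.1, 0.1] per axis
    drop_ratio = rng.uniform(0.0, 0.125)           # dropout ratio in [0, 12.5%]
    n_drop = int(drop_ratio * len(xyz))
    keep = np.ones(len(xyz), dtype=bool)
    keep[rng.choice(len(xyz), size=n_drop, replace=False)] = False
    return xyz[keep] * scale + shift, scale, shift

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))                 # N = 1024 points
augmented, scale, shift = augment(cloud, rng)
```

With N = 1024, at most 128 points are dropped, so the augmented cloud keeps at least 896 points.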

2. Other Mix-Net training details. For both the point cloud classification task and the point cloud segmentation task, the loss function is the cross-entropy loss. Training uses stochastic gradient descent as the optimizer, with the momentum set to 0.9. The learning rate is set to 1e-4 and the batch size to 32; the GPU is an RTX 3070 and the neural network framework is PyTorch.
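The loss and the optimizer update named above can be written out in a few lines of NumPy. This is a sketch of the two formulas only, not of the training loop: the momentum update follows the convention used by PyTorch's SGD optimizer with default dampening (velocity = momentum · velocity + gradient), and the parameter and gradient values are made up for the example.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch of integer class labels."""
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def sgd_momentum_step(param, grad, velocity, lr=1e-4, momentum=0.9):
    """One SGD step with momentum, using the hyperparameters from the text."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity

# A batch of 32 samples with uniform scores over 40 classes gives a loss of ln(40).
logits = np.zeros((32, 40))
labels = np.zeros(32, dtype=int)
loss = cross_entropy(logits, labels)

param, velocity = np.array([1.0]), np.array([0.0])
grad = np.array([2.0])                                     # made-up constant gradient
param, velocity = sgd_momentum_step(param, grad, velocity)
param, velocity = sgd_momentum_step(param, grad, velocity)
```

With momentum, the second step moves further than the first even though the gradient is unchanged, because the velocity accumulates past gradients.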

In conclusion, the invention provides a classification and segmentation algorithm for point cloud processing that performs object classification and object segmentation of point clouds effectively and accurately. Compared with common algorithms, the Mix-Net neural network is better suited to point cloud processing: its new feature extraction structure and new fusion structure better handle the permutation disorder and order disorder of point clouds, reduce the overall parameter count of the network, and improve the accuracy of the overall output.

The invention has been described above in further detail with reference to the embodiments, but the invention is not to be construed as limited to these embodiments. For those skilled in the art to which the invention pertains and in related arts, extensions, changes of operating method and substitutions of structure based on the technical idea of the invention shall all fall within the protection scope of the invention.

Claims

1. An object classification and object segmentation method based on 3D point cloud information comprises the following steps:

step one, adding position codes to the input point cloud information

Relative position information is added to the input point cloud information, both at the first input layer and in the attention function; extensive experiments show that adding the relative position codes improves network performance and the accuracy of the final output;

step two, a data down-sampling algorithm

A down-sampling algorithm is applied to the input, which reduces the complexity of the whole network, accelerates network training and suppresses overfitting; an attention-based down-sampling algorithm replaces the farthest point sampling algorithm;

step three, a multi-head attention mechanism fused with relative position information

As the main feature extraction layer of the network, a multi-head attention layer fused with relative position information replaces the convolutional layer; experiments show that the multi-head attention layer improves network performance and the accuracy of the final output;

step four, designing the structure of the point cloud encoder

The designed point cloud encoder extracts high-dimensional feature vectors from the input point cloud data, the feature vectors comprising global and local point cloud feature information; when the features are fused to obtain the overall point cloud feature, a new feature fusion mechanism replaces the symmetric function;

step five, designing the structure of the point cloud decoder

A different decoder structure is designed for each point cloud processing task, the structures being broadly similar but differing in detail;

step six, designing the Mix-Net network training method

For the different point cloud processing tasks and data sets, the loss function, the optimizer and the training hyperparameters are designed.

2. The method of claim 1, wherein the position coding of step one is implemented as follows: because the container of the input point cloud has a plurality of storage methods, the relative position information between the points, namely the same input different elements, is difficult to directly obtain in the network, the relative position information is introduced to increase the position characteristic extraction between the points, and the position coding formula is as follows:

wherein P_i is the input data or feature,

is the mean value of the input data or features, and MLP is an ordinary multilayer perceptron; repeated experiments show that adding position information to the first-layer input and to the attention mechanism module markedly improves the performance of the whole network.
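The position coding of this claim can be illustrated with a minimal sketch. Because the formula itself is not reproduced in the text above, the sketch assumes one plausible reading: each point P_i is centered by subtracting the mean of the input, and the result is passed through a small MLP. The function names, the layer sizes, and the random untrained weights are hypothetical and serve illustration only.

```python
import random

def mean_point(points):
    """Coordinate-wise mean of a list of d-dimensional points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def linear(x, weight, bias):
    """Dense layer; weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def make_mlp(dims, seed=0):
    """Hypothetical MLP parameters with random, untrained weights."""
    rng = random.Random(seed)
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        weight = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
        layers.append((weight, [0.0] * d_out))
    return layers

def mlp(x, layers):
    for i, (weight, bias) in enumerate(layers):
        x = linear(x, weight, bias)
        if i < len(layers) - 1:  # no activation after the last layer
            x = relu(x)
    return x

def position_encode(points, layers):
    """Assumed reading of the claim: F_p[i] = MLP(P_i - mean(P))."""
    center = mean_point(points)
    return [mlp([a - c for a, c in zip(p, center)], layers) for p in points]

points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
layers = make_mlp([3, 8, 16])
features = position_encode(points, layers)  # 4 points, 16 features each
```

Under this assumed reading, translating the whole point cloud leaves the encoded features unchanged, since the mean is subtracted before the MLP is applied.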

3. The method for object classification and object segmentation based on 3D point cloud information according to claim 1, wherein the data down-sampling algorithm in step two is implemented as follows:

(1) using an algorithm based on a multi-head attention mechanism to replace the traditional farthest point sampling algorithm

The features output by the convolution are processed by a multi-head attention mechanism to obtain attention scores; selecting the k largest attention scores achieves an effect similar to picking k points by farthest point sampling; experiments show that this down-sampling algorithm is better suited to point clouds, yields similar feature outputs for inputs presented in different orders, and thus addresses the point cloud disorder problem;

(2) matching the k largest scores to the corresponding points in the original point cloud input and designating those points as key points; the key points represent the most important parts of the point cloud, and visualizing them readily shows that they form the skeleton of the point cloud;

(3) according to the selected key points, selecting feature points around each key point by using a k-means algorithm, extracting features from all feature points that satisfy the k-means criterion, fusing the multi-layer features by multi-scale feature fusion, and outputting the down-sampled high-dimensional features.
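The selection and grouping steps (1) to (3) can be sketched as follows. This is a simplified illustration, not the claimed implementation: the per-point attention scores are supplied as a hypothetical list instead of being computed by a multi-head attention layer, the grouping is reduced to a single k-means-style assignment of every point to its nearest key point, and the multi-scale feature fusion is omitted. All function names are assumptions.

```python
import math

def top_k_indices(scores, k):
    """Indices of the k largest attention scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def group_by_nearest_key_point(points, key_points):
    """One k-means-style assignment step: each point joins its nearest key point."""
    groups = [[] for _ in key_points]
    for p in points:
        nearest = min(range(len(key_points)), key=lambda c: math.dist(p, key_points[c]))
        groups[nearest].append(p)
    return groups

points = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [5.0, 5.0, 5.0],
    [5.1, 5.0, 5.0],
    [9.0, 0.0, 0.0],
]
scores = [0.9, 0.2, 0.8, 0.1, 0.3]  # hypothetical per-point attention scores

key_idx = top_k_indices(scores, k=2)          # step (1): the k largest scores
key_points = [points[i] for i in key_idx]     # step (2): matching points in the input
groups = group_by_nearest_key_point(points, key_points)  # step (3), simplified
```

Because the key points are chosen by score and not by position in the list, presenting the same points and scores in a different order selects the same key points, which is the order-independence property the claim relies on.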

4. The method for object classification and object segmentation based on 3D point cloud information as claimed in claim 1, wherein the multi-head attention mechanism fused with relative position information in step three is implemented as follows:

(1) using the relative distance at the input improves the overall performance of the network, and a large number of experiments show that adding the relative distance ρ inside the attention function improves the overall network performance further; the relative distance is therefore added to the attention function, and the formula is as follows:

Q, K and V are the query, key and value matrices obtained by applying the respective transformation matrices to the input, softmax is the activation function, and ρ is the relative distance; a relative distance value is added at the transpose of K and another at V, so that the designed attention function extracts richer information among the elements of the input and is better suited to point cloud processing;

(2) introducing a multi-head attention mechanism: the input is passed through several independent attention layers, and the outputs of the layers are then concatenated to give the output of the multi-head attention layer; because the layers run in parallel, the multi-head attention mechanism has a clear speed advantage in processing input data; the formulas are as follows:

F_i = Attention(Q_i, K_i, V_i), i = 1, ..., h,

MultiHead(Q, K, V) = Concat(F_1, F_2, ..., F_h) W^O

wherein X represents the input,

and

representing the output feature matrix; F_i represents the final output of the i-th attention layer, and Concat represents concatenation; the multi-head attention mechanism divides the input into h independent attention layers and runs the attention layers in parallel.
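The attention function with relative distance and its multi-head form can be sketched as follows. Since the attention formula is not reproduced in the text above, the sketch assumes the form softmax(Q (K + ρ)^T / sqrt(d)) (V + ρ); the sqrt(d) scaling and the choice of ρ as an N x d_head matrix shared by all heads are assumptions, as are the function names and the random untrained weights.

```python
import math
import random

def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q, k, v, rho):
    """Assumed form: softmax(Q (K + rho)^T / sqrt(d)) (V + rho)."""
    d = len(q[0])
    k_rho_t = [list(col) for col in zip(*add(k, rho))]  # (K + rho)^T
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_rho_t)]
    return matmul(softmax_rows(scores), add(v, rho))

def multi_head(x, rho, heads, w_o):
    """F_i = Attention(Q_i, K_i, V_i); MultiHead = Concat(F_1..F_h) W^O."""
    outputs = [
        attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v), rho)
        for w_q, w_k, w_v in heads
    ]
    concat = [[v for f in outputs for v in f[i]] for i in range(len(x))]
    return matmul(concat, w_o)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d_model, h = 5, 4, 2
d_head = d_model // h
x = random_matrix(n, d_model, rng)       # input features, one row per point
rho = random_matrix(n, d_head, rng)      # hypothetical relative-distance features
heads = [tuple(random_matrix(d_model, d_head, rng) for _ in range(3)) for _ in range(h)]
w_o = random_matrix(h * d_head, d_model, rng)
out = multi_head(x, rho, heads, w_o)     # shape: n rows, d_model columns
```

The heads in this sketch are evaluated one after another for readability; they do not depend on each other, which is what allows them to run in parallel.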

5. The method for object classification and object segmentation based on 3D point cloud information as claimed in claim 1, wherein the implementation process of the structural design of the point cloud encoder in step four is as follows:

(1) the input of one point cloud sample is

wherein N represents the number of input points and each input point is a d-dimensional vector with d equal to 3 or 6: when d is 3 the point cloud contains only position information, and when d is 6 it contains position information and normal vector information; F_p is the output of the relative position coding layer, and F_e is the output feature vector obtained by passing F_p through the down-sampling layer;

the formulas for the output feature vector F_o are as follows:

MSA(A) = Attention(A, A, A),

F_1 = MSA(F_e),

F_i = MSA(F_{i-1}), i = 2, 3, 4,

F_o = Concat(F_1, F_2, F_3, F_4) · W^O,

wherein each MSA(F_i) represents the output feature of the i-th attention layer, the output dimension of each layer equals its input dimension, and W^O is the weight of the linear layer, which is updated during network training;

(2) to extract point cloud features that combine the global feature F_global and the local feature F_local, a new feature fusion layer is designed to replace the traditional symmetric function; the newly proposed fusion layer still solves the point cloud disorder problem and greatly reduces the feature loss caused by dimension reduction; the formula of the new fusion layer is as follows:

MCA(A, B) = Attention(A, B, B),

F_lg = MCA(F_local, F_global).
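The encoder formulas of this claim can be traced with a minimal sketch. For brevity it uses plain scaled dot-product attention without learned projections and without the relative distance term, which is a simplification; the shapes of F_local and F_global, the random values, and the function names are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def attention(q, k, v):
    """Simplified attention: softmax(Q K^T / sqrt(d)) V, no learned projections."""
    d = len(q[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        peak = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out

def msa(a):
    """MSA(A) = Attention(A, A, A)."""
    return attention(a, a, a)

def mca(a, b):
    """MCA(A, B) = Attention(A, B, B)."""
    return attention(a, b, b)

def encoder(f_e, w_o, num_layers=4):
    """F_1 = MSA(F_e), F_i = MSA(F_{i-1}), F_o = Concat(F_1..F_4) W^O."""
    layers, current = [], f_e
    for _ in range(num_layers):
        current = msa(current)
        layers.append(current)
    concat = [[v for f in layers for v in f[i]] for i in range(len(f_e))]
    return matmul(concat, w_o)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d = 6, 4
f_e = random_matrix(n, d, rng)          # stand-in for the down-sampled features F_e
w_o = random_matrix(4 * d, d, rng)      # linear layer weight W^O (untrained)
f_o = encoder(f_e, w_o)                 # n rows, d columns

f_local = random_matrix(n, d, rng)      # hypothetical local features
f_global = random_matrix(3, d, rng)     # hypothetical global features
f_lg = mca(f_local, f_global)           # fused features, n rows, d columns
```

In this sketch the result of MCA(F_local, F_global) does not depend on the order of the rows of F_global, because every query attends over all rows of the second argument through a softmax-weighted sum; this is one way to see how an attention-based fusion layer can remain insensitive to point order.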