CN118765499A - Encoding method, encoder, and storage medium - Google Patents

Encoding method, encoder, and storage medium

Info

Publication number: CN118765499A
Application number: CN202280092029.XA
Authority: CN (China)
Prior art keywords: current point, point, value, attribute, residual value
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 元辉, 王璐, 李明, 王晓辉
Current Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102: Methods or arrangements using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124: Quantisation
    • H04N 19/134: Methods or arrangements using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146: Data rate or code amount at the encoder output
    • H04N 19/147: Data rate or code amount at the encoder output according to rate distortion criteria

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An embodiment of the present application provides an encoding method, an encoder, and a storage medium. The method comprises the following steps: determining an encoding order of attribute information of a current point cloud based on geometric information of the current point cloud; determining a first quantized residual value of a current point to be encoded based on at least one neighbor point located before the current point in the encoding order; determining at least one second quantized residual value for the current point; determining a third quantized residual value based on the first quantized residual value and the at least one second quantized residual value; and encoding the third quantized residual value to obtain a code stream. The encoding method provided by the present application can improve the encoding performance of the encoder.

Description

Encoding method, encoder, and storage medium

Technical Field
The embodiments of the present application relate to the technical field of encoding and decoding, and more particularly, to an encoding method, an encoder, and a storage medium.
Background
Point clouds have begun to spread into various fields, e.g., virtual/augmented reality, robotics, geographic information systems, and the medical field. With the continuous improvement of the accuracy and speed of scanning equipment, a large number of points on an object's surface can be acquired accurately, and a single scene may contain hundreds of thousands of points. Such a large number of points also presents challenges for computer storage and transmission. Point cloud compression has therefore become a hot research problem.
To compress a point cloud, the encoder needs to compress its position information and attribute information. Specifically, the encoder performs octree encoding on the position information of the point cloud. Meanwhile, according to the position information of the current point after octree encoding, the encoder selects, from the already-encoded points, neighbor points used to predict the attribute value of the current point; it then predicts the current point with reference to the selected neighbor points to obtain the attribute predicted value of the current point, calculates the residual value of the current point based on the attribute predicted value and the attribute original value of the current point, and quantizes the residual value of the current point to obtain a quantized residual value. Finally, the encoder transmits the quantized residual value of the current point to the decoding end in the form of a code stream. The decoding end obtains the quantized residual value of the current point by receiving and parsing the code stream, and obtains the residual value of the current point through steps such as inverse transformation and inverse quantization; the decoding end then obtains an attribute predicted value through the same prediction process, and obtains the attribute reconstructed value of the current point by adding the attribute predicted value to the residual value parsed from the code stream.
In general, after the encoder predicts the current point with reference to the selected neighbor points to obtain the attribute predicted value of the current point, the encoder performs differential prediction on the attribute of the current point using the attribute reconstructed values of the selected neighbor points to obtain the residual value of the current point, and then performs quantization encoding on that residual value. However, in this encoding process, because the encoder uses the same quantization parameter for quantization encoding of all points in the same point cloud sequence, the encoding efficiency or the reconstruction quality of some points is too low, which reduces the encoding performance of the encoder.
Disclosure of Invention
The embodiment of the application provides an encoding method, an encoder and a storage medium, which can improve the encoding performance of the encoder.
In a first aspect, the present application provides an encoding method, comprising:
determining an encoding order of attribute information of a current point cloud based on geometric information of the current point cloud;
determining a first quantized residual value of a current point to be encoded based on at least one neighbor point located before the current point in the encoding order;
determining at least one second quantized residual value for the current point;
determining a third quantized residual value based on the first quantized residual value and the at least one second quantized residual value; and
encoding the third quantized residual value to obtain a code stream.
In a second aspect, the present application provides an encoder comprising:
a determining unit configured to:
determine an encoding order of attribute information of a current point cloud based on geometric information of the current point cloud;
determine a first quantized residual value of a current point to be encoded based on at least one neighbor point located before the current point in the encoding order;
determine at least one second quantized residual value for the current point; and
determine a third quantized residual value based on the first quantized residual value and the at least one second quantized residual value;
and an encoding unit configured to encode the third quantized residual value to obtain a code stream.
In a third aspect, the present application provides an encoding apparatus comprising:
A processor adapted to implement computer instructions; and
A computer readable storage medium storing computer instructions adapted to be loaded by a processor and to perform the encoding method of the first aspect or implementations thereof.
In one implementation, there are one or more processors and one or more memories.
In one implementation, the computer-readable storage medium may be integral to the processor or separate from the processor.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing computer instructions that, when read and executed by a processor of a computer device, cause the computer device to perform the encoding method of the first aspect or implementations thereof.
In a fifth aspect, an embodiment of the present application provides a code stream, where the code stream is generated by the method of the first aspect or the implementations thereof.
Based on the above technical solutions, when determining the quantized residual value that is finally encoded into the code stream for the current point, at least one second quantized residual value is introduced in addition to the first quantized residual value determined based on at least one neighbor point located before the current point to be encoded in the encoding order; since different quantized residual values correspond to different quantization scales for the current point, the encoder determines the third quantized residual value according to the first quantized residual value and the at least one second quantized residual value and encodes the third quantized residual value. This avoids applying the same quantization scale to all points in the current point cloud and helps select a suitable quantization scale for each point, which in turn prevents the encoding efficiency or the reconstruction quality of some points from being too low and improves the encoding performance of the encoder.
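The text above does not fix a single way of obtaining the second quantized residual values. As a loose, non-authoritative illustration of the idea, the Python sketch below assumes the candidate quantized residuals come from different quantization steps and that the residual finally encoded is chosen by a simple rate-distortion comparison; all function and variable names are hypothetical and the rate term is a crude placeholder.

```python
def choose_quantized_residual(attr_value, attr_pred, qsteps, lam):
    best = None
    for qstep in qsteps:
        q = round((attr_value - attr_pred) / qstep)  # candidate quantized residual
        recon = q * qstep + attr_pred                # reconstruction with this step
        dist = abs(attr_value - recon)               # distortion D
        rate = abs(q).bit_length() + 1               # crude stand-in for coded bits R
        cost = dist + lam * rate                     # J = D + lambda * R
        if best is None or cost < best[0]:
            best = (cost, q, qstep)
    return best[1], best[2]                          # chosen residual and its step
```

For example, choose_quantized_residual(118, 110, [1, 2, 4], 0.5) returns the candidate whose cost J is smallest among the three quantization steps.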
Drawings
Fig. 1 is an example of a point cloud image provided by an embodiment of the present application.
Fig. 2 is a partial enlarged view of the point cloud image shown in fig. 1.
Fig. 3 is an example of a point cloud image with six viewing angles provided by an embodiment of the present application.
Fig. 4 is a schematic block diagram of an encoding framework provided by an embodiment of the present application.
Fig. 5 is an example of a bounding box provided by an embodiment of the present application.
Fig. 6 is an example of octree partitioning of bounding boxes provided by an embodiment of the present application.
Figs. 7 to 9 show the arrangement order of Morton codes in two-dimensional space.
Fig. 10 shows the arrangement order of Morton codes in three-dimensional space.
Fig. 11 is a schematic block diagram of an LOD layer provided by an embodiment of the present application.
Fig. 12 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
Fig. 13 is another schematic block diagram of an encoding framework provided by an embodiment of the present application.
Fig. 14 is another schematic block diagram of a decoding framework provided by an embodiment of the present application.
Fig. 15 is a schematic flowchart of an encoding method provided by an embodiment of the present application.
Fig. 16 is a schematic block diagram of an attribute code stream provided in an embodiment of the present application.
Fig. 17 is a schematic block diagram of a payload in an attribute code stream according to an embodiment of the present application.
Fig. 18 is another schematic flow chart of an encoding method provided by an embodiment of the present application.
Fig. 19 is a schematic flowchart of a decoding method provided by an embodiment of the present application.
Fig. 20 is a schematic flow chart of an encoder provided by an embodiment of the present application.
Fig. 21 is another schematic flow chart of an encoder provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
A point cloud (Point Cloud) is a set of irregularly distributed discrete points in space that represents the spatial structure and surface properties of a three-dimensional object or scene. Figs. 1 and 2 show a three-dimensional point cloud image and a partially enlarged view of it, respectively; it can be seen that the point cloud surface consists of densely distributed points.
A two-dimensional image carries information at every pixel, so its position information does not need to be recorded separately; however, the distribution of the points of a point cloud in three-dimensional space is random and irregular, so the position of each point in space must be recorded to fully express the point cloud. Similar to a two-dimensional image, each point in a point cloud has corresponding attribute information, typically RGB color values, which reflect the color of the object; for a point cloud, the attribute information corresponding to each point may also be a reflectance value, which reflects the surface texture of the object. Each point in the point cloud may include geometric information and attribute information, where the geometric information of a point refers to its Cartesian three-dimensional coordinates, and the attribute information of a point may include, but is not limited to, at least one of the following: color information, material information, laser reflection intensity information. The color information may be expressed in any color space. For example, the color information may be red-green-blue (RGB) information. As another example, the color information may be luminance-chrominance (YCbCr, YUV) information, where Y denotes luma (brightness), Cb (U) denotes the blue chrominance component, and Cr (V) denotes the red chrominance component. Every point in a point cloud has the same amount of attribute information. For example, each point in the point cloud may have both color information and laser reflection intensity information. As another example, each point in the point cloud may have three kinds of attribute information: color information, material information, and laser reflection intensity information.
A point cloud image may have multiple viewing angles; for example, fig. 3 shows six viewing angles of a point cloud image. The data storage format corresponding to the point cloud image consists of a file header part and a data part, where the header includes the data format, the data representation type, the total number of points in the point cloud, and the content represented by the point cloud.
A point cloud can flexibly and conveniently express the spatial structure and surface attributes of a three-dimensional object or scene. Because a point cloud is obtained by directly sampling a real object, it can provide a strong sense of realism while ensuring accuracy, so its application range is wide, including virtual reality games, computer-aided design, geographic information systems, autonomous navigation systems, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive telepresence, and three-dimensional reconstruction of biological tissues and organs.
For example, point clouds may be divided into two major categories by application scenario: machine-perceived point clouds and human-eye-perceived point clouds. Application scenarios for machine-perceived point clouds include, but are not limited to: autonomous navigation systems, real-time inspection systems, geographic information systems, visual sorting robots, and emergency rescue and disaster relief robots. Application scenarios for human-eye-perceived point clouds include, but are not limited to: digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive communication, and three-dimensional immersive interaction. Correspondingly, point clouds can be divided into dense point clouds and sparse point clouds based on the acquisition method, and into static point clouds and dynamic point clouds based on the acquisition path; more specifically, they can be divided into three types: the first type, static point clouds; the second type, dynamic point clouds; and the third type, dynamically acquired point clouds. For the first type, the object is stationary and the device acquiring the point cloud is also stationary; for the second type, the object is moving but the device acquiring the point cloud is stationary; for the third type, the device acquiring the point cloud is moving.
Exemplary acquisition approaches for point clouds include, but are not limited to: computer generation, 3D laser scanning, and 3D photogrammetry. A computer can generate point clouds of virtual three-dimensional objects and scenes; 3D laser scanning can obtain point clouds of static real-world three-dimensional objects or scenes at millions of points per second; 3D photogrammetry can obtain point clouds of dynamic real-world three-dimensional objects or scenes at tens of millions of points per second. Specifically, point clouds of object surfaces can be acquired through acquisition equipment such as photoelectric radar, laser scanners, and multi-view cameras. A point cloud obtained according to the laser measurement principle may include three-dimensional coordinate information of points and their laser reflection intensity (reflectance). A point cloud obtained according to photogrammetry principles may include three-dimensional coordinate information of points and their color information. A point cloud obtained by combining laser measurement and photogrammetry principles may include three-dimensional coordinate information of points, their laser reflection intensity (reflectance), and their color information. For example, in the medical field, point clouds of biological tissues and organs can be obtained from magnetic resonance imaging (MRI), computed tomography (CT), and electromagnetic localization information. These technologies reduce the acquisition cost and time of point cloud data and improve data accuracy. The transformation of point cloud acquisition methods has made it possible to obtain large amounts of point cloud data, and with growing application demands, the processing of massive 3D point cloud data runs into the bottleneck of limited storage space and transmission bandwidth.
Taking a point cloud video with a frame rate of 30 fps (frames per second) as an example, with 700,000 points per frame, where each point has coordinate information xyz (float) and color information RGB (uchar), the data size of a 10 s point cloud video is about 700,000 × (4 byte × 3 + 1 byte × 3) × 30 fps × 10 s = 3.15 GB. By comparison, for a 1280×720 two-dimensional video at a frame rate of 24 fps, the 10 s data size is about 1280 × 720 × 12 bit × 24 frames × 10 s ≈ 0.33 GB, so the data size of a 10 s two-view 3D video is only about 0.33 × 2 = 0.66 GB. It can be seen that the data volume of point cloud video far exceeds that of two-dimensional video and three-dimensional video of the same duration. Therefore, to better manage data, save server storage space, and reduce the transmission traffic and time between server and client, point cloud compression has become a key problem in promoting the development of the point cloud industry.
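As a quick arithmetic check of the comparison above (a minimal sketch; sizes in bytes, decimal GB):

```python
points_per_frame = 700_000
bytes_per_point = 4 * 3 + 1 * 3                        # xyz as 4-byte floats + RGB as 1-byte uchars
pc_10s = points_per_frame * bytes_per_point * 30 * 10  # 30 fps, 10 s
video_10s = 1280 * 720 * 12 // 8 * 24 * 10             # 12 bits per pixel, 24 fps, 10 s
print(pc_10s / 1e9, video_10s / 1e9)                   # ≈ 3.15 GB vs ≈ 0.33 GB
```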
Point cloud compression generally compresses the geometric information and the attribute information of the point cloud separately. At the encoding end, the geometric information is first encoded in a geometry encoder, and the reconstructed geometric information is then input into the attribute encoder as additional information to assist attribute compression of the point cloud; at the decoding end, the geometric information is first decoded in a geometry decoder, and the decoded geometric information is then input into the attribute decoder as additional information to assist attribute decompression of the point cloud. The whole codec consists of preprocessing/post-processing, geometry encoding/decoding, and attribute encoding/decoding.
Illustratively, the point cloud may be encoded and decoded through various types of codec frameworks. As an example, the codec framework may be the geometry point cloud compression (Geometry Point Cloud Compression, G-PCC) codec framework or the video point cloud compression (Video Point Cloud Compression, V-PCC) codec framework provided by the Moving Picture Experts Group (MPEG), or the AVS-PCC codec framework or the point cloud compression reference model (PCRM) platform framework provided by the Audio Video coding Standard (AVS) working group. The G-PCC codec framework may be used to compress the first type of static point clouds and the third type of dynamically acquired point clouds, and the V-PCC codec framework may be used to compress the second type of dynamic point clouds. The G-PCC codec framework is also referred to as point cloud codec TMC13, and the V-PCC codec framework is also referred to as point cloud codec TMC2. Both G-PCC and AVS-PCC are directed at static, sparse point clouds, and their coding frameworks are roughly the same.
The following describes a codec framework applicable to the embodiment of the present application, taking the G-PCC framework as an example.
Fig. 4 is a schematic block diagram of an encoding framework provided by an embodiment of the present application.
As shown in fig. 4, the encoding framework 100 may acquire position information and attribute information of the point cloud from an acquisition device. Encoding of the point cloud includes position encoding and attribute encoding. In one embodiment, the position encoding process includes: preprocessing the original point cloud through coordinate transformation, quantization, duplicate point removal, and the like; then constructing an octree and encoding it to form a geometry code stream.
As shown in fig. 4, the position encoding process of the encoder may be implemented by:
a coordinate transform (Transform coordinates) unit 101, a quantize and remove points (Quantize and remove points) unit 102, an octree analysis (Analyze octree) unit 103, a geometry reconstruction (Reconstruct geometry) unit 104, and a first arithmetic coding (Arithmetic encode) unit 105.
The coordinate transform unit 101 may be used to transform the world coordinates of points in the point cloud into relative coordinates. For example, subtracting the minimum values of the x, y, and z coordinate axes from the geometric coordinates of each point, which is analogous to a DC-removal operation, transforms the coordinates of points in the point cloud from world coordinates to relative coordinates. The quantize and remove points unit 102 can reduce the number of coordinates through quantization; originally distinct points may be given the same coordinates after quantization, based on which duplicate points may be deleted by a deduplication operation; for example, multiple points with the same quantized position and different attribute information may be merged into one point through attribute transformation. In some embodiments of the present application, the quantize and remove points unit 102 is an optional unit module. The octree analysis unit 103 may encode the position information of the quantized points using an octree encoding scheme. For example, the point cloud is regularized in the form of an octree, so that the positions of points correspond one-to-one to positions in the octree; geometric encoding is performed by counting the positions in the octree that contain points and marking their flags as 1. The first arithmetic coding unit 105 may perform arithmetic coding on the position information output by the octree analysis unit 103 in an entropy coding manner, that is, generate a geometry code stream from the position information output by the octree analysis unit 103 using arithmetic coding; the geometry code stream may also be referred to as a geometry bitstream.
The following describes a regularization processing method of the point cloud.
Because of the irregular distribution of the point cloud in space, the encoding process faces challenges; therefore, a recursive octree structure is used to regularize the points in the point cloud, expressing each point as the center of a cube. For example, as shown in fig. 5, the whole point cloud may be placed in a cubic bounding box, where the coordinates of the points in the point cloud may be denoted (x_k, y_k, z_k), k = 0, …, K-1, with K the total number of points in the point cloud. The boundary values of the point cloud along the x-axis, y-axis, and z-axis directions are:

x_min = min(x_0, x_1, …, x_{K-1});
y_min = min(y_0, y_1, …, y_{K-1});
z_min = min(z_0, z_1, …, z_{K-1});
x_max = max(x_0, x_1, …, x_{K-1});
y_max = max(y_0, y_1, …, y_{K-1});
z_max = max(z_0, z_1, …, z_{K-1}).
In addition, the origin (x_origin, y_origin, z_origin) of the bounding box can be calculated as follows:

x_origin = int(floor(x_min));
y_origin = int(floor(y_min));
z_origin = int(floor(z_min)).
where floor() denotes the rounding-down (floor) operation and int() denotes the integer conversion operation.
Based on this, the encoder may calculate the dimensions of the bounding box in the x-axis, y-axis, and z-axis directions from the boundary values and the origin as follows:

BoundingBoxSize_x = int(x_max - x_origin) + 1;
BoundingBoxSize_y = int(y_max - y_origin) + 1;
BoundingBoxSize_z = int(z_max - z_origin) + 1.
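The boundary, origin, and dimension formulas above translate directly into code; the following is a minimal sketch assuming `points` is a list of (x, y, z) float tuples:

```python
import math

def bounding_box(points):
    xs, ys, zs = zip(*points)
    # origin: per-axis floor of the minima, as in the formulas above
    origin = (int(math.floor(min(xs))),
              int(math.floor(min(ys))),
              int(math.floor(min(zs))))
    # dimensions: int(max - origin) + 1 per axis
    size = (int(max(xs) - origin[0]) + 1,
            int(max(ys) - origin[1]) + 1,
            int(max(zs) - origin[2]) + 1)
    return origin, size
```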
As shown in fig. 6, after obtaining the dimensions of the bounding box in the x-axis, y-axis, and z-axis directions, the encoder first performs octree division on the bounding box, obtaining eight sub-blocks each time, and then performs octree division again on the non-empty sub-blocks (those containing points); the division recurses in this way down to a certain depth. The non-empty sub-blocks of the final size are called voxels; each voxel contains one or more points, the geometric positions of these points are normalized to the center point of the voxel, and the attribute value of the center point is the average of the attribute values of all points within the voxel. Regularizing the point cloud into blocks in space helps describe the positional relationship between a point and the points before it, which in turn makes it possible to design a specific encoding order; based on the determined encoding order, the encoder can encode each voxel, i.e., encode the point (or "node") represented by each voxel.
After the geometry encoding of the encoder is completed, the geometric information is reconstructed, and the attribute information is encoded using the reconstructed geometric information. The attribute encoding process includes: given the reconstructed position information of the input point cloud and the true values of the attribute information, selecting one of three prediction modes for point cloud prediction, quantizing the prediction result, and performing arithmetic coding to form an attribute code stream.
As shown in fig. 4, the property encoding process of the encoder may be implemented by:
a color space transform (Transform colors) unit 110, an attribute transfer (Transfer attributes) unit 111, a region-adaptive hierarchical transform (Region-Adaptive Hierarchical Transform, RAHT) unit 112, a predicting transform (Predicting transform) unit 113, a lifting transform (Lifting transform) unit 114, a quantization (Quantize) unit 115, and a second arithmetic coding unit 116.
The color space transform unit 110 may be used to convert the RGB color space of points in the point cloud into YCbCr format or other formats. The attribute transfer unit 111 may be used to transform the attribute information of points in the point cloud so as to minimize attribute distortion. For example, in the case of geometrically lossy encoding, since the geometric information changes after geometry encoding, the attribute transfer unit 111 needs to reassign an attribute value to each point after geometry encoding so that the attribute error between the reconstructed point cloud and the original point cloud is minimized. The attribute information may be, for example, the color information of a point. After the original attribute values of points have been transformed by the attribute transfer unit 111, any prediction unit may be selected to predict points in the point cloud. The units for predicting points in the point cloud may include: the RAHT unit 112, the predicting transform unit 113, and the lifting transform unit 114. In other words, any one of the RAHT unit 112, the predicting transform unit 113, and the lifting transform unit 114 may be used to predict the attribute information of a point in the point cloud to obtain its attribute predicted value, and a residual value of the point's attribute information may then be obtained based on the attribute predicted value; for example, the residual value of a point's attribute information may be the point's attribute original value minus its attribute predicted value. The quantization unit 115 may be used to quantize the residual values of the attribute information of points. For example, if the quantization unit 115 is connected to the predicting transform unit 113, it may quantize the residual values of attribute information output by the predicting transform unit 113; those residual values are quantized with a quantization step to improve system performance. The second arithmetic coding unit 116 may entropy-encode the residual values of point attribute information using zero run length coding to obtain an attribute code stream. The attribute code stream may be bitstream information.
The predicting transform unit 113 may be used to obtain the original order of the point cloud and divide the point cloud into levels of detail (LOD) based on that order. After obtaining the LODs of the point cloud, the predicting transform unit 113 predicts the attribute information of points in the LODs one by one, computing the residual values of their attribute information so that subsequent units can perform quantization encoding based on those residual values. For each point in an LOD, 3 neighbor points located before the current point are found based on the neighbor search results in the LOD where the current point is located, and the current point is predicted using the attribute reconstructed value of at least one of these 3 neighbor points to obtain the attribute predicted value of the current point; the residual value of the current point's attribute information can then be obtained based on the attribute predicted value and the attribute original value of the current point.
The original order of the point cloud acquired by the predicting transform unit 113 may be the arrangement order obtained after the predicting transform unit 113 performs Morton reordering on the current point cloud. The encoder can obtain the original order of the current point cloud by reordering it; after obtaining the original order, the encoder can divide the points in the point cloud according to this order to obtain the LODs of the current point cloud, and then predict the attribute information of points in the point cloud based on the LODs.
Figs. 7 to 9 show the arrangement order of Morton codes in two-dimensional space.
As shown in fig. 7, the encoder may employ a "z"-shaped Morton order within a 2×2 block in two-dimensional space. As shown in fig. 8, the encoder may apply the "z"-shaped Morton order across four 2×2 blocks, with the same "z"-shaped order inside each 2×2 block, finally yielding the Morton order used by the encoder within a 4×4 block. As shown in fig. 9, the encoder may apply the "z"-shaped Morton order across four 4×4 blocks, with the same "z"-shaped order inside each 4×4 block and inside each 2×2 block within them, finally yielding the Morton order used by the encoder within an 8×8 block.
Fig. 10 shows the arrangement order of Morton codes in three-dimensional space.
As shown in fig. 10, the Morton order is not only applicable to two-dimensional space but can also be extended to three-dimensional space. Fig. 10 shows 16 points; within each "z" shape and between the "z" shapes, the Morton order runs first along the x-axis, then along the y-axis, and finally along the z-axis.
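A 3D Morton code is obtained by interleaving the bits of the x, y, and z coordinates. The sketch below assumes non-negative integer coordinates and places the x bit in the lowest position of each triple, matching the x-then-y-then-z traversal described above; actual codecs may use a different bit convention.

```python
def morton3d(x, y, z, bits=21):
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)      # x varies fastest
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# Sorting points by morton3d(*p) yields the Morton (encoding) order.
```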
The LOD generation process includes: obtaining the Euclidean distances between points according to the position information of points in the point cloud, and separating the points into different LOD layers according to these distances. In one embodiment, the Euclidean distances may be sorted, and different ranges of distances divided into different LOD layers. For example, a point may be randomly selected as the first LOD layer. Then the Euclidean distances between the remaining points and this point are calculated, and points whose distance meets a first threshold requirement are classified into the second LOD layer. The centroid of the points in the second LOD layer is obtained, the Euclidean distances between the centroid and the points outside the first and second LOD layers are calculated, and points whose distance meets a second threshold are classified into the third LOD layer; and so on, until all points are assigned to LOD layers. By adjusting the Euclidean distance thresholds, the number of points in each LOD layer can be made to increase layer by layer. It should be understood that the LOD layers may be divided in other ways, which the present application does not limit. It should be noted that the point cloud may be directly divided into one or more LOD layers, or the point cloud may first be divided into multiple point cloud slices, with each slice then divided into one or more LOD layers. For example, the point cloud may be divided into multiple slices, with the number of points per slice between 550,000 and 1,100,000. Each slice can be viewed as a separate point cloud, and each slice may in turn be divided into multiple LOD layers, each containing multiple points. In one embodiment, the division into LOD layers can be done according to the Euclidean distances between points.
Fig. 11 is a schematic block diagram of an LOD layer provided by an embodiment of the present application.
As shown in fig. 11, assume the point cloud includes a plurality of points arranged in the original order (original order), i.e., P0, P1, P2, P3, P4, P5, P6, P7, P8, and P9, and assume the point cloud can be divided into 3 LOD layers, i.e., LOD0, LOD1, and LOD2, based on the Euclidean distances between points. LOD0 may include P0, P5, P4, and P2; LOD1 may include P1, P6, and P3; and LOD2 may include P9, P8, and P7. The layers LOD0, LOD1, and LOD2 then form the LOD-based order of the point cloud, i.e., P0, P5, P4, P2, P1, P6, P3, P9, P8, and P7. The LOD-based order may be used as the encoding order of the point cloud.
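The following Python sketch mirrors the simple layering procedure described above (seed point, first distance threshold, centroid of the second layer, second threshold); it is only an illustration, glosses over details of real G-PCC LOD generation, and assumes the second layer comes out non-empty:

```python
import math

def build_lods(points, t1, t2):
    seed, rest = points[0], points[1:]
    lod1 = [seed]                                         # first LOD layer: one point
    lod2 = [p for p in rest if math.dist(p, seed) <= t1]  # within the first threshold
    remaining = [p for p in rest if p not in lod2]
    centroid = tuple(sum(c) / len(lod2) for c in zip(*lod2))
    lod3 = [p for p in remaining if math.dist(p, centroid) <= t2]
    last = [p for p in remaining if p not in lod3]        # everything left over
    return [lod1, lod2, lod3, last]
```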
Illustratively, when predicting a current point in the point cloud, the encoder creates multiple predictor candidates, i.e., indices of prediction modes (predMode), with values from 0 to 3, based on the neighbor search results in the LOD where the current point is located. For example, when the attribute information of the current point is encoded using a prediction mode, the encoder finds the 3 neighbor points located before the current point based on the neighbor search results in that LOD. The prediction mode with index 0 determines a weighted average of the attribute reconstructed values of the 3 neighbor points, weighted by their distances to the current point, as the attribute predicted value of the current point; the prediction mode with index 1 takes the attribute reconstructed value of the nearest neighbor point as the attribute predicted value of the current point; the prediction mode with index 2 takes the attribute reconstructed value of the second-nearest neighbor point as the attribute predicted value of the current point; and the prediction mode with index 3 takes the attribute reconstructed value of the remaining neighbor point, i.e., neither the nearest nor the second-nearest of the 3 neighbors, as the attribute predicted value of the current point. After obtaining the candidates for the attribute predicted value of the current point using these prediction modes, the encoder may select the best attribute predicted value using the rate-distortion optimization (Rate Distortion Optimization, RDO) technique and then arithmetically encode the selected attribute predicted value.
Further, if the index of the prediction mode of the current point is 0, the encoder does not need to encode the index of the prediction mode into the code stream; if the index of the prediction mode selected by RDO is 1, 2, or 3, the encoder needs to encode the index of the selected prediction mode into the code stream, that is, into the attribute code stream.
Table 1

Predictor index    Predicted value
0                  average
1                  P4 (1st nearest point)
2                  P5 (2nd nearest point)
3                  P0 (3rd nearest point)
As shown in Table 1, when the attribute information of the current point P2 is encoded using the prediction method, the prediction mode with index 0 determines the weighted average of the attribute reconstructed values of the neighbor points P0, P5, and P4, weighted by their distances to P2, as the attribute predicted value of the current point P2; the prediction mode with index 1 takes the attribute reconstructed value of the nearest neighbor point P4 as the attribute predicted value of P2; the prediction mode with index 2 takes the attribute reconstructed value of the second-nearest neighbor point P5 as the attribute predicted value of P2; and the prediction mode with index 3 takes the attribute reconstructed value of the third-nearest neighbor point P0 as the attribute predicted value of P2.
The RDO technique is exemplified below.
The encoder calculates the maximum attribute difference maxDiff over at least one neighbor point of the current point and compares maxDiff with a set threshold; if it is smaller than the threshold, the weighted-average prediction mode over the neighbor attribute values is used; otherwise, the optimal prediction mode is selected for the point using the RDO technique. Specifically, the encoder first calculates the maximum difference of the R component over the neighbor points, i.e., max(R1, R2, R3) - min(R1, R2, R3); similarly for the G and B components, i.e., max(G1, G2, G3) - min(G1, G2, G3) and max(B1, B2, B3) - min(B1, B2, B3); the largest of the R, G, B component differences is then selected as maxDiff, i.e., maxDiff = max(max(R1, R2, R3) - min(R1, R2, R3), max(G1, G2, G3) - min(G1, G2, G3), max(B1, B2, B3) - min(B1, B2, B3)). The encoder compares the obtained maxDiff with the set threshold; if it is smaller, the prediction mode of the current point is set to 0, i.e., predMode = 0; if it is greater than or equal to the threshold, the encoder determines the prediction mode of the current point using the RDO technique. With the RDO technique, the encoder calculates the corresponding rate-distortion cost for each prediction mode of the current point and then selects the prediction mode with the smallest rate-distortion cost, i.e., the optimal prediction mode, as the attribute prediction mode of the current point.
Illustratively, the rate-distortion cost of the prediction mode with index 1, 2, or 3 may be calculated by the following formula:

J_indx_i = D_indx_i + λ × R_indx_i;

where J_indx_i denotes the rate-distortion cost when the current point uses the prediction mode with index i; D_indx_i is the sum of the three components of attrResidualQuant, i.e., D = attrResidualQuant[0] + attrResidualQuant[1] + attrResidualQuant[2]; λ is determined according to the quantization parameter of the current point; and R_indx_i denotes the number of bits that the quantized residual value, obtained when the current point uses the prediction mode with index i, requires in the code stream.
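Putting the threshold test and the mode decision together, a loose sketch of the selection logic might look as follows; `neighbors` is assumed to hold the RGB attributes of the 3 neighbor points ordered nearest first, and the bit estimate for R is a crude placeholder rather than the entropy coder's real cost:

```python
def select_pred_mode(neighbors, cur_attr, qstep, lam, threshold):
    # maxDiff over the R, G, B components, as described above
    max_diff = max(
        max(n[c] for n in neighbors) - min(n[c] for n in neighbors)
        for c in range(3)
    )
    if max_diff < threshold:
        return 0                                   # weighted-average mode
    best_mode, best_cost = None, float("inf")
    for mode in (1, 2, 3):                         # mode i: i-th nearest neighbor
        pred = neighbors[mode - 1]
        quant = [round((cur_attr[c] - pred[c]) / qstep) for c in range(3)]
        d = sum(abs(q) for q in quant)             # D: sum over the three components
        r = sum(abs(q).bit_length() + 1 for q in quant)  # placeholder for R
        cost = d + lam * r                         # J = D + lambda * R
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```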
Illustratively, after determining the prediction mode used by the current point, the encoder may determine the attribute predicted value attrPred of the current point based on the determined prediction mode, and then subtract the attribute original value attrValue of the current point from the attribute predicted value attrPred of the current point and quantize the result to obtain the quantized residual value attrResidualQuant of the current point. For example, the encoder may determine the quantized residual value of the current point by the following equation:
attrResidualQuant=(attrValue-attrPred)/Qstep;
where attrResidualQuant denotes the quantized residual value of the current point, attrPred denotes the attribute predicted value of the current point, attrValue denotes the attribute original value of the current point, and Qstep denotes the quantization step size, where Qstep is calculated from the quantization parameter (Quantization Parameter, Qp).
Illustratively, the attribute reconstructed value of the current point may be used as a neighbor candidate for subsequent points, i.e., the reconstructed value of the current point is used to predict the attribute information of subsequent points. The encoder may determine the attribute reconstructed value of the current point based on its quantized residual value by:
Recon=attrResidualQuant×Qstep+attrPred;
where Recon denotes the attribute reconstructed value of the current point determined based on its quantized residual value, attrResidualQuant denotes the quantized residual value of the current point, Qstep denotes the quantization step size, and attrPred denotes the attribute predicted value of the current point, where Qstep is calculated from the quantization parameter (Quantization Parameter, Qp).
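The two formulas above pair up as a quantize/reconstruct round trip; a minimal sketch, assuming scalar attributes and a uniform step Qstep:

```python
def quantize_residual(attr_value, attr_pred, qstep):
    return round((attr_value - attr_pred) / qstep)   # attrResidualQuant

def reconstruct(attr_residual_quant, attr_pred, qstep):
    return attr_residual_quant * qstep + attr_pred   # Recon
```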
In the present application, the attribute predicted value (predictedValue) of the current point may also be referred to as the predicted value of the attribute information or the color predicted value (predictedColor). The attribute original value of the current point may also be referred to as the true value of the attribute information or the color original value of the current point. The residual value of the current point may also be referred to as the difference between the attribute original value and the attribute predicted value of the current point, or as the color residual value (residualColor) of the current point. The attribute reconstructed value (reconstructedValue) of the current point may also be referred to as the reconstructed value of the attribute or the color reconstructed value (reconstructedColor) of the current point.
Fig. 12 is a schematic block diagram of a decoding framework 200 provided by an embodiment of the present application.
The decoding framework 200 may obtain the code stream of the point cloud from the encoding device and obtain the position information and attribute information of points in the point cloud by parsing the code stream. Decoding of the point cloud includes position decoding and attribute decoding. The position decoding process includes: performing arithmetic decoding on the geometry code stream; constructing an octree and merging, then reconstructing the position information of points to obtain reconstructed position information; and performing coordinate transformation on the reconstructed position information to obtain the position information of the points. The position information of points may also be referred to as the geometric information of points. The attribute decoding process includes: obtaining the residual values of the attribute information of points in the point cloud by parsing the attribute code stream; performing inverse quantization on the residual values of point attribute information to obtain inverse-quantized residual values; selecting one of three prediction modes for point cloud prediction based on the reconstructed position information obtained in position decoding, to obtain the attribute reconstructed values of points; and performing inverse color space transform on the attribute reconstructed values to obtain the decoded point cloud.
As shown in fig. 12, position decoding may be achieved by: a first arithmetic decoding unit 201, an octree synthesis (Synthesize octree) unit 202, a geometry reconstruction (Reconstruct geometry) unit 203, and an inverse coordinate transform (Inverse transform coordinates) unit 204. Attribute decoding may be achieved by: a second arithmetic decoding unit 210, an inverse quantize (Inverse quantize) unit 211, a RAHT unit 212, a predicting transform (Predicting transform) unit 213, a lifting transform (Lifting transform) unit 214, and an inverse color space transform (Inverse transform colors) unit 215.
It should be noted that decompression is the inverse process of compression; similarly, the functions of the units in the decoding framework 200 correspond to those of the units in the encoding framework 100. For example, the decoding framework 200 may divide the point cloud into multiple LODs according to the Euclidean distances between points in the point cloud, and then decode the attribute information of points in the LODs in turn; for example, it computes the number of zeros (zero_cnt) in the zero run length coding technique to decode the residuals based on that count. The decoding framework 200 may then perform inverse quantization on the decoded residual value and add the inverse-quantized residual value to the predicted value of the current point to obtain the reconstructed value, until the whole point cloud has been decoded. The current point will serve as the nearest neighbor for points in subsequent LODs, and its reconstructed value is used to predict the attribute information of subsequent points.
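As a simplified, non-authoritative sketch of the zero run length decoding mentioned above: each run count zero_cnt expands into that many zero residuals, followed by the next decoded value (the real stream layout in the codec is more involved):

```python
def zero_run_decode(symbols):
    out = []
    it = iter(symbols)
    for zero_cnt in it:
        out.extend([0] * zero_cnt)   # zero_cnt zero residuals
        value = next(it, None)       # the following nonzero residual, if any
        if value is None:
            break
        out.append(value)
    return out
```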
A codec frame to which the embodiments of the present application are applicable will be described below by taking PCRM frames as an example.
Fig. 13 is a schematic block diagram of an encoding framework provided by an embodiment of the present application.
As shown in fig. 13, in this encoding framework, the geometric information of the point cloud and the attribute information corresponding to each point are encoded separately.
In the geometry encoding part at the encoding end, the original geometric information is first preprocessed: the geometric information is coordinate-transformed through coordinate translation so that the point cloud is entirely contained within a bounding box. The geometric information is then converted from floating point to integer through coordinate quantization, facilitating subsequent regularization; because of quantization rounding, some points end up with the same geometric information, and it must be decided at this stage whether to remove these duplicate points; quantization and duplicate point removal belong to the preprocessing process. The regularized geometric information is then geometry-encoded, i.e., the bounding box is divided (e.g., octree/quadtree/binary tree division) in breadth-first traversal order, and the occupancy code of each node is encoded. As an example, in the octree-based geometry coding framework, the bounding box is divided successively into sub-cubes, and the non-empty sub-cubes (those containing points of the point cloud) are divided further until the leaf nodes obtained by division are 1x1x1 unit cubes; the points contained in each leaf node are then encoded, completing the encoding of the geometry octree and generating a binary code stream. In octree-based geometry decoding, the decoding end first obtains the occupancy code of each node by continuous parsing in breadth-first traversal order, divides the nodes successively until 1x1x1 unit cubes are obtained, then parses the number of points contained in each leaf node, and finally recovers, or reconstructs, the geometric information of the point cloud.
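A toy sketch of one level of the octree division described above: each point is assigned to one of eight children by comparing its offset from the cube origin with half the cube size, and the 8-bit occupancy code marks which children are non-empty (child origins and recursion are omitted for brevity; the bit convention is illustrative):

```python
def occupancy_code(points, origin, size):
    half = size // 2
    children = [[] for _ in range(8)]
    for x, y, z in points:
        idx = (((x - origin[0]) >= half) << 2) \
            | (((y - origin[1]) >= half) << 1) \
            | ((z - origin[2]) >= half)
        children[idx].append((x, y, z))
    # occupancy code: bit i set iff child i contains at least one point
    return sum(1 << i for i, c in enumerate(children) if c), children
```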
It should be understood that, for avoiding repetition, the regularization processing method of the point cloud may refer to the descriptions of fig. 5 and fig. 6, which are not repeated herein.
After the geometry encoding of the encoder is completed, the geometric information is reconstructed, and the attribute information is encoded using the reconstructed geometric information.
In the attribute encoding part, attribute encoding is mainly performed on color information and/or reflectance information. First, the encoder determines whether to perform color space conversion; if the attribute information being processed is color information, the original color needs to be converted into the YUV color space, which better matches the visual characteristics of the human eye. Then, in the case of geometrically lossy encoding, since the geometric information changes after geometry encoding, an attribute value needs to be reassigned to each point after geometry encoding so as to minimize the attribute error between the reconstructed point cloud and the original point cloud; this process is called attribute interpolation or attribute recoloring. The preprocessed attribute information is then attribute-encoded; attribute information encoding is divided into attribute prediction and attribute transform. For attribute prediction in attribute information encoding, the encoder may obtain the attribute predicted value of the current point by prediction, and, after obtaining the attribute predicted value, compute the residual value of the current point, e.g., as the difference between the attribute original value and the attribute predicted value of the current point. For attribute transform in attribute information encoding, the encoder performs a wavelet transform on the attribute original value of the current point to obtain transform coefficients and then quantizes the obtained transform coefficients; further, the encoder obtains the attribute reconstructed value of the current point through inverse quantization and inverse wavelet transform, and then computes the difference between the attribute original value and the attribute reconstructed value of the current point to obtain the residual value of the current point. After obtaining the residual value of the current point, the encoder can quantize it to obtain the quantized residual value of the current point, which is input into the attribute entropy encoder to form the attribute code stream.
For example, for attribute prediction in attribute information encoding, the encoder may reorder the current point cloud and then perform attribute prediction. Optional reordering methods include Morton reordering and Hilbert reordering. The encoder can obtain the encoding order of the current point cloud by reordering it; after obtaining the encoding order, the encoder can predict the attribute information of points in the point cloud according to this order.
It should be appreciated that the relevant content of the morton code reordering may be found in the relevant content referred to in fig. 7-10 above, and will not be repeated here to avoid repetition.
For attribute prediction of the current point in the current point cloud: if the geometric information of the current point is identical to that of an already-encoded point earlier in the encoding order, i.e., the current point is a duplicate point, the attribute reconstructed value of the duplicate point is used as the attribute predicted value of the current point. Optionally, if the geometric information of the current point differs from that of the already-encoded points, i.e., the current point is not a duplicate point, the encoder may select the preceding m points in the encoding order as neighbor candidate points of the current point, calculate the Manhattan distances between the geometric information of these m points and that of the current point, determine the n closest points as the neighbor points of the current point, and then compute the weighted average of the attributes of the n neighbors, with the reciprocal of distance as weight, as the attribute predicted value of the current point.
Illustratively, the encoder may derive the attribute prediction value for the current point by:
attrPred = (ref1/W1 + ref2/W2 + ref3/W3) / (1/W1 + 1/W2 + 1/W3);

wherein W1, W2, W3 represent the geometric distances between neighbor point 1, neighbor point 2, neighbor point 3 and the current point, respectively, and ref1, ref2, ref3 represent the attribute reconstruction values of neighbor point 1, neighbor point 2, and neighbor point 3, respectively.
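As a non-normative illustration, this inverse-distance-weighted prediction may be sketched in Python as follows; the function and variable names are illustrative assumptions rather than part of any codec:

def attr_predict(neighbors):
    # neighbors: list of (W_i, ref_i) pairs, where W_i is the geometric
    # distance to the current point and ref_i is the attribute
    # reconstruction value of neighbor i.
    numerator = sum(ref / w for w, ref in neighbors)
    denominator = sum(1.0 / w for w, _ in neighbors)
    return numerator / denominator

# Example with 3 neighbor points:
attrPred = attr_predict([(1.0, 120.0), (2.0, 130.0), (4.0, 140.0)])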
Illustratively, the encoder may determine the quantized residual value of the current point by the following equation:
attrResidualQuant=(attrValue-attrPred)/Qstep;
Wherein attrResidualQuant denotes the quantized residual value of the current point, attrPred denotes the attribute prediction value of the current point, attrValue denotes the attribute original value of the current point, and Qstep denotes the quantization step size. Qstep is calculated from the quantization parameter (Quantization Parameter, QP).
Illustratively, the attribute reconstruction value of the current point may be used as a neighbor candidate for subsequent points, and the attribute information of a subsequent point is predicted by using the reconstruction value of the current point. The encoder may determine the attribute reconstruction value of the current point based on the quantized residual value by the following formula:
Recon=attrResidualQuant×Qstep+attrPred;
Wherein Recon represents the attribute reconstruction value of the current point determined based on the quantized residual value of the current point, attrResidualQuant represents the quantized residual value of the current point, Qstep represents the quantization step size, and attrPred represents the attribute prediction value of the current point. Qstep is calculated from the quantization parameter (Quantization Parameter, QP).
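A minimal Python sketch of this quantize/reconstruct round trip follows; the rounding behavior is an assumption, since the codec defines its own integerization:

def quantize(attr_value, attr_pred, qstep):
    # attrResidualQuant = (attrValue - attrPred) / Qstep
    return round((attr_value - attr_pred) / qstep)

def reconstruct(attr_residual_quant, attr_pred, qstep):
    # Recon = attrResidualQuant * Qstep + attrPred
    return attr_residual_quant * qstep + attr_pred

res_q = quantize(attr_value=135.0, attr_pred=130.0, qstep=2.0)
recon = reconstruct(res_q, attr_pred=130.0, qstep=2.0)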
In the present application, the attribute prediction value (predictedValue) of the current point may also be referred to as a predicted value of the attribute information or a color prediction value (predictedColor). The attribute original value of the current point may also be referred to as a true value or a color original value of the attribute information of the current point. The residual value of the current point may also be referred to as the difference between the attribute original value and the attribute prediction value of the current point, or as the color residual value (residualColor) of the current point. The attribute reconstruction value (reconstructedValue) of the current point may also be referred to as a reconstructed value or a color reconstruction value (reconstructedColor) of the attribute of the current point.
Fig. 14 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
As shown in fig. 14, at the decoding end, geometry and attributes are decoded separately. In the geometric decoding part, the decoder first entropy-decodes the geometric code stream to obtain the geometric information of each point, then constructs the octree structure of the point cloud in the same manner as in geometric encoding and combines it with the decoded geometric code stream to obtain the geometric information expressed by the octree structure; on the one hand, this geometric information is subjected to coordinate inverse quantization and inverse translation to obtain the decoded geometric information, and on the other hand it is input into the attribute decoder as additional information. In the attribute decoding part, the Morton order is constructed in the same manner as at the encoding end, and the attribute code stream is entropy-decoded to obtain quantized residual information; inverse quantization is then performed to obtain the point cloud residual; similarly, the attribute prediction value of the current point to be decoded is obtained in the same manner as in attribute encoding, and the attribute prediction value is added to the residual value to recover the YUV attribute value of the current point to be decoded; finally, the decoded attribute information is obtained through inverse color space transformation.
It follows that, whether for the G-PCC framework or the AVS-PCC framework, the encoder needs to compress both the position information and the attribute information of the point cloud. Specifically, the encoder performs octree encoding on the position information of the point cloud; meanwhile, according to the position information of the current point after octree encoding, the encoder selects, from the encoded points, neighbor points for predicting the attribute value of the current point, predicts the current point with reference to the selected neighbor points to obtain the attribute prediction value of the current point, calculates the residual value of the current point based on the attribute prediction value and the attribute original value of the current point, and quantizes the residual value to obtain the quantized residual value; finally, the encoder transmits the quantized residual value of the current point to the decoding end in the form of a code stream. The decoding end obtains the quantized residual value of the current point by receiving and parsing the code stream, obtains the residual value of the current point through steps such as inverse transformation and inverse quantization, performs prediction through the same process to obtain the attribute prediction value, and obtains the attribute reconstruction value of the current point after superimposing the attribute prediction value on the residual value parsed from the code stream.
In general, after the encoder predicts the current point with reference to the selected neighbor points to obtain the attribute prediction value of the current point, it performs differential prediction on the attribute of the current point by using the attribute reconstruction values of the selected neighbor points to obtain the residual value of the current point, and then quantizes and encodes this residual value. However, in this encoding process, since the encoder uses the same quantization parameter for all points in the same point cloud sequence, the encoding efficiency or the reconstruction quality of some points may be too low, which reduces the encoding performance of the encoder.
In view of the above, the present application provides an encoding method, an encoder, and a storage medium, which can improve the encoding performance of the encoder.
Fig. 15 is a schematic flow chart of an encoding method 300 provided by an embodiment of the present application. The method 300 may be performed by an encoder or an encoding framework, such as the encoding framework shown in fig. 4.
As shown in fig. 15, the encoding method 300 may include:
S310, determining the encoding order of attribute information of the current point cloud based on the geometric information of the current point cloud;
S320, determining a first quantized residual value of a current point to be encoded based on at least one neighbor point located before the current point in the encoding order;
S330, determining at least one second quantized residual value of the current point;
S340, determining a third quantized residual value according to the first quantized residual value and the at least one second quantized residual value;
S350, encoding the third quantized residual value to obtain a code stream.
In an exemplary embodiment, after determining the first quantized residual value based on at least one neighbor point located before the current point in the encoding order, the encoder may select one quantized residual value from the first quantized residual value and the at least one second quantized residual value as the third quantized residual value, and encode the third quantized residual value to obtain an attribute code stream. Optionally, the third quantized residual value may be the first quantized residual value, or may be one of the at least one second quantized residual value; the present application does not specifically limit this.
In the embodiment of the application, when determining the quantized residual value finally encoded into the code stream for the current point, the application introduces at least one second quantized residual value in addition to the first quantized residual value determined based on at least one neighbor point located before the current point to be encoded in the encoding order; since different quantized residual values correspond to different quantization scales for the current point, the encoder selects the third quantized residual value from the first quantized residual value and the at least one second quantized residual value and encodes it. This prevents the encoder from quantizing all points in the current point cloud with the same quantization scale and helps select a suitable quantization scale for each point in the current point cloud, which in turn avoids the encoding efficiency or the reconstruction quality of some points being too low and improves the encoding performance of the encoder.
The following describes an attribute code stream obtained by encoding the third quantized residual value by the encoder.
Fig. 16 is a schematic block diagram of an attribute code stream provided in an embodiment of the present application.
As shown in fig. 16, because some point cloud data is very large, the point cloud may be divided into multiple point cloud tiles (slices) before encoding, which is equivalent to the encoder converting a large-scale point cloud into multiple small-scale point clouds before encoding. The point cloud tiles are encoded serially, that is, after all points in the first point cloud tile are encoded, the encoder encodes the second point cloud tile, and so on, until all point cloud tiles are encoded. The encoder obtains an attribute code stream by encoding all point cloud tiles included in the current point cloud. The attribute code stream may include an attribute parameter set (Attribute Parameter Set, APS), attribute header (ABH) information, and a load (payload): the APS includes basic information and configuration of the current point cloud, such as the number of levels of detail (Level of Detail, LOD) layers, and the ABH information includes the partition parameters of a point cloud tile. For each point cloud tile, the encoder first encodes the ABH information of the tile and then encodes the load of the tile, where the load of a point cloud tile is the information obtained by encoding the quantized residual values of its points (i.e., the third quantized residual value of the current point).
Fig. 17 is a schematic block diagram of a load in an attribute code stream according to an embodiment of the present application.
As shown in fig. 17, the encoder may encode the quantized residual values of the points by using a run length (runLength) parameter to count the number of consecutively distributed zero quantized residual values in the current point cloud; when a non-zero quantized residual value is encountered, the encoder first encodes the run length and then encodes the quantized residual value (Res), thereby obtaining the attribute code stream of the current point cloud.
It should be noted that, before encoding the quantized residual value, the prediction mode PredMode used by the current point may be encoded. During attribute prediction, when the maximum attribute difference maxDiff of at least one neighbor point of the current point is greater than or equal to a set threshold (for example, when the difference maxDiff between the attribute of the nearest neighbor point and that of the farthest neighbor point among the 3 neighbor points exceeds the set threshold), the encoder needs to select the prediction mode PredMode of the current point (that is, one of the prediction modes with indexes 1-3 in table 1) by using a rate-distortion algorithm; in this case, the encoder also encodes the prediction mode used by the current point, that is, after writing the run length parameter and before encoding the quantized residual value. Of course, when the maximum attribute difference maxDiff of the at least one neighbor point of the current point is smaller than the set threshold, the encoder predicts the attribute information of the current point by using the prediction mode with index 0, and the encoder need not write the prediction mode used by the current point into the code stream.
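As a non-normative sketch of this load layout, the following Python fragment collects (run, residual) symbols, assuming the run length counts consecutive zero-valued quantized residuals, which is one common reading of the description above, and omitting the PredMode signaling:

def encode_payload(quantized_residuals):
    symbols = []
    run = 0
    for res in quantized_residuals:
        if res == 0:
            run += 1  # extend the run of zero residuals
        else:
            symbols.append(("runLength", run))  # zeros before this residual
            symbols.append(("Res", res))        # the non-zero residual itself
            run = 0
    if run > 0:
        symbols.append(("runLength", run))      # trailing zeros
    return symbols

# [('runLength', 2), ('Res', 5), ('runLength', 1), ('Res', -3), ('runLength', 3)]
print(encode_payload([0, 0, 5, 0, -3, 0, 0, 0]))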
In some embodiments, the S330 may include:
The at least one second quantized residual value is determined based on the first quantized residual value.
Illustratively, the first quantized residual value is proportional to the at least one second quantized residual value.
Illustratively, each of the at least one second quantized residual value is smaller than the first quantized residual value.
Illustratively, a portion of the at least one second quantized residual value is smaller than the first quantized residual value, and another portion of the at least one second quantized residual value is larger than the first quantized residual value.
It should be appreciated that the application is not limited to a particular implementation of determining the at least one second quantized residual value based on the first quantized residual value. For example, the at least one second quantized residual value may be determined based on the first quantized residual value based on a certain functional relationship or mapping relationship.
In some embodiments, the S330 may include:
and determining integers smaller than the first quantized residual value as the at least one second quantized residual value.

Illustratively, when the value of the first quantized residual value is N, the at least one second quantized residual value may take the values 0, 1, …, N-1, where N is a positive integer.
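A small sketch of this candidate construction follows, under the assumption that the candidates are the non-negative integers below N:

def second_residual_candidates(n):
    # For a first quantized residual value N, enumerate the smaller
    # non-negative integers as second quantized residual candidates.
    return list(range(n))

print(second_residual_candidates(4))  # [0, 1, 2, 3]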
In some embodiments, the S330 may include:
and determining a default residual value as the at least one second quantized residual value.
The default residual value may also be referred to as a preset residual value, for example.

Illustratively, the default residual value may be agreed upon by a protocol, where the protocol may be any of various encoding or decoding standard protocols.

Illustratively, the default residual value may be set to 0.
In some embodiments, the S340 may include:
calculating the rate-distortion cost of the first quantized residual value and the rate-distortion cost of the second quantized residual value;
and determining the quantized residual value with the minimum rate distortion cost in the rate distortion cost of the first quantized residual value and the rate distortion cost of the second quantized residual value as the third quantized residual value.
Illustratively, the encoder calculates the rate-distortion cost of the first quantized residual value and the rate-distortion cost of each second quantized residual value, and determines the quantized residual value with the smallest rate-distortion cost among the first quantized residual value and the at least one second quantized residual value as the third quantized residual value. For example, when the rate-distortion cost of the first quantized residual value is smaller than the rate-distortion cost of each of the at least one second quantized residual value, the first quantized residual value is determined as the third quantized residual value; and when the smallest rate-distortion cost among the at least one second quantized residual value is smaller than the rate-distortion cost of the first quantized residual value, the second quantized residual value with the smallest rate-distortion cost is determined as the third quantized residual value.
In some embodiments, the rate-distortion cost of the first quantized residual value is calculated in the same manner as the rate-distortion cost of the second quantized residual value.
Of course, in other alternative embodiments, the calculation manner of the rate distortion cost of the first quantized residual value and the calculation manner of the rate distortion cost of the second quantized residual value may be different, which is not specifically limited in the present application.
In some embodiments, the rate distortion cost of the first quantized residual value or the second quantized residual value is calculated according to the following equation:
J1 = D1 + λ × R1;

Wherein J1 represents the rate-distortion cost of the first quantized residual value, D1 represents the error between the attribute original value of the current point and the attribute reconstruction value of the current point determined based on the first quantized residual value, λ is determined according to the quantization parameter of the current point, and R1 represents the number of bits required by the first quantized residual value in the code stream; or J1 represents the rate-distortion cost of the second quantized residual value, D1 represents the error between the attribute original value of the current point and the attribute reconstruction value of the current point determined based on the second quantized residual value, λ is determined according to the quantization parameter of the current point, and R1 represents the number of bits required by the second quantized residual value in the code stream.
Illustratively, D1 may be obtained by the following formula:

D1 = |attrValue - Recon1|;

Wherein attrValue denotes the attribute original value of the current point, and Recon1 denotes the attribute reconstruction value of the current point determined based on the first quantized residual value.
Illustratively, λ may be a parameter obtained through a number of tests; optionally, the test range of λ may be approximately 0.0 to 4.0, that is, the value of λ may range from 0.0 to 4.0. Of course, the test range of λ may be another numerical range, which is not particularly limited by the present application. For example, λ may be a parameter tested under a given quantization step size or quantization parameter of the current point, for example a quantization step size of 0.1; in other words, λ may have a functional relationship with the quantization parameter or quantization step size, such as λ = QP × 0.1. Alternatively, different QPs may correspond to different values of λ; for example, the point cloud standard test environment C1 (a test condition with lossless geometry and near-lossless attributes) corresponds to 6 QPs (48, 40, 32, 24, 16, 8), and each of the 6 QPs may correspond to one value of λ. For another example, the point cloud standard test environment CY (a test condition with near-lossless geometry and lossy attributes) corresponds to 5 QPs (10, 16, 22, 28, 34), and each of the 5 QPs may correspond to one value of λ. Of course, the above QP values are merely examples of the present application and should not be construed as limiting the present application.
In this embodiment, when calculating the rate-distortion cost of the first quantized residual value, D is designed to represent the error between the attribute original value of the current point and the attribute reconstruction value of the current point determined based on the first quantized residual value, so that J takes into account the reconstruction quality under the quantization scale corresponding to the first quantized residual value; R is designed to represent the number of bits required by the first quantized residual value in the code stream, so that J takes into account the code rate of the current point, that is, the encoding efficiency under the quantization scale corresponding to the first quantized residual value. In other words, through the design of D and R, the rate-distortion cost of the first quantized residual value considers both the reconstruction quality and the encoding efficiency under the corresponding quantization scale, thereby ensuring encoding performance. In addition, introducing λ makes it possible, in the case where all points in the current point cloud use the same quantization step size, to weigh the rate-distortion cost of the first quantized residual value against that of the second quantized residual value even though they correspond to different quantization scales, so that the quantized residual value finally selected for the current point is the third quantized residual value with the best quantization scale; this avoids the encoding efficiency or the reconstruction quality of the current point being too low and further improves the encoding performance of the encoder.
Illustratively, the attribute reconstruction value of the current point may be used as a neighbor candidate of the subsequent point, and the attribute information of the subsequent point is predicted by using the reconstruction value of the current point.
Illustratively, the encoder may determine the attribute reconstruction value of the current point based on the first quantized residual value by the following formula:

Recon1=attrResidualQuant1×Qstep+attrPred;

Wherein Recon1 represents the attribute reconstruction value of the current point determined based on the first quantized residual value, attrResidualQuant1 represents the first quantized residual value, Qstep represents the quantization step size, and attrPred represents the attribute prediction value of the current point. Qstep is calculated from the quantization parameter (Quantization Parameter, QP).
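Putting the pieces together, the following Python sketch evaluates J = D + λ × R for each candidate quantized residual and keeps the cheapest one as the third quantized residual value; bits_required() and the λ = 0.1 × QP mapping are illustrative assumptions, not normative definitions:

def bits_required(res_q):
    # Hypothetical stand-in for the entropy coder's true bit count.
    return 1 + abs(res_q).bit_length()

def select_residual(candidates, attr_value, attr_pred, qstep, qp):
    lam = 0.1 * qp  # one possible QP-to-lambda mapping
    best_res, best_cost = None, float("inf")
    for res_q in candidates:
        recon = res_q * qstep + attr_pred   # attribute reconstruction value
        d = abs(attr_value - recon)         # distortion D = |attrValue - Recon|
        j = d + lam * bits_required(res_q)  # cost J = D + lambda * R
        if j < best_cost:
            best_res, best_cost = res_q, j
    return best_res  # the third quantized residual value

# First quantized residual 3 plus second candidates 0, 1, 2:
third = select_residual([3, 0, 1, 2], attr_value=135.0,
                        attr_pred=130.0, qstep=2.0, qp=32)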
In some embodiments, λ is determined from a quantization parameter of the current point and at least one of:
The sequence type of the current point cloud and the current component of the current point.
For example, the value of λ may differ for different sequence types. Alternatively, λ may be a parameter obtained through a number of tests, for example, a parameter tested under the sequence type of the current point cloud; in other words, the corresponding λ value may be obtained for point clouds of different sequence types through testing or training.

For example, the value of λ may differ for different components of the current point. Alternatively, λ may be a parameter obtained through a number of tests, for example, a parameter tested on the current component of the current point; in other words, the corresponding λ value may be obtained for different components of the points in the point cloud through testing or training. Alternatively, the current component may be the component of the current point to be encoded. For example, the values of λ corresponding to the V component, the U component, and the Y component may be the same or different. For another example, the values of λ corresponding to the R component, the G component, and the B component may be the same or different. Alternatively, the encoding scheme provided by the application may be applied to only some components of the current point, or to all components of the current point, which is not specifically limited by the application.
The scheme of determining the first quantized residual value of the current point is described below in connection with a G-PCC framework.
In some embodiments, the S320 may include:
Determining an adopted prediction mode of a current point to be coded based on at least one neighbor point positioned before the current point in the coding sequence;
And determining a first quantized residual value of the current point based on a prediction mode adopted by the current point.
The encoder determines a prediction mode adopted by the current point, predicts an attribute value of the current point based on the prediction mode adopted by the current point to obtain an attribute prediction value of the current point, and obtains a first quantized residual value of the current point based on the attribute prediction value and the attribute original value of the current point.
In some embodiments, the attribute difference of the at least one neighbor point is determined based on the difference in the respective components of the at least one neighbor point; if the attribute difference of the at least one neighbor point is smaller than a first threshold value, determining a first prediction mode as a prediction mode adopted by the current point; the first prediction mode refers to a prediction mode for predicting an attribute value of the current point based on a weighted average of the at least one neighbor point.
Illustratively, when predicting a current point in the point cloud, the encoder creates a plurality of prediction mode candidates based on neighbor point search results on the LOD where the current point is located. For example, when attribute information of a current point is encoded by using a prediction method, an encoder first finds 3 neighbor points located before the current point based on neighbor point search results on the LOD where the current point is located, and creates 4 prediction mode candidates, that is, a prediction mode (predMode) with an index value of 0 to 3. Wherein, the prediction mode with index of 0 refers to determining a weighted average value of the reconstructed attribute values of 3 neighbor points as an attribute prediction value of the current point based on the distance between the 3 neighbor points and the current point; the prediction mode with index of 1 refers to taking the attribute reconstruction value of the nearest neighbor point in the 3 neighbor points as the attribute prediction value of the current point; the prediction mode with index of 2 refers to taking the attribute reconstruction value of the next neighbor point as the attribute prediction value of the current point; the prediction mode with index of 3 refers to that the attribute reconstruction value of the neighbor points except the nearest neighbor point and the next neighbor point in the 3 neighbor points is used as the attribute prediction value of the current point. After the encoder obtains each prediction mode, candidates of the attribute prediction value of the current point may be obtained based on each prediction mode, and an optimal attribute prediction value may be selected using a rate-distortion optimization (Rate distortion optimization, RDO) technique, and then the selected attribute prediction value may be arithmetically encoded. For example, when determining the prediction mode used by the current point, the encoder may first calculate the maximum difference maxDiff of its attribute for at least one neighboring point of the current point, then compare maxDiff to a set threshold, and use the prediction mode with index 0 (i.e., the first prediction mode) if it is smaller than the set threshold (i.e., the first threshold).
In some embodiments, the attribute difference of the at least one neighbor point is determined by the maximum of the differences of the at least one neighbor point over the respective components.
Illustratively, when the encoder calculates the maximum attribute difference maxDiff of at least one neighbor point of the current point, it may first calculate the maximum difference of the at least one neighbor point on each component; for example, the maximum difference on the R component is max(R1,R2,R3) - min(R1,R2,R3), and the maximum differences on the G and B components are max(G1,G2,G3) - min(G1,G2,G3) and max(B1,B2,B3) - min(B1,B2,B3), respectively; then, the largest of the differences on the R, G, B components is selected as maxDiff, i.e., maxDiff = max(max(R1,R2,R3) - min(R1,R2,R3), max(G1,G2,G3) - min(G1,G2,G3), max(B1,B2,B3) - min(B1,B2,B3)); after obtaining maxDiff, the encoder compares it with the set threshold (i.e., the first threshold), and if it is smaller than the set threshold, the index of the prediction mode of the current point is 0, i.e., predMode = 0.
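This maxDiff rule transcribes directly into Python; the threshold value used below is an arbitrary placeholder, since the actual threshold is an encoder configuration:

def max_diff(neighbors):
    # neighbors: list of (R, G, B) attribute reconstruction values.
    per_component = []
    for c in range(3):  # R, G, B components
        values = [n[c] for n in neighbors]
        per_component.append(max(values) - min(values))
    return max(per_component)  # maxDiff over all components

# predMode = 0 (weighted average) is used when maxDiff is below the threshold:
use_weighted_average = max_diff([(10, 20, 30), (12, 22, 28), (11, 25, 31)]) < 64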
In some embodiments, the method 300 may further comprise:
If the attribute difference of the at least one neighbor point is greater than or equal to the first threshold, determining a prediction mode with the minimum rate distortion cost in at least one second prediction mode as the prediction mode adopted by the current point; the at least one second prediction mode corresponds to the at least one neighbor point one by one, and the second prediction mode refers to a prediction mode in which an attribute reconstruction value of a neighbor point corresponding to the second prediction mode in the at least one neighbor point is used as an attribute prediction value of the current point;
And writing the prediction mode adopted by the current point into the code stream.
Illustratively, the encoder first calculates the maximum attribute difference maxDiff of at least one neighbor point of the current point and compares maxDiff with a set threshold; if it is smaller than the set threshold, a prediction mode based on the weighted average of the neighbor point attribute values is used; otherwise, the optimal prediction mode is selected for the current point by using the RDO technique. Specifically, when calculating maxDiff, the encoder may calculate the maximum difference of the at least one neighbor point on each component; for example, the maximum difference on the R component is max(R1,R2,R3) - min(R1,R2,R3), and the maximum differences on the G and B components are max(G1,G2,G3) - min(G1,G2,G3) and max(B1,B2,B3) - min(B1,B2,B3), respectively; the encoder may then select the largest of these as maxDiff, i.e., maxDiff = max(max(R1,R2,R3) - min(R1,R2,R3), max(G1,G2,G3) - min(G1,G2,G3), max(B1,B2,B3) - min(B1,B2,B3)). After obtaining maxDiff, the encoder compares it with the set threshold (i.e., the first threshold): if it is smaller than the set threshold, the index of the prediction mode of the current point is 0, i.e., predMode = 0; if it is greater than or equal to the set threshold, the encoder may determine the index of the prediction mode used by the current point by using the RDO technique. For the RDO technique, the encoder may calculate the corresponding rate-distortion cost for each prediction mode of the current point, and then select the prediction mode with the smallest rate-distortion cost, i.e., the optimal prediction mode, as the attribute prediction mode of the current point.
In some embodiments, the method 300 may further comprise:
determining a quantized residual value of the current point on each component based on attribute reconstruction values of the neighbor points corresponding to the second prediction mode on each component;
And determining the sum of quantized residual values of the current point on each component as the rate distortion cost of the second prediction mode.
Illustratively, the rate-distortion cost for a prediction mode with index 1, 2, or 3 may be calculated by the following formula:
J_indx_i = D_indx_i + λ × R_indx_i;

wherein J_indx_i represents the rate-distortion cost when the current point adopts the prediction mode with index i, D_indx_i is the sum of the three components of attrResidualQuant, i.e., D = attrResidualQuant[0] + attrResidualQuant[1] + attrResidualQuant[2], λ is determined according to the quantization parameter of the current point, and R_indx_i represents the number of bits required in the code stream by the quantized residual value obtained when the current point adopts the prediction mode with index i.
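A hedged Python sketch of this mode decision follows; the text above sums the three quantized residual components as D, absolute values are taken here as a defensive assumption, and bits_for_mode() is a hypothetical stand-in for the real rate estimate:

def bits_for_mode(residual_quant):
    # Hypothetical bit-count estimate for a per-component residual triple.
    return sum(1 + abs(r).bit_length() for r in residual_quant)

def best_prediction_mode(mode_residuals, lam):
    # mode_residuals: dict mapping prediction-mode index (1, 2, 3) to the
    # per-component quantized residuals obtained under that mode.
    costs = {}
    for mode, res in mode_residuals.items():
        d = sum(abs(r) for r in res)                # D: sum over the components
        costs[mode] = d + lam * bits_for_mode(res)  # J = D + lambda * R
    return min(costs, key=costs.get)                # mode with the minimal J

best = best_prediction_mode({1: (2, 0, 1), 2: (4, 1, 0), 3: (0, 0, 3)}, lam=1.2)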
In some embodiments, the S330 may include:
determining an attribute predicted value of the current point based on a predicted mode adopted by the current point;
calculating a difference value between an original value of the attribute of the current point and a predicted value of the attribute of the current point;
Determining the ratio of the difference value to the quantization step length adopted by the current point as the first quantization residual value; and determining the quantization step length adopted by the current point according to the quantization parameter of the current point.
For example, the encoding end may first determine the attribute prediction value of the current point on the first component according to the prediction mode adopted by the current point and the at least one neighbor point on the first component, then calculate the difference between the attribute original value of the current point on the first component and the attribute prediction value of the current point on the first component, and finally determine the ratio of this difference to the quantization step size adopted by the current point as the first quantized residual value. Alternatively, the first component may be a V component, a U component, or a Y component, and may also be an R component, a G component, or a B component. Alternatively, the quantization step size used by the current point may be the quantization step size used by the current point cloud, that is, all points in the current point cloud use the same quantization step size.
Illustratively, after determining the prediction mode used by the current point, the encoder may determine the attribute prediction value attrPred of the current point based on the determined prediction mode, then subtract the attribute prediction value attrPred from the attribute original value attrValue of the current point and quantize the result to obtain the first quantized residual value attrResidualQuant1 of the current point. For example, after the encoder obtains the attribute prediction value of the current point, the first quantized residual value may be determined by the following formula:
attrResidualQuant1=(attrValue-attrPred)/Qstep;
Wherein attrResidualQuant1 denotes the first quantized residual value, attrPred denotes the attribute prediction value of the current point, attrValue denotes the attribute original value of the current point, and Qstep denotes the quantization step size. Qstep is calculated from the quantization parameter (Quantization Parameter, QP).
In some embodiments, the encoding order comprises a level of detail (LOD) order.
Illustratively, the encoder may obtain the original order of the current point cloud by reordering it, and after obtaining the original order, the encoder may divide the point cloud into detail layers according to this order, so as to predict the attribute information of the points in the point cloud based on the LOD.
The results of the scheme provided by the present application, as tested on the G-PCC reference software TMC13 V14.0, are described below in connection with Table 2.
Table 2 shows the BD-rate of each component for each point cloud sequence of the Cat1-A type in the case of lossless geometry compression and near-lossless attribute compression, with λ set to 1.2. Cat1-A represents point clouds whose points contain color information. The End-to-End rate distortion (End-to-End BD-rate) is an index for measuring algorithm performance or encoding performance; an overall negative value indicates that the encoding algorithm provided by the application performs better than the original encoding algorithm in terms of code rate and PSNR. The Hausdorff BD-rate is a further criterion for judging performance under near-lossless attribute conditions; it reflects the maximum difference between attribute values, and an overall negative value likewise indicates better performance of the new algorithm. L, Cb and Cr (also called Y, U, V) represent the performance on the luminance and the two chrominance components of the point cloud color.
TABLE 2
As shown in table 2, under lossless geometry compression and near-lossless attribute compression with λ set to 1.2, the End-to-End BD-rate of sequence 2 shows a performance gain on each component, the Hausdorff BD-rate of sequence 1 shows a performance gain on each component, especially on the Cb component, and the Hausdorff BD-rates of sequence 2 and sequence 3 also show gains on the L and Cr components; that is, the scheme provided by the application can improve encoding performance.
The scheme of determining the first quantized residual value of the current point is described below in connection with the AVS-PCC framework.
In some embodiments, the S320 may include:
determining a weighted average value of the at least one neighbor point as an attribute predicted value of the current point;
calculating a difference value between an original value of the attribute of the current point and a predicted value of the attribute of the current point;
Determining the ratio of the difference value to the quantization step length adopted by the current point as the first quantization residual value; and determining the quantization step length adopted by the current point according to the quantization parameter of the current point.
For example, the encoding end may determine, as the attribute predicted value of the current point on the first component, a weighted average value of the at least one neighboring point on the first component, calculate a difference between the attribute original value of the current point on the first component and the attribute predicted value of the current point on the first component, and finally determine, as the first quantized residual value, a ratio of a difference between the attribute original value of the current point on the first component and the attribute predicted value of the current point on the first component and a quantization step size adopted by the current point. Alternatively, the first component may be a V component, a U component, or a Y component, and may also be an R component, a G component, or a B component. Alternatively, the quantization step used by the current point may be a quantization step used by the current point cloud, that is, all points in the current point cloud use the same quantization step.
For attribute prediction of the current point in the current point cloud, if the geometric information of the current point is the same as that of an encoded point earlier in the encoding order, i.e., the current point is a repeated point, the attribute reconstruction value of the repeated point is used as the attribute prediction value of the current point. Optionally, if the geometric information of the current point differs from that of the encoded points earlier in the encoding order, i.e., the current point is not a repeated point, the encoder may select the previous m points as neighbor candidate points of the current point according to the encoding order, calculate the Manhattan distances between the geometric information of these m points and that of the current point, determine the n points closest to the current point as its neighbor points, and then calculate a weighted average of the attributes of the n neighbors, with the inverse of the distance as the weight, as the attribute prediction value of the current point.
Illustratively, the encoder may derive the attribute prediction value for the current point by:
attrPred = (ref1/W1 + ref2/W2 + ref3/W3) / (1/W1 + 1/W2 + 1/W3);

wherein W1, W2, W3 represent the geometric distances between neighbor point 1, neighbor point 2, neighbor point 3 and the current point, respectively, and ref1, ref2, ref3 represent the attribute reconstruction values of neighbor point 1, neighbor point 2, and neighbor point 3, respectively.
Illustratively, after the encoder obtains the attribute prediction value of the current point, the first quantized residual value may be determined by the following formula:
attrResidualQuant1=(attrValue-attrPred)/Qstep;
Wherein attrResidualQuant1 denotes the first quantized residual value, attrPred denotes the attribute prediction value of the current point, attrValue denotes the attribute original value of the current point, and Qstep denotes the quantization step size. Qstep is calculated from the quantization parameter (Quantization Parameter, QP).
In some embodiments, the encoding order comprises a Morton order or a Hilbert order.
For example, for attribute prediction in attribute information encoding, the encoder may reorder the current point cloud and then perform attribute prediction. Optional reordering methods are Morton reordering and Hilbert reordering. The encoder obtains the encoding order of the current point cloud by reordering it, and may then predict the attribute information of the points in the point cloud according to this encoding order.
The following describes, with reference to Tables 3 to 6, the results obtained by testing the scheme provided by the application on PCRM v4.0, the latest point cloud compression platform of AVS.
Table 3 shows the BD-rate of each component for Cat1A to Cat1A+Cat2 in the case of limited-lossy geometry (limit-lossy geometry) compression and lossy attribute (lossy attributes) compression, with λ set to 0.5. Cat1A represents point clouds whose points include only reflectance information, Cat1B represents point clouds whose points include only color information, Cat1C represents point clouds whose points include both color information and reflectance information, Cat2 represents point clouds whose points include reflectance information and other attribute information, and Cat3 represents point clouds whose points include color information and other attribute information. Table 4 shows the BD-rate of the color information for Cat1A to Cat1A+Cat2 in the case of lossless geometry (lossless geometry) compression and lossy attribute compression, with λ set to 0.5. Table 5 shows the BD-rate of each component for Cat1A to Cat1A+Cat2 in the case of lossless geometry compression and lossy attribute compression, with λ set to 0.8. Table 6 shows the BD-rate of each component for Cat1A to Cat1A+Cat2 in the case of limited-lossy geometry compression and lossy attribute compression, with λ set to 0.8. The total average (overall) represents the overall performance over all sequences. The BD-rate is an index for measuring algorithm performance or encoding performance; an overall negative value indicates that the encoding algorithm provided by the application performs better than the original encoding algorithm in terms of code rate and PSNR. L, Cb and Cr (also called Y, U, V) represent the performance on the luminance and the two chrominance components of the point cloud color, and reflectance represents the performance on the reflectance attribute.
TABLE 3
As shown in table 3, in the case of limited-lossy geometry compression and lossy attribute compression with λ set to 0.5, the total average BD-rate gains 7.6% on the L component, 1.7% on the Cb component, 4.4% on the Cr component, and 2.5% on the reflectance component; that is, the scheme provided by the application can improve encoding performance.
TABLE 4
As shown in table 4, in the case of lossless geometry compression and lossy attribute compression with λ set to 0.5, the total average BD-rate gains 7.2% on the L component, 6.5% on the Cb component, and 6.6% on the Cr component; that is, the scheme provided by the application can improve encoding performance.
TABLE 5
As shown in table 5, in the case of lossless geometry compression and lossy attribute compression with λ set to 0.8, the total average BD-rate gains 7.6% on the L component, 2.0% on the Cb component, and 4.5% on the Cr component; that is, the scheme provided by the application can improve encoding performance.
TABLE 6
As shown in table 6, in the case of limited-lossy geometry compression and lossy attribute compression with λ set to 0.8, the total average BD-rate gains 8.3% on the L component, 6.9% on the Cb component, and 6.9% on the Cr component; that is, the scheme provided by the application can improve encoding performance.
The following describes aspects of the application in connection with specific embodiments.
In this embodiment, the first quantized residual value of the current point is the quantized residual value attrResidualQuant1 determined based on the attribute prediction value obtained by the encoder through attribute prediction of the current point, and attrResidualQuant2, obtained by setting the quantized residual value to 0, is taken as the at least one second quantized residual value of the current point; on this basis, the encoder determines the final quantized residual value of the current point, that is, the third quantized residual value referred to above, from attrResidualQuant1 and attrResidualQuant2.
Illustratively, after the encoder obtains the attribute prediction value of the current point through the predicting transform, the attribute reconstruction value Recon1 of the current point may be determined based on the attribute prediction value and attrResidualQuant1, and the encoder sets the quantized residual value to 0 and takes it as attrResidualQuant2. Further, the encoder may determine the attribute reconstruction value Recon2 of the current point based on the attribute prediction value and attrResidualQuant2. Then, based on Recon1 and Recon2 respectively, the encoder may calculate the rate-distortion cost of attrResidualQuant1 and the rate-distortion cost of attrResidualQuant2, determine the candidate with the smaller rate-distortion cost as the final quantized residual value of the current point, and finally encode this final quantized residual value to implement encoding of the attribute information of the current point.
Fig. 18 is a schematic flow chart of an encoding method 400 provided by an embodiment of the present application. The method 400 may be performed by an encoder or an encoding framework, such as the encoding framework shown in fig. 4.
As shown in fig. 18, the encoding method 400 may include:
S410, the encoder acquires the current point cloud.
S420, the encoder predicts the attribute of the current point to obtain an attribute predicted value of the current point.
For the G-PCC framework, when predicting a current point in the point cloud, the encoder creates a plurality of prediction mode candidates based on neighbor point search results on the LOD where the current point is located. For example, when attribute information of a current point is encoded by using a prediction method, an encoder first finds 3 neighbor points located before the current point based on neighbor point search results on the LOD where the current point is located, and creates 4 prediction mode candidates, that is, a prediction mode (predMode) with an index value of 0 to 3. Wherein, the prediction mode with index of 0 refers to determining a weighted average value of the reconstructed attribute values of 3 neighbor points as an attribute prediction value of the current point based on the distance between the 3 neighbor points and the current point; the prediction mode with index of 1 refers to taking the attribute reconstruction value of the nearest neighbor point in the 3 neighbor points as the attribute prediction value of the current point; the prediction mode with index of 2 refers to taking the attribute reconstruction value of the next neighbor point as the attribute prediction value of the current point; the prediction mode with index of 3 refers to taking attribute reconstruction values of neighbor points except the nearest neighbor point and the next neighbor point in the 3 neighbor points as attribute prediction values of the current point; after obtaining candidates for the attribute predictor of the current point based on the above-described various prediction modes, the encoder may select the best attribute predictor using a rate distortion optimization (Rate distortion optimization, RDO) technique and then arithmetically encode the selected attribute predictor. When the encoder determines the prediction mode used by the current point, the encoder may first calculate the maximum difference maxDiff of its attributes for at least one neighbor point of the current point, compare maxDiff to a set threshold, and use the prediction mode with index 0 (i.e., the first prediction mode) if it is smaller than the set threshold (i.e., the first threshold).
Illustratively, when the encoder calculates the maximum attribute difference maxDiff of at least one neighbor point of the current point, it may first calculate the maximum difference of the at least one neighbor point on each component; for example, the maximum difference on the R component is max(R1,R2,R3) - min(R1,R2,R3), and the maximum differences on the G and B components are max(G1,G2,G3) - min(G1,G2,G3) and max(B1,B2,B3) - min(B1,B2,B3), respectively; then, the largest of these differences is selected as maxDiff, i.e., maxDiff = max(max(R1,R2,R3) - min(R1,R2,R3), max(G1,G2,G3) - min(G1,G2,G3), max(B1,B2,B3) - min(B1,B2,B3)); after obtaining maxDiff, the encoder compares it with the set threshold (i.e., the first threshold), and if it is smaller than the set threshold, the index of the prediction mode of the current point is 0, i.e., predMode = 0.
Illustratively, when the encoder determines the prediction mode used by the current point, the encoder may first calculate the maximum difference maxDiff of its attributes for at least one neighbor point of the current point, compare maxDiff to a set threshold, and if greater than or equal to the set threshold, the encoder may determine the index of the prediction mode used by the current point for the current point using RDO techniques. For RDO techniques, the encoder may calculate a corresponding rate-distortion cost for each prediction mode of the current point, and then select the prediction mode with the smallest rate-distortion cost, i.e., the optimal prediction mode, as the attribute prediction mode of the current point.
For the AVS-PCC framework, the encoder may reorder the current point cloud and then perform attribute prediction. Optional reordering methods are Morton reordering and Hilbert reordering. The encoder obtains the encoding order of the current point cloud by reordering it, and may then predict the attribute information of the points according to this encoding order. Optionally, if the geometric information of the current point is the same as that of an encoded point earlier in the encoding order, i.e., the current point is a repeated point, the attribute reconstruction value of the repeated point is used as the attribute prediction value of the current point. Optionally, if the geometric information of the current point differs from that of the encoded points earlier in the encoding order, i.e., the current point is not a repeated point, the encoder may select the previous m points as neighbor candidate points of the current point according to the encoding order, calculate the Manhattan distances between the geometric information of these m points and that of the current point, determine the n points closest to the current point as its neighbor points, and then calculate a weighted average of the attributes of the n neighbors, with the inverse of the distance as the weight, as the attribute prediction value of the current point.
Illustratively, the encoder may derive the attribute prediction value for the current point by:
attrPred = (ref1/W1 + ref2/W2 + ref3/W3) / (1/W1 + 1/W2 + 1/W3);

wherein W1, W2, W3 represent the geometric distances between neighbor point 1, neighbor point 2, neighbor point 3 and the current point, respectively, and ref1, ref2, ref3 represent the attribute reconstruction values of neighbor point 1, neighbor point 2, and neighbor point 3, respectively.
S430, the encoder determines the residual value of the current point based on the attribute original value and the attribute predicted value of the current point.
S440, the encoder quantizes the residual value of the current point to obtain a quantized residual value attrResidualQuant1 of the current point.
Illustratively, after the encoder obtains the attribute prediction value of the current point, attrResidualQuant1 may be determined by the following equation:
attrResidualQuant1=(attrValue-attrPred)/Qstep;
wherein attrResidualQuant1 denotes the quantized residual value of the current point, attrPred denotes the attribute prediction value of the current point, attrValue denotes the attribute original value of the current point, and Qstep denotes the quantization step size. Qstep is calculated from the quantization parameter (Quantization Parameter, QP).
S441, the encoder determines the attribute reconstruction value Recon1 of the current point based on attrResidualQuant1 and the attribute prediction value of the current point.
Illustratively, the encoder may determine the attribute reconstruction value Recon1 of the current point based on attrResidualQuant1 by the following formula:

Recon1=attrResidualQuant1×Qstep+attrPred;

Wherein Recon1 represents the attribute reconstruction value of the current point determined based on attrResidualQuant1, Qstep represents the quantization step size, and attrPred represents the attribute prediction value of the current point. Qstep is calculated from the quantization parameter (Quantization Parameter, QP).
S442, the encoder determines the rate-distortion cost J1 of attrResidualQuant1 based on Recon1.
Illustratively, the encoder calculates the rate-distortion cost of attrResidualQuant1 according to the following formula:

J1 = D1 + λ × R1;

Where J1 represents the rate-distortion cost of attrResidualQuant1, D1 represents the error between the attribute original value attrValue of the current point and Recon1, λ is determined according to the quantization parameter of the current point, and R1 represents the number of bits required by attrResidualQuant1 in the code stream.
Illustratively, D1 may be obtained by the following formula:

D1 = |attrValue - Recon1|;

Wherein attrValue denotes the attribute original value of the current point, and Recon1 denotes the attribute reconstruction value of the current point determined based on attrResidualQuant1.
S450, the encoder determines the quantized residual value attrResidualQuant2 of the current point based on attrResidualQuant1.

Illustratively, the encoder sets the quantized residual value to 0 and takes it as attrResidualQuant2, i.e., attrResidualQuant2 = 0.
S451, the encoder determines an attribute reconstruction value Recon2 of the current point based on attrResidualQuant2 and the attribute prediction value of the current point.
Illustratively, the encoder may determine the attribute reconstruction value Recon2 of the current point based on attrResidualQuant2 by the following formula:
Recon2=attrResidualQuant2×Qstep+attrPred;
Wherein, since attrResidualQuant2 = 0, the attribute reconstruction value of the current point obtained based on attrResidualQuant2 is equal to the attribute prediction value of the current point, namely, Recon2 = attrPred.
S452, the encoder determines a rate-distortion cost J2 of attrResidualQuant2 based on Recon2.
Illustratively, the encoder calculates the rate-distortion cost of attrResidualQuant2 according to the following formula:
J2 = D2 + λ×R2;
where J2 represents the rate-distortion cost of attrResidualQuant2, D2 represents the error between the attribute original value attrValue of the current point and Recon2, λ is determined according to the quantization parameter of the current point, and R2 represents the number of bits required by attrResidualQuant2 in the code stream.
Illustratively, D2 can be derived by the following formula:
D2 = |attrValue - Recon2|;
wherein attrValue denotes the attribute original value of the current point, and Recon2 denotes the attribute reconstruction value of the current point determined based on attrResidualQuant2; since attrResidualQuant2 = 0, Recon2 = attrPred.
S460, the encoder determines whether J1 > J2.
S461, if J1 > J2, the encoder encodes attrResidualQuant2.
For example, if J1 > J2, the coding cost of attrResidualQuant1 is relatively high, so the coding scheme of attrResidualQuant2 may be selected, i.e., attrResidualQuant1 is set to 0 and the result is encoded as the final quantized residual value of the current point; that is, the encoder determines attrResidualQuant2 as the final quantized residual value of the current point (i.e., the third quantized residual value described above) and encodes attrResidualQuant2.
S462, if J1 ≤ J2, the encoder encodes attrResidualQuant1.
For example, if J1 ≤ J2, the coding cost of attrResidualQuant1 is relatively small, so the coding scheme of attrResidualQuant1 may be selected, i.e., attrResidualQuant1 is directly used as the final quantized residual value of the current point and encoded; that is, the encoder determines attrResidualQuant1 as the final quantized residual value of the current point (i.e., the third quantized residual value described above) and encodes attrResidualQuant1.
In this embodiment, the encoder calculates the rate-distortion cost of attrResidualQuant1 and the rate-distortion cost of attrResidualQuant2, determines the quantized residual value with the minimum rate-distortion cost as the final quantized residual value of the current point, and finally encodes that value to implement encoding of the attribute information of the current point. Since the quantization scales of attrResidualQuant1 and attrResidualQuant2 differ, and their rate-distortion costs take both the reconstruction quality and the coding efficiency of the current point into account, the final quantized residual value selected for the current point is the third quantized residual value with the optimal quantization scale. This prevents the coding efficiency or the reconstruction quality of the current point from being too low, and thereby improves the coding performance of the encoder.
Of course, FIG. 18 is merely an example of the present application and should not be construed as limiting the application.
For example, in fig. 18, the encoder takes attrResidualQuant2, obtained by setting attrResidualQuant1 to 0, as the at least one second quantized residual value of the current point, but the present application is not limited thereto. For example, in other alternative embodiments, when the value of the first quantized residual value is N, the at least one second quantized residual value may comprise the N values 0, 1, …, N−1, where N is a positive integer. Based on this, the encoder may obtain N candidate quantized residual values and their attribute reconstruction values, i.e., may obtain a rate-distortion cost for each of them, namely J1, J2, …, JN. For example, assuming that the first quantized residual value of the current point is 10, the at least one second quantized residual value may be set to 0 through 9 in turn, and then 10 attribute reconstruction values may be obtained; 10 rate-distortion costs, i.e., J1, J2, …, J10, may then be obtained using the rate-distortion cost formula. The encoder compares J1, J2, …, J10, selects the minimum rate-distortion cost Min(J1, J2, …, J10), determines the quantized residual value corresponding to that minimum as the final quantized residual value of the current point, and finally encodes it; this process is repeated until all points in the current point cloud are encoded, thereby encoding the attribute information of the current point cloud.
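Illustratively, this generalized candidate search may be sketched as follows, reusing quantize_residual, reconstruct, and rd_cost from the sketches above; estimate_bits stands in for the entropy coder's bit count and is an assumption, the first quantized residual value is taken as non-negative for simplicity, and the first value itself is kept in the candidate set so that the final value is chosen from the first and the second quantized residual values:

    def choose_final_residual(attr_value, attr_pred, qstep, lam, estimate_bits):
        first = quantize_residual(attr_value, attr_pred, qstep)
        # candidate set: every non-negative integer smaller than the first
        # quantized residual value, plus the first value itself
        candidates = list(range(first)) + [first]
        best, best_cost = first, float("inf")
        for q in candidates:
            recon = reconstruct(q, attr_pred, qstep)
            cost = rd_cost(attr_value, recon, estimate_bits(q), lam)
            if cost < best_cost:
                best, best_cost = q, cost
        return best  # the final (third) quantized residual value to be encoded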
The application also provides a decoding method corresponding to the encoding method.
Fig. 19 is a schematic flowchart of a decoding method provided by an embodiment of the present application.
As shown in fig. 19, the decoder first parses the attribute code stream to obtain the APS and ABH for subsequent use. It then obtains the attribute prediction value attrPred of the first point from the attribute code stream, parses the quantized residual value of the first point from the attribute code stream, and dequantizes it to obtain the residual value of the first point. Based on this, the decoder obtains the attribute reconstruction value of the first point from its residual value and attribute prediction value; the attribute reconstruction value of the first point can serve as a neighbor candidate for subsequent points. The decoder then parses the quantized residual value of the second point from the attribute code stream, dequantizes it, and adds the result to the attribute prediction value of the second point to obtain the attribute reconstruction value of the second point, and so on until the last point of the point cloud is decoded.
Illustratively, the decoder may dequantize the quantized residual value of the current point (i.e., the third quantized residual value related to the present application) based on the following formula to obtain the residual value of the current point:
attrResidual=attrResidualQuant×Qstep;
Wherein attrResidual denotes the residual value of the current point, attrResidualQuant denotes the quantized residual value of the current point, and Qstep denotes the quantization step size, which is calculated from the quantization parameter (Quantization Parameter, Qp).
Illustratively, the decoder may derive the attribute reconstruction value for the current point based on the following formula:
Recon=attrResidual+attrPred;
Wherein Recon represents the attribute reconstruction value of the current point determined based on the quantized residual value of the current point, attrResidual represents the residual value of the current point, and attrPred represents the attribute prediction value of the current point.
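Illustratively, the decoder-side reconstruction may be sketched as follows under the same assumptions (scalar Qstep, attribute prediction value attrPred derived as on the encoder side):

    def decode_attribute(attr_residual_quant, attr_pred, qstep):
        attr_residual = attr_residual_quant * qstep  # dequantize the parsed quantized residual value
        return attr_residual + attr_pred             # add the prediction value to reconstruct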
The preferred embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the present application is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present application within the scope of the technical concept of the present application, and all the simple modifications belong to the protection scope of the present application. For example, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described further. As another example, any combination of the various embodiments of the present application may be made without departing from the spirit of the present application, which should also be regarded as the disclosure of the present application. It should be further understood that, in the various method embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present application.
An encoder or decoder provided by an embodiment of the present application will be described with reference to the accompanying drawings.
Fig. 20 is a schematic block diagram of an encoder 500 provided by an embodiment of the present application.
As shown in fig. 20, the encoder 500 may include:
a determining unit 510, configured to:
determining the coding sequence of attribute information of the current point cloud based on the geometric information of the current point cloud;
determining a first quantized residual value of a current point to be encoded based on at least one neighbor point located before the current point in the encoding sequence;
Determining at least one second quantized residual value for the current point;
Determining a third quantized residual value based on the first quantized residual value and the at least one second quantized residual value;
the encoding unit 520 is configured to encode the third quantized residual value to obtain a code stream.
In some embodiments, the determining unit 510 is specifically configured to:
The at least one second quantized residual value is determined based on the first quantized residual value.
In some embodiments, the determining unit 510 is specifically configured to:
and determining an integer smaller than the first quantized residual value as the at least one second quantized residual value.
In some embodiments, the determining unit 510 is specifically configured to:
and determining a default residual value as the at least one second quantized residual value.
In some embodiments, the determining unit 510 is specifically configured to:
calculating the rate-distortion cost of the first quantized residual value and the rate-distortion cost of the second quantized residual value;
and determining the quantized residual value with the minimum rate distortion cost in the rate distortion cost of the first quantized residual value and the rate distortion cost of the second quantized residual value as the third quantized residual value.
In some embodiments, the rate-distortion cost of the first quantized residual value is calculated in the same manner as the rate-distortion cost of the second quantized residual value.
In some embodiments, the determining unit 510 is specifically configured to:
calculating a rate distortion cost of the first quantized residual value or the second quantized residual value according to the following formula:
J1 = D1 + λ×R1;
wherein J1 represents the rate-distortion cost of the first quantized residual value, D1 represents the error between the attribute original value of the current point and the attribute reconstruction value of the current point determined based on the first quantized residual value, λ is determined according to the quantization parameter of the current point, and R1 represents the number of bits required by the first quantized residual value in the code stream; or J1 represents the rate-distortion cost of the second quantized residual value, D1 represents the error between the attribute original value of the current point and the attribute reconstruction value of the current point determined based on the second quantized residual value, λ is determined according to the quantization parameter of the current point, and R1 represents the number of bits required by the second quantized residual value in the code stream.
In some embodiments, λ is determined from a quantization parameter of the current point and at least one of:
The sequence type of the current point cloud and the current component of the current point.
In some embodiments, the determining unit 510 is specifically configured to:
Determining an adopted prediction mode of a current point to be coded based on at least one neighbor point positioned before the current point in the coding sequence;
And determining a first quantized residual value of the current point based on a prediction mode adopted by the current point.
In some embodiments, the determining unit 510 is specifically configured to:
Determining a property difference of the at least one neighbor point based on the difference of the at least one neighbor point on each component;
If the attribute difference of the at least one neighbor point is smaller than a first threshold value, determining a first prediction mode as a prediction mode adopted by the current point; the first prediction mode refers to a prediction mode for predicting an attribute value of the current point based on a weighted average of the at least one neighbor point.
In some embodiments, the determining unit 510 is specifically configured to:
determining the largest difference among the differences of the at least one neighbor point on the respective components as the attribute difference of the at least one neighbor point.
In some embodiments, the encoding unit 520 is further configured to:
If the attribute difference of the at least one neighbor point is greater than or equal to the first threshold, determining a prediction mode with the minimum rate distortion cost in at least one second prediction mode as the prediction mode adopted by the current point; the at least one second prediction mode corresponds to the at least one neighbor point one by one, and the second prediction mode refers to a prediction mode in which an attribute reconstruction value of a neighbor point corresponding to the second prediction mode in the at least one neighbor point is used as an attribute prediction value of the current point;
And writing the prediction mode adopted by the current point into the code stream.
In some embodiments, the determining unit 510 is further configured to:
determining a quantized residual value of the current point on each component based on attribute reconstruction values of the neighbor points corresponding to the second prediction mode on each component;
And determining the sum of quantized residual values of the current point on each component as the rate distortion cost of the second prediction mode.
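Illustratively, this prediction mode decision may be sketched as follows; the names are hypothetical, the per-component difference is interpreted here as the spread (maximum minus minimum) of the neighbors' reconstructions on that component, and taking the magnitude of each per-component quantized residual in the cost proxy is an assumption about sign handling:

    def select_prediction_mode(neighbor_recons, attr_value, qstep, threshold):
        # neighbor_recons: per-neighbor attribute reconstruction values, each a
        # tuple of components (e.g. R, G, B); attr_value: the current point's
        # original attribute with the same component layout
        n_comp = len(neighbor_recons[0])
        # attribute difference: the largest per-component spread among the neighbors
        max_diff = max(
            max(nb[c] for nb in neighbor_recons) - min(nb[c] for nb in neighbor_recons)
            for c in range(n_comp)
        )
        if max_diff < threshold:
            return ("first_mode", None)  # weighted-average prediction
        # otherwise evaluate one second prediction mode per neighbor: use that
        # neighbor's reconstruction directly as the prediction, and take the sum
        # of per-component quantized residuals as the mode's rate-distortion cost
        def mode_cost(nb):
            return sum(abs(round((attr_value[c] - nb[c]) / qstep)) for c in range(n_comp))
        best = min(range(len(neighbor_recons)), key=lambda i: mode_cost(neighbor_recons[i]))
        return ("second_mode", best)  # the chosen mode is written into the code stream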
In some embodiments, the determining unit 510 is specifically configured to:
determining an attribute predicted value of the current point based on a predicted mode adopted by the current point;
calculating a difference value between an original value of the attribute of the current point and a predicted value of the attribute of the current point;
Determining the ratio of the difference value to the quantization step length adopted by the current point as the first quantization residual value; and determining the quantization step length adopted by the current point according to the quantization parameter of the current point.
In some embodiments, the coding order comprises a level of detail (LOD) order.
In some embodiments, the determining unit 510 is specifically configured to:
determining a weighted average value of the at least one neighbor point as an attribute predicted value of the current point;
calculating a difference value between an original value of the attribute of the current point and a predicted value of the attribute of the current point;
Determining the ratio of the difference value to the quantization step length adopted by the current point as the first quantization residual value; and determining the quantization step length adopted by the current point according to the quantization parameter of the current point.
In some embodiments, the coding order comprises a Morton order or a Hilbert order.
It should be noted that the encoder 500 may also be combined with the encoding framework shown in fig. 4, i.e., the units in the encoder 500 may be replaced or combined with the relevant parts of the encoding framework. For example, the determining unit 510 may correspond to the attribute transfer (Transfer Attributes) unit 111, the Region Adaptive Hierarchical Transform (RAHT) unit 112, the prediction transform (Predicting Transform) unit 113, or the lifting transform (Lifting Transform) unit 114 in the encoding framework, and the encoding unit 520 may correspond to the second arithmetic encoding unit 116 in the encoding framework.
It should be understood that apparatus embodiments and method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, details are not repeated here. Specifically, the encoder 500 may correspond to the corresponding body that executes the method 300 or the method 400 of the embodiments of the present application, and the units in the encoder 500 are respectively intended to implement the corresponding flows in the method 300 or the method 400, which are not described here again for brevity.
It should also be understood that the units of the encoder 500 according to the embodiments of the present application may be separately or wholly combined into one or several other units, or one (or more) of them may be further split into multiple functionally smaller units, which can achieve the same operations without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present application, the encoder 500 may also include other units, and in practice these functions may be implemented with the assistance of other units and through the cooperation of multiple units. According to another embodiment of the present application, the encoder 500 may be constructed, and the encoding method of the embodiments of the present application may be implemented, by running a computer program (including program code) capable of executing the steps of the corresponding methods on a general-purpose computing device, such as a computer, that includes processing elements and storage media such as a Central Processing Unit (CPU), a Random Access Memory (RAM), and a Read-Only Memory (ROM). The computer program may be recorded on a computer-readable storage medium, loaded onto any electronic device with data processing capabilities, and executed therein to implement the corresponding methods of the embodiments of the present application.
In other words, the units referred to above may be implemented in hardware, by instructions in software, or by a combination of hardware and software. Specifically, each step of the method embodiments of the present application may be completed by an integrated logic circuit of hardware in a processor and/or by instructions in the form of software, and the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software in the decoding processor. Alternatively, the software may reside in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
Fig. 21 is a schematic block diagram of a codec device 600 according to an embodiment of the present application.
As shown in fig. 21, the codec device 600 includes at least a processor 610 and a computer-readable storage medium 620. Wherein the processor 610 and the computer-readable storage medium 620 may be connected by a bus or other means. The computer readable storage medium 620 is used to store a computer program 621, the computer program 621 including computer instructions, and the processor 610 is used to execute the computer instructions stored by the computer readable storage medium 620. Processor 610 is a computational core as well as a control core of codec device 600, which is adapted to implement one or more computer instructions, in particular to load and execute one or more computer instructions to implement the corresponding method flows or corresponding functions.
By way of example, the processor 610 may also be referred to as a Central Processing Unit (CPU). The processor 610 may include, but is not limited to: a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
By way of example, the computer-readable storage medium 620 may be a high-speed RAM memory or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; alternatively, it may be at least one computer-readable storage medium located remotely from the aforementioned processor 610. In particular, the computer-readable storage medium 620 includes, but is not limited to: volatile memory and/or non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
In one implementation, the codec device 600 may be the encoding framework shown in fig. 6 or the encoder 500 shown in fig. 20; the computer readable storage medium 620 has stored therein first computer instructions; the first computer instructions stored in the computer readable storage medium 620 are loaded and executed by the processor 610 to implement the corresponding steps in the encoding method provided in the embodiment of the present application, and are not repeated herein.
According to another aspect of the present application, the embodiment of the present application also provides a computer-readable storage medium (Memory), which is a Memory device in the codec device 600, for storing programs and data. Such as computer-readable storage medium 620. It is understood that the computer readable storage medium 620 herein may include a built-in storage medium in the codec device 600, and may include an extended storage medium supported by the codec device 600. The computer readable storage medium provides a storage space that stores an operating system of the codec device 600. Also stored in this memory space are one or more computer instructions, which may be one or more computer programs 621 (including program code), adapted to be loaded and executed by the processor 610. These computer instructions are for use in a computer to perform the encoding methods provided in the various alternatives described above.
According to another aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. Such as computer program 621. At this time, the codec device 600 may be a computer, and the processor 610 reads the computer instructions from the computer-readable storage medium 620, and the processor 610 executes the computer instructions so that the computer performs the encoding method provided in the above-described various alternatives.
In other words, when implemented in software, the above may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions of the embodiments of the present application run in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means.
Those of ordinary skill in the art will appreciate that the elements and process steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Finally, it should be noted that the above is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about the changes or substitutions within the technical scope of the present application, and the changes or substitutions are all covered by the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (21)

  1. A method of encoding, comprising:
    determining the coding sequence of attribute information of the current point cloud based on the geometric information of the current point cloud;
    determining a first quantized residual value of a current point to be encoded based on at least one neighbor point located before the current point in the encoding sequence;
    Determining at least one second quantized residual value for the current point;
    Determining a third quantized residual value based on the first quantized residual value and the at least one second quantized residual value;
    and encoding the third quantized residual value to obtain a code stream.
  2. The method of claim 1, wherein said determining at least one second quantized residual value for the current point comprises:
    The at least one second quantized residual value is determined based on the first quantized residual value.
  3. The method of claim 2, wherein the determining the at least one second quantized residual value based on the first quantized residual value comprises:
    and determining an integer smaller than the first quantized residual value as the at least one second quantized residual value.
  4. The method of claim 1, wherein said determining at least one second quantized residual value for the current point comprises:
    and determining a default residual value as the at least one second quantized residual value.
  5. The method according to any one of claims 1 to 4, wherein said determining a third quantized residual value from said first quantized residual value and said at least one second quantized residual value comprises:
    calculating the rate-distortion cost of the first quantized residual value and the rate-distortion cost of the second quantized residual value;
    and determining the quantized residual value with the minimum rate distortion cost in the rate distortion cost of the first quantized residual value and the rate distortion cost of the second quantized residual value as the third quantized residual value.
  6. The method of claim 5, wherein the rate-distortion cost of the first quantized residual value is calculated in the same manner as the rate-distortion cost of the second quantized residual value.
  7. The method of claim 5 or 6, wherein said calculating a rate-distortion cost for the first quantized residual value and a rate-distortion cost for the second quantized residual value comprises:
    calculating a rate distortion cost of the first quantized residual value or the second quantized residual value according to the following formula:
    J1 = D1 + λ×R1;
    wherein J1 represents a rate-distortion cost of the first quantized residual value, D1 represents an error between the attribute original value of the current point and the attribute reconstruction value of the current point determined based on the first quantized residual value, λ is determined according to the quantization parameter of the current point, and R1 represents the number of bits required by the first quantized residual value in the code stream; or J1 represents the rate-distortion cost of the second quantized residual value, D1 represents the error between the attribute original value of the current point and the attribute reconstruction value of the current point determined based on the second quantized residual value, λ is determined according to the quantization parameter of the current point, and R1 represents the number of bits required by the second quantized residual value in the code stream.
  8. The method of claim 7, wherein λ is determined from a quantization parameter of the current point and at least one of:
    The sequence type of the current point cloud and the current component of the current point.
  9. The method according to any one of claims 1 to 8, wherein said determining a first quantized residual value for a current point to be encoded based on at least one neighboring point in the encoding order that is located before the current point comprises:
    Determining an adopted prediction mode of a current point to be coded based on at least one neighbor point positioned before the current point in the coding sequence;
    And determining a first quantized residual value of the current point based on a prediction mode adopted by the current point.
  10. The method of claim 9, wherein the determining the adopted prediction mode of the current point based on at least one neighbor point located before the current point to be encoded in the encoding order comprises:
    Determining a property difference of the at least one neighbor point based on the difference of the at least one neighbor point on each component;
    If the attribute difference of the at least one neighbor point is smaller than a first threshold value, determining a first prediction mode as a prediction mode adopted by the current point; the first prediction mode refers to a prediction mode for predicting an attribute value of the current point based on a weighted average of the at least one neighbor point.
  11. The method of claim 10, wherein the determining the attribute differences of the at least one neighbor point based on the differences of the at least one neighbor point on the respective components comprises:
    determining the largest difference among the differences of the at least one neighbor point on the respective components as the attribute difference of the at least one neighbor point.
  12. The method according to claim 10 or 11, characterized in that the method further comprises:
    If the attribute difference of the at least one neighbor point is greater than or equal to the first threshold, determining a prediction mode with the minimum rate distortion cost in at least one second prediction mode as the prediction mode adopted by the current point; the at least one second prediction mode corresponds to the at least one neighbor point one by one, and the second prediction mode refers to a prediction mode in which an attribute reconstruction value of a neighbor point corresponding to the second prediction mode in the at least one neighbor point is used as an attribute prediction value of the current point;
    And writing the prediction mode adopted by the current point into the code stream.
  13. The method according to claim 12, wherein the method further comprises:
    determining a quantized residual value of the current point on each component based on attribute reconstruction values of the neighbor points corresponding to the second prediction mode on each component;
    And determining the sum of quantized residual values of the current point on each component as the rate distortion cost of the second prediction mode.
  14. The method according to any one of claims 10 to 13, wherein the determining a first quantized residual value for the current point based on a prediction mode employed by the current point comprises:
    determining an attribute predicted value of the current point based on a predicted mode adopted by the current point;
    calculating a difference value between an original value of the attribute of the current point and a predicted value of the attribute of the current point;
    Determining the ratio of the difference value to the quantization step length adopted by the current point as the first quantization residual value; and determining the quantization step length adopted by the current point according to the quantization parameter of the current point.
  15. The method according to any of the claims 9 to 14, characterized in that the coding order comprises a level of detail (LOD) order.
  16. The method according to any one of claims 1 to 8, wherein said determining a first quantized residual value for a current point to be encoded based on at least one neighboring point in the encoding order that is located before the current point comprises:
    determining a weighted average value of the at least one neighbor point as an attribute predicted value of the current point;
    calculating a difference value between an original value of the attribute of the current point and a predicted value of the attribute of the current point;
    Determining the ratio of the difference value to the quantization step length adopted by the current point as the first quantization residual value; and determining the quantization step length adopted by the current point according to the quantization parameter of the current point.
  17. The method of claim 16, wherein the encoding order comprises a Morton order or a Hilbert order.
  18. An encoder, characterized by comprising:
    a determining unit configured to:
    determining the coding sequence of attribute information of the current point cloud based on the geometric information of the current point cloud;
    determining a first quantized residual value of a current point to be encoded based on at least one neighbor point located before the current point in the encoding sequence;
    Determining at least one second quantized residual value for the current point;
    Determining a third quantized residual value based on the first quantized residual value and the at least one second quantized residual value;
    and the encoding unit is used for encoding the third quantized residual value to obtain a code stream.
  19. An encoding apparatus, comprising:
    a processor adapted to execute a computer program;
    A computer readable storage medium having stored therein a computer program which, when executed by the processor, implements the encoding method of any one of claims 1 to 17.
  20. A computer-readable storage medium storing a computer program for causing a computer to execute the encoding method according to any one of claims 1 to 17.
  21. A code stream, characterized in that the code stream is a code stream generated by the method of any one of claims 1 to 17.
CN202280092029.XA 2022-02-24 2022-02-24 Encoding method, encoder, and storage medium Pending CN118765499A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/077707 WO2023159428A1 (en) 2022-02-24 2022-02-24 Encoding method, encoder, and storage medium

Publications (1)

Publication Number Publication Date
CN118765499A true CN118765499A (en) 2024-10-11

Family

ID=87764453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280092029.XA Pending CN118765499A (en) 2022-02-24 2022-02-24 Encoding method, encoder, and storage medium

Country Status (2)

Country Link
CN (1) CN118765499A (en)
WO (1) WO2023159428A1 (en)

Also Published As

Publication number Publication date
WO2023159428A1 (en) 2023-08-31

Legal Events

Date Code Title Description
PB01 Publication