CN118485702B

CN118485702B - High-precision binocular vision ranging method

Info

Publication number: CN118485702B
Application number: CN202410926167.4A
Authority: CN
Inventors: 王威; 廖峪; 王建; 杨伟; 袁智高; 张力川
Original assignee: Chengdu Zhonggui Track Equipment Co ltd
Current assignee: Chengdu Zhonggui Track Equipment Co ltd
Priority date: 2024-07-11
Filing date: 2024-07-11
Publication date: 2024-09-20
Anticipated expiration: 2044-07-11
Also published as: CN118485702A

Abstract

The invention belongs to the technical field of image processing, and particularly relates to a high-precision binocular vision ranging method. The method comprises the following steps: step 1: acquiring a left-view image and a right-view image of a target using a binocular vision image acquisition apparatus; step 2: respectively constructing a multi-scale pyramid corresponding to the enhanced right view image and the enhanced left view image to obtain a corresponding multi-scale pyramid image; step 3: based on the left image feature map and the right image feature map, performing preliminary parallax estimation on the lowest resolution layer to obtain a parallax map; step 4: sequentially performing structured light compensation, optical filter compensation, temperature compensation and vibration compensation on the parallax map to obtain a compensated parallax map; step 5: based on the compensated parallax map, a high-precision and robust binocular vision ranging method is provided for calculating the distance from the binocular vision image acquisition equipment to the target, and the ranging capability and adaptability of the system in a complex environment are remarkably improved.

Description

High-precision binocular vision ranging method

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a high-precision binocular vision ranging method.

Background

Binocular vision ranging technology has become an important research direction in the field of computer vision, and has been widely used in various application fields such as autopilot, robot navigation, three-dimensional reconstruction, etc. The binocular vision ranging system utilizes two cameras to shoot images of the same scene at the same time by simulating the parallax principle of human eyes, and calculates the depth information of each point in the scene by analyzing the parallax of the corresponding point in the images. The method has the advantages of non-contact, strong real-time performance, high measurement precision and the like. However, existing binocular vision ranging techniques still face many challenges and problems in practical applications, requiring further research and improvement.

Currently, research on binocular vision ranging technology is mainly focused on parallax estimation, feature extraction, image enhancement, multiple compensation and the like. The prior art mainly comprises the following methods: the conventional binocular vision ranging method mainly depends on feature point matching, such as SIFT, SURF, ORB and other feature extraction algorithms. These algorithms calculate parallax by extracting salient feature points in images and matching in left and right images. However, these methods often suffer from matching errors when processing scenes with less texture or more repetitive texture, resulting in inaccurate disparity estimation. In addition, these methods are sensitive to illumination changes and noise, and have poor robustness in practical applications. The block matching method searches windows with fixed sizes in the left and right images, finds out the window pair with highest similarity, and calculates parallax between the window pairs. The method is simple and visual, has small calculated amount, is easily influenced by noise and shielding when processing complex scenes, and causes the reduction of parallax estimation accuracy. Common block matching algorithms include SSD (Sum of Squared Differences) and SAD (Sum of Absolute Differences), etc. In recent years, methods based on optimization are becoming the mainstream of parallax estimation. The methods represent the cost of parallax estimation by constructing an energy function, and find the parallax distribution with minimum energy by a global optimization technology. Common optimization methods include Graph Cuts (Graph Cuts), belief propagation (Belief Propagation), semi-Global Matching (Semi-Global Matching), and the like. The method can improve the precision and the robustness of parallax estimation to a certain extent, but has higher computational complexity, and is difficult to meet the requirement of real-time application. With the development of deep learning technology, a parallax estimation method based on a Convolutional Neural Network (CNN) is gradually rising. According to the method, high-level features in the image can be automatically learned through large-scale data training, and high-precision parallax estimation is realized. Common deep learning models include DispNet, GC-Net, PSMAT, and the like. These methods perform well in handling complex scenes and lighting changes, but require high computational resources and training data.

Disclosure of Invention

In view of the above, the main purpose of the present invention is to provide a high-precision binocular vision ranging method, and a high-precision and robust binocular vision ranging method is provided, which significantly improves the ranging capability and adaptability of the system in a complex environment.

The technical scheme adopted by the invention is as follows: a high-precision binocular vision ranging method, the method comprising:

step 1: acquiring a left-view image and a right-view image of a target using a binocular vision image acquisition apparatus; respectively carrying out image enhancement on the left-view image and the right-view image to obtain an enhanced right-view image and an enhanced left-view image;

Step 2: respectively constructing a multi-scale pyramid corresponding to the enhanced right view image and the enhanced left view image to obtain a corresponding multi-scale pyramid image; based on the multi-scale pyramid image, extracting features by using a multi-layer convolution network to obtain a left image feature map and a right image feature map;

Step 3: based on the left image feature map and the right image feature map, performing preliminary parallax estimation on the lowest resolution layer to obtain a parallax map;

Step 4: sequentially performing structured light compensation, optical filter compensation, temperature compensation and vibration compensation on the parallax map to obtain a compensated parallax map;

Step 5: based on the compensated disparity map, a distance from the binocular vision image acquisition apparatus to the target is calculated.

Further, let the left-view image beAnd right-view image is; Image enhancement is performed on the left-view image and the right-view image using the following formula, and an enhanced right-view image and an enhanced left-view image are obtained:

；

Wherein, To enhance the left-view image; To enhance the right view image; representing the pixel coordinate position and, In the form of an X-axis coordinate,Is Y-axis coordinate; is a preset enhancement coefficient; Is the pixel mean value of the left-view image; the pixel mean value of the right-view image; is the pixel standard deviation of the left-view image; Is the standard deviation of the pixels of the right-view image.

Further, the number of layers of the multi-scale pyramid in the step 2 ranges from 4 to 8 layers; the multi-scale pyramid is a phase consistency multi-scale pyramid, and the direction number of each layer is in the range of 2 to 6.

Further, the multi-scale pyramid image corresponding to the left-view image is enhancedCalculated using the following formula:

；

Wherein, To enhance the corresponding first of the left-view imagesA multi-scale pyramid image of the layer; To enhance the corresponding first of the left-view images Layer at the firstAmplitude pyramid images of the individual directions,To enhance the corresponding first of the left-view imagesLayer at the firstAmplitude pyramid images in the individual directions; To enhance the corresponding first of the left-view images Layer at the firstPhase pyramid images of the individual directions,Representing the cosine operation of each pixel; To enhance the corresponding first of the left-view images Layer at the firstPhase pyramid images of the individual directions,Representing performing sine operation on each pixel; representing the image as a matrix and then calculating its determinant value;

enhancing right view corresponding multi-scale pyramid images Calculated using the following formula:

；

Wherein, To enhance the corresponding first of the right-view imagesA multi-scale pyramid image of the layer; to enhance the corresponding first of the right-view images Layer at the firstAmplitude pyramid images of the individual directions,To enhance the corresponding first of the right-view imagesLayer at the firstAmplitude pyramid images in the individual directions; to enhance the corresponding first of the right-view images Layer at the firstPhase pyramid images of the individual directions,Representing the cosine operation of each pixel; to enhance the corresponding first of the right-view images Layer at the firstPhase pyramid images of the individual directions,Representing a sine operation for each of its pixels.

Further, an enhanced left view image is arranged at the firstThe phase pyramid image of the layer isCalculated using the following formula:

；

Wherein, To enhance left-view image at the firstA phase gradient of the layer in the Y-axis direction; To enhance left-view image at the first A phase gradient of the layer in the X-axis direction; Performing modular operation; To enhance left-view image at the first Phase image of the layer;

Set the enhanced right view image at the first The phase pyramid image of the layer isCalculated using the following formula:

；

Wherein, To enhance the right-view image in the first placeA phase gradient of the layer in the Y-axis direction; To enhance the right-view image in the first place A phase gradient of the layer in the X-axis direction; Performing modular operation; To enhance the right-view image in the first place Phase image of the layer; Is a scalar addition.

；

Wherein, Representing a downsampling operation; Is Gaussian kernel; Is a convolution; To enhance left-view image at the first Amplitude image of the layer; To enhance left-view image at the first An amplitude gradient of the layer in the Y-axis direction; To enhance left-view image at the first An amplitude gradient of the layer in the X-axis direction; set the enhanced right view image at the firstThe phase pyramid image of the layer isCalculated using the following formula:

；

Wherein, To enhance the right-view image in the first placeAmplitude image of the layer; To enhance the right-view image in the first place An amplitude gradient of the layer in the Y-axis direction; To enhance the right-view image in the first place Amplitude gradient of the layer in the X-axis direction.

Further, based on the multi-scale pyramid image, extracting features by using a multi-layer convolution network to obtain a first stepThe left image feature map of the layer isFirst, theThe right image feature map of the layer is; In step 3, the following formula is used to perform preliminary parallax estimation at the lowest resolution layer to obtain a parallax map：

；

Wherein, The parallax value is used for representing the displacement difference of the left image characteristic diagram and the right image characteristic diagram at the same pixel position; for window size, representing the local neighborhood used to calculate the disparity in the disparity estimation process, defining a range of pixels to consider around each pixel; the layer with the lowest resolution is a layer with the lowest resolution of the multi-scale pyramid; The weight coefficient is preset; Is the first A left image feature map of the layer; Is the first Right image feature map of layer.

Further, in step 4, scalar addition operation is performed on the preset structured light compensation value, the optical filter compensation value, the temperature compensation value and the vibration compensation value and the parallax map respectively, so as to implement structured light compensation, optical filter compensation, temperature compensation and vibration compensation on the parallax map in sequence, and obtain a compensated parallax map.

Further, in step 5, based on the compensated disparity map, a disparity value is obtained asThe distance from the binocular vision image acquisition apparatus to the target is calculated using the following formula：

；

Wherein, A focal length for the binocular vision image acquisition apparatus; is the baseline distance between two cameras in a binocular vision image acquisition apparatus.

By adopting the technical scheme, the invention has the following beneficial effects: the invention utilizes the multi-scale pyramid technology to decompose the image into a plurality of layers with different resolutions, so that the global information and local details of the image can be effectively captured and processed. The multi-scale pyramid technology generates a series of image layers with different resolutions through layer-by-layer downsampling, and can analyze and process image information under different scales. By combining a multi-layer Convolutional Neural Network (CNN), the method can extract rich characteristic information from the multi-scale pyramid images, including high-level characteristics such as edges, textures, shapes and the like. The combination method has the advantages that the convolutional neural network can automatically learn and extract complex features in the image through the operations of layer-by-layer convolution and pooling, so that the feature extraction process is more accurate and robust. The multi-scale pyramid provides a multi-level image representation, so that the system can give consideration to information of different scales and resolutions when processing complex scenes, and the effect of feature extraction and the precision of parallax estimation are improved. In the parallax estimation process, the invention uses the feature images extracted by the convolutional neural network for matching, and adopts a method for carrying out preliminary parallax estimation at the lowest resolution layer. The method can remarkably reduce the calculated amount, improve the calculation efficiency, and simultaneously reduce the interference of detail noise by utilizing the global structure information of the low-resolution image, so that the preliminary parallax estimation is more stable and robust. Specifically, by performing parallax estimation on the lowest resolution layer, the system can quickly obtain a preliminary parallax map, and provide good initial conditions for the refinement processing of the subsequent high resolution layer. In the parallax estimation process, the matching cost function between the feature graphs is utilized, and the absolute difference value and the weighted square difference are combined, so that the calculation of the parallax value is ensured to be more accurate and reliable. The process fully utilizes the characteristics extracted by multi-scale analysis and deep learning, and obviously improves the precision and stability of parallax estimation.

Drawings

Fig. 1 is a flow chart of a high-precision binocular vision ranging method according to an embodiment of the present invention.

Detailed Description

All of the features disclosed in this specification, or all of the steps in a method or process disclosed, may be combined in any combination, except for mutually exclusive features and/or steps.

Any feature disclosed in this specification (including any accompanying claims, abstract) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. That is, each feature is one example only of a generic series of equivalent or similar features, unless expressly stated otherwise.

Example 1: referring to fig. 1, a high-precision binocular vision ranging method, the method comprising:

Binocular vision image acquisition apparatuses are generally composed of two cameras placed side by side, and can simulate the visual effects of human eyes by simultaneously photographing left and right view images of the same object, thereby acquiring three-dimensional information. After the left view image and the right view image are obtained, image enhancement processing is performed to improve contrast, definition and detail expression of the images, so that feature extraction and matching in the subsequent steps are more accurate. The image enhancement process may be implemented by a variety of methods including, but not limited to, histogram equalization, laplace operator enhancement, and Gamma correction. The histogram equalization enhances the contrast of the image by adjusting the gray level distribution of the image, and the detail part is more prominent; the Laplace operator is enhanced, and the edge part of the image is enhanced through the edge detection operator, so that the structural information in the image is more obvious; gamma correction enables details of dark parts to be clearer and bright parts to be not overexposed by adjusting brightness of an image. Through the image enhancement technologies, the quality of the image can be effectively improved, so that the features in the image are clearer and more discernable, and a good basis is provided for subsequent feature extraction and matching. Other advanced image processing techniques, such as adaptive filtering, sharpening, etc., may also be combined during the image enhancement process to further improve image quality. The self-adaptive filtering can reduce noise while keeping image details by adjusting filtering parameters according to image local characteristics; the sharpening process then makes the edges and details of the image more sharp by enhancing the high frequency content of the image. By comprehensively applying the image enhancement techniques, high-quality images can be obtained under different illumination conditions and in different environments, so that the characteristics extracted in the subsequent steps are ensured to have higher robustness and accuracy.

Firstly, constructing a multi-scale pyramid is a classical image processing technique, which generates a series of image layers of different resolutions by downsampling the original image layer by layer. Each layer of image is scaled down in proportion to the previous layer of image so that the image can be analyzed and processed on multiple scales. The multi-scale pyramid has the advantage that the multi-scale pyramid can capture global features and local details of an image at the same time, so that the feature extraction process can consider information under different scales. For example, at a high resolution level, subtle edge and texture information can be captured, while at a low resolution level, the overall structure of the image and large-scale shape features can be extracted. based on multi-scale pyramid images, the use of multi-layer Convolutional Neural Networks (CNNs) to extract features is an important method in modern computer vision. The convolutional neural network can automatically learn and extract complex features in the image through a series of combinations of convolutional layers, pooling layers and activation functions. Specifically, the convolution layer performs sliding operation on the image through a convolution kernel (filter) to extract the characteristics of the local area; the pooling layer reduces the size of the feature map through downsampling operation, and simultaneously retains important feature information; the activation function (e.g., reLU) then enables the network to learn more complex patterns through nonlinear transformations. In the invention, the multi-scale pyramid images of the enhanced left-view image and right-view image are respectively input into a convolutional neural network, and feature images under different scales are extracted through multi-layer convolution and pooling operations. These feature maps contain not only the spatial information of the image, but also advanced features such as edges, textures, shapes, etc., that are automatically learned through convolutional networks. By the method, information of different scales and different layers in the image can be effectively captured, and abundant characteristic data is provided for subsequent parallax estimation. Compared with the traditional feature extraction method, the method provided by the invention has the obvious advantage that a mode of combining the multi-scale pyramid and the convolutional neural network is adopted. Conventional methods typically rely on manually designed feature extraction algorithms, such as SIFT, SURF, etc., which, while performing well in some cases, tend to be more sensitive to environmental and parameter variations. The convolutional neural network can automatically learn the characteristics adapting to different scenes and conditions in a data driving mode, and has robustness and generalization capability. In addition, the introduction of the multi-scale pyramid enables the feature extraction process to be carried out under different scales, and the problem of scale change in the image is effectively solved. For example, in practical applications, the size and distance of the object may vary significantly, while the multi-scale pyramid is able to analyze and process images at different scales, ensuring stability and accuracy of feature extraction. The convolutional neural network can capture complex modes and details in the image through multi-level feature extraction and nonlinear transformation, and the accuracy of feature matching is improved. In the specific implementation process, the enhanced left-view image and right-view image are firstly constructed into a multi-scale pyramid, and the images of each scale level are processed by a convolutional neural network to extract corresponding feature images. These feature maps will be used in subsequent steps for feature matching and disparity estimation. In this way, the method not only can extract high-quality image features in a complex environment, but also can improve the accuracy and the robustness of binocular vision ranging through multi-scale analysis and deep learning.

Disparity estimation is one of the key steps of binocular vision ranging, which determines disparity, i.e., horizontal displacement of the same object in two images, by comparing similar feature points in left and right view images. This process needs to solve two main problems: matching of feature points and calculation of parallax. In the feature point matching process, the system needs to find corresponding feature point pairs in the left image and the right image. By searching similar partial areas in the left and right images, a pair of best matching points is found, and parallax therebetween is calculated. Parallax is determined by the base line distance between the left and right cameras and the focal length of the cameras, and reflects the depth information of the target. Specifically, the greater the parallax, the closer the target is to the camera; conversely, the smaller the parallax, the farther the target is from the camera. In the present invention, the matching of feature points is performed at the lowest resolution layer, which has various advantages. First, the preliminary parallax estimation is performed at a low resolution level, and the calculation amount can be significantly reduced. Since the resolution of the image is reduced, the number of pixel points to be processed is reduced, thereby improving the calculation efficiency. And secondly, the preliminary estimation is carried out at a low resolution level, so that noise and detail interference in the image can be effectively eliminated, and the feature matching is more robust and stable. The low-resolution image retains main structural information, and high-frequency noise is filtered, so that characteristic points are more accurately matched. Furthermore, performing the preliminary disparity estimation at a low resolution level also provides good initial conditions for the refinement process in the subsequent step. By obtaining the disparity map at a coarser level, the system can perform further refinement at a subsequent high resolution level, gradually increasing the accuracy of the disparity estimation. Common methods in the disparity estimation process include block matching, phase correlation, and energy minimization-based optimization methods, etc. In the block matching method, the system finds the window pair with the highest similarity by searching for windows of a fixed size in the left image and the right image, and calculates the parallax therebetween. The phase correlation method determines parallax by calculating a phase difference using frequency domain information of an image. The energy minimization-based optimization method represents errors of feature matching by constructing a cost function, and finds the best parallax estimation value by minimizing the cost function. Each of these approaches has advantages and disadvantages, with the specific choice depending on the application scenario and computing resources. The invention performs preliminary parallax estimation at the lowest resolution layer, and effectively combines the advantages of the block matching and optimizing method. By processing at the lowest level of the multi-scale pyramid, the system can quickly obtain a preliminary disparity map and provide initial conditions for subsequent multi-compensation and refinement processes. The method not only improves the calculation efficiency, but also enhances the robustness and the accuracy of parallax estimation. Compared with the prior art, the method and the device have the advantages that in the preliminary parallax estimation process, the calculated amount is obviously reduced by processing on a low-resolution level, and the stability and the accuracy of matching are improved. The method effectively combines the characteristics extracted by multi-scale pyramid and deep learning, and utilizes the rich information in the left-right view images to realize high-efficiency and high-precision parallax estimation. Through the innovative method, the method not only solves the problem of high computational complexity of the traditional method on a high-resolution level, but also provides a more robust and stable parallax estimation scheme, and provides reliable technical support for high-precision binocular vision ranging.

First, structured light compensation is to introduce a known pattern of structured light, such as a laser grid, stripes or a lattice, into the disparity estimation. These structured light patterns, by being projected into the measurement scene, can form well-defined marks in the image, which marks appear in both the left and right images. By identifying the deformations of these structured light patterns, the system can accurately calculate the parallaxes and correct the parallaxes map. The advantage of structured light compensation is that it can significantly improve the accuracy of disparity estimation, especially in scenes with less texture or more repetitive textures, the structured light provides additional reference information, reducing the probability of mismatching. By carrying out structured light compensation on the parallax map, parallax errors caused by complex scenes can be effectively eliminated, and the accuracy of ranging is improved. Secondly, the filter compensation is to add a filter with specific wavelength in front of the camera to filter out unwanted light, such as ambient light and infrared light. These unwanted rays may interfere with the image, resulting in inaccuracy of the disparity estimation. The filter compensation reduces ambient light interference by selectively allowing light of a specific wavelength to pass therethrough, thereby improving contrast and sharpness of the image. The compensation technology is particularly suitable for scenes with complex illumination conditions, such as outdoor strong light or low light environments, and can ensure consistency of image quality through optical filter compensation, so that accuracy of parallax images is ensured. The temperature compensation is to cope with the influence of temperature variation on the camera and its internal components. The temperature change may cause a focal length change of the camera and thermal noise of the sensor, thereby affecting quality of the image and accuracy of parallax estimation. The temperature sensor monitors the ambient temperature in real time and combines the temperature response curve of the camera, so that the image can be subjected to temperature compensation. The temperature compensation method generally comprises digital signal processing and hardware correction, and the influence caused by temperature change is eliminated by adjusting the gray value and color balance of the image, so that the stability and the accuracy of the parallax image are ensured. The compensation technology is particularly important in scenes with large temperature difference, such as rapid switching of indoor and outdoor environments, and temperature compensation can effectively reduce distance measurement errors caused by temperature change. Finally, vibration compensation is to cope with minor vibrations and displacements that the camera may be subjected to during the measurement. These vibrations may originate from the external environment or from mechanical movements of the measuring device itself, which may have an influence on the stability of the image and thus on the accuracy of the parallax estimation. Vibration compensation is usually carried out by monitoring the motion state of a camera in real time through a gyroscope and an acceleration sensor and dynamically correcting an image by combining an image stabilization algorithm. The vibration compensation method comprises image registration and motion estimation, and the accuracy and stability of a parallax image are ensured by adjusting the displacement of a camera in real time in the image processing process. The technology is particularly widely applied to mobile platforms, such as unmanned aerial vehicles and automatic driving automobiles, and can obtain a stable parallax image in the motion process through vibration compensation, so that the distance measurement reliability is ensured. The precision and stability of the parallax map can be remarkably improved through the comprehensive application of the techniques of structured light compensation, optical filter compensation, temperature compensation and vibration compensation. Each compensation technology has unique application scenes and advantages, and can cover the ranging requirements under various complex environments through combined application, so that the high precision and the robustness of the binocular vision ranging method are ensured. Compared with the prior art, the multiple compensation strategy provided by the invention not only improves the accuracy of distance measurement, but also enhances the adaptability of the system in various environments, thereby realizing more reliable high-precision binocular vision distance measurement.

Example 2: let the left-view image beAnd right-view image is; Image enhancement is performed on the left-view image and the right-view image using the following formula, and an enhanced right-view image and an enhanced left-view image are obtained:

；

Specifically, first, the normalization process is the first step of the image enhancement formula. The purpose of normalization is to adjust the pixel value distribution of an image to a state where the mean is zero and the standard deviation is one. By subtracting the average value of the image from each pixel value and then dividing by the standard deviation of the image, the brightness shift in the image due to illumination variation or image pickup apparatus difference is eliminated. The normalization processing can effectively balance the brightness in the image, so that the gray value of the image is more concentrated, and the contrast of the image is enhanced. The processing mode not only can improve the visual effect of the image, but also provides a more consistent basis for the subsequent feature extraction, and reduces the interference of illumination change on the feature extraction process. Second, the image enhancement formula introduces an enhancement term based on a gaussian distribution. The function of the gaussian distribution term is to adjust the contrast and detail performance of the image by an exponential function. Specifically, the gaussian distribution term is adjusted according to the difference between the pixel value and the image mean value, so that the bright part and the dark part of the image are enhanced to different degrees. When the pixel value is far from the image mean value, the value of the exponential function is reduced, so that the areas are enhanced strongly; conversely, for the region close to the image mean value, the enhancement effect is weaker. In this way, the details of the bright part and the dark part of the image are highlighted, and the overall contrast of the image is improved. The preset enhancement factor is a key parameter in the formula. By adjusting the value of the enhancement coefficient, the intensity of the image enhancement can be controlled. The larger the enhancement coefficient is, the more remarkable the contrast and detail enhancement effects of the image are; the smaller the enhancement coefficient, the weaker the enhancement effect. Through reasonably setting the enhancement coefficient, the image can be optimized according to specific application requirements, so as to achieve the best visual effect and characteristic extraction effect. This image enhancement method has significant advantages over conventional methods. Conventional image enhancement methods, such as histogram equalization and contrast stretching, while capable of improving the contrast of an image to some extent, often introduce problems of excessive enhancement or loss of detail. By combining the standardized processing and the Gaussian distribution enhancement, the method not only maintains the original details of the image, but also further optimizes the contrast and detail performance of the image, so that the robustness of the image in a complex environment is greatly improved. Furthermore, this approach is also advantageous in terms of computational complexity. The amount of calculation of the standardization process and the Gaussian distribution enhancement is relatively small, and the calculation cost can be reduced while the image enhancement effect is ensured. This is of great importance for application scenarios requiring real-time processing of images, such as autopilot and robotic vision. By implementing the image enhancement algorithm on hardware, the real-time processing capacity of the system can be effectively improved, and the application requirements of high precision and high efficiency are met. Through the standardized processing and Gaussian distribution enhancement of the image, the invention realizes the multi-level optimization of the image. In the standardized processing stage, the brightness distribution of the image is balanced, and the contrast and consistency are enhanced; in the Gaussian distribution enhancement stage, the detail and the layering sense of the image are obviously improved. The combination of the two can ensure that the image can keep high quality and high robustness under various complex environments, and provides a solid foundation for subsequent feature extraction and matching.

Example 3: the layer number of the multi-scale pyramid in the step 2 ranges from 4 to 8 layers; the multi-scale pyramid is a phase consistency multi-scale pyramid, and the direction number of each layer is in the range of 2 to 6.

Specifically, a multi-scale pyramid is an image processing technology, and a series of image layers with different resolutions are generated by downsampling an original image layer by layer. Each layer of images is a scaled down version of the previous layer of images so that the images can be analyzed and processed at different scales. In this embodiment, the number of layers of the multi-scale pyramid ranges from 4 to 8 layers, which means that the image will be decomposed into 4 to 8 layers of different resolution. This decomposition method allows both global information and local details of the image to be captured and processed efficiently. The key to the multi-scale pyramid is that information at different levels can complement and verify each other. On a high-resolution level, the detail and texture information of the image are rich, and small-scale features can be captured; at the low resolution level, the overall structure of the image and the large-scale features are more obvious, and the method can be used for global analysis and feature matching. Through multi-scale analysis, the system can simultaneously consider the global and local information of the image, and the accuracy and the robustness of feature extraction are improved. Secondly, the phase consistency multi-scale pyramid further enhances the effect of image processing. The phase consistency is a feature extraction method based on local energy, and feature points with consistency are extracted by analyzing phase information of images in different scales and directions. The phase-consistent multi-scale pyramid performs multiple directional analysis in each layer, with a range of direction numbers from 2 to 6, meaning that each layer image will be decomposed into 2 to 6 differently directed components. The method can capture the edge and texture features in different directions in the image, and improves the accuracy of feature extraction. The advantage of phase consistency is that it is robust to illumination variations and noise. Traditional feature extraction methods, such as gradient-based edge detection, are prone to failure in cases of large illumination variations and noise. And the phase consistency can keep a stable characteristic extraction effect under illumination change and noise interference by analyzing local energy. This is of great importance for binocular vision ranging in complex environments. In the specific implementation process, the construction of the phase consistency multi-scale pyramid requires multi-scale and multi-directional filtering processing of the image. Each layer of image is first downsampled by gaussian filtering and then direction decomposed by applying Gabor filters in different directions. The Gabor filter is a linear filter capable of effectively extracting edges and texture features in an image. By applying Gabor filters in different scales and directions, the system is able to extract feature points in the image that have phase consistency. These feature points play a key role in subsequent feature matching and disparity estimation. By combining multi-scale analysis and phase consistency feature extraction, the image processing method of the present embodiment exhibits excellent performance in a complex environment. The multi-scale pyramid enables the global information and local details of the image to be captured effectively, and the phase consistency feature extraction ensures the stability and accuracy of feature extraction. The method not only improves the precision of image processing, but also enhances the robustness of the system under illumination change and noise interference. Compared with the prior art, the multi-scale pyramid and phase consistency feature extraction method provided by the invention has obvious technical advantages. The traditional feature extraction method is easy to lose effectiveness under illumination change and noise interference, and the robustness and accuracy of feature extraction are effectively improved through multi-scale and multi-directional analysis. In addition, the number of layers and the number of directions of the multi-scale pyramid are flexibly set, so that the system can be optimized according to specific application requirements, and the adaptability and the practicability of the method are further enhanced.

Example 4: enhancing left view corresponding multi-scale pyramid imagesCalculated using the following formula:

；

Specifically, the multi-scale pyramid is constructed by performing multi-level downsampling on the image, so that the image can be analyzed on different scales. In this embodiment, the enhanced left-view image and the right-view image respectively construct a multi-scale pyramid, and each layer of image is decomposed in multiple directions. Specifically, the first left-view image is enhancedLayered multi-scale pyramid imageAnd enhancing the right-view imageLayered multi-scale pyramid imageRespectively calculated by formulas. These formulas combine the information of the amplitude pyramid image and the phase pyramid image. In each layer of the multi-scale pyramid, the amplitude pyramid image and the phase pyramid image of the image in different directions are respectively expressed asAndAndAnd. By weighted summing these amplitude and phase information, a phase consistency image can be obtained. The calculation formula of the phase consistency image comprehensively considers the amplitude and phase information of adjacent layers and directions. The first part of the formula is obtained by applying the method to the first part ofLayer and the firstAnd carrying out weighted summation on the amplitude pyramid images of the layers, and calculating the contribution of phase consistency by combining corresponding phase information. The core of this part is the amplitude pyramid image for each layerAnd phase pyramid imageAnd performing cosine transformation and sine transformation, thereby extracting the characteristics of phase consistency. Specifically, the cosine and sine transforms involved in the formula separately calculate the phase angle information in the phase pyramid image so that the phase information in each direction can be effectively integrated. Second, the denominator part of the formula is to sum the absolute values of the amplitude pyramid images in adjacent levels and directions. This process ensures that the amplitude information at different directions and different levels is fully accounted for, resulting in an overall phase consistency metric. In this way, the formula not only combines information in all directions, but also considers the result of multi-scale analysis, so that the calculation of phase consistency is more accurate and stable. Multi-scale pyramid image enhancing right-view imagesSimilar to enhancing the left-view image, is also achieved by combining amplitude and phase information in different directions and at different levels. The enhanced right view image at the first position can be obtained by carrying out weighted summation on the amplitude pyramid image and the phase pyramid image and combining cosine and sine transformation of the phase informationPhase-consistent multi-scale pyramid images on layers. The phase consistency calculation method has obvious technical advantages. First, it combines multi-scale analysis with multi-directional phase information so that details and features in the image can be more fully captured. And secondly, by comprehensively considering amplitude and phase information, the algorithm can keep a stable characteristic extraction effect in a complex environment and has stronger robustness. Furthermore, the phase consistency metric also exhibits good stability under illumination variations and noise interference, which is particularly important for binocular vision ranging applications. By the method, the multi-scale pyramid images for enhancing the left-view image and the right-view image not only keep the global structural information of the images, but also enhance the local details and texture characteristics of the images. The method provides a solid foundation for subsequent feature matching and parallax calculation, and improves the overall accuracy and reliability of binocular vision ranging. In the embodiment 4, the phase consistency multi-scale pyramid image of the enhanced left view image and the right view image on each layer is calculated by constructing a multi-scale pyramid with phase consistency and combining amplitude and phase information. The method not only improves the processing precision of the image, but also enhances the adaptability of the system in a complex environment, and provides reliable technical support for high-precision binocular vision ranging.

Example 5: set the enhanced left view image at the firstThe phase pyramid image of the layer isCalculated using the following formula:

；

Specifically, a multi-scale phase pyramid that enhances left-view and right-view images is obtained by computing phase gradients and phase information at different levels. The method ensures the consistency and stability of the characteristic points in all directions by combining the phase information and the phase gradient, thereby obviously improving the accuracy and the robustness of image processing. First, the left view image is enhanced at the firstPhase pyramid image of layersIs through the previous layerIs a phase image of (a)Phase gradient of the sameAndCalculated. The phase gradient represents the rate of change of phase of an image in a particular direction, i.e., the speed and direction of phase change in the image. By calculating these phase gradients, small phase changes in the image can be captured, which is important for extraction of fine features. In particular, phase gradientsAndRespectively represent the images inShaft and method for producing the samePhase change in the axial direction. These gradients are obtained by differential operation of the phase images, representing the phase change trend of the images in both directions. The calculation of the phase gradient is an important step in the image processing and it can provide information on the phase change in the image, which plays a key role in the subsequent phase update. Next, by combining these phase gradients with the phase image of the previous layerA new phase pyramid image is calculated. In the formula, the phase gradient is normalized to ensure that the value is within a reasonable range, and instability in calculation is avoided. This normalization process typically takes the form of a sum of squares, i.eAndIn this way, a numerical stability of the phase gradient is ensured. Then, these normalized phase gradient values are added to the phase image of the previous layer to obtain an updated phase image. This step is achieved by scalar addition, i.e. a normalized phase gradient is added pixel by pixel to the phase image of the previous layer. Therefore, the new phase image not only contains the phase information of the previous layer, but also combines the phase change trend of the current layer, and the continuity and consistency of the phase information are ensured. In order to keep the phase value within a reasonable range, the formula adopts modulo operationLimiting the phase value toTo the point ofBetween them. The purpose of the modulo operation is to prevent the phase value from exceeding a predetermined range, and to keep it always within a cycle interval by performing the modulo operation on the phase value. This operation ensures the periodicity and consistency of the phase information and avoids calculation errors caused by numerical overflow. Phase pyramid image for enhanced right view imageThe calculation process is similar to that of the left image. By phase imaging the previous layerPhase gradient of the sameAndCalculating, and obtaining the right-view image on the first side through normalization processing and modulo operationPhase pyramid image of layer. The process ensures consistency and accuracy of the left and right view images in phase information, and provides a reliable basis for subsequent parallax estimation. The multiscale phase pyramid method has the advantage of being able to capture subtle changes in the image and to conduct propagation and updating of phase information at multiple levels. Through comprehensive calculation of the phase gradient, the multi-scale phase pyramid can effectively improve the accuracy and the robustness of phase information. The method has important significance for feature extraction and matching in binocular vision ranging, and can maintain a high-precision ranging result in a complex environment. In addition, the multi-scale phase pyramid method has strong robustness to illumination change and noise. The traditional phase processing method is easy to fail under the condition of large illumination change and noise, and the multi-scale phase pyramid can keep a stable feature extraction effect under the condition of illumination change and noise interference through comprehensive calculation of phase gradient. This allows the method to exhibit excellent performance in complex environments, such as outdoor strong light or low light environments, and to extract stable and accurate phase information. In practical application, the multi-scale phase pyramid method can show excellent performance in different illumination conditions and complex scenes. The method not only improves the contrast and detail performance of the image, but also enhances the adaptability of the system in complex environments. For application scenes requiring high precision and high robustness, such as automatic driving and robot vision, the multi-scale phase pyramid method is low in calculation complexity realized through hardware, and can reduce calculation cost and meet the requirement of real-time processing while guaranteeing the image enhancement effect.

Example 6: set the enhanced left view image at the firstThe phase pyramid image of the layer isCalculated using the following formula:

；

Specifically, first, the left-view image is enhanced at the firstAmplitude pyramid image of layerBy the previous layerAmplitude image of (a)And its amplitude gradientAndAnd (3) performing calculation. The amplitude gradient represents the rate of change of the amplitude of the image in a particular direction, i.e. the speed and direction of the amplitude change in the image. By calculating the amplitude gradients, small amplitude changes in the image can be captured, so that finer characteristic information is extracted. In the formulaAndRespectively represent the images inShaft and method for producing the sameAmplitude variation in the axial direction. These gradients are obtained by differential operation of the amplitude image, representing the trend of the image in both directions. The calculation of the amplitude gradient is an important step in the image processing and it can provide information on the amplitude variation in the image, which plays a key role in the subsequent amplitude update. Next, by combining these amplitude gradients with the amplitude image of the previous layerA new amplitude pyramid image is calculated. In the formula, the amplitude image is obtainedIs the square root of (d) and gaussian kernelThe convolution operation is performed, and the purpose of this step is to smooth the image, remove noise, and retain important amplitude information. Gaussian convolution is a common filtering method in image processing, and can effectively reduce the influence of noise on the image and improve the quality of the image by carrying out smoothing processing on the image. And combining the smoothed amplitude image with the phase gradient information, calculating to obtain a new amplitude image through a cosine function, and calculating to obtain the new amplitude image through the cosine function. The step utilizes the change trend of the phase information, so that the new amplitude image not only contains the amplitude information, but also combines the information of the phase change, and the accuracy and the robustness of the image characteristics are improved. Finally, by downsampling operationsThe resolution of the image is reduced by half, generating a new amplitude pyramid image. The purpose of the downsampling operation is to reduce the size of the image while preserving important feature information so that the multi-scale pyramid can effectively represent features of the image at different resolutions. This operation ensures continuity and consistency of the image in the multi-scale representation, providing a solid basis for subsequent image processing and analysis. Amplitude pyramid image for enhanced right view imageThe calculation process is similar to that of the left image. By imaging the previous layer amplitudeAnd its amplitude gradientAndCalculating, and obtaining a right-view image at the first position through Gaussian filtering and smoothing, phase gradient combining and downsampling operationAmplitude pyramid image of the layer. The process ensures consistency and accuracy of the left and right view images in amplitude information, and provides a reliable basis for subsequent parallax estimation. The multiscale phase pyramid method is advantageous in that it can capture subtle changes in the image and conduct propagation and updating of amplitude information at multiple levels. Through comprehensive calculation of amplitude gradients, the multi-scale phase pyramid can effectively improve accuracy and robustness of amplitude information. The method has important significance for feature extraction and matching in binocular vision ranging, and can maintain a high-precision ranging result in a complex environment.

Example 7: based on the multi-scale pyramid image, extracting features by using a multi-layer convolution network to obtain a first stepThe left image feature map of the layer isFirst, theThe right image feature map of the layer is; In step 3, the following formula is used to perform preliminary parallax estimation at the lowest resolution layer to obtain a parallax map：

；

Specifically, features are extracted from the multi-scale pyramid image through a multi-layer convolutional neural network. And processing each layer of pyramid images of the left-view image and the right-view image by using a convolutional neural network respectively to generate corresponding feature images. The convolution network can extract advanced characteristic information in the image, including edges, textures, shapes and the like through multi-level convolution and pooling operations. These feature information play a vital role in disparity estimation because they more accurately reflect the structure and content in the image. Next, preliminary disparity estimation is performed at the lowest resolution layer. The reason for selecting the lowest resolution layer is that the low resolution image can remarkably reduce the calculated amount, meanwhile, the global structure information of the image is reserved, interference of detail noise is reduced, and therefore parallax estimation is more stable and robust. At the lowest resolution level, the system compares the feature maps of the left and right images to determine the disparity value for each pixel. The core of disparity estimation is to match the displacement difference of feature points in the left and right images. In this process, the system searches for possible parallax values in the local neighborhood of each pixel, calculates a matching cost corresponding to each parallax value, and selects the parallax value with the minimum cost as the preliminary parallax. The computation of the matching cost consists of two main parts: the absolute difference and the weighted squared difference. The absolute difference is used for directly measuring the difference of the left image feature map and the right image feature map under different parallax values, the weighted square error is used for further measuring the difference through the square error, and the sum of the left image feature map and the right image feature map is combined for weighting so as to better reflect the similarity between the image features. Specifically, the system calculates differences in feature values by comparing feature maps pixel by pixel and introduces a weight coefficient to balance the effects of the different parts. The weight coefficient ensures that the difference between the feature graphs is reasonably processed in calculation, so that the matching process is more accurate. Finally, by searching the local neighborhood of each pixel, the system can find the disparity value that minimizes the cost function, which reflects the difference in displacement of the left and right images at that pixel location. The advantage of performing the preliminary disparity estimation at the lowest resolution layer is not only that it is computationally efficient, but it also provides a stable and robust initial disparity map. This is important for the subsequent refinement process of the high resolution layer. The preliminary disparity map provides a good starting point for further disparity refinement, and ensures continuity and accuracy of disparity estimation. In addition, the application of the convolutional neural network in feature extraction significantly improves the accuracy of parallax estimation. Traditional feature extraction methods often rely on manually designed features, such as SIFT and SURF, which may perform poorly in complex scenarios. And the convolutional neural network can automatically learn advanced features in the image through deep learning, and has better generalization capability and robustness. By combining the multi-scale pyramid images, the system can extract rich characteristic information on different resolutions and scales, so that parallax estimation is more accurate and stable.

Example 8: and 4, respectively performing scalar addition operation with the parallax map through a preset structured light compensation value, a preset optical filter compensation value, a preset temperature compensation value and a preset vibration compensation value to sequentially perform structured light compensation, optical filter compensation, a preset temperature compensation value and a preset vibration compensation value on the parallax map so as to obtain a compensated parallax map.

Specifically, first, the structured light compensation is implemented by introducing a preset structured light compensation value and performing scalar addition operation with the parallax map. Structured light is an active light source technique that can significantly enhance the accuracy of parallax estimation by projecting a known pattern of light, such as a grid or stripe of laser light, into the measurement scene. In practice, structured light forms well-defined marks in the image, which marks appear in both left and right looking images. By identifying the deformations and displacements of these markers, accurate disparity values can be calculated. The structured light compensation value is obtained in advance through an experiment or a calibration method, and then added to the parallax map in the parallax map calculation process so as to correct errors caused by changes in illumination conditions or surface reflection characteristics. Next, the filter compensation is performed by adding a preset filter compensation value. Filters are used to reduce interference of ambient light and other unwanted light sources with images, especially in outdoor or complex lighting conditions. The filter compensation values are also predetermined by experimental or calibration methods, which can effectively correct errors in the disparity map due to illumination variations. In practical application, the filter compensation can remarkably improve the contrast and definition of the image, so that the parallax estimation is more accurate. The temperature compensation is realized by scalar addition operation of a preset temperature compensation value and the parallax map. Temperature variations can lead to changes in physical characteristics inside the camera, such as small changes in focal length, which can directly affect the accuracy of the parallax estimation. The temperature sensor monitors the ambient temperature in real time, and the temperature compensation value is predetermined by combining the temperature response curve of the camera. In the parallax map calculation process, the temperature compensation values are added, so that errors caused by temperature change can be effectively corrected, and the accuracy and stability of the parallax map are ensured. Finally, vibration compensation is achieved by adding a preset vibration compensation value. Vibrations usually come from the external environment or from mechanical movements of the device itself, which can cause small changes in the position of the camera, affecting the accuracy of the disparity map. In practical application, the vibration state of the equipment is monitored in real time through the acceleration sensor and the gyroscope, and the vibration compensation value is predetermined in combination with experimental data. In the process of calculating the parallax map, scalar addition operation is carried out on the vibration compensation values and the parallax map, so that errors caused by vibration can be effectively corrected, and the stability and the accuracy of parallax estimation are improved.

Example 9: in step 5, based on the compensated disparity map, a disparity value is obtained asThe distance from the binocular vision image acquisition apparatus to the target is calculated using the following formula：

；

Specifically, the parallax valueIs obtained by performing multiple compensation on the parallax map. In step 4, the preliminary parallax map has been corrected by various compensation methods such as structured light compensation, optical filter compensation, temperature compensation, and vibration compensation, to eliminate the influence of the external environment on parallax estimation. The compensated parallax image is more accurate and stable, and a reliable parallax value is provided. Next, the formulaIn (a) and (b)Representing a baseline distance between two cameras in the binocular vision image acquisition apparatus. The baseline distance refers to the horizontal distance between the two cameras, which is a known fixed value. The larger the base line distance is, the stronger the depth perception capability of the system is, and the accuracy of distance measurement is correspondingly improved.The focal length of the binocular vision image acquisition apparatus, i.e., the focal length of the camera, is represented. The focal length determines the camera's viewing angle and imaging ratio, another known fixed parameter. The longer the focal length, the larger the ranging range of the system, but at the same time the higher the parallax accuracy requirement. Parallax errorIs the difference in horizontal displacement of the object in the left and right view images. The binocular vision system calculates depth information of a target using parallax between left and right cameras by simultaneously photographing two images of the same scene. The larger the parallax value is, the closer the target is to the camera; the smaller the disparity value, the further away the target is from the camera. The accurate calculation of the parallax value is the core of the whole ranging process, and the accuracy of the parallax value is ensured through multiple compensation of the parallax map. Formula (VI)Is based on the principle of geometric triangulation. Assuming that the optical axes of the two cameras are parallel and positioned on the same horizontal line, the projection points of the target points in the left image and the right image are respectivelyAndParallax error. According to the principle of similar triangle, the target distance can be obtainedIs an expression of (2). This formula illustrates the relationship between the baseline distance, the focal length, and the parallax, and by knowing the baseline distance and the focal length, and the calculated parallax value, the distance of the target can be directly calculated.

While specific embodiments of the present invention have been described above, it will be understood by those skilled in the art that these specific embodiments are by way of example only, and that various omissions, substitutions, and changes in the form and details of the methods and systems described above may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the above-described method steps to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is limited only by the following claims.

Claims

1. The high-precision binocular vision ranging method is characterized by comprising the following steps of:

step 5: calculating the distance from the binocular vision image acquisition equipment to the target based on the compensated parallax map;

enhancing left view corresponding multi-scale pyramid images Calculated using the following formula:

；

2. The method of high-accuracy binocular vision ranging according to claim 1, wherein in step 1, a left-view image is set asAnd right-view image is; Image enhancement is performed on the left-view image and the right-view image using the following formula, and an enhanced right-view image and an enhanced left-view image are obtained:

；

3. The high-precision binocular vision ranging method of claim 2, wherein the number of layers of the multi-scale pyramid in step 2 ranges from 4 to 8 layers; the multi-scale pyramid is a phase consistency multi-scale pyramid, and the direction number of each layer is in the range of 2 to 6.

4. The high-precision binocular vision ranging method of claim 3, wherein the enhanced left view image is set at the firstThe phase pyramid image of the layer isCalculated using the following formula:

；

5. The high-precision binocular vision ranging method of claim 4, wherein the enhanced left view image is set at the firstThe phase pyramid image of the layer isCalculated using the following formula:

；

6. The method according to claim 5, wherein in step 2, features are extracted using a multi-layer convolution network based on the multi-scale pyramid image to obtain the second featureThe left image feature map of the layer isFirst, theThe right image feature map of the layer is; In step 3, the following formula is used to perform preliminary parallax estimation at the lowest resolution layer to obtain a parallax map：

；

7. The method of high-precision binocular vision ranging according to claim 6, wherein in the step 4, scalar addition operation is performed on the preset structured light compensation value, optical filter compensation value, temperature compensation value and vibration compensation value and the parallax map respectively to sequentially perform structured light compensation, optical filter compensation, temperature compensation and vibration compensation on the parallax map, so as to obtain the compensated parallax map.

8. The high-precision binocular vision ranging method of claim 7, wherein the disparity value obtained in the step 5 is based on the compensated disparity mapThe distance from the binocular vision image acquisition apparatus to the target is calculated using the following formula：

；