CN108682026B

CN108682026B - Binocular vision stereo matching method based on multi-matching element fusion

Info

Publication number: CN108682026B
Application number: CN201810249294.XA
Authority: CN
Inventors: 孙福明; 杜仁鹏; 蔡希彪
Original assignee: Jiang Dabai
Current assignee: China Applied Technology Co Ltd
Priority date: 2018-03-22
Filing date: 2018-03-22
Publication date: 2021-08-06
Anticipated expiration: 2038-03-22
Also published as: CN108682026A

Abstract

The stereo matching method based on the fusion of multiple matching elements for binocular vision comprises a computer, two cameras and a stereo matching method based on the fusion of multiple matching elements. The camera is a ZED binocular camera. The stereo matching method of multi-matching element fusion comprises three stages: an initial matching cost stage, a cost function aggregation stage and a parallax post-processing stage. In the initial matching cost stage, a cost function containing fusion of a color primitive and a gradient primitive is designed, and self-adaptive adjustment is carried out through a Kalman coefficient alpha; in a cost function aggregation stage, a self-adaptive matching window is designed based on RGB color and distance relation, and aggregation is carried out through correlation between a reference image and a cost function in the window; and finally, performing parallax postprocessing through LRC left-right consistency and sub-pixel level-based adaptive weight median filtering. Compared with the classical adaptive weight algorithm, the method improves the accuracy and the real-time performance of the algorithm and shows higher robustness in a complex area.

Description

Binocular vision stereo matching method based on multi-matching element fusion

Technical Field

The invention relates to a binocular vision stereo matching method based on multi-matching element fusion, which can be widely applied to the technical fields of unmanned driving, robot navigation and the like.

Background

The stereo matching is widely applied to the fields of unmanned driving, robot navigation and the like as a core technology in binocular vision. The basic principle is that a two-dimensional image of a scene is obtained through a binocular camera, the parallax of points to be matched is obtained through a matching algorithm, and then the depth information of the scene is obtained. The stereo matching algorithm is widely concerned by scholars due to its high accuracy characteristic. However, due to the complexity of texture in natural scenes and the discontinuity of scene depth, the practical application of stereo matching algorithm is restricted.

Currently, mainstream stereo matching algorithms can be divided into global stereo matching algorithms and local stereo matching algorithms. The global stereo matching method is complex in calculation and low in efficiency due to the fact that operation of an energy function is involved, and actual requirements are difficult to meet. Based on a local stereo matching method, only local data items are operated in a cost function aggregation stage, the real-time performance is high, but the accuracy is relatively low, until Yoon proposes a classical adaptive support weight Algorithm (ASW), the accuracy is qualitatively improved, but the characteristic and texture information of an image cannot be reflected due to the fixed window.

The local stereo matching algorithm can be subdivided into stereo matching methods based on regions, characteristics and phases according to different matching primitives. The self-adaptive support weight method has the advantages that the accuracy of the algorithm in a complex texture region is low due to the fact that the matching primitive is single and the window is fixed, and the real-time performance of the algorithm is low due to the fact that the weight calculation is complex.

Aiming at the problems of the classical algorithm, some scholars express the initial matching cost of the image based on various matching primitive information such as gradient and color, and the accuracy of the algorithm is further improved. However, complex weight calculation is not abandoned, so that the real-time performance of the algorithm is still low and the robustness is weak.

Lin et al use linear functions to fit Gaussian weight functions in the original algorithm, down-sample the original image to obtain Gaussian pyramid samples, and use hierarchical clustering algorithm to calculate cost aggregation functions, thereby improving algorithm instantaneity and accuracy. However, due to the limitation of the sampling times, the sampled image is too blurred, and the robustness of the algorithm is poor.

Researchers have studied adaptive windows all the time, and abandon some apparent wrong parallax values by utilizing the coincidence degree of a reference image and an image to be matched, and abandon the operation of complex weight values in a cost aggregation stage. However, this makes the accuracy of the algorithm no better than the classical one, since the correlation between the reference image and the matching cost function is not taken into account.

Aiming at the problems, in order to improve the robustness of the algorithm in the practical scene application, the color matching primitive and the gradient matching primitive are fused in the initial matching cost stage, and the proportionality coefficient alpha of the color and the gradient matching primitive is adaptively adjusted by utilizing the Cones image which conforms to the natural scene condition of the Middlebury platform; in a cost function aggregation stage, aiming at the problem that the accuracy of the algorithm is low under a complex texture condition caused by a fixed window in the traditional method, a central pixel point is expanded according to color information and spatial distance information between pixels, and then the traditional complex energy function and weight calculation are replaced by correlation calculation between an initial matching cost function and a reference image pixel for cost aggregation, so that the accuracy of the algorithm is improved, and the time complexity of the algorithm is reduced; and finally, performing LRC left-right consistency detection on the obtained disparity map and performing sub-pixel level-based adaptive weight median filtering operation, thereby improving the accuracy of the algorithm again.

Disclosure of Invention

The invention aims to provide a binocular vision stereo matching method based on multi-matching element fusion, aiming at the problems in the prior art, the average mismatching rate of a disparity map obtained by adopting the method is greatly improved, the image is clear and accurate, and the precision of the running track of a robot or an unmanned aerial vehicle is greatly improved.

The technical scheme is as follows:

a binocular vision stereo matching method based on multi-matching element fusion comprises a computer, two cameras and the stereo matching method based on multi-matching element fusion, and is characterized in that:

the two cameras adopt ZED binocular cameras produced by Astrolabes company, and the ZED binocular cameras are set as follows:

1) image acquisition:

the ZED SDK and the CUDA are downloaded and connected with a computer through a USB3.0 interface. Connecting a ZED binocular camera through a webcam function in an MATLAB, and acquiring pictures through a snapshot function;

2) calibrating a camera:

the aim of camera calibration is to obtain accurate internal and external parameters of a camera. The internal parameters mainly comprise the focal lengths of the left lens and the right lens and the baseline distance; the extrinsic parameters mainly include a rotation matrix of the two cameras relative to a world coordinate system and a relative translation matrix of the left camera and the right camera. The invention obtains the default internal and external parameters of the camera from the official manual;

3) setting image parameters:

and performing epipolar correction through the calibration parameters to enable the acquired left and right images to meet epipolar constraint conditions. Resetting parameters by embedding a ZED Explorer plug-in into the ZED SDK;

4) stereo matching:

stereo matching is a core part of the binocular vision system. The purpose of stereo matching is to perform imaging point matching on the acquired left and right images, obtain a parallax value through the matching points, and obtain depth information of a scene.

The ZED binocular camera not only can acquire high-quality image information, but also has the advantages of small size and low power consumption. Especially, because the ZED SDK function library and the CUDA are embedded for parallel processing, the processing time of the system is greatly reduced. The ZED binocular camera can be installed on a robot or an unmanned plane. The actual scene shot by the two eyes can achieve the purpose of a real scene through the stereo matching processing of the fusion of multiple matching primitives, and then a navigation instruction is sent to a robot or unmanned aerial vehicle control and driving system through the processing of a computer arranged on the robot or the unmanned aerial vehicle; or the navigation instruction is sent to the robot or the unmanned aerial vehicle control and drive system through a computer arranged outside the robot or the unmanned aerial vehicle, connected with the computer through a local area network, an Ethernet or a wire and processed by the computer. The precision of the running track of the robot or the unmanned aerial vehicle is greatly improved.

The stereo matching module method for multi-matching element fusion comprises the following processes:

the invention adopts an improved ASW stereo matching method.

The currently known ASW stereo matching algorithm is:

in the stereo matching algorithm, it is generally defaulted that two images satisfy epipolar constraint conditions, that is, matching points corresponding to left and right images are located at the same row position of the two images. The core of the ASW algorithm is that when the support weights are used to measure the similarity between image pixels, the support of neighboring pixels is only valid if they come from the same depth, which has the same disparity as the pixel to be matched. Therefore, the support weight w of the pixels around the window is proportional to the window disparity probability Pr, as shown in equation (1):

w(p，q)∝Pr(d_p＝d_q) (1)

wherein p is the pixel to be matched, q is the other pixels except the pixel to be matched in the window, and d is the required parallax. w (p, q) is related to the color and distance of the image, as shown in equation (2):

w(p，q)＝k·f(Δc_pq，Δg_pq) (2)

wherein, Δ c_pq，Δg_pqRespectively representing the distances of two points p and q in an LAB color space and a geometric space, wherein k is a proportionality coefficient, a specific numerical value is obtained through experiments, f is a Laplace kernel function, and the two are independent from each other, as shown in a formula (3):

f(Δc_pq，Δg_pq)＝f(Δc_pq)·f(Δg_pq) (3)

wherein Δ c_pq，Δg_pqThe calculation is shown in equation (4) (5):

c_p＝|L_p，a_p，b_p|，c_q＝|L_q，a_q，b_ql is Lab three-channel color space chroma value, L component in Lab color space is used for representing pixel brightness, and the value range is [0, 100 ]]From pure black to pure white; a represents the range from red to green, and the value range is [127, -128 ]](ii) a b represents the range from yellow to blue, and the value range is [127 to 128 ]]。(p_x，p_y) And (q)_x，q_y) Are coordinate values of the geometric space. The grouping strength is defined using a laplacian kernel, as shown in equations (6) (7).

γ_c，γ_pObtaining gamma through experiments_c＝7，γ_p＝36

(the value is typically related to the window size). And then performing cost aggregation using equation (8).

Wherein the initial matching cost

As shown in formula (9):

I_c(q) and

for fixing reference image and image to be matchedAnd determining the gray values of two pixels with the parallax of d in the window. And finally, determining a final disparity map by a WTA (Winner-Takes-All) method.

The invention adopts an improved ASW matching method which mainly comprises the following steps: the method comprises a left reference image reading stage, a right reference image reading stage, a left initial matching cost stage, a right initial matching cost stage, a left cost function aggregation stage and a right cost function aggregation stage, and 4 parallax post-processing stages, wherein the 4 parallax post-processing stages mainly comprise LRC left and right consistency detection and filtering operation. Wherein:

firstly, the calculation of the initial matching cost is as follows:

the ASW algorithm uses the gray information of the image as a primitive for matching cost calculation. The method is improved by setting a truncation threshold value for the mean value of the gradient primitive and the R, G, B three-channel color primitive, fusing R, G, B color and gradient information of a pixel through the thought of Kalman filtering, and performing adaptive adjustment through a control coefficient alpha. The specific process is as follows:

(1) setting color and gradient thresholds t1 and t2 respectively, and calculating initial cost e_g，e_cAs shown in equations (10) (11).

(2) And (3) adaptively adjusting the alpha coefficient to calculate the final initial matching cost, wherein the final initial matching cost is shown in formula (12).

Secondly, an improved adaptive window expansion algorithm:

the traditional ASW matching algorithm has poor robustness in processing complex texture regions due to the fixed window. The invention adopts a self-adaptive window method according to the color and the space distance between pixels. And knowing a central pixel point p (x, y) to be matched, wherein the neighborhood pixel points in the x and y directions are p (x-1, y), p (x +1, y), p (x, y-1) and p (x, y + 1). Different from the traditional adaptive window expansion method which carries out pixel expansion by using the gray information of the central pixel point, the invention takes the central points R, G and B as expansion elements, and carries out window expansion when the three-channel information of the neighborhood pixels and the central pixel point simultaneously meets the condition of the following formula (13).

I_r，g，b(x，y)-I_r，g，b(x-1，y)＜t (13)

t is a preset color threshold, and t ∈ (0, 1). When the same region pixel jumps due to discontinuous textures in the image, it is difficult to make three channels of pixel information of the neighboring pixels satisfy formula (13) at the same time. Based on this characteristic, the present invention makes an improvement over the conventional fixed window. When the neighborhood pixels are subjected to window adaptive expansion under the condition of satisfying the formula (13), if a texture repetition region exists in a scene, the window is too large, so that the calculation is too complex when the cost is aggregated, and the method does not meet the real-time requirement of the algorithm. The invention sets the truncation arm length for the adaptive window according to the geometric characteristics of the image. The window arm length is truncated when the following equation (14) is satisfied.

Wherein, p (x), p (y) are horizontal and vertical coordinate values of the central pixel, and q (x), q (y) are coordinate values of the adjacent pixels. The minimum arm length L is set through experiments on four test images of tsukuba, teddy, cons and venus under a midlebury platform_minThreshold L5_max＝11。

Adaptive window, as shown in fig. 3. First, the central pixel I (x, y) is expanded laterally, and if the final window arm length is less than the shortest arm length, the minimum arm length L is used_minInstead of the original length. In the window expansion process, when the spatial distance between the neighborhood pixel and the central pixel is greater than a truncation threshold, the window is truncated. The final result is shown in FIG. 4, which is based on the center point, and has four arm lengths L with different lengths_u，L_d，L_l，L_r. The values of the row and column sizes of the self-adaptive window are respectively shown in the following formulas (15) and (16):

Rows＝L_u+L_d (15)

Cows＝L_l+L_r (16)

thirdly, matching cost aggregation algorithm:

after the initial matching cost is calculated, cost function aggregation is carried out, which is different from the complex weight operation of a classical algorithm. According to the method, the variance in the self-adaptive window of the reference image and the covariance of the reference image and the cost function are calculated through the correlation between the cost function and the reference image and the correlation of the reference image, so that the correlation function is obtained, and cost aggregation is performed after the correlation function is subtracted from the initial matching cost function, so that the accuracy and the real-time performance of the algorithm are greatly improved. The specific process is as follows:

first, the mean value of the convolution of the R, G, B three-channel pixel value of the pixel to be matched and the matching cost function within the matching window is calculated, as shown in equation (17).

Wherein, p is the pixel coordinate of the center point of the window, I (x, y) is the pixel coordinate of the center point in the window, I (x, y) is the self-adaptive window, and n is the total number of pixels in the window. After the mean of the convolution functions of the channels is calculated, the covariance function is calculated, as shown in equation (18):

in the above formula, m_c，m_eThe average values of the three channel elements and the matching cost function in the adaptive window are respectively shown as the following formulas (19) and (20):

then, a variance matrix θ of the three channel element composition in the reference image adaptation window is calculated, which is described in detail as shown in the following equation (21).

Wherein, each element of the matrix theta is calculated as shown in formula (22):

the coefficient matrix obtained by equations (17) to (20) is shown in equation (23):

due to v_ce(c ∈ { r, g, b }) is a vector of 1 x 3, and γ can be obtained_knContains a vector of three channels. When the correlation coefficient gamma is calculated_knAnd then, subtracting the convolution of three-channel pixels of the reference image and the correlation coefficient from the mean value of each point cost function, so that the matching cost of the left image and the right image is more independent. Finally, the initial cost function is shown in equation (24) below:

due to gamma_knIs a three channel vector, so n ∈ (1,2, 3). After the final matching cost is calculated, matching cost aggregation is performed through an adaptive window, and a specific formula is shown as a formula (25).

The invention only carries out aggregation on the basis of the reference image,

the image to be matched is not added. The experimental result shows that the accuracy of the algorithm is not reduced while the real-time performance of the algorithm is improved. And finally, selecting the parallax value with the minimum cost function as the pixel value of the parallax image by adopting a formula (26) through a WTA (Winner-Takes-A11) algorithm.

Four, parallax post-processing

1, LRC left-right consistency detection

In the stereo matching algorithm, the problem of occlusion is inevitable because of the parallax of the left and right images. Before the final disparity map is obtained, the invention firstly adopts an LRC left-right consistency algorithm to carry out disparity postprocessing operation.

When the left image is taken as a reference image, the parallax d is calculated_lObtaining the parallax d by using the right image as the reference image_r. When the following condition of equation (27) is satisfied:

|d_l-d_r|＞δ (27)

δ is a threshold, δ ∈ (0, 1). In the present invention, δ is 1. And when the absolute value of the left-right parallax difference is larger than delta, the occlusion point is considered. And taking the smaller parallax value in the left-right parallax of the occlusion point for parallax filling.

2, adaptive weighted median filtering

After the cost aggregation algorithm is carried out, the obtained disparity map has more salt-pepper noise, and the median filtering is necessary for the image. Conventional filtering, however, tends to ignore the correlation between pixels. The invention endows different weights to the pixels in the window based on the difference of the color and the distance between the pixels in the space, and the specific weight calculation is shown as a formula (28).

γ_c，γ_dIs constant, gamma is obtained by experiment_c＝0.1，γ_d9. k1, k2 is obtained by the difference between the central pixel and the surrounding pixel points in the color space and the distance space,

obtained from the following formulas (29) and (30), respectively.

The window size is 19 x 19. And after the weight value of each pixel in the window is obtained, the self-adaptive median filtering is carried out. The specific process is as follows:

(1) and multiplying the gray value of each pixel except the central point in the window by the respective weight value to obtain a new gray value, and calculating by using the formula (31).

I′(q)＝w·I(q) (31)

(2) The new values for the pixels within the window, including the center point, are sorted by taking the 2 pixel values I' (q) nearest the center point near the median value₁)，I′(q₂) And taking the average value to obtain a new gray value of the sub-pixel level to replace the gray value of the original central point pixel, and calculating by the formula (32).

Description of the Algorithm

The detailed implementation of the algorithm is described in table 1. TABLE 1

Sixth, experimental results

The experimental environment of the algorithm is Intel core i 7-67003.5 HZ CPU, 12G memory and Matlab2016a platform. In order to verify the effectiveness of the algorithm provided by the invention, the invention firstly compares a disparity map of a classical adaptive weight algorithm under a midlebury platform, for example, fig. 6 is a comparison view of the mismatching rates of all regions and depth discontinuity regions in a cons image mismatching rate comparison map.

The maximum and minimum arm lengths of the self-adaptive window in the experiment are respectively L_min＝5，L_maxThe minimum gradient and color threshold are t 1-2, t 2-7, and kalman coefficient α -0.11, respectively. Self-adaptive weight filtering weight coefficient gamma_c＝0.1，γ_dThe window is 19 x 19, 9. The invention has three test indexes of the obtained disparity map under a midlebury platform: the nonocc (non-occlusion region mismatching rate), all (all region mismatching rate), disc (depth discontinuity region mismatching rate) test compares the conventional adaptive stereo matching algorithm and the De-Maeztu gradient-based ASW improved algorithm, and the specific ratio is shown in table 2 below.

TABLE 2 evaluation of matching algorithm Table (unit:%)

From table 1, it can be seen that compared with the classical adaptive weight and its improved algorithm, the stereo matching algorithm based on multi-matching primitive fusion has obvious advantages in Venus, Teddy, cons images, no matter in all regions, non-occlusion regions or depth discontinuity region scenes. Under the Tsukuba image with darker light, a certain promotion is also provided in the non-blocking area. Compared with the GradAdaptWgt algorithm based on single gradient primitives, Tsukuba, Venus and Teddy have lower mismatching rate in all regions, and the mismatching rate of the Conses image is equivalent to that of the original GradAdaptWgt algorithm. The algorithm of the invention performs average mismatching rate comparison under middleb ury platform, as shown in table 3, and obtains considerable effect.

TABLE 3 average mismatch ratio comparison

In order to improve the robustness of the algorithm, the Kalman coefficient ratio of the color and gradient information is compared. Selecting the Cones images conforming to the natural scene for comparison, and comparing mismatching rates of matching conditions of noccc, all and disc under different alpha coefficients, wherein alpha belongs to (0.11: 0.02: 0.29).

From the above results, we set the value of α to 0.23 in practical applications to achieve the robustness requirement of the stereo matching algorithm. Compared with the classical self-adaptive support weight algorithm, the method abandons the operation of complex weight and greatly improves the real-time property of the algorithm. The invention contrasts with the classic adaptive support weight algorithm in time complexity as shown in table 4.

TABLE 4 comparison of time complexity (unit: ms) between the algorithm of the present invention and the classical ASW algorithm

The experimental results show that the algorithm greatly reduces the time complexity of the algorithm.

The invention has the advantages that:

the invention provides a novel stereo matching method based on multiple matching primitives by fusing color matching primitives and gradient matching primitives and introducing the correlation between a cost function and a reference image. The algorithm greatly improves the accuracy of scene information under the unmanned condition, contributes to obtaining a more accurate depth map, greatly reduces the time complexity of the algorithm, and provides a theoretical basis for the practical application of unmanned driving. And the navigation precision of the robot or the unmanned aerial vehicle is further improved.

Drawings

Fig. 1 is left and right images acquired by the zed binocular camera after calibration and image setting.

Fig. 2 is a flow chart of the improved ASW algorithm.

Fig. 3 is a lateral expansion diagram in an adaptive window map.

Fig. 4 is a diagram of the window expansion final effect in the adaptive window map.

FIG. 5 is a graph of the Cones image mismatch rate versus the non-occluded area mismatch rate in the plot.

Fig. 6 is a cross-sectional view of the cross-sectional area and depth discontinuity area cross-matching rate in the cons image cross-matching rate comparison map.

Detailed Description

1. the camera is a ZED binocular camera manufactured by Astrolabes company.

The resolution of the video stream collected by the ZED binocular camera is divided into four categories,

the specific parameters are shown in the following table 1:

table 1: ZED output video stream mode

2. Image acquisition:

3. calibrating a camera:

4. setting image parameters:

5. stereo matching:

The ZED binocular camera can be installed on a robot or an unmanned plane. The actual scene shot by the two eyes can achieve the purpose of a real scene through the three-dimensional matching processing of the fusion of the multiple matching elements, and then a navigation instruction is sent to a robot or unmanned aerial vehicle control and driving system through the processing of a computer arranged on the robot or the unmanned aerial vehicle.

The stereo matching method of the multi-matching primitive fusion comprises the following processes:

with the improved invention, an improved ASW matching method is adopted:

the improved ASW algorithm flow provided by the invention mainly comprises the following steps: the method comprises a left reference image reading stage, a right reference image reading stage, a left initial matching cost stage, a right initial matching cost stage, a left cost function aggregation stage and a right cost function aggregation stage, and 4 parallax post-processing stages, wherein the 4 parallax post-processing stages mainly comprise LRC left and right consistency detection and filtering operation. Wherein:

firstly, the calculation of the initial matching cost is as follows:

Secondly, an improved adaptive window expansion algorithm: the traditional ASW matching algorithm has poor robustness in processing complex texture regions due to the fixed window. The invention adopts a self-adaptive window method according to the color and the space distance between pixels. And knowing a central pixel point p (x, y) to be matched, wherein the neighborhood pixel points in the x and y directions are p (x-1, y), p (x +1, y), p (x, y-1) and p (x, y + 1). Different from the traditional adaptive window expansion method which carries out pixel expansion by using the gray information of the central pixel point, the invention takes the central points R, G and B as expansion elements, and carries out window expansion when the three-channel information of the neighborhood pixels and the central pixel point simultaneously meets the condition of the following formula (13).

I_r，g，b(x，y)-I_r，g，b(x-1，y)＜t (13)

Rows＝L_u+L_d (15)

Cows＝L_l+L_r (16)

thirdly, matching cost aggregation algorithm:

after the initial matching cost is calculated, cost function aggregation is carried out, which is different from the complex weight operation of a classical algorithm. According to the method, through the correlation between the cost function and the reference image and the correlation between the reference image and the self-adaptive window of the reference image in the document [11], the variance in the adaptive window of the reference image and the covariance between the reference image and the cost function are calculated to obtain the correlation function of the reference image, and cost aggregation is performed after the correlation function is subtracted from the initial matching cost function, so that the accuracy and the real-time performance of the algorithm are greatly improved. The specific process is as follows:

Wherein p is the pixel coordinate of the center point of the window, q is the pixel coordinate of the center point in the window, and N_pFor an adaptive window, n is the total number of pixels in the window. After the mean of the convolution functions of the channels is calculated, the covariance function is calculated, as shown in equation (18):

The invention only carries out aggregation on the basis of the reference image,

the image to be matched is not added. The experimental result shows that the accuracy of the algorithm is not reduced while the real-time performance of the algorithm is improved. And finally, selecting the parallax value with the minimum cost function as the pixel value of the parallax image by adopting a formula (26) through a WTA (Winner-Takes-All) algorithm.

Four, parallax post-processing

1, LRC left-right consistency detection

When the left image is taken as a reference image, the parallax d is calculated_lObtaining the parallax d by using the right image as the reference image_r. When the following are satisfiedThe conditions of equation (27):

|d_l-d_r|＞δ (27)

2, adaptive weighted median filtering

obtained from the following formulas (29) and (30), respectively.

I′(q)＝w·I(q) (31)

Claims

1. A binocular vision stereo matching method based on multi-matching element fusion comprises a computer, two cameras and the stereo matching method based on multi-matching element fusion, and is characterized in that:

the camera selects a ZED binocular camera and is set as follows:

(1) and image acquisition:

the ZED SDK and the CUDA are downloaded and connected with a computer through a USB3.0 interface; connecting a ZED binocular camera through a webcam function in an MATLAB, and acquiring pictures through a snapshot function;

(2) calibrating a camera:

the aim of camera calibration is to obtain accurate internal and external parameters of a camera; the internal parameters mainly comprise the focal lengths of the left lens and the right lens and the baseline distance; the external parameters mainly comprise a rotation matrix of the two cameras relative to a world coordinate system and a relative translation matrix of the left camera and the right camera; acquiring default internal and external parameters of the camera;

(3) and image parameter setting:

performing epipolar line correction through calibration parameters so that the acquired left and right images meet epipolar constraint conditions; resetting parameters by embedding a ZED Explorer plug-in into the ZED SDK;

(4) and stereo matching:

stereo matching is used as a core part of a binocular vision system; the purpose of stereo matching is to perform imaging point matching on the acquired left and right images, obtain a parallax value through the matching points and obtain depth information of a scene;

the ZED binocular camera can be arranged on a robot or an unmanned aerial vehicle, the actual scene shot by the two eyes can achieve the purpose of a real scene through the stereo matching processing of the fusion of multiple matching primitives, and then a navigation instruction is sent to a robot or unmanned aerial vehicle control and driving system through the processing of a computer arranged on the robot or the unmanned aerial vehicle;

an improved ASW matching method is adopted, and the method mainly comprises a left reference image reading-in stage, a left initial matching cost stage, a right initial matching cost stage, a left cost function aggregation stage and a right cost function aggregation stage, and 4 stages of parallax post-processing; the parallax post-processing stage mainly comprises LRC left-right consistency detection and filtering operation, wherein:

1) the initial matching cost calculation is:

the ASW algorithm utilizes the gray information of the image as a primitive of matching cost calculation; the method is improved by setting a truncation threshold value for the mean value of a gradient element and an R, G, B three-channel color element, fusing R, G, B color and gradient information of a pixel through the idea of Kalman filtering, and performing adaptive adjustment through a control coefficient alpha; the specific process is as follows:

(1) setting color and gradient thresholds t1 and t2 respectively, and calculating initial cost

As shown in equations (10) (11);

(2) adaptively adjusting the alpha coefficient to calculate the final initial matching cost, wherein the final initial matching cost is shown in a formula (12);

2) the improved adaptive window expansion algorithm comprises the following steps:

according to the color and the space distance between pixels, an adaptive window method is adopted, the central pixel point p (x, y) to be matched is known, and the neighborhood pixel points in the x and y directions are p (x-1, y), p (x +1, y), p (x, y-1) and p (x, y +1) respectively; the method is different from the traditional self-adaptive window expansion method which carries out pixel expansion by using the gray information of the central pixel point; taking the central points R, G and B as extension elements, and carrying out window expansion when three-channel information of the neighborhood pixels and the central pixel simultaneously meets the condition of the following formula (13);

t is a preset color threshold, and t belongs to (0, 1); when the same region pixel jumps due to discontinuous textures in the image, three-channel pixel information of adjacent pixels hardly satisfies formula (13); based on the characteristic, the traditional fixed window is improved; when the neighborhood pixels are subjected to window self-adaptive expansion under the condition of satisfying the formula (13), if a texture repetition region exists in a scene, the window is too large, so that the calculation is too complex when the cost is aggregated, and the real-time requirement of the algorithm is not met; setting a truncation arm length for the self-adaptive window according to the geometric characteristics of the image; truncating the window arm length when the following formula (14) is satisfied;

wherein, p (x), p (y) are horizontal and vertical coordinate values of the central pixel, q (x), q (y) are coordinate values of the neighborhood pixels; the minimum arm length L is set through experiments on four test images of tsukuba, teddy, cons and venus under a midlebury platform_minThreshold L5_max11; the adaptive window is first adapted by, for a central pixel I (x,y) performing transverse expansion, and if the final window arm length is less than the shortest arm length, using the minimum arm length L_minReplacing the original length; in the window expansion process, when the space distance between the neighborhood pixel and the central pixel is greater than a truncation threshold, the window is truncated; four arms with different lengths L at upper, lower, left and right sides by taking a central point as reference_u,L_d,L_l,L_r(ii) a The values of the row and column sizes of the adaptive window are respectively shown in the following formulas (15) and (16):

Rows＝L_u+L_d (15)

Cows＝L_l+L_r (16)

3) and matching cost aggregation algorithm:

calculating the mean value of the convolution of the R, G, B three-channel pixel value of the pixel to be matched and the matching cost function in the matching window, as shown in the formula (17);

wherein, p is the pixel coordinate of the center point of the window, q is the pixel coordinate of the center point in the window, Np is the self-adaptive window, and n is the total number of pixels in the window; after the mean of the convolution functions of each channel is calculated,

the covariance function is calculated as shown in equation (18):

in the above formula, m_c,m_eThe average values of the three channel elements and the matching cost function in the adaptive window are respectively shown as the following formulas (19) and (20):

then, calculating a variance matrix theta formed by three channel elements in a reference image adaptive window, wherein the variance matrix theta is specifically described as shown in a formula (21);

due to v_ce(c ∈ { r, g, b }) is a vector of 1 x 3, and γ can be obtained_knContains a vector of three channels; when the correlation coefficient gamma is calculated_knThen, the convolution of three-channel pixels of the reference image and the correlation coefficient is subtracted from the average value of each point cost function, so that the matching cost of the left image and the right image is more independent; finally, the initial cost function is shown in equation (24) below:

due to gamma_knIs a three channel vector, so n ∈ (1,2, 3); after the final matching cost is calculated, performing matching cost aggregation through a self-adaptive window, wherein a specific formula is shown as a formula (25);

only the reference image is aggregated, and the image to be matched is not added; the experimental result shows that the accuracy of the algorithm is not reduced while the real-time performance of the algorithm is improved; finally, selecting the parallax value with the minimum cost function as the pixel value of the parallax image by adopting a formula (26) through a WTA (Winner-Takes-All) algorithm;

4) parallax post-processing

(1) LRC left and right consistency detection

In the stereo matching algorithm, the problem of occlusion is inevitable all the time because of the parallax of the left and right images; before obtaining a final disparity map, performing disparity postprocessing operation by adopting an LRC left-right consistency algorithm;

when the left image is taken as a reference image, the parallax d is calculated_lObtaining a parallax dr by taking the right image as a reference image; when the following condition of equation (27) is satisfied:

delta is a threshold value, delta belongs to (0, 1); herein δ takes the value 1; when the absolute value of the left-right parallax difference is larger than delta, the left-right parallax difference is considered as a shielding point; selecting a smaller parallax value in left-right parallax from the shielding points to perform parallax filling;

(2) adaptive weighted median filtering

After the cost aggregation algorithm is carried out, the obtained disparity map usually has more salt and pepper noises, and the median filtering is necessary to be carried out on the image; however, conventional filtering tends to ignore the correlation between pixels; different weights are given to the pixels in the window based on the difference between the colors and the distances of the pixels in the space, and the specific weight calculation is shown as a formula (28);

γ_c，γ_dis constant, gamma is obtained by experiment_c＝0.1，γ_d9; k1, k2 are derived from the difference between the central pixel and the surrounding pixel points in color space and distance space, and are obtained by the following formulas (29) and (30), respectively;

window size 19 x 19; after obtaining the weight of each pixel in the window, the self-adaptive median filtering is performed, and the specific process is as follows:

(1) multiplying the gray value of each pixel except the central point in the window by the respective weight value to obtain a new gray value, and calculating by using the formula (31);

(2) sorting the new values of the pixels including the central point in the window, taking the 2 pixel values I '(q 1) and I' (q2) which are positioned near the median and are closest to the central point, taking the average value to obtain the new gray value of the sub-pixel level to replace the gray value of the pixel at the original central point, and obtaining the gray value by the calculation of the formula (32)

。