Disclosure of Invention
The invention aims to provide a binocular vision stereo matching method based on multi-matching-primitive fusion that addresses the problems in the prior art. The disparity map obtained with the method has a greatly reduced average mismatching rate, the image is clear and accurate, and the precision of the running track of a robot or unmanned aerial vehicle is greatly improved.
The technical scheme is as follows:
a binocular vision stereo matching method based on multi-matching element fusion comprises a computer, two cameras and the stereo matching method based on multi-matching element fusion, and is characterized in that:
the two cameras are a ZED binocular camera produced by Stereolabs, and the ZED binocular camera is set up as follows:
1) image acquisition:
the ZED SDK and CUDA are downloaded, and the camera is connected to the computer through a USB 3.0 interface. The ZED binocular camera is connected through the webcam function in MATLAB, and pictures are acquired through the snapshot function;
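As an illustration, a minimal MATLAB acquisition sketch is given below; the device name 'ZED' and the side-by-side left|right output format are assumptions about the local setup, not details mandated by the invention.

```matlab
% Minimal acquisition sketch; device name and side-by-side format are assumptions.
cam = webcam('ZED');           % open the ZED as a UVC camera via the webcam function
frame = snapshot(cam);         % grab one stereo frame with the snapshot function
w = size(frame, 2) / 2;        % the ZED streams the two views side by side
imgL = frame(:, 1:w, :);       % left view
imgR = frame(:, w+1:end, :);   % right view
clear cam                      % release the camera
```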
2) calibrating a camera:
the aim of camera calibration is to obtain accurate internal and external parameters of the camera. The internal parameters mainly comprise the focal lengths of the left and right lenses and the baseline distance; the external parameters mainly comprise the rotation matrix of the two cameras relative to a world coordinate system and the relative translation matrix between the left and right cameras. The invention takes the default internal and external parameters of the camera from the official manual;
3) setting image parameters:
epipolar rectification is performed with the calibration parameters so that the acquired left and right images satisfy the epipolar constraint. Parameters are reset through the ZED Explorer plug-in embedded in the ZED SDK;
4) stereo matching:
stereo matching is the core part of the binocular vision system. Its purpose is to match the imaging points of the acquired left and right images, obtain a disparity value from the matched points, and thereby obtain depth information of the scene.
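For reference, once matching points are found, depth follows from the standard rectified pinhole relation (a textbook fact, not specific to the invention): Z = f·B/d, where f is the focal length in pixels, B the baseline and d the disparity. For example, with f = 700 pixels, the ZED baseline B = 0.12 m and d = 35 pixels, Z = 700 × 0.12 / 35 = 2.4 m.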
The ZED binocular camera not only acquires high-quality image information but also has the advantages of small size and low power consumption. In particular, because the ZED SDK function library and CUDA are embedded for parallel processing, the processing time of the system is greatly reduced. The ZED binocular camera can be installed on a robot or an unmanned aerial vehicle. The actual scene captured by the two views is reconstructed as the real scene through the stereo matching processing with multi-matching-primitive fusion; a navigation instruction is then sent to the control and drive system of the robot or unmanned aerial vehicle, either by a computer mounted on the robot or unmanned aerial vehicle, or by an external computer connected through a local area network, Ethernet or a cable that performs the processing. The precision of the running track of the robot or unmanned aerial vehicle is thereby greatly improved.
the stereo matching method with multi-matching-primitive fusion comprises the following processes:
the invention adopts an improved ASW stereo matching method.
The currently known ASW stereo matching algorithm is as follows:
in stereo matching algorithms it is generally assumed that the two images satisfy the epipolar constraint, that is, corresponding matching points of the left and right images lie in the same row of the two images. The core of the ASW algorithm is that, when support weights are used to measure the similarity between image pixels, the support of a neighboring pixel is only valid if it comes from the same depth, i.e., has the same disparity as the pixel to be matched. Therefore, the support weight w of the pixels around the window is proportional to the window disparity probability Pr, as shown in equation (1):
w(p, q) ∝ Pr(d_p = d_q) (1)
wherein p is the pixel to be matched, q denotes the other pixels in the window, and d is the sought disparity. w(p, q) is related to the color and distance of the image, as shown in equation (2):
w(p, q) = k·f(Δc_pq, Δg_pq) (2)
wherein Δc_pq and Δg_pq respectively represent the distances of the two points p and q in the Lab color space and in geometric space, k is a proportionality coefficient whose specific value is obtained through experiments, and f is a Laplacian kernel function; the two distances are independent of each other, as shown in equation (3):
f(Δc_pq, Δg_pq) = f(Δc_pq)·f(Δg_pq) (3)
wherein Δc_pq and Δg_pq are calculated as shown in equations (4) and (5):
Δc_pq = sqrt((L_p − L_q)² + (a_p − a_q)² + (b_p − b_q)²) (4)
Δg_pq = sqrt((p_x − q_x)² + (p_y − q_y)²) (5)
Here c_p = (L_p, a_p, b_p) and c_q = (L_q, a_q, b_q) are the three-channel chroma values in the Lab color space. The L component of the Lab color space represents pixel brightness, with range [0, 100] from pure black to pure white; a represents the range from red to green, with values in [−128, 127]; b represents the range from yellow to blue, with values in [−128, 127]. (p_x, p_y) and (q_x, q_y) are the coordinate values in geometric space. The grouping strength is defined using a Laplacian kernel, as shown in equations (6) and (7):
f(Δc_pq) = exp(−Δc_pq / γ_c) (6)
f(Δg_pq) = exp(−Δg_pq / γ_p) (7)
γ_c and γ_p are obtained through experiments: γ_c = 7, γ_p = 36 (the latter value is typically related to the window size). Cost aggregation is then performed using equation (8):
E(p, p̄_d) = Σ_{q∈N_p, q̄∈N_p̄} w(p, q)·w(p̄, q̄)·e(q, q̄) / Σ_{q∈N_p, q̄∈N_p̄} w(p, q)·w(p̄, q̄) (8)
where N_p and N_p̄ are the support windows around p in the reference image and around the corresponding point p̄ in the image to be matched.
The initial matching cost e(q, q̄_d) is shown in equation (9):
e(q, q̄_d) = |I_c(q) − I_c(q̄_d)| (9)
where I_c(q) and I_c(q̄_d) are the gray values of the two pixels with disparity d in the fixed windows of the reference image and the image to be matched. Finally, the disparity map is determined by the WTA (Winner-Takes-All) method.
The invention adopts an improved ASW matching method which mainly comprises: (1) a left and right reference image reading stage; (2) a left and right initial matching cost stage; (3) a left and right cost function aggregation stage; and (4) a disparity post-processing stage, which mainly comprises LRC left-right consistency detection and filtering operations. Wherein:
First, the calculation of the initial matching cost is as follows:
the ASW algorithm uses the gray information of the image as the primitive for matching cost calculation. The invention improves on this by setting truncation thresholds for the gradient primitive and for the mean of the R, G, B three-channel color primitive, fusing the R, G, B color and gradient information of a pixel following the idea of Kalman filtering, and adjusting adaptively through a control coefficient α. The specific process is as follows:
(1) Set the color and gradient truncation thresholds t1 and t2 respectively, and calculate the initial costs e_c and e_g as shown in equations (10) and (11):
e_c(p, d) = min((1/3)·Σ_{i∈{R,G,B}} |I_i(p) − I′_i(p_d)|, t1) (10)
e_g(p, d) = min(|∇I(p) − ∇I′(p_d)|, t2) (11)
where I is the reference image, I′ is the image to be matched, p_d is the pixel in I′ displaced from p by the disparity d, and ∇ is the gray-level gradient operator.
(2) Adaptively adjust the coefficient α and calculate the final initial matching cost, as shown in equation (12):
e(p, d) = α·e_c(p, d) + (1 − α)·e_g(p, d) (12)
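A minimal MATLAB sketch of this fused initial cost for a single candidate disparity is shown below; the function name, the use of circshift for alignment and the horizontal gray-level gradient are illustrative assumptions, not the patent's own code.

```matlab
% Sketch of the fused initial matching cost of equations (10)-(12);
% names and the gradient choice are assumptions, not the patent's code.
function e = fusedInitialCost(L, R, d, t1, t2, alpha)
% L, R : double RGB images in [0,1] (left = reference, right = to be matched)
% d    : candidate disparity in pixels; t1, t2 : truncation thresholds
    Rd = circshift(R, [0, d, 0]);            % align the target image by disparity d
    ec = min(mean(abs(L - Rd), 3), t1);      % truncated mean of R,G,B differences (10)
    gL = gradient(rgb2gray(L));              % horizontal gradient of the reference
    gR = gradient(rgb2gray(Rd));             % horizontal gradient of the shifted target
    eg = min(abs(gL - gR), t2);              % truncated gradient difference (11)
    e  = alpha .* ec + (1 - alpha) .* eg;    % adaptive fusion via coefficient alpha (12)
end
```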
Secondly, an improved adaptive window expansion algorithm:
the traditional ASW matching algorithm has poor robustness in complex texture regions because of its fixed window. The invention adopts an adaptive window method based on the color and spatial distance between pixels. Given a central pixel p(x, y) to be matched, its neighborhood pixels in the x and y directions are p(x−1, y), p(x+1, y), p(x, y−1) and p(x, y+1). Unlike the traditional adaptive window expansion method, which expands pixels using the gray information of the central pixel, the invention takes the R, G and B values of the central point as expansion elements, and expands the window when the three-channel information of a neighborhood pixel and the central pixel simultaneously satisfies the condition of equation (13):
I_r,g,b(x, y) − I_r,g,b(x−1, y) < t (13)
t is a preset color threshold, t ∈ (0, 1). When pixels of the same region jump because of discontinuous texture in the image, it is difficult for all three channels of a neighboring pixel to satisfy equation (13) simultaneously. Based on this characteristic, the invention improves on the conventional fixed window. When the neighborhood pixels are expanded adaptively under condition (13), a texture-repetition region in the scene can make the window too large, so that the computation during cost aggregation becomes too complex and the real-time requirement of the algorithm is not met. The invention therefore sets a truncation arm length for the adaptive window according to the geometric characteristics of the image. The window arm length is truncated when equation (14) is satisfied:
max(|p(x) − q(x)|, |p(y) − q(y)|) > L_max (14)
wherein p(x), p(y) are the horizontal and vertical coordinates of the central pixel, and q(x), q(y) are the coordinates of the neighboring pixel. Through experiments on the four test images Tsukuba, Teddy, Cones and Venus of the Middlebury platform, the minimum arm length is set to L_min = 5 and the truncation threshold to L_max = 11.
The adaptive window is shown in Fig. 3. First, the central pixel I(x, y) is expanded laterally; if the final window arm length is less than the shortest arm length, the minimum arm length L_min replaces the original length. During window expansion, the window is truncated when the spatial distance between a neighborhood pixel and the central pixel exceeds the truncation threshold. The final result, shown in Fig. 4, is a window based on the center point with four arms of different lengths L_u, L_d, L_l, L_r. The row and column sizes of the adaptive window are given by equations (15) and (16):
Rows = L_u + L_d (15)
Cols = L_l + L_r (16)
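The sketch below illustrates the arm-length search under assumptions consistent with equations (13) and (14); the function name and indexing convention are illustrative, and only the rightward arm is shown (the other three arms are symmetric).

```matlab
% Sketch of the adaptive-window arm expansion (13)-(14); only the right arm shown.
function Lr = rightArmLength(I, y, x, t, Lmin, Lmax)
% I : double RGB image in [0,1]; (y,x) : center pixel; t : color threshold
    Lr = 0;
    while Lr < Lmax                               % truncation arm length, eq. (14)
        qx = x + Lr + 1;
        if qx > size(I, 2), break; end            % stop at the image border
        dRGB = abs(I(y, qx, :) - I(y, x, :));     % three-channel difference to center
        if any(dRGB(:) >= t), break; end          % all channels must satisfy eq. (13)
        Lr = Lr + 1;
    end
    Lr = max(Lr, Lmin);                           % enforce the minimum arm length
end
```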
Third, the matching cost aggregation algorithm:
after the initial matching cost is calculated, cost function aggregation is carried out; this differs from the complex weight computation of the classical algorithm. Using the correlation between the cost function and the reference image, the method calculates the variance within the adaptive window of the reference image and the covariance between the reference image and the cost function to obtain a correlation function; cost aggregation is performed after this correlation function is subtracted from the initial matching cost function, which greatly improves both the accuracy and the real-time performance of the algorithm. The specific process is as follows:
first, the mean of the convolution of the R, G, B three-channel pixel values of the pixel to be matched with the matching cost function within the matching window is calculated, as shown in equation (17):
m_ce = (1/n)·Σ_{q∈N_p} I_c(q)·e(q), c ∈ {r, g, b} (17)
wherein p is the pixel coordinate of the window center, q is a pixel within the window, N_p is the adaptive window, and n is the total number of pixels in the window. After the mean of the convolution functions of the channels is calculated, the covariance function is calculated, as shown in equation (18):
v_ce = m_ce − m_c·m_e (18)
In the above formula, m_c and m_e are the mean values of the three channel elements and of the matching cost function within the adaptive window, as shown in equations (19) and (20):
m_c = (1/n)·Σ_{q∈N_p} I_c(q) (19)
m_e = (1/n)·Σ_{q∈N_p} e(q) (20)
Then the variance matrix θ formed by the three channel elements within the adaptive window of the reference image is calculated, as shown in equation (21), where each element of the matrix θ is calculated as shown in equation (22):
θ = [θ_{c1c2}], c1, c2 ∈ {r, g, b} (21)
θ_{c1c2} = (1/n)·Σ_{q∈N_p} I_{c1}(q)·I_{c2}(q) − m_{c1}·m_{c2} (22)
The coefficient matrix obtained from equations (17) to (20) is shown in equation (23):
γ_kn = v_ce·θ^(−1) (23)
Since v_ce (c ∈ {r, g, b}) is a 1 × 3 vector, γ_kn is likewise a vector containing the three channels. After the correlation coefficient γ_kn is calculated, the convolution of the three-channel pixels of the reference image with the correlation coefficient is subtracted from the mean of the cost function at each point, which makes the matching costs of the left and right images more independent. Finally, the initial cost function is shown in equation (24):
e′(p) = m_e(p) − Σ_n γ_kn·I_n(p) (24)
Since γ_kn is a three-channel vector, n ∈ {1, 2, 3}. After the final matching cost is calculated, matching cost aggregation is performed through the adaptive window, as shown in equation (25):
E(p, d) = (1/n)·Σ_{q∈N_p} e′(q) (25)
The invention performs aggregation on the basis of the reference image only; the image to be matched is not included. Experimental results show that the real-time performance of the algorithm is improved without reducing its accuracy. Finally, the disparity value with the minimum cost function is selected as the pixel value of the disparity map by the WTA (Winner-Takes-All) algorithm, using equation (26):
d_p = argmin_{d∈D} E(p, d) (26)
where D is the set of candidate disparities.
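A short MATLAB sketch of this WTA selection follows; the cost volume E is assumed to be an H × W × D array of aggregated costs over candidate disparities 0 … D−1.

```matlab
% WTA disparity selection of equation (26); E is an assumed H-by-W-by-D cost volume.
[~, idx] = min(E, [], 3);   % per-pixel index of the minimum aggregated cost
dmap = idx - 1;             % disparity map, assuming candidate disparities 0..D-1
```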
Fourth, disparity post-processing
1, LRC left-right consistency detection
In stereo matching algorithms, occlusion is inevitable because of the parallax between the left and right images. Before obtaining the final disparity map, the invention first applies the LRC left-right consistency algorithm as a disparity post-processing operation.
With the left image as the reference image the disparity d_l is calculated, and with the right image as the reference image the disparity d_r is obtained. When the condition of equation (27) is satisfied:
|d_l − d_r| > δ (27)
δ is a threshold, and the present invention takes δ = 1. When the absolute difference between the left and right disparities is larger than δ, the point is considered an occlusion point. The smaller of the left and right disparity values at the occlusion point is used for disparity filling.
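A minimal MATLAB sketch of this LRC check and filling is given below; the warping by rounded disparity and the variable names are assumptions.

```matlab
% LRC occlusion detection and filling per equation (27), with delta = 1.
% Dl, Dr : disparity maps computed with the left / right image as reference.
[H, W]  = size(Dl);
[X, Y]  = meshgrid(1:W, 1:H);
Xr      = min(max(round(X - Dl), 1), W);       % matched column in the right map
DrAt    = Dr(sub2ind([H, W], Y, Xr));          % right disparity at the matched point
occ     = abs(Dl - DrAt) > 1;                  % occlusion mask, |dl - dr| > delta
Dl(occ) = min(Dl(occ), DrAt(occ));             % fill with the smaller disparity
```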
2, adaptive weighted median filtering
After the cost aggregation algorithm, the obtained disparity map contains considerable salt-and-pepper noise, so median filtering of the image is necessary. Conventional filtering, however, tends to ignore the correlation between pixels. The invention assigns different weights to the pixels within the window based on their differences in color and spatial distance; the specific weight calculation is shown in equation (28):
w(p, q) = exp(−(k1/γ_c + k2/γ_d)) (28)
γ_c and γ_d are constants obtained through experiments: γ_c = 0.1, γ_d = 9. k1 and k2 are the differences between the central pixel and a surrounding pixel in color space and in distance space, obtained from equations (29) and (30) respectively:
k1 = |I(p) − I(q)| (29)
k2 = sqrt((p_x − q_x)² + (p_y − q_y)²) (30)
The window size is 19 × 19. After the weight of each pixel in the window is obtained, adaptive median filtering is carried out. The specific process is as follows:
(1) Multiply the gray value of each pixel in the window except the central point by its weight to obtain a new gray value, calculated by equation (31):
I′(q)=w·I(q) (31)
(2) Sort the new values of the pixels within the window, including the center point; take the two pixel values I′(q_1), I′(q_2) nearest the median, and use their average as the new sub-pixel gray value replacing the gray value of the original central pixel, calculated by equation (32):
I′(p) = (I′(q_1) + I′(q_2)) / 2 (32)
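The sketch below applies this weighted-median step at one pixel; the weight form, the names and the border-free assumption are illustrative, consistent with equations (28)-(32).

```matlab
% Adaptive weighted median at one pixel (28)-(32); borders ignored for brevity.
function v = weightedMedianAt(D, y, x, r, gc, gd)
% D : disparity map (double); (y,x) : center pixel; r = 9 gives the 19-by-19 window
    [qy, qx] = ndgrid(y-r:y+r, x-r:x+r);              % window coordinates
    Dq = D(sub2ind(size(D), qy, qx));                 % gray values in the window
    k1 = abs(Dq - D(y, x));                           % color-space difference (29)
    k2 = hypot(qy - y, qx - x);                       % spatial distance (30)
    w  = exp(-k1 ./ gc - k2 ./ gd);                   % adaptive weights (28)
    Iw = w .* Dq;                                     % weighted gray values (31)
    med = median(Iw(:));                              % median of the new values
    [~, idx] = sort(abs(Iw(:) - med));                % distances to the median
    v = mean(Iw(idx(1:2)));                           % average of the two nearest (32)
end
```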
Fifth, description of the algorithm
The detailed implementation of the algorithm is described in Table 1.
Sixth, experimental results
The experimental environment of the algorithm is an Intel Core i7-6700 3.5 GHz CPU with 12 GB of memory, on the Matlab2016a platform. To verify the effectiveness of the proposed algorithm, the invention first compares disparity maps with the classical adaptive-weight algorithm on the Middlebury platform; for example, Fig. 6 compares the mismatching rates of all regions and of depth-discontinuity regions on the Cones image.
The minimum and maximum arm lengths of the adaptive window in the experiments are L_min = 5 and L_max = 11; the color and gradient truncation thresholds are t1 = 2 and t2 = 7; the Kalman coefficient is α = 0.11. The adaptive weight filtering coefficients are γ_c = 0.1 and γ_d = 9, with a 19 × 19 window. The disparity maps obtained by the invention are evaluated on the Middlebury platform with three test indexes: nonocc (non-occlusion region mismatching rate), all (all-region mismatching rate) and disc (depth-discontinuity region mismatching rate), in comparison with the conventional adaptive stereo matching algorithm and the De-Maeztu gradient-based ASW improved algorithm; the specific figures are shown in Table 2 below.
Table 2. Evaluation of the matching algorithms (unit: %)
From Table 2 it can be seen that, compared with the classical adaptive weight algorithm and its improved variants, the stereo matching algorithm based on multi-matching-primitive fusion has obvious advantages on the Venus, Teddy and Cones images, whether in all regions, non-occlusion regions or depth-discontinuity regions. On the darker Tsukuba image there is also a certain improvement in the non-occlusion region. Compared with the GradAdaptWgt algorithm based on a single gradient primitive, the mismatching rates on Tsukuba, Venus and Teddy are lower in all regions, and the mismatching rate on the Cones image is comparable to that of the original GradAdaptWgt algorithm. The average mismatching rate of the algorithm of the invention on the Middlebury platform is compared in Table 3, showing a considerable effect.
Table 3. Average mismatching rate comparison
To improve the robustness of the algorithm, the ratio of the Kalman coefficient between the color and gradient information is compared. The Cones image, which conforms to a natural scene, is selected for comparison, and the mismatching rates of the nonocc, all and disc matching conditions are compared under different α coefficients, with α ∈ 0.11 : 0.02 : 0.29 (from 0.11 to 0.29 in steps of 0.02).
From the above results, α is set to 0.23 in practical applications to meet the robustness requirement of the stereo matching algorithm. Compared with the classical adaptive support weight algorithm, the method abandons the complex weight computation and greatly improves the real-time performance of the algorithm. The time complexity of the invention is compared with that of the classical adaptive support weight algorithm in Table 4.
Table 4. Time complexity comparison between the algorithm of the invention and the classical ASW algorithm (unit: ms)
The experimental results show that the algorithm greatly reduces the time complexity.
The invention has the advantages that:
the invention provides a novel stereo matching method based on multiple matching primitives by fusing color matching primitives and gradient matching primitives and introducing the correlation between a cost function and a reference image. The algorithm greatly improves the accuracy of scene information under the unmanned condition, contributes to obtaining a more accurate depth map, greatly reduces the time complexity of the algorithm, and provides a theoretical basis for the practical application of unmanned driving. And the navigation precision of the robot or the unmanned aerial vehicle is further improved.
Detailed Description
A binocular vision stereo matching method based on multi-matching element fusion comprises a computer, two cameras and the stereo matching method based on multi-matching element fusion, and is characterized in that:
1. The camera is a ZED binocular camera manufactured by Stereolabs.
The resolution of the video stream collected by the ZED binocular camera falls into four categories; the specific parameters are shown in Table 1 below:
Table 1: ZED output video stream modes
2. Image acquisition:
the ZED SDK and CUDA are downloaded, and the camera is connected to the computer through a USB 3.0 interface. The ZED binocular camera is connected through the webcam function in MATLAB, and pictures are acquired through the snapshot function;
3. calibrating a camera:
the aim of camera calibration is to obtain accurate internal and external parameters of the camera. The internal parameters mainly comprise the focal lengths of the left and right lenses and the baseline distance; the external parameters mainly comprise the rotation matrix of the two cameras relative to a world coordinate system and the relative translation matrix between the left and right cameras. The invention takes the default internal and external parameters of the camera from the official manual;
4. setting image parameters:
epipolar rectification is performed with the calibration parameters so that the acquired left and right images satisfy the epipolar constraint. Parameters are reset through the ZED Explorer plug-in embedded in the ZED SDK;
5. stereo matching:
stereo matching is the core part of the binocular vision system. Its purpose is to match the imaging points of the acquired left and right images, obtain a disparity value from the matched points, and thereby obtain depth information of the scene.
The ZED binocular camera can be installed on a robot or an unmanned aerial vehicle. The actual scene captured by the two views is reconstructed as the real scene through the stereo matching processing with multi-matching-primitive fusion, and a navigation instruction is then sent to the control and drive system of the robot or unmanned aerial vehicle by a computer mounted on it.
The stereo matching method with multi-matching-primitive fusion comprises the following processes:
The invention adopts an improved ASW matching method:
the improved ASW algorithm flow provided by the invention mainly comprises: (1) a left and right reference image reading stage; (2) a left and right initial matching cost stage; (3) a left and right cost function aggregation stage; and (4) a disparity post-processing stage, which mainly comprises LRC left-right consistency detection and filtering operations. Wherein:
First, the calculation of the initial matching cost is as follows:
the ASW algorithm uses the gray information of the image as the primitive for matching cost calculation. The invention improves on this by setting truncation thresholds for the gradient primitive and for the mean of the R, G, B three-channel color primitive, fusing the R, G, B color and gradient information of a pixel following the idea of Kalman filtering, and adjusting adaptively through a control coefficient α. The specific process is as follows:
(1) Set the color and gradient truncation thresholds t1 and t2 respectively, and calculate the initial costs e_c and e_g as shown in equations (10) and (11):
e_c(p, d) = min((1/3)·Σ_{i∈{R,G,B}} |I_i(p) − I′_i(p_d)|, t1) (10)
e_g(p, d) = min(|∇I(p) − ∇I′(p_d)|, t2) (11)
where I is the reference image, I′ is the image to be matched, p_d is the pixel in I′ displaced from p by the disparity d, and ∇ is the gray-level gradient operator.
(2) Adaptively adjust the coefficient α and calculate the final initial matching cost, as shown in equation (12):
e(p, d) = α·e_c(p, d) + (1 − α)·e_g(p, d) (12)
Secondly, an improved adaptive window expansion algorithm: the traditional ASW matching algorithm has poor robustness in processing complex texture regions due to the fixed window. The invention adopts a self-adaptive window method according to the color and the space distance between pixels. And knowing a central pixel point p (x, y) to be matched, wherein the neighborhood pixel points in the x and y directions are p (x-1, y), p (x +1, y), p (x, y-1) and p (x, y + 1). Different from the traditional adaptive window expansion method which carries out pixel expansion by using the gray information of the central pixel point, the invention takes the central points R, G and B as expansion elements, and carries out window expansion when the three-channel information of the neighborhood pixels and the central pixel point simultaneously meets the condition of the following formula (13).
I_r,g,b(x, y) − I_r,g,b(x−1, y) < t (13)
t is a preset color threshold, t ∈ (0, 1). When pixels of the same region jump because of discontinuous texture in the image, it is difficult for all three channels of a neighboring pixel to satisfy equation (13) simultaneously. Based on this characteristic, the invention improves on the conventional fixed window. When the neighborhood pixels are expanded adaptively under condition (13), a texture-repetition region in the scene can make the window too large, so that the computation during cost aggregation becomes too complex and the real-time requirement of the algorithm is not met. The invention therefore sets a truncation arm length for the adaptive window according to the geometric characteristics of the image. The window arm length is truncated when equation (14) is satisfied:
max(|p(x) − q(x)|, |p(y) − q(y)|) > L_max (14)
wherein p(x), p(y) are the horizontal and vertical coordinates of the central pixel, and q(x), q(y) are the coordinates of the neighboring pixel. Through experiments on the four test images Tsukuba, Teddy, Cones and Venus of the Middlebury platform, the minimum arm length is set to L_min = 5 and the truncation threshold to L_max = 11.
The adaptive window is shown in Fig. 3. First, the central pixel I(x, y) is expanded laterally; if the final window arm length is less than the shortest arm length, the minimum arm length L_min replaces the original length. During window expansion, the window is truncated when the spatial distance between a neighborhood pixel and the central pixel exceeds the truncation threshold. The final result, shown in Fig. 4, is a window based on the center point with four arms of different lengths L_u, L_d, L_l, L_r. The row and column sizes of the adaptive window are given by equations (15) and (16):
Rows = L_u + L_d (15)
Cols = L_l + L_r (16)
Third, the matching cost aggregation algorithm:
after the initial matching cost is calculated, cost function aggregation is carried out; this differs from the complex weight computation of the classical algorithm. Following the correlation between the cost function and the reference image, and the correlation of the reference image within its adaptive window described in document [11], the method calculates the variance within the adaptive window of the reference image and the covariance between the reference image and the cost function to obtain the correlation function of the reference image; cost aggregation is performed after this correlation function is subtracted from the initial matching cost function, which greatly improves both the accuracy and the real-time performance of the algorithm. The specific process is as follows:
first, the mean of the convolution of the R, G, B three-channel pixel values of the pixel to be matched with the matching cost function within the matching window is calculated, as shown in equation (17):
m_ce = (1/n)·Σ_{q∈N_p} I_c(q)·e(q), c ∈ {r, g, b} (17)
Wherein p is the pixel coordinate of the center point of the window, q is the pixel coordinate of the center point in the window, and NpFor an adaptive window, n is the total number of pixels in the window. After the mean of the convolution functions of the channels is calculated, the covariance function is calculated, as shown in equation (18):
In the above formula, m_c and m_e are the mean values of the three channel elements and of the matching cost function within the adaptive window, as shown in equations (19) and (20):
m_c = (1/n)·Σ_{q∈N_p} I_c(q) (19)
m_e = (1/n)·Σ_{q∈N_p} e(q) (20)
Then the variance matrix θ formed by the three channel elements within the adaptive window of the reference image is calculated, as shown in equation (21), where each element of the matrix θ is calculated as shown in equation (22):
θ = [θ_{c1c2}], c1, c2 ∈ {r, g, b} (21)
θ_{c1c2} = (1/n)·Σ_{q∈N_p} I_{c1}(q)·I_{c2}(q) − m_{c1}·m_{c2} (22)
The coefficient matrix obtained from equations (17) to (20) is shown in equation (23):
γ_kn = v_ce·θ^(−1) (23)
Since v_ce (c ∈ {r, g, b}) is a 1 × 3 vector, γ_kn is likewise a vector containing the three channels. After the correlation coefficient γ_kn is calculated, the convolution of the three-channel pixels of the reference image with the correlation coefficient is subtracted from the mean of the cost function at each point, which makes the matching costs of the left and right images more independent. Finally, the initial cost function is shown in equation (24):
e′(p) = m_e(p) − Σ_n γ_kn·I_n(p) (24)
Since γ_kn is a three-channel vector, n ∈ {1, 2, 3}. After the final matching cost is calculated, matching cost aggregation is performed through the adaptive window, as shown in equation (25):
E(p, d) = (1/n)·Σ_{q∈N_p} e′(q) (25)
The invention performs aggregation on the basis of the reference image only; the image to be matched is not included. Experimental results show that the real-time performance of the algorithm is improved without reducing its accuracy. Finally, the disparity value with the minimum cost function is selected as the pixel value of the disparity map by the WTA (Winner-Takes-All) algorithm, using equation (26):
d_p = argmin_{d∈D} E(p, d) (26)
where D is the set of candidate disparities.
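As an illustration of aggregating over the adaptive windows of equation (25), a minimal MATLAB sketch follows; the plain mean inside each window and the per-pixel arm-length maps Lu, Ld, Ll, Lr are assumptions.

```matlab
% Sketch of aggregating the final cost e over adaptive windows, equation (25);
% Lu, Ld, Ll, Lr are assumed per-pixel arm-length maps, e the final cost map.
E = zeros(H, W);
for y = 1:H
    for x = 1:W
        rows = max(y - Lu(y,x), 1) : min(y + Ld(y,x), H);   % vertical extent
        cols = max(x - Ll(y,x), 1) : min(x + Lr(y,x), W);   % horizontal extent
        win  = e(rows, cols);                               % costs in the window
        E(y, x) = mean(win(:));                             % aggregated cost (25)
    end
end
```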
Fourth, disparity post-processing
1, LRC left-right consistency detection
In stereo matching algorithms, occlusion is inevitable because of the parallax between the left and right images. Before obtaining the final disparity map, the invention first applies the LRC left-right consistency algorithm as a disparity post-processing operation.
With the left image as the reference image the disparity d_l is calculated, and with the right image as the reference image the disparity d_r is obtained. When the condition of equation (27) is satisfied:
|d_l − d_r| > δ (27)
δ is a threshold, and the present invention takes δ = 1. When the absolute difference between the left and right disparities is larger than δ, the point is considered an occlusion point. The smaller of the left and right disparity values at the occlusion point is used for disparity filling.
2, adaptive weighted median filtering
After the cost aggregation algorithm, the obtained disparity map contains considerable salt-and-pepper noise, so median filtering of the image is necessary. Conventional filtering, however, tends to ignore the correlation between pixels. The invention assigns different weights to the pixels within the window based on their differences in color and spatial distance; the specific weight calculation is shown in equation (28):
w(p, q) = exp(−(k1/γ_c + k2/γ_d)) (28)
γ_c and γ_d are constants obtained through experiments: γ_c = 0.1, γ_d = 9. k1 and k2 are the differences between the central pixel and a surrounding pixel in color space and in distance space, obtained from equations (29) and (30) respectively:
k1 = |I(p) − I(q)| (29)
k2 = sqrt((p_x − q_x)² + (p_y − q_y)²) (30)
The window size is 19 × 19. After the weight of each pixel in the window is obtained, adaptive median filtering is carried out. The specific process is as follows:
(1) Multiply the gray value of each pixel in the window except the central point by its weight to obtain a new gray value, calculated by equation (31):
I′(q)=w·I(q) (31)
(2) Sort the new values of the pixels within the window, including the center point; take the two pixel values I′(q_1), I′(q_2) nearest the median, and use their average as the new sub-pixel gray value replacing the gray value of the original central pixel, calculated by equation (32):
I′(p) = (I′(q_1) + I′(q_2)) / 2 (32)