CN114140527B - Dynamic environment binocular vision SLAM method based on semantic segmentation - Google Patents
- Publication number
- CN114140527B (application CN202111373890.7A)
- Authority
- CN
- China
- Prior art keywords
- dynamic
- binocular
- feature points
- semantic
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/38—Electronic maps specially adapted for navigation; Updating thereof
- G01C21/3804—Creation or updating of map data
- G01C21/3833—Creation or updating of map data characterised by the source of data
- G01C21/3841—Data obtained from two or more sources, e.g. probe vehicles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/215—Motion-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T7/85—Stereo camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a dynamic environment binocular vision SLAM method based on semantic segmentation, which comprises the following steps: acquiring a semantic mask of an object, wherein the semantic mask is generated by a deep learning network; acquiring multiple frames of continuous binocular images with a binocular camera; extracting feature points on each frame of binocular image and matching the feature points across adjacent frames; removing the feature points on the semantic mask and calculating the camera pose from the remaining feature points; separating dynamic objects and static objects in the binocular images based on the camera pose; recalculating the camera pose based on the separated static objects; and constructing a static map based on the updated camera pose and the feature points on the static objects. The method uses a binocular camera and, guided by semantically segmented images, can distinguish the dynamic and static objects in a scene and build a map; it is simple to operate, low in cost, and applicable to most practical scenes.
Description
Technical Field
The invention relates to the technical field of visual spatial positioning, and in particular to a dynamic environment binocular vision SLAM method based on semantic segmentation.
Background
With the development of computer technology and artificial intelligence, intelligent autonomous mobile robots have become an important research direction and hotspot in the robotics field. As mobile robots become increasingly intelligent, the demands on their self-positioning and environment mapping grow accordingly. At present, intelligent mobile robots can accomplish self-localization and mapping in known environments in some practical applications, but many challenges remain in unknown environments. The technique for accomplishing positioning and mapping in such environments is called SLAM (Simultaneous Localization and Mapping); its goal is to enable a robot to localize itself and incrementally build a map while moving through an unknown environment.
Traditional SLAM algorithms rely mainly on distance sensors with good stability, such as lidar. However, the range data obtained by lidar is very sparse, so the environment map constructed by SLAM contains only a very small number of landmark points. Such a map can only be used to improve the positioning accuracy of the robot and cannot serve other parts of robot navigation such as path planning. Furthermore, the high price, large volume, weight, and power consumption of lidar limit its application in certain fields. A camera can overcome the disadvantages of lidar in price, volume, mass, and power consumption to some extent while acquiring rich information, but it also has problems such as sensitivity to illumination changes and high computational complexity. Multi-sensor fusion SLAM algorithms have also been proposed, which can effectively alleviate the problems caused by the shortcomings of a single sensor, but they further increase the cost and the complexity of the algorithm.
Most existing visual SLAM algorithms are based on the static-environment assumption: the scene is static, with no objects in relative motion. However, actual outdoor scenes contain a large number of dynamic objects such as pedestrians and vehicles, which limits the use of SLAM systems built on this assumption in practice. To counter the drop in positioning accuracy and stability of visual SLAM in dynamic environments, existing approaches apply algorithms based on probability statistics or geometric constraints to reduce the influence of dynamic objects. For example, when there are few dynamic objects in the scene, they can be culled with probabilistic algorithms such as RANSAC (Random Sample Consensus); but when many dynamic objects appear, such algorithms can no longer distinguish them reliably. Other algorithms use optical flow: even with many dynamic objects in the scene, optical flow can separate them, but computing dense optical flow is time-consuming and reduces the execution efficiency of the SLAM algorithm.
Therefore, providing a dynamic environment binocular vision SLAM method based on semantic segmentation that is simple to operate, low in cost, and applicable to most practical scenes is a technical problem to be solved urgently by those skilled in the art.
Disclosure of Invention
The invention provides a dynamic environment binocular vision SLAM method based on semantic segmentation, aiming to solve the above technical problem.
In order to solve the above technical problem, the invention provides a dynamic environment binocular vision SLAM method based on semantic segmentation, which comprises the following steps:
acquiring a semantic mask of an object, wherein the semantic mask is generated by a deep learning network;
acquiring multiple frames of continuous binocular images with a binocular camera;
extracting feature points on each frame of binocular image, and matching the feature points on adjacent frames of binocular images;
removing the feature points on the semantic mask, and calculating the camera pose according to the remaining feature points;
separating dynamic objects and static objects in the binocular images based on the camera pose;
estimating motion parameters of the dynamic objects based on the separated dynamic objects;
recalculating the camera pose based on the separated static objects;
and constructing a static map based on the updated camera pose and the feature points on the static objects.
Preferably, the deep learning network for generating the semantic Mask is a Mask R-CNN model.
Preferably, the method for extracting the feature points on each frame of binocular image and matching the feature points on adjacent frames comprises the following steps:
extracting the feature points with the ORB method;
acquiring a descriptor for each feature point on each frame of binocular image, calculating the Hamming distance between the descriptors of a feature point in two adjacent frames, and taking the two feature points with the minimum Hamming distance as a matched pair.
Preferably, the method for judging whether a feature point is located on the semantic mask comprises: the semantic mask at least comprises a bounding box of an object; if the coordinates of the feature point fall within the bounding box, the feature point is located on the semantic mask.
Preferably, the method for calculating the camera pose according to the remaining feature points comprises: solving the camera pose with a PnP algorithm.
Preferably, the separating of dynamic objects and static objects in the binocular images based on the camera pose, and the estimating of motion parameters of the dynamic objects based on the separated dynamic objects, comprise the following steps:
separating dynamic objects: calculating the motion probability of the object corresponding to a semantic mask based on the camera pose and the positional relation between the semantic masks in adjacent frames of binocular images, and judging the object corresponding to the semantic mask as a dynamic object if the motion probability is larger than a first threshold;
dynamic object matching: for a dynamic object, calculating the Hu moments, center-point Euclidean distance, and histogram distribution of the corresponding semantic masks in adjacent frames, and from these calculating the probability that the dynamic objects in the adjacent frames match; if the probability is larger than a second threshold, the two dynamic objects in the adjacent frames are the same object; and
dynamic object motion estimation: completing the association of dynamic objects between consecutive frames through the matching, and estimating the motion parameters of the dynamic objects with a PnP algorithm.
Preferably, the step of separating the dynamic object comprises:
calculating the position of the semantic mask of the previous frame corresponding to the current frame based on the camera pose;
calculating the projected three-dimensional coordinates of all feature points on the semantic mask using a disparity map, wherein the disparity map is computed from the binocular images;
calculating the errors of corresponding feature points between the previous frame and the current frame in the x, y, and z directions, and taking the maximum of these errors as the error value of the feature point;
and converting the error value into the motion probability of the object corresponding to the semantic mask where the feature point is located, and judging whether the object corresponding to the semantic mask is a dynamic object or not based on the motion probability.
Preferably, the method for recalculating the camera pose based on the separated static objects comprises: eliminating the feature points on the semantic masks corresponding to dynamic objects, and updating the camera pose with a PnP algorithm according to the remaining feature points.
Preferably, the method for constructing the static map based on the updated camera pose and the feature points on the static object comprises the following steps:
determining a plurality of key frames based on the updated camera pose and feature points located on the static object;
matching the feature points across the key frames, and eliminating unmatched feature points;
checking whether the matched feature points satisfy the epipolar geometric constraint, and eliminating those that do not;
checking whether the remaining feature points have positive depth, sufficient parallax, and consistent reprojection error and scale, eliminating inconsistent feature points, and generating map points from the remaining feature points;
and constructing the static map based on the map points.
Preferably, before the static map is constructed, the method further comprises the step of optimizing the generated map points through bundle adjustment.
Compared with the prior art, the dynamic environment binocular vision SLAM method based on semantic segmentation provided by the invention uses a binocular camera and, guided by semantically segmented images, can distinguish the dynamic and static objects in a scene and build a map; it is simple to operate, low in cost, and applicable to most practical scenes.
Drawings
FIG. 1 is a flow chart of a dynamic environment binocular vision SLAM method based on semantic segmentation in an embodiment of the present invention;
FIG. 2 is a flow chart of separating dynamic objects according to an embodiment of the invention.
Detailed Description
In order to describe the above technical solution in more detail, the following specific examples are given to demonstrate the technical effects; it is emphasized that these examples are illustrative and do not limit the scope of the invention.
As shown in Fig. 1, the dynamic environment binocular vision SLAM method based on semantic segmentation provided by the invention comprises the following steps.
The semantic mask of an object is acquired, generated by a deep learning network; in this embodiment, the network is a Mask R-CNN model, which achieves high-quality semantic segmentation.
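For illustration only (this is not part of the claimed method), a minimal Python sketch of mask generation follows; the torchvision COCO-pretrained model is a stand-in for the patent's unspecified trained Mask R-CNN, and both 0.5 thresholds are assumed values:

```python
# Sketch: semantic masks from an off-the-shelf Mask R-CNN (stand-in model).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def semantic_masks(image_rgb, score_thresh=0.5, mask_thresh=0.5):
    """Return binary masks and class labels for objects detected in an RGB image."""
    with torch.no_grad():
        out = model([to_tensor(image_rgb)])[0]
    keep = out["scores"] > score_thresh          # drop low-confidence detections
    masks = out["masks"][keep, 0] > mask_thresh  # (N, H, W) boolean instance masks
    return masks.numpy(), out["labels"][keep].numpy()
```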
Multiple frames of continuous binocular images are acquired with a binocular camera, and the third-dimensional depth of two-dimensional image pixels can be recovered from the binocular images. The intrinsic and extrinsic parameters of the binocular camera, mainly the focal length f, the optical center (u, v), and the radial distortion coefficients kc1 and kc2 of the lens, can be obtained through calibration with the Zhang Zhengyou calibration method.
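A minimal calibration sketch follows, assuming OpenCV (whose chessboard calibration implements Zhang Zhengyou's method), a 9x6 board with 25 mm squares, and an iterable `calibration_pairs` of grayscale image pairs; all of these are illustrative assumptions, not specifics of the patent:

```python
# Sketch: binocular intrinsic/extrinsic calibration with OpenCV.
import cv2
import numpy as np

PATTERN = (9, 6)    # inner chessboard corners (assumed board layout)
SQUARE = 0.025      # square edge length in metres (assumed)

objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_pts, pts_l, pts_r, size = [], [], [], None
for left, right in calibration_pairs:      # grayscale chessboard pairs (assumed input)
    ok_l, c_l = cv2.findChessboardCorners(left, PATTERN)
    ok_r, c_r = cv2.findChessboardCorners(right, PATTERN)
    if ok_l and ok_r:
        obj_pts.append(objp); pts_l.append(c_l); pts_r.append(c_r)
        size = left.shape[::-1]            # image size as (width, height)

# Per-camera intrinsics: focal length, optical centre, distortion coefficients.
_, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, pts_l, size, None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, pts_r, size, None, None)
# Extrinsics between the two cameras: rotation R and translation T (baseline).
_, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, pts_l, pts_r, K1, d1, K2, d2, size, flags=cv2.CALIB_FIX_INTRINSIC)
```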
Feature points are extracted on each frame of binocular image and matched across adjacent frames. The specific method comprises the following steps:
extracting the feature points with the ORB (Oriented FAST and Rotated BRIEF) method;
acquiring a descriptor for each feature point on each frame of binocular image, calculating the Hamming distance between the descriptors of a feature point in two adjacent frames, and taking the two feature points with the minimum Hamming distance as a matched pair.
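A short sketch of this step, assuming OpenCV and two grayscale frames `prev_gray` and `curr_gray` (hypothetical names); `crossCheck=True` keeps only mutual nearest neighbours, which realises the minimum-Hamming-distance pairing described above:

```python
# Sketch: ORB extraction and brute-force Hamming matching between frames.
import cv2

orb = cv2.ORB_create(nfeatures=2000)                 # feature budget is illustrative
kp1, des1 = orb.detectAndCompute(prev_gray, None)    # previous frame
kp2, des2 = orb.detectAndCompute(curr_gray, None)    # current frame

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
```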
The feature points on the semantic mask are eliminated, and the camera pose is calculated from the remaining feature points. Whether a feature point is located on the semantic mask is judged as follows: the semantic mask at least comprises a bounding box of an object; if the coordinates of the feature point fall within the bounding box, the feature point is located on the semantic mask; otherwise it is not. The camera pose is calculated from the remaining feature points by solving with a PnP (Perspective-n-Point) algorithm: a reprojection error is constructed and optimized as in formula (1),

$$\xi^{*}=\arg\min_{\xi}\;\frac{1}{2}\sum_{i=1}^{n}\left\|u_{i}-\frac{1}{s_{i}}K\exp\left(\xi^{\wedge}\right)P_{i}\right\|_{2}^{2}\tag{1}$$

where $u_i$ is the observed pixel position of the i-th feature point, $P_i$ its three-dimensional position, $s_i$ its depth, $K$ the camera intrinsic matrix, and $\xi$ the camera pose in Lie-algebra form. The optimal solution obtained by minimizing the reprojection error is the required camera pose.
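The pose solve can be sketched as follows, assuming OpenCV and pre-computed arrays `pts3d`, `pts2d`, `on_mask`, `K`, and `dist_coeffs` (all hypothetical names); the RANSAC variant and 3-pixel threshold are implementation choices here, not requirements stated by the patent:

```python
# Sketch: camera pose from the remaining (off-mask) matches via PnP.
import cv2
import numpy as np

keep = ~on_mask                      # discard feature points on semantic masks
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts3d[keep].astype(np.float64),  # 3D points from the previous frame
    pts2d[keep].astype(np.float64),  # their matched pixels in the current frame
    K, dist_coeffs, reprojectionError=3.0)
R_cam, _ = cv2.Rodrigues(rvec)       # rotation matrix of the recovered pose
```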
Dynamic objects and static objects in the binocular images are separated based on the camera pose, by the following method:
Separating dynamic objects: the motion probability of the object corresponding to a semantic mask is calculated based on the camera pose and the positional relation between the semantic masks in adjacent frames of binocular images; if the motion probability is larger than a first threshold, the object corresponding to the semantic mask is judged to be a dynamic object. The specific steps, shown in Fig. 2, include:
calculating the position of the semantic mask of the previous frame corresponding to the current frame based on the camera pose;
calculating the projected three-dimensional coordinates of all feature points on the semantic mask using a disparity map, wherein the disparity map is computed from the binocular images; specifically, the disparity map can be computed with the ELAS (Efficient Large-Scale Stereo Matching) algorithm;
calculating the errors of corresponding feature points between the previous frame and the current frame in the x, y, and z directions, and taking the maximum of these errors as the error value of the feature point;
and converting the error value into the motion probability of the object corresponding to the semantic mask where the feature point is located, and judging whether the object corresponding to the semantic mask is a dynamic object or not based on the motion probability.
From the camera imaging principle, the conversion between the three-dimensional camera coordinate system and the (two-dimensional) pixel coordinate system, and between depth and disparity, is

$$u=f_{x}\frac{X}{Z}+c_{x},\qquad v=f_{y}\frac{Y}{Z}+c_{y}\tag{2}$$

$$Z=\frac{fb}{d}\tag{3}$$

where $(u,v)$ are pixel coordinates, $(X,Y,Z)$ are camera-frame coordinates, $f_x$, $f_y$, $c_x$, $c_y$ are the camera intrinsics, $b$ is the stereo baseline, and $d$ is the disparity.

Denote the coordinate set of the j-th semantic mask of frame t-1 in the pixel coordinate system as $M_{t-1}^{j}$. Through formulas (2) and (3), the three-dimensional coordinate set $P_{t-1}^{j}$ of the semantic mask at that moment is obtained. The three-dimensional point set after the camera motion is obtained through formula (4):

$$P_{t}^{j}=T_{t,t-1}P_{t-1}^{j}\tag{4}$$

where $T_{t,t-1}$ is the estimated camera pose transform from frame t-1 to frame t. Through formula (2), $P_{t}^{j}$ is converted to a set $M_{t}^{j}$ in the pixel coordinate system; then, using $M_{t}^{j}$, the disparity map of frame t, and formulas (2) and (3), the observed three-dimensional set $\tilde{P}_{t}^{j}$ is computed.

Denote $p_{i}=(x_{i},y_{i},z_{i})$ the i-th point of $P_{t}^{j}$ and $\tilde{p}_{i}=(\tilde{x}_{i},\tilde{y}_{i},\tilde{z}_{i})$ the i-th point of $\tilde{P}_{t}^{j}$; the error $\delta_{i}$ between the two points is calculated as

$$\delta_{i}=\max\left(\left|x_{i}-\tilde{x}_{i}\right|,\left|y_{i}-\tilde{y}_{i}\right|,\left|z_{i}-\tilde{z}_{i}\right|\right)\tag{5}$$

The error of the object corresponding to the mask aggregates its N point errors, e.g. as the mean:

$$\Delta_{j}=\frac{1}{N}\sum_{i=1}^{N}\delta_{i}\tag{6}$$

The motion probability $S(\Delta_{j})$ is then obtained by mapping $\Delta_{j}$ into $[0,1]$ with a monotonically increasing function, for example the logistic function:

$$S(\Delta_{j})=\frac{1}{1+e^{-\Delta_{j}}}\tag{7}$$
Dynamic object matching: for a dynamic object, the Hu moments (a set of image moments), center-point Euclidean distance, and histogram distribution of the corresponding semantic masks in adjacent frames of binocular images are calculated, and from these the probability that the dynamic objects in the adjacent frames match is computed; if the probability is larger than a second threshold, the two dynamic objects in the adjacent frames are the same object. The Hu moments of an image are image features invariant to translation, rotation, and scale.
The general moment of an image is calculated as

$$m_{pq}=\sum_{x}\sum_{y}x^{p}y^{q}I(x,y)\tag{8}$$

Computing the Hu moments requires the central moments; first the barycenter coordinates are calculated:

$$\bar{x}=\frac{m_{10}}{m_{00}},\qquad\bar{y}=\frac{m_{01}}{m_{00}}\tag{9}$$

Then the central moments are constructed:

$$\mu_{pq}=\sum_{x}\sum_{y}\left(x-\bar{x}\right)^{p}\left(y-\bar{y}\right)^{q}I(x,y)\tag{10}$$

And the central moments are normalized:

$$\eta_{pq}=\frac{\mu_{pq}}{\mu_{00}^{1+(p+q)/2}}\tag{11}$$

The Hu moments constructed from the normalized central moments comprise 7 invariant moments:

$$\begin{aligned}
\Phi_{1}&=\eta_{20}+\eta_{02}\\
\Phi_{2}&=\left(\eta_{20}-\eta_{02}\right)^{2}+4\eta_{11}^{2}\\
\Phi_{3}&=\left(\eta_{30}-3\eta_{12}\right)^{2}+\left(3\eta_{21}-\eta_{03}\right)^{2}\\
\Phi_{4}&=\left(\eta_{30}+\eta_{12}\right)^{2}+\left(\eta_{21}+\eta_{03}\right)^{2}\\
\Phi_{5}&=\left(\eta_{30}-3\eta_{12}\right)\left(\eta_{30}+\eta_{12}\right)\left[\left(\eta_{30}+\eta_{12}\right)^{2}-3\left(\eta_{21}+\eta_{03}\right)^{2}\right]+\left(3\eta_{21}-\eta_{03}\right)\left(\eta_{21}+\eta_{03}\right)\left[3\left(\eta_{30}+\eta_{12}\right)^{2}-\left(\eta_{21}+\eta_{03}\right)^{2}\right]\\
\Phi_{6}&=\left(\eta_{20}-\eta_{02}\right)\left[\left(\eta_{30}+\eta_{12}\right)^{2}-\left(\eta_{21}+\eta_{03}\right)^{2}\right]+4\eta_{11}\left(\eta_{30}+\eta_{12}\right)\left(\eta_{21}+\eta_{03}\right)\\
\Phi_{7}&=\left(3\eta_{21}-\eta_{03}\right)\left(\eta_{30}+\eta_{12}\right)\left[\left(\eta_{30}+\eta_{12}\right)^{2}-3\left(\eta_{21}+\eta_{03}\right)^{2}\right]+\left(3\eta_{12}-\eta_{30}\right)\left(\eta_{21}+\eta_{03}\right)\left[3\left(\eta_{30}+\eta_{12}\right)^{2}-\left(\eta_{21}+\eta_{03}\right)^{2}\right]
\end{aligned}\tag{12}$$
Denote $\Phi^{t-1,j}$ the Hu moment vector of the j-th semantic mask of frame t-1; the Hu-moment distance between two semantic masks in adjacent frames is then, for example, the L1 distance

$$D_{hu}=\sum_{m=1}^{7}\left|\Phi_{m}^{t-1,j}-\Phi_{m}^{t,k}\right|\tag{13}$$
The center position of each semantic mask is calculated, and the Euclidean distance between the center positions of the corresponding masks in the previous and current frames is recorded as $D_{c}$ (14).
The histogram distribution of each semantic mask is calculated and normalized, recorded as $H$; then the KL divergence (Kullback-Leibler divergence, also called relative entropy) between the corresponding semantic masks of the previous and current frames, $D_{KL}$, is calculated.
Combining the Hu-moment distance, the Euclidean distance, and the histogram divergence, the matching probability is estimated, e.g. as a weighted combination:

$$P_{match}=w_{1}e^{-D_{hu}}+w_{2}e^{-D_{c}}+w_{3}e^{-D_{KL}}\tag{15}$$
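A sketch of these three cues follows, assuming OpenCV, boolean instance masks, grayscale uint8 images, and equal weights w1 = w2 = w3 (the actual weights are not disclosed); the L1 Hu-moment distance and the exponential combination mirror the hedged formulas (13) and (15):

```python
# Sketch: mask-matching cues (Hu moments, centroid distance, histogram KL).
import cv2
import numpy as np

def match_probability(mask_a, mask_b, img_a, img_b):
    # Hu-moment distance, eq. (13) (L1 form assumed).
    hu_a = cv2.HuMoments(cv2.moments(mask_a.astype(np.uint8))).ravel()
    hu_b = cv2.HuMoments(cv2.moments(mask_b.astype(np.uint8))).ravel()
    d_hu = np.abs(hu_a - hu_b).sum()

    # Centroid Euclidean distance, eq. (14).
    (ya, xa), (yb, xb) = [np.argwhere(m).mean(axis=0) for m in (mask_a, mask_b)]
    d_c = np.hypot(xa - xb, ya - yb)

    # Normalized grayscale histograms over each mask, then KL divergence.
    h_a = cv2.calcHist([img_a], [0], mask_a.astype(np.uint8), [32], [0, 256]).ravel()
    h_b = cv2.calcHist([img_b], [0], mask_b.astype(np.uint8), [32], [0, 256]).ravel()
    h_a, h_b = h_a / h_a.sum(), h_b / h_b.sum()
    d_kl = np.sum(h_a * np.log((h_a + 1e-9) / (h_b + 1e-9)))

    w1 = w2 = w3 = 1.0 / 3.0                      # assumed weights, eq. (15)
    return w1*np.exp(-d_hu) + w2*np.exp(-d_c) + w3*np.exp(-d_kl)
```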
The motion parameters of a dynamic object are estimated based on the separated dynamic objects, i.e., dynamic object motion estimation: the association of dynamic objects between consecutive frames is completed through the matching above, and the motion parameters of the dynamic objects are estimated with a PnP algorithm.
The camera pose is then recalculated based on the separated static objects: the feature points on the semantic masks corresponding to dynamic objects are eliminated, and the camera pose is updated with a PnP algorithm from the remaining feature points; the specific calculation follows the method used when the camera pose was first computed.
Based on the updated camera pose and the feature points on the static object, constructing a static map, wherein the specific method comprises the following steps:
determining a plurality of key frames based on the updated camera pose and feature points located on the static object;
matching the feature points across the key frames: the matched feature points are triangulated, and points not yet matched are matched against unmatched feature points in other key frames until all possible matches are found; the remaining unmatched feature points are eliminated;
checking whether the matched feature points satisfy the epipolar geometric constraint, and eliminating those that do not;
checking whether the remaining feature points have positive depth, sufficient parallax, and consistent reprojection error and scale, eliminating inconsistent feature points, and generating map points from the remaining feature points;
and constructing the static map based on the map points.
Preferably, before the static map is constructed, the method further comprises the step of optimizing the generated map points through bundle adjustment (BA).
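A sketch of map-point creation from two keyframes follows, assuming OpenCV and 3x4 projection matrices `P1`, `P2` of the form K[R|t] (hypothetical names); the positive-depth (cheirality) and reprojection-error checks correspond to the consistency checks listed above, with an illustrative 2-pixel threshold:

```python
# Sketch: triangulating matched keyframe points into candidate map points.
import cv2
import numpy as np

def make_map_points(P1, P2, pts1, pts2, max_reproj=2.0):
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous points
    X = (X_h[:3] / X_h[3]).T                             # Nx3 Euclidean points

    def reproject(P, X):
        x = P @ np.hstack([X, np.ones((len(X), 1))]).T   # project into the keyframe
        return (x[:2] / x[2]).T, x[2]                    # pixels and depths

    uv1, z1 = reproject(P1, X)
    uv2, z2 = reproject(P2, X)
    err = np.maximum(np.linalg.norm(uv1 - pts1, axis=1),
                     np.linalg.norm(uv2 - pts2, axis=1))
    good = (z1 > 0) & (z2 > 0) & (err < max_reproj)      # cheirality + reprojection
    return X[good]                                       # surviving map points
```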
By processing the binocular images, the method identifies the dynamic objects in them, estimates the poses of the camera and of the dynamic objects, and constructs an environment map, meeting the mobile robot's requirements for a three-dimensional map.
In summary, the dynamic environment binocular vision SLAM method based on semantic segmentation provided by the invention comprises the following steps: acquiring a semantic mask of an object, wherein the semantic mask is generated by a deep learning network; acquiring multiple frames of continuous binocular images with a binocular camera; extracting feature points on each frame of binocular image and matching the feature points across adjacent frames; removing the feature points on the semantic mask and calculating the camera pose from the remaining feature points; separating dynamic objects and static objects in the binocular images based on the camera pose; estimating motion parameters of the dynamic objects based on the separated dynamic objects; recalculating the camera pose based on the separated static objects; and constructing a static map based on the updated camera pose and the feature points on the static objects. The method uses a binocular camera and, guided by semantically segmented images, can distinguish the dynamic and static objects in a scene and build a map; it is simple to operate, low in cost, and applicable to most practical scenes.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (7)
1. A dynamic environment binocular vision SLAM method based on semantic segmentation is characterized by comprising the following steps:
acquiring a semantic mask of an object, wherein the semantic mask is generated through a deep learning network;
acquiring multiple frames of continuous binocular images with a binocular camera;
extracting feature points on each frame of binocular image, and matching the feature points on adjacent frames of binocular images;
removing the feature points on the semantic mask, and calculating the camera pose according to the remaining feature points;
separating a dynamic object and a static object on the binocular image based on the camera pose;
estimating motion parameters of the dynamic object based on the separated dynamic object;
recalculating the camera pose based on the separated static object;
constructing a static map based on the updated camera pose and the feature points on the static object;
the separating of a dynamic object and a static object on the binocular image based on the camera pose, and the estimating of motion parameters of the dynamic object based on the separated dynamic object, comprise the following steps:
separating dynamic objects: calculating the motion probability of the object corresponding to the semantic mask based on the camera pose and the positional relation between the semantic masks in adjacent frames of binocular images, and judging the object corresponding to the semantic mask as a dynamic object if the motion probability is larger than a first threshold;
dynamic object matching: for the dynamic object, calculating the Hu moments, center-point Euclidean distance, and histogram distribution of the corresponding semantic masks in adjacent frames of binocular images, and calculating, based on these, the probability that the dynamic objects in the adjacent frames match, wherein if the probability is larger than a second threshold, the two dynamic objects in the adjacent frames of binocular images are the same object; and
dynamic object motion estimation: completing the association of dynamic objects between consecutive frames through the matching of the dynamic objects, and estimating the motion parameters of the dynamic objects through a PnP algorithm;
the method for recalculating the camera pose based on the separated static object comprises the following steps: removing the feature points on the semantic mask corresponding to the dynamic object, and updating the camera pose with a PnP algorithm according to the remaining feature points;
the method for constructing the static map based on the updated camera pose and the feature points on the static object comprises the following steps:
determining a plurality of key frames based on the updated camera pose and feature points located on the static object;
matching the feature points across the key frames, and eliminating unmatched feature points;
checking whether the matched feature points satisfy the epipolar geometric constraint, and eliminating those that do not;
checking whether the remaining feature points have positive depth, sufficient parallax, and consistent reprojection error and scale, eliminating inconsistent feature points, and generating map points based on the remaining feature points;
and constructing the static map based on the map points.
2. The semantic segmentation based dynamic environment binocular vision SLAM method of claim 1, wherein the deep learning network used to generate the semantic Mask is a Mask R-CNN model.
3. The dynamic environment binocular vision SLAM method based on semantic segmentation of claim 1, wherein the extracting feature points on the binocular image of each frame and matching feature points on the binocular image of the adjacent frame comprises:
extracting the feature points with an ORB method;
acquiring a descriptor for each feature point on each frame of binocular image, calculating the Hamming distance between the descriptors of a feature point in two adjacent frames of binocular images, and taking the two feature points with the minimum Hamming distance as a matched pair.
4. The dynamic environment binocular vision SLAM method based on semantic segmentation of claim 1, wherein the method of judging whether the feature points are located on the semantic mask comprises: the semantic mask at least comprises a bounding box of an object, and if the coordinates of a feature point fall within the bounding box, the feature point is located on the semantic mask.
5. The dynamic environment binocular vision SLAM method based on semantic segmentation according to claim 1, wherein the method of calculating the camera pose according to the remaining feature points comprises: solving the camera pose with a PnP algorithm.
6. The semantic segmentation-based dynamic environment binocular vision SLAM method of claim 1, wherein the step of separating the dynamic objects comprises:
calculating the position of the semantic mask of the previous frame corresponding to the current frame based on the camera pose;
calculating the projected three-dimensional coordinates of all feature points on the semantic mask using a disparity map, wherein the disparity map is computed from the binocular images;
calculating the errors of corresponding feature points between the previous frame and the current frame in the x, y, and z directions, and taking the maximum of these errors as the error value of the feature point;
and converting the error value into the motion probability of the object corresponding to the semantic mask where the feature point is located, and judging whether the object corresponding to the semantic mask is a dynamic object or not based on the motion probability.
7. The semantic segmentation-based dynamic environment binocular vision SLAM method of claim 1, further comprising the step of optimizing the generated map points by bundle adjustment before constructing the static map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111373890.7A CN114140527B (en) | 2021-11-19 | 2021-11-19 | Dynamic environment binocular vision SLAM method based on semantic segmentation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111373890.7A CN114140527B (en) | 2021-11-19 | 2021-11-19 | Dynamic environment binocular vision SLAM method based on semantic segmentation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114140527A CN114140527A (en) | 2022-03-04 |
CN114140527B true CN114140527B (en) | 2024-09-10 |
Family
ID=80390414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111373890.7A Active CN114140527B (en) | 2021-11-19 | 2021-11-19 | Dynamic environment binocular vision SLAM method based on semantic segmentation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114140527B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115049949B (en) * | 2022-04-29 | 2024-09-24 | 哈尔滨工程大学 | Object expression method based on binocular vision |
CN116524026B (en) * | 2023-05-08 | 2023-10-27 | 哈尔滨理工大学 | Dynamic vision SLAM method based on frequency domain and semantics |
CN116883586B (en) * | 2023-06-14 | 2024-08-23 | 重庆大学 | Terrain semantic map construction method, system and product based on binocular camera |
CN116958265A (en) * | 2023-09-19 | 2023-10-27 | 交通运输部天津水运工程科学研究所 | Ship pose measurement method and system based on binocular vision |
CN117788730B (en) * | 2023-12-08 | 2024-10-15 | 中交机电工程局有限公司 | Semantic point cloud map construction method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110349213B (en) * | 2019-06-28 | 2023-12-12 | Oppo广东移动通信有限公司 | Pose determining method and device based on depth information, medium and electronic equipment |
CN113516664B (en) * | 2021-09-02 | 2024-07-26 | 长春工业大学 | Visual SLAM method based on semantic segmentation dynamic points |
Non-Patent Citations (1)
Title |
---|
Dynamic environment visual SLAM algorithm based on joint geometric-semantic constraints; Shen Yehu et al.; Journal of Data Acquisition and Processing; 2022-05-31; Vol. 37, No. 3; full text *
Also Published As
Publication number | Publication date |
---|---|
CN114140527A (en) | 2022-03-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114140527B (en) | Dynamic environment binocular vision SLAM method based on semantic segmentation | |
CN111462135B (en) | Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation | |
CN112785702B (en) | SLAM method based on tight coupling of 2D laser radar and binocular camera | |
CN110097553B (en) | Semantic mapping system based on instant positioning mapping and three-dimensional semantic segmentation | |
CN110070615B (en) | Multi-camera cooperation-based panoramic vision SLAM method | |
CN109345588B (en) | Tag-based six-degree-of-freedom attitude estimation method | |
CN111201451A (en) | Method and device for detecting object in scene based on laser data and radar data of scene | |
CN110688905B (en) | Three-dimensional object detection and tracking method based on key frame | |
CN108537844B (en) | Visual SLAM loop detection method fusing geometric information | |
US20220051425A1 (en) | Scale-aware monocular localization and mapping | |
JP6782903B2 (en) | Self-motion estimation system, control method and program of self-motion estimation system | |
CN111882602B (en) | Visual odometer implementation method based on ORB feature points and GMS matching filter | |
CN112419497A (en) | Monocular vision-based SLAM method combining feature method and direct method | |
Shi et al. | An improved lightweight deep neural network with knowledge distillation for local feature extraction and visual localization using images and LiDAR point clouds | |
CN110070578B (en) | Loop detection method | |
CN114549542A (en) | Visual semantic segmentation method, device and equipment | |
CN111899345B (en) | Three-dimensional reconstruction method based on 2D visual image | |
CN116468786B (en) | Semantic SLAM method based on point-line combination and oriented to dynamic environment | |
CN117974786A (en) | Multi-vision-based dynamic environment reconstruction and measurement method and system | |
Gan et al. | A dynamic detection method to improve SLAM performance | |
CN114648639B (en) | Target vehicle detection method, system and device | |
CN117409386A (en) | Garbage positioning method based on laser vision fusion | |
CN115880428A (en) | Animal detection data processing method, device and equipment based on three-dimensional technology | |
CN114419259A (en) | Visual positioning method and system based on physical model imaging simulation | |
Zhou et al. | 2D Grid map for navigation based on LCSD-SLAM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||