CN114140527B - Dynamic environment binocular vision SLAM method based on semantic segmentation - Google Patents

Dynamic environment binocular vision SLAM method based on semantic segmentation

Info

Publication number
CN114140527B
CN114140527B (application CN202111373890.7A)
Authority
CN
China
Prior art keywords
dynamic
binocular
feature points
semantic
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111373890.7A
Other languages
Chinese (zh)
Other versions
CN114140527A (en)
Inventor
沈晔湖
李星
卢金斌
王其聪
赵冲
蒋全胜
朱其新
谢鸥
牛福洲
牛雪梅
付贵忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University of Science and Technology
Original Assignee
Suzhou University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University of Science and Technology
Priority to CN202111373890.7A
Publication of CN114140527A
Application granted
Publication of CN114140527B
Legal status: Active (anticipated expiration not listed)

Links

Classifications

    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G01C 21/3841: Creation or updating of map data; data obtained from two or more sources, e.g. probe vehicles
    • G06F 18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06T 7/215: Analysis of motion; motion-based segmentation
    • G06T 7/85: Stereo camera calibration
    • G06T 2207/10012: Stereo images
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30244: Camera pose
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Automation & Control Theory (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a dynamic environment binocular vision SLAM method based on semantic segmentation, which comprises the following steps: acquiring a semantic mask of an object, wherein the semantic mask is generated by a deep learning network; acquiring multiple frames of consecutive binocular images with a binocular camera; extracting feature points from each frame of binocular images and matching feature points between binocular images of adjacent frames; removing the feature points located on the semantic mask and calculating the camera pose from the remaining feature points; separating dynamic objects and static objects in the binocular images based on the camera pose; recalculating the camera pose based on the separated static objects; and constructing a static map based on the updated camera pose and the feature points on the static objects. The method uses a binocular camera and takes semantically segmented images as guidance; it can identify dynamic and static objects in the scene and construct a map, is simple to operate and low in cost, and can be applied to most practical scenes.

Description

Dynamic environment binocular vision SLAM method based on semantic segmentation
Technical Field
The invention relates to the technical field of visual space positioning, in particular to a dynamic environment binocular vision SLAM method based on semantic segmentation.
Background
With the development of computer technology and artificial intelligence, intelligent autonomous mobile robots have become an important research direction and hotspot in the robotics field. As mobile robots become more intelligent, the requirements on their self-localization and on the environment map grow accordingly. At present, intelligent mobile robots can complete self-localization and mapping in known environments in some practical applications, but many challenges remain in unknown environments. The technique for accomplishing localization and mapping in such an environment is called SLAM (Simultaneous Localization and Mapping); its goal is to enable the robot to localize itself and incrementally build a map while moving through an unknown environment.
Traditional SLAM algorithms rely mainly on distance sensors with good stability, such as lidar. However, the range data obtained by lidar are very sparse, so the environment map constructed by SLAM contains only a very small number of landmark points. Such a map can only be used to improve the positioning accuracy of the robot and cannot be used in other areas of robot navigation such as path planning. Furthermore, the high price, large volume, weight, and power consumption of lidar limit its application in certain fields. A camera can, to some extent, overcome the disadvantages of lidar in price, volume, mass, and power consumption while acquiring rich information, but it also has problems such as sensitivity to illumination changes and high computational complexity. Multi-sensor fusion SLAM algorithms have also been proposed; they can effectively mitigate the problems caused by the shortcomings of a single sensor, but they further increase the cost and the complexity of the algorithm.
Existing visual SLAM algorithms are mostly based on the static-environment assumption, i.e., the scene is static and contains no objects in relative motion. However, actual outdoor scenes contain a large number of dynamic objects such as pedestrians and vehicles, which limits the use of SLAM systems based on this assumption in practical scenes. To address the reduced positioning accuracy and stability of visual SLAM in dynamic environments, existing algorithms use methods based on probability statistics or geometric constraints to reduce the influence of dynamic objects. For example, when there are only a few dynamic objects in the scene, they can be rejected using probabilistic algorithms such as RANSAC (Random Sample Consensus). But when a large number of dynamic objects appear in the scene, such algorithms can no longer distinguish them reliably. Other algorithms use optical flow to distinguish dynamic objects; in a scene with many dynamic objects, optical flow can indeed separate them, but computing dense optical flow is time-consuming and reduces the execution efficiency of the SLAM algorithm.
Therefore, how to provide a dynamic environment binocular vision SLAM method based on semantic segmentation that is simple to operate, low in cost, and applicable to most practical scenes is a technical problem to be urgently solved by those skilled in the art.
Disclosure of Invention
The invention provides a dynamic environment binocular vision SLAM method based on semantic segmentation to solve the above technical problems.
In order to solve the above technical problems, the invention provides a dynamic environment binocular vision SLAM method based on semantic segmentation, comprising the following steps:
acquiring a semantic mask of an object, wherein the semantic mask is generated by a deep learning network;
acquiring multiple frames of consecutive binocular images with a binocular camera;
extracting feature points from each frame of binocular images and matching feature points between binocular images of adjacent frames;
removing the feature points located on the semantic mask and calculating the camera pose from the remaining feature points;
separating dynamic objects and static objects in the binocular images based on the camera pose;
estimating motion parameters of the dynamic objects based on the separated dynamic objects;
recalculating the camera pose based on the separated static objects;
and constructing a static map based on the updated camera pose and the feature points on the static objects.
Preferably, the deep learning network for generating the semantic Mask is a Mask R-CNN model.
Preferably, the method for extracting feature points from each frame of binocular images and matching feature points between binocular images of adjacent frames comprises the following steps:
extracting the feature points using the ORB method;
acquiring a descriptor for each feature point on each frame of binocular images, computing the Hamming distance between the descriptors of feature points in two adjacent frames of binocular images, and taking the two feature points with the minimum Hamming distance as a group of matched feature points.
Preferably, the method for judging whether a feature point is located on the semantic mask comprises: the semantic mask at least includes a bounding box of an object; if the coordinates of the feature point lie within the bounding box, the feature point is located on the semantic mask.
Preferably, the method for calculating the camera pose from the remaining feature points comprises: solving the camera pose with a PnP algorithm.
Preferably, separating the dynamic objects and the static objects in the binocular images based on the camera pose, and estimating the motion parameters of the dynamic objects based on the separated dynamic objects, comprise the following steps:
separating dynamic objects: calculating the motion probability of the object corresponding to a semantic mask based on the camera pose and the positional relation between the binocular images of adjacent frames and the semantic mask, and judging the object corresponding to the semantic mask to be a dynamic object if the motion probability is larger than a first threshold;
dynamic object matching: for a dynamic object, calculating the Hu moments, the Euclidean distance between center points, and the histogram distributions of the semantic masks corresponding to the dynamic object in the binocular images of adjacent frames, and calculating the probability that the dynamic objects in the adjacent frames match based on the Hu moments, the center-point Euclidean distance, and the histogram distributions, wherein if the probability is larger than a second threshold, the two dynamic objects in the adjacent frames are the same object; and
dynamic object motion estimation: completing the association of dynamic objects between consecutive frames through dynamic object matching, and estimating the motion parameters of the dynamic objects with a PnP algorithm.
Preferably, the step of separating the dynamic objects comprises:
calculating the position in the current frame corresponding to the semantic mask of the previous frame based on the camera pose;
calculating the projected three-dimensional coordinates of all feature points on the semantic mask using a disparity map, wherein the disparity map is computed from the binocular images;
calculating the errors of the corresponding feature points of the previous frame and the current frame in the x, y, and z directions, and taking the maximum of these errors as the error value of the feature point;
and converting the error value into the motion probability of the object corresponding to the semantic mask where the feature point is located, and judging whether the object corresponding to the semantic mask is a dynamic object based on the motion probability.
Preferably, the method for recalculating the camera pose based on the separated static objects comprises: removing the feature points on the semantic masks corresponding to the dynamic objects, and updating the camera pose with a PnP algorithm using the remaining feature points.
Preferably, the method for constructing the static map based on the updated camera pose and the feature points on the static objects comprises the following steps:
determining a plurality of key frames based on the updated camera pose and the feature points located on the static objects;
matching the feature points on the key frames and removing unmatched feature points;
checking whether the matched feature points satisfy the epipolar geometric constraint and removing those that do not;
checking whether the remaining feature points have positive depth and consistent parallax, reprojection error, and scale, removing inconsistent feature points, and generating map points from the remaining feature points;
and constructing the static map based on the map points.
Preferably, before the static map is constructed, the method further comprises the step of optimizing the generated map points through bundle adjustment.
Compared with the prior art, the dynamic environment binocular vision SLAM method based on semantic segmentation provided by the invention uses a binocular camera and takes semantically segmented images as guidance; it can identify dynamic and static objects in the scene and construct a map, is simple to operate and low in cost, and can be applied to most practical scenes.
Drawings
FIG. 1 is a flow chart of a dynamic environment binocular vision SLAM method based on semantic segmentation in an embodiment of the present invention;
FIG. 2 is a flow chart of separating dynamic objects according to an embodiment of the invention.
Detailed Description
In order to describe the technical solution of the invention in more detail, the following specific embodiments are given to demonstrate the technical effects; it is emphasized that these embodiments are illustrative and do not limit the scope of the invention.
The dynamic environment binocular vision SLAM method based on semantic segmentation provided by the invention, as shown in FIG. 1, comprises the following steps:
A semantic mask of an object is obtained; the semantic mask is generated by a deep learning network. In this embodiment, the deep learning network for generating the semantic mask is a Mask R-CNN model, which achieves high-quality semantic segmentation.
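For reference, the following minimal sketch shows how such a semantic mask could be generated with the pretrained Mask R-CNN available in torchvision; the score and mask thresholds and the use of COCO classes are assumptions made for illustration, and the embodiment only requires that a Mask R-CNN model produce per-object masks.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor

    # Pretrained Mask R-CNN (COCO classes); older torchvision releases use pretrained=True instead.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

    def semantic_masks(image_rgb, score_thresh=0.7, mask_thresh=0.5):
        # Returns a list of (label, binary_mask, box) for detected objects.
        with torch.no_grad():
            out = model([to_tensor(image_rgb)])[0]
        results = []
        for label, score, mask, box in zip(out["labels"], out["scores"],
                                           out["masks"], out["boxes"]):
            if score < score_thresh:
                continue
            binary = (mask[0] > mask_thresh).numpy()   # H x W boolean semantic mask
            results.append((int(label), binary, box.numpy()))
        return results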
Multiple frames of consecutive binocular images are acquired with a binocular camera, and the third-dimensional depth information of the two-dimensional image pixels can be recovered from the binocular images. The intrinsic and extrinsic parameters of the binocular camera mainly include the focal length f, the optical center (u, v), and the radial distortion coefficients kc1 and kc2 of the lens; these parameters can be obtained through calibration with Zhang Zhengyou's calibration method.
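A minimal calibration sketch is given below, assuming grayscale chessboard views and an assumed board geometry; it follows the usual OpenCV implementation of Zhang's method and returns the intrinsic matrix (focal length and optical center) and the distortion coefficients (including kc1 and kc2). The binocular extrinsics would be obtained analogously with cv2.stereoCalibrate.

    import cv2
    import numpy as np

    def calibrate(chessboard_images, board=(9, 6), square=0.025):
        # 3D chessboard corner coordinates in the board plane (z = 0), spaced by the square size.
        objp = np.zeros((board[0] * board[1], 3), np.float32)
        objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
        obj_pts, img_pts, size = [], [], None
        for gray in chessboard_images:                  # grayscale chessboard views
            found, corners = cv2.findChessboardCorners(gray, board)
            if found:
                obj_pts.append(objp)
                img_pts.append(corners)
                size = gray.shape[::-1]
        # K holds the focal length and optical center (u, v); dist holds kc1, kc2, ...
        rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
        return K, dist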
Feature points are extracted from each frame of binocular images, and the feature points in binocular images of adjacent frames are matched. The specific method comprises the following steps (an illustrative sketch follows these steps):
extracting the feature points using the ORB (Oriented FAST and Rotated BRIEF) method;
acquiring a descriptor for each feature point on each frame of binocular images, computing the Hamming distance between the descriptors of feature points in two adjacent frames of binocular images, and taking the two feature points with the minimum Hamming distance as a group of matched feature points.
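An illustrative sketch of this step, assuming rectified grayscale frames and OpenCV's ORB implementation, is the following; the brute-force matcher with cross-check keeps only mutual nearest neighbours under the Hamming distance, and the feature budget is an assumed value.

    import cv2

    orb = cv2.ORB_create(nfeatures=2000)    # assumed feature budget

    def match_frames(prev_gray, curr_gray):
        kp1, des1 = orb.detectAndCompute(prev_gray, None)
        kp2, des2 = orb.detectAndCompute(curr_gray, None)
        # Hamming distance on binary ORB descriptors; cross-check enforces mutual best matches.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
        return kp1, kp2, matches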
The feature points located on the semantic mask are removed, and the camera pose is calculated from the remaining feature points. The method for judging whether a feature point is located on the semantic mask is as follows: the semantic mask at least includes a bounding box of an object; if the coordinates of the feature point lie within the bounding box, the feature point is located on the semantic mask; otherwise, the feature point is not located on the semantic mask. The method for calculating the camera pose from the remaining feature points is as follows: the camera pose is solved with a PnP (Perspective-n-Point) algorithm by constructing and optimizing the reprojection error of formula (1):

T* = argmin_T (1/2) Σ_i || u_i − (1/s_i) K T P_i ||²   (1)

where u_i is the pixel coordinate of the i-th matched feature point, P_i is its corresponding three-dimensional point, s_i is its depth, K is the camera intrinsic matrix, and T is the camera pose. The optimal solution obtained by minimizing the reprojection error is the required camera pose.
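A minimal sketch of this step is given below, assuming that the semantic masks are boolean images, that the three-dimensional points of the previous frame are indexed by keypoint index, and that OpenCV's RANSAC-based PnP solver (which internally minimizes the reprojection error) stands in for the optimization of formula (1); all variable names are illustrative.

    import cv2
    import numpy as np

    def pose_from_static_points(matches, kp_prev, kp_curr, pts3d_prev, masks, K, dist):
        obj, img = [], []
        for m in matches:
            u, v = kp_prev[m.queryIdx].pt
            if any(mask[int(v), int(u)] for mask in masks):   # feature point lies on a semantic mask
                continue                                      # -> removed before pose estimation
            obj.append(pts3d_prev[m.queryIdx])
            img.append(kp_curr[m.trainIdx].pt)
        obj = np.asarray(obj, np.float32)
        img = np.asarray(img, np.float32)
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, K, dist)
        return rvec, tvec                                     # camera pose from the remaining points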
Dynamic objects and static objects in the binocular images are separated based on the camera pose. The method comprises the following steps:
Separating dynamic objects: the motion probability of the object corresponding to a semantic mask is calculated based on the camera pose and the positional relation between the binocular images of adjacent frames and the semantic mask; if the motion probability is larger than a first threshold, the object corresponding to the semantic mask is judged to be a dynamic object. The specific steps, shown in FIG. 2, include:
calculating the position in the current frame corresponding to the semantic mask of the previous frame based on the camera pose;
calculating the projected three-dimensional coordinates of all feature points on the semantic mask using a disparity map, wherein the disparity map is computed from the binocular images; specifically, the disparity map can be computed with the ELAS (Efficient Large-Scale Stereo Matching) algorithm;
calculating the errors of the corresponding feature points of the previous frame and the current frame in the x, y, and z directions, and taking the maximum of these errors as the error value of the feature point;
and converting the error value into the motion probability of the object corresponding to the semantic mask where the feature point is located, and judging whether the object corresponding to the semantic mask is a dynamic object based on the motion probability.
From the camera imaging principle, the conversion between the three-dimensional camera coordinate system and the pixel (two-dimensional) coordinate system, and the conversion between depth and disparity, are given by formulas (2) and (3):

z · [u, v, 1]ᵀ = K · [x, y, z]ᵀ   (2)

z = f · b / d   (3)

where K is the camera intrinsic matrix, f is the focal length, b is the baseline of the binocular camera, and d is the disparity.

Denote the coordinate set of the j-th semantic mask of frame t−1 in the pixel coordinate system as M_{t−1}^j. The three-dimensional coordinate set Q_{t−1}^j of this semantic mask is obtained through formulas (2) and (3). The three-dimensional point set Q_t^j after the camera motion is obtained through formula (4):

Q_t^j = T_{t−1,t} · Q_{t−1}^j   (4)

where T_{t−1,t} is the camera pose transformation from frame t−1 to frame t. Q_t^j is converted by formula (2) into the set M_t^j in the pixel coordinate system; then, using M_t^j and the disparity map of the current frame, the measured three-dimensional point set Q'_t^j is obtained through formulas (2) and (3).

Denote p_i as the i-th point of Q_t^j and p'_i as the i-th point of Q'_t^j. The error Δ_i between the two points is the maximum absolute difference over the three axes:

Δ_i = max( |x_i − x'_i|, |y_i − y'_i|, |z_i − z'_i| )

The error Δ^j of the object corresponding to the j-th semantic mask is obtained from the error values Δ_i of the feature points on the mask, and the motion probability S(Δ^j) is obtained by mapping Δ^j into [0, 1]; if S(Δ^j) is larger than the first threshold, the object is judged to be dynamic.
Dynamic object matching: for a dynamic object, the Hu moments (i.e., image moments), the Euclidean distance between center points, and the histogram distributions of the semantic masks corresponding to the dynamic object in the binocular images of adjacent frames are calculated, and the probability that the dynamic objects in the adjacent frames match is calculated from the Hu moments, the center-point Euclidean distance, and the histogram distributions; if this probability is larger than a second threshold, the two dynamic objects in the adjacent frames are the same object. In particular, the Hu moments of an image are image features that are invariant to translation, rotation, and scale.
The general moment of an image is computed as:

m_pq = Σ_x Σ_y x^p · y^q · I(x, y)

Computing the Hu moments requires the central moments; the barycenter coordinates are computed first:

x̄ = m_10 / m_00,  ȳ = m_01 / m_00

Then the central moments are constructed:

μ_pq = Σ_x Σ_y (x − x̄)^p · (y − ȳ)^q · I(x, y)

and the central moments are normalized:

η_pq = μ_pq / μ_00^((p+q)/2 + 1)
The Hu moments are constructed from the normalized central moments; there are seven invariant moments, given by formula (12):

Φ1 = η20 + η02
Φ2 = (η20 − η02)² + 4η11²
Φ3 = (η30 − 3η12)² + (3η21 − η03)²
Φ4 = (η30 + η12)² + (η21 + η03)²
Φ5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]
Φ6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03)
Φ7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] − (η30 − 3η12)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]   (12)
Denote Φ_{t−1}^j as the Hu moment vector of the j-th semantic mask of frame t−1; the distance between the Hu moments of two semantic masks in adjacent frames is then computed.
The center position of each semantic mask is calculated, and the Euclidean distance between the center positions of corresponding semantic masks in the two frames is computed and denoted d_c.
The histogram distribution of each semantic mask is calculated and normalized, denoted H; the KL divergence (Kullback-Leibler divergence, also called relative entropy) between the histograms of corresponding semantic masks in the two frames is then computed:

D_KL(H_{t−1} || H_t) = Σ_k H_{t−1}(k) · log( H_{t−1}(k) / H_t(k) )

Combining the Hu moment distance, the center-point Euclidean distance, and the histogram KL divergence, the matching probability is estimated; if it is larger than the second threshold, the two dynamic objects in the adjacent frames are judged to be the same object.
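A sketch of the three matching cues is shown below, assuming grayscale images, boolean masks, and an assumed weighted fusion of the cues into a single score; the actual fusion used in the embodiment is not reproduced here.

    import cv2
    import numpy as np

    def match_probability(mask_prev, mask_curr, gray_prev, gray_curr, w=(0.4, 0.3, 0.3)):
        # Hu-moment distance between the two masks.
        hu1 = cv2.HuMoments(cv2.moments(mask_prev.astype(np.uint8))).ravel()
        hu2 = cv2.HuMoments(cv2.moments(mask_curr.astype(np.uint8))).ravel()
        d_hu = np.abs(hu1 - hu2).sum()
        # Euclidean distance between the mask centroids.
        c1 = np.argwhere(mask_prev).mean(axis=0)
        c2 = np.argwhere(mask_curr).mean(axis=0)
        d_c = np.linalg.norm(c1 - c2)
        # KL divergence between the normalised grey-level histograms inside the masks.
        h1 = cv2.calcHist([gray_prev], [0], mask_prev.astype(np.uint8), [32], [0, 256]).ravel()
        h2 = cv2.calcHist([gray_curr], [0], mask_curr.astype(np.uint8), [32], [0, 256]).ravel()
        h1, h2 = h1 / h1.sum(), h2 / h2.sum()
        d_kl = np.sum(h1 * np.log((h1 + 1e-9) / (h2 + 1e-9)))
        # Assumed fusion: each cue is turned into a similarity and combined with weights w.
        return w[0] * np.exp(-d_hu) + w[1] * np.exp(-d_c / 50.0) + w[2] * np.exp(-d_kl)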
The motion parameters of the dynamic objects are estimated based on the separated dynamic objects. Dynamic object motion estimation: the association of dynamic objects between consecutive frames is completed through dynamic object matching, and the motion parameters of the dynamic objects are estimated with a PnP algorithm.
The camera pose is recalculated based on the separated static objects as follows: the feature points on the semantic masks corresponding to the dynamic objects are removed, and the camera pose is updated with a PnP algorithm using the remaining feature points; the specific calculation follows the method used for the first camera pose computation.
A static map is constructed based on the updated camera pose and the feature points on the static objects. The specific method comprises the following steps (an illustrative sketch follows the list):
determining a plurality of key frames based on the updated camera pose and the feature points located on the static objects;
matching the feature points on the key frames and triangulating the matched feature points; for the unmatched points, searching for matches among the unmatched feature points of other key frames until all matchable feature points are found, and removing the feature points that remain unmatched;
checking whether the matched feature points satisfy the epipolar geometric constraint and removing those that do not;
checking whether the remaining feature points have positive depth and consistent parallax, reprojection error, and scale, removing inconsistent feature points, and generating map points from the remaining feature points;
and constructing the static map based on the map points.
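A minimal sketch of the map-point checks between two keyframes is given below, assuming 3x4 projection matrices P1 and P2 that already include the intrinsics, matched static feature points given as float pixel arrays, and assumed parallax and reprojection-error thresholds.

    import cv2
    import numpy as np

    def create_map_points(P1, P2, pts1, pts2, min_parallax_deg=1.0, max_err=2.0):
        X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)   # 4 x N homogeneous points
        X = (X[:3] / X[3]).T
        C1 = -np.linalg.inv(P1[:, :3]) @ P1[:, 3]           # camera centres
        C2 = -np.linalg.inv(P2[:, :3]) @ P2[:, 3]
        keep = []
        for x, u1 in zip(X, pts1):
            z1 = (P1 @ np.append(x, 1.0))[2]
            z2 = (P2 @ np.append(x, 1.0))[2]
            if z1 <= 0 or z2 <= 0:                          # positive depth in both views
                continue
            r1, r2 = x - C1, x - C2                         # viewing rays for the parallax check
            cosp = r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2))
            if np.degrees(np.arccos(np.clip(cosp, -1.0, 1.0))) < min_parallax_deg:
                continue
            uvw = P1 @ np.append(x, 1.0)                    # reprojection error in the first view
            if np.linalg.norm(uvw[:2] / uvw[2] - u1) < max_err:
                keep.append(x)
        return np.asarray(keep)                             # candidate map points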
Preferably, before the static map is constructed, the method further comprises the step of optimizing the generated map points through bundle adjustment (BA).
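A small-scale sketch of such an optimization is shown below, using scipy's nonlinear least-squares solver over angle-axis keyframe poses and map points; the parameterisation and data layout are assumptions, and a dedicated BA library (e.g. g2o or Ceres) would normally be used instead.

    import cv2
    import numpy as np
    from scipy.optimize import least_squares

    def ba_residuals(params, n_cams, n_pts, K, cam_idx, pt_idx, observed_uv):
        poses = params[:n_cams * 6].reshape(n_cams, 6)        # rvec (3) + tvec (3) per keyframe
        points = params[n_cams * 6:].reshape(n_pts, 3)        # map points
        res = []
        for c, p, uv in zip(cam_idx, pt_idx, observed_uv):
            proj, _ = cv2.projectPoints(points[p].reshape(1, 3),
                                        poses[c, :3], poses[c, 3:], K, None)
            res.append(proj.ravel() - uv)                     # 2D reprojection residual
        return np.concatenate(res)

    def bundle_adjust(poses0, points0, K, cam_idx, pt_idx, observed_uv):
        x0 = np.hstack([poses0.ravel(), points0.ravel()])
        sol = least_squares(ba_residuals, x0, method="trf",
                            args=(len(poses0), len(points0), K, cam_idx, pt_idx, observed_uv))
        n = len(poses0) * 6
        return sol.x[:n].reshape(-1, 6), sol.x[n:].reshape(-1, 3)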
With this method, the dynamic objects in the binocular images are identified by processing the binocular images, the camera pose and the poses of the dynamic objects are estimated, and an environment map is constructed, which meets the requirements of a mobile robot for a three-dimensional map.
In summary, the dynamic environment binocular vision SLAM method based on semantic segmentation provided by the invention comprises the following steps: acquiring a semantic mask of an object, wherein the semantic mask is generated by a deep learning network; acquiring multiple frames of consecutive binocular images with a binocular camera; extracting feature points from each frame of binocular images and matching feature points between binocular images of adjacent frames; removing the feature points on the semantic mask and calculating the camera pose from the remaining feature points; separating dynamic objects and static objects in the binocular images based on the camera pose; estimating the motion parameters of the dynamic objects based on the separated dynamic objects; recalculating the camera pose based on the separated static objects; and constructing a static map based on the updated camera pose and the feature points on the static objects. The method uses a binocular camera and takes semantically segmented images as guidance; it can identify dynamic and static objects in the scene and construct a map, is simple to operate and low in cost, and can be applied to most practical scenes.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (7)

1. A dynamic environment binocular vision SLAM method based on semantic segmentation, characterized by comprising the following steps:
acquiring a semantic mask of an object, wherein the semantic mask is generated by a deep learning network;
acquiring multiple frames of consecutive binocular images with a binocular camera;
extracting feature points from each frame of binocular images and matching feature points between binocular images of adjacent frames;
removing the feature points on the semantic mask and calculating the camera pose from the remaining feature points;
separating dynamic objects and static objects in the binocular images based on the camera pose;
estimating motion parameters of the dynamic objects based on the separated dynamic objects;
recalculating the camera pose based on the separated static objects;
constructing a static map based on the updated camera pose and the feature points on the static objects;
wherein separating the dynamic objects and the static objects in the binocular images based on the camera pose, and estimating the motion parameters of the dynamic objects based on the separated dynamic objects, comprise the following steps:
separating dynamic objects: calculating the motion probability of the object corresponding to a semantic mask based on the camera pose and the positional relation between the binocular images of adjacent frames and the semantic mask, and judging the object corresponding to the semantic mask to be a dynamic object if the motion probability is larger than a first threshold;
dynamic object matching: for a dynamic object, calculating the Hu moments, the Euclidean distance between center points, and the histogram distributions of the semantic masks corresponding to the dynamic object in the binocular images of adjacent frames, and calculating the probability that the dynamic objects in the adjacent frames match based on the Hu moments, the center-point Euclidean distance, and the histogram distributions, wherein if the probability is larger than a second threshold, the two dynamic objects in the adjacent frames are the same object; and
dynamic object motion estimation: completing the association of dynamic objects between consecutive frames through dynamic object matching, and estimating the motion parameters of the dynamic objects with a PnP algorithm;
wherein the method for recalculating the camera pose based on the separated static objects comprises: removing the feature points on the semantic masks corresponding to the dynamic objects, and updating the camera pose with a PnP algorithm using the remaining feature points;
wherein the method for constructing the static map based on the updated camera pose and the feature points on the static objects comprises the following steps:
determining a plurality of key frames based on the updated camera pose and the feature points located on the static objects;
matching the feature points on the key frames and removing unmatched feature points;
checking whether the matched feature points satisfy the epipolar geometric constraint and removing those that do not;
checking whether the remaining feature points have positive depth and consistent parallax, reprojection error, and scale, removing inconsistent feature points, and generating map points from the remaining feature points;
and constructing the static map based on the map points.
2. The semantic segmentation based dynamic environment binocular vision SLAM method of claim 1, wherein the deep learning network used to generate the semantic Mask is a Mask R-CNN model.
3. The dynamic environment binocular vision SLAM method based on semantic segmentation of claim 1, wherein extracting feature points from each frame of binocular images and matching feature points between binocular images of adjacent frames comprises:
extracting the feature points using the ORB method;
acquiring a descriptor for each feature point on each frame of binocular images, computing the Hamming distance between the descriptors of feature points in two adjacent frames of binocular images, and taking the two feature points with the minimum Hamming distance as a group of matched feature points.
4. The dynamic environment binocular vision SLAM method based on semantic segmentation of claim 1, wherein the method of judging whether a feature point is located on the semantic mask comprises: the semantic mask at least includes a bounding box of an object; if the coordinates of the feature point lie within the bounding box, the feature point is located on the semantic mask.
5. The dynamic environment binocular vision SLAM method based on semantic segmentation of claim 1, wherein the method of calculating the camera pose from the remaining feature points comprises: solving the camera pose with a PnP algorithm.
6. The dynamic environment binocular vision SLAM method based on semantic segmentation of claim 1, wherein the step of separating the dynamic objects comprises:
calculating the position in the current frame corresponding to the semantic mask of the previous frame based on the camera pose;
calculating the projected three-dimensional coordinates of all feature points on the semantic mask using a disparity map, wherein the disparity map is computed from the binocular images;
calculating the errors of the corresponding feature points of the previous frame and the current frame in the x, y, and z directions, and taking the maximum of these errors as the error value of the feature point;
and converting the error value into the motion probability of the object corresponding to the semantic mask where the feature point is located, and judging whether the object corresponding to the semantic mask is a dynamic object based on the motion probability.
7. The dynamic environment binocular vision SLAM method based on semantic segmentation of claim 1, further comprising the step of optimizing the generated map points through bundle adjustment before constructing the static map.
CN202111373890.7A 2021-11-19 2021-11-19 Dynamic environment binocular vision SLAM method based on semantic segmentation Active CN114140527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111373890.7A CN114140527B (en) 2021-11-19 2021-11-19 Dynamic environment binocular vision SLAM method based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111373890.7A CN114140527B (en) 2021-11-19 2021-11-19 Dynamic environment binocular vision SLAM method based on semantic segmentation

Publications (2)

Publication Number Publication Date
CN114140527A CN114140527A (en) 2022-03-04
CN114140527B (en) 2024-09-10

Family

ID=80390414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111373890.7A Active CN114140527B (en) 2021-11-19 2021-11-19 Dynamic environment binocular vision SLAM method based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN114140527B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049949B (en) * 2022-04-29 2024-09-24 哈尔滨工程大学 Object expression method based on binocular vision
CN116524026B (en) * 2023-05-08 2023-10-27 哈尔滨理工大学 Dynamic vision SLAM method based on frequency domain and semantics
CN116883586B (en) * 2023-06-14 2024-08-23 重庆大学 Terrain semantic map construction method, system and product based on binocular camera
CN116958265A (en) * 2023-09-19 2023-10-27 交通运输部天津水运工程科学研究所 Ship pose measurement method and system based on binocular vision
CN117788730B (en) * 2023-12-08 2024-10-15 中交机电工程局有限公司 Semantic point cloud map construction method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110349213B (en) * 2019-06-28 2023-12-12 Oppo广东移动通信有限公司 Pose determining method and device based on depth information, medium and electronic equipment
CN113516664B (en) * 2021-09-02 2024-07-26 长春工业大学 Visual SLAM method based on semantic segmentation dynamic points

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Visual SLAM algorithm for dynamic environments based on joint geometric-semantic constraints; 沈晔湖 (Shen Yehu) et al.; Journal of Data Acquisition and Processing; 2022-05-31; Vol. 37, No. 3; full text *

Also Published As

Publication number Publication date
CN114140527A (en) 2022-03-04

Similar Documents

Publication Publication Date Title
CN114140527B (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN112785702B (en) SLAM method based on tight coupling of 2D laser radar and binocular camera
CN110097553B (en) Semantic mapping system based on instant positioning mapping and three-dimensional semantic segmentation
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN109345588B (en) Tag-based six-degree-of-freedom attitude estimation method
CN111201451A (en) Method and device for detecting object in scene based on laser data and radar data of scene
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
CN108537844B (en) Visual SLAM loop detection method fusing geometric information
US20220051425A1 (en) Scale-aware monocular localization and mapping
JP6782903B2 (en) Self-motion estimation system, control method and program of self-motion estimation system
CN111882602B (en) Visual odometer implementation method based on ORB feature points and GMS matching filter
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
Shi et al. An improved lightweight deep neural network with knowledge distillation for local feature extraction and visual localization using images and LiDAR point clouds
CN110070578B (en) Loop detection method
CN114549542A (en) Visual semantic segmentation method, device and equipment
CN111899345B (en) Three-dimensional reconstruction method based on 2D visual image
CN116468786B (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN117974786A (en) Multi-vision-based dynamic environment reconstruction and measurement method and system
Gan et al. A dynamic detection method to improve SLAM performance
CN114648639B (en) Target vehicle detection method, system and device
CN117409386A (en) Garbage positioning method based on laser vision fusion
CN115880428A (en) Animal detection data processing method, device and equipment based on three-dimensional technology
CN114419259A (en) Visual positioning method and system based on physical model imaging simulation
Zhou et al. 2D Grid map for navigation based on LCSD-SLAM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant