CN111462135B - Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation - Google Patents

Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation

Info

Publication number
CN111462135B
Authority
CN
China
Prior art keywords
semantic
camera
image
dimensional
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010246158.2A
Other languages
Chinese (zh)
Other versions
CN111462135A (en
Inventor
唐漾
钱锋
杜文莉
堵威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN202010246158.2A priority Critical patent/CN111462135B/en
Publication of CN111462135A publication Critical patent/CN111462135A/en
Application granted granted Critical
Publication of CN111462135B publication Critical patent/CN111462135B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention relates to the cross-disciplinary field of computer vision and deep learning, and in particular to a semantic mapping method based on visual SLAM and two-dimensional semantic segmentation. The method of the invention comprises the following steps: S1, calibrating camera parameters and correcting camera distortion; S2, acquiring an image frame sequence; S3, preprocessing the images; S4, judging whether the current image frame is a key frame, going to step S6 if it is and to step S5 if it is not; S5, motion blur compensation; S6, semantic segmentation, in which ORB feature points are extracted from the image frame and semantic segmentation is performed with a mask region convolutional neural network (Mask R-CNN) algorithm model; S7, pose calculation, in which the camera pose is computed with a sparse SLAM algorithm model; S8, building a dense semantic map with the aid of the semantic information, realizing three-dimensional semantic mapping of the global point cloud map. The method improves the performance of an unmanned aerial vehicle semantic mapping system and markedly improves the robustness of feature point extraction and matching in dynamic scenes.

Description

Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
Technical Field
The invention relates to the cross-disciplinary field of computer vision and deep learning, and in particular to a semantic mapping method based on visual SLAM and two-dimensional semantic segmentation.
Background
An unmanned aerial vehicle system generally consists of three modules, namely intelligent decision making, environment perception and motion control, and environment perception is the foundation of everything else.
To perceive the surrounding environment, an unmanned aerial vehicle needs a stable and powerful sensor system to act as its "eyes", together with corresponding algorithms and powerful processing units to interpret what it sees.
In the environment perception module of the unmanned aerial vehicle, the visual sensor is an indispensable part. The visual sensor is typically a camera; compared with lidar and millimetre-wave radar, a camera has higher resolution, captures sufficient environmental detail, and can describe, for example, the appearance and shape of objects or the text on signs.
Although the Global Positioning System (GPS) aids the positioning process, it cannot replace the visual sensor, because interference from tall trees, buildings, tunnels and the like can make GPS positioning unreliable.
Simultaneous Localization and Mapping (SLAM) refers to a subject carrying a specific sensor estimating its own motion trajectory from the sensor data, without prior information, while building a map of the surrounding environment; it is widely used in robotics, unmanned aerial vehicles, autonomous driving, augmented reality, virtual reality and other applications.
SLAM can be classified into laser SLAM and visual SLAM.
Laser SLAM is mature in both theory and engineering practice, but it has a critical drawback in robotic applications: the structural information perceived by the lidar is two-dimensional, carries little information and loses a large amount of environmental detail. At the same time, its high cost, large volume and lack of semantic information limit it in some specific application scenarios.
The information source perceived by visual SLAM is the camera image.
Depending on the camera type, visual SLAM can be divided into three types: monocular, binocular and depth (RGB-D) SLAM. Like a lidar, a depth camera can directly measure the distance to obstacles by collecting a point cloud; it is simple in structure, easy to install and operate, low in cost, and has broad application prospects.
With the advent of deep learning, visual SLAM has made a great progress in recent years.
Most visual SLAM schemes operate at the feature-point or pixel level, while an unmanned aerial vehicle needs semantic information in order to accomplish specific tasks or interact intelligently with the surrounding environment.
With such information, the visual SLAM system can retain useful information and reject invalid information.
With the development of deep learning, many sophisticated methods of object detection and semantic segmentation provide conditions for accurate semantic mapping. The semantic map is beneficial to improving autonomy and robustness of the unmanned aerial vehicle, completing more complex tasks, and converting path planning into task planning.
With the improvement of hardware computing power and the optimization of algorithm structures, deep learning has attracted more and more attention.
It has made a great leap in the field of computer vision; in terms of RGB image segmentation, the methods can be roughly divided into object detection and semantic segmentation.
Early work mainly aimed at providing object detection frameworks and achieving more accurate detection.
Mainstream deep-learning object detection frameworks are based mainly on CNN (Convolutional Neural Networks), among which the YOLO (You Only Look Once) series and the R-CNN (Region-CNN) series are the most effective.
The technology of object perception in three-dimensional data is becoming more and more mature, and the demand for three-dimensional understanding is increasingly urgent. Because of the irregularity of point clouds, most researchers convert the points into regular voxel or grid models and make predictions with deep neural networks.
Direct semantic segmentation in point cloud space consumes significant computational resources and weakens the interrelationships between spatial points.
The PointNet proposed in 2017 is the first deep neural network that can directly process the original three-dimensional point cloud.
Most existing visual SLAM systems adopt dense mapping, lack semantic information, and cannot meet the requirements of intelligent applications.
A typical assumption of visual SLAM algorithms is that the scene is static; the presence of dynamic objects not only affects the estimation of the camera pose but also leaves residual shadows in the map, degrading map quality.
Images captured while the camera moves at high speed are prone to motion blur, which greatly affects the extraction and matching of feature points.
Disclosure of Invention
The invention aims to provide a semantic mapping method based on visual SLAM and two-dimensional semantic segmentation, which solves the technical problem that fast-moving dynamic objects degrade the quality of the map being built.
In order to achieve the above object, the present invention provides a semantic mapping method based on visual SLAM and two-dimensional semantic segmentation, comprising the following steps:
s1, calibrating camera parameters and correcting camera distortion;
s2, acquiring an image frame sequence, wherein the image frame sequence comprises an RGB image and a depth image;
s3, preprocessing an image, and obtaining coordinates of a real three-dimensional space point corresponding to each pixel point in the RGB image by adopting a pinhole camera model;
s4, judging whether the current image frame is a key frame, if so, turning to a step S6, and if not, turning to a step S5;
s5, performing motion blur compensation, and calculating the image block centroids of the current image frame as semantic feature points to supplement the ORB feature points;
s6, semantic segmentation is carried out on the image frame, ORB feature points are extracted, semantic segmentation is carried out by using a mask region convolution neural network algorithm model, and semantic information of each pixel point of the image of the frame is obtained;
s7, pose calculation, namely calculating the pose of the camera by using a sparse SLAM algorithm model;
s8, inputting semantic information into a sparse SLAM algorithm model, assisting in dense semantic map construction, completing traversing of key frames, and realizing three-dimensional semantic map construction of a global point cloud map.
In an embodiment, the correcting camera distortion in step S1 further includes the steps of:
s11, projecting a three-dimensional space point P(X, Y, Z) of the camera coordinate system onto the normalized image plane, the normalized coordinates of the point being [x, y]^T;
s12, correcting radial distortion and tangential distortion for the point [x, y]^T on the normalized plane by the following formulas:
x_corrected = x (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)
y_corrected = y (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y
wherein [x_corrected, y_corrected]^T are the corrected point coordinates, p_1, p_2 are the tangential distortion coefficients of the camera, k_1, k_2, k_3 are the radial distortion coefficients of the camera, and r is the distance from the point P to the origin of the coordinate system;
s13, projecting the corrected point [x_corrected, y_corrected]^T onto the pixel plane through the intrinsic parameter matrix to obtain its correct position [u, v]^T on the image, realized by the following formulas:
u = f_x x_corrected + c_x
v = f_y y_corrected + c_y
wherein f_x, f_y, c_x, c_y are the intrinsic parameters of the camera.
In an embodiment, the image preprocessing of step S3 further includes: the mapping relation from a pixel point [u, v]^T to the real three-dimensional space point P(X, Y, Z) satisfies the following formula:
Z [u, v, 1]^T = K P, with K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
wherein K is called the intrinsic parameter matrix, f_x, f_y, c_x, c_y are the intrinsic parameters of the camera, P is the real three-dimensional space point coordinate, and [u, v]^T are the pixel coordinates.
In one embodiment, the key frames of step S4 are screened using a sparse SLAM algorithm model.
In an embodiment, the centroid of the image block in the step S5 is obtained by:
labeling each object of the frame image as a specific class;
for each segmented object, a corresponding labeling area exists, and the segmented image is called an image block;
computing moments of image blocks
Figure BDA0002434029640000044
p,q={0,1};
Calculating corresponding centroid C as semantic feature point to supplement ORB feature point, wherein
Figure BDA0002434029640000045
In an embodiment, the step S6 further includes:
the semantic information of each pixel point comprises a semantic classification label, bounding box coordinates and a confidence score of the classification;
and based on the semantic segmentation result, eliminating ORB characteristic points extracted from the region of which the specified category is the dynamic object.
In an embodiment, the semantic segmentation is performed by the Mask R-CNN algorithm model in step S6, further including:
extracting features on different levels of an input image through a feature map pyramid network;
generating proposals of interest through a region proposal network;
aligning the proposal regions using region-of-interest alignment;
performing mask segmentation by using a full convolution network;
and determining the region coordinates and classifying the categories by using a full connection layer.
In an embodiment, the sparse SLAM algorithm model further includes a tracking thread, a local mapping thread and a loop detection thread:
the tracking thread is used for locating a camera of each frame of picture by searching for and matching the local map features and minimizing the re-projection error by using a pure motion beam adjustment method;
the local map building thread manages and optimizes a local map by executing a local beam adjustment method, maintains a common view relation between key frames by map points, and optimizes the pose of the common view key frames and map points by the local beam adjustment method;
the loop detection thread detects a large loop, corrects drift errors by executing pose diagram optimization, accelerates screening of closed loop matching frames, optimizes the scale, and optimizes an essential diagram and map points by a global beam adjustment method.
In an embodiment, the sparse SLAM algorithm model further includes a global beam adjustment method optimization thread, triggered after confirmation of the loop detection thread, and after pose map optimization, calculating the optimal structure and motion result of the whole system.
In an embodiment, the pose calculation of step S7 further includes: preliminarily computing the camera pose by solving a PnP problem, refining the camera pose through back-end pose graph optimization, and constructing the minimized reprojection error of the camera pose estimation:
ξ* = arg min_ξ (1/2) Σ_{i=1}^{n} || u_i − (1/s_i) K exp(ξ^) P_i ||_2^2
wherein u_i is the pixel coordinate, P_i is the camera-frame coordinate of the feature point, ξ is the Lie algebra element corresponding to the camera pose, s_i is the depth of the feature point, K is the camera intrinsic parameter matrix, and n is the number of points.
According to the semantic mapping method based on visual SLAM and two-dimensional semantic segmentation provided by the invention, a dense semantic map that excludes dynamic objects is built from the Mask R-CNN algorithm model and the ORB-feature-based sparse SLAM algorithm model; by using inter-frame information and the semantic information of the image frames, the performance of the unmanned aerial vehicle semantic mapping system is improved and the robustness of feature point extraction and matching in dynamic scenes is enhanced.
Drawings
The above and other features, properties and advantages of the present invention will become more apparent from the following description of embodiments taken in conjunction with the accompanying drawings in which like reference characters designate like features throughout the drawings, and in which:
FIG. 1 discloses a flow chart of a method according to an embodiment of the invention;
FIG. 2 discloses a calibration plate for camera calibration according to an embodiment of the present invention;
FIG. 3a discloses a pinhole imaging model diagram of a pinhole camera according to one embodiment of the invention;
FIG. 3b discloses a similar triangular schematic of a pinhole camera according to one embodiment of the invention;
FIG. 4 discloses a system flow diagram of a Mask RCNN in accordance with an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The concept of a semantic map refers to a map containing rich semantic information, and represents abstraction of semantic information such as space geometric relations and existing object types and positions in the environment. The semantic map contains both ambient space information and a map of ambient semantic information so that the mobile robot can know, like a person, both the presence of objects in the environment and what the objects are.
Aiming at the problems and shortcomings of the prior art, the invention provides a semantic mapping system based on visual SLAM and two-dimensional semantic segmentation, which performs semantic segmentation on the basis of ORB (Oriented FAST and Rotated BRIEF) feature points and, combined with a sparse SLAM algorithm model, completes semantic mapping while achieving localization.
Fig. 1 discloses a flowchart of a semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to an embodiment of the present invention, and in the embodiment shown in fig. 1, the semantic mapping method based on visual SLAM and two-dimensional semantic segmentation provided by the present invention specifically includes the following steps:
s1, calibrating camera parameters and correcting camera distortion;
s2, acquiring an image frame sequence, wherein the image frame sequence comprises an RGB image and a depth image;
s3, preprocessing an image, and obtaining coordinates of a real three-dimensional space point corresponding to each pixel point in the RGB image by adopting a pinhole camera model;
s4, judging whether the current image frame is a key frame, if so, turning to a step S6, and if not, turning to a step S5;
s5, performing motion blur compensation, and calculating the image block centroids of the current image frame as semantic feature points to supplement the ORB feature points;
s6, semantic segmentation is carried out on the image frame, ORB feature points are extracted, semantic segmentation is carried out by using a Mask R-CNN algorithm model, and semantic information of each pixel point of the image of the frame is obtained;
s7, pose calculation, namely calculating the pose of the camera by using a sparse SLAM algorithm model;
s8, inputting semantic information into a sparse SLAM algorithm model, assisting in dense semantic map construction, completing traversing of key frames, and realizing three-dimensional semantic map construction of a global point cloud map.
Each step is described in detail below.
Step S1: calibrating camera parameters and correcting camera distortion.
In image measurement processes and machine vision applications, in order to determine the interrelation between the three-dimensional geometric position of a point on the surface of a spatial object and its corresponding point in the image, a geometric model of camera imaging must be established, and these geometric model parameters are camera parameters.
The distortion coefficients belong to the camera parameters and correspond to the camera's distortion. In most cases these camera parameters must be obtained through experiment and calculation, and this process of solving for the parameters is called camera calibration.
Camera distortion includes radial distortion and tangential distortion.
The radial distortion is caused by the lens shape.
More specifically, in the pinhole model a straight line is projected onto the pixel plane as a straight line.
However, in an actual photograph, the lens of the camera tends to cause a straight line in the real environment to become a curve in the photograph, and such distortion is called radial distortion.
The tangential distortion is formed during assembly of the camera by the inability to make the lens and imaging plane exactly parallel.
The inconsistency that light projection introduces between the actual object and its image on the 2D plane is stable, so the distortion of subsequent images can be corrected by calibrating the camera and computing the distortion parameters.
For radial distortion, correction is performed with a polynomial of second and higher order in the distance from the centre:
x_corrected = x (1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
y_corrected = y (1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
wherein [x, y]^T are the coordinates of the uncorrected point, [x_corrected, y_corrected]^T are the corrected point coordinates, k_1, k_2, k_3 are the radial distortion coefficients of the camera, and r is the distance of the point P from the origin of the coordinate system.
Tangential distortion can be corrected with two further parameters p_1, p_2:
x_corrected = x + 2 p_1 x y + p_2 (r^2 + 2 x^2)
y_corrected = y + p_1 (r^2 + 2 y^2) + 2 p_2 x y
wherein [x, y]^T are the coordinates of the uncorrected point, [x_corrected, y_corrected]^T are the corrected point coordinates, p_1, p_2 are the tangential distortion coefficients of the camera, and r is the distance of the point P from the origin of the coordinate system.
Before the camera is used, its radial and tangential distortion coefficients are calibrated so that three-dimensional information can be recovered from two-dimensional images, enabling distortion correction, object measurement, three-dimensional reconstruction and the like.
Fig. 2 shows a calibration plate for calibrating a camera according to an embodiment of the present invention. The calibration plate shown in fig. 2 is placed within the visible range of the camera and its position and orientation are changed each time a picture is taken; the feature points in the images are detected, the internal and external parameters of the camera are obtained, and the distortion coefficients are determined.
Preferably, the camera parameters are solved using the Camera Calibrator toolbox in MATLAB.
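As a point of reference only, the same intrinsic parameters and distortion coefficients can also be estimated with OpenCV instead of the MATLAB toolbox. The sketch below assumes a checkerboard with 9x6 inner corners and a hypothetical folder of calibration images; it is an illustrative alternative, not the calibration procedure of the patent.

```python
# Illustrative checkerboard calibration with OpenCV (the patent prefers the
# MATLAB Camera Calibrator; board size and file paths here are assumptions).
import glob
import cv2
import numpy as np

pattern = (9, 6)                      # inner corners per row/column (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):          # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the intrinsic matrix K = [[fx,0,cx],[0,fy,cy],[0,0,1]] and the
# distortion coefficients (k1, k2, p1, p2, k3) used in steps S11-S13.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```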
For point P (X, Y, Z) in the camera coordinate system, step S1 of the present invention performs camera distortion correction by 5 distortion coefficients, finding the correct position of this point on the pixel plane.
The correction steps of the camera distortion are as follows:
s11, projecting the three-dimensional space point onto the normalized image plane; let its normalized coordinates be [x, y]^T.
S12, correcting radial and tangential distortion for the point on the normalized plane:
x_corrected = x (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)
y_corrected = y (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y
wherein [x_corrected, y_corrected]^T are the corrected point coordinates, p_1, p_2 are the tangential distortion coefficients of the camera, k_1, k_2, k_3 are the radial distortion coefficients of the camera, and r is the distance of the point P from the origin of the coordinate system.
S13, projecting the corrected point [x_corrected, y_corrected]^T onto the pixel plane through the intrinsic parameter matrix to obtain the correct position coordinates [u, v]^T of the point on the image:
u = f_x x_corrected + c_x
v = f_y y_corrected + c_y
wherein f_x, f_y, c_x, c_y are the intrinsic parameters of the camera.
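A minimal numerical sketch of steps S11-S13 for a single point, written directly from the formulas above; the intrinsic values and distortion coefficients in the example call are placeholders, not values from the patent.

```python
import numpy as np

def distort_and_project(P, K, k1, k2, k3, p1, p2):
    """Project a 3D point in camera coordinates to pixel coordinates,
    applying the radial and tangential distortion of steps S11-S13."""
    X, Y, Z = P
    x, y = X / Z, Y / Z                       # S11: normalized image plane
    r2 = x * x + y * y                        # r^2, distance to origin squared
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_c = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)   # S12
    y_c = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u = fx * x_c + cx                          # S13: intrinsic projection
    v = fy * y_c + cy
    return u, v

# Placeholder intrinsics and distortion coefficients (assumed values).
K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1.0]])
print(distort_and_project((0.3, -0.2, 1.5), K, 0.02, -0.01, 0.0, 0.001, 0.0005))
```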
Step S2: acquiring an image frame sequence.
An RGB-D image frame sequence, comprising RGB images and depth images, is acquired with a Kinect camera.
Step S3, image preprocessing
In one embodiment, an RGB-D camera is used as the primary sensor, an RGB image and a depth image are obtained simultaneously, and a pinhole camera model is used to map pixels of the RGB image into a real three-dimensional space.
Fig. 3a discloses a pinhole imaging model diagram of a pinhole camera according to an embodiment of the present invention, and fig. 3b discloses a similar-triangle schematic diagram of a pinhole camera according to an embodiment of the present invention. As shown in fig. 3a and 3b, a camera coordinate system O-x-y-z is established, with the optical centre of the camera as the origin O and the agreed arrow direction as the forward direction.
Following the similar triangles shown in fig. 3b, a coordinate system O'-x'-y'-z' is established on the imaging plane of the camera, again taking the arrow direction as forward.
Suppose the coordinates of point P are [X, Y, Z]^T and the focal length of the camera lens is f, i.e. the distance from the camera optical centre to the physical imaging plane.
Point P is projected through the optical centre to point P' on the imaging plane, with pixel coordinates [u, v]^T.
According to this correspondence, the mapping involves a scaling and a translation of the origin, and it can be derived that:
Z [u, v, 1]^T = K P, with K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
wherein K is called the camera intrinsic parameter matrix, an intrinsic quantity already calibrated in step S1; f_x, f_y, c_x, c_y are the intrinsic parameters of the camera, P is the real three-dimensional space point coordinate, and [u, v]^T are the pixel coordinates.
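A sketch of this back-projection for a whole depth image, assuming Kinect-style depth stored in millimetres (the 1000 scale factor and the intrinsics in the usage comment are assumptions):

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Map each pixel [u, v] with valid depth to its camera-frame 3D point
    P = (X, Y, Z) via the pinhole model Z * [u, v, 1]^T = K * P.
    depth_scale converts raw depth units to metres (1000 assumed for Kinect)."""
    v, u = np.indices(depth.shape)
    Z = depth.astype(np.float32) / depth_scale
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    points = np.stack([X, Y, Z], axis=-1)
    return points[Z > 0]                      # drop pixels with no depth reading

# Usage with assumed Kinect-like intrinsics:
# cloud = backproject_depth(depth_image, 525.0, 525.0, 319.5, 239.5)
```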
Step S4, judging whether the frame is a key frame, if so, turning to step S6, and if not, turning to step S5;
If every image frame were used for visual SLAM and semantic segmentation, the computational load would be too large; therefore only high-quality image frames are selected as key frames.
In the invention, a sparse SLAM algorithm model based on ORB (Oriented FAST and Rotated BRIEF) feature points is used to screen the key frames.
Each key frame contains an RGB image and a depth image.
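The patent leaves key frame screening to the ORB-feature sparse SLAM model and does not spell out the criterion. Purely as an assumed illustration of what such a filter can look like, the sketch below uses a common heuristic: require a minimum gap since the last key frame and insert a new one when the current frame tracks markedly fewer map points than its reference key frame.

```python
def is_keyframe(n_tracked, n_tracked_in_ref, frames_since_kf,
                min_gap=20, ratio_thresh=0.9):
    """Heuristic key frame test (assumed, in the spirit of ORB-SLAM2):
    insert a key frame when enough frames have passed and the current frame
    tracks noticeably fewer map points than the reference key frame."""
    if frames_since_kf < min_gap:
        return False
    return n_tracked < ratio_thresh * n_tracked_in_ref
```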
Step S5, motion blur compensation
Because dynamic objects may exist in each frame of image, a certain number of targets are designated as dynamic targets each time a semantic mapping task is executed. In an image sequence, if the dynamic target is identified in the frame image, the method eliminates the corresponding point cloud when converting the two-dimensional pixel point into the three-dimensional space coordinate, and prevents the dynamic object from leaving a residual shadow in the map and affecting the quality of the map construction.
In step S5 of the present invention, if the frame image is not a key frame and ORB feature points cannot be extracted sufficiently because of motion blur, the following operations are performed as a supplement before the image semantic segmentation of step S6:
Each object in the frame image is labelled as a specific class; each segmented object has a corresponding labelled region, and the segmented region is called an image block. The moments m_pq of an image block B are computed as
m_pq = Σ_{(x,y)∈B} x^p y^q,  p, q ∈ {0, 1},
and the centroid position C is
C = (m_10 / m_00, m_01 / m_00).
the centroid serves as a semantic feature point to supplement the deficiency of the ORB feature point.
Semantic feature points are thus supplemented for blurred images with severe ORB feature loss, the matching of the tracking algorithm to dynamic objects is suppressed, key frames are screened comprehensively, the camera pose is estimated, and the mapping algorithm is prevented from including moving objects as part of the 3D map.
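A small sketch of the centroid computation of step S5 for one segmented image block given as a binary mask, following the moment definition above (cv2.moments would give the same result); the mask input is assumed to come from the semantic segmentation of the non-key frame.

```python
import numpy as np

def block_centroid(mask):
    """Centroid of an image block B given as a binary mask, using the zeroth
    and first order moments m_pq = sum over B of x^p * y^q, p, q in {0, 1}."""
    ys, xs = np.nonzero(mask)
    m00 = float(len(xs))
    if m00 == 0:
        return None                     # empty block: no semantic feature point
    m10 = xs.sum()
    m01 = ys.sum()
    return (m10 / m00, m01 / m00)       # C = (m10/m00, m01/m00)
```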
Step S6, semantic segmentation
ORB feature points are extracted for each image frame, and semantic segmentation is performed with a Mask R-CNN (Mask Region-CNN) algorithm model to acquire the semantic information of each pixel of the frame image.
Based on the semantic segmentation result, if a dynamic object is identified, the ORB feature points extracted from regions whose category is designated as dynamic are eliminated.
This prevents the visual SLAM algorithm from including moving objects as part of the 3D map during the mapping process.
In step S6 of the invention, the Mask RCNN algorithm model is trained by adopting a COCO data set.
COCO (Common Objects in COntext) is a dataset provided by the Microsoft team for image recognition; it provides classification information for 80 object categories.
FIG. 4 is a system flow chart of Mask R-CNN according to an embodiment of the invention. As shown in FIG. 4, semantic segmentation of the RGB image of an image frame is realized with the Mask R-CNN algorithm model; the steps of semantic segmentation with this convolutional neural network framework are as follows:
extracting features at different levels of the input image through an FPN (Feature Pyramid Network);
generating proposals of interest through an RPN (Region Proposal Network);
aligning the proposal regions using RoI Align (Region of Interest Align);
performing mask segmentation with an FCN (Fully Convolutional Network);
determining region coordinates and performing category classification with FC (Fully Connected) layers.
The frame image is processed by the Mask R-CNN algorithm model to produce a pixel-level semantic classification result, i.e. a semantic classification label for each pixel, together with the bounding box coordinates and a confidence score for each classification.
The invention uses ORB feature points for tracking, mapping and place recognition. ORB feature points are invariant to rotation and scale, can be extracted and matched quickly enough for real-time operation, and show good precision in bag-of-words-based place recognition.
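The following sketch illustrates how ORB keypoints falling inside instance masks of designated dynamic classes might be rejected before tracking. The Mask R-CNN inference itself is omitted: the per-instance masks and labels are assumed inputs, and the set of dynamic class names is an assumption, not the patent's list.

```python
import cv2
import numpy as np

DYNAMIC_CLASSES = {"person", "car", "bicycle"}   # assumed dynamic categories

def filter_dynamic_keypoints(gray, instance_masks, instance_labels):
    """Extract ORB keypoints and drop those lying inside any instance mask
    whose semantic label is designated dynamic (masks from Mask R-CNN)."""
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints = orb.detect(gray, None)
    dynamic = np.zeros(gray.shape, dtype=bool)
    for mask, label in zip(instance_masks, instance_labels):
        if label in DYNAMIC_CLASSES:
            dynamic |= mask.astype(bool)
    kept = [kp for kp in keypoints
            if not dynamic[int(kp.pt[1]), int(kp.pt[0])]]
    kept, descriptors = orb.compute(gray, kept)
    return kept, descriptors
```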
Step S7, pose calculation
Visual odometry estimates the pose between two adjacent image frames; accumulating many such inter-frame pose estimates gives the motion trajectory of the camera.
The camera pose is calculated with a sparse SLAM algorithm model based on ORB (Oriented FAST and Rotated BRIEF) feature points.
After extracting the feature points of the image frames, the camera pose is estimated with PnP on the basis of the key frames.
PnP is the abbreviation of Perspective-n-Point, a method for solving motion from 3D-to-2D point pairs: that is, how to solve the camera pose when n 3D space points and their projected positions are given.
Assume that at time k the camera position is x_k and the camera input data is u_k, with w_k the noise; the motion equation is constructed as
x_k = f(x_{k-1}, u_k, w_k).
At x_k the landmark points y_j are observed locally, generating a series of observations z_{k,j}, with v_{k,j} the observation noise; the observation equation is constructed as
z_{k,j} = h(y_j, x_k, v_{k,j}).
In step S7 of the invention, the camera pose is first computed by solving the PnP problem, and a more accurate camera pose is then obtained through back-end pose graph optimization.
In step S7 of the invention, the PnP problem of camera pose estimation is constructed as a nonlinear least-squares problem on the Lie algebra.
Furthermore, the camera pose estimation in step S7 of the present invention is constructed as a BA (Bundle Adjustment) problem, and the minimized reprojection error of the camera pose estimation is constructed as
ξ* = arg min_ξ (1/2) Σ_{i=1}^{n} || u_i − (1/s_i) K exp(ξ^) P_i ||_2^2
wherein u_i is the pixel coordinate, P_i is the camera-frame coordinate of the feature point, ξ is the Lie algebra element corresponding to the camera pose, s_i is the depth of the feature point, K is the camera intrinsic parameter matrix, and n is the number of points.
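A sketch of the front part of step S7 under the assumption that matched 3D-2D correspondences are already available: the initial pose comes from OpenCV's RANSAC PnP solver, and the reprojection error that the BA formulation above minimizes is evaluated explicitly. The back-end pose graph / bundle adjustment refinement (e.g. with g2o) is not shown.

```python
import cv2
import numpy as np

def estimate_pose_pnp(points_3d, points_2d, K):
    """Initial camera pose from 3D-2D correspondences (the PnP step);
    the back-end pose-graph / bundle-adjustment refinement is omitted."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, np.float32),
        np.asarray(points_2d, np.float32),
        K.astype(np.float64), None)
    R, _ = cv2.Rodrigues(rvec)           # rotation vector -> rotation matrix
    return ok, R, tvec, inliers

def reprojection_error(points_3d, points_2d, R, t, K):
    """Mean reprojection error, the quantity minimized in the BA formulation."""
    P_cam = (R @ np.asarray(points_3d).T + t).T
    proj = (K @ P_cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(proj - np.asarray(points_2d), axis=1).mean()
```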
Further, step S7 of the present invention also includes relocalization using an embedded place recognition model based on DBoW (bag-of-words model) to recover from tracking failure, as well as re-initialization in known map scenes, loop detection and the like.
In the invention, the sparse SLAM algorithm model used for screening key frames and calculating the camera pose is obtained by improving on ORB-SLAM2 (Oriented FAST and Rotated BRIEF Simultaneous Localization and Mapping 2).
The SLAM algorithm model consists of 4 parallel threads, including a tracking thread, a local mapping thread, a loop detection thread and a global BA optimization thread.
Further, the global BA optimization thread is executed only after the loop detection thread is validated.
The first three threads are parallel threads, and are defined as follows:
1) Tracking thread.
The camera is localized for every frame by searching for matches to the local map features and minimizing the reprojection error with motion-only BA.
Preferably, the matching is performed with a constant-velocity motion model.
2) Local mapping thread.
The local map is managed and optimized by performing local BA; the covisibility relations between key frames are maintained through MapPoints, and the poses of the covisible key frames and the MapPoints are optimized by local BA.
3) Loop detection thread.
Large loops are detected and drift errors are corrected by performing pose graph optimization; the screening of loop-closure candidate frames is accelerated with the bag-of-words representation, the scale is optimized through Sim3, and the Essential Graph and the MapPoints are optimized by global BA. The Sim3 transform is a similarity transform.
The loop detection thread triggers the global BA optimization thread.
4) Global BA optimization thread. After the pose graph has been optimized, the optimal structure and motion of the whole system are computed.
Compared with prior-art dense SLAM algorithm models, the sparse SLAM algorithm model incorporates the rich semantic segmentation information of the images into the final mapping process through semantic information fusion.
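The thread layout described above can be pictured with the minimal producer-consumer skeleton below; every processing body is a placeholder comment, so this only illustrates how the tracking, local mapping and loop detection stages run in parallel and hand key frames to one another, not the actual ORB-SLAM2 implementation.

```python
import queue
import threading

keyframe_queue = queue.Queue()   # tracking -> local mapping
loop_queue = queue.Queue()       # local mapping -> loop detection

def tracking(frames):
    for frame in frames:
        # locate the camera against the local map, decide key frame insertion
        keyframe_queue.put(frame)

def local_mapping():
    while True:
        kf = keyframe_queue.get()
        if kf is None:
            loop_queue.put(None)
            break
        # local BA over covisible key frames and map points (placeholder)
        loop_queue.put(kf)

def loop_closing():
    while True:
        kf = loop_queue.get()
        if kf is None:
            break
        # detect large loops, then trigger pose-graph optimization / global BA

threads = [threading.Thread(target=local_mapping),
           threading.Thread(target=loop_closing)]
for t in threads:
    t.start()
tracking(range(5))          # dummy frame stream
keyframe_queue.put(None)    # shut the pipeline down
for t in threads:
    t.join()
```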
Step S8, three-dimensional semantic mapping
Using the semantic segmentation result of step S6, together with the frame pose information obtained in step S7 and the real three-dimensional coordinates of the image frame pixels, the semantic information is input into the sparse SLAM algorithm model; objects with the same semantics in different frame images are projected into the three-dimensional point cloud map with the same label colour, assisting dense semantic map construction; when the traversal of the key frames is complete, three-dimensional semantic mapping of the global point cloud map is realized.
Step S8 of the present invention further includes:
s81, projecting three-dimensional space pixels generated by a first frame key frame into an initial point cloud;
s82, generating a point cloud map through three-dimensional space coordinates corresponding to each pixel point of the current key frame obtained through calculation of the pinhole model;
s83, calculating to obtain pose changes of the current key frame and the previous key frame;
s84, overlapping and fusing three-dimensional coordinate points of the two point cloud maps through the pose transformation matrix to generate a point cloud map with more information;
s85, iterating the steps continuously, and constructing the global point cloud map when traversing all the key frames is completed.
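Steps S82-S84 amount to transforming each key frame's cloud with its camera-to-world pose and merging the results. The sketch below assumes the clouds are plain (N, 3) arrays and the poses 4x4 homogeneous matrices; the patent's pipeline uses the Point Cloud Library, so this is only a schematic restatement.

```python
import numpy as np

def fuse_keyframe_clouds(clouds, poses):
    """Steps S82-S85 in miniature: transform each key frame's point cloud
    into the world frame with its camera pose T = [R | t] and merge.
    `clouds` are (N_i, 3) arrays, `poses` are 4x4 camera-to-world matrices."""
    world_points = []
    for cloud, T in zip(clouds, poses):
        R, t = T[:3, :3], T[:3, 3]
        world_points.append(cloud @ R.T + t)   # rotate then translate each point
    return np.vstack(world_points)
```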
The test results of the unmanned aerial vehicle semantic mapping method based on visual SLAM and two-dimensional semantic segmentation are further described in detail below in combination with specific tests.
The test platform is the Ubuntu 16.04 operating system with an Nvidia GeForce GTX 1050 graphics card; the software tools include TensorFlow, OpenCV, g2o, the Point Cloud Library and others. A real scene serves as the experimental condition, and the data are captured live with a Kinect V1 camera.
For the three-dimensional semantic mapping evaluation, Q_1 denotes the number of correctly detected objects, Q_2 the number of objects that were detected but misclassified plus objects that were present but not detected, and Q_3 the number of detections that do not correspond to real objects. P denotes the correct detection rate of three-dimensional objects and is computed as
P = Q_1 / (Q_1 + Q_2 + Q_3)
and (3) carrying out experimental record by constructing a dense semantic map for 9 times, wherein the accurate detection rate of the average three-dimensional object in the calculated map is 48.1086%, and the specific experimental result is shown in the following table:
TABLE 1: detection results of the 9 dense semantic mapping experiments (the table is reproduced as an image in the original publication)
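For reference, the detection rate defined above reduces to a one-line computation; the counts in the example are assumed values chosen only to illustrate the formula, not figures from Table 1.

```python
def detection_rate(q1, q2, q3):
    """Correct detection rate P = Q1 / (Q1 + Q2 + Q3)."""
    return q1 / (q1 + q2 + q3)

# Assumed counts: 13 correct, 8 missed or misclassified, 6 spurious detections.
print(f"{detection_rate(13, 8, 6):.2%}")   # -> 48.15%
```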
According to the semantic mapping method based on visual SLAM and two-dimensional semantic segmentation provided by the invention, a dense semantic map that excludes dynamic objects is built from the Mask R-CNN algorithm model and the ORB-feature-based sparse SLAM algorithm model; by using inter-frame information and the semantic information of the image frames, the performance of the unmanned aerial vehicle semantic mapping system is improved and the robustness of feature point extraction and matching in dynamic scenes is enhanced.
While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood and appreciated by those skilled in the art.
As used in this application and in the claims, the terms "a," "an," "the," and/or "the" are not specific to the singular, but may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the steps and elements are explicitly identified, and do not constitute an exclusive list, as other steps or elements may be included in a method or apparatus.
The embodiments described above are intended to provide those skilled in the art with a full range of modifications and variations to the embodiments described above without departing from the inventive concept thereof, and therefore the scope of the invention is not limited to the embodiments described above, but is to be accorded the broadest scope consistent with the innovative features recited in the claims.

Claims (10)

1. A semantic mapping method based on visual SLAM and two-dimensional semantic segmentation is characterized by comprising the following steps:
s1, calibrating camera parameters and correcting camera distortion;
s2, acquiring an image frame sequence, wherein the image frame sequence comprises an RGB image and a depth image;
s3, preprocessing an image, and obtaining coordinates of a real three-dimensional space point corresponding to each pixel point in the RGB image by adopting a pinhole camera model;
s4, judging whether the current image frame is a key frame, if so, turning to a step S6, and if not, turning to a step S5;
s5, performing motion blur compensation, and calculating the image block centroids of the current image frame as semantic feature points to supplement the ORB feature points;
s6, semantic segmentation is carried out on the image frame, ORB feature points are extracted, semantic segmentation is carried out by using a mask region convolution neural network algorithm model, and semantic information of each pixel point of the image of the frame is obtained;
s7, pose calculation, namely calculating the pose of the camera by using a sparse SLAM algorithm model;
s8, inputting semantic information into a sparse SLAM algorithm model, assisting in dense semantic map construction, completing traversing of key frames, and realizing three-dimensional semantic map construction of a global point cloud map.
2. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the correcting camera distortion in step S1 further comprises the steps of:
s11, projecting a three-dimensional space point P(X, Y, Z) of the camera coordinate system onto the normalized image plane, the normalized coordinates of the point being [x, y]^T;
s12, correcting radial distortion and tangential distortion for the point [x, y]^T on the normalized plane by the following formulas:
x_corrected = x (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)
y_corrected = y (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y
wherein [x_corrected, y_corrected]^T are the corrected point coordinates, p_1, p_2 are the tangential distortion coefficients of the camera, k_1, k_2, k_3 are the radial distortion coefficients of the camera, and r is the distance from the point P to the origin of the coordinate system;
s13, projecting the corrected point [x_corrected, y_corrected]^T onto the pixel plane through the intrinsic parameter matrix to obtain its correct position [u, v]^T on the image, realized by the following formulas:
u = f_x x_corrected + c_x
v = f_y y_corrected + c_y
wherein f_x, f_y, c_x, c_y are the intrinsic parameters of the camera.
3. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 2, wherein the image preprocessing of step S3 further comprises: the mapping relation from a pixel point [u, v]^T to the real three-dimensional space point P(X, Y, Z) satisfies the following formula:
Z [u, v, 1]^T = K P, with K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
wherein K is called the intrinsic parameter matrix, f_x, f_y, c_x, c_y are the intrinsic parameters of the camera, P is the real three-dimensional space point coordinate, and [u, v]^T are the pixel coordinates.
4. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the key frames of step S4 are screened using a sparse SLAM algorithm model.
5. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the centroid of the image block in step S5 is obtained by:
labeling each object of the frame image as a specific class;
for each segmented object, a corresponding labeling area exists, and the segmented image is called an image block;
computing moments of image blocks
Figure FDA0002434029630000023
p,q={0,1};
Calculating corresponding centroid C as semantic feature point to supplement ORB feature point, wherein
Figure FDA0002434029630000024
6. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the step S6 further comprises:
the semantic information of each pixel point comprises a semantic classification label, bounding box coordinates and a confidence score of the classification;
and based on the semantic segmentation result, eliminating ORB characteristic points extracted from the region of which the specified category is the dynamic object.
7. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the Mask R-CNN algorithm model of step S6 performs semantic segmentation, further comprising:
extracting features on different levels of an input image through a feature map pyramid network;
generating proposals of interest through a region proposal network;
aligning the proposal regions using region-of-interest alignment;
performing mask segmentation by using a full convolution network;
and determining the region coordinates and classifying the categories by using a full connection layer.
8. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1 or claim 4, wherein the sparse SLAM algorithm model further comprises a tracking thread, a local mapping thread, and a loop detection thread:
the tracking thread is used for locating a camera of each frame of picture by searching for and matching the local map features and minimizing the re-projection error by using a pure motion beam adjustment method;
the local map building thread manages and optimizes a local map by executing a local beam adjustment method, maintains a common view relation between key frames by map points, and optimizes the pose of the common view key frames and map points by the local beam adjustment method;
the loop detection thread detects a large loop, corrects drift errors by executing pose diagram optimization, accelerates screening of closed loop matching frames, optimizes the scale, and optimizes an essential diagram and map points by a global beam adjustment method.
9. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the sparse SLAM algorithm model further comprises a global beam adjustment method optimization thread, wherein the global beam adjustment method optimization thread is triggered after loop detection thread confirmation, and the optimal structure and the motion result of the whole system are calculated after pose map optimization.
10. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the pose calculation of step S7 further comprises: preliminarily computing the camera pose by solving a PnP problem, optimizing the camera pose through back-end pose graph optimization, and constructing the minimized reprojection error of the camera pose estimation:
ξ* = arg min_ξ (1/2) Σ_{i=1}^{n} || u_i − (1/s_i) K exp(ξ^) P_i ||_2^2
wherein u_i is the pixel coordinate, P_i is the camera-frame coordinate of the feature point, ξ is the Lie algebra element corresponding to the camera pose, s_i is the depth of the feature point, K is the camera intrinsic parameter matrix, and n is the number of points.
CN202010246158.2A 2020-03-31 2020-03-31 Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation Active CN111462135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010246158.2A CN111462135B (en) 2020-03-31 2020-03-31 Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010246158.2A CN111462135B (en) 2020-03-31 2020-03-31 Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation

Publications (2)

Publication Number Publication Date
CN111462135A CN111462135A (en) 2020-07-28
CN111462135B true CN111462135B (en) 2023-04-21

Family

ID=71680957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010246158.2A Active CN111462135B (en) 2020-03-31 2020-03-31 Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation

Country Status (1)

Country Link
CN (1) CN111462135B (en)

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559320B (en) * 2018-09-18 2022-11-18 华东理工大学 Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network
CN111897332B (en) * 2020-07-30 2022-10-11 国网智能科技股份有限公司 Semantic intelligent substation robot humanoid inspection operation method and system
CN111950561A (en) * 2020-08-25 2020-11-17 桂林电子科技大学 Semantic SLAM dynamic point removing method based on semantic segmentation
CN112132893B (en) * 2020-08-31 2024-01-09 同济人工智能研究院(苏州)有限公司 Visual SLAM method suitable for indoor dynamic environment
CN112017188B (en) * 2020-09-09 2024-04-09 上海航天控制技术研究所 Space non-cooperative target semantic recognition and reconstruction method
CN112258575B (en) * 2020-10-13 2022-12-02 浙江大学 Method for quickly identifying object in synchronous positioning and map construction
CN112344922B (en) * 2020-10-26 2022-10-21 中国科学院自动化研究所 Monocular vision odometer positioning method and system
CN112446882A (en) * 2020-10-28 2021-03-05 北京工业大学 Robust visual SLAM method based on deep learning in dynamic scene
CN112183476B (en) * 2020-10-28 2022-12-23 深圳市商汤科技有限公司 Obstacle detection method and device, electronic equipment and storage medium
CN112348868A (en) * 2020-11-06 2021-02-09 养哇(南京)科技有限公司 Method and system for recovering monocular SLAM scale through detection and calibration
CN112308921B (en) * 2020-11-09 2024-01-12 重庆大学 Combined optimization dynamic SLAM method based on semantics and geometry
CN112465021B (en) * 2020-11-27 2022-08-05 南京邮电大学 Pose track estimation method based on image frame interpolation method
CN112396595B (en) * 2020-11-27 2023-01-24 广东电网有限责任公司肇庆供电局 Semantic SLAM method based on point-line characteristics in dynamic environment
CN112381841A (en) * 2020-11-27 2021-02-19 广东电网有限责任公司肇庆供电局 Semantic SLAM method based on GMS feature matching in dynamic scene
CN112571415B (en) * 2020-12-03 2022-03-01 哈尔滨工业大学(深圳) Robot autonomous door opening method and system based on visual guidance
CN112465858A (en) * 2020-12-10 2021-03-09 武汉工程大学 Semantic vision SLAM method based on probability grid filtering
CN112509051A (en) * 2020-12-21 2021-03-16 华南理工大学 Bionic-based autonomous mobile platform environment sensing and mapping method
CN112507056B (en) * 2020-12-21 2023-03-21 华南理工大学 Map construction method based on visual semantic information
CN112734845B (en) * 2021-01-08 2022-07-08 浙江大学 Outdoor monocular synchronous mapping and positioning method fusing scene semantics
CN112990195A (en) * 2021-03-04 2021-06-18 佛山科学技术学院 SLAM loop detection method for integrating semantic information in complex environment
CN112991436B (en) * 2021-03-25 2022-09-06 中国科学技术大学 Monocular vision SLAM method based on object size prior information
CN113034584B (en) * 2021-04-16 2022-08-30 广东工业大学 Mobile robot visual positioning method based on object semantic road sign
CN113192200B (en) * 2021-04-26 2022-04-01 泰瑞数创科技(北京)有限公司 Method for constructing urban real scene three-dimensional model based on space-three parallel computing algorithm
CN113516692B (en) * 2021-05-18 2024-07-19 上海汽车集团股份有限公司 SLAM method and device for multi-sensor fusion
CN113537208B (en) * 2021-05-18 2024-06-11 杭州电子科技大学 Visual positioning method and system based on semantic ORB-SLAM technology
CN113269831B (en) * 2021-05-19 2021-11-16 北京能创科技有限公司 Visual repositioning method, system and device based on scene coordinate regression network
CN113674340A (en) * 2021-07-05 2021-11-19 北京物资学院 Binocular vision navigation method and device based on landmark points
CN113610763B (en) * 2021-07-09 2024-07-16 北京航天计量测试技术研究所 Rocket engine structural member pose motion compensation method in vibration environment
CN113808251B (en) * 2021-08-09 2024-04-12 杭州易现先进科技有限公司 Dense reconstruction method, system, device and medium based on semantic segmentation
CN114730471A (en) * 2021-08-13 2022-07-08 深圳市大疆创新科技有限公司 Control method, control device, movable platform and storage medium
CN113658257B (en) * 2021-08-17 2022-05-27 广州文远知行科技有限公司 Unmanned equipment positioning method, device, equipment and storage medium
CN113674416B (en) * 2021-08-26 2024-04-26 中国电子科技集团公司信息科学研究院 Three-dimensional map construction method and device, electronic equipment and storage medium
CN113903011B (en) * 2021-10-26 2024-06-11 江苏大学 Semantic map construction and positioning method suitable for indoor parking lot
CN114202579B (en) * 2021-11-01 2024-07-16 东北大学 Dynamic scene-oriented real-time multi-body SLAM system
CN114132360B (en) * 2021-11-08 2023-09-08 卡斯柯信号有限公司 Method, equipment and storage medium for preventing turnout from being squeezed based on image discrimination of turnout state
CN114359493B (en) * 2021-12-20 2023-01-03 中国船舶重工集团公司第七0九研究所 Method and system for generating three-dimensional semantic map for unmanned ship
CN114529800A (en) * 2022-01-12 2022-05-24 华南理工大学 Obstacle avoidance method, system, device and medium for rotor unmanned aerial vehicle
CN114708321B (en) * 2022-01-12 2024-06-21 北京航空航天大学 Semantic-based camera pose estimation method and system
CN114550186A (en) * 2022-04-21 2022-05-27 北京世纪好未来教育科技有限公司 Method and device for correcting document image, electronic equipment and storage medium
CN114972470B (en) * 2022-07-22 2022-11-18 北京中科慧眼科技有限公司 Road surface environment obtaining method and system based on binocular vision
CN115451939B (en) * 2022-08-19 2024-05-07 中国人民解放军国防科技大学 Parallel SLAM method under dynamic scene based on detection segmentation
CN115164918B (en) * 2022-09-06 2023-02-03 联友智连科技有限公司 Semantic point cloud map construction method and device and electronic equipment
CN116681755B (en) * 2022-12-29 2024-02-09 广东美的白色家电技术创新中心有限公司 Pose prediction method and device
CN116342800B (en) * 2023-02-21 2023-10-24 中国航天员科研训练中心 Semantic three-dimensional reconstruction method and system for multi-mode pose optimization
CN116339336A (en) * 2023-03-29 2023-06-27 北京信息科技大学 Electric agricultural machinery cluster collaborative operation method, device and system
CN116817887B (en) * 2023-06-28 2024-03-08 哈尔滨师范大学 Semantic visual SLAM map construction method, electronic equipment and storage medium
CN117392347B (en) * 2023-10-13 2024-04-30 苏州煋海图科技有限公司 Map construction method, device, computer equipment and readable storage medium
CN117611762B (en) * 2024-01-23 2024-04-30 常熟理工学院 Multi-level map construction method, system and electronic equipment
CN118447320B (en) * 2024-05-13 2024-09-27 华智清创(苏州)农业科技有限公司 Visual multitasking mounted agricultural inspection method and device based on deep learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097553A (en) * 2019-04-10 2019-08-06 东南大学 The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system
WO2019169540A1 (en) * 2018-03-06 2019-09-12 斯坦德机器人(深圳)有限公司 Method for tightly-coupling visual slam, terminal and computer readable storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019169540A1 (en) * 2018-03-06 2019-09-12 斯坦德机器人(深圳)有限公司 Method for tightly-coupling visual slam, terminal and computer readable storage medium
CN110097553A (en) * 2019-04-10 2019-08-06 东南大学 The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Emergency communication method for nuclear radiation monitoring based on BeiDou RDSS; Wang Tingyin et al.; Computer Systems & Applications; 2019-12-15 (Issue 12); full text *
Augmented reality image registration technology based on semantic segmentation; Bian Xianzhang et al.; Electronic Technology & Software Engineering; 2018-12-13 (Issue 23); full text *

Also Published As

Publication number Publication date
CN111462135A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN110097553B (en) Semantic mapping system based on instant positioning mapping and three-dimensional semantic segmentation
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN111563415B (en) Binocular vision-based three-dimensional target detection system and method
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN112734852B (en) Robot mapping method and device and computing equipment
Zhu et al. Online camera-lidar calibration with sensor semantic information
CN103886107B (en) Robot localization and map structuring system based on ceiling image information
CN112396656A (en) Outdoor mobile robot pose estimation method based on fusion of vision and laser radar
CN114140527B (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN111998862B (en) BNN-based dense binocular SLAM method
CN113327296B (en) Laser radar and camera online combined calibration method based on depth weighting
CN112101160B (en) Binocular semantic SLAM method for automatic driving scene
US11703596B2 (en) Method and system for automatically processing point cloud based on reinforcement learning
CN112801074A (en) Depth map estimation method based on traffic camera
CN113744315B (en) Semi-direct vision odometer based on binocular vision
CN117593650B (en) Moving point filtering vision SLAM method based on 4D millimeter wave radar and SAM image segmentation
CN112767546B (en) Binocular image-based visual map generation method for mobile robot
CN111899345B (en) Three-dimensional reconstruction method based on 2D visual image
CN104166995B (en) Harris-SIFT binocular vision positioning method based on horse pace measurement
CN114677531A (en) Water surface unmanned ship target detection and positioning method fusing multi-mode information
CN115147344A (en) Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance
CN116804553A (en) Odometer system and method based on event camera/IMU/natural road sign
CN114648639B (en) Target vehicle detection method, system and device
CN117409386A (en) Garbage positioning method based on laser vision fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant