CN111462135B - Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation - Google Patents
Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
- Publication number
- CN111462135B CN111462135B CN202010246158.2A CN202010246158A CN111462135B CN 111462135 B CN111462135 B CN 111462135B CN 202010246158 A CN202010246158 A CN 202010246158A CN 111462135 B CN111462135 B CN 111462135B
- Authority
- CN
- China
- Prior art keywords
- semantic
- camera
- image
- dimensional
- map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 55
- 230000011218 segmentation Effects 0.000 title claims abstract description 50
- 238000013507 mapping Methods 0.000 title claims abstract description 45
- 230000000007 visual effect Effects 0.000 title claims description 32
- 238000004364 calculation method Methods 0.000 claims abstract description 11
- 238000007781 pre-processing Methods 0.000 claims abstract description 7
- 238000013528 artificial neural network Methods 0.000 claims abstract description 5
- 238000001514 detection method Methods 0.000 claims description 17
- 238000005457 optimization Methods 0.000 claims description 15
- 230000033001 locomotion Effects 0.000 claims description 13
- 238000010276 construction Methods 0.000 claims description 10
- 238000012937 correction Methods 0.000 claims description 10
- 239000011159 matrix material Substances 0.000 claims description 10
- 238000013527 convolutional neural network Methods 0.000 claims description 8
- 238000010586 diagram Methods 0.000 claims description 8
- 239000013589 supplement Substances 0.000 claims description 6
- 238000002372 labelling Methods 0.000 claims description 5
- 238000012216 screening Methods 0.000 claims description 5
- 238000012790 confirmation Methods 0.000 claims description 2
- 230000001960 triggered effect Effects 0.000 claims description 2
- 238000013135 deep learning Methods 0.000 abstract description 6
- 238000000605 extraction Methods 0.000 abstract description 4
- 230000004927 fusion Effects 0.000 abstract description 3
- 238000003384 imaging method Methods 0.000 description 7
- 230000008569 process Effects 0.000 description 7
- 230000003287 optical effect Effects 0.000 description 3
- 238000012360 testing method Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 230000004807 localization Effects 0.000 description 2
- 238000005259 measurement Methods 0.000 description 2
- 230000008447 perception Effects 0.000 description 2
- 230000035508 accumulation Effects 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000003709 image segmentation Methods 0.000 description 1
- 230000002401 inhibitory effect Effects 0.000 description 1
- 238000009434 installation Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000001502 supplementing effect Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the field of the cross-fusion of computer vision and deep learning, and in particular to a semantic mapping method based on visual SLAM and two-dimensional semantic segmentation. The method comprises the following steps: S1, calibrating camera parameters and correcting camera distortion; S2, acquiring an image frame sequence; S3, preprocessing the images; S4, judging whether the current image frame is a key frame, and if so, going to step S6, otherwise going to step S5; S5, dynamic blur compensation; S6, semantic segmentation: extracting ORB feature points from the image frame and performing semantic segmentation with a mask region convolutional neural network (Mask R-CNN) algorithm model; S7, pose calculation: computing the camera pose with a sparse SLAM algorithm model; S8, building a dense semantic map with the assistance of the semantic information, realizing three-dimensional semantic mapping of the global point cloud map. The method improves the performance of an unmanned aerial vehicle semantic mapping system and markedly improves the robustness of feature point extraction and matching in dynamic scenes.
Description
Technical Field
The invention relates to the field of the cross-fusion of computer vision and deep learning, and in particular to a semantic mapping method based on visual SLAM and two-dimensional semantic segmentation.
Background
Unmanned aerial vehicles generally consist of three modules: intelligent decision making, environment perception and motion control, of which environment perception is the foundation of everything else.
To perceive the surrounding environment, an unmanned aerial vehicle requires a set of stable, powerful sensors to act as its "eyes", together with corresponding algorithms and a powerful processing unit to "read" the objects it sees.
In the environment perception module of an unmanned aerial vehicle, the visual sensor is an indispensable part. The visual sensor may be a camera; compared with lidar and millimeter-wave radar, a camera has higher resolution and can capture enough environmental detail to describe, for example, the appearance and shape of objects or to read signs.
Although the Global Positioning System (GPS) aids the positioning process, it cannot replace the visual sensor, because interference from tall trees, buildings, tunnels and the like can make GPS positioning unreliable.
Simultaneous Localization and Mapping (SLAM) refers to a process in which an agent carrying specific sensors estimates its own motion trajectory from the acquired sensor data without any prior information, while building a map of the surrounding environment. It is widely used in robotics, unmanned aerial vehicles, autonomous driving, augmented reality, virtual reality and other applications.
SLAM can be classified into laser SLAM and visual SLAM.
Laser SLAM is mature in both theory and engineering practice, but it has a fatal drawback in robotic applications: the structural information perceived by the lidar is two-dimensional, carries little information and loses a large amount of environmental detail. At the same time, its high cost, large volume and lack of semantic information limit it in certain application scenarios.
The information source perceived by visual SLAM is the camera image.
Depending on the camera type, visual SLAM can be divided into three categories: monocular, binocular (stereo) and depth-camera SLAM. Similar to lidar, a depth camera can directly measure the distance to an obstacle by collecting a point cloud. Depth cameras are simple in structure, easy to install and operate, and low in cost, and therefore have broad application prospects.
With the advent of deep learning, visual SLAM has made great progress in recent years.
Most visual SLAM schemes work at the level of feature points or pixels, whereas an unmanned aerial vehicle needs semantic information in order to accomplish specific tasks or interact intelligently with its surroundings.
A visual SLAM system should be able to select useful information and discard invalid information.
With the development of deep learning, many mature methods for object detection and semantic segmentation provide the conditions for accurate semantic mapping. Semantic maps help improve the autonomy and robustness of an unmanned aerial vehicle, enable it to complete more complex tasks, and turn path planning into task planning.
With the improvement of hardware computing power and the optimization of algorithm structures, deep learning has attracted more and more attention.
Great leaps have been made in the field of computer vision; for RGB images, the work can be roughly divided into object detection and semantic segmentation.
Early work mainly focused on proposing object detection frameworks to achieve more accurate detection.
The mainstream deep-learning object detection frameworks are mainly based on CNNs (Convolutional Neural Networks), among which the YOLO (You Only Look Once) series and the R-CNN (Region-CNN, region-based convolutional neural network) series are comparatively efficient.
As the technology for perceiving objects in three-dimensional data matures, the demand for three-dimensional understanding becomes increasingly urgent. Because of the irregularity of point clouds, most researchers convert the points into regular voxel or grid models and make predictions with deep neural networks.
Direct semantic segmentation of the point cloud space consumes considerable computational resources and weakens the interrelationship between spatial points.
The PointNet proposed in 2017 is the first deep neural network that can directly process the original three-dimensional point cloud.
Most existing visual SLAM systems adopt a dense mapping method that lacks semantic information and cannot meet the requirements of intelligent applications.
A typical assumption of visual SLAM algorithms is that the scene is static; the presence of dynamic objects not only affects the estimation of the camera pose but also leaves ghost trails in the map, degrading map quality.
Images captured by a camera under high-speed motion are prone to blur, which greatly affects the extraction and matching of feature points.
Disclosure of Invention
The invention aims to provide a semantic mapping method based on visual SLAM and two-dimensional semantic segmentation, which solves the technical problem that dynamic objects moving at high speed degrade the quality of the map being built.
In order to achieve the above object, the present invention provides a semantic mapping method based on visual SLAM and two-dimensional semantic segmentation, comprising the following steps:
s1, calibrating camera parameters and correcting camera distortion;
s2, acquiring an image frame sequence, wherein the image frame sequence comprises an RGB image and a depth image;
s3, preprocessing an image, and obtaining coordinates of a real three-dimensional space point corresponding to each pixel point in the RGB image by adopting a pinhole camera model;
s4, judging whether the current image frame is a key frame, if so, turning to a step S6, and if not, turning to a step S5;
s5, performing dynamic blur compensation, and calculating the image block centroids of the current image frame as semantic feature points to supplement the ORB feature points;
s6, performing semantic segmentation on the image frame: extracting ORB feature points, performing semantic segmentation with a mask region convolutional neural network (Mask R-CNN) algorithm model, and obtaining the semantic information of each pixel of the frame image;
s7, pose calculation, namely calculating the pose of the camera by using a sparse SLAM algorithm model;
s8, inputting semantic information into a sparse SLAM algorithm model, assisting in dense semantic map construction, completing traversing of key frames, and realizing three-dimensional semantic map construction of a global point cloud map.
In an embodiment, the correcting camera distortion in step S1 further includes the steps of:
s11, a three-dimensional space point P(X, Y, Z) in the camera coordinate system is projected onto the normalized image plane, giving the normalized coordinates [x, y]^T of the point;
S12, radial and tangential distortion correction is applied to the point [x, y]^T on the normalized plane by the following formulas:
x_corrected = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)
y_corrected = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y
where [x_corrected, y_corrected]^T are the corrected point coordinates, p_1, p_2 are the tangential distortion coefficients of the camera, k_1, k_2, k_3 are the radial distortion coefficients of the camera, and r is the distance from the point P to the origin of the coordinate system;
S13, the corrected point [x_corrected, y_corrected]^T is projected onto the pixel plane through the intrinsic parameter matrix to obtain its correct position [u, v]^T on the image, realized by the following formulas:
u = f_x · x_corrected + c_x
v = f_y · y_corrected + c_y
where f_x, f_y, c_x, c_y are the intrinsic parameters of the camera.
In an embodiment, the image preprocessing in step S3 further includes: the mapping relation from a pixel point [u, v]^T to the real three-dimensional space point P(X, Y, Z) satisfies the following formula:
Z [u, v, 1]^T = K P, with K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
where K is called the camera intrinsic parameter matrix, f_x, f_y, c_x, c_y are the intrinsic parameters of the camera, P is the real three-dimensional space point coordinate, and [u, v]^T is the pixel coordinate.
In one embodiment, the key frames of step S4 are screened using a sparse SLAM algorithm model.
In an embodiment, the centroid of the image block in the step S5 is obtained by:
labeling each object in the frame image as a specific class;
for each segmented object there is a corresponding labeled region, and the segmented image is called an image block;
calculating the corresponding centroid C as a semantic feature point to supplement the ORB feature points, where C = (m_10/m_00, m_01/m_00) and m_pq = Σ_{(x, y) ∈ B} x^p y^q I(x, y) is the (p+q)-order moment of the image block B, with I(x, y) the pixel intensity.
In an embodiment, the step S6 further includes:
the semantic information of each pixel point comprises a semantic classification label, bounding box coordinates and a confidence score of the classification;
and based on the semantic segmentation result, removing the ORB feature points extracted from any region whose designated category is a dynamic object.
In an embodiment, the semantic segmentation performed by the Mask R-CNN algorithm model in step S6 further includes:
extracting features at different levels of the input image through a feature pyramid network;
proposing regions of interest through a region proposal network;
performing proposal region alignment using RoI Align (region-of-interest alignment);
performing mask segmentation using a fully convolutional network;
and determining region coordinates and performing category classification using fully connected layers.
In an embodiment, the sparse SLAM algorithm model further includes a tracking thread, a local mapping thread and a loop detection thread:
the tracking thread localizes the camera for every frame by finding feature matches with the local map and minimizing the reprojection error using motion-only bundle adjustment;
the local mapping thread manages and optimizes the local map by performing local bundle adjustment; the co-visibility relationship between key frames is maintained through map points, and the poses of the co-visible key frames and the map points are optimized by local bundle adjustment;
the loop detection thread detects large loops and corrects accumulated drift by performing pose-graph optimization; it accelerates the screening of loop-closure candidate frames, optimizes the scale, and optimizes the essential graph and map points by global bundle adjustment.
In an embodiment, the sparse SLAM algorithm model further includes a global bundle adjustment optimization thread, which is triggered after confirmation by the loop detection thread and, after the pose-graph optimization, computes the optimal structure and motion result of the whole system.
In an embodiment, the pose calculation in step S7 further includes: initially computing the camera pose by solving a PnP problem, refining it through back-end pose-graph optimization, and constructing the minimized reprojection error of the camera pose estimation:
ξ* = arg min_ξ (1/2) Σ_{i=1}^{n} || u_i − (1/s_i) K exp(ξ^) P_i ||_2^2
where u_i are the pixel coordinates, P_i are the camera coordinates of the feature points, ξ is the Lie algebra element corresponding to the camera pose, s_i is the depth of the feature point, and K is the camera intrinsic parameter matrix.
According to the semantic mapping method based on visual SLAM and two-dimensional semantic segmentation of the invention, a dense semantic map with dynamic objects removed is built from a Mask R-CNN algorithm model and a sparse SLAM algorithm model based on ORB feature points; inter-frame information and the semantic information of the image frames are used to improve the performance of the unmanned aerial vehicle semantic mapping system and to improve the robustness of feature point extraction and matching in dynamic scenes.
Drawings
The above and other features, properties and advantages of the present invention will become more apparent from the following description of embodiments taken in conjunction with the accompanying drawings in which like reference characters designate like features throughout the drawings, and in which:
FIG. 1 discloses a flow chart of a method according to an embodiment of the invention;
FIG. 2 discloses a calibration plate for camera calibration according to an embodiment of the present invention;
FIG. 3a discloses a pinhole imaging model diagram of a pinhole camera according to one embodiment of the invention;
FIG. 3b discloses a similar triangular schematic of a pinhole camera according to one embodiment of the invention;
FIG. 4 discloses a system flow diagram of a Mask RCNN in accordance with an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
A semantic map is a map containing rich semantic information; it is an abstraction of the spatial geometric relationships in the environment together with the categories and positions of the objects present. Because the semantic map contains both the spatial information and the semantic information of the environment, a mobile robot can, like a person, know not only that an object exists in the environment but also what that object is.
Aiming at the problems and shortcomings of the prior art, the invention provides a semantic mapping system based on visual SLAM and two-dimensional semantic segmentation, which performs semantic segmentation with ORB (Oriented FAST and Rotated BRIEF) feature points, combines a sparse SLAM algorithm model, and completes semantic mapping while realizing localization.
Fig. 1 discloses a flowchart of a semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to an embodiment of the present invention, and in the embodiment shown in fig. 1, the semantic mapping method based on visual SLAM and two-dimensional semantic segmentation provided by the present invention specifically includes the following steps:
s1, calibrating camera parameters and correcting camera distortion;
s2, acquiring an image frame sequence, wherein the image frame sequence comprises an RGB image and a depth image;
s3, preprocessing an image, and obtaining coordinates of a real three-dimensional space point corresponding to each pixel point in the RGB image by adopting a pinhole camera model;
s4, judging whether the current image frame is a key frame, if so, turning to a step S6, and if not, turning to a step S5;
s5, performing dynamic blur compensation, and calculating the image block centroids of the current image frame as semantic feature points to supplement the ORB feature points;
s6, performing semantic segmentation on the image frame: extracting ORB feature points, performing semantic segmentation with the Mask R-CNN algorithm model, and obtaining the semantic information of each pixel of the frame image;
s7, pose calculation, namely calculating the pose of the camera by using a sparse SLAM algorithm model;
s8, inputting semantic information into a sparse SLAM algorithm model, assisting in dense semantic map construction, completing traversing of key frames, and realizing three-dimensional semantic map construction of a global point cloud map.
Each step is described in detail below.
Step S1: calibrating camera parameters and correcting camera distortion.
In image measurement and machine vision applications, in order to determine the relationship between the three-dimensional geometric position of a point on the surface of a spatial object and its corresponding point in the image, a geometric model of camera imaging must be established; the parameters of this geometric model are the camera parameters.
The distortion coefficients belong to the camera parameters and describe the camera distortion phenomenon. In most cases these parameters must be obtained through experiment and calculation, and this process of solving for the parameters is called camera calibration.
Camera distortion includes radial distortion and tangential distortion.
The radial distortion is caused by the lens shape.
More specifically, in the pinhole model a straight line projects onto the pixel plane as a straight line.
In an actual photograph, however, the camera lens often causes a straight line in the real environment to become a curve in the image; this kind of distortion is called radial distortion.
Tangential distortion arises during camera assembly when the lens cannot be made exactly parallel to the imaging plane.
Since the light projection causes an inconsistency between the actual object and its image projected onto the 2D plane, and this inconsistency is stable, distortion correction of subsequent images can be achieved by calibrating the camera and computing the distortion parameters.
For radial distortion, correction is performed with a polynomial of second and higher order in the distance from the center:
x_corrected = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
y_corrected = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
where [x, y]^T are the coordinates of the uncorrected point, [x_corrected, y_corrected]^T are the corrected point coordinates, k_1, k_2, k_3 are the radial distortion coefficients of the camera, and r is the distance of the point P from the origin of the coordinate system.
Tangential distortion can be corrected with two further parameters p_1, p_2:
x_corrected = x + 2 p_1 x y + p_2 (r^2 + 2 x^2)
y_corrected = y + p_1 (r^2 + 2 y^2) + 2 p_2 x y
where [x, y]^T are the coordinates of the uncorrected point, [x_corrected, y_corrected]^T are the corrected point coordinates, p_1, p_2 are the tangential distortion coefficients of the camera, and r is the distance of the point P from the origin of the coordinate system.
Before the camera is used, its radial and tangential distortion coefficients are calibrated so that three-dimensional information can be obtained from two-dimensional images, enabling distortion correction of images, object measurement, three-dimensional reconstruction and so on.
Fig. 2 shows a calibration board used for camera calibration according to an embodiment of the present invention. The calibration board shown in Fig. 2 is placed within the visible range of the camera; a picture is taken each time the position and orientation of the board are changed, the feature points in the images are detected, and the intrinsic and extrinsic parameters of the camera as well as the distortion coefficients are obtained.
Preferably, the camera parameters are solved using the Camera Calibrator toolbox in MATLAB.
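For readers who prefer an open-source route, the same calibration step can be sketched with OpenCV instead of the MATLAB toolbox. This is only an illustrative sketch, not the patented implementation; the chessboard pattern size, square size and image folder are assumptions.

```python
import glob
import cv2
import numpy as np

# Assumed 9x6 inner-corner chessboard with 25 mm squares; adjust to the actual board.
PATTERN = (9, 6)
SQUARE_MM = 25.0

# 3D coordinates of the board corners in the board's own plane (Z = 0).
obj_template = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
obj_template[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_MM

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):          # hypothetical folder of calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(obj_template)
        img_points.append(corners)

# K is the 3x3 intrinsic matrix (f_x, f_y, c_x, c_y); dist holds k1, k2, p1, p2, k3.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms, "\nK =", K, "\ndist =", dist.ravel())
```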
For a point P(X, Y, Z) in the camera coordinate system, step S1 of the present invention performs camera distortion correction with the 5 distortion coefficients and finds the correct position of this point on the pixel plane.
The camera distortion is corrected in the following steps:
s11, projecting the three-dimensional space point onto the normalized image plane; let its normalized coordinates be [x, y]^T.
S12, correcting the radial and tangential distortion of the point on the normalized plane:
x_corrected = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)
y_corrected = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y
where [x_corrected, y_corrected]^T are the corrected point coordinates, p_1, p_2 are the tangential distortion coefficients of the camera, k_1, k_2, k_3 are the radial distortion coefficients of the camera, and r is the distance of the point P from the origin of the coordinate system.
S13, projecting the corrected point [x_corrected, y_corrected]^T onto the pixel plane through the intrinsic parameter matrix to obtain the correct position coordinates [u, v]^T of the point on the image:
u = f_x · x_corrected + c_x
v = f_y · y_corrected + c_y
where f_x, f_y, c_x, c_y are the intrinsic parameters of the camera.
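A minimal Python sketch of steps S11-S13, assuming the intrinsic parameters f_x, f_y, c_x, c_y and the five distortion coefficients are already known from step S1; the numeric values below are placeholders, not calibration results from the patent.

```python
import numpy as np

# Placeholder calibration results; replace with the values obtained in step S1.
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5
k1, k2, k3 = 0.04, -0.10, 0.0
p1, p2 = 0.001, -0.0005

def project_with_distortion(P):
    """Project a 3D point P = (X, Y, Z) in the camera frame to pixel coordinates (u, v)."""
    X, Y, Z = P
    x, y = X / Z, Y / Z                      # S11: normalized image plane
    r2 = x * x + y * y                       # r^2, with r the distance to the origin
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_c = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)   # S12: radial + tangential
    y_c = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    u = fx * x_c + cx                        # S13: intrinsic matrix -> pixel plane
    v = fy * y_c + cy
    return u, v

print(project_with_distortion(np.array([0.3, -0.2, 2.0])))
```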
Step S2, acquiring an image frame sequence.
A Kinect camera is used to acquire an RGB-D image frame sequence, which includes RGB images and depth images.
Step S3, image preprocessing
In one embodiment, an RGB-D camera is used as the primary sensor, an RGB image and a depth image are obtained simultaneously, and a pinhole camera model is used to map pixels of the RGB image into a real three-dimensional space.
Fig. 3a discloses a pinhole imaging model diagram of a pinhole camera according to an embodiment of the present invention, and Fig. 3b discloses a similar-triangles schematic diagram of the pinhole camera. As shown in Figs. 3a and 3b, a camera coordinate system O-x-y-z is established with the optical center of the camera as the origin O and the direction of the arrow as the forward direction.
Using the similar triangles shown in Fig. 3b, a coordinate system O'-x'-y'-z' is established on the imaging plane of the camera, again taking the direction of the arrow as forward.
Suppose the coordinates of point P are [X, Y, Z]^T, and the focal length of the camera lens is f, which is the distance from the camera optical center to the physical imaging plane.
Point P is projected through the optical center to the point P' on the imaging plane, with pixel coordinates [u, v]^T.
From this correspondence, the mapping amounts to a scaling plus a shift of the origin, and the following can be derived:
Z [u, v, 1]^T = K P, with K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
where K is called the camera intrinsic parameter matrix, an intrinsic property already calibrated in step S1; f_x, f_y, c_x, c_y are the intrinsic parameters of the camera, P is the real three-dimensional space point coordinate, and [u, v]^T is the pixel coordinate.
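The pinhole mapping of step S3 can be sketched as the inverse operation: every pixel [u, v]^T with a valid depth value is back-projected into the camera frame. The depth scale of 1000 (millimetres to metres) is an assumption consistent with typical Kinect-style RGB-D output, not a value stated in the patent.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Map every pixel [u, v]^T with valid depth to a 3D point P = (X, Y, Z) in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    Z = depth.astype(np.float32) / depth_scale
    valid = Z > 0
    X = (u - cx) * Z / fx          # inverse of u = f_x * X / Z + c_x
    Y = (v - cy) * Z / fy          # inverse of v = f_y * Y / Z + c_y
    return np.stack([X[valid], Y[valid], Z[valid]], axis=1)   # N x 3 point cloud

# Example with a synthetic 480x640 depth image at a constant 1.5 m.
points = backproject_depth(np.full((480, 640), 1500, np.uint16), 525.0, 525.0, 319.5, 239.5)
print(points.shape)
```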
Step S4, judging whether the frame is a key frame; if so, going to step S6, and if not, going to step S5.
If visual SLAM and semantic segmentation were computed on every frame, the computational load would be too large; therefore, high-quality images are selected as key frames.
In the invention, a sparse SLAM algorithm model based on ORB (Oriented FAST and Rotated BRIEF) feature points is used to screen the key frames.
Each key frame contains an RGB image and a depth image.
Step S5, dynamic blur compensation
Because dynamic objects may exist in each frame, a certain set of target categories is designated as dynamic each time a semantic mapping task is executed. In the image sequence, if such a dynamic target is recognized in a frame, the method removes the corresponding point cloud when converting the two-dimensional pixels into three-dimensional space coordinates, preventing the dynamic object from leaving ghost trails in the map and degrading the mapping quality.
In step S5 of the present invention, if the frame is not a key frame and, because of motion blur, too few ORB feature points are extracted, the following operations are performed as a supplement before the image semantic segmentation of step S6:
Each object in the frame image is labeled as a specific class; each segmented object has a corresponding labeled region, and the segmented image is called an image block. The (p+q)-order moment m_pq of an image block B is computed as:
m_pq = Σ_{(x, y) ∈ B} x^p y^q I(x, y)
and the centroid position C is:
C = (m_10 / m_00, m_01 / m_00)
The centroid serves as a semantic feature point to make up for the shortage of ORB feature points.
Semantic feature points are supplemented for blurred images with severe ORB feature point loss, the matching of the tracking algorithm to dynamic objects is suppressed, the key frames are screened comprehensively, the camera pose is estimated, and the mapping algorithm is prevented from including moving objects as part of the 3D map.
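A rough sketch of the centroid computation using OpenCV image moments, under the assumption that the segmentation mask of a single object (one image block) is available as a binary image.

```python
import cv2
import numpy as np

def block_centroid(mask):
    """Centroid C = (m10 / m00, m01 / m00) of a binary image block, used as a semantic feature point."""
    m = cv2.moments(mask, binaryImage=True)   # m['m00'], m['m10'], m['m01'], ...
    if m["m00"] == 0:
        return None                            # empty block, no centroid
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])

# Toy example: a filled rectangle whose centroid lies at its centre.
mask = np.zeros((120, 160), np.uint8)
cv2.rectangle(mask, (40, 30), (99, 89), 255, thickness=-1)
print(block_centroid(mask))   # approximately (69.5, 59.5)
```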
Step S6, semantic segmentation
ORB feature points are extracted for each image frame, and semantic segmentation is performed with the Mask R-CNN (Mask Region-CNN) algorithm model to acquire the semantic information of each pixel of the frame image.
Based on the semantic segmentation result, if a dynamic object is recognized, the ORB feature points extracted from any region whose class is designated as a dynamic object are removed.
This prevents the visual SLAM algorithm from including moving objects as part of the 3D map during mapping.
In step S6 of the invention, the Mask R-CNN algorithm model is trained on the COCO data set.
COCO (Common Objects in COntext) is a data set provided by the Microsoft team that can be used for image recognition and provides classification information for 80 categories.
FIG. 4 is a system flow chart of Mask R-CNN according to an embodiment of the invention. As shown in FIG. 4, RGB image semantic segmentation of the image frame is realized with the Mask R-CNN algorithm model; the steps of semantic segmentation with the convolutional neural network framework of Mask R-CNN are as follows:
extracting features at different levels of the input image through an FPN (Feature Pyramid Network);
proposing regions of interest through an RPN (Region Proposal Network);
performing proposal region alignment using RoI Align (Region of Interest Align);
performing mask segmentation using an FCN (Fully Convolutional Network);
and determining region coordinates and performing category classification using FC (fully connected) layers.
The frame image is processed by the Mask R-CNN algorithm model to generate a pixel-level semantic classification result, i.e. a semantic classification label for each pixel, and at the same time the bounding box coordinates and the confidence score of the classification are output.
The invention uses ORB feature points for the tracking, mapping and place recognition tasks. ORB feature points are rotation and scale invariant, can be extracted and matched quickly, meet the requirement of real-time operation, and show good precision in bag-of-words-based place recognition.
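The combination of ORB extraction and semantic filtering described above can be sketched as follows. The Mask R-CNN inference is assumed to be available elsewhere and to return per-object class labels and binary masks; the set of dynamic categories and the `extract_static_orb` helper are illustrative, not part of the patent.

```python
import cv2
import numpy as np

# Assumed set of COCO categories treated as dynamic objects (not specified in the patent).
DYNAMIC_CLASSES = {"person", "car", "bicycle"}

def extract_static_orb(rgb, labels, masks):
    """Extract ORB features and discard keypoints falling inside dynamic-object masks.

    `labels` and `masks` are assumed to come from a COCO-trained Mask R-CNN:
    one class name and one image-sized binary mask per detected object.
    """
    orb = cv2.ORB_create(nfeatures=1000)
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    keypoints = orb.detect(gray, None)

    # Union of all masks whose class is declared dynamic.
    dynamic = np.zeros(gray.shape, dtype=bool)
    for label, mask in zip(labels, masks):
        if label in DYNAMIC_CLASSES:
            dynamic |= mask.astype(bool)

    h, w = gray.shape
    kept = []
    for kp in keypoints:
        u = min(int(round(kp.pt[0])), w - 1)
        v = min(int(round(kp.pt[1])), h - 1)
        if not dynamic[v, u]:            # keep only keypoints on static regions
            kept.append(kp)
    return orb.compute(gray, kept)       # (keypoints, descriptors) on static regions only
```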
Step S7, pose calculation
The visual odometry pose estimation is performed between two adjacent frames; the accumulation of many such inter-frame pose estimates is the motion trajectory of the camera.
The camera pose is calculated with a sparse SLAM algorithm model based on ORB (Oriented FAST and Rotated BRIEF) feature points.
After the feature points of the image frames are extracted, the camera pose is estimated with PnP based on the key frames.
PnP is the abbreviation of Perspective-n-Point, a method for solving 3D-to-2D point-pair motion: that is, how to solve the camera pose when n 3D space points and their projected positions are given.
Assume that at time k the camera position is x_k, the camera input data is u_k, and w_k is noise; the motion equation is constructed as:
x_k = f(x_{k-1}, u_k, w_k).
At x_k the landmark point y_j is observed locally, producing a series of observations z_{k,j}; with v_{k,j} as the observation noise, the observation equation is constructed as:
z_{k,j} = h(y_j, x_k, v_{k,j}).
In step S7 of the invention, the camera pose is first computed by solving the PnP problem, and a more accurate camera pose is then obtained by back-end pose-graph optimization.
In step S7 of the invention, the PnP problem of camera pose estimation is constructed as a nonlinear least-squares problem defined on the Lie algebra.
Furthermore, the camera pose estimation in step S7 of the present invention is formulated as a BA (Bundle Adjustment) problem, and the minimized reprojection error of the camera pose estimation is constructed:
ξ* = arg min_ξ (1/2) Σ_{i=1}^{n} || u_i − (1/s_i) K exp(ξ^) P_i ||_2^2
where u_i are the pixel coordinates, P_i are the camera coordinates of the feature points, ξ is the Lie algebra element corresponding to the camera pose, s_i is the depth of the feature point, K is the camera intrinsic parameter matrix, and n is the number of points.
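Assuming known 3D-2D correspondences between map points and the current frame, the initial pose of step S7 can be sketched with OpenCV's PnP solver; the subsequent back-end refinement (done with g2o in the experiments) is omitted here, and the helper names are illustrative.

```python
import cv2
import numpy as np

def initial_pose_pnp(points_3d, points_2d, K):
    """Solve the PnP problem: estimate R, t of the camera from n 3D points and their 2D projections."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64), points_2d.astype(np.float64),
        K, distCoeffs=None, flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)                       # rotation vector -> 3x3 matrix
    return R, tvec, inliers                          # tvec has shape (3, 1)

def reprojection_error(points_3d, points_2d, R, t, K):
    """Mean of || u_i - (1/s_i) K (R P_i + t) ||, the quantity minimized by bundle adjustment."""
    P_cam = (R @ points_3d.T + t).T                  # transform into the camera frame
    proj = (K @ P_cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]                # divide by the depth s_i
    return float(np.mean(np.linalg.norm(proj - points_2d, axis=1)))
```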
Further, step S7 of the present invention also includes relocalization with an embedded place recognition model based on DBoW (bag-of-words model) to recover from tracking failure, as well as re-initialization in known map scenes, loop detection and so on.
In the invention, the sparse SLAM algorithm model used to screen key frames and calculate the camera pose is obtained by improving on ORB-SLAM2 (the second-generation ORB-feature-based simultaneous localization and mapping system).
The SLAM algorithm model consists of 4 threads: a tracking thread, a local mapping thread, a loop detection thread and a global BA optimization thread.
Further, the global BA optimization thread is executed only after the loop detection thread has confirmed a loop.
The first three threads run in parallel and are defined as follows:
1) Tracking thread.
The camera is localized for every frame by finding feature matches with the local map and minimizing the reprojection error with motion-only BA.
Preferably, the matching is performed with a constant-velocity model.
2) Local mapping thread.
The local map is managed and optimized by performing local BA; the co-visibility relationship between key frames is maintained through MapPoints, and the poses of the co-visible key frames and the MapPoints are optimized by local BA.
3) Loop detection thread.
Large loops are detected and drift errors are corrected by performing pose-graph optimization; the screening of loop-closure candidate frames is accelerated with the bag-of-words model, the scale is optimized by Sim3 (the Sim3 transform is a similarity transform), and the Essential Graph and MapPoints are optimized by global BA.
The loop detection thread triggers the global BA optimization thread.
4) Global BA thread: after the pose graph has been optimized, the optimal structure and motion result of the whole system is computed.
Compared with dense SLAM algorithm models in the prior art, the sparse SLAM algorithm model of the invention adds rich semantic segmentation information of the images to the final mapping process through semantic information fusion.
Step S8, three-dimensional semantic mapping
Using the semantic segmentation result of step S6, combined with the frame pose information acquired in step S7 and the real three-dimensional coordinates of the image frame pixels, the semantic information is fed into the sparse SLAM algorithm model; objects with the same semantics in the frame images are projected into the three-dimensional point cloud map with the same label color, assisting the dense semantic map construction. When the traversal of the key frames is completed, the three-dimensional semantic mapping of the global point cloud map is realized.
Step S8 of the present invention further includes (see the sketch after this list):
s81, projecting the three-dimensional space pixels generated from the first key frame into an initial point cloud;
s82, generating a point cloud map from the three-dimensional space coordinates corresponding to each pixel of the current key frame, computed with the pinhole model;
s83, calculating the pose change between the current key frame and the previous key frame;
s84, superimposing and fusing the three-dimensional coordinate points of the two point cloud maps through the pose transformation matrix to generate a point cloud map containing more information;
s85, iterating the above steps continuously; when all the key frames have been traversed, the construction of the global point cloud map is completed.
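Steps S81-S85 can be sketched as the loop below, assuming each key frame provides its back-projected points (for example from a back-projection step like the one sketched earlier), per-point semantic label colors, and a 4x4 camera-to-world pose from the sparse SLAM model; the voxel size used for fusion is an assumption.

```python
import numpy as np

def build_global_map(keyframes):
    """Fuse per-keyframe point clouds into one global semantic point cloud.

    Each keyframe is assumed to be a dict with:
      'points' - N x 3 array in the camera frame,
      'colors' - N x 3 array of semantic label colors,
      'T_wc'   - 4 x 4 camera-to-world pose matrix estimated by the sparse SLAM model.
    """
    all_points, all_colors = [], []
    for kf in keyframes:
        pts_h = np.hstack([kf["points"], np.ones((len(kf["points"]), 1))])  # homogeneous coords
        pts_w = (kf["T_wc"] @ pts_h.T).T[:, :3]        # transform into the world frame
        all_points.append(pts_w)
        all_colors.append(kf["colors"])
    points = np.vstack(all_points)
    colors = np.vstack(all_colors)

    # Simple voxel-grid fusion: keep one point per 2 cm voxel to bound map size.
    voxel = 0.02
    _, idx = np.unique(np.floor(points / voxel).astype(np.int64), axis=0, return_index=True)
    return points[idx], colors[idx]
```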
The test results of the unmanned aerial vehicle semantic mapping method based on visual SLAM and two-dimensional semantic segmentation are described in further detail below with reference to a specific test.
The test is based on the Ubuntu 16.04 operating system and an Nvidia GeForce GTX 1050 graphics card, and uses software tools such as TensorFlow, OpenCV, g2o and the Point Cloud Library; a real scene serves as the experimental condition, and the data are captured with a Kinect V1 camera.
For the evaluation of the three-dimensional semantic map, Q_1 denotes the number of correctly detected objects, Q_2 denotes the number of objects that were detected but misclassified plus the objects that actually exist but were not detected, and Q_3 denotes the number of detections that do not correspond to real objects. P denotes the correct detection rate of three-dimensional objects and is computed as follows:
P = Q_1 / (Q_1 + Q_2 + Q_3)
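For reference, the metric can be computed directly as below; the counts are placeholders, not the values from Table 1.

```python
def detection_rate(q1, q2, q3):
    """P = Q1 / (Q1 + Q2 + Q3): correct detections over all detection outcomes."""
    return q1 / (q1 + q2 + q3)

print(f"{detection_rate(13, 8, 6):.4%}")   # placeholder counts, prints 48.1481%
```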
The experiment of constructing a dense semantic map was recorded 9 times; the calculated average correct detection rate of three-dimensional objects in the map is 48.1086%. The specific experimental results are shown in the following table:
TABLE 1
According to the semantic mapping method based on visual SLAM and two-dimensional semantic segmentation of the invention, a dense semantic map with dynamic objects removed is built from a Mask R-CNN algorithm model and a sparse SLAM algorithm model based on ORB feature points; inter-frame information and the semantic information of the image frames are used to improve the performance of the unmanned aerial vehicle semantic mapping system and to improve the robustness of feature point extraction and matching in dynamic scenes.
While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood and appreciated by those skilled in the art.
As used in this application and in the claims, the terms "a," "an," "the," and/or "the" are not specific to the singular, but may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the steps and elements are explicitly identified, and do not constitute an exclusive list, as other steps or elements may be included in a method or apparatus.
The embodiments described above are intended to enable those skilled in the art to make various modifications and variations without departing from the inventive concept; therefore, the scope of protection of the invention is not limited to the embodiments described above, but is to be accorded the broadest scope consistent with the innovative features recited in the claims.
Claims (10)
1. A semantic mapping method based on visual SLAM and two-dimensional semantic segmentation is characterized by comprising the following steps:
s1, calibrating camera parameters and correcting camera distortion;
s2, acquiring an image frame sequence, wherein the image frame sequence comprises an RGB image and a depth image;
s3, preprocessing an image, and obtaining coordinates of a real three-dimensional space point corresponding to each pixel point in the RGB image by adopting a pinhole camera model;
s4, judging whether the current image frame is a key frame, if so, turning to a step S6, and if not, turning to a step S5;
s5, performing dynamic blur compensation, and calculating the image block centroids of the current image frame as semantic feature points to supplement the ORB feature points;
s6, performing semantic segmentation on the image frame: extracting ORB feature points, performing semantic segmentation with a mask region convolutional neural network algorithm model, and obtaining the semantic information of each pixel of the frame image;
s7, pose calculation, namely calculating the pose of the camera by using a sparse SLAM algorithm model;
s8, inputting semantic information into a sparse SLAM algorithm model, assisting in dense semantic map construction, completing traversing of key frames, and realizing three-dimensional semantic map construction of a global point cloud map.
2. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the correcting camera distortion in step S1 further comprises the steps of:
s11, projecting a three-dimensional space point P(X, Y, Z) of the camera coordinate system onto the normalized image plane, giving the normalized coordinates [x, y]^T of the point;
S12, performing radial and tangential distortion correction on the point [x, y]^T on the normalized plane by the following formulas:
x_corrected = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2)
y_corrected = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y
where [x_corrected, y_corrected]^T are the corrected point coordinates, p_1, p_2 are the tangential distortion coefficients of the camera, k_1, k_2, k_3 are the radial distortion coefficients of the camera, and r is the distance of the point P from the origin of the coordinate system;
S13, projecting the corrected point [x_corrected, y_corrected]^T onto the pixel plane through the intrinsic parameter matrix to obtain its correct position [u, v]^T on the image, realized by the following formulas:
u = f_x · x_corrected + c_x
v = f_y · y_corrected + c_y
where f_x, f_y, c_x, c_y are the intrinsic parameters of the camera.
3. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 2, wherein the image preprocessing of step S3 further includes: the mapping relation from a pixel point [u, v]^T to the real three-dimensional space point P(X, Y, Z) satisfies the following formula:
Z [u, v, 1]^T = K P, with K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
where K is called the camera intrinsic parameter matrix, f_x, f_y, c_x, c_y are the intrinsic parameters of the camera, P is the real three-dimensional space point coordinate, and [u, v]^T is the pixel coordinate.
4. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the key frames of step S4 are screened using a sparse SLAM algorithm model.
5. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the centroid of the image block in step S5 is obtained by:
labeling each object in the frame image as a specific class;
for each segmented object there is a corresponding labeled region, and the segmented image is called an image block;
calculating the corresponding centroid C as a semantic feature point to supplement the ORB feature points, where C = (m_10/m_00, m_01/m_00) and m_pq = Σ_{(x, y) ∈ B} x^p y^q I(x, y) is the (p+q)-order moment of the image block B, with I(x, y) the pixel intensity.
6. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the step S6 further comprises:
the semantic information of each pixel point comprises a semantic classification label, bounding box coordinates and a confidence score of the classification;
and based on the semantic segmentation result, removing the ORB feature points extracted from any region whose designated category is a dynamic object.
7. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein performing semantic segmentation with the Mask R-CNN algorithm model in step S6 further comprises:
extracting features at different levels of the input image through a feature pyramid network;
proposing regions of interest through a region proposal network;
performing proposal region alignment using RoI Align (region-of-interest alignment);
performing mask segmentation using a fully convolutional network;
and determining region coordinates and performing category classification using fully connected layers.
8. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1 or claim 4, wherein the sparse SLAM algorithm model further comprises a tracking thread, a local mapping thread and a loop detection thread:
the tracking thread localizes the camera for every frame by finding feature matches with the local map and minimizing the reprojection error using motion-only bundle adjustment;
the local mapping thread manages and optimizes the local map by performing local bundle adjustment; the co-visibility relationship between key frames is maintained through map points, and the poses of co-visible key frames and the map points are optimized by local bundle adjustment;
the loop detection thread detects large loops and corrects accumulated drift by performing pose-graph optimization; it accelerates the screening of loop-closure candidate frames, optimizes the scale, and optimizes the essential graph and map points by global bundle adjustment.
9. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the sparse SLAM algorithm model further comprises a global bundle adjustment optimization thread, which is triggered after confirmation by the loop detection thread and, after the pose-graph optimization, computes the optimal structure and motion result of the whole system.
10. The semantic mapping method based on visual SLAM and two-dimensional semantic segmentation according to claim 1, wherein the pose calculation of step S7 further comprises: initially solving the camera pose through PnP, optimizing the camera pose with back-end pose-graph optimization, and constructing the minimized reprojection error of the camera pose estimation:
ξ* = arg min_ξ (1/2) Σ_{i=1}^{n} || u_i − (1/s_i) K exp(ξ^) P_i ||_2^2
where u_i are the pixel coordinates, P_i are the camera coordinates of the feature points, ξ is the Lie algebra element corresponding to the camera pose, s_i is the depth of the feature point, and K is the camera intrinsic parameter matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010246158.2A CN111462135B (en) | 2020-03-31 | 2020-03-31 | Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010246158.2A CN111462135B (en) | 2020-03-31 | 2020-03-31 | Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111462135A CN111462135A (en) | 2020-07-28 |
CN111462135B true CN111462135B (en) | 2023-04-21 |
Family
ID=71680957
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010246158.2A Active CN111462135B (en) | 2020-03-31 | 2020-03-31 | Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111462135B (en) |
Families Citing this family (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109559320B (en) * | 2018-09-18 | 2022-11-18 | 华东理工大学 | Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network |
CN111897332B (en) * | 2020-07-30 | 2022-10-11 | 国网智能科技股份有限公司 | Semantic intelligent substation robot humanoid inspection operation method and system |
CN111950561A (en) * | 2020-08-25 | 2020-11-17 | 桂林电子科技大学 | Semantic SLAM dynamic point removing method based on semantic segmentation |
CN112132893B (en) * | 2020-08-31 | 2024-01-09 | 同济人工智能研究院(苏州)有限公司 | Visual SLAM method suitable for indoor dynamic environment |
CN112017188B (en) * | 2020-09-09 | 2024-04-09 | 上海航天控制技术研究所 | Space non-cooperative target semantic recognition and reconstruction method |
CN112258575B (en) * | 2020-10-13 | 2022-12-02 | 浙江大学 | Method for quickly identifying object in synchronous positioning and map construction |
CN112344922B (en) * | 2020-10-26 | 2022-10-21 | 中国科学院自动化研究所 | Monocular vision odometer positioning method and system |
CN112446882A (en) * | 2020-10-28 | 2021-03-05 | 北京工业大学 | Robust visual SLAM method based on deep learning in dynamic scene |
CN112183476B (en) * | 2020-10-28 | 2022-12-23 | 深圳市商汤科技有限公司 | Obstacle detection method and device, electronic equipment and storage medium |
CN112348868A (en) * | 2020-11-06 | 2021-02-09 | 养哇(南京)科技有限公司 | Method and system for recovering monocular SLAM scale through detection and calibration |
CN112308921B (en) * | 2020-11-09 | 2024-01-12 | 重庆大学 | Combined optimization dynamic SLAM method based on semantics and geometry |
CN112465021B (en) * | 2020-11-27 | 2022-08-05 | 南京邮电大学 | Pose track estimation method based on image frame interpolation method |
CN112396595B (en) * | 2020-11-27 | 2023-01-24 | 广东电网有限责任公司肇庆供电局 | Semantic SLAM method based on point-line characteristics in dynamic environment |
CN112381841A (en) * | 2020-11-27 | 2021-02-19 | 广东电网有限责任公司肇庆供电局 | Semantic SLAM method based on GMS feature matching in dynamic scene |
CN112571415B (en) * | 2020-12-03 | 2022-03-01 | 哈尔滨工业大学(深圳) | Robot autonomous door opening method and system based on visual guidance |
CN112465858A (en) * | 2020-12-10 | 2021-03-09 | 武汉工程大学 | Semantic vision SLAM method based on probability grid filtering |
CN112509051A (en) * | 2020-12-21 | 2021-03-16 | 华南理工大学 | Bionic-based autonomous mobile platform environment sensing and mapping method |
CN112507056B (en) * | 2020-12-21 | 2023-03-21 | 华南理工大学 | Map construction method based on visual semantic information |
CN112734845B (en) * | 2021-01-08 | 2022-07-08 | 浙江大学 | Outdoor monocular synchronous mapping and positioning method fusing scene semantics |
CN112990195A (en) * | 2021-03-04 | 2021-06-18 | 佛山科学技术学院 | SLAM loop detection method for integrating semantic information in complex environment |
CN112991436B (en) * | 2021-03-25 | 2022-09-06 | 中国科学技术大学 | Monocular vision SLAM method based on object size prior information |
CN113034584B (en) * | 2021-04-16 | 2022-08-30 | 广东工业大学 | Mobile robot visual positioning method based on object semantic road sign |
CN113192200B (en) * | 2021-04-26 | 2022-04-01 | 泰瑞数创科技(北京)有限公司 | Method for constructing urban real scene three-dimensional model based on space-three parallel computing algorithm |
CN113516692B (en) * | 2021-05-18 | 2024-07-19 | 上海汽车集团股份有限公司 | SLAM method and device for multi-sensor fusion |
CN113537208B (en) * | 2021-05-18 | 2024-06-11 | 杭州电子科技大学 | Visual positioning method and system based on semantic ORB-SLAM technology |
CN113269831B (en) * | 2021-05-19 | 2021-11-16 | 北京能创科技有限公司 | Visual repositioning method, system and device based on scene coordinate regression network |
CN113674340A (en) * | 2021-07-05 | 2021-11-19 | 北京物资学院 | Binocular vision navigation method and device based on landmark points |
CN113610763B (en) * | 2021-07-09 | 2024-07-16 | 北京航天计量测试技术研究所 | Rocket engine structural member pose motion compensation method in vibration environment |
CN113808251B (en) * | 2021-08-09 | 2024-04-12 | 杭州易现先进科技有限公司 | Dense reconstruction method, system, device and medium based on semantic segmentation |
CN114730471A (en) * | 2021-08-13 | 2022-07-08 | 深圳市大疆创新科技有限公司 | Control method, control device, movable platform and storage medium |
CN113658257B (en) * | 2021-08-17 | 2022-05-27 | 广州文远知行科技有限公司 | Unmanned equipment positioning method, device, equipment and storage medium |
CN113674416B (en) * | 2021-08-26 | 2024-04-26 | 中国电子科技集团公司信息科学研究院 | Three-dimensional map construction method and device, electronic equipment and storage medium |
CN113903011B (en) * | 2021-10-26 | 2024-06-11 | 江苏大学 | Semantic map construction and positioning method suitable for indoor parking lot |
CN114202579B (en) * | 2021-11-01 | 2024-07-16 | 东北大学 | Dynamic scene-oriented real-time multi-body SLAM system |
CN114132360B (en) * | 2021-11-08 | 2023-09-08 | 卡斯柯信号有限公司 | Method, equipment and storage medium for preventing turnout from being squeezed based on image discrimination of turnout state |
CN114359493B (en) * | 2021-12-20 | 2023-01-03 | 中国船舶重工集团公司第七0九研究所 | Method and system for generating three-dimensional semantic map for unmanned ship |
CN114529800A (en) * | 2022-01-12 | 2022-05-24 | 华南理工大学 | Obstacle avoidance method, system, device and medium for rotor unmanned aerial vehicle |
CN114708321B (en) * | 2022-01-12 | 2024-06-21 | 北京航空航天大学 | Semantic-based camera pose estimation method and system |
CN114550186A (en) * | 2022-04-21 | 2022-05-27 | 北京世纪好未来教育科技有限公司 | Method and device for correcting document image, electronic equipment and storage medium |
CN114972470B (en) * | 2022-07-22 | 2022-11-18 | 北京中科慧眼科技有限公司 | Road surface environment obtaining method and system based on binocular vision |
CN115451939B (en) * | 2022-08-19 | 2024-05-07 | 中国人民解放军国防科技大学 | Parallel SLAM method under dynamic scene based on detection segmentation |
CN115164918B (en) * | 2022-09-06 | 2023-02-03 | 联友智连科技有限公司 | Semantic point cloud map construction method and device and electronic equipment |
CN116681755B (en) * | 2022-12-29 | 2024-02-09 | 广东美的白色家电技术创新中心有限公司 | Pose prediction method and device |
CN116342800B (en) * | 2023-02-21 | 2023-10-24 | 中国航天员科研训练中心 | Semantic three-dimensional reconstruction method and system for multi-mode pose optimization |
CN116339336A (en) * | 2023-03-29 | 2023-06-27 | 北京信息科技大学 | Electric agricultural machinery cluster collaborative operation method, device and system |
CN116817887B (en) * | 2023-06-28 | 2024-03-08 | 哈尔滨师范大学 | Semantic visual SLAM map construction method, electronic equipment and storage medium |
CN117392347B (en) * | 2023-10-13 | 2024-04-30 | 苏州煋海图科技有限公司 | Map construction method, device, computer equipment and readable storage medium |
CN117611762B (en) * | 2024-01-23 | 2024-04-30 | 常熟理工学院 | Multi-level map construction method, system and electronic equipment |
CN118447320B (en) * | 2024-05-13 | 2024-09-27 | 华智清创(苏州)农业科技有限公司 | Visual multitasking mounted agricultural inspection method and device based on deep learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097553A (en) * | 2019-04-10 | 2019-08-06 | 东南大学 | The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system |
WO2019169540A1 (en) * | 2018-03-06 | 2019-09-12 | 斯坦德机器人(深圳)有限公司 | Method for tightly-coupling visual slam, terminal and computer readable storage medium |
-
2020
- 2020-03-31 CN CN202010246158.2A patent/CN111462135B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019169540A1 (en) * | 2018-03-06 | 2019-09-12 | 斯坦德机器人(深圳)有限公司 | Method for tightly-coupling visual slam, terminal and computer readable storage medium |
CN110097553A (en) * | 2019-04-10 | 2019-08-06 | 东南大学 | The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system |
Non-Patent Citations (2)
Title |
---|
Emergency communication method for nuclear radiation monitoring based on BeiDou RDSS; Wang Tingyin et al.; Computer Systems & Applications; 2019-12-15 (No. 12); full text *
Augmented reality image registration technology based on semantic segmentation; Bian Xianzhang et al.; Electronic Technology & Software Engineering; 2018-12-13 (No. 23); full text *
Also Published As
Publication number | Publication date |
---|---|
CN111462135A (en) | 2020-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111462135B (en) | Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation | |
CN110097553B (en) | Semantic mapping system based on instant positioning mapping and three-dimensional semantic segmentation | |
CN110070615B (en) | Multi-camera cooperation-based panoramic vision SLAM method | |
CN111563415B (en) | Binocular vision-based three-dimensional target detection system and method | |
CN109166149B (en) | Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU | |
CN112734852B (en) | Robot mapping method and device and computing equipment | |
Zhu et al. | Online camera-lidar calibration with sensor semantic information | |
CN103886107B (en) | Robot localization and map structuring system based on ceiling image information | |
CN112396656A (en) | Outdoor mobile robot pose estimation method based on fusion of vision and laser radar | |
CN114140527B (en) | Dynamic environment binocular vision SLAM method based on semantic segmentation | |
CN111998862B (en) | BNN-based dense binocular SLAM method | |
CN113327296B (en) | Laser radar and camera online combined calibration method based on depth weighting | |
CN112101160B (en) | Binocular semantic SLAM method for automatic driving scene | |
US11703596B2 (en) | Method and system for automatically processing point cloud based on reinforcement learning | |
CN112801074A (en) | Depth map estimation method based on traffic camera | |
CN113744315B (en) | Semi-direct vision odometer based on binocular vision | |
CN117593650B (en) | Moving point filtering vision SLAM method based on 4D millimeter wave radar and SAM image segmentation | |
CN112767546B (en) | Binocular image-based visual map generation method for mobile robot | |
CN111899345B (en) | Three-dimensional reconstruction method based on 2D visual image | |
CN104166995B (en) | Harris-SIFT binocular vision positioning method based on horse pace measurement | |
CN114677531A (en) | Water surface unmanned ship target detection and positioning method fusing multi-mode information | |
CN115147344A (en) | Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance | |
CN116804553A (en) | Odometer system and method based on event camera/IMU/natural road sign | |
CN114648639B (en) | Target vehicle detection method, system and device | |
CN117409386A (en) | Garbage positioning method based on laser vision fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |