CN110415297B - Positioning method and device and unmanned equipment - Google Patents

Positioning method and device and unmanned equipment

Info

Publication number
CN110415297B
CN110415297B (application CN201910629969.8A)
Authority
CN
China
Prior art keywords
neural network
point cloud
visual image
target
positioning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910629969.8A
Other languages
Chinese (zh)
Other versions
CN110415297A (en)
Inventor
杨立荣 (Yang Lirong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN201910629969.8A priority Critical patent/CN110415297B/en
Publication of CN110415297A publication Critical patent/CN110415297A/en
Application granted granted Critical
Publication of CN110415297B publication Critical patent/CN110415297B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a positioning method, a positioning apparatus and an unmanned device. In one embodiment, the method comprises the following steps: extracting visual image features from the current target visual image; searching a pre-constructed feature library for target point cloud image features matching the visual image features, the feature library comprising point cloud image features extracted from multiple frames of laser point cloud data; determining target positioning data associated with the target point cloud image features; and determining current positioning information based on the target positioning data. In this embodiment, there is no need to first estimate a pose and then position based on the estimated pose, so the accuracy of the positioning information is improved and the positioning-accuracy requirement of the unmanned device is met.

Description

Positioning method and device and unmanned equipment
Technical Field
The application relates to the technical field of unmanned driving, in particular to a positioning method and device and unmanned equipment.
Background
At present, unmanned devices are usually positioned as follows. In certain specific area environments, visual images are acquired with a camera, and a pose label corresponding to each visual image is determined, yielding a training data set; a target model is then obtained by training on this data set. When the unmanned device is to be positioned, the visual image it currently acquires is input into the target model to obtain the device's current pose information, and the device is positioned based on that pose information. However, positioning in this way requires estimating the pose first and then positioning based on the estimated pose, so the resulting positioning information has low accuracy and can hardly meet the positioning-accuracy requirement of unmanned devices.
Disclosure of Invention
In order to solve one of the above technical problems, the present application provides a positioning method, a positioning device and an unmanned device.
According to a first aspect of embodiments of the present application, there is provided a positioning method, including:
extracting visual image features from the current target visual image;
searching a pre-constructed feature library for a target point cloud image feature matching the visual image features; the feature library comprises point cloud image features extracted from multiple frames of laser point cloud data;
determining target positioning data associated with the target point cloud image feature;
and determining current positioning information based on the target positioning data.
Optionally, the point cloud image features are extracted by using a pre-trained first target neural network;
the visual image features are extracted by utilizing a pre-trained second target neural network;
wherein the first target neural network and the second target neural network satisfy the following condition: for any positioning point, the difference between the point cloud image features extracted by the first target neural network from the laser point cloud data corresponding to that positioning point and the visual image features extracted by the second target neural network from the visual image corresponding to that positioning point is within a preset error range.
Optionally, the first target neural network and the second target neural network are both deep convolutional neural networks;
extracting point cloud image features with the first target neural network by: inputting a frame of laser point cloud data into the first target neural network, and outputting, by a deconvolution layer of the first target neural network, the point cloud image features corresponding to that frame of laser point cloud data;
extracting visual image features with the second target neural network by: inputting a frame of visual image into the second target neural network, and outputting, by a deconvolution layer of the second target neural network, the visual image features corresponding to that frame of visual image.
Optionally, the first target neural network and the second target neural network are trained by:
determining a sample sequence, wherein the sample sequence comprises, for each of a plurality of acquisition times, a frame of sample laser point cloud data, a frame of sample visual image and a set of semantic labels corresponding to that acquisition time;
iteratively executing the updating operation of the first neural network and the second neural network based on the sample sequence until a stopping condition is met, and respectively taking the first neural network and the second neural network after iterative updating as the first target neural network and the second target neural network; the update operation includes:
inputting sample laser point cloud data corresponding to a randomly selected acquisition time in the sample sequence into the current first neural network to obtain sample point cloud image features and first semantics; the sample point cloud image features and the first semantics are output by a deconvolution layer and an output layer of the current first neural network, respectively;
inputting the sample visual image corresponding to that acquisition time in the sample sequence into the current second neural network to obtain sample visual image features and second semantics; the sample visual image features and the second semantics are output by a deconvolution layer and an output layer of the current second neural network, respectively;
determining the semantic label corresponding to that acquisition time in the sample sequence as the current semantic label;
updating the current first neural network and the current second neural network according to the sample point cloud image features, the first semantics, the sample visual image features, the second semantics and the current semantic label;
wherein the stopping condition includes: convergence of a difference function between the first semantics and the current semantic label, convergence of a difference function between the second semantics and the current semantic label, and convergence of a difference function between the sample point cloud image features and the sample visual image features.
Optionally, before extracting the visual image feature for the current target visual image, the method further includes:
acquiring a currently acquired visual image;
and converting the currently acquired visual image into a target visual image meeting preset shooting conditions.
Optionally, the preset shooting condition includes preset illumination and/or a preset shooting angle.
Optionally, the converting the currently acquired visual image into a target visual image meeting a preset shooting condition includes:
inputting the currently acquired visual image into a pre-trained generator to obtain a target visual image satisfying the preset shooting condition; wherein the generator is obtained by training a generative adversarial network.
According to a second aspect of embodiments of the present application, there is provided a positioning apparatus, including:
the extraction module is used for extracting visual image features from the current target visual image;
the searching module is used for searching a pre-constructed feature library for the target point cloud image features matching the visual image features; the feature library comprises point cloud image features extracted from multiple frames of laser point cloud data;
the determining module is used for determining target positioning data associated with the target point cloud image features;
and the positioning module is used for determining the current positioning information based on the target positioning data.
According to a third aspect of embodiments of the present application, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the first aspect above.
According to a fourth aspect of embodiments of the present application, there is provided an unmanned device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of the first aspect above when executing the program.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
according to the positioning method and device provided by the embodiment of the application, the visual image features are extracted aiming at the current target visual image, the target point cloud image features matched with the visual image features are searched from a pre-constructed feature library, and the feature library comprises the point cloud image features extracted aiming at multi-frame laser point cloud data. And determining target positioning data associated with the target point cloud image features, and determining current positioning information based on the target positioning data. Therefore, the pose is not required to be estimated firstly, and then the positioning is carried out based on the estimated pose, so that the accuracy of the positioning information is improved, and the requirement of the unmanned equipment on the positioning accuracy is met.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart illustrating a method of positioning according to an exemplary embodiment of the present application;
FIG. 2 is a flow chart illustrating another positioning method according to an exemplary embodiment of the present application;
FIG. 3 is a block diagram of a positioning device shown in the present application according to an exemplary embodiment;
FIG. 4 is a block diagram of another positioning device shown in the present application according to an exemplary embodiment;
FIG. 5 is a block diagram of another positioning device shown in the present application according to an exemplary embodiment;
FIG. 6 is a schematic diagram of an unmanned device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
As shown in fig. 1, fig. 1 is a flow chart illustrating a positioning method according to an exemplary embodiment; the method may be applied in an unmanned device. Those skilled in the art will appreciate that the unmanned device may include, but is not limited to, an unmanned vehicle, an unmanned robot, an unmanned aerial vehicle, an unmanned ship, and the like. The method comprises the following steps:
in step 101, visual image features are extracted for a current target visual image.
In this embodiment, an image capturing device (e.g., a camera or the like) is installed on the unmanned device, and the image capturing device may be used to capture a visual image of the environment around the unmanned device in real time and perform positioning using the captured visual image.
In one implementation, the current visual image acquired by the image acquisition device can be used as the current target visual image. In another implementation manner, the current visual image acquired by the image acquisition device may be converted, and the obtained visual image after the conversion is used as the current target visual image. It is to be understood that the present application is not limited in this respect.
In this embodiment, the visual image features may be extracted for the current target visual image. The visual image features may be extracted in any reasonable manner known in the art and that may occur in the future. Optionally, a machine learning mode may be adopted, and a pre-trained neural network model is utilized to extract visual image features corresponding to the target visual image. It is to be understood that the present application is not limited in the particular manner in which the visual image features are extracted.
In step 102, the target point cloud image features matching the visual image features are searched for in a pre-constructed feature library.
In this embodiment, the feature library may include point cloud image features extracted from multiple frames of laser point cloud data. Specifically, the feature library may be constructed in advance as follows. First, sensors such as a laser radar and a positioning device are mounted on a piece of test equipment. The test equipment is then driven through the specific scene in which positioning with the method provided by the present application is required; while it drives, the laser radar collects laser point cloud data of the environment around the test equipment in real time, and the positioning device collects positioning data at the same moments the laser point cloud data is collected. Then, for each frame of collected laser point cloud data, the corresponding point cloud image features are extracted, and the point cloud image features corresponding to each frame of laser point cloud data are stored in association with the positioning data corresponding to that frame, yielding the feature library.
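A minimal Python sketch of this offline construction step is shown below. The function name `extract_point_cloud_feature`, the list-of-dicts layout and the tuple format of `frames` are illustrative assumptions, not details fixed by the present disclosure:

```python
import numpy as np

def build_feature_library(frames, extract_point_cloud_feature):
    """Assemble a feature library from (laser point cloud, positioning data) pairs.

    `frames` is an iterable of (point_cloud, positioning_data) tuples collected by
    the test equipment; `extract_point_cloud_feature` is the feature extractor
    (for example, the pre-trained first target neural network). Both names are
    assumptions used only for illustration.
    """
    library = []
    for point_cloud, positioning_data in frames:
        feature = extract_point_cloud_feature(point_cloud)  # point cloud image feature
        library.append({
            "feature": np.asarray(feature, dtype=np.float32),
            "positioning_data": positioning_data,           # stored in association
        })
    return library
```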
Wherein the point cloud image features may be extracted in any reasonable manner known in the art and that may occur in the future. Optionally, a machine learning mode may be adopted, and a pre-trained neural network model is used to extract point cloud image features corresponding to the laser point cloud data. It is to be understood that the present application is not limited to the specific manner of extracting the point cloud image features.
Alternatively, for a plurality of different regions, a feature library corresponding to each region may be constructed in advance. When positioning is performed, the feature library corresponding to the region where the unmanned device is located is selected.
It should be noted that the test equipment may be a manually driven device of the same type as the unmanned device to which the present application is applied. For example, if the unmanned device to which the present application is applied is an unmanned vehicle, the test equipment may be a vehicle of the same type as that unmanned vehicle. Likewise, if the unmanned device is an unmanned aerial vehicle, the test equipment may be a flight device of the same type, and if the unmanned device is an unmanned robot, the test equipment may be a robot of the same type.
It should be noted that, although the visual image and the laser point cloud data are different types of data, both express, in different ways, information about the surrounding environment and the surrounding objects. Consequently, the visual image and the laser point cloud data necessarily contain information of the same kind, and for a visual image and laser point cloud data collected at the same positioning point, the same characteristic information can be extracted from each. Feature extraction can therefore be performed on the visual image and on the laser point cloud data to obtain visual image features and point cloud image features, and these features can then be matched against each other for positioning.
In this embodiment, after the visual image features corresponding to the target visual image are obtained, the target point cloud image feature matching the visual image features may be searched for in the pre-constructed feature library. Specifically, each point cloud image feature in the feature library may be traversed and compared with the visual image features. The point cloud image feature with the highest similarity to the visual image features, provided that similarity is greater than or equal to a preset threshold, is selected as the target point cloud image feature matching the visual image features.
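A sketch of this matching step, assuming cosine similarity as the similarity measure and 0.8 as the preset threshold; the disclosure does not fix a particular metric or threshold, so both are illustrative choices:

```python
import numpy as np

def find_target_feature(visual_feature, library, min_similarity=0.8):
    """Traverse the feature library and return the best-matching entry.

    Returns the library entry whose point cloud image feature has the highest
    cosine similarity to `visual_feature`, provided that similarity is at least
    `min_similarity`; otherwise returns None. The associated positioning data is
    then available as entry["positioning_data"].
    """
    v = np.asarray(visual_feature, dtype=np.float32).ravel()
    v = v / (np.linalg.norm(v) + 1e-12)
    best_entry, best_sim = None, -1.0
    for entry in library:
        f = np.asarray(entry["feature"], dtype=np.float32).ravel()
        sim = float(np.dot(v, f / (np.linalg.norm(f) + 1e-12)))
        if sim > best_sim:
            best_entry, best_sim = entry, sim
    return best_entry if best_sim >= min_similarity else None
```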
In step 103, target positioning data associated with the target point cloud image features is determined.
In step 104, current positioning information is determined based on the target positioning data.
In this embodiment, when the feature library is constructed, the point cloud image features corresponding to each frame of laser point cloud data are stored in association with the positioning data corresponding to that frame, so the positioning data associated with the target point cloud image features can be looked up in the feature library and used as the target positioning data. The target positioning data may then be used directly as the current positioning information, or it may be slightly adjusted and the adjusted target positioning data used as the current positioning information.
According to the positioning method provided by this embodiment of the application, visual image features are extracted from the current target visual image, and target point cloud image features matching the visual image features are searched for in a pre-constructed feature library, the feature library comprising point cloud image features extracted from multiple frames of laser point cloud data. Target positioning data associated with the target point cloud image features is then determined, and current positioning information is determined based on the target positioning data. As a result, there is no need to first estimate a pose and then position based on the estimated pose, so the accuracy of the positioning information is improved and the positioning-accuracy requirement of the unmanned device is met.
In some optional embodiments, the point cloud image features are extracted using a pre-trained first target neural network, and the visual image features are extracted using a pre-trained second target neural network. The first target neural network and the second target neural network satisfy the following condition: for any positioning point, the difference between the point cloud image features extracted by the first target neural network from the laser point cloud data corresponding to that positioning point and the visual image features extracted by the second target neural network from the visual image corresponding to that positioning point is within a preset error range.
Specifically, in this embodiment, when the feature library is pre-constructed, corresponding point cloud image features need to be extracted for each frame of collected laser point cloud data, and the point cloud image features corresponding to each frame of laser point cloud data may be extracted by using a pre-trained first target neural network. When positioning is performed based on the feature library, the visual image features need to be extracted for the current target visual image, and the visual image features corresponding to the target visual image can be extracted by using a pre-trained second target neural network.
The first target neural network and the second target neural network can be obtained by training them together, and they need to satisfy the following condition: for any positioning point (for example, a positioning point in a specified scene in which the method provided by the present application is used for positioning), if the first target neural network is used to extract point cloud image features from the laser point cloud data corresponding to that positioning point, and the second target neural network is used to extract visual image features from the visual image corresponding to that positioning point, then the difference between the point cloud image features extracted by the first target neural network and the visual image features extracted by the second target neural network is within a preset error range.
It should be noted that, if the image acquisition device and the laser radar are both installed on the test equipment, then for any positioning point the test equipment uses the image acquisition device and the laser radar simultaneously to acquire the visual image and the laser point cloud data at that positioning point. The laser point cloud data acquired by the test equipment at the positioning point is the laser point cloud data corresponding to that positioning point, and the visual image acquired by the test equipment at the positioning point is the visual image corresponding to that positioning point.
Because this embodiment uses the first target neural network and the second target neural network to extract the point cloud image features and the visual image features respectively, the extracted features are high-dimensional, have strong expressive power, contain richer information and are more robust to interference. In addition, because the first target neural network and the second target neural network satisfy the above condition, when the target point cloud image features matching the visual image features are searched for in the feature library, the search can be performed directly and quickly based on the similarity between the visual image features and the point cloud image features, improving search efficiency and search accuracy.
In other optional embodiments, the first target neural network and the second target neural network may further satisfy the following condition: for any positioning point, a first semantic segmentation result is obtained by the first target neural network for the laser point cloud data corresponding to that positioning point (the laser point cloud data corresponding to the positioning point is input into the first target neural network, and the first semantic segmentation result is output by the output layer of the first target neural network), a second semantic segmentation result is obtained by the second target neural network for the visual image corresponding to that positioning point (the visual image corresponding to the positioning point is input into the second target neural network, and the second semantic segmentation result is output by the output layer of the second target neural network), and the difference between the first semantic segmentation result and the second semantic segmentation result is within a preset difference range.
In this embodiment, because the first target neural network and the second target neural network further satisfy the above condition, the feature information expressed by the visual image features and by the point cloud image features extracted with the two networks is even closer. When the target point cloud image features matching the visual image features are searched for in the feature library, this further improves search efficiency and search accuracy.
In other alternative embodiments, the first target neural network and the second target neural network are both deep convolutional neural networks.
In the present embodiment, the first target neural network may be a deep convolutional neural network having a function of processing a three-dimensional image, and the second target neural network may be a deep convolutional neural network having a function of processing a two-dimensional image.
In this embodiment, the point cloud image features may be extracted with the first target neural network as follows: a frame of laser point cloud data is input into the first target neural network, and the point cloud image features corresponding to that frame of laser point cloud data are output by a deconvolution layer of the first target neural network.
In this embodiment, the visual image features may be extracted with the second target neural network as follows: a frame of visual image is input into the second target neural network, and the visual image features corresponding to that frame of visual image are output by a deconvolution layer of the second target neural network.
In this embodiment, the point cloud image features and the visual image features are produced by the deconvolution layers of the first target neural network and the second target neural network, respectively, so they retain more object-edge detail. This improves search efficiency and search accuracy when the target point cloud image features matching the visual image features are searched for in the feature library.
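A minimal PyTorch-style sketch of a two-dimensional network of the kind described here for the second target neural network: the deconvolution (transposed-convolution) layer supplies the visual image features and a final output layer supplies the per-pixel semantics. The layer sizes, channel counts and class count are assumptions made only for illustration:

```python
import torch
import torch.nn as nn

class VisualFeatureNet(nn.Module):
    """Illustrative 2D deep convolutional network: the deconvolution layer's output
    is used as the visual image feature; the output layer gives semantics."""

    def __init__(self, num_classes: int = 10):  # num_classes is an assumed value
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # deconvolution (transposed convolution) layer -> feature map
        self.deconv = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
        # output layer -> per-pixel semantic segmentation logits
        self.output = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, image):
        x = self.encoder(image)
        feature = self.deconv(x)                      # visual image feature
        semantics = self.output(torch.relu(feature))  # semantic segmentation
        return feature, semantics
```

The first target neural network would follow the same pattern with convolutions suited to the laser point cloud input; its exact architecture is not specified by the disclosure.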
In other alternative embodiments, the first target neural network and the second target neural network are trained together using the same training data set, and the training processes of the first target neural network and the second target neural network are not independent.
Specifically, a laser radar and an image acquisition device are first mounted on sample collection equipment. The sample collection equipment is then driven through the specific scene, and at a plurality of preset acquisition times the laser radar and the image acquisition device collect, respectively, sample laser point cloud data and sample visual images of the surrounding environment. For each acquisition time, a set of semantic labels corresponding to that time is then obtained; the semantic labels may be annotated image semantic segmentation results for the sample laser point cloud data and the sample visual image corresponding to that acquisition time. Finally, a sample sequence is obtained: the sample sequence covers the plurality of acquisition times and includes sample data for each acquisition time, where the sample data for any acquisition time comprises one frame of sample laser point cloud data, one frame of sample visual image and one set of semantic labels.
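One possible in-code representation of such a sample sequence; the field names and types are illustrative assumptions rather than part of the disclosure:

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Sample:
    """Sample data for one acquisition time: one frame of sample laser point cloud
    data, one frame of sample visual image, and one set of semantic labels."""
    timestamp: float
    point_cloud: Any      # e.g. an (N, 3) array of laser points
    visual_image: Any     # e.g. an (H, W, 3) image array
    semantic_labels: Any  # annotated semantic segmentation for this acquisition time

# The sample sequence is simply the list of per-acquisition-time samples.
SampleSequence = List[Sample]
```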
It should be noted that the sample collection device may be a manually driven device, which is the same type of device as the unmanned device to which the present application is applied.
In one implementation, a first target neural network and a second target neural network may be trained simultaneously based on the sequence of samples. Specifically, based on the sample sequence, the updating operation on the first neural network and the second neural network can be iteratively performed until a stopping condition is satisfied, and the first neural network and the second neural network after the iterative updating are respectively used as a first target neural network and a second target neural network.
The update operation includes the following. First, a frame of sample laser point cloud data corresponding to a randomly selected acquisition time in the sample sequence is input into the current first neural network to obtain sample point cloud image features and first semantics, where the sample point cloud image features are output by a deconvolution layer of the current first neural network and the first semantics are output by an output layer of the current first neural network. The frame of sample visual image corresponding to the same acquisition time in the sample sequence is input into the current second neural network to obtain sample visual image features and second semantics, where the sample visual image features are output by a deconvolution layer of the current second neural network and the second semantics are output by an output layer of the current second neural network. The semantic label corresponding to that acquisition time is taken as the current semantic label.
Then, the current first neural network and the current second neural network are updated (that is, the network parameters of both networks are adjusted) according to the sample point cloud image features, the first semantics, the sample visual image features, the second semantics and the current semantic label. After the update, whether the stopping condition is satisfied is checked. If the stopping condition is not satisfied, the update operation continues; if it is satisfied, the update operation stops, and the iteratively updated first neural network and second neural network are taken as the first target neural network and the second target neural network.
The stopping condition includes: convergence of a difference function between the first semantics and the current semantic label, convergence of a difference function between the second semantics and the current semantic label, and convergence of a difference function between the sample point cloud image features and the sample visual image features.
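A hedged sketch of one such update step, assuming both networks expose the same (feature, semantics) interface as the 2D sketch above, that their feature maps have matching shapes, and that cross-entropy and mean-squared error serve as the difference functions; none of these loss choices are fixed by the disclosure:

```python
import torch
import torch.nn.functional as F

def update_step(net1, net2, optimizer, point_cloud, image, labels):
    """One joint update: semantic losses for both networks plus a
    feature-consistency loss between point cloud and visual features.

    `labels` are class indices at the same spatial resolution as the semantic
    outputs; `optimizer` covers the parameters of both networks, e.g.
    torch.optim.Adam(list(net1.parameters()) + list(net2.parameters())).
    """
    pc_feature, first_semantics = net1(point_cloud)   # current first neural network
    vi_feature, second_semantics = net2(image)        # current second neural network

    loss = (
        F.cross_entropy(first_semantics, labels)      # first semantics vs. label
        + F.cross_entropy(second_semantics, labels)   # second semantics vs. label
        + F.mse_loss(pc_feature, vi_feature)          # feature difference term
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training stops once all three loss terms have converged, and the updated networks are then used as the first and second target neural networks.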
Because this implementation trains the first target neural network and the second target neural network together, the visual image features extracted by the second target neural network carry richer information and stay consistent in dimension with the point cloud image features extracted by the first target neural network. This improves search efficiency and search accuracy when the target point cloud image features matching the visual image features are searched for in the feature library.
In another implementation, the sample sequence may first be used to train the first target neural network, and the second target neural network may then be trained based on the sample sequence and the first target neural network; or the sample sequence may first be used to train the second target neural network, and the first target neural network may then be trained based on the sample sequence and the second target neural network.
It is understood that the first target neural network and the second target neural network may be obtained by other training methods, and the present application is not limited to the specific training methods for the first target neural network and the second target neural network.
As shown in FIG. 2, FIG. 2 is a flow chart of another positioning method according to an exemplary embodiment, describing the process of obtaining the target visual image. The method may be applied in an unmanned device and comprises the following steps:
in step 201, a currently acquired visual image is acquired.
In this embodiment, an image acquisition device is installed on the unmanned device; it can be used to acquire visual images of the environment around the unmanned device in real time, and positioning is performed using the acquired visual images. During positioning, the visual image currently acquired by the image acquisition device is obtained.
In step 202, the currently captured visual image is converted into a target visual image satisfying a preset shooting condition.
In this embodiment, the inventors found that different shooting conditions (for example, illumination or the image shooting angle) have a large influence on the visual image features extracted from a visual image. Under different shooting conditions, the visual image features corresponding to visual images taken at the same positioning point lack uniformity, which affects the final positioning result. Therefore, the currently acquired visual image can be converted into a target visual image that satisfies a preset shooting condition, so that the visual image features corresponding to target visual images at the same positioning point differ less across shooting conditions. The preset shooting condition may include preset illumination, a preset shooting angle, or both.
Specifically, the currently acquired visual image may be input into a pre-trained generator to obtain a target visual image, output by the generator, that satisfies the preset shooting condition. The generator can be obtained by training a generative adversarial network.
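A sketch of this conversion step, assuming `generator` is a pre-trained PyTorch module obtained from training a generative adversarial network and that it maps a single image tensor to its normalized counterpart; these interface details are assumptions for illustration:

```python
import torch

def normalize_visual_image(generator, image_tensor):
    """Convert a currently acquired image into a target visual image that
    satisfies the preset shooting condition (e.g. canonical illumination),
    using a generator obtained by training a generative adversarial network."""
    generator.eval()
    with torch.no_grad():
        target = generator(image_tensor.unsqueeze(0))  # add batch dimension
    return target.squeeze(0)
```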
In step 203, visual image features are extracted for the current target visual image.
In step 204, the target point cloud image features matching the visual image features are searched for in the pre-constructed feature library.
In step 205, target positioning data associated with the target point cloud image features is determined.
In step 206, current positioning information is determined based on the target positioning data.
It should be noted that, for the same steps as in the embodiment of fig. 1, details are not repeated in the embodiment of fig. 2, and related contents may refer to the embodiment of fig. 1.
According to the positioning method provided by this embodiment of the application, the currently acquired visual image is obtained and converted into a target visual image satisfying a preset shooting condition, visual image features are extracted from the current target visual image, target point cloud image features matching the visual image features are searched for in a pre-constructed feature library, target positioning data associated with the target point cloud image features is determined, and current positioning information is determined based on the target positioning data. Because this embodiment converts the currently acquired visual image into a target visual image that satisfies the preset shooting condition, the visual images are unified, the influence of shooting conditions on positioning is avoided, and positioning accuracy is improved.
It should be noted that although in the above embodiments, the operations of the methods of the present application were described in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Corresponding to the embodiment of the positioning method, the application also provides an embodiment of the positioning device.
As shown in fig. 3, fig. 3 is a block diagram of a positioning apparatus according to an exemplary embodiment of the present application, and the apparatus may include: an extraction module 301, a searching module 302, a determining module 303 and a positioning module 304.
The extraction module 301 is configured to extract visual image features from the current target visual image.
The searching module 302 is configured to search a pre-constructed feature library for target point cloud image features matching the visual image features. The feature library comprises point cloud image features extracted from multiple frames of laser point cloud data.
The determining module 303 is configured to determine target positioning data associated with the target point cloud image features.
The positioning module 304 is configured to determine current positioning information based on the target positioning data.
In some optional embodiments, the point cloud image features are extracted by using a first pre-trained target neural network, and the visual image features are extracted by using a second pre-trained target neural network.
Wherein the first target neural network and the second target neural network satisfy the following condition: for any positioning point, the difference between the point cloud image features extracted by the first target neural network from the laser point cloud data corresponding to that positioning point and the visual image features extracted by the second target neural network from the visual image corresponding to that positioning point is within a preset error range.
In other alternative embodiments, the first target neural network and the second target neural network are both deep convolutional neural networks.
The point cloud image features may be extracted using a first target neural network by: inputting a frame of laser point cloud data into a first target neural network, and outputting point cloud image features corresponding to the frame of laser point cloud data by a deconvolution layer of the first target neural network.
The visual image feature may be extracted using the second target neural network by: and inputting a frame of visual image into a second target neural network, and outputting visual image characteristics corresponding to the frame of visual image by a deconvolution layer of the second target neural network.
In other alternative embodiments, the first target neural network and the second target neural network are trained by: determining a sample sequence, wherein the sample sequence comprises, for each of a plurality of acquisition times, a frame of sample laser point cloud data, a frame of sample visual image and a set of semantic labels corresponding to that acquisition time; and, based on the sample sequence, iteratively performing an update operation on the first neural network and the second neural network until a stopping condition is satisfied, and taking the iteratively updated first neural network and second neural network as the first target neural network and the second target neural network, respectively.
The update operation includes: inputting sample laser point cloud data corresponding to a randomly selected acquisition time in the sample sequence into the current first neural network to obtain sample point cloud image features and first semantics, the sample point cloud image features and the first semantics being output by a deconvolution layer and an output layer of the current first neural network, respectively; inputting the sample visual image corresponding to that acquisition time in the sample sequence into the current second neural network to obtain sample visual image features and second semantics, the sample visual image features and the second semantics being output by a deconvolution layer and an output layer of the current second neural network, respectively; determining the semantic label corresponding to that acquisition time in the sample sequence as the current semantic label; and updating the current first neural network and the current second neural network according to the sample point cloud image features, the first semantics, the sample visual image features, the second semantics and the current semantic label.
The stopping condition includes: convergence of a difference function between the first semantics and the current semantic label, convergence of a difference function between the second semantics and the current semantic label, and convergence of a difference function between the sample point cloud image features and the sample visual image features.
As shown in fig. 4, fig. 4 is a block diagram of another positioning apparatus according to an exemplary embodiment of the present application, and the apparatus according to the embodiment shown in fig. 3 may further include: an acquisition module 305 and a conversion module 306.
The acquiring module 305 is configured to acquire a currently acquired visual image.
And a converting module 306, configured to convert the currently acquired visual image into a target visual image meeting a preset shooting condition.
In other alternative embodiments, the preset photographing condition may include a preset illumination and/or a preset photographing angle.
As shown in fig. 5, fig. 5 is a block diagram of another positioning apparatus according to an exemplary embodiment of the present application; on the basis of the embodiment shown in fig. 4, the converting module 306 may include an input sub-module 501.
The input sub-module 501 is configured to input the currently acquired visual image into a pre-trained generator to obtain a target visual image satisfying the preset shooting condition, wherein the generator is obtained by training a generative adversarial network.
It should be understood that the above-described apparatus may be preset in the unmanned device, or may be loaded into the unmanned device by downloading or the like. The corresponding modules in the above-described apparatus may cooperate with modules in the unmanned device to implement the positioning solution.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
An embodiment of the present application further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program can be used to execute the positioning method provided in any one of the embodiments of fig. 1 to fig. 2.
Corresponding to the positioning method, an embodiment of the present application also provides a schematic structural diagram of the unmanned device according to an exemplary embodiment of the present application, shown in fig. 6. Referring to fig. 6, at the hardware level, the unmanned device includes a processor, an internal bus, a network interface, a memory and a non-volatile memory, and may of course also include hardware required for other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it, forming the positioning apparatus at the logic level. Of course, besides a software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or logic devices.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (9)

1. A method of positioning, the method comprising:
extracting visual image features from the current target visual image, wherein the visual image features are extracted using a pre-trained second target neural network;
searching a pre-constructed feature library for a target point cloud image feature matching the visual image features; wherein the feature library comprises point cloud image features extracted from multiple frames of laser point cloud data, the point cloud image features corresponding to each frame of laser point cloud data are stored in the feature library in association with the positioning data corresponding to that frame of laser point cloud data, the point cloud image features are extracted using a pre-trained first target neural network, and the first target neural network and the second target neural network satisfy the following condition: for any positioning point, the difference between the point cloud image features extracted by the first target neural network from the laser point cloud data corresponding to that positioning point and the visual image features extracted by the second target neural network from the visual image corresponding to that positioning point is within a preset error range, the second target neural network being a deep convolutional neural network having a function of processing two-dimensional images;
determining target positioning data associated with the target point cloud image features;
and determining current positioning information based on the target positioning data.
2. The method of claim 1, wherein the first target neural network and the second target neural network are both deep convolutional neural networks;
extracting point cloud image features with the first target neural network by: inputting a frame of laser point cloud data into the first target neural network, and outputting point cloud image features corresponding to the frame of laser point cloud data by a deconvolution layer of the first target neural network;
extracting visual image features using the second target neural network by: inputting a frame of visual image into the second target neural network, and outputting, by the deconvolution layer of the second target neural network, the visual image features corresponding to that frame of visual image.
3. The method of claim 2, wherein the first target neural network and the second target neural network are trained by:
determining a sample sequence, wherein the sample sequence comprises, for each of a plurality of acquisition times, a frame of sample laser point cloud data, a frame of sample visual image and a set of semantic labels corresponding to that acquisition time;
iteratively executing the updating operation of the first neural network and the second neural network based on the sample sequence until a stopping condition is met, and respectively taking the first neural network and the second neural network after iterative updating as the first target neural network and the second target neural network; the update operation includes:
inputting sample laser point cloud data corresponding to a randomly selected acquisition time in the sample sequence into the current first neural network to obtain sample point cloud image features and first semantics; the sample point cloud image features and the first semantics are output by a deconvolution layer and an output layer of the current first neural network, respectively;
inputting the sample visual image corresponding to that acquisition time in the sample sequence into the current second neural network to obtain sample visual image features and second semantics; the sample visual image features and the second semantics are output by a deconvolution layer and an output layer of the current second neural network, respectively;
determining the semantic label corresponding to that acquisition time in the sample sequence as the current semantic label;
updating the current first neural network and the current second neural network according to the sample point cloud image features, the first semantics, the sample visual image features, the second semantics and the current semantic label;
wherein the stopping condition includes: convergence of a difference function between the first semantics and the current semantic label, convergence of a difference function between the second semantics and the current semantic label, and convergence of a difference function between the sample point cloud image features and the sample visual image features.
4. The method according to any one of claims 1-3, further comprising, before extracting the visual image features from the current target visual image:
acquiring a currently acquired visual image;
and converting the currently acquired visual image into a target visual image meeting preset shooting conditions.
5. The method according to claim 4, wherein the preset shooting condition comprises a preset illumination and/or a preset shooting angle.
6. The method according to claim 4, wherein the converting the currently acquired visual image into the target visual image satisfying a preset shooting condition comprises:
inputting the currently acquired visual image into a pre-trained generator to obtain a target visual image satisfying the preset shooting condition; wherein the generator is obtained by training a generative adversarial network.
7. A positioning device, the device comprising:
the extraction module is used for extracting visual image features from the current target visual image, wherein the visual image features are extracted using a pre-trained second target neural network;
the searching module is used for searching a pre-constructed feature library for a target point cloud image feature matching the visual image features; wherein the feature library comprises point cloud image features extracted from multiple frames of laser point cloud data, the point cloud image features corresponding to each frame of laser point cloud data are stored in the feature library in association with the positioning data corresponding to that frame of laser point cloud data, the point cloud image features are extracted using a pre-trained first target neural network, and the first target neural network and the second target neural network satisfy the following condition: for any positioning point, the difference between the point cloud image features extracted by the first target neural network from the laser point cloud data corresponding to that positioning point and the visual image features extracted by the second target neural network from the visual image corresponding to that positioning point is within a preset error range, the second target neural network being a deep convolutional neural network having a function of processing two-dimensional images;
the determining module is used for determining target positioning data associated with the target point cloud image features;
and the positioning module is used for determining the current positioning information based on the object positioning data.
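Claim 7's modules map naturally onto a small retrieval pipeline: extract a visual feature for the current image, search the feature library for the closest stored point cloud feature, and read off the associated positioning data. The sketch below is an illustrative assumption (flat L2 nearest-neighbour search, NumPy data layout, hypothetical names); the patent itself only requires that the matched features differ within a preset error range.

```python
# Hypothetical sketch of the positioning pipeline in claim 7.
import numpy as np

class FeatureLibrary:
    """Stores one point cloud image feature per laser point cloud frame,
    associated with that frame's positioning data."""
    def __init__(self):
        self.features = []       # list of 1-D feature vectors
        self.positions = []      # positioning data aligned with self.features

    def add(self, point_cloud_feature: np.ndarray, positioning_data):
        self.features.append(point_cloud_feature)
        self.positions.append(positioning_data)

    def search(self, visual_feature: np.ndarray):
        """Return the positioning data whose point cloud feature is closest
        to the query visual feature (smallest L2 distance)."""
        stacked = np.stack(self.features)
        distances = np.linalg.norm(stacked - visual_feature, axis=1)
        return self.positions[int(np.argmin(distances))]

def locate(extract_visual_feature, library: FeatureLibrary, current_image):
    """extract_visual_feature stands in for the pre-trained second target
    neural network; it must map an image into the same feature space as the
    point cloud features stored in the library."""
    visual_feature = extract_visual_feature(current_image)
    target_positioning_data = library.search(visual_feature)
    return target_positioning_data   # current positioning information is derived from this

# Usage with dummy data:
library = FeatureLibrary()
for i in range(10):
    library.add(np.random.rand(128), positioning_data=(i * 1.0, i * 2.0))
pose = locate(lambda img: np.random.rand(128), library, current_image=None)
```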
8. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-6.
9. An unmanned device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of claims 1-6.
CN201910629969.8A 2019-07-12 2019-07-12 Positioning method and device and unmanned equipment Active CN110415297B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910629969.8A CN110415297B (en) 2019-07-12 2019-07-12 Positioning method and device and unmanned equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910629969.8A CN110415297B (en) 2019-07-12 2019-07-12 Positioning method and device and unmanned equipment

Publications (2)

Publication Number Publication Date
CN110415297A (en) 2019-11-05
CN110415297B (en) 2021-11-05

Family

ID=68361268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910629969.8A Active CN110415297B (en) 2019-07-12 2019-07-12 Positioning method and device and unmanned equipment

Country Status (1)

Country Link
CN (1) CN110415297B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111765892B (en) * 2020-05-12 2022-04-29 驭势科技(北京)有限公司 Positioning method, positioning device, electronic equipment and computer readable storage medium
CN111649724B (en) * 2020-06-04 2022-09-06 百度在线网络技术(北京)有限公司 Visual positioning method and device based on mobile edge calculation
CN111856441B (en) * 2020-06-09 2023-04-25 北京航空航天大学 Train positioning method based on vision and millimeter wave radar fusion
CN111722245B (en) 2020-06-22 2023-03-10 阿波罗智能技术(北京)有限公司 Positioning method, positioning device and electronic equipment
CN112068570B (en) * 2020-09-18 2024-09-06 拉扎斯网络科技(上海)有限公司 Robot movement control method and device and robot
CN116664812B (en) * 2022-11-30 2024-06-07 荣耀终端有限公司 Visual positioning method, visual positioning system and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104574386A (en) * 2014-12-26 2015-04-29 速感科技(北京)有限公司 Indoor positioning method based on three-dimensional environment model matching
CN108406731A (en) * 2018-06-06 2018-08-17 珠海市微半导体有限公司 A kind of positioning device, method and robot based on deep vision
CN108416808A (en) * 2018-02-24 2018-08-17 斑马网络技术有限公司 The method and device of vehicle reorientation
CN109241988A (en) * 2018-07-16 2019-01-18 北京市商汤科技开发有限公司 Feature extracting method and device, electronic equipment, storage medium, program product
CN109658457A (en) * 2018-11-02 2019-04-19 浙江大学 A kind of scaling method of laser and any relative pose relationship of camera
US10275691B2 (en) * 2017-08-22 2019-04-30 Northrop Grumman Systems Corporation Adaptive real-time detection and examination network (ARDEN)
CN109815893A (en) * 2019-01-23 2019-05-28 中山大学 The normalized method in colorized face images illumination domain of confrontation network is generated based on circulation

Also Published As

Publication number Publication date
CN110415297A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
CN110415297B (en) Positioning method and device and unmanned equipment
Akyon et al. Slicing aided hyper inference and fine-tuning for small object detection
CN111222395B (en) Target detection method and device and electronic equipment
CN109753928B (en) Method and device for identifying illegal buildings
US11983245B2 (en) Unmanned driving behavior decision-making and model training
CN110428490B (en) Method and device for constructing model
CN113038018B (en) Method and device for assisting user in shooting vehicle video
CN112487899B (en) Target identification method and system based on unmanned aerial vehicle, storage medium and electronic equipment
CN109190504B (en) Automobile image data processing method and device and readable storage medium
CN109377494B (en) Semantic segmentation method and device for image
CN109685847B (en) Training method and device for visual target detection model
CN112364843A (en) Plug-in aerial image target positioning detection method, system and equipment
CN113515655A (en) Fault identification method and device based on image classification
CN113284144B (en) Tunnel detection method and device based on unmanned aerial vehicle
CN116469079A (en) Automatic driving BEV task learning method and related device
CN112465856A (en) Unmanned aerial vehicle-based ship track correction method and device and electronic equipment
CN117975276A (en) Real-time battlefield three-dimensional scene and target perception method and system based on binocular vision
CN113792797A (en) Point cloud data screening method and storage medium
CN112668596B (en) Three-dimensional object recognition method and device, recognition model training method and device
Li et al. Driver drowsiness behavior detection and analysis using vision-based multimodal features for driving safety
CN115273013B (en) Lane line detection method, system, computer and readable storage medium
CN116259043A (en) Automatic driving 3D target detection method and related device
CN113592800B (en) Image scanning method and device based on dynamic scanning parameters
CN113487590B (en) Block processing method, device, computing equipment and storage medium
CN112116534A (en) Ghost eliminating method based on position information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant