CN111366917B - Method, device and equipment for detecting travelable area and computer readable storage medium - Google Patents
- Publication number: CN111366917B
- Application number: CN202010175382.7A
- Authority
- CN
- China
- Prior art keywords
- target image
- image
- grid
- projection
- camera
- Prior art date
- Legal status: Active (assumed, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S11/00—Systems for determining distance or velocity not using reflection or reradiation
- G01S11/12—Systems for determining distance or velocity not using reflection or reradiation using electromagnetic waves other than radio waves
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/10—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
- G01C21/12—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
- G01C21/16—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
- G01C21/165—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/16—Anti-collision systems
Abstract
The application discloses a method, an apparatus, a device, and a computer-readable storage medium for detecting a drivable area, relating to the technical field of autonomous parking. The specific implementation is as follows: a matching image and a target image are acquired by photographing the area surrounding a vehicle with a vehicle-mounted monocular camera, the matching image being the frame immediately preceding the target image; the target image is then three-dimensionally reconstructed from the matching image to obtain target three-dimensional point cloud data corresponding to the target image, so that three-dimensional point cloud data are reconstructed from consecutive frames captured by the monocular camera; the nearest obstacle points in the shooting area of the target image are then determined from the target three-dimensional point cloud data, and the drivable area in that shooting area is determined from the nearest obstacle points. Neither a three-dimensional camera nor a large set of image samples for machine learning is required, which reduces the dependence on the image acquisition mode, lowers the processing difficulty, and improves the accuracy and reliability of the detection.
Description
Technical Field
The embodiments of the present application relate to the technical field of image processing, and in particular to autonomous parking technology.
Background
In order to prevent the vehicle body from colliding with an obstacle or crossing a road boundary, a vehicle needs to detect the travelable area during both automatic driving and autonomous parking.
At present, the travelable area is detected mainly either by image detection based on supervised deep learning or by acquiring a three-dimensional point cloud with a three-dimensional camera. However, image detection based on supervised deep learning requires a large amount of manual labeling, and a model trained on a limited data set has difficulty generalizing. A three-dimensional camera, in turn, has a complex structure, is difficult to manufacture, and makes detection expensive.
Therefore, the existing methods for detecting the travelable area are not highly reliable.
Disclosure of Invention
The present application aims to provide a method, an apparatus, a device, and a computer-readable storage medium for detecting a travelable area, which improve the reliability of travelable area detection.
According to a first aspect of the present application, there is provided a travelable region detection method including:
acquiring a matching image and a target image acquired by shooting a surrounding area of a vehicle by using a vehicle-mounted monocular camera, wherein the matching image is a previous frame image continuous with the target image;
performing three-dimensional reconstruction on the target image according to the matching image to obtain target three-dimensional point cloud data corresponding to the target image;
determining the nearest obstacle point in the shooting area of the target image according to the target three-dimensional point cloud data;
and determining a travelable area in the shooting area of the target image according to each nearest obstacle point.
In the embodiments of the present application, three-dimensional point cloud data are reconstructed from the consecutive frame images acquired by the monocular camera, the nearest obstacle points are extracted, and the travelable area is obtained, which reduces the dependence of travelable area detection on the image acquisition mode, lowers the processing difficulty, and improves the accuracy and reliability of detection.
In some embodiments, the three-dimensional reconstruction of the target image according to the matching image to obtain target three-dimensional point cloud data corresponding to the target image includes:
projecting the matched image onto a plurality of preset projection surfaces according to the visual angle of the target image to obtain a plurality of projection images, wherein each projection surface corresponds to a depth relative to the origin of the camera;
determining the estimated depth of the pixels in the target image according to the matching cost of the pixels in the target image and the corresponding pixels in the plurality of projection images;
and acquiring target three-dimensional point cloud data corresponding to the target image according to the estimated depth of the pixels in the target image.
In these embodiments, the projections of the matching image onto projection surfaces of different depths are used to recover the depth of the target image through cost matching, and the result is converted into the target three-dimensional point cloud data, which improves the accuracy and reliability of three-dimensional reconstruction from a monocular image and, in turn, of drivable area detection.
In some embodiments, the projection surface comprises: n1 vertical projection planes;
the N1 vertical projection planes are parallel to the plane directly facing the camera, and the distances from the camera origin to the N1 vertical projection planes follow an inverse-proportional arithmetic distribution (their reciprocals are evenly spaced), where N1 is an integer greater than 1.
According to the depth recovery method and device, the depth recovery of the area in front of the camera is achieved through the vertical projection plane, the accuracy of the depth recovery in complex environments such as a curve is improved, and then the accuracy and reliability of three-dimensional reconstruction of the target image are improved.
In some embodiments, the projection surface further comprises: n2 horizontal projection planes and/or N3 projection spheres;
the N2 horizontal projection planes are parallel to the ground directly below the camera and are uniformly distributed within a range symmetric about that ground plane, where N2 is an integer greater than 1;
the N3 projection spheres are concentric spheres centered at the camera origin, and the radii of the N3 projection spheres follow an inverse-proportional arithmetic distribution, where N3 is an integer greater than 1.
In these embodiments, the depth of the ground area in the target image is recovered through the horizontal projection planes, and more surface-normal directions are introduced through the projection spheres, improving the accuracy and reliability of depth recovery. For points in the target image that lie on neither a horizontal nor a vertical plane, adding normal sampling by means of the projection spheres improves the accuracy of their depth recovery. In addition, introducing the projection spheres provides a suitable projection surface for a fisheye target image with a viewing angle larger than 180 degrees, further improving the accuracy and reliability of depth recovery.
In some embodiments, the determining an estimated depth of a pixel in the target image according to a matching cost of the pixel in the target image and a corresponding pixel in the plurality of projection images includes:
acquiring target pixel window characteristics of pixels in the target image;
acquiring projection pixel window characteristics of corresponding pixels in the plurality of projection images;
according to the target pixel window characteristic and the projection pixel window characteristic, obtaining the matching cost of the pixel in the target image and the corresponding pixel in each projection image;
and taking the depth corresponding to the corresponding pixel with the minimum matching cost as the estimated depth of the pixel in the target image, wherein the depth corresponding to the corresponding pixel is the depth corresponding to the projection plane where the corresponding pixel is located.
According to the depth recovery method and device, the depth corresponding to the corresponding pixel with the minimum matching cost is used as the estimated depth of the pixel in the target image, and the accuracy of depth recovery of the pixel in the target image is improved.
In some embodiments, before determining the estimated depth of the pixel in the target image according to the matching cost of the pixel in the target image and the corresponding pixel in the plurality of projection images, the method further comprises:
determining corresponding pixels in the matching image, which correspond to pixels in the target image one to one, according to the relative camera poses of the target image and the matching image;
and determining corresponding pixels in the projection images, which correspond to pixels in the target image one by one, according to the corresponding pixels in the matched images.
According to the method and the device, the corresponding pixels in the matched image and the target image are positioned through the relative pose of the camera, so that the accuracy and the reliability of the depth recovery of the target image are improved.
In some embodiments, before the determining, according to the relative camera poses of the target image and the matching image, respective pixels in the matching image that correspond to pixels in the target image in a one-to-one manner, the method further includes:
acquiring wheel speed meter data of a rear wheel of a vehicle and vehicle-mounted Inertial Measurement Unit (IMU) data, wherein the wheel speed meter data of the rear wheel of the vehicle indicates a horizontal movement distance of the vehicle-mounted monocular camera, and the vehicle-mounted IMU data indicates a horizontal movement direction of the vehicle-mounted monocular camera;
determining camera pose data of the vehicle-mounted monocular camera according to the wheel speed meter data of the rear wheel of the vehicle and the vehicle-mounted IMU data;
and determining the relative camera pose of the target image and the matched image according to the camera pose data of the vehicle-mounted monocular camera.
In the embodiments of the present application, because the rear wheels do not rotate relative to the vehicle body in the horizontal direction, the rear-wheel speed directly represents the moving speed of the vehicle body; obtaining the camera pose by combining the rear-wheel wheel speed meter data with the on-board IMU data improves the reliability of the relative camera pose, and thus the accuracy and reliability of travelable area detection.
In some embodiments, the determining a nearest obstacle point in a shooting area of the target image according to the target three-dimensional point cloud data includes:
determining the number of obstacle points contained in each grid of a polar coordinate grid network according to the target three-dimensional point cloud data and the polar coordinate grid network, which horizontally divides the shooting area of the target image, wherein the obstacle points are target three-dimensional points whose height above the ground is greater than a preset obstacle height threshold;
and determining the nearest obstacle point in each direction in the shooting area of the target image according to the nearest obstacle grid in each sector partition of the polar coordinate grid network, wherein the nearest obstacle grid is the grid within the sector partition that has the smallest radial distance to the camera origin and contains more obstacle points than a preset number threshold.
In this embodiment, the shooting area of the target image is spatially divided by a polar coordinate grid network built around the camera origin, the grid in each sector partition that is likely to contain an obstacle and has the smallest radial distance to the camera origin is extracted, and the nearest obstacle point in each direction from the camera origin is determined, which improves the accuracy of the nearest obstacle points and thus the accuracy and reliability of travelable area detection.
In some embodiments, the determining, according to the nearest obstacle grid in each sector of the polar grid network, a nearest obstacle point in each direction in the shooting area of the target image includes:
obtaining the average position point of the obstacle points contained in the nearest obstacle grid;
and taking the average position point corresponding to each nearest obstacle grid as the nearest obstacle point in the shooting area of the target image.
The embodiment of the application further optimizes the position of the nearest barrier point and improves the accuracy of the barrier point.
In some embodiments, the determining a travelable region in the shooting region of the target image according to each of the closest obstacle points includes:
in a uniform segmentation network that horizontally divides the shooting area of the target image, determining a weighted value for each grid cell according to the position of the grid cell in the uniform segmentation network relative to the nearest obstacle point;
determining a new weight of each grid cell according to the initial weight of each grid cell and the weighted value, wherein the new weight is greater than or equal to a minimum weight threshold and less than or equal to a maximum weight threshold, and the initial weight of each grid cell is 0 or is the new weight determined when the travelable area was determined for the previous frame of image;
and determining the travelable area in the shooting area of the target image according to the first type of grid cells or the second type of grid cells indicated by the new weights, wherein a nearest obstacle point lies between a first-type grid cell and the camera origin, and no nearest obstacle point lies between a second-type grid cell and the camera origin.
In this embodiment, the weighted value of each grid cell of the uniform segmentation network over the shooting area of the target image is calculated and the weights of the grid cells are updated; as consecutive image frames keep arriving, the new weights are continuously updated by this weighting, which smooths out noise, compensates for missed detections of the nearest obstacle point in a single frame, and improves the reliability of travelable area detection.
In some embodiments, the determining a weighted value for each grid cell based on the location of each grid cell in the evenly-divided network relative to the nearest obstacle point includes:
if the radial distance from the nearest obstacle point in the direction of the grid cell to the camera origin, minus the radial distance from the grid cell to the camera origin, is greater than or equal to the distance upper limit threshold, the weighted value of the grid cell is a first value;
if the radial distance from the nearest obstacle point in the direction of the grid cell to the camera origin, minus the radial distance from the grid cell to the camera origin, is less than or equal to the distance lower limit threshold, the weighted value of the grid cell is a second value;
if that difference is less than the distance upper limit threshold and greater than the distance lower limit threshold, the weighted value of the grid cell is a third value, the third value being determined from the difference by a preset smooth continuous function;
wherein the distance upper limit threshold is the negative of the distance lower limit threshold, the first value is the negative of the second value, and the absolute value of the third value is smaller than the absolute value of the distance upper limit threshold or the absolute value of the distance lower limit threshold.
The embodiment of the application realizes the determination of the weighted value of each grid unit, and the smooth transition of the grid units with the difference values between the distance upper limit threshold and the distance lower limit threshold is realized through the smooth continuous function, so that the noise in the process of multi-frame fusion weighted accumulation is further reduced, the reliability of determining the new weight of the grid units is improved, and the reliability of detecting the travelable area is further improved.
In some embodiments, the matching image and the target image are both fisheye images.
According to the embodiment of the application, the fisheye image is adopted, so that the horizontal visual angle is increased (can exceed 180 degrees), the visual field range of the image shooting area is enlarged, and the reliability of the travelable area detection is improved.
According to a second aspect of the present application, there is provided a travelable region detection apparatus including:
the system comprises an image acquisition module, a display module and a display module, wherein the image acquisition module is used for acquiring a matching image and a target image acquired by shooting a surrounding area of a vehicle by using a vehicle-mounted monocular camera, and the matching image is a previous frame image of the target image;
the first processing module is used for carrying out three-dimensional reconstruction on the target image according to the matching image to obtain target three-dimensional point cloud data corresponding to the target image;
the second processing module is used for determining a nearest barrier point in a shooting area of the target image according to the target three-dimensional point cloud data;
and the third processing module is used for determining a travelable area in the shooting area of the target image according to each nearest obstacle point.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the travelable area detection method as described in any of the embodiments of the first aspect and the first aspect of the present application.
According to a fourth aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the travelable area detection method as defined in any of the first and second aspects of the present application.
One embodiment of the above application has the following advantages or benefits: a matching image and a target image are acquired by photographing the area surrounding the vehicle with a vehicle-mounted monocular camera, the matching image being the frame immediately preceding the target image; the target image is then three-dimensionally reconstructed from the matching image to obtain target three-dimensional point cloud data corresponding to the target image, so that three-dimensional point cloud data are reconstructed from the consecutive frames captured by the monocular camera; the nearest obstacle points in the shooting area of the target image are then determined from the target three-dimensional point cloud data, and the travelable area in that shooting area is determined from the nearest obstacle points. Neither a three-dimensional camera nor a large set of image samples for machine learning is required, which reduces the dependence on the image acquisition mode, lowers the processing difficulty, and improves the accuracy and reliability of detection.
Other effects of the above alternatives will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be considered limiting of the present application. Wherein:
fig. 1 is a schematic view of an application scenario of autonomous parking according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for detecting a travelable area according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a method of step S102 in fig. 2 according to an embodiment of the present application;
FIG. 4 is a schematic diagram of projecting the matching image onto a plurality of projection surfaces according to an embodiment of the present application;
FIG. 5 is a top view of a vertical projection plane distribution provided by an embodiment of the present application;
FIG. 6 is a side view of a multi-type projection plane profile provided by an embodiment of the present application;
fig. 7 is a top view of a polar grid network provided in an embodiment of the present application;
fig. 8 is a schematic diagram of a nearest obstacle point in a shooting area of a target image according to an embodiment of the present application;
fig. 9 is a top view of a uniform partition network according to an embodiment of the present application;
FIG. 10 is a schematic view of a drivable region provided in an embodiment of the present application;
fig. 11 is a schematic structural diagram of a travelable region detection apparatus according to an embodiment of the present application;
fig. 12 is a block diagram of an electronic device of a travelable region detection method according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application to assist in understanding, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In a scenario of automatic driving or autonomous parking, the vehicle needs to detect the drivable area around it in order to plan a safe and feasible driving path or parking path, so that obstacles are avoided automatically during automatic driving or the vehicle parks itself into a space automatically. For example, during autonomous parking into a garage, the travelable area in the forward or backward direction needs to be detected, and the vehicle is then controlled to enter the parking space according to that area. Fig. 1 is a schematic view of an application scenario of autonomous parking according to an embodiment of the present application. In the parking scene shown in fig. 1, when the vehicle backs into the parking space, it must avoid the other vehicles on the left side of the empty space and the stone pillar on the right side. To do so, three-dimensional information about the objects behind the reversing vehicle must first be acquired and the positions of the other vehicles and the stone pillar identified, so that the drivable area for reversing can be obtained accurately.
In the existing methods for detecting the travelable area, image detection based on supervised deep learning requires a large number of picture samples for learning, and there may be a risk of inaccurate recognition when a new obstacle or a complex environment is encountered. Collecting a three-dimensional point cloud behind the vehicle with a three-dimensional camera requires equipping the vehicle with a structurally complex three-dimensional camera, so detection may be unreliable while the vehicle is moving.
The present application provides a method, an apparatus, a device, and a computer-readable storage medium for detecting a travelable area, which use the vehicle-mounted monocular camera to detect the travelable area, reducing the detection difficulty and improving the detection accuracy and reliability.
Referring to fig. 2, which is a schematic flow chart of a method for detecting a travelable area according to an embodiment of the present disclosure, the method shown in fig. 2 may be executed by a travelable area detection apparatus implemented in software and/or hardware, specifically, for example, by one or a combination of a terminal, a vehicle-mounted detection system, and a cloud server. The method shown in fig. 2 includes steps S101 to S104, which are as follows:
s101, acquiring a matching image and a target image acquired by shooting a surrounding area of a vehicle by using a vehicle-mounted monocular camera, wherein the matching image is a previous frame image continuous with the target image.
The vehicle-mounted monocular camera may be mounted in front of or behind the vehicle. The on-vehicle monocular camera may capture an area around the vehicle, for example, a region in front of the vehicle, or a region behind the vehicle. Alternatively, the acquisition of the image of the area in front of the vehicle or the image of the area behind the vehicle may be selected according to the forward or backward movement of the vehicle.
And determining a target image in the continuous frame images collected by the vehicle-mounted monocular camera, and taking the previous frame of the target image as a matching image. The target image may be, for example, a current frame image acquired by a vehicle-mounted monocular camera in real time, but may also be a historical frame image, and the embodiment is not limited thereto.
The vehicle-mounted monocular camera may be a camera with a non-wide-angle lens, a wide-angle lens, or an ultra-wide-angle lens. A fisheye lens is an optical imaging system with an ultra-large field of view and a large aperture; generally, two or three negative meniscus lenses are used as the front lens group to compress the very large object-space field of view into the field-of-view range handled by a conventional lens. A camera with a fisheye lens captures images with an angle of view of, for example, 220° or 230°.
In some embodiments, the on-board monocular camera may be a camera employing a fisheye lens, and the matching image and the target image captured may both be fisheye images. The embodiment can enlarge the visual field range of the image shooting area and improve the reliability of the travelable area detection by adopting the fisheye image and increasing the horizontal viewing angle (for example, over 180 degrees).
And S102, performing three-dimensional reconstruction on the target image according to the matching image to obtain target three-dimensional point cloud data corresponding to the target image.
The target image is a two-dimensional image, but the pose of the camera can be changed with the previous frame image, so that a depth image can be generated for the target image through projection, and three-dimensional reconstruction is realized. There are various ways to implement step S102, which will be described below with reference to fig. 3 and the specific embodiment. Fig. 3 is a schematic flowchart of a method in step S102 in fig. 2 according to an embodiment of the present disclosure. The method shown in fig. 3 specifically includes steps S201 to S203, which are specifically as follows:
s201, projecting the matched image to a plurality of preset projection surfaces according to the visual angle of the target image to obtain a plurality of projection images, wherein each projection surface corresponds to a depth relative to the origin of the camera.
Referring to fig. 4, which is a schematic diagram of projecting the matching image onto a plurality of projection surfaces according to an embodiment of the present application. As shown in the figure, the matching image is projected onto a plurality of projection surfaces according to the viewing angle of the target image, resulting in projection images at various depths. It can be understood that a plurality of projection surfaces of different depths are preset for the camera. Depth is understood here as the distance to the plane in which the camera origin lies. The plurality of projection surfaces of different depths can be understood as a plurality of projection surfaces arranged in space, each at a different radial distance from the camera origin (equivalently, a different distance from the vertical plane through the camera origin), so that each projection surface corresponds to one depth. When the matching image is projected onto a projection surface, the projection image on that surface has the depth corresponding to that surface, i.e., all pixels in a single projection image share the depth of its projection surface.
In some embodiments, the projection surfaces include, for example, N1 vertical projection planes, where N1 is an integer greater than 1. Specifically, the N1 vertical projection planes are parallel to the plane directly facing the camera, and the distances from the camera origin to the N1 vertical projection planes follow an inverse-proportional arithmetic distribution (their reciprocals are evenly spaced). In the scenario shown in fig. 1, the on-board monocular camera of the parking vehicle faces the wall of the garage in front, and the N1 vertical projection planes can then be understood as N1 spatial planes parallel to that wall. In an image captured by the camera, the resolution of nearby objects is usually greater than that of distant objects; in other words, a distant-view pixel represents a larger actual size than a near-view pixel. Therefore, to improve the reliability of depth recovery, the distribution density of vertical projection planes close to the camera origin is greater than that far from the camera origin. In addition, the closer an obstacle is to the vehicle, the greater its potential effect or danger, so the accuracy of depth recovery should be higher in the region closer to the vehicle in order to generate more accurate depth data there. For example, the distribution density of the N1 vertical projection planes can satisfy: the density near the camera origin is greater than the density far from the camera origin. Fig. 5 is a top view of a vertical projection plane distribution provided in an embodiment of the present application. In the forward shooting area of the camera origin shown in fig. 5, 64 vertical projection planes are arranged (not all are shown in the figure), and the distances from the camera origin to the vertical projection planes are distributed arithmetically in inverse proportion, for example, in sequence: 20/64 m, 20/63 m, 20/62 m, …, 20/3 m, 20/2 m, 20 m. The closer to the camera origin, the denser the vertical projection planes and the greater the accuracy of the recovered depth. In this embodiment, depth recovery of the area in front of the camera is achieved through the vertical projection planes, which improves the accuracy of depth recovery in complex environments such as curves and, in turn, the accuracy and reliability of the three-dimensional reconstruction of the target image.
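A minimal sketch of how such an inverse-proportional arithmetic spacing can be generated is given below; the values d_max = 20 m and n_planes = 64 reproduce the example above, and the function name is illustrative.

```python
import numpy as np

def vertical_plane_depths(d_max=20.0, n_planes=64):
    """Depths of the N1 vertical projection planes from the camera origin.

    The reciprocals of the depths are evenly spaced, so the planes are
    densest near the camera: d_max/n, d_max/(n-1), ..., d_max/2, d_max.
    """
    return d_max / np.arange(n_planes, 0, -1)

depths = vertical_plane_depths()   # [0.3125, 0.3175, ..., 10.0, 20.0]
```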
In other embodiments, on the basis of the vertical projection planes, N2 horizontal projection planes and/or N3 projection spheres can be introduced as additional projection surfaces. The horizontal projection planes can be used to recover the depth of the ground, and the projection spheres can be used to recover the depth of the distorted regions of the fisheye image.
Referring to fig. 6, a side view of a multi-type projection plane distribution provided by an embodiment of the present application is shown. Fig. 6 shows a plurality of mutually parallel vertical projection planes, mutually parallel horizontal projection planes and a concentric projection sphere. The N2 horizontal projection planes are parallel to the ground right below the camera, and the N2 horizontal projection planes are uniformly arranged in the ground distribution range taking the ground as a symmetry center, wherein N2 is an integer larger than 1. Taking fig. 6 as an example, after the camera is calibrated, the plane position of the ground right below the origin of the camera can be determined, and the 4 horizontal projection planes are uniformly distributed in the ground distribution range of-5 cm to 5cm near the ground right below the camera, so as to recover the depth of the point close to the ground. The number of the horizontal projection planes may also be 8. According to the method and the device, the depth of the ground area pixel in the target image can be recovered through the horizontal projection plane, and the accuracy and the reliability of the depth recovery of the ground image area are improved.
In the above embodiment, the N3 projection spheres are concentric spheres centered at the camera origin, and the radii of the N3 projection spheres follow an inverse-proportional arithmetic distribution, where N3 is an integer greater than 1. With reference to fig. 6, 64 projection spheres are formed around the camera origin, with radii distributed arithmetically in inverse proportion from 0.5 m to 32 m. Arranging projection spheres introduces more surface-normal directions in addition to those of the projection planes; in particular, for pixels in the target image that lie on neither a horizontal nor a vertical surface, combining the projection spheres with this additional normal sampling improves the accuracy of their depth recovery. Furthermore, in the embodiment in which both the target image and the matching image are fisheye images, introducing the N3 projection spheres in addition provides a suitable projection surface for a fisheye target image with a viewing angle larger than 180 degrees, improving the accuracy and reliability of depth recovery.
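The horizontal plane heights and sphere radii described above can be generated in the same way; the following sketch uses the example values from this section (a ±5 cm band with 4 planes, and 64 spheres from 0.5 m to 32 m), with illustrative function names.

```python
import numpy as np

def horizontal_plane_heights(half_range=0.05, n_planes=4):
    """Heights of the horizontal projection planes relative to the ground,
    evenly spread over a band symmetric about the ground below the camera."""
    return np.linspace(-half_range, half_range, n_planes)

def projection_sphere_radii(r_min=0.5, r_max=32.0, n_spheres=64):
    """Radii of the concentric projection spheres: reciprocals evenly spaced,
    so the spheres are densest near the camera origin."""
    return 1.0 / np.linspace(1.0 / r_min, 1.0 / r_max, n_spheres)
```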
S202, determining the estimated depth of the pixels in the target image according to the matching cost of the pixels in the target image and the corresponding pixels in the plurality of projection images.
Continuing to refer to fig. 4, the target image has corresponding pixels in each projection image, and the smaller the matching cost (match cost) is, the more relevant the pixel features are, so that the depth of the corresponding pixel with the minimum matching cost is taken as the estimated depth of the pixel in the target image, the depth recovery of the target image is realized, and the depth map corresponding to the target image can be obtained.
In some embodiments, before performing the matching cost calculation, corresponding pixels in the matching image, which correspond to pixels in the target image in a one-to-one manner, may be determined according to the relative camera poses of the target image and the matching image. The tracking of the corresponding pixel may be implemented by using various existing tracking algorithms, which is not limited herein. After the corresponding pixels in the matching image are determined, the corresponding pixels in each of the projection images that correspond to the pixels in the target image one to one may be determined according to the corresponding pixels in the matching image. Each projection image, see fig. 4, has a corresponding pixel to the pixel in the target image. According to the depth recovery method and device, the corresponding pixels in the matched image and the target image are positioned through the relative pose of the camera, and the accuracy and reliability of the depth recovery of the target image are improved.
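One possible way to obtain such corresponding pixels is to warp the matching image into the target view through each projection plane with a plane-induced homography. The sketch below assumes a pinhole intrinsic matrix K for brevity (the fisheye images of the embodiments would require the calibrated fisheye projection model instead); the names R, t, n, d and the use of OpenCV's warpPerspective are illustrative choices.

```python
import numpy as np
import cv2

def warp_via_plane(match_img, K, R, t, n, d):
    """Warp the matching image into the target view through one projection plane.

    K    : 3x3 intrinsics (pinhole here for brevity)
    R, t : relative pose mapping target-camera points to matching-camera points,
           X_m = R @ X_t + t
    n, d : plane normal and distance in the target camera frame (n^T X = d),
           e.g. n = [0, 0, 1] and d = the depth of a vertical projection plane
    The pixel at (u, v) of the result is the corresponding pixel, on this
    surface, of the target-image pixel at the same (u, v).
    """
    n = np.asarray(n, dtype=np.float64).reshape(3, 1)
    t = np.asarray(t, dtype=np.float64).reshape(3, 1)
    H = K @ (R + t @ n.T / d) @ np.linalg.inv(K)   # target pixel -> matching pixel
    h, w = match_img.shape[:2]
    return cv2.warpPerspective(match_img, H, (w, h),
                               flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
```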
The position relationship among the vehicle-mounted monocular camera, the vehicle body of the vehicle, and the vehicle-mounted Inertial Measurement Unit (IMU) is calibrated in advance. According to the moving direction and the distance of the vehicle body, information such as the camera origin position, the camera shooting area and the visual angle of the vehicle-mounted monocular camera can be determined.
Before the corresponding pixels in the matching image that correspond one to one to pixels in the target image are determined according to the relative camera poses of the target image and the matching image, the relative camera pose between the two images may be determined. For example, wheel speed meter data of a rear wheel of the vehicle, indicating the horizontal movement distance of the on-board monocular camera, and on-board IMU data, indicating the horizontal movement direction of the on-board monocular camera, are collected; camera pose data of the on-board monocular camera can then be determined from the rear-wheel wheel speed meter data and the on-board IMU data. Each image frame captured by the on-board monocular camera has a corresponding camera pose, so the relative camera pose between the matching image and the target image can be determined from the camera pose data. Because the rear wheels do not steer relative to the vehicle body, the rear-wheel speed directly represents the moving speed of the vehicle body; obtaining the camera pose by combining the rear-wheel wheel speed meter data with the on-board IMU data therefore improves the reliability of the camera pose and, in turn, the accuracy and reliability of travelable area detection.
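A simplified, planar (top-view) sketch of this dead reckoning is shown below: the rear-wheel distance is integrated along the IMU heading, and the relative camera pose is obtained by composing the body poses with a calibrated camera-to-body transform. A real implementation would work in full SE(3); the 2-D simplification and all names are assumptions.

```python
import numpy as np

def body_pose_2d(x, y, yaw):
    """Homogeneous 2-D (top-view) pose of the vehicle body in the world frame."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])

def integrate_rear_wheel_odometry(prev_pose, wheel_distance, imu_yaw):
    """Dead-reckon the body pose: the rear axle does not steer, so the
    wheel-speed-meter distance is applied along the IMU heading."""
    x, y = prev_pose[0, 2], prev_pose[1, 2]
    x += wheel_distance * np.cos(imu_yaw)
    y += wheel_distance * np.sin(imu_yaw)
    return body_pose_2d(x, y, imu_yaw)

def relative_camera_pose(pose_match, pose_target, T_body_cam):
    """Pose of the matching-image camera expressed in the target-image camera
    frame, using the pre-calibrated (here 2-D) camera-to-body transform."""
    cam_match = pose_match @ T_body_cam
    cam_target = pose_target @ T_body_cam
    return np.linalg.inv(cam_target) @ cam_match
```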
In step S202, one specific implementation is, for example, to first acquire the target pixel window feature of a pixel in the target image and the projection pixel window features of the corresponding pixels in the plurality of projection images. The window feature here is obtained, for example, by sliding a sampling window of preset size over the target image and the projection images and using the mean of the pixel features within the window as the feature of the window's center pixel. The size of the sampling window may be 7 × 7, 5 × 5, or 1 × 1, which is not limited herein. The target pixel window feature is, for example, the mean gray level of the pixels within a sampling window centered on the pixel in the target image; the projection pixel window feature is, for example, the mean gray level of the pixels within the sampling window centered on the corresponding pixel. Then, the matching cost between the pixel in the target image and the corresponding pixel in each projection image is obtained from the target pixel window feature and the projection pixel window feature. For example, the gray-level mean error between the pixel in the target image and the corresponding pixel in the projection image, sampled over a 7 × 7 window, is used as the matching cost of that corresponding pixel. After the matching cost of the corresponding pixel in each projection image is obtained, the depth corresponding to the corresponding pixel with the minimum matching cost may be used as the estimated depth of the pixel in the target image, where the depth corresponding to a corresponding pixel is the depth of the projection surface on which it lies. Using the depth of the corresponding pixel with the minimum matching cost as the estimated depth improves the accuracy of depth recovery for the pixels in the target image.
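The following sketch illustrates this cost matching under the reading that the window feature is the window mean of the gray levels and the cost is the difference between the two window features; it assumes the projection images have already been warped into the target view so that corresponding pixels share coordinates, and the window size of 7 is the example value above.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def estimate_depth_map(target_gray, projection_grays, surface_depths, window=7):
    """Per-pixel estimated depth by plane-sweep cost matching.

    target_gray      : HxW grayscale target image
    projection_grays : list of HxW projection images (the matching image
                       warped through each projection surface into the
                       target view, so corresponding pixels align)
    surface_depths   : depth of each projection surface
    """
    target_feat = uniform_filter(target_gray.astype(np.float32), size=window)
    costs = []
    for proj in projection_grays:
        proj_feat = uniform_filter(proj.astype(np.float32), size=window)
        costs.append(np.abs(target_feat - proj_feat))  # window-mean gray error
    best = np.argmin(np.stack(costs, axis=0), axis=0)  # cheapest surface per pixel
    return np.asarray(surface_depths)[best]            # HxW estimated depth map
```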
S203, acquiring target three-dimensional point cloud data corresponding to the target image according to the estimated depth of the pixels in the target image.
The process of determining the estimated depth for each pixel in the target image may be performed in parallel. And after determining the estimated depth in all the pixels in the target image, obtaining a depth image corresponding to the target image. And then combining the depth image with the pixel position of each pixel in the target image to obtain the three-dimensional information of each pixel and obtain target three-dimensional point cloud data corresponding to the target image.
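A minimal back-projection sketch is given below; it assumes a pinhole intrinsic model (fx, fy, cx, cy), whereas the fisheye images of the embodiments would use their calibrated unprojection, and it simply skips pixels without a valid depth.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an HxW estimated depth map to camera-frame 3-D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[depth.reshape(-1) > 0]   # keep only pixels with a valid depth
```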
In the three-dimensional reconstruction embodiment shown in fig. 3, depth recovery of the target image is achieved by cost matching using the projections of the matching image onto projection surfaces of different depths, and the result is converted into the target three-dimensional point cloud data, which improves the accuracy and reliability of three-dimensional reconstruction from a monocular image and, in turn, of travelable area detection.
S103, determining the nearest obstacle point in the shooting area of the target image according to the target three-dimensional point cloud data.
The target three-dimensional point cloud data represents three-dimensional information in a shooting area of a target image, so that three-dimensional points of obstacles can be filtered, and the nearest obstacle points for determining the boundary of a travelable area are filtered again according to the radial distance relative to the origin of the camera.
In some embodiments, the number of obstacle points included in each grid of the polar grid network may be determined according to the target three-dimensional point cloud data and the polar grid network that horizontally divides the shooting area of the target image, and then the nearest obstacle point in each direction in the shooting area of the target image may be determined according to the nearest obstacle grid in each sector of the polar grid network.
Specifically, the camera shooting area may first be spatially divided with a polar coordinate grid network so that the nearest obstacle points can be extracted from it. For example, the target three-dimensional points falling in each grid of a preset polar coordinate grid network may be determined according to the target three-dimensional point cloud data and the camera origin position corresponding to the target image. The camera origin position is, for example, pre-calibrated information indicating the position of the camera origin with respect to the ground and with respect to the vehicle body. Fig. 7 is a top view of a polar coordinate grid network according to an embodiment of the present application. As shown in fig. 7, the polar coordinate grid network is a segmentation network formed in the shooting area of the target image by superimposing a first class of cut surfaces arranged in a fan shape and a second class of cut surfaces arranged parallel to the plane directly facing the camera; the intersection line of the first class of cut surfaces is collinear with the vertical line from the camera origin to the ground, and both classes of cut surfaces are planes perpendicular to the ground. Referring to fig. 7, the plurality of first-class cut surfaces may, for example, divide 175° of the horizontal direction of the shooting area into 128 sector partitions centered on the camera origin in the top view. On this basis, the second-class cut surfaces, distributed parallel to the plane directly facing the camera, are superimposed for further segmentation. In some embodiments, because the pixel resolution is higher and the target three-dimensional points are denser closer to the vehicle, and because obstacles closer to the vehicle may have a greater impact, the segmentation density is greater near the camera origin than far from it: the distribution density of second-class cut surfaces close to the camera origin is greater than that of second-class cut surfaces far from the camera origin. For example, the second-class cut surfaces divide the radial distance from the camera origin between 0.5 m and 32 m into 63 segments with an inverse-proportional arithmetic distribution to form the grids.
With continued reference to fig. 7, after the target three-dimensional points in each grid of the preset polar coordinate grid network have been determined, the number of obstacle points contained in each grid may be counted, where the obstacle points are the target three-dimensional points whose height above the ground is greater than the preset obstacle height threshold. The preset obstacle height threshold may be, for example, a limit height that the vehicle can drive over directly, such as the height of the vehicle chassis, for example 4 cm, 5 cm, or 6 cm. Then, in each sector partition divided by the first-class cut surfaces, a nearest obstacle grid is determined, the nearest obstacle grid being the grid within the sector partition that has the smallest radial distance to the camera origin and contains more obstacle points than a preset number threshold. For example, in each direction of the polar coordinate grid network shown in fig. 7, searching outward from the origin, the first grid whose number of obstacle points exceeds the preset number threshold is taken as the nearest obstacle grid in that direction. Finally, the nearest obstacle points in the shooting area of the target image are determined from the nearest obstacle grids. Fig. 8 is a schematic diagram of the nearest obstacle points in the shooting area of a target image according to an embodiment of the present application. In the scene shown in fig. 8, the other vehicles and the stone pillar on the two sides of the parking space lie beyond the nearest obstacle points, and a travelable area can be determined within the region on the camera side bounded by the nearest obstacle points. In this embodiment, the shooting area of the target image is spatially divided by a polar coordinate grid network built around the camera origin, the grid in each sector partition that is likely to contain an obstacle and has the smallest radial distance to the camera origin is extracted, and the nearest obstacle point in each direction from the camera origin is determined, which improves the accuracy of the nearest obstacle points and thus the accuracy and reliability of travelable area detection.
The nearest barrier point may be a center point of the nearest barrier grid or a point determined from a target three-dimensional point in the nearest barrier grid. In some embodiments, the average position point of the obstacle points included in the nearest obstacle grid may be obtained; and taking the average position point corresponding to each nearest obstacle grid as the nearest obstacle point in the shooting area of the target image. The embodiment further optimizes the position of the nearest barrier point and improves the accuracy of the barrier point.
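The polar-grid extraction of the nearest obstacle points described above can be sketched as follows; the sector count, field of view, ring spacing, height threshold and count threshold are the example values from this section (or illustrative where not specified), and the frame convention (x forward, y left, z as height above ground) is an assumption.

```python
import numpy as np

def nearest_obstacle_points(points, n_sectors=128, fov_deg=175.0,
                            r_min=0.5, r_max=32.0, n_rings=63,
                            height_thresh=0.05, min_count=3):
    """Nearest obstacle point per sector of the polar coordinate grid network.

    points        : Nx3 target point cloud (x forward, y left, z = height above ground)
    height_thresh : points higher above the ground than this are obstacle points
    min_count     : a grid must contain more obstacle points than this
    Returns a dict mapping sector index -> average position of the obstacle
    points in that sector's nearest obstacle grid.
    """
    obstacles = points[points[:, 2] > height_thresh]
    r = np.hypot(obstacles[:, 0], obstacles[:, 1])
    theta = np.arctan2(obstacles[:, 1], obstacles[:, 0])

    half_fov = np.deg2rad(fov_deg) / 2.0
    keep = (np.abs(theta) < half_fov) & (r >= r_min) & (r <= r_max)
    obstacles, r, theta = obstacles[keep], r[keep], theta[keep]

    sector = ((theta + half_fov) / (2 * half_fov) * n_sectors).astype(int)
    # ring edges: reciprocals evenly spaced, so rings are densest near the camera
    ring_edges = 1.0 / np.linspace(1.0 / r_min, 1.0 / r_max, n_rings + 1)
    ring = np.clip(np.searchsorted(ring_edges, r) - 1, 0, n_rings - 1)

    nearest = {}
    for s in range(n_sectors):
        in_s = sector == s
        for g in np.unique(ring[in_s]):            # rings sorted near -> far
            pts = obstacles[in_s & (ring == g)]
            if len(pts) > min_count:               # first sufficiently occupied grid
                nearest[s] = pts.mean(axis=0)      # average position as the point
                break
    return nearest
```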
And S104, determining a travelable area in the shooting area of the target image according to each nearest obstacle point.
After the nearest obstacle points are determined, they may be used as the boundary of the travelable area. To improve the reliability of the travelable area, multi-frame fusion processing can also be performed on a uniform segmentation network, continuously optimizing the boundary of the travelable area.
In some embodiments, a uniform segmentation network including the nearest obstacle point may be obtained according to a camera origin position corresponding to the nearest obstacle point and the target image. Wherein the uniform division network is a division network formed by dividing the shooting area of the target image in a horizontal direction by a uniform square grid. Fig. 9 is a top view of a uniformly partitioned network according to an embodiment of the present application. As shown in fig. 9, the horizontal space is uniformly divided into uniformly divided networks by 0.1m × 0.1m grids, where a square grid is actually a solid grid uniformly divided in the horizontal direction, and each grid cell is actually a prismatic region with a square cross section.
In this embodiment, the space is uniformly divided as shown in the top view of the uniform segmentation network, and in the uniform segmentation network that horizontally divides the shooting area of the target image, a weighted value may be determined for each grid cell according to the position of that grid cell relative to the nearest obstacle point. Specifically, the weighted value of each grid cell may be determined from the radial distance of the grid cell to the camera origin and the radial distance of the nearest obstacle point in the direction of that grid cell to the camera origin. Referring to fig. 9, each arrow represents a radial direction, and a weighted value is determined for the grid cells along each radial direction. The calculation of the weighted values is illustrated in the following embodiments. Once every grid cell has a weighted value, a truncated directed distance field is formed with the nearest obstacle points as its truncation surface, and the weight of each grid cell in this truncated directed distance field is computed from its directed distance to the truncation surface. After the weighted value of each grid cell is obtained, the new weight of each grid cell is determined from its initial weight and the weighted value, where the new weight is greater than or equal to a minimum weight threshold and less than or equal to a maximum weight threshold, and the initial weight of each grid cell is 0 or the new weight determined when the travelable area was determined for the previous frame. It should be understood that the maximum and minimum weight thresholds bound the new weight and prevent it from growing without limit. For example, if the weighted value of the grid cell corresponding to the garage wall remains +1 over many consecutive frames, the new weight increases to 10 (the maximum weight threshold) and then stops increasing.
Then, a travelable area in the shooting area of the target image may be determined according to the first-type grid cells or the second-type grid cells indicated by the new weights, where a nearest obstacle point lies between a first-type grid cell and the camera origin, and no nearest obstacle point lies between a second-type grid cell and the camera origin. In other words, the grid cells may be classified according to the value of the new weight, determining for each grid cell whether it is a first-type or a second-type grid cell. A second-type grid cell can be understood as a grid cell located between the nearest obstacle point and the camera origin. Finally, the travelable area in the shooting area of the target image is determined from the second-type grid cells, where the second-type grid cells adjacent to first-type grid cells form the boundary of the travelable area. It should be understood that the second-type grid cells, after being merged, form the travelable area in the shooting area of the target image.
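As a concrete illustration of the weighting scheme described above, the following sketch updates a grid of accumulated weights and classifies the cells; it is only one minimal reading of the scheme, not the patented implementation. The 0.1 m cell size, the ±0.3 truncation thresholds and the upper weight limit of 10 are taken from the examples in this description; the symmetric lower limit, the sine-shaped transition, the sign convention (positive accumulated weight → first-type cell) and all function names are assumptions.

```python
import numpy as np

CELL = 0.1                    # grid cell size in metres (from the 0.1 m x 0.1 m example)
W_MIN, W_MAX = -10.0, 10.0    # weight limits; 10 comes from the garage-wall example,
                              # the symmetric lower limit is an assumption

def truncated_weight(c, d, tau=0.3):
    """Weighted value of a cell at radial distance c when the nearest obstacle point
    in the same direction lies at radial distance d (truncated SDF style)."""
    diff = c - d
    if diff >= tau:
        return 1.0             # cell lies behind the obstacle point
    if diff <= -tau:
        return -1.0            # cell lies between camera and obstacle (free space)
    return np.sin(0.5 * np.pi * diff / tau)   # smooth trigonometric transition (assumed form)

def update_grid(weights, cell_radial_dist, nearest_obstacle_dist):
    """weights, cell_radial_dist, nearest_obstacle_dist are arrays over all grid cells;
    nearest_obstacle_dist[i] is the obstacle distance d for the direction of cell i."""
    add = np.vectorize(truncated_weight)(cell_radial_dist, nearest_obstacle_dist)
    return np.clip(weights + add, W_MIN, W_MAX)   # new weight, bounded to avoid unlimited growth

def classify(weights):
    """Sign convention assumed: positive accumulated weight -> first-type cell (obstacle
    between it and the camera), negative -> second-type cell (free, travelable)."""
    return weights > 0, weights < 0
```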
In the above embodiment, before the new weight of each grid cell is determined from its initial weight and weighted value, the initial weight of each grid cell corresponding to the target image may be determined. The initial weights of the grid cells of the target image are obtained by rotating and translating the truncated directed distance field corresponding to the matching image into the truncated directed distance field corresponding to the target image according to the camera relative poses of the matching image and the target image, thereby establishing the correspondence between the grid cells of the matching image and the grid cells of the target image; the new weight determined for a grid cell of the matching image then serves as the initial weight of the corresponding grid cell of the target image.
If the target image is the first frame of the consecutive images, the initial weight is 0.
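A minimal sketch of how the previous frame's accumulated weights could be carried over as initial weights, assuming the horizontal part of the camera relative pose is available as a planar rotation (yaw) and translation; the grid layout, array indexing and nearest-cell lookup are illustrative assumptions, not the patent's exact procedure.

```python
import numpy as np

def propagate_initial_weights(prev_weights, yaw, t_xy, cell=0.1):
    """Transfer the grid weights accumulated for the matching (previous) image into the
    grid of the target image using the horizontal camera relative pose.
    prev_weights: 2D array indexed around the previous camera origin.
    yaw: relative heading change (rad); t_xy: horizontal translation (metres),
    with x_cur = R(yaw) @ x_prev + t_xy assumed as the pose convention."""
    h, w = prev_weights.shape
    new_weights = np.zeros_like(prev_weights)        # 0 for cells with no predecessor
    c, s = np.cos(yaw), np.sin(yaw)
    for ix in range(h):
        for iy in range(w):
            # centre of the target-frame cell, in metres, relative to the new camera origin
            x = (ix - h // 2) * cell
            y = (iy - w // 2) * cell
            # map it back into the previous frame (inverse rigid transform)
            px = c * (x - t_xy[0]) + s * (y - t_xy[1])
            py = -s * (x - t_xy[0]) + c * (y - t_xy[1])
            jx = int(round(px / cell)) + h // 2
            jy = int(round(py / cell)) + w // 2
            if 0 <= jx < h and 0 <= jy < w:
                # previous new weight becomes the initial weight of this cell
                new_weights[ix, iy] = prev_weights[jx, jy]
    return new_weights
```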
In this embodiment, the weighted value of each grid cell of the uniform segmentation network in the shooting area of the target image is calculated and used to update the weight of the grid cell. As consecutive image frames keep arriving, the new weights of the grid cells are continuously updated by weighting, which smooths out noise and compensates for missed detections of the nearest obstacle point in a single frame, thereby improving the reliability of travelable area detection.
In the above embodiment, an implementation manner of determining the weighted value of each grid cell may be, for example:
If the radial distance of a grid cell to the camera origin minus the radial distance of the nearest obstacle point in the direction of that grid cell to the camera origin is greater than or equal to a distance upper limit threshold, the weighted value of the grid cell is a first value.
If the radial distance of the grid cell to the camera origin minus the radial distance of the nearest obstacle point in the direction of that grid cell to the camera origin is less than or equal to a distance lower limit threshold, the weighted value of the grid cell is a second value.
If the radial distance of the grid cell to the camera origin minus the radial distance of the nearest obstacle point in the direction of that grid cell to the camera origin is less than the distance upper limit threshold and greater than the distance lower limit threshold, the weighted value of the grid cell is a third value, where the third value is determined from this difference by a preset smooth continuous function.
Here, the distance upper limit threshold is the negative of the distance lower limit threshold, the first value is the negative of the second value, and the absolute value of the third value is smaller than the absolute value of the distance upper limit threshold or the absolute value of the distance lower limit threshold.
As an example, the weighted value f(c) of a grid cell in a given direction may be determined according to the following formula one, where c is the radial distance of the grid cell to the camera origin and d is the radial distance of the nearest obstacle point in the direction of that grid cell to the camera origin. In formula one, the distance upper limit threshold is 0.3, the distance lower limit threshold is -0.3, the first value is 1, the second value is -1, and the smooth continuous function is a preset trigonometric function.
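The body of formula one does not survive in this text. A reconstruction consistent with the description (difference c − d truncated at ±0.3, first value 1, second value −1, and a trigonometric smooth transition in between) could read as follows; the exact trigonometric function is an assumption.

```latex
f(c)=
\begin{cases}
1, & c-d \ge 0.3,\\[2pt]
\sin\!\left(\dfrac{\pi\,(c-d)}{2\times 0.3}\right), & -0.3 < c-d < 0.3,\\[2pt]
-1, & c-d \le -0.3.
\end{cases}
```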
Referring to fig. 10, a schematic view of a travelable area provided in an embodiment of the present application is shown; the shaded area in fig. 10 is the travelable area. The travelable area boundary shown in fig. 10 is the result of accumulating the weighted-value updates over multiple frames of images, and is a further refinement of the positions marked by the nearest obstacle points shown in fig. 8. Because the grid cell weights determined from the nearest obstacle points of the target image are fused with the initial weights accumulated over the preceding consecutive frames, the travelable area boundary shown in fig. 10 has high reliability.
This embodiment realizes the determination of the weighted value of each grid cell. Grid cells whose difference value lies between the distance lower limit threshold and the distance upper limit threshold transition smoothly through the smooth continuous function, which further reduces the noise introduced by multi-frame fused weight accumulation, improves the reliability of determining the new weights of the grid cells, and thus improves the reliability of travelable area detection.
The embodiment shown in fig. 1 acquires a matching image and a target image acquired by shooting a surrounding area of a vehicle with an on-vehicle monocular camera, wherein the matching image is a previous frame image consecutive to the target image; then, performing three-dimensional reconstruction on the target image according to the matching image to obtain target three-dimensional point cloud data corresponding to the target image, thereby realizing reconstruction of the three-dimensional point cloud data of the continuous frame image acquired by the monocular camera; and then, according to the target three-dimensional point cloud data, determining a nearest barrier point in the shooting area of the target image, and according to each nearest barrier point, determining a travelable area in the shooting area of the target image, without acquiring a three-dimensional image or a large number of image samples for machine learning, thereby reducing the dependence on an image acquisition mode, reducing the processing difficulty and improving the accuracy and reliability of detection.
Fig. 11 is a schematic structural view of a travelable area detection apparatus according to an embodiment of the present application. The travelable region detection apparatus 30 shown in fig. 11 includes:
the image acquisition module 31 is configured to acquire a matching image and a target image acquired by shooting a surrounding area of a vehicle with an on-vehicle monocular camera, where the matching image is an image of a frame preceding the target image.
The first processing module 32 is configured to perform three-dimensional reconstruction on the target image according to the matching image to obtain target three-dimensional point cloud data corresponding to the target image.
The second processing module 33 is configured to determine a nearest obstacle point in the shooting area of the target image according to the target three-dimensional point cloud data.
The third processing module 34 is configured to determine a travelable area in the shooting area of the target image according to each nearest obstacle point.
The travelable region detection device provided by the embodiment acquires a matching image and a target image acquired by shooting a region around a vehicle by using a vehicle-mounted monocular camera, wherein the matching image is a previous frame image continuous with the target image; then, performing three-dimensional reconstruction on the target image according to the matching image to obtain target three-dimensional point cloud data corresponding to the target image, so that the reconstruction of the three-dimensional point cloud data of the continuous frame image acquired by the monocular camera is realized; and then, according to the target three-dimensional point cloud data, determining a nearest barrier point in the shooting area of the target image, and according to each nearest barrier point, determining a travelable area in the shooting area of the target image, without acquiring a three-dimensional image or a large number of image samples for machine learning, thereby reducing the dependence on an image acquisition mode, reducing the processing difficulty and improving the accuracy and reliability of detection.
In some embodiments, the first processing module 32 is specifically configured to project the matching image onto a plurality of preset projection surfaces according to the view angle of the target image, so as to obtain a plurality of projection images, where each projection surface corresponds to a depth relative to an origin of the camera; determining the estimated depth of the pixels in the target image according to the matching cost of the pixels in the target image and the corresponding pixels in the plurality of projection images; and acquiring target three-dimensional point cloud data corresponding to the target image according to the estimated depth of the pixels in the target image.
In this embodiment, the projection images of the matching image on projection surfaces at different depths are used to recover the depth of the target image through cost matching, and the target three-dimensional point cloud data is then obtained by conversion. This improves the accuracy and reliability of three-dimensional reconstruction from monocular images, and further improves the accuracy and reliability of travelable area detection.
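The following sketch illustrates this plane-sweep style projection for the simplest case of fronto-parallel vertical planes, assuming pinhole intrinsics (the patent itself works with fisheye images, so this is a simplification) and a known relative pose; the names and the OpenCV-based warping are illustrative choices, not the patented implementation.

```python
import cv2
import numpy as np

def sweep_projection_images(match_img, K, R, t, depths):
    """Warp the matching image into the target view once per candidate projection plane.
    K: 3x3 pinhole intrinsics (simplification of the fisheye model used in the patent).
    R, t: relative pose taking target-camera coordinates to matching-camera coordinates.
    depths: candidate plane depths (metres) for fronto-parallel planes with normal (0, 0, 1)."""
    h, w = match_img.shape[:2]
    K_inv = np.linalg.inv(K)
    n = np.array([[0.0, 0.0, 1.0]])
    projections = []
    for d in depths:
        # homography from target pixels to matching pixels induced by the plane z = d
        H = K @ (R + (t.reshape(3, 1) @ n) / d) @ K_inv
        # resample the matching image at those locations -> the matching image "seen"
        # from the target view through the projection plane at depth d
        warped = cv2.warpPerspective(match_img, H, (w, h),
                                     flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        projections.append(warped)
    return projections
```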
In some embodiments, the projection surfaces include N1 vertical projection planes. The N1 vertical projection planes are arranged parallel to one another directly facing the camera, and the distances from the camera origin to the N1 vertical projection planes follow an inverse-proportional equal-difference distribution (i.e., the reciprocals of the distances are evenly spaced), where N1 is an integer greater than 1. In this embodiment of the application, depth recovery of the area in front of the camera is achieved through the vertical projection planes, which improves the accuracy of depth recovery in complex environments such as curves, and thereby improves the accuracy and reliability of three-dimensional reconstruction of the target image.
In some embodiments, the projection surfaces further include N2 horizontal projection planes and/or N3 projection spherical surfaces. The N2 horizontal projection planes are parallel to the ground directly below the camera and are uniformly arranged within a ground distribution range that is symmetric about the ground, where N2 is an integer greater than 1. The N3 projection spherical surfaces are concentric spheres centred on the camera origin, and their radii follow an inverse-proportional equal-difference distribution, where N3 is an integer greater than 1. In this embodiment of the application, the depth of the ground area in the target image is recovered through the horizontal projection planes, and the projection spherical surfaces introduce more surface-normal directions into the sampling, improving the accuracy and reliability of depth recovery. For points in the target image that lie on neither a horizontal plane nor a vertical plane, combining the projection spherical surfaces to add normal sampling improves the accuracy of their depth recovery. In addition, introducing the concentric projection spherical surfaces provides suitable projection surfaces for a fisheye target image with a viewing angle larger than 180 degrees, improving the accuracy and reliability of depth recovery.
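One natural reading of the "inverse-proportional equal-difference distribution" is that the projection surfaces are evenly spaced in inverse depth (dense near the camera, sparse far away), as is common in plane-sweep methods. A sketch under that assumption, with illustrative depth ranges and counts:

```python
import numpy as np

def inverse_depth_samples(d_min, d_max, n):
    """Depths whose reciprocals form an arithmetic (equal-difference) sequence:
    dense near the camera, sparse far away."""
    inv = np.linspace(1.0 / d_min, 1.0 / d_max, n)
    return 1.0 / inv

# e.g. N1 = 32 vertical planes from 0.5 m to 20 m in front of the camera, and
# N3 = 32 projection spheres with the same radius distribution (values assumed)
plane_depths = inverse_depth_samples(0.5, 20.0, 32)
sphere_radii = inverse_depth_samples(0.5, 20.0, 32)
```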
In some embodiments, the first processing module 32 is specifically configured to obtain a target pixel window feature of a pixel in the target image; acquiring projection pixel window characteristics of corresponding pixels in the plurality of projection images; according to the target pixel window characteristic and the projection pixel window characteristic, obtaining the matching cost of the pixel in the target image and the corresponding pixel in each projection image; and taking the depth corresponding to the corresponding pixel with the minimum matching cost as the estimated depth of the pixel in the target image, wherein the depth corresponding to the corresponding pixel is the depth corresponding to the projection plane where the corresponding pixel is located.
In the embodiment, the depth corresponding to the corresponding pixel with the minimum matching cost is used as the estimated depth of the pixel in the target image, so that the accuracy of depth recovery of the pixel in the target image is improved.
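Continuing the earlier plane-sweep sketch, the following code shows one simple choice of window feature and matching cost (sum of absolute differences over a local window) and the winner-takes-all selection of the estimated depth; the patent does not specify the exact cost, so this is an assumed instantiation.

```python
import cv2
import numpy as np

def estimate_depth(target_img, projections, depths, win=7):
    """Per-pixel winner-takes-all depth: for every candidate projection image, compute a
    window-based matching cost (SAD over a win x win window), then keep the depth of the
    projection surface whose corresponding pixel has the minimum cost."""
    tgt = target_img.astype(np.float32)
    costs = []
    for proj in projections:
        diff = np.abs(tgt - proj.astype(np.float32))
        if diff.ndim == 3:                      # colour image: sum over channels
            diff = diff.sum(axis=2)
        costs.append(cv2.boxFilter(diff, -1, (win, win)))   # window aggregation
    cost_volume = np.stack(costs, axis=0)       # (num_depths, H, W)
    best = np.argmin(cost_volume, axis=0)       # index of the minimum-cost projection
    return np.asarray(depths)[best]             # estimated depth per target pixel
```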
In some embodiments, the first processing module 32, before determining the estimated depth of the pixel in the target image according to the matching cost of the pixel in the target image and the corresponding pixel in the plurality of projection images, is further configured to determine the corresponding pixel in the matching image, which corresponds to the pixel in the target image, in a one-to-one manner according to the camera relative poses of the target image and the matching image; and determining corresponding pixels in the projection images, which correspond to pixels in the target image one by one, according to the corresponding pixels in the matched images.
In this embodiment, the corresponding pixels in the matching image and the target image are located through the camera relative pose, which improves the accuracy and reliability of depth recovery for the target image.
In some embodiments, before determining the pixels in the matching image that correspond one-to-one to the pixels in the target image according to the camera relative poses of the target image and the matching image, the first processing module 32 is further configured to acquire wheel speed meter data of a rear wheel of the vehicle, which indicates the horizontal movement distance of the vehicle-mounted monocular camera, and vehicle-mounted inertial measurement unit (IMU) data, which indicates the horizontal movement direction of the vehicle-mounted monocular camera; determine camera pose data of the vehicle-mounted monocular camera according to the wheel speed meter data of the rear wheel and the vehicle-mounted IMU data; and determine the camera relative pose of the target image and the matching image according to the camera pose data of the vehicle-mounted monocular camera.
In this embodiment, because the rear wheels do not steer relative to the vehicle body in the horizontal direction, the wheel speed of the rear wheels directly represents the moving speed of the vehicle body. Combining the wheel speed meter data of the rear wheels with the vehicle-mounted IMU data to obtain the camera pose improves the reliability of the camera pose, and thus the accuracy and reliability of travelable area detection.
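A minimal planar dead-reckoning sketch of how rear-wheel odometry and IMU heading could be combined into a camera relative pose; it ignores the camera extrinsics and IMU integration details, and all names and conventions are assumptions.

```python
import numpy as np

def integrate_vehicle_pose(pose, wheel_distance, yaw):
    """Simple planar dead reckoning: advance the rear-axle position by the distance
    reported by the rear-wheel speed meter along the heading given by the IMU.
    pose = (x, y, theta) of the vehicle in a fixed world frame."""
    x, y, _ = pose
    x += wheel_distance * np.cos(yaw)
    y += wheel_distance * np.sin(yaw)
    return (x, y, yaw)

def relative_camera_pose(pose_prev, pose_cur):
    """Relative planar pose of the current frame with respect to the previous one
    (camera assumed rigidly mounted, extrinsics folded into the vehicle pose)."""
    dx = pose_cur[0] - pose_prev[0]
    dy = pose_cur[1] - pose_prev[1]
    dtheta = pose_cur[2] - pose_prev[2]
    c, s = np.cos(pose_prev[2]), np.sin(pose_prev[2])
    # express the translation in the previous frame's coordinates
    t = np.array([c * dx + s * dy, -s * dx + c * dy])
    return dtheta, t
```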
In some embodiments, the second processing module 33 is configured to determine, according to the target three-dimensional point cloud data and a polar grid network that horizontally slices the shooting area of the target image, the number of obstacle points included in each grid in the polar grid network, where the obstacle points are target three-dimensional points whose height to ground is greater than a preset obstacle height threshold; and determining the nearest barrier points in each direction in the shooting area of the target image according to the nearest barrier grid in each sector partition of the polar coordinate grid network, wherein the nearest barrier grid is the grid which has the closest radial distance with the origin of the camera in the sector partition and contains barrier points with the number larger than a preset number threshold. Specifically, for example, a target three-dimensional point in each grid of a preset polar grid network is determined according to the target three-dimensional point cloud data and a camera origin position corresponding to the target image, wherein the polar grid network is a segmentation network formed by overlapping and segmenting a first type of cutting surfaces arranged in a fan shape and a second type of cutting surfaces arranged in parallel and facing the camera in a shooting area of the target image, and an intersection line of the first type of cutting surfaces is collinear with a ground perpendicular line of the camera origin; determining the number of barrier points contained in each grid, wherein the barrier points are target three-dimensional points with the ground height larger than a preset barrier height threshold; determining a nearest barrier grid in a sector zone divided by the first type of cutting surface, wherein the nearest barrier grid is a grid which is closest to the radial distance of the origin of the camera in the sector zone and contains barrier points with the number larger than a preset number threshold; and determining a nearest obstacle point in the shooting area of the target image according to the nearest obstacle grid.
In this embodiment, the shooting area of the target image is spatially divided by a polar coordinate grid network built around the camera origin, and in each direction the grid with the smallest radial distance to the camera origin that is likely to contain an obstacle is extracted from the corresponding sector partition, so as to determine the nearest obstacle point in each direction around the camera origin. This improves the accuracy of the nearest obstacle points, and thus the accuracy and reliability of travelable area detection.
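The following sketch bins obstacle points into a camera-centred polar grid and extracts, per sector, the nearest grid containing enough obstacle points, returning the average position of those points as the nearest obstacle point (as described here and in the next paragraph); the height threshold, point-count threshold and grid resolution are illustrative values only.

```python
import numpy as np

def nearest_obstacle_points(points, h_min=0.15, min_count=5,
                            n_sectors=180, r_step=0.1, r_max=20.0):
    """points: (N, 3) target 3-D points in a camera-centred frame, x/y horizontal,
    z = height above ground. All thresholds and resolutions are assumed values."""
    obstacles = points[points[:, 2] > h_min]            # obstacle points: above height threshold
    ang = np.arctan2(obstacles[:, 1], obstacles[:, 0])  # direction of each point
    rad = np.hypot(obstacles[:, 0], obstacles[:, 1])    # radial distance to the camera origin
    sector = ((ang + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    ring = np.minimum((rad / r_step).astype(int), int(r_max / r_step) - 1)

    nearest = {}
    for s in range(n_sectors):
        in_sector = sector == s
        if not np.any(in_sector):
            continue
        for r in np.unique(ring[in_sector]):             # rings are visited near -> far
            mask = in_sector & (ring == r)
            if np.count_nonzero(mask) > min_count:        # nearest obstacle grid of this sector
                nearest[s] = obstacles[mask].mean(axis=0) # average position -> nearest obstacle point
                break
    return nearest
```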
In some embodiments, the second processing module 33 is configured to obtain an average position point of the obstacle points included in the nearest obstacle grid; and taking the average position point corresponding to each nearest obstacle grid as the nearest obstacle point in the shooting area of the target image.
The embodiment further optimizes the position of the nearest barrier point and improves the accuracy of the barrier point.
In some embodiments, the third processing module 34 is configured to, in a uniform segmentation network that horizontally segments a capture area of the target image, determine a weighted value of each grid cell according to a position of each grid cell in the uniform segmentation network relative to the nearest obstacle point; determining a new weight of each grid cell according to the initial weight of each grid cell and the weighted value, wherein the new weight is greater than or equal to a minimum weight threshold and less than or equal to a maximum weight threshold, and the initial weight of each grid cell is 0 or is a new weight determined when a travelable area is determined for a previous frame of image; and determining a travelable area in the shooting area of the target image according to the first type of grid cells or the second type of grid cells indicated by the new weight, wherein the nearest barrier point is arranged between the first type of grid cells indicated by the new weight and the origin of the camera, and the nearest barrier point is not arranged between the second type of grid cells indicated by the new weight and the origin of the camera. For example, a uniform segmentation network including the nearest obstacle point may be obtained according to the nearest obstacle point and a camera origin position corresponding to the target image, where the uniform segmentation network is a segmentation network formed by segmenting a shooting area of the target image in a horizontal direction by a uniform square grid; determining a weighted value of each grid unit according to the radial distance of each grid unit in the uniformly-divided network to the camera origin and the radial distance of the nearest barrier point in the direction of the grid unit to the camera origin; determining a new weight of each grid cell according to the initial weight of each grid cell and the weighted value, wherein the new weight is greater than or equal to a minimum weight threshold and less than or equal to a maximum weight threshold, and the initial weight of each grid cell is 0 or is a new weight determined when a travelable area is determined for a previous frame of image; determining the grid cells to be first type grid cells or second type grid cells according to the new weight values of the grid cells, wherein the nearest barrier points exist between the first type grid cells and the camera origin, and the nearest barrier points do not exist between the second type grid cells and the camera origin; determining a travelable region in a shooting region of the target image according to the second type of grid cells, wherein the second type of grid cells adjacent to the first type of grid cells are boundaries of the travelable region.
In this embodiment, the weighted value of each grid cell of the uniform segmentation network in the shooting area of the target image is calculated and used to update the weight of the grid cell. As consecutive image frames keep arriving, the new weights of the grid cells are continuously updated by weighting, which smooths out noise and compensates for missed detections of the nearest obstacle point in a single frame, thereby improving the reliability of travelable area detection.
In some embodiments, the third processing module 34 is specifically configured to: if the radial distance of a grid cell in the uniform segmentation network to the camera origin minus the radial distance of the nearest obstacle point in the direction of that grid cell to the camera origin is greater than or equal to a distance upper limit threshold, take the weighted value of the grid cell as a first value; if this difference is less than or equal to a distance lower limit threshold, take the weighted value of the grid cell as a second value; and if this difference is less than the distance upper limit threshold and greater than the distance lower limit threshold, take the weighted value of the grid cell as a third value, where the third value is determined from the difference by a preset smooth continuous function; the distance upper limit threshold is the negative of the distance lower limit threshold, the first value is the negative of the second value, and the absolute value of the third value is smaller than the absolute value of the distance upper limit threshold or the absolute value of the distance lower limit threshold.
The embodiment realizes the determination of the weighted value of each grid unit, and the smooth transition of the grid units with the difference between the distance upper limit threshold and the distance lower limit threshold is realized through the smooth continuous function, so that the noise in the process of multi-frame fusion weighted accumulation is further reduced, the reliability of determining the new weight of the grid units is improved, and the reliability of detecting the travelable area is further improved.
In some embodiments, the matching image and the target image are both fisheye images. According to the embodiment of the application, the fisheye image is adopted, so that the horizontal visual angle is increased, the visual field range of the image shooting area is enlarged, and the reliability of the drivable area detection is improved.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Referring to fig. 12, a block diagram of an electronic device of a travelable area detection method according to an embodiment of the present application is shown. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 12, the electronic apparatus includes: one or more processors 1201, a memory 1202, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 12, one processor 1201 is taken as an example.
The memory 1202 is a non-transitory computer-readable storage medium that can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the travelable area detection method in the embodiments of the present application (for example, the image acquisition module 31, the first processing module 32, the second processing module 33, and the third processing module 34 shown in fig. 11). By running the non-transitory software programs, instructions, and modules stored in the memory 1202, the processor 1201 executes the various functional applications and data processing of the server, i.e., implements the travelable area detection method in the above method embodiments.
The memory 1202 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the travelable area detection electronic device, and the like. Further, the memory 1202 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1202 may optionally include memory located remotely from the processor 1201, which may be connected to the travelable area detection electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of travelable region detection may further include: an input device 1203 and an output device 1204. The processor 1201, the memory 1202, the input device 1203, and the output device 1204 may be connected by a bus or other means, and are illustrated as being connected by a bus in fig. 12.
The input device 1203 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus for travelable region detection, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input device. The output devices 1204 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments are not intended to limit the scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (18)
1. A travelable region detection method, comprising:
acquiring a matching image and a target image acquired by shooting a surrounding area of a vehicle by using a vehicle-mounted monocular camera, wherein the matching image is a previous frame image continuous with the target image;
performing three-dimensional reconstruction on the target image according to the matching image to obtain target three-dimensional point cloud data corresponding to the target image;
determining the nearest barrier point in the shooting area of the target image according to the target three-dimensional point cloud data;
in a uniform segmentation network for horizontally segmenting the shooting area of the target image, determining the weighted value of each grid unit according to the radial distance of each grid unit in the uniform segmentation network to the origin of the camera and the radial distance of the nearest barrier point in the direction of the grid unit to the origin of the camera;
determining a new weight of each grid cell according to the initial weight of each grid cell and the weighted value, wherein the new weight is greater than or equal to a minimum weight threshold and less than or equal to a maximum weight threshold, and the initial weight of each grid cell is 0 or is a new weight determined when a travelable area is determined for a previous frame of image;
and determining a travelable area in the shooting area of the target image according to the first type of grid cells or the second type of grid cells indicated by the new weight, wherein the nearest barrier point is arranged between the first type of grid cells indicated by the new weight and the camera origin, and the nearest barrier point is not arranged between the second type of grid cells indicated by the new weight and the camera origin.
2. The method of claim 1, wherein the three-dimensional reconstruction of the target image according to the matching image to obtain target three-dimensional point cloud data corresponding to the target image comprises:
projecting the matched image onto a plurality of preset projection surfaces according to the visual angle of the target image to obtain a plurality of projection images, wherein each projection surface corresponds to a depth relative to the origin of the camera;
determining the estimated depth of the pixels in the target image according to the matching cost of the pixels in the target image and the corresponding pixels in the plurality of projection images;
and acquiring target three-dimensional point cloud data corresponding to the target image according to the estimated depth of the pixels in the target image.
3. The method of claim 2, wherein the projection surface comprises: n1 vertical projection planes;
the N1 vertical projection planes are arranged parallel to one another directly facing the camera, and the distances from the camera origin to the N1 vertical projection planes follow an inverse-proportional equal-difference distribution, wherein N1 is an integer greater than 1.
4. The method of claim 3, wherein the projection surface further comprises: n2 horizontal projection planes and/or N3 projection spheres;
the N2 horizontal projection planes are parallel to the ground right below the camera, and the N2 horizontal projection planes are uniformly arranged in the ground distribution range taking the ground as a symmetry center, wherein N2 is an integer larger than 1;
the N3 projection spherical surfaces are concentric spherical surfaces taking the camera origin as the sphere center, and the radii of the N3 projection spherical surfaces follow an inverse-proportional equal-difference distribution, wherein N3 is an integer greater than 1.
5. The method of any of claims 2 to 4, wherein determining the estimated depth of the pixel in the target image based on the cost of matching the pixel in the target image to the corresponding pixel in the plurality of projection images comprises:
acquiring target pixel window characteristics of pixels in the target image;
acquiring projection pixel window characteristics of corresponding pixels in the plurality of projection images;
according to the target pixel window characteristic and the projection pixel window characteristic, obtaining the matching cost of the pixel in the target image and the corresponding pixel in each projection image;
and taking the depth corresponding to the corresponding pixel with the minimum matching cost as the estimated depth of the pixel in the target image, wherein the depth corresponding to the corresponding pixel is the depth corresponding to the projection plane where the corresponding pixel is located.
6. The method according to claim 1, wherein the determining a nearest obstacle point in a shooting area of the target image according to the target three-dimensional point cloud data comprises:
determining the number of barrier points contained in each grid in the polar coordinate grid network according to the target three-dimensional point cloud data and the polar coordinate grid network which horizontally divides the shooting area of the target image, wherein the barrier points are target three-dimensional points with the ground height larger than a preset barrier height threshold;
and determining the nearest barrier points in each direction in the shooting area of the target image according to the nearest barrier grid in each fan-shaped partition of the polar coordinate grid network, wherein the nearest barrier grid is the grid which has the closest radial distance with the origin of the camera in the fan-shaped partition and contains barrier points with the number larger than a preset number threshold.
7. The method of claim 6, wherein determining nearest obstacle points in each direction in the capture area of the target image from the nearest obstacle grids in each sector of the polar grid mesh network comprises:
obtaining the average position point of the obstacle points contained in the nearest obstacle grid;
and taking the average position point corresponding to each nearest obstacle grid as the nearest obstacle point in the shooting area of the target image.
8. The method of claim 1, 2, 3, 4, 6 or 7, wherein the matching image and the target image are both fisheye images.
9. A travelable region detection apparatus, characterized by comprising:
the system comprises an image acquisition module, a matching image acquisition module and a target image acquisition module, wherein the image acquisition module is used for acquiring the matching image and the target image acquired by shooting the surrounding area of a vehicle by using a vehicle-mounted monocular camera, and the matching image is the previous frame image of the target image;
the first processing module is used for carrying out three-dimensional reconstruction on the target image according to the matching image to obtain target three-dimensional point cloud data corresponding to the target image;
the second processing module is used for determining a nearest barrier point in a shooting area of the target image according to the target three-dimensional point cloud data;
the third processing module is used for determining the weighted value of each grid unit in an evenly-divided network for horizontally dividing the shooting area of the target image according to the radial distance of each grid unit in the evenly-divided network to the camera origin and the radial distance of the nearest barrier point in the direction of the grid unit to the camera origin; determining a new weight of each grid cell according to the initial weight of each grid cell and the weighted value, wherein the new weight is greater than or equal to a minimum weight threshold and less than or equal to a maximum weight threshold, and the initial weight of each grid cell is 0 or is a new weight determined when a travelable area is determined for a previous frame of image; and determining a travelable area in the shooting area of the target image according to the first type of grid cells or the second type of grid cells indicated by the new weight, wherein the nearest barrier point is arranged between the first type of grid cells indicated by the new weight and the origin of the camera, and the nearest barrier point is not arranged between the second type of grid cells indicated by the new weight and the origin of the camera.
10. The apparatus according to claim 9, wherein the first processing module is specifically configured to project the matching image onto a plurality of preset projection surfaces according to the view angle of the target image, so as to obtain a plurality of projection images, where each projection surface corresponds to a depth relative to an origin of a camera; determining the estimated depth of the pixels in the target image according to the matching cost of the pixels in the target image and the corresponding pixels in the plurality of projection images; and acquiring target three-dimensional point cloud data corresponding to the target image according to the estimated depth of the pixels in the target image.
11. The apparatus of claim 10, wherein the projection surface comprises: N1 vertical projection planes; the N1 vertical projection planes are arranged parallel to one another directly facing the camera, and the distances from the camera origin to the N1 vertical projection planes follow an inverse-proportional equal-difference distribution, wherein N1 is an integer greater than 1.
12. The apparatus of claim 11, wherein the projection surface further comprises: N2 horizontal projection planes and/or N3 projection spheres; the N2 horizontal projection planes are parallel to the ground directly below the camera, and the N2 horizontal projection planes are uniformly arranged within a ground distribution range that is symmetric about the ground, wherein N2 is an integer greater than 1; the N3 projection spherical surfaces are concentric spherical surfaces taking the camera origin as the sphere center, and the radii of the N3 projection spherical surfaces follow an inverse-proportional equal-difference distribution, wherein N3 is an integer greater than 1.
13. The apparatus according to any one of claims 10 to 12, wherein the first processing module is specifically configured to obtain a target pixel window characteristic of a pixel in the target image; acquiring projection pixel window characteristics of corresponding pixels in the plurality of projection images; according to the target pixel window characteristic and the projection pixel window characteristic, obtaining the matching cost of the pixel in the target image and the corresponding pixel in each projection image; and taking the depth corresponding to the corresponding pixel with the minimum matching cost as the estimated depth of the pixel in the target image, wherein the depth corresponding to the corresponding pixel is the depth corresponding to the projection plane where the corresponding pixel is located.
14. The apparatus according to claim 9, wherein the second processing module is configured to determine, according to the target three-dimensional point cloud data and a polar grid network horizontally dividing a shooting area of the target image, a number of obstacle points included in each grid in the polar grid network, where the obstacle points are target three-dimensional points whose height to ground is greater than a preset obstacle height threshold; and determining the nearest barrier points in each direction in the shooting area of the target image according to the nearest barrier grid in each fan-shaped partition of the polar coordinate grid network, wherein the nearest barrier grid is the grid which has the closest radial distance with the origin of the camera in the fan-shaped partition and contains barrier points with the number larger than a preset number threshold.
15. The apparatus according to claim 14, wherein the second processing module is configured to obtain an average position point of the obstacle points included in the nearest obstacle grid; and take the average position point corresponding to each nearest obstacle grid as the nearest obstacle point in the shooting area of the target image.
16. The apparatus of claim 9, 10, 11, 12, 14 or 15, wherein the matching image and the target image are both fisheye images.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the travelable region detection method of any of claims 1-8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the travelable region detection method of any of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010175382.7A CN111366917B (en) | 2020-03-13 | 2020-03-13 | Method, device and equipment for detecting travelable area and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111366917A CN111366917A (en) | 2020-07-03 |
CN111366917B true CN111366917B (en) | 2022-07-15 |
Family
ID=71208941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010175382.7A Active CN111366917B (en) | 2020-03-13 | 2020-03-13 | Method, device and equipment for detecting travelable area and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111366917B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112289070B (en) * | 2020-10-20 | 2021-12-07 | 广州小鹏自动驾驶科技有限公司 | Parking space detection method and device, vehicle and readable medium |
CN113409268B (en) * | 2021-06-18 | 2023-04-18 | 广东工业大学 | Method and device for detecting passable area based on monocular camera and storage medium |
CN114049486A (en) * | 2021-11-26 | 2022-02-15 | 珠海格力智能装备有限公司 | 3D point cloud target detection method and system based on monocular structured light |
CN114663754A (en) * | 2022-03-04 | 2022-06-24 | 深圳鹏行智能研究有限公司 | Detection method, detection device, multi-legged robot and storage medium |
CN115406457A (en) * | 2022-08-30 | 2022-11-29 | 重庆长安汽车股份有限公司 | Driving region detection method, system, equipment and storage medium |
CN115661238B (en) * | 2022-12-06 | 2023-03-07 | 广汽埃安新能源汽车股份有限公司 | Method and device for generating travelable region, electronic equipment and computer readable medium |
CN117218244B (en) * | 2023-11-07 | 2024-02-13 | 武汉博润通文化科技股份有限公司 | Intelligent 3D animation model generation method based on image recognition |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4256812B2 (en) * | 2004-04-26 | 2009-04-22 | 三菱重工業株式会社 | Obstacle avoidance method for moving body and moving body |
DK2952993T3 (en) * | 2014-06-05 | 2018-07-30 | Softbank Robotics Europe | PROCEDURE FOR MAKING A CARD OF LIKELIHOOD FOR ONE OF THE ABSENCE OR EXISTENCE OF BARRIERS FOR AN AUTONOMOUS ROBOT |
US9933264B2 (en) * | 2015-04-06 | 2018-04-03 | Hrl Laboratories, Llc | System and method for achieving fast and reliable time-to-contact estimation using vision and range sensor data for autonomous navigation |
CN104950883A (en) * | 2015-05-14 | 2015-09-30 | 西安电子科技大学 | Mobile robot route planning method based on distance grid map |
US10635913B2 (en) * | 2016-10-17 | 2020-04-28 | Mediatek Inc. | Path planning method and related navigation device |
US10296012B2 (en) * | 2016-12-21 | 2019-05-21 | X Development Llc | Pre-computation of kinematically feasible roadmaps |
KR102275310B1 (en) * | 2017-04-20 | 2021-07-12 | 현대자동차주식회사 | Mtehod of detecting obstacle around vehicle |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102903096A (en) * | 2012-07-04 | 2013-01-30 | 北京航空航天大学 | Monocular video based object depth extraction method |
CN103955920A (en) * | 2014-04-14 | 2014-07-30 | 桂林电子科技大学 | Binocular vision obstacle detection method based on three-dimensional point cloud segmentation |
CN107064955A (en) * | 2017-04-19 | 2017-08-18 | 北京汽车集团有限公司 | barrier clustering method and device |
CN108038905A (en) * | 2017-12-25 | 2018-05-15 | 北京航空航天大学 | A kind of Object reconstruction method based on super-pixel |
CN109949355A (en) * | 2019-03-14 | 2019-06-28 | 大连民族大学 | The method of half fan-shaped equidistant line model is established in monocular vision pedestrian's distance estimations |
CN110244321A (en) * | 2019-04-22 | 2019-09-17 | 武汉理工大学 | A kind of road based on three-dimensional laser radar can traffic areas detection method |
CN112243518A (en) * | 2019-08-29 | 2021-01-19 | 深圳市大疆创新科技有限公司 | Method and device for acquiring depth map and computer storage medium |
Non-Patent Citations (2)
Title |
---|
A fast multi-view stereo matching method; Zhao Hongtian; Modern Computer (现代计算机); 2018-01-31 *
Disparity-based monocular vision obstacle detection; Wang Hui et al.; Wanfang Data (万方数据); 2005-06-27; pages 237-239 *
Also Published As
Publication number | Publication date |
---|---|
CN111366917A (en) | 2020-07-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111366917B (en) | Method, device and equipment for detecting travelable area and computer readable storage medium | |
CN111337947B (en) | Instant mapping and positioning method, device, system and storage medium | |
US10794710B1 (en) | High-precision multi-layer visual and semantic map by autonomous units | |
CN111968229B (en) | High-precision map making method and device | |
US10192113B1 (en) | Quadocular sensor design in autonomous platforms | |
CN114842438B (en) | Terrain detection method, system and readable storage medium for automatic driving automobile | |
CN109144095B (en) | Embedded stereoscopic vision-based obstacle avoidance system for unmanned aerial vehicle | |
CN111274343B (en) | Vehicle positioning method and device, electronic equipment and storage medium | |
US10496104B1 (en) | Positional awareness with quadocular sensor in autonomous platforms | |
KR101725060B1 (en) | Apparatus for recognizing location mobile robot using key point based on gradient and method thereof | |
KR101776622B1 (en) | Apparatus for recognizing location mobile robot using edge based refinement and method thereof | |
CN107748569B (en) | Motion control method and device for unmanned aerial vehicle and unmanned aerial vehicle system | |
KR101784183B1 (en) | APPARATUS FOR RECOGNIZING LOCATION MOBILE ROBOT USING KEY POINT BASED ON ADoG AND METHOD THEREOF | |
US20170314930A1 (en) | System and method for achieving fast and reliable time-to-contact estimation using vision and range sensor data for autonomous navigation | |
JP7422105B2 (en) | Obtaining method, device, electronic device, computer-readable storage medium, and computer program for obtaining three-dimensional position of an obstacle for use in roadside computing device | |
CN112184914B (en) | Method and device for determining three-dimensional position of target object and road side equipment | |
CN110132242B (en) | Triangularization method for multi-camera instant positioning and map construction and moving body thereof | |
KR20200075727A (en) | Method and apparatus for calculating depth map | |
KR102694715B1 (en) | Method for detecting obstacle, electronic device, roadside device and cloud control platform | |
CN115147809B (en) | Obstacle detection method, device, equipment and storage medium | |
CN111222579B (en) | Cross-camera obstacle correlation method, device, equipment, electronic system and medium | |
CN112330815A (en) | Three-dimensional point cloud data processing method, device and equipment based on obstacle fusion | |
CN112668428A (en) | Vehicle lane change detection method, roadside device, cloud control platform and program product | |
CN111674388B (en) | Information processing method and device for vehicle curve driving | |
CN111666876A (en) | Method and device for detecting obstacle, electronic equipment and road side equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |