CN109344746B

CN109344746B - Pedestrian counting method, system, computer device and storage medium

Info

Publication number: CN109344746B
Application number: CN201811081090.6A
Authority: CN
Inventors: 曹志杰; 吴旻烨
Original assignee: Yaoke Intelligent Technology Shanghai Co ltd
Current assignee: Yaoke Intelligent Technology Shanghai Co ltd
Priority date: 2018-09-17
Filing date: 2018-09-17
Publication date: 2022-02-01
Anticipated expiration: 2038-09-17
Also published as: CN109344746A

Abstract

The pedestrian counting method, system, computer device and storage medium work as follows. They acquire a plurality of pedestrian images captured by a light field camera array. A target detector identifies the region to which each pedestrian belongs in each pedestrian image. A first deduplication is performed on regions, in pedestrian images from different cameras, that belong to the same pedestrian, and the deduplicated pedestrian regions are counted to obtain a first pedestrian number. A second pedestrian detection and a second deduplication are performed on the areas of each pedestrian image outside the pedestrian regions, and a second pedestrian number is calculated from the detection result after the second deduplication. The first and second pedestrian numbers are summed to obtain the total pedestrian number. Building on the strengths of the light field camera array and combining them with a high-performance deep learning target detector, the invention avoids missed detections and, together with deep-learning-based pedestrian deduplication, obtains a more accurate pedestrian count.

Description

Pedestrian counting method, system, computer device and storage medium

Technical Field

The present invention relates to the field of image vision technology, and in particular, to a pedestrian counting method, system, computer device, and storage medium.

Background

Pedestrian counting is one of the common applications of computer vision. By running computer vision algorithms on the footage of cameras installed in a scene, the real-time number of people in the scene can be counted. This can provide passenger flow information for shopping malls, scenic spots and similar venues for further analysis, provide pedestrian numbers on streets for analyzing potential safety hazards, or provide reliable passenger flow distribution information for subways, airports and the like, so that the relevant departments can schedule transport services reasonably.

Recent pedestrian counting techniques include both deep learning methods and conventional methods. Deep learning methods fall into two categories. The first is detection-based: pedestrians in the scene are detected and then counted, with the drawback that many people go undetected because of occlusion. The second is regression-based: a neural network approximately regresses the density distribution of pedestrians. This copes effectively with occlusion, but it is easily affected by the scene, clothing, illumination and so on; when the scene changes, further training may be needed, so generality is poor. The main advantage of conventional methods is speed, but they are inferior to deep learning methods in detection accuracy and in handling occlusion. Meanwhile, some patents attempt to use multiple cameras in a multi-view arrangement to recover people that a single camera cannot detect, but most of them rely on conventional detection methods without the advantages of deep learning, and their results are unsatisfactory.

Disclosure of Invention

In view of the above-described drawbacks of the prior art, it is an object of the present invention to provide a pedestrian counting method, system, computer device and storage medium that solve the problems of the prior art.

To achieve the above and other related objects, the present invention provides a pedestrian counting method, comprising: acquiring a plurality of pedestrian images captured by a light field camera array, wherein the plurality of pedestrian images comprises a pedestrian image captured within the field of view of each camera in the light field camera array; identifying, with a target detector, the region to which each pedestrian belongs in each pedestrian image; performing a first deduplication on regions, in pedestrian images from different cameras, that belong to the same pedestrian, and counting the pedestrian regions remaining in each pedestrian image after the first deduplication to obtain a first pedestrian number; performing a second pedestrian detection on the areas of each pedestrian image other than the pedestrian regions, performing a second deduplication on the parts of the detection result that belong to the same pedestrian, and calculating a second pedestrian number from the detection result after the second deduplication; and summing the first pedestrian number and the second pedestrian number to obtain the total pedestrian number.

In an embodiment of the present invention, the target detector is implemented by a multi-layer fully convolutional neural network model, which generates a target bounding box framing each pedestrian in the pedestrian image; the framed area serves as the region to which that pedestrian belongs.

In an embodiment of the invention, the target detector is trained so that its target bounding boxes frame only the parts of the image classified as pedestrians.

In an embodiment of the present invention, the first deduplication comprises: converting the two-dimensional image coordinates of pixel points in each pedestrian region of each pedestrian image into three-dimensional world coordinates, and converting the three-dimensional world coordinates into corresponding two-dimensional ground coordinates to obtain first mapping points; pairing first mapping points obtained from pedestrian regions in two pedestrian images, and judging, for each pair, whether the pedestrian regions containing the two corresponding pixel points were both captured within the common field of view of the two cameras; if not, assigning the two first mapping points to different pedestrians; if yes, calculating the degree of similarity between the pair of first mapping points, expressed as a distance; judging whether this distance is greater than a preset threshold; if yes, assigning the pair of first mapping points to different pedestrians; if not, judging that the pair of first mapping points may belong to the same pedestrian and performing a third deduplication.

In an embodiment of the invention, the third deduplication comprises: for each pair of first mapping points suspected to belong to the same pedestrian, inputting the two pedestrian regions containing the corresponding pixel points into the two neural networks of a twin (Siamese) neural network model, respectively; calculating the degree of similarity between the feature vectors output by the last fully connected layers of the two networks; judging whether this degree of similarity is greater than a preset threshold; if yes, judging that the two pedestrian regions belong to the same pedestrian and deduplicating them; if not, judging that the two pedestrian regions belong to different pedestrians.

In an embodiment of the present invention, the degree of similarity is calculated using any one of the cosine distance, Euclidean distance, standardized Euclidean distance, Mahalanobis distance, Hamming distance, and Manhattan distance.
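For reference, the six distance measures listed above can be written in a few lines of NumPy. This is a minimal sketch, not taken from the patent; the variance vector for the standardized Euclidean distance and the covariance matrix for the Mahalanobis distance are assumed inputs estimated elsewhere, and the Hamming distance is only meaningful for binarized vectors.

```python
import numpy as np

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    # 1 - cos(angle between u and v); 0 when the vectors point the same way
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.linalg.norm(u - v))

def standardized_euclidean(u: np.ndarray, v: np.ndarray, var: np.ndarray) -> float:
    # var: per-component variance (assumed to be estimated from a sample set)
    return float(np.sqrt(np.sum((u - v) ** 2 / var)))

def mahalanobis(u: np.ndarray, v: np.ndarray, cov: np.ndarray) -> float:
    # cov: covariance matrix of the feature distribution (assumed given)
    d = u - v
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def hamming(u: np.ndarray, v: np.ndarray) -> float:
    # Fraction of components that differ
    return float(np.mean(u != v))

def manhattan(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.sum(np.abs(u - v)))
```

For points in the ground plane, the Euclidean distance is the natural choice; the others are listed in the text as interchangeable alternatives.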

In an embodiment of the present invention, performing the second pedestrian detection on the areas other than the pedestrian regions in each pedestrian image comprises: for each pedestrian image, processing those other areas with a compensation detection model that regresses the distribution of human heads, to obtain a corresponding head distribution density image, which is integrated to calculate the second pedestrian number.

In an embodiment of the invention, the second deduplication comprises: converting the two-dimensional image coordinates of pixel points in the areas of each pedestrian image other than the pedestrian regions into three-dimensional world coordinates to obtain corresponding second mapping points, wherein the second mapping points of each pedestrian image form a partial spatial coverage area of the camera that captured that image; and eliminating, by the inclusion-exclusion principle, the overlap between the partial spatial coverage areas of different cameras in the light field camera array, so as to retain each camera's individual coverage area and obtain deduplicated head distribution density images corresponding to these individual coverage areas, which are integrated to calculate the second pedestrian number.
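A minimal sketch of the inclusion-exclusion step, under an assumption the patent does not specify: each camera's remaining (non-bounding-box) area is discretized into hypothetical ground-plane grid cells, each carrying the head density mapped into it. Counting every cell of the union once gives the same total as |A| + |B| - |A ∩ B| for two cameras, provided the cameras agree on the density of shared cells.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # hypothetical ground-plane grid cell (row, col)

def union_density(coverages: List[Dict[Cell, float]]) -> float:
    """Sum head density over the union of all cameras' coverage areas.

    A cell seen by several cameras contributes once (the first camera's value),
    so overlaps between coverage areas are not double-counted.
    """
    owned: Dict[Cell, float] = {}
    for cov in coverages:
        for cell, density in cov.items():
            owned.setdefault(cell, density)
    return sum(owned.values())

def inclusion_exclusion_two(a: Dict[Cell, float], b: Dict[Cell, float]) -> float:
    """|A| + |B| - |A ∩ B|, measured in head density, for two cameras."""
    overlap = sum(a[c] for c in a.keys() & b.keys())
    return sum(a.values()) + sum(b.values()) - overlap
```

With more than two cameras, the "first camera owns the cell" rule in `union_density` avoids writing out the alternating inclusion-exclusion sum explicitly.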

To achieve the above and other related objects, the present invention provides a pedestrian counting system, comprising: a communication unit for acquiring a plurality of pedestrian images captured by a light field camera array, wherein the plurality of pedestrian images comprises a pedestrian image captured within the field of view of each camera in the light field camera array; and a processing unit for identifying, with a target detector, the region to which each pedestrian belongs in each pedestrian image; performing a first deduplication on regions, in pedestrian images from different cameras, that belong to the same pedestrian, and counting the pedestrian regions remaining in each pedestrian image after the first deduplication to obtain a first pedestrian number; performing a second pedestrian detection on the areas of each pedestrian image other than the pedestrian regions, performing a second deduplication on the parts of the detection result that belong to the same pedestrian, and calculating a second pedestrian number from the detection result after the second deduplication; and summing the first pedestrian number and the second pedestrian number to obtain the total pedestrian number.

To achieve the above and other related objects, the present invention provides a computer apparatus, comprising: a communicator, a processor, and a memory; the communicator is communicatively connected to the light field camera array; the memory stores a computer program; and the processor is configured to run the computer program to implement the pedestrian counting method.

In an embodiment of the invention, the computer device and the light field camera array are connected through a communication network.

To achieve the above and other related objects, the present invention provides a computer storage medium storing a computer program which, when run, implements the pedestrian counting method.

As described above, the pedestrian counting method, system, computer device, and storage medium of the present invention acquire a plurality of pedestrian images captured by a light field camera array, the plurality of pedestrian images comprising a pedestrian image captured within the field of view of each camera in the array; identify, with a target detector, the region to which each pedestrian belongs in each pedestrian image; perform a first deduplication on regions, in pedestrian images from different cameras, that belong to the same pedestrian, and count the pedestrian regions remaining in each image after the first deduplication to obtain a first pedestrian number; perform a second pedestrian detection on the areas of each image outside the pedestrian regions, perform a second deduplication on the parts of the detection result belonging to the same pedestrian, and calculate a second pedestrian number from the deduplicated result; and sum the first and second pedestrian numbers to obtain the total pedestrian number. Building on the advantages of the light field camera array, such as multiple viewing angles and convenient computation, and combining them with a high-performance deep learning target detector, the invention avoids missed detections and, together with deep-learning-based pedestrian deduplication, obtains a more accurate pedestrian count.

Drawings

Fig. 1 is a flowchart illustrating a pedestrian counting method according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating the detection effect of the target detector according to an embodiment of the present invention.

Fig. 3 is a schematic flow chart of a first deduplication process according to an embodiment of the present invention.

Fig. 4 is a schematic flow chart illustrating a third deduplication process according to an embodiment of the present invention.

FIG. 5 is a block diagram of the network model used for the third deduplication according to an embodiment of the present invention.

FIG. 6A is a schematic diagram illustrating a process of obtaining a human head distribution density map by regression model processing according to an embodiment of the present invention.

Fig. 6B shows the human head distribution density map after the regions to which pedestrians belong are masked, according to an embodiment of the present invention.

Fig. 7 is a flowchart illustrating a second deduplication process according to an embodiment of the present invention.

FIG. 8 is a functional block diagram of a pedestrian counting system in accordance with an embodiment of the present invention.

Fig. 9 is a system architecture diagram illustrating a computer device and a connection method thereof for implementing pedestrian counting according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

The technical solution of the invention improves pedestrian counting. It addresses the problem in the prior art that both camera image acquisition and deep learning algorithms are susceptible to interference, and combines and optimizes the two to improve pedestrian counting accuracy.

Fig. 1 shows a flow chart of a pedestrian counting method according to an embodiment of the present invention.

The method specifically comprises the following steps:

step S101: acquiring a plurality of pedestrian images acquired by a light field camera array; wherein the plurality of pedestrian images comprises: an image of a pedestrian captured under a field of view of each camera in the light field camera array.

The light field camera array comprises a plurality of cameras, each of which captures pedestrian images from a different angle; these images can be stitched into an overall image with depth. In this embodiment of the invention, pedestrian detection and deduplication are performed on the individual pedestrian images to count the pedestrians.

Step S102: identifying, with the target detector, the region to which each pedestrian belongs in each pedestrian image.

At present, most deep learning pedestrian detection methods rely on general-purpose object detection methods, which can detect many categories of objects (each object corresponds to a category, such as pets, plants, pedestrians, buildings, and the like). The technical solution of the invention only needs to detect pedestrians, so a model dedicated to pedestrian detection is obtained by taking a general-purpose object detection model and fine-tuning it through transfer learning on a pedestrian detection dataset. In addition, real-time video detection requires a fast pedestrian detector, whereas common object detection algorithms such as Fast R-CNN are comparatively slow.

Therefore, in an embodiment of the present invention, the target detector is preferably made by using an SSD model.

SSD stands for Single Shot MultiBox Detector (see, for example, the article by Jisoo Jeong, Hyojin Park, Nojun Kwak et al., "Enhancement of SSD by concatenating feature maps for object detection"). It is a multi-layer fully convolutional network in which each feature-extraction convolutional layer generates multiple target bounding boxes, so that bounding boxes of multiple scales are produced from different convolutional layers. The target bounding boxes frame objects in the image, such as pedestrians, plants, pets, buildings, etc.

In the embodiments of the present application, the area of the pedestrian image enclosed by a target bounding box that frames a pedestrian is referred to as the region to which that pedestrian belongs.

Fig. 2 shows how the target detector 201 in this embodiment frames the region 203 to which each pedestrian belongs in a pedestrian image 202.

It should be noted that the present invention uses a multi-layer fully convolutional network, such as the SSD model, as the target detector 201 to generate target bounding boxes. Only pedestrians are framed, and the bounding boxes of other, irrelevant classes are discarded. This yields an efficient target detector that is well suited to pedestrian detection in video streams.
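The class filtering described above can be illustrated with a small post-processing sketch. The `Detection` record, the `"person"` label string, and the 0.5 score threshold are hypothetical choices, not part of the patent; with a fine-tuned, pedestrian-only SSD the filter would simply keep every confident non-background box.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

class Detection(NamedTuple):
    label: str    # class name predicted by the detector (hypothetical format)
    score: float  # confidence in [0, 1]
    box: Box

def pedestrian_regions(dets: List[Detection], min_score: float = 0.5) -> List[Box]:
    """Keep only confident pedestrian boxes; boxes of other classes are discarded."""
    return [d.box for d in dets if d.label == "person" and d.score >= min_score]
```

The surviving boxes are the "regions to which pedestrians belong" that feed the first deduplication and, later, the masking of the density map.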

Specifically, the SSD model itself is designed for multi-class object detection, but pedestrian counting does not need to detect other kinds of objects, such as the bags or backpacks in fig. 2. Therefore, the SSD model trained on a multi-class detection dataset is transfer-learned on a dataset containing only pedestrian bounding boxes, i.e., the network model is fine-tuned.

When the model is fine-tuned, the network structure is unchanged. The original dataset covers many classes, whereas the current dataset has a single class, i.e., the task is to judge whether the content of a target bounding box is a person or background. This is described by the following formula:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p}\,\log\left(\hat{c}_{i}^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_{i}^{0}\right), \qquad \hat{c}_{i}^{p} = \frac{\exp\left(c_{i}^{p}\right)}{\sum_{p}\exp\left(c_{i}^{p}\right)}$$

The above formula is the classification loss function of SSD, used to improve classification accuracy. $\hat{c}_{i}^{p}$ represents the score of the i-th bounding box for class p; the higher the score, the more likely the content of the target bounding box belongs to class p.

The classification loss is computed as follows. For target bounding boxes that match background, i.e., $i \in Neg$, the background score $\hat{c}_{i}^{0}$ is taken, log-transformed, and accumulated. For the other boxes, the overlap between each target bounding box detected by SSD and each manually annotated target bounding box in the dataset is computed. If the overlap exceeds a certain threshold, $x_{ij}^{p}$ equals 1, meaning that the i-th target bounding box detected by SSD has high overlap with the j-th annotated bounding box (whose object class is p); otherwise $x_{ij}^{p}$ equals 0. Each detected bounding box then takes its score for the class of the annotated box it strongly overlaps, applies the log, and the results are accumulated; this is the $i \in Pos$ part of the formula.
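The confidence loss above can be sketched in NumPy for the two-class case used in fine-tuning (column 0 = background, column 1 = pedestrian). This is an illustrative sketch that assumes each positive box has already been matched to one annotated box; the hard negative mining and the 1/N normalization used in the full SSD training objective are omitted.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the class axis: c_hat_i^p
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def confidence_loss(logits: np.ndarray, is_pedestrian: np.ndarray) -> float:
    """Two-class SSD-style confidence loss (summed over boxes, not normalized).

    logits: (N, 2) raw class scores c_i^p, column 0 = background, 1 = pedestrian.
    is_pedestrian: (N,) True where box i matched an annotated pedestrian box
                   (x_ij^p = 1, i in Pos); False for negatives (i in Neg).
    """
    probs = softmax(logits)
    pos = is_pedestrian.astype(bool)
    loss_pos = -np.log(probs[pos, 1]).sum()   # i in Pos: log score of the matched class
    loss_neg = -np.log(probs[~pos, 0]).sum()  # i in Neg: log background score
    return float(loss_pos + loss_neg)
```

With uninformative (all-zero) logits every probability is 0.5, so two boxes give a loss of 2·log 2; confident, correct scores drive the loss toward 0.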

In the embodiment of the invention, when SSD is fine-tuned on a pedestrian dataset, the annotated target bounding boxes cover only the pedestrian class. Each target bounding box detected by SSD therefore only needs to be scored as pedestrian or as background (when it matches no annotated box with sufficient overlap), and is not scored for other classes; this is what changes during fine-tuning.

After fine-tuning, the trained SSD achieves better pedestrian detection, while interference from other categories of objects (such as pets, plants, buildings and the like) is eliminated.

Step S103: performing a first deduplication on regions, in pedestrian images from different cameras, that belong to the same pedestrian, and counting the pedestrian regions in each pedestrian image after the first deduplication to obtain the first pedestrian number.

Specifically, a specific flow of the first deduplication process is shown by the embodiment of fig. 3.

The first deduplication comprises:

Step S301: converting the two-dimensional image coordinates of the pixel points in each pedestrian region of each pedestrian image into three-dimensional world coordinates, and converting the three-dimensional world coordinates into corresponding two-dimensional ground coordinates to obtain first mapping points.

Specifically, the coordinates of a pixel point in the two-dimensional image coordinate system are converted into the camera coordinate system using the camera intrinsic parameters, then into three-dimensional world coordinates using the camera extrinsic parameters, and finally into the two-dimensional ground coordinate system, which eliminates the influence of pedestrian height.

For example, assume that the image coordinates of the center of a pedestrian's target bounding box in the image captured by a certain camera are (x, y). The spatial coordinates (X, Y, Z) in the camera coordinate system are calculated from the 2D image coordinates (x, y) as:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = d\,K^{-1}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

where d represents the depth and K represents the camera's intrinsic parameter matrix, both provided by the calibrated light field camera array.
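The back-projection above amounts to solving a linear system with the intrinsic matrix and scaling by the depth. A minimal NumPy sketch; the intrinsic values in the usage lines are made up for illustration.

```python
import numpy as np

def pixel_to_camera(x: float, y: float, d: float, K: np.ndarray) -> np.ndarray:
    """Back-project image point (x, y) with depth d: [X, Y, Z] = d * K^-1 [x, y, 1]."""
    # Solving K p = [x, y, 1] avoids forming K^-1 explicitly
    return d * np.linalg.solve(K, np.array([x, y, 1.0]))

# Illustrative intrinsics: focal length 500 px, principal point (320, 240)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
p_cam = pixel_to_camera(820.0, 240.0, 2.0, K)  # 500 px right of center at depth 2
```

Because the last row of K is (0, 0, 1), the resulting Z always equals the depth d.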

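Further, the coordinates (X', Y', Z') in the common world coordinate system of the camera array are calculated from the camera coordinates (X, Y, Z):

$$W\begin{bmatrix} X' \\ Y' \\ Z' \\ 1 \end{bmatrix} = T\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

where T is the extrinsic parameter matrix of the camera and W is the homogeneous coordinate coefficient.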
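A matching sketch for the camera-to-world step, assuming T is a 4×4 homogeneous extrinsic matrix; the result is divided by the homogeneous coefficient W, which equals 1 for a rigid transform. The translation-only T in the usage lines is a made-up example.

```python
import numpy as np

def camera_to_world(p_cam: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Map camera coordinates (X, Y, Z) to world coordinates (X', Y', Z') with a 4x4 extrinsic T."""
    h = T @ np.append(p_cam, 1.0)  # W * [X', Y', Z', 1]
    return h[:3] / h[3]            # divide out the homogeneous coefficient W

# Illustrative extrinsics: camera origin at (1, 2, 3) in world coordinates, no rotation
T = np.eye(4)
T[:3, 3] = [1.0, 2.0, 3.0]
p_world = camera_to_world(np.array([0.0, 0.0, 2.0]), T)
```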

In this way, each pixel point on the pedestrian detected from the pedestrian image acquired by each camera can be mapped to the three-dimensional world coordinate in the public world coordinate system.

The pixel point used in the above embodiment is taken at the center of the target bounding box and therefore depends on the size of the box. If pedestrians differ in height, two people standing at similar positions may still yield a large distance between their calculated three-dimensional world coordinates. To eliminate the influence of height, points in the world coordinate system are projected onto the two-dimensional ground coordinate system as follows:

$$\begin{bmatrix} X'' \\ Y'' \end{bmatrix} = P\,R\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix}, \qquad P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$

In the above formula, R is a rotation matrix that converts points in the world coordinate system into a standard Cartesian coordinate system whose Z axis points in the vertical (height) direction, and P is a projection matrix that removes the height coordinate in this Cartesian system and retains the plane coordinates.
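A sketch of the ground projection, assuming R is a proper rotation that brings the world's vertical axis onto Z. The example R below (for a hypothetical world frame whose Y axis points up) is an illustrative choice; it shows that two points differing only in height land on the same ground point.

```python
import numpy as np

# P drops the height (third) coordinate and keeps the ground-plane coordinates
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

def world_to_ground(p_world: np.ndarray, R: np.ndarray) -> np.ndarray:
    """[X'', Y''] = P R [X', Y', Z'], where R aligns the vertical direction with Z."""
    return P @ (R @ p_world)

# Hypothetical world frame with Y pointing up: R maps (x, y, z) -> (x, -z, y)
R_y_up = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 1.0, 0.0]])

short_person = world_to_ground(np.array([1.0, 2.0, 5.0]), R_y_up)
tall_person = world_to_ground(np.array([1.0, 3.5, 5.0]), R_y_up)  # same spot, higher box center
```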

Step S302: pairing the first mapping points obtained from the pedestrian regions in two pedestrian images, and judging, for each pair, whether the pedestrian regions containing the two corresponding pixel points were both captured within the common field of view of the two cameras.

Step S303: if not, assigning the two first mapping points to different pedestrians;

Step S304: if yes, calculating the degree of similarity between the pair of first mapping points, then proceeding to step S305;

Step S305: judging whether the degree of similarity, measured as a distance, is greater than a preset threshold;

Step S306: if yes, judging that the pair of first mapping points belongs to different pedestrians;

Step S307: if not, judging that the pair of first mapping points may belong to the same pedestrian, and performing a third deduplication.

For example, after the first mapping point coordinates (X'', Y'') of each pedestrian in the two-dimensional ground coordinate system are obtained, suppose that A and B are two cameras in the light field camera array, that a pedestrian in camera A maps to a point P1 with coordinates $(X''_1, Y''_1)$ in the two-dimensional ground coordinate system, and that a pedestrian in camera B maps to a point P2 with coordinates $(X''_2, Y''_2)$. P1 and P2 are paired, and a geometric deduplication method is used to judge whether they are the same, repeatedly observed pedestrian:

First, it is determined whether P1 and P2 both lie in the common field of view of the two cameras.

If not, i.e., the points lie in the cameras' respective exclusive fields of view, they cannot be duplicates and are counted directly. If both P1 and P2 lie in the common area, the next step calculates the degree of similarity between P1 and P2 using any one of the cosine distance, Euclidean distance, standardized Euclidean distance, Mahalanobis distance, Hamming distance, and Manhattan distance.

In this embodiment, the Euclidean distance is calculated as:

$$\mathrm{distance}(P_1, P_2) = \sqrt{\left(X''_1 - X''_2\right)^2 + \left(Y''_1 - Y''_2\right)^2}$$

Further, a similarity threshold α is set. If the distance between P1 and P2 is greater than α, they are considered two different persons, each camera having detected a person that the other did not, so both are counted in the first pedestrian number.

If the distance between the two points is less than α, they are likely repeated observations of the same person. However, because of detection errors, or because two different people may stand close together, their spatial coordinates may also be close, so a third deduplication is required.
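The decision logic of steps S302 to S307 can be sketched as follows. The rectangular ground-plane fields of view are a simplifying assumption for illustration (real fields of view are projected camera frusta), and the function name and return values are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # hypothetical ground-plane FOV: (xmin, ymin, xmax, ymax)

def inside(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def first_dedup_decision(p1: Point, p2: Point, fov_a: Rect, fov_b: Rect, alpha: float) -> str:
    """Return 'different' or 'candidate' (a suspected duplicate passed to the third deduplication)."""
    # S302/S303: both points must lie in the common field of view of cameras A and B
    in_common = all(inside(p, fov_a) and inside(p, fov_b) for p in (p1, p2))
    if not in_common:
        return "different"
    # S304-S306: Euclidean distance on the ground plane compared with alpha
    if math.dist(p1, p2) > alpha:
        return "different"
    # S307: close enough to be the same person; confirm with appearance features
    return "candidate"
```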

In an embodiment of the present invention, the optimization of the third deduplication may be performed by a pedestrian re-identification (ReID) method through feature extraction.

As shown in fig. 4, a third deduplication process in the embodiment is shown, which includes:

Step S401: for each pair of first mapping points suspected to belong to the same pedestrian, inputting the two pedestrian regions containing the corresponding pixel points into the two neural networks of a twin neural network model, respectively.

In one embodiment of the present invention, the twin neural network model, i.e., a Siamese neural network, has the structure shown in fig. 5. For P1 and P2 whose distance is less than α, the pedestrian regions containing the corresponding pixel points are input into the two networks of the Siamese network, respectively. The pedestrian to whom the pixel point corresponding to P1 belongs is denoted pedestrian 1, and the pedestrian to whom the pixel point corresponding to P2 belongs is denoted pedestrian 2; the pixel sets of pedestrian 1 and pedestrian 2 are the partial images of pedestrian region 1 and pedestrian region 2 framed in the pedestrian images by the target bounding boxes. As shown in fig. 5, the two framed pedestrian regions are input into the two networks of the Siamese model to determine whether pedestrians 1 and 2 in the two partial images are the same pedestrian.

Step S402: calculating the degree of similarity between the feature vectors output by the last fully connected layers of the two neural networks;

Step S403: judging whether the degree of similarity between the feature vectors is greater than a preset threshold;

Step S404: if yes, judging that the two pedestrian regions belong to the same pedestrian, and deduplicating them;

Step S405: if not, judging that the two pedestrian regions belong to different pedestrians.

For example, the final L2Norm layer of the Siamese model is used to compute a Euclidean distance loss function (Euclidean loss) during training. After training, to measure the similarity between the partial pedestrian images in two target bounding boxes, the L2Norm layer is simply removed and the 256-dimensional feature vector output by the model's last fully connected layer, FC7, is used directly.

For the partial images of two target bounding boxes that must be judged as containing the same person or not, 256-dimensional feature vectors are obtained with the above network structure and then compared. The distance between the two feature vectors can be calculated with the Euclidean distance (or any one of the cosine distance, standardized Euclidean distance, Mahalanobis distance, Hamming distance, and Manhattan distance) and used as the measure of similarity: the smaller the distance, the higher the similarity, and the more likely the persons in the two detection boxes are the same person. Specifically, a threshold β is used:

$$\mathrm{identification}(b_1, b_2) = \begin{cases} \text{true}, & \mathrm{distance}(v_1, v_2) < \beta \\ \text{false}, & \text{otherwise} \end{cases}$$

If identification(b1, b2) is true, the persons in the two target bounding boxes b1 and b2 are the same person; otherwise they are different persons. distance(v1, v2) is the Euclidean distance between the feature vectors v1 and v2 extracted from the two target bounding boxes b1 and b2: if it is smaller than β, the boxes are judged to contain the same person; otherwise they are considered to contain two different persons. The value of β is set according to practical performance.
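Assuming the 256-dimensional FC7 embeddings have already been extracted by the trained Siamese branches (the network itself is not reproduced here), the β test can be applied to every pair between two small sets of candidate boxes at once. A minimal NumPy sketch; the function name is hypothetical.

```python
import numpy as np

def same_person_pairs(feats_a: np.ndarray, feats_b: np.ndarray, beta: float):
    """Return (i, j) index pairs whose embeddings are closer than beta.

    feats_a: (M, 256) FC7 embeddings of candidate boxes from camera A
    feats_b: (N, 256) FC7 embeddings of candidate boxes from camera B
    """
    # Pairwise Euclidean distances via broadcasting: (M, 1, 256) - (1, N, 256)
    dist = np.linalg.norm(feats_a[:, None, :] - feats_b[None, :, :], axis=2)
    rows, cols = np.nonzero(dist < beta)  # identification(b_i, b_j) == true
    return [(int(i), int(j)) for i, j in zip(rows, cols)]
```

Each returned pair is treated as one person and counted once in the first pedestrian number.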

Meanwhile, to avoid missing pedestrians located outside the detected pedestrian regions of each pedestrian image, these remaining regions must also be counted.

Specifically, as shown in fig. 2, although most pedestrians are framed after SSD model detection, some pedestrians in the upper-right area, whose bodies are largely occluded, are not detected, so these areas need to be detected again.

Step S104: perform a second pedestrian detection on the regions of each pedestrian image other than the regions to which pedestrians belong, perform a second deduplication on the parts of the detection result that belong to the same pedestrian, and calculate the second pedestrian number from the detection result after the second deduplication.

Specifically, for areas in which no pedestrian is detected, regression can be used as compensation: the frame is input to a regression model, which may be a fully convolutional neural network that regresses the distribution of human heads.

Fig. 6A shows a pedestrian image 601, obtained after the regions to which pedestrians belong have been identified by target bounding boxes, being passed through a regression model 602 to obtain a human head distribution density map 603 (which may be generated using, for example, a Gaussian function as the convolution kernel). Bright parts of the density map 603 indicate a high probability that a head is present, while the background indicates a probability of 0. Integrating the whole map (i.e., summing its pixel values) yields the number of heads in the map, that is, the number of pedestrians.
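Why the integral of such a map equals the head count can be illustrated with a minimal sketch that builds a Gaussian-kernel density map directly from known head positions. This stands in for the output of regression model 602, which is not shown; the function names, the kernel size of 15 and σ = 3 are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Normalized 2-D Gaussian (sums to 1) that spreads one head over nearby pixels."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd so it can be centred on a pixel")
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def head_density_map(shape, heads, size=15, sigma=3.0):
    """Place one normalized Gaussian at each (row, col) head position.

    Heads closer than size // 2 pixels to the border lose the part of their
    kernel that falls outside the image, so they contribute slightly less than 1."""
    h, w = shape
    r = size // 2
    padded = np.zeros((h + 2 * r, w + 2 * r))
    kernel = gaussian_kernel(size, sigma)
    for row, col in heads:
        if not (0 <= row < h and 0 <= col < w):
            raise ValueError("head position outside the image")
        # In padded coordinates the head sits at (row + r, col + r),
        # so the kernel window starts at (row, col).
        padded[row:row + size, col:col + size] += kernel
    return padded[r:r + h, r:r + w]


density = head_density_map((100, 100), [(20, 30), (50, 50), (70, 80)])
print(round(float(density.sum()), 6))  # integral of the map = number of heads: 3.0
```

Normalizing each kernel to unit mass is what makes "sum of pixel values" and "number of heads" coincide; a trained regression network only approximates this property.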

Since the pedestrian image 601 also contains the regions to which the detected pedestrians belong, these regions must be excluded before counting. Specifically, as shown in fig. 6B, the part of the head distribution density map 603 corresponding to each pedestrian region can be masked out with a mask 604 matching the size of the target bounding box, and the remaining map then integrated. This yields the number of pedestrians in the pedestrian image 601 that are not already included in the first pedestrian number.
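The masking step can be sketched as follows, assuming the density map is a 2-D array and each target bounding box is given as (x1, y1, x2, y2) pixel coordinates with exclusive upper bounds. The box format and the function name are assumptions for illustration only.

```python
import numpy as np


def count_outside_boxes(density: np.ndarray, boxes) -> float:
    """Integrate a head density map after masking out detected pedestrian boxes.

    Each box is (x1, y1, x2, y2) with exclusive upper bounds (assumed format).
    Boxes are clipped at 0 so negative coordinates do not wrap around."""
    keep = np.ones(density.shape, dtype=bool)
    for x1, y1, x2, y2 in boxes:
        # Rows correspond to y and columns to x.
        keep[max(y1, 0):max(y2, 0), max(x1, 0):max(x2, 0)] = False
    return float(density[keep].sum())


density = np.zeros((10, 10))
density[2, 2] = 1.0  # head inside a detected box (already in the first count)
density[7, 7] = 1.0  # head the detector missed
print(count_outside_boxes(density, [(0, 0, 5, 5)]))  # 1.0
```

Overlapping boxes need no special handling, because a boolean mask simply stays False where several boxes cover the same pixel.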

However, because the images captured by the multiple cameras of the light field camera array may overlap, pedestrians located in the overlapping parts of these remaining regions may be counted more than once; a second deduplication is therefore performed.

Fig. 7 shows the specific process of the second deduplication in this embodiment, which includes:

Step S701: convert the two-dimensional image coordinates of the pixel points in the regions of each pedestrian image other than the pedestrian regions into three-dimensional world coordinates to obtain corresponding second mapping points; the second mapping points of each pedestrian image form a partial spatial coverage area of the camera that captured that image.

In this embodiment, the coordinate transformation is performed in the same way as in S301 of the foregoing embodiment: each pixel point in the regions that remain after the pedestrian regions are removed is mapped from the image coordinate system to the world coordinate system, forming a partial spatial coverage area. Points in images captured by different cameras may correspond to the same point in the world coordinate system, which is what allows duplicates to be removed.
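A single pixel only fixes a viewing ray, so mapping it to one world point needs an extra constraint. The sketch below assumes one common choice that the text does not specify: a calibrated pinhole camera (intrinsics K, rotation R, translation t with x_cam = R·X_world + t) and a flat ground plane Z = 0, onto which each pixel's ray is intersected. The function name and example calibration are hypothetical.

```python
import numpy as np


def pixel_to_ground(u, v, K, R, t):
    """Back-project pixel (u, v) onto the world plane Z = 0 (assumed ground plane).

    Returns the (X, Y) ground coordinates, or None if the viewing ray is
    parallel to the ground or meets it behind the camera."""
    d_cam = np.linalg.solve(K, np.array([u, v, 1.0]))  # viewing ray, camera frame
    center = -R.T @ t                                  # camera centre, world frame
    d_world = R.T @ d_cam                              # ray direction, world frame
    if abs(d_world[2]) < 1e-12:
        return None  # ray parallel to the ground plane
    s = -center[2] / d_world[2]
    if s <= 0:
        return None  # intersection lies behind the camera
    point = center + s * d_world
    return float(point[0]), float(point[1])


# Example camera 10 units above the origin, looking straight down.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]])
t = np.array([0.0, 0.0, 10.0])
print(pixel_to_ground(150, 50, K, R, t))  # approximately (10.0, 0.0)
```

Because pixels from different cameras that see the same floor location land on the same (X, Y), the resulting ground points can be compared across cameras, which is what the deduplication relies on.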

Step S702: eliminate the repeatedly covered areas among the partial spatial coverage areas of different cameras in the light field camera array using the inclusion-exclusion principle, so as to retain each camera's individual coverage area and obtain the deduplicated head distribution density images corresponding to those individual coverage areas; these deduplicated head distribution density images are integrated to calculate the second pedestrian number.

Specifically, points having the same three-dimensional world coordinates within the partial spatial coverage areas formed by the second mapping points of the multiple cameras are removed using the inclusion-exclusion principle.

For example, assume that the partial spatial coverage area of camera 1 is A, that of camera 2 is B, and that of camera 3 is C.

The overlapping regions among A, B and C are then eliminated according to the inclusion-exclusion principle:

S(A∪B∪C) = S(A) + S(B) + S(C) - S(A∩B) - S(B∩C) - S(C∩A) + S(A∩B∩C);

From this, the set of individually covered portions within each camera's partial spatial coverage area can be calculated.
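A minimal sketch of this step, assuming the ground is discretized into grid cells so that each camera's partial spatial coverage area is a set of cells. The first function evaluates the inclusion-exclusion formula for any number of cameras; the second splits the union into disjoint per-camera parts. The text does not say which camera keeps an overlapping cell, so the "lowest-indexed camera wins" rule is an assumption, and all names are hypothetical.

```python
from itertools import combinations


def union_size(regions):
    """|R1 ∪ ... ∪ Rn| computed by inclusion-exclusion over all non-empty subsets."""
    total = 0
    for k in range(1, len(regions) + 1):
        sign = 1 if k % 2 == 1 else -1  # add odd-sized intersections, subtract even-sized
        for combo in combinations(regions, k):
            total += sign * len(set.intersection(*combo))
    return total


def individual_parts(regions):
    """Split the union into disjoint per-camera parts.

    Assumed tie-break: a cell seen by several cameras is kept only for the
    lowest-indexed camera that sees it."""
    seen = set()
    parts = []
    for region in regions:
        parts.append(region - seen)
        seen |= region
    return parts


A = {(0, 0), (0, 1), (1, 0)}  # ground cells covered by camera 1
B = {(0, 1), (1, 1)}          # camera 2
C = {(1, 0), (1, 1), (2, 2)}  # camera 3
parts = individual_parts([A, B, C])
print(union_size([A, B, C]), sum(len(p) for p in parts))  # 5 5
```

The two numbers agree, which confirms that the disjoint parts cover every location exactly once. Note that explicit inclusion-exclusion enumerates 2^n - 1 subsets, so for large arrays the set partition alone is the cheaper route.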

The regressed second pedestrian number can then be calculated by integrating the partial density images corresponding to the individually covered portions of each camera in the light field camera array, i.e., by summing the probability density values at every point of each partial density image.
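The final summation can be sketched as below, assuming each camera's non-detected pixels have already been paired with their ground cell and density value, and that the cells kept for each camera after deduplication are known. Both data layouts and the function name are illustrative assumptions, not the patent's own representation.

```python
def second_pedestrian_count(pixel_samples, parts):
    """Sum density values over each camera's individually covered ground cells.

    pixel_samples[i]: list of (ground_cell, density_value) pairs for the
    non-detected pixels of camera i (assumed layout).
    parts[i]: set of ground cells kept for camera i after deduplication."""
    if len(pixel_samples) != len(parts):
        raise ValueError("need one set of kept cells per camera")
    total = 0.0
    for samples, kept_cells in zip(pixel_samples, parts):
        # Pixels whose ground cell was assigned to another camera are skipped,
        # so each ground location contributes only once.
        total += sum(value for cell, value in samples if cell in kept_cells)
    return total


samples = [
    [((0, 0), 0.5), ((0, 1), 0.3), ((0, 1), 0.1)],  # camera 1
    [((0, 1), 0.4), ((1, 1), 0.2)],                 # camera 2: (0, 1) overlaps camera 1
]
parts = [{(0, 0), (0, 1)}, {(1, 1)}]
print(round(second_pedestrian_count(samples, parts), 6))  # 1.1
```

Camera 2's density at cell (0, 1) is discarded because that cell is already accounted for by camera 1, which is exactly the double count the second deduplication is meant to prevent.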

Step S105: add the first pedestrian number and the second pedestrian number to obtain the total pedestrian number.

The first pedestrian number is the number of pedestrians in the pedestrian regions detected by the target detector in each pedestrian image, and the second pedestrian number is the number of pedestrians in the remaining regions; adding the two gives the total number of pedestrians in the pedestrian images captured by the light field camera array.

Fig. 8 shows a pedestrian counting system 800 provided in an embodiment of the present invention. Since the principle of the pedestrian counting system 800 is substantially the same as that of the foregoing method embodiment, the technical details of the method embodiment also apply to this embodiment, and repeated descriptions are omitted.

The system 800 includes:

a communication unit 801, configured to acquire a plurality of pedestrian images captured by a light field camera array, wherein the plurality of pedestrian images includes a pedestrian image captured in the field of view of each camera in the light field camera array;

a processing unit 802, configured to: identify, through the target detector, the region to which each pedestrian belongs in each pedestrian image; perform a first deduplication on the pedestrian regions in pedestrian images from different cameras that belong to the same pedestrian, and count the pedestrian regions in each pedestrian image after the first deduplication to obtain the first pedestrian number; perform a second pedestrian detection on the regions of each pedestrian image other than the pedestrian regions, perform a second deduplication on the parts of the detection result that belong to the same pedestrian, and calculate the second pedestrian number from the result after the second deduplication; and add the first pedestrian number and the second pedestrian number to obtain the total pedestrian number.

It should be noted that the division of the above apparatus into units is only a logical division; in an actual implementation, the units may be wholly or partially integrated into one physical entity, or may be physically separate. These units may all be implemented as software invoked by a processing element, may all be implemented in hardware, or some may be implemented as software invoked by a processing element and others in hardware. For example, the processing unit may be a separately provided processing element, may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus as program code that a processing element of the apparatus calls to execute the functions of the processing unit. The other units are implemented similarly. In addition, all or some of the units may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capabilities. In implementation, the steps of the above method or the above units may be implemented by integrated hardware logic circuits in a processor element or by instructions in the form of software.

For example, the above units may be one or more integrated circuits configured to implement the above method, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). As another example, when a unit is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As a further example, these units may be integrated together and implemented as a system-on-a-chip (SOC).

Fig. 9 shows a computer device 900 provided in an embodiment of the present invention. The computer device 900 includes a communicator 901, a processor 902 and a memory 903; the communicator 901 is communicatively connected to the light field camera array 904; the memory 903 stores a computer program; and the processor 902 runs the computer program to implement the pedestrian counting method of the foregoing embodiment.

Optionally, the communicator 901, the processor 902 and the memory 903 may be connected by a system bus, shown as a thick line in the figure. The system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the system bus is drawn as a single thick line, which does not mean that there is only one bus or one type of bus. The communication interface is used to implement communication between the database access device and other devices (such as a client, a read-write database and a read-only database).

The processor 902 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP) and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The memory 903 may include a Random Access Memory (RAM) and may further include a non-volatile memory, such as at least one disk memory.

Preferably, in this embodiment, the computer device 900 is connected to the light field camera array 904 through a communication network 905, which may be any suitable combination of one or more wired or wireless networks; for example, it may include any one or more of the Internet of Things, the Internet, an intranet, a Wide Area Network (WAN), a Local Area Network (LAN), a wireless network, a Digital Subscriber Line (DSL) network, a frame relay network, an Asynchronous Transfer Mode (ATM) network, a Virtual Private Network (VPN), and/or any other suitable communication network. The communicator 901 is then implemented as a communication circuit that complies with the corresponding network communication protocols.

The light field camera array 904 may integrate or be connected to a corresponding communicator, allowing it to access the communication network 905 to send and receive data.

In this embodiment the computer device 900 may be implemented as a server in a centralized network architecture; in other embodiments, it may be implemented as a distributed network device in a decentralized network architecture.

Specifically, a video stream can be captured by a light field camera array installed in a given scene and transmitted over the communication network to a server in a control center; the server runs the people-counting program and outputs the resulting count to a screen for use by the relevant personnel.

To achieve the above and other related objects, the present invention further provides a computer storage medium storing a computer program which, when run, implements the pedestrian counting method. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks or optical disks.

In summary, the pedestrian counting method, system, computer device and storage medium of the present invention acquire a plurality of pedestrian images captured by a light field camera array, wherein the plurality of pedestrian images includes a pedestrian image captured in the field of view of each camera in the array; identify, through a target detector, the region to which each pedestrian belongs in each pedestrian image; perform a first deduplication on pedestrian regions in images from different cameras that belong to the same pedestrian, and count the pedestrian regions after the first deduplication to obtain the first pedestrian number; perform a second pedestrian detection on the regions of each pedestrian image other than the pedestrian regions, perform a second deduplication on the parts of the detection result that belong to the same pedestrian, and calculate the second pedestrian number from the result after the second deduplication; and add the first and second pedestrian numbers to obtain the total pedestrian number. Building on the advantages of the light field camera array, such as multiple viewing angles and convenient computation, and combining them with a high-performance deep learning target detection method, the invention avoids missed detections and, together with deep-learning-based pedestrian deduplication, obtains a more accurate pedestrian count.

The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall be covered by the claims of the present invention.

Claims

1. A pedestrian counting method, characterized by comprising:

acquiring a plurality of pedestrian images captured by a light field camera array, wherein the plurality of pedestrian images includes a pedestrian image captured in the field of view of each camera in the light field camera array;

identifying, through a target detector, the region to which each pedestrian belongs in each pedestrian image;

converting the two-dimensional image coordinates of pixel points in the pedestrian regions of each pedestrian image into three-dimensional world coordinates; performing a first deduplication on the pedestrian regions in pedestrian images from different cameras that belong to the same pedestrian, and counting the pedestrian regions in each pedestrian image after the first deduplication to obtain a first pedestrian number;

converting the two-dimensional image coordinates of pixel points in the regions of each pedestrian image other than the pedestrian regions into three-dimensional world coordinates; performing a second pedestrian detection on the regions of each pedestrian image other than the pedestrian regions, performing a second deduplication on the parts of the detection result that belong to the same pedestrian, and calculating a second pedestrian number from the detection result after the second deduplication;

wherein performing the second pedestrian detection on the regions of each pedestrian image other than the pedestrian regions includes: for each pedestrian image, processing those other regions with a compensation detection model that regresses the human head distribution, to obtain a first head distribution density image;

converting the three-dimensional world coordinates into corresponding two-dimensional ground coordinates to obtain second mapping points, wherein the second mapping points of each pedestrian image form a partial spatial coverage area of the camera that captured that image; eliminating the repeatedly covered areas among the partial spatial coverage areas of different cameras in the light field camera array using the inclusion-exclusion principle, so as to retain each camera's individual coverage area and obtain deduplicated second head distribution density images corresponding to the individual coverage areas;

wherein the first and second head distribution density images are integrated to calculate the second pedestrian number;

and adding the first pedestrian number and the second pedestrian number to obtain a total pedestrian number.

2. The pedestrian counting method according to claim 1,

the target detector is implemented by a multilayer fully convolutional neural network model and generates, for each pedestrian in the pedestrian image, a target bounding box that frames the pedestrian as the region to which the pedestrian belongs;

the target detector is trained by transfer learning so that its target bounding boxes only frame and identify partial images classified as pedestrians;

and the transfer learning training consists of training, with a data set containing only pedestrian bounding boxes, a multilayer fully convolutional neural network model that has been trained on a multi-target detection data set.

3. The pedestrian counting method according to claim 2,

the specific method of the transfer learning training is: starting from a multilayer fully convolutional neural network model, the network structure is kept unchanged and only the multiple classes of the data set are reduced to a single class, so that the content of a target bounding box only needs to be judged as either a person or background; the classification loss function that improves classification accuracy is:

wherein,

4. The pedestrian counting method according to claim 1, wherein the first deduplication comprises:

converting the three-dimensional world coordinates into corresponding two-dimensional ground coordinates to obtain first mapping points;

pairing the first mapping points obtained from the pedestrian regions in two pedestrian images, and determining, for each pair of first mapping points, whether the pedestrian regions containing the corresponding pair of pixel points were captured within the common field of view of the two corresponding cameras;

if not, assigning the pair of first mapping points to different pedestrians;

if yes, calculating the degree of similarity between the pair of first mapping points;

determining whether the degree of similarity is greater than a preset threshold;

if yes, determining that the pair of first mapping points belong to different pedestrians;

if not, determining that the pair of first mapping points belong to the same pedestrian, and performing a third deduplication.

5. The pedestrian counting method of claim 4, wherein the third deduplication comprises:

inputting the pedestrian regions containing the pair of pixel points corresponding to each pair of first mapping points suspected to belong to the same pedestrian into the two neural networks of a twin (Siamese) neural network model, respectively;

calculating the degree of similarity between the feature vectors output by the last fully connected layers of the two neural networks;

determining whether the degree of similarity between the feature vectors is greater than a preset threshold;

if yes, determining that the two pedestrian regions belong to the same pedestrian, and removing the duplicate;

if not, determining that the two pedestrian regions belong to different pedestrians.

6. The pedestrian counting method according to claim 4 or 5, wherein the degree of similarity is calculated using any one of a cosine distance, Euclidean distance, normalized Euclidean distance, Mahalanobis distance, Hamming distance and Manhattan distance.

7. A pedestrian counting system, comprising:

the communication unit is configured to acquire a plurality of pedestrian images captured by the light field camera array, wherein the plurality of pedestrian images includes a pedestrian image captured in the field of view of each camera in the light field camera array;

the processing unit is configured to identify, through a target detector, the region to which each pedestrian belongs in each pedestrian image; convert the two-dimensional image coordinates of pixel points in the pedestrian regions of each pedestrian image into three-dimensional world coordinates; perform a first deduplication on the pedestrian regions in pedestrian images from different cameras that belong to the same pedestrian, and count the pedestrian regions in each pedestrian image after the first deduplication to obtain a first pedestrian number;

the processing unit is further configured to convert the two-dimensional image coordinates of pixel points in the regions of each pedestrian image other than the pedestrian regions into three-dimensional world coordinates; perform a second pedestrian detection on those other regions, perform a second deduplication on the parts of the detection result that belong to the same pedestrian, and calculate a second pedestrian number from the detection result after the second deduplication;

8. The pedestrian counting system of claim 7,

9. The pedestrian counting system of claim 8,

wherein,

10. The pedestrian counting system of claim 7, wherein the first deduplication comprises:

11. The pedestrian counting system of claim 10, wherein the third deduplication comprises:

12. The pedestrian counting system according to claim 10 or 11, wherein the degree of similarity is calculated using any one of a cosine distance, Euclidean distance, normalized Euclidean distance, Mahalanobis distance, Hamming distance and Manhattan distance.

13. A computer device, characterized in that the computer device comprises: a communicator, a processor, and a memory;

the communicator is communicatively connected to the light field camera array;

the memory stores a computer program;

the processor is configured to run the computer program to implement the pedestrian counting method according to any one of claims 1 to 6.

14. The computer device of claim 13, wherein the computer device is connected to the light field camera array via a communications network.

15. A computer storage medium, characterized in that a computer program is stored therein; when executed, the computer program implements the pedestrian counting method according to any one of claims 1 to 6.