CN113947108B - Football player tracking detection method based on YOLO V5 - Google Patents

Football player tracking detection method based on YOLO V5

Info

Publication number
CN113947108B
CN113947108B CN202111201325.2A CN113947108B
Authority
CN
China
Prior art keywords
yolo
target
frame
algorithm
athlete
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111201325.2A
Other languages
Chinese (zh)
Other versions
CN113947108A (en)
Inventor
陈国栋
陈文铿
黄立萱
林鸿强
严铮
方莉
赵志峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202111201325.2A priority Critical patent/CN113947108B/en
Publication of CN113947108A publication Critical patent/CN113947108A/en
Application granted granted Critical
Publication of CN113947108B publication Critical patent/CN113947108B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30221Sports video; Sports image
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a football player tracking and detection method based on YOLO V5. The parameters of the target prior boxes are determined with a K-means clustering method, a data-augmented data set is used for training, the YOLO V5 and DeepSort algorithms are combined with prediction heads integrated into YOLO V5, targets are accurately located and tracked in high-density scenes, and the scene is captured in real time above the court by an unmanned aerial vehicle. The invention achieves higher model accuracy and recognizes both isolated objects and overlapping, occluded objects with high accuracy. It records an athlete's motion trail and details over a whole game, supporting better data analysis of the athlete and the athlete's team (for example, in which areas the athlete scores most) and better post-game review, and helps coaches and athletes conduct targeted guidance and training as well as targeted defensive deployment against opposing players. For users, it helps them quickly find sports video programs or segments of interest.

Description

Football player tracking detection method based on YOLO V5
Technical Field
The invention relates to a football player tracking and detecting method based on YOLO V5.
Background
With the development of basketball, the level of play worldwide has gradually improved, and training methods for basketball players need a more scientific basis as guidance. Identifying a player's playing style and habits and improving key techniques, such as shooting posture, dribbling style and passing, in a targeted way is of great help in raising skill levels. At present, games are recorded on video. A recording captures the overall course of a whole game well, but a player's detailed movements are hard to capture, and players and coaches can hardly find the corresponding details when reviewing the recording after the game. Moreover, the camera sits at a fixed viewing angle during recording, cannot cover all viewpoints, often leaves blind spots in certain places, and does not record player details in sufficient depth. Tracking detection of specific players is therefore needed to meet the requirements of players and coaches, so player motion tracking with new technologies such as unmanned aerial vehicles and computers is of great significance.
Disclosure of Invention
The invention aims to provide a football player tracking detection method based on YOLO V5 that meets the current on-court target recognition and tracking needs of coaches and players. It integrates the YOLO V5 and DeepSort algorithms, integrates 4 prediction heads to detect targets at different scales, and is finally applied to unmanned aerial vehicle shooting to accurately recognize, track and record the motion trail of football players.
In order to achieve the above purpose, the technical scheme of the invention is as follows: a YOLO V5-based player tracking detection method that determines the parameters of the target prior boxes with a K-means clustering method and a residual network model, trains on a data-augmented data set, fuses the DeepSort algorithm on top of YOLO V5 target recognition, integrates prediction heads, and applies the result to unmanned aerial vehicle shooting.
In one embodiment of the present invention, the method is implemented as follows. First, the target boxes of the data set are clustered with a K-means clustering algorithm to obtain a YOLO V5 model with prior boxes for the target persons. A residual network model serves as the deep learning framework; the residual block structure and its skip connection mechanism improve detection and classification accuracy and optimize the loss function of the system. The corresponding mechanism is as follows:
H(x)=F(x)+x
H(x) is the predicted value, x is the input feature, and F(x) is the residual. Every three layers, a skip connection carries the original input feature x across the block.
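As a sketch, the residual mapping H(x) = F(x) + x can be written as follows. This is a minimal NumPy stand-in: plain linear layers with ReLU take the place of the three convolution layers spanned by the skip connection, and all names are illustrative, not the patent's implementation.

```python
import numpy as np

def residual_block(x, weights):
    """Minimal residual block: H(x) = F(x) + x.

    `weights` holds three matrices standing in for the three
    layers spanned by the skip connection.
    """
    h = x
    for W in weights:
        h = np.maximum(W @ h, 0.0)  # linear layer + ReLU as a stand-in for a conv
    return h + x  # the skip connection adds the original input feature x

# with all-zero weights F(x) = 0, so the block reduces to the identity
x = np.ones(4)
out = residual_block(x, [np.zeros((4, 4))] * 3)
```

This identity-under-zero-residual behavior is exactly why residual blocks ease optimization: the network only has to learn the correction F(x) on top of x.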
Before a player takes the field, detailed photos of the player are divided into several groups as input targets. Features are first extracted from one group of input targets using one group of feature maps, and then features are extracted from the output targets together with another group of input targets by another group of feature maps. This process is repeated until all input targets are processed. Finally, all output targets are spliced together and feature fusion is carried out through convolution kernels of 3x3, 21x21 and 48x48, which can also act as filters. The high-level and low-level feature maps have different receptive fields: the high-level feature map is responsible for detecting large targets such as jersey numbers and the torso, while the low-level feature map is responsible for detecting detailed parts such as hands and feet. Shooting is done by unmanned aerial vehicle with a CNN-based CenterNet object detector, which comprises two parts: a CNN-based backbone for image feature extraction and a detection head that predicts the class and box of each target. Because of the particular nature of basketball, athletes move vigorously and target motion is intense, so four prediction heads, each consisting of a predefined number of feedforward networks, are added to detect targets at different scales. The output of each prediction head contains a class prediction and a prediction box, and the loss is computed as a bipartite matching loss.
The 4-head structure relieves the negative effects of severe target scale changes and accurately locates targets in high-density motion scenes.
Finally, to handle lost and overlapping targets, the method fuses the DeepSort algorithm. The idea of DeepSort is to feed the IoU (intersection over union) between the detection boxes produced by a target detection algorithm (such as YOLO) and the predicted tracking boxes into the Hungarian algorithm for linear assignment, associating IDs across frames, and to add the target's appearance information to the inter-frame matching so that an ID can still be matched correctly when a target is occluded but later reappears, reducing ID switches and achieving continuous tracking. Before tracking, detection has been completed for all targets and a feature modeling process is in place. When the first frame comes in, a new tracker is initialized with each detected target and labeled with an id; for each following frame, the state prediction and covariance prediction generated from the preceding frame's box are obtained from the Kalman filter. The IoU between the detected boxes and all predicted target states is computed, and the distance between a detection box dj and a tracking box yi is measured with the Mahalanobis distance (the Euclidean distance is not used because dj and yi have different spatial distributions, and a Euclidean calculation that ignores this distribution cannot accurately reflect their real distance):
d(i, j) = (dj - yi)^T Si^-1 (dj - yi)
where Si is the covariance of the track's predicted state.
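Assuming the standard DeepSort formulation, the Mahalanobis gate between a detection dj and a track's predicted state yi can be sketched as below. The default threshold 9.4877 is the 95% chi-square quantile for four degrees of freedom used in the DeepSort paper; the function and variable names are illustrative.

```python
import numpy as np

def mahalanobis_gate(dj, yi, S, threshold=9.4877):
    """d(i, j) = (dj - yi)^T S^-1 (dj - yi); admit the pair when d <= threshold."""
    diff = np.asarray(dj, float) - np.asarray(yi, float)
    d = float(diff @ np.linalg.inv(S) @ diff)
    return d, d <= threshold

# identical detection and prediction: distance 0, association admitted
d_near, ok_near = mahalanobis_gate([1.0, 2.0], [1.0, 2.0], np.eye(2))
# distant detection: large distance, association rejected
d_far, ok_far = mahalanobis_gate([10.0, 0.0], [0.0, 0.0], np.eye(2))
```

With an identity covariance the Mahalanobis distance reduces to squared Euclidean distance; a non-identity S from the Kalman filter is what lets the gate account for the different spatial distributions mentioned above.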
The largest unique IoU match (the data association step) is obtained through the Hungarian assignment algorithm, matching pairs whose match value is smaller than iou_threshold are removed, and two boxes are considered associated when the distance between them is at most a specific threshold t:
b(i, j) = 1 if d(i, j) <= t, else 0
The Kalman tracker is updated with the matched detection box of the current frame: the Kalman gain, state update and covariance update are computed, and the state update value is output as the tracking box of the frame. Trackers are reinitialized for targets not matched in the current frame. The Kalman tracker combines the historical tracking record and adjusts the residual between the historical box and the current frame's box to better match the tracking id.
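The predict/update cycle described above can be sketched with a generic linear Kalman filter. This is the textbook formulation, not the patent's exact parameterization; the constant-velocity model and all matrix values below are illustrative.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One Kalman predict/update cycle for a track's state."""
    # predict: propagate state and covariance with the motion model F
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update: fold in the matched detection z
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# constant-velocity model: state [position, velocity], observe position only
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
x, P = np.array([0.0, 1.0]), np.eye(2)
Q, R = 1e-4 * np.eye(2), np.array([[1e-6]])
x, P = kalman_step(x, P, np.array([1.0]), F, H, Q, R)
```

Here the prediction already agrees with the measurement (position moves from 0 to 1 at unit velocity), so the update leaves the state essentially unchanged; a mismatched detection would be blended in proportionally to the Kalman gain.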
Compared with the prior art, the invention has the following beneficial effects: the new YOLO V5-based target detection algorithm fuses the DeepSort algorithm and prediction heads and runs on an unmanned aerial vehicle, and is therefore named the DP-YOLO V5 algorithm. Extensive experiments on the VisDrone2021 dataset show that DP-YOLO V5 performs well and interpretably on scenes captured by unmanned aerial vehicle. The AP of DP-YOLO V5 on the DET-test-challenge dataset is 39.18%, an improvement of 1.81% over the previous SOTA method (DPNetV3). On VisDrone Challenge 2021, DP-YOLO V5 improves by about 7% compared with YOLO V5.
Drawings
Fig. 1 is a schematic diagram of the network structure of YOLO V5.
Fig. 2 is a workflow diagram of an embodiment of the present invention.
Fig. 3 is an extraction of feature points.
Fig. 4 is a specific recognition result.
Detailed Description
The technical scheme of the invention is specifically described below with reference to the accompanying drawings.
According to the YOLO V5-based player tracking detection method, the parameters of the target prior boxes are determined and a data-augmented data set is trained using a K-means clustering method and a residual network model; the DeepSort algorithm is fused on top of target recognition with the YOLO V5 algorithm (the network structure of YOLO V5 is shown in fig. 1); prediction heads are integrated; and the method is applied to unmanned aerial vehicle shooting, so that targets are accurately located in high-density scenes and finally tracked and recorded continuously.
The following is a specific implementation procedure of the present invention.
The embodiment provides a player motion tracking detection method based on the DP-YOLO V5 algorithm, comprising a K-means dimension clustering algorithm that clusters and optimizes the target boxes of the data set, the YOLO V5 target recognition and detection algorithm, and the DeepSort algorithm.
The following describes the specific implementation steps in conjunction with the workflow diagram of fig. 2:
Step 1: collect the experimental data set. The experimental data set mainly consists of dynamic and static photos of various players, plus photos of different players wearing different jerseys in different periods. To improve the detection effect, the existing data set is expanded: part of the pictures undergo random rotation, random translation, random deformation, random scaling, mirror flipping and similar processing.
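One of the augmentations above, the mirror flip, can be sketched on a NumPy image array; flipping the pixels also requires remapping the box coordinates. The array shapes and names are illustrative, not the patent's pipeline.

```python
import numpy as np

def mirror_flip(image, boxes, width):
    """Horizontally flip an image and its [x1, y1, x2, y2] boxes."""
    flipped = image[:, ::-1]  # reverse the column (x) axis
    # a box's left edge becomes width - x2, its right edge width - x1
    flipped_boxes = [[width - x2, y1, width - x1, y2]
                     for x1, y1, x2, y2 in boxes]
    return flipped, flipped_boxes

img = np.arange(12).reshape(3, 4)  # toy 3x4 "image"
out, bxs = mirror_flip(img, [[0, 0, 2, 3]], width=4)
```

Random rotation, translation and scaling follow the same pattern: transform the pixels, then apply the same geometric transform to the annotation boxes so labels stay aligned.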
Step 2: obtain suitable prior box parameters through the K-means clustering algorithm. The k value is found by comparing how the loss curve converges for different values of k; in this experiment k is set to 6. The images are converted to grayscale and denoised with Gaussian filtering.
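The prior-box clustering of step 2 can be sketched with a minimal Lloyd's k-means over (width, height) pairs. This is a toy implementation with a deterministic initialization and Euclidean distance; anchor-clustering pipelines often use an IoU-based distance instead, and the experiment above uses k=6 rather than the k=2 of this example.

```python
import numpy as np

def kmeans_boxes(wh, k, iters=20):
    """Cluster (w, h) pairs; returns k prior-box sizes."""
    wh = np.asarray(wh, float)
    # spread the initial centers across the boxes sorted by area (deterministic)
    order = np.argsort(wh[:, 0] * wh[:, 1])
    centers = wh[order[np.linspace(0, len(wh) - 1, k).astype(int)]]
    for _ in range(iters):
        # assign each box to its nearest center
        labels = np.argmin(((wh[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned boxes
        for j in range(k):
            if np.any(labels == j):
                centers[j] = wh[labels == j].mean(axis=0)
    return centers

small = [[10, 10], [12, 11], [11, 12]]
large = [[100, 100], [98, 103], [102, 99]]
priors = kmeans_boxes(small + large, k=2)
```

The resulting cluster centers become the prior (anchor) boxes fed to the detector, so anchors match the actual size distribution of players in the data set.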
Step 3: train the YOLO V5 model in a residual network. The expanded data set is used for training. During training, different convolution kernels yield feature maps at different scales: the high-level feature map is responsible for detecting large targets such as the jersey number and torso, and the low-level feature map is responsible for detecting detailed parts such as hands and feet. Part of the data is held out as a test set. The graphics are then spliced together for feature point comparison. Fig. 3 shows the extraction of feature points.
Step 4: integrate 4 prediction heads to detect targets at different scales. The prediction heads consist of a predefined number of feedforward networks and detect the positions and classes of targets from the extracted feature maps, completing the feature modeling.
Step 5: fuse the DeepSort algorithm. The IoU (intersection over union) between the detection boxes obtained by the YOLO V5 target detection algorithm and the predicted tracking boxes is fed into the Hungarian algorithm for linear assignment, so as to initialize the detected targets and create new trackers.
Step 6: mount a high-definition camera and the CNN-based CenterNet object detector on the unmanned aerial vehicle. If a detected target is far from the unmanned aerial vehicle, it is tracked using the vehicle's mobility and the camera's zoom function. Fig. 4 shows a specific recognition result.
Step 7: after the camera finishes recording, the required data are analyzed by the corresponding data analysis system for coaches and players to review after the competition.
The above is a preferred embodiment of the present invention, and all changes made according to the technical solution of the present invention belong to the protection scope of the present invention when the generated functional effects do not exceed the scope of the technical solution of the present invention.

Claims (1)

1. A football player tracking detection method based on YOLO V5, characterized in that a K-means clustering method and a residual network model are used to determine the parameters of the target prior boxes and to train a data-augmented data set; the DeepSort algorithm is fused on top of target recognition with the YOLO V5 algorithm; prediction heads are integrated; and the method is applied to unmanned aerial vehicle shooting, finally accurately locating targets in high-density scenes and tracking and recording them continuously;
The method is specifically implemented as follows: first, the target boxes of the data set are clustered by a K-means clustering algorithm to obtain prior boxes for the target persons in a YOLO V5 model, a residual network model is taken as the deep learning framework, and the loss function of the system is optimized using the residual block structure and skip connection mechanism; then, the data set is divided into several groups and features are extracted through the groups of feature maps, and after convolution kernels of different sizes, the high-level feature map is responsible for detecting large objects including the jersey number and torso while the low-level feature map is responsible for detecting detail parts including hands and feet, so that several objects are detected simultaneously; furthermore, based on the mobility of athletes, 4 prediction heads are integrated, whose 4-head structure relieves the negative effects of severe target scale changes and accurately locates targets in high-density motion scenes; next, the DeepSort algorithm is integrated, and the IoU (intersection over union) between the detection boxes obtained by the YOLO V5 algorithm and the predicted tracking boxes is fed into the Hungarian algorithm for linear assignment to associate IDs across frames, with the target's appearance information added to the inter-frame matching, so that the corresponding ID is still matched correctly when a target is occluded but later reappears, ID switches are reduced and continuous tracking is achieved; finally, the method is applied to unmanned aerial vehicle shooting and, together with the mobility of the unmanned aerial vehicle, completes tracking and recording.
CN202111201325.2A 2021-10-15 2021-10-15 Football player tracking detection method based on YOLO V5 Active CN113947108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111201325.2A CN113947108B (en) 2021-10-15 2021-10-15 Football player tracking detection method based on YOLO V5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111201325.2A CN113947108B (en) 2021-10-15 2021-10-15 Football player tracking detection method based on YOLO V5

Publications (2)

Publication Number Publication Date
CN113947108A CN113947108A (en) 2022-01-18
CN113947108B true CN113947108B (en) 2024-07-02

Family

ID=79330013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111201325.2A Active CN113947108B (en) 2021-10-15 2021-10-15 Football player tracking detection method based on YOLO V5

Country Status (1)

Country Link
CN (1) CN113947108B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663769B (en) * 2022-04-07 2023-04-18 杭州电子科技大学 Fruit identification method based on YOLO v5
CN115861904A (en) * 2023-02-23 2023-03-28 青岛创新奇智科技集团股份有限公司 Method and system for generating slag car roof fall detection model
CN116758259B (en) * 2023-04-26 2024-09-03 中国公路工程咨询集团有限公司 Highway asset information identification method and system
CN117876416B (en) * 2024-03-12 2024-06-04 浙江芯昇电子技术有限公司 Multi-target tracking method, device, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269073A (en) * 2021-05-19 2021-08-17 青岛科技大学 Ship multi-target tracking method based on YOLO V5 algorithm

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10592786B2 (en) * 2017-08-14 2020-03-17 Huawei Technologies Co., Ltd. Generating labeled data for deep object tracking
CN107741231B (en) * 2017-10-11 2020-11-27 福州大学 Multi-moving-target rapid ranging method based on machine vision
CN110472467A (en) * 2019-04-08 2019-11-19 江西理工大学 The detection method for transport hub critical object based on YOLO v3
CN112101433B (en) * 2020-09-04 2024-04-30 东南大学 Automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT
CN113160283B (en) * 2021-03-23 2024-04-16 河海大学 Target tracking method under multi-camera scene based on SIFT

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269073A (en) * 2021-05-19 2021-08-17 青岛科技大学 Ship multi-target tracking method based on YOLO V5 algorithm

Also Published As

Publication number Publication date
CN113947108A (en) 2022-01-18

Similar Documents

Publication Publication Date Title
CN113947108B (en) Football player tracking detection method based on YOLO V5
Huang et al. Tracknet: A deep learning network for tracking high-speed and tiny objects in sports applications
CN110264493B (en) Method and device for tracking multiple target objects in motion state
Sun et al. Tracknetv2: Efficient shuttlecock tracking network
CN112819852A (en) Evaluating gesture-based motion
Liu et al. MonoTrack: Shuttle trajectory reconstruction from monocular badminton video
Lu et al. Identification and tracking of players in sport videos
Naik et al. DeepPlayer-track: player and referee tracking with jersey color recognition in soccer
Sha et al. Understanding and analyzing a large collection of archived swimming videos
CN114926859A (en) Pedestrian multi-target tracking method in dense scene combined with head tracking
CN106127766B (en) Method for tracking target based on Space Coupling relationship and historical models
Liu et al. Correlation filter with motion detection for robust tracking of shape-deformed targets
Ramasinghe et al. Recognition of badminton strokes using dense trajectories
Arbués-Sangüesa et al. Single-camera basketball tracker through pose and semantic feature fusion
Lee et al. A study on sports player tracking based on video using deep learning
Saraogi et al. Event recognition in broadcast soccer videos
CN114463664B (en) Novel ice ball tracking method for ice ball movement
Liu et al. Detecting and matching related objects with one proposal multiple predictions
Kurowski et al. Accurate ball tracking in volleyball actions to support referees
Xia et al. Advanced Volleyball Stats for All Levels: Automatic Setting Tactic Detection and Classification with a Single Camera
Ferreira et al. Video analysis in indoor soccer using a quadcopter
Krishna et al. Vision-based Assessment of Instructional Content on Golf Performance
Wang et al. A multi-information fusion correlation filters tracker
Li Tactical analysis of table tennis video skills based on image fuzzy edge recognition algorithm
Ruiz-del-Solar et al. An automated refereeing and analysis tool for the Four-Legged League

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant