CN108229352B - Standing detection method based on deep learning - Google Patents

Standing detection method based on deep learning

Info

Publication number
CN108229352B
CN108229352B (application CN201711397963.XA)
Authority
CN
China
Prior art keywords
standing
tracklet
frame
detection model
standing detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201711397963.XA
Other languages
Chinese (zh)
Other versions
CN108229352A (en)
Inventor
邵奔驰 (Shao Benchi)
姜飞 (Jiang Fei)
申瑞民 (Shen Ruimin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201711397963.XA
Publication of CN108229352A
Application granted
Publication of CN108229352B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/14 Picture signal circuitry for video frequency region
    • H04N5/144 Movement detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a standing detection method based on deep learning, comprising the following steps: 1) collecting samples, each comprising a sample picture and a corresponding annotation file; 2) establishing a standing detection model, which is based on a convolutional neural network structure and trained on the samples with the R-FCN object detection algorithm, and which comprises a senior-grade standing detection model and a junior-grade standing detection model; 3) performing standing detection on the video to be detected with the trained standing detection model. Compared with the prior art, the invention has the advantages of high recall and precision and suitability for complex classroom environments.

Description

Standing detection method based on deep learning
Technical Field
The invention relates to an image processing technology, in particular to a standing detection method based on deep learning.
Background
In recording-and-broadcasting classrooms and similar indoor monitoring scenarios, a system is needed that can automatically detect standing behavior, so that the overall atmosphere and the participation level of the participants can be assessed. However, the characteristics of standing behavior are closely related to individual height: heights differ among individuals within the same scene, and the height distribution also differs across scenes, so detecting standing behavior in a traditional classroom environment remains a difficult task.
An existing student tracking and positioning method based on master-slave cameras uses two slave cameras and one master camera mounted on a pan-tilt unit. The slave cameras generate a region of interest automatically or manually, use background subtraction to detect whether a student enters or leaves the region, and send the detection result to the master camera. The master camera determines the number of standing students from the information transmitted by the slave cameras and accordingly selects a panoramic recording mode or a positioning recording mode; in positioning mode, it detects the contours of all moving objects with an inter-frame difference method and judges the object with the highest contour center point to be the standing student. The flowchart is shown in fig. 1. Although this method achieves a certain positioning accuracy, it has the following defects:
1. The cameras are mounted on both sides of the blackboard at a height level with the top of a seated student's head, so they sit almost directly in the students' line of sight and easily create psychological pressure. At this height a student may also touch a camera, intentionally or not, biasing the results.
2. Both master and slave cameras are required to complete the standing detection.
3. The method is less effective for junior-grade pupils, because their standing height differs little from their sitting height.
Disclosure of Invention
The invention provides a standing detection method based on deep learning to overcome the defects in the prior art.
One of the purposes of the invention is to realize standing detection by only one camera.
The second purpose of the invention is to improve the detection effect on the students in the lower grades.
A third purpose of the invention is to link the same standing behavior across different frames so that it is not counted more than once.
The purpose of the invention can be realized by the following technical scheme:
a method for standing detection based on deep learning, the method comprising:
1) collecting samples, wherein each sample comprises a sample picture and a corresponding annotation file;
2) establishing a standing detection model, which is based on a convolutional neural network structure and trained on the samples with the R-FCN object detection algorithm, and which comprises a senior-grade standing detection model and a junior-grade standing detection model;
3) performing standing detection on the video to be detected with the trained standing detection model.
The annotation file records the type of the standing person.
The standing person types include senior students, junior students, and teachers.
The establishment of the standing detection model specifically comprises the following steps:
201) training a basic standing model with all samples;
202) further training (fine-tuning) the basic standing model separately with samples labeled as senior students and samples labeled as junior students, obtaining a senior-grade standing detection model and a junior-grade standing detection model.
The method further comprises the steps of:
4) tracking standing behavior according to the standing detection results of the previous frames and the current standing detection result.
The tracking specifically comprises:
401) acquiring the first image frame and the coordinates of the detected standing boxes, creating a tracklet for each standing box and initializing its state to ALIVE;
402) acquiring the next image frame and judging whether a shot change has occurred; if so, changing the states of all tracklets to DEAD, creating new tracklets, and returning to step 402); if not, executing step 403);
403) traversing all standing boxes detected in the current image frame and selecting the best-matching tracklet for each standing box with a tracking algorithm;
404) judging, for each tracklet left unmatched in the current image frame, whether its state is ALIVE; if so, changing it to WAIT, otherwise changing it to DEAD; then returning to step 402) until all image frames are processed.
The judgment of whether a shot change has occurred specifically comprises:
acquiring two adjacent image frames and judging, for each pixel, whether the difference in gray value between the two frames is greater than a first threshold; if so, that pixel is judged to have changed;
judging whether the proportion of changed pixels to total pixels is greater than a second threshold; if so, a change of camera view is judged to have occurred, otherwise not.
The selection of the best-matching tracklet is specifically:
selecting the tracklet closest to the standing box and calculating the sum of the width difference and the height difference between the tracklet's bounding box and the standing box; when this sum is less than one third of the standing box's width and the overlap ratio between the tracklet's bounding box and the standing box is greater than 0.3, the standing box is judged to best match that tracklet.
The method further comprises the steps of:
5) counting the number of standing events obtained by tracking.
Compared with the prior art, the invention has the following beneficial effects:
1) The invention labels the collected samples, subdividing standing behavior into teacher and student standing and into junior and senior grades, which effectively improves the reliability of the model.
2) The invention adopts a deep-learning-based object detection algorithm trained on a large amount of data (e.g., twenty thousand samples) extracted from real classroom videos, and obtains good detection results in complex classroom environments.
3) The invention trains on the senior-grade and junior-grade samples separately to obtain a senior-grade standing detection model and a junior-grade standing detection model, effectively addressing the large difference between senior-grade and junior-grade standing. Tests show that the recall reaches 90% for both senior and junior grades, with a precision of 80%.
4) The invention uses a tracking algorithm to track standing actions and can link the same standing action across different frames, thereby obtaining the true number of standing events and providing a basis for further analysis and evaluation.
Drawings
FIG. 1 is a schematic flowchart of a conventional student tracking and positioning method;
FIG. 2 is a schematic flowchart of the present invention;
FIG. 3 is a schematic diagram of junior-grade standing behavior;
FIG. 4 is a schematic diagram of senior-grade standing behavior;
FIG. 5 is a schematic diagram of the standing-action tracking process of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
As shown in fig. 2, the present invention provides a standing detection method based on deep learning, which includes the following steps:
1) collecting a sample
The samples were prepared in the format of the PASCAL VOC dataset, which provides a standardized, high-quality dataset layout for image recognition and classification, using the labeling tool LabelImg. The method mainly focuses on students standing up, but a teacher walking around the classroom is also judged as standing; standing behavior in the classroom is therefore subdivided into student standing and teacher standing, and the two are labeled separately when the samples are made.
In this embodiment, 20,000 samples are used. Each sample comprises a sample picture and a corresponding annotation file: frames containing standing behavior are cut from the videos and stored in the JPEGImages folder, and the corresponding annotation files are stored in the Annotations folder. The standing person types include senior students, junior students, and teachers. Grades 1 to 3 are classified as junior grades, and grade 4 and above as senior grades.
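Since the annotation files follow the PASCAL VOC XML layout, the standing-person type can be read back with a few lines of standard-library code. The sketch below is a minimal illustration rather than the authors' tooling, and the class-name strings are hypothetical, since the patent does not give the exact label names used.

```python
import xml.etree.ElementTree as ET

# Hypothetical label strings -- the patent does not specify the exact names.
STANDING_CLASSES = {"senior_student", "junior_student", "teacher"}

def read_standing_boxes(annotation_path):
    """Parse one PASCAL VOC annotation file, returning (label, box) pairs."""
    tree = ET.parse(annotation_path)
    boxes = []
    for obj in tree.getroot().iter("object"):
        label = obj.find("name").text
        if label not in STANDING_CLASSES:
            continue
        bb = obj.find("bndbox")
        box = tuple(int(bb.find(tag).text)
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, box))
    return boxes

# Images live in JPEGImages/, annotations in Annotations/, per the VOC layout.
print(read_standing_boxes("Annotations/frame_000123.xml"))
```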
2) Establishing standing detection model
As shown in FIGS. 3 and 4, which illustrate the standing behavior of junior-grade and senior-grade students respectively, the standing and sitting postures of junior-grade students differ little, while those of senior-grade students differ greatly. If the two cases were combined in one model, detecting essentially all junior-grade standing would add many false detections on senior-grade videos. The present invention therefore separates the two cases and handles them independently.
The establishment of the standing detection model specifically comprises the following steps:
201) training a base standing model with an R-FCN detector based on ResNet-101, using all senior-grade and junior-grade samples together;
202) further training the base standing model separately with senior-student samples and junior-student samples to obtain a senior-grade standing detection model and a junior-grade standing detection model. In this embodiment, fine-tuning is used to derive the senior-grade model and the junior-grade model from the base model.
The framework used for training is Caffe, the number of iterations during training is 30,000, and the training (solver) parameters are as follows:
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 10000
display: 20
momentum: 0.9
weight_decay: 0.0005
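The two-stage scheme (a base model trained on all samples, then grade-specific fine-tuning) maps naturally onto Caffe's Python solver interface. The following sketch is only a plausible reconstruction under that assumption: the prototxt and caffemodel file names are hypothetical, and the patent does not disclose the actual R-FCN training scripts.

```python
import caffe

caffe.set_mode_gpu()

# Stage 1: train the base standing model on all samples; the solver prototxt
# carries the parameters listed above (base_lr 0.001, step policy, 30000 iters).
base_solver = caffe.SGDSolver("solver_base.prototxt")   # hypothetical file
base_solver.solve()
base_solver.net.save("standing_base.caffemodel")        # hypothetical file

# Stage 2: fine-tune separately on senior-grade and junior-grade samples,
# starting each run from the base model's weights.
for grade in ("senior", "junior"):
    solver = caffe.SGDSolver("solver_%s.prototxt" % grade)  # hypothetical
    solver.net.copy_from("standing_base.caffemodel")
    solver.solve()
    solver.net.save("standing_%s.caffemodel" % grade)
```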
3) Performing standing detection on the video to be detected with the trained standing detection model.
In certain embodiments, the method further comprises the steps of:
4) and tracking the standing according to the standing detection result of the previous frame and the current standing detection result.
A shot-change judgment is needed during standing tracking. This is because, in surveillance video covering the whole classroom, the camera pans, zooms in, and zooms out, so the position of the same standing person can change greatly between frames, preventing effective matching.
The algorithm for judging whether a shot change has occurred is as follows: convert the current frame and the previous frame to grayscale images; if the difference in gray value at a pixel between the two frames is greater than a first threshold thres0, that pixel is judged to have changed; when the proportion of changed pixels to total pixels exceeds a second threshold thres1, a shot change is judged to have occurred; otherwise no shot change is judged. In this embodiment, thres0 is 20 and thres1 is 0.2 (20%).
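For illustration, the shot-change test can be written in a few lines of OpenCV. This is a minimal sketch using the thresholds of this embodiment; the function and constant names are illustrative.

```python
import cv2
import numpy as np

THRES0 = 20    # per-pixel gray-value difference threshold (thres0)
THRES1 = 0.2   # changed-pixel ratio threshold (thres1, 20%)

def shot_changed(prev_frame, cur_frame):
    """Return True if a shot (camera view) change is detected between frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, cur_gray)
    changed_ratio = np.count_nonzero(diff > THRES0) / float(diff.size)
    return changed_ratio > THRES1
```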
As shown in fig. 5, the specific process of standing tracking is as follows:
401) Acquiring the first image frame and the coordinates of the detected standing boxes; a tracklet is created for each standing box and its state is initialized to ALIVE. The tracklet is used to record the tracking information.
402) Acquiring the next image frame and judging whether a shot change has occurred; if so, changing the states of all tracklets to DEAD, creating new tracklets, and returning to step 402); if not, executing step 403).
403) Traversing all standing boxes detected in the current image frame and selecting the best-matching tracklet for each standing box with the tracking algorithm.
The specific process of selecting the best-matching tracklet is as follows:
First, find the tracklet nearest to each standing box and calculate the sum of the width difference and the height difference between the standing box and the nearest tracklet's bounding box. When this sum is less than one third of the standing box's width and the overlap ratio between the tracklet's bounding box and the standing box is greater than 0.3, the standing box is judged to match the current tracklet; otherwise the standing box has no matching tracklet. If the nearest tracklet has already been matched, the next-nearest tracklet is taken for the standing box and the matching test is repeated.
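A compact sketch of this matching rule follows. It assumes the overlap ratio means intersection-over-union, which the patent does not define precisely, and that each tracklet carries its latest bounding box and a matched flag; both are assumptions, not statements of the patent.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def center(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def best_match(stand_box, tracklets):
    """Return the best-matching tracklet for a standing box, or None."""
    width = stand_box[2] - stand_box[0]
    height = stand_box[3] - stand_box[1]
    cx, cy = center(stand_box)
    # Try tracklets from nearest to farthest: when the nearest one is already
    # taken, the patent re-tests the next-nearest, which this loop reproduces.
    for t in sorted(tracklets,
                    key=lambda t: (center(t.box)[0] - cx) ** 2
                                  + (center(t.box)[1] - cy) ** 2):
        if t.matched:
            continue
        dw = abs((t.box[2] - t.box[0]) - width)
        dh = abs((t.box[3] - t.box[1]) - height)
        if dw + dh < width / 3.0 and iou(t.box, stand_box) > 0.3:
            return t
    return None
```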
404) Judging, for each tracklet left unmatched in the current image frame, whether its state is ALIVE; if so, changing it to WAIT, since the tracked target has disappeared for one frame and the standing behavior may simply have gone undetected in that frame; otherwise changing it to DEAD, marking the end of that standing behavior's track. Returning to step 402) until all image frames are processed.
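Combining steps 401) to 404), the per-frame update can be sketched as a small state machine; the Tracklet class, the state constants, and the reuse of best_match from the previous sketch are all illustrative assumptions.

```python
ALIVE, WAIT, DEAD = "ALIVE", "WAIT", "DEAD"

class Tracklet:
    def __init__(self, box):
        self.box = box
        self.state = ALIVE
        self.matched = False

def update_tracklets(tracklets, stand_boxes, changed):
    """One iteration of steps 402)-404) for the current image frame."""
    if changed:  # shot change: end every track and start fresh (step 402)
        for t in tracklets:
            t.state = DEAD
        return [Tracklet(box) for box in stand_boxes]

    live = [t for t in tracklets if t.state != DEAD]
    for t in live:
        t.matched = False
    for box in stand_boxes:                       # step 403)
        t = best_match(box, live)
        if t is not None:
            t.box, t.state, t.matched = box, ALIVE, True
        else:
            tracklets.append(Tracklet(box))       # a newly appearing standing
    for t in live:                                # step 404)
        if not t.matched:
            t.state = WAIT if t.state == ALIVE else DEAD
    return tracklets
```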
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (6)

1. A standing detection method based on deep learning is characterized by comprising the following steps:
1) collecting samples, wherein each sample comprises a sample picture and a corresponding annotation file, the annotation file records the standing person type, the standing person types comprise senior students, junior students, and teachers, and grades 1 to 3 are classified as junior grades while grade 4 and above are senior grades;
2) establishing a standing detection model, which is based on a convolutional neural network structure and trained on the samples with the R-FCN object detection algorithm, and which comprises a senior-grade standing detection model and a junior-grade standing detection model;
3) performing standing detection on the video to be detected with the trained standing detection model;
the establishment of the standing detection model specifically comprises the following steps:
201) training a basic standing model with all samples;
202) further training (fine-tuning) the basic standing model separately with samples labeled as senior students and samples labeled as junior students, obtaining a senior-grade standing detection model and a junior-grade standing detection model.
2. The deep learning based standing detection method according to claim 1, further comprising the steps of:
4) tracking standing behavior according to the standing detection results of the previous frames and the current standing detection result.
3. The method for detecting standing based on deep learning of claim 2, wherein the tracking is specifically:
401) acquiring the first image frame and the coordinates of the detected standing boxes, creating a tracklet for each standing box and initializing its state to ALIVE;
402) acquiring the next image frame and judging whether a shot change has occurred; if so, changing the states of all tracklets to DEAD, creating new tracklets, and returning to step 402); if not, executing step 403);
403) traversing all standing boxes detected in the current image frame and selecting the best-matching tracklet for each standing box with a tracking algorithm;
404) judging, for each tracklet left unmatched in the current image frame, whether its state is ALIVE; if so, changing it to WAIT, otherwise changing it to DEAD; then returning to step 402) until all image frames are processed.
4. The deep learning based standing detection method according to claim 3, wherein judging whether a shot change has occurred specifically comprises:
acquiring two adjacent image frames and judging, for each pixel, whether the difference in gray value between the two frames is greater than a first threshold; if so, that pixel is judged to have changed;
judging whether the proportion of changed pixels to total pixels is greater than a second threshold; if so, a change of camera view is judged to have occurred, otherwise not.
5. The deep learning based standing detection method according to claim 3, wherein selecting the best-matching tracklet is specifically:
selecting the tracklet closest to the standing box and calculating the sum of the width difference and the height difference between the tracklet's bounding box and the standing box; when this sum is less than one third of the standing box's width and the overlap ratio between the tracklet's bounding box and the standing box is greater than 0.3, the standing box is judged to best match that tracklet.
6. The deep learning based standing detection method according to claim 2, further comprising the steps of:
5) counting the number of standing events obtained by tracking.
CN201711397963.XA 2017-12-21 2017-12-21 Standing detection method based on deep learning Expired - Fee Related CN108229352B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711397963.XA CN108229352B (en) 2017-12-21 2017-12-21 Standing detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711397963.XA CN108229352B (en) 2017-12-21 2017-12-21 Standing detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN108229352A CN108229352A (en) 2018-06-29
CN108229352B true CN108229352B (en) 2021-09-07

Family

ID=62648359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711397963.XA Expired - Fee Related CN108229352B (en) 2017-12-21 2017-12-21 Standing detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN108229352B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215344B (en) * 2018-09-27 2021-06-18 中电科大数据研究院有限公司 Method and system for urban road short-time traffic flow prediction
CN109472226B (en) * 2018-10-29 2021-07-09 上海交通大学 Sleeping behavior detection method based on deep learning
CN110266984B (en) * 2019-07-01 2020-12-18 浙江大学 Intelligent analysis teaching recorded broadcast all-in-one is made a video recording to cloud platform
CN111310591A (en) * 2020-01-20 2020-06-19 复旦大学 Multi-type sample data making device and method
CN111243363B (en) * 2020-03-27 2021-11-09 上海松鼠课堂人工智能科技有限公司 Multimedia sensory teaching system
CN112686154B (en) * 2020-12-29 2023-03-07 杭州晨安科技股份有限公司 Student standing detection method based on head detection and picture sequence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803913A (en) * 2017-03-10 2017-06-06 武汉东信同邦信息技术有限公司 A kind of detection method and its device of the action that taken the floor for Auto-Sensing student
CN106931968A (en) * 2017-03-27 2017-07-07 广东小天才科技有限公司 Method and device for monitoring classroom performance of students
CN106941602A (en) * 2017-03-07 2017-07-11 中国铁道科学研究院 Trainman's Activity recognition method, apparatus and system
CN107146177A (en) * 2017-04-21 2017-09-08 阔地教育科技有限公司 A kind of tutoring system and method based on artificial intelligence technology

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9384396B2 (en) * 2014-09-29 2016-07-05 Xerox Corporation System and method for detecting settle down time using computer vision techniques
US20160104385A1 (en) * 2014-10-08 2016-04-14 Maqsood Alam Behavior recognition and analysis device and methods employed thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106941602A (en) * 2017-03-07 2017-07-11 中国铁道科学研究院 Trainman's Activity recognition method, apparatus and system
CN106803913A (en) * 2017-03-10 2017-06-06 武汉东信同邦信息技术有限公司 A kind of detection method and its device of the action that taken the floor for Auto-Sensing student
CN106931968A (en) * 2017-03-27 2017-07-07 广东小天才科技有限公司 Method and device for monitoring classroom performance of students
CN107146177A (en) * 2017-04-21 2017-09-08 阔地教育科技有限公司 A kind of tutoring system and method based on artificial intelligence technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
R-FCN: Object Detection via Region-based Fully Convolutional Networks; Jifeng Dai et al.; 30th Conference on Neural Information Processing Systems; 2016-12-31; whole document *
A human behavior recognition method based on convolutional neural network deep learning (一种基于卷积神经网络深度学习的人体行为识别方法); Wang Zhongmin (王忠民) et al.; Computer Science (计算机科学); 2016-11-30; Vol. 43, No. 11A; whole document *

Also Published As

Publication number Publication date
CN108229352A (en) 2018-06-29

Similar Documents

Publication Publication Date Title
CN108229352B (en) Standing detection method based on deep learning
CN111507283B (en) Student behavior identification method and system based on classroom scene
ZA202300610B (en) System and method for crop monitoring
CN106845357A (en) A kind of video human face detection and recognition methods based on multichannel network
CN105354548A (en) Surveillance video pedestrian re-recognition method based on ImageNet retrieval
CN106022345B (en) A kind of high voltage isolator state identification method based on Hough forest
CN107292318B (en) Image significance object detection method based on center dark channel prior information
CN105869085A (en) Transcript inputting system and method for processing images
CN109711377B (en) Method for positioning and counting examinees in single-frame image monitored by standardized examination room
CN109376637A (en) Passenger number statistical system based on video monitoring image processing
CN105741375A (en) Large-visual-field binocular vision infrared imagery checking method
CN106339657B (en) Crop straw burning monitoring method based on monitor video, device
CN110163567A (en) Classroom roll calling system based on multitask concatenated convolutional neural network
CN108921038A (en) A kind of classroom based on deep learning face recognition technology is quickly called the roll method of registering
CN105930798A (en) Tongue image quick detection and segmentation method based on learning and oriented to handset application
CN103065163B (en) A kind of fast target based on static images detects recognition system and method
US20220148292A1 (en) Method for glass detection in real scenes
CN115719516A (en) Multichannel-based classroom teaching behavior identification method and system
CN113705510A (en) Target identification tracking method, device, equipment and storage medium
Yang SCB-dataset: A dataset for detecting student classroom behavior
CN103150552A (en) Driving training management method based on people counting
CN109117771A (en) Incident of violence detection system and method in a kind of image based on anchor node
CN116597438A (en) Improved fruit identification method and system based on Yolov5
CN115810163A (en) Teaching assessment method and system based on AI classroom behavior recognition
CN114627553A (en) Method for detecting classroom scene student behaviors based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210907