CN107194559B - Workflow identification method based on three-dimensional convolutional neural network - Google Patents
Workflow identification method based on three-dimensional convolutional neural network
- Publication number: CN107194559B (application CN201710335309.XA)
- Authority: CN (China)
- Prior art keywords: frame, workflow, neural network, video, frames
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0633—Workflow analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/215—Motion-based segmentation
Abstract
The invention discloses a workflow identification method based on a three-dimensional convolutional neural network. Existing approaches divide the different process tasks in advance and manually label the different action behaviors when analyzing the video, which does not meet the automation requirement of intelligent manufacturing. The invention first proposes an inter-frame difference method with an adaptive threshold, which is mainly used to segment the region of the moving object from a complex background, thereby reducing the time complexity of subsequent feature extraction and model training. Second, the 3D convolutional neural network is improved so that it fully adapts to a factory environment with multiple monitoring devices: views taken from different angles are fused according to weights by a view pooling layer. Finally, a new action division method is proposed that automatically divides the continuous production actions in the video, so that an automated workflow identification process is realized.
Description
Technical Field
The invention belongs to the technical field of workflow identification, and is used for quickly and accurately identifying and detecting a production and manufacturing process.
Background
Intelligent manufacturing is the next stage of manufacturing automation. Artificial intelligence techniques are widely applied to links of the industrial manufacturing process such as engineering design, process design, production scheduling and fault diagnosis, making the manufacturing process intelligent and greatly improving productivity. As an important technical direction of intelligent manufacturing, workflow recognition has attracted attention from both industry and the research community. Cameras installed in a manufacturing workshop record the whole production-scheduling process on a production line; the video is then processed so that the industrial production flow can be identified and inspected quickly and accurately, which plays an important role in protecting the personal safety of staff, reducing production overhead, ensuring product quality, and optimizing production scheduling and flow specification.
However, workflow identification has its own complexity and specificity. First, a production workshop contains many machines, transport vehicles, auxiliary devices and other objects that frequently occlude one another, and the similarity between different process operations together with frequent light-intensity changes in the workshop makes video and image analysis challenging. Furthermore, the dynamic nature of the production workflow makes the recognition process complex and error-prone: different tasks in a workflow tend to have different execution times, there is no explicit boundary between the end of one task and the start of the next, tasks may involve both human and machine actions, and actions unrelated to the workflow must be distinguished from the actual production tasks. These factors make conventional action/pose recognition methods that rely on target detection and tracking hard to adapt to complex factory manufacturing environments. In addition, although some researchers have carried out work on workflow identification, how to automatically divide the production processes/actions of the image sequence in a video has not been clearly defined; most existing work divides the different process tasks in advance and manually labels the different action behaviors when analyzing the video, which obviously does not meet the automation requirement of intelligent manufacturing.
Disclosure of Invention
In view of the current state of research, the invention provides a workflow identification framework with stronger robustness. In this framework, an inter-frame difference method with an adaptive threshold is first proposed, mainly to segment the region of the moving object from a complex background and thereby reduce the time complexity of subsequent feature extraction and model training. Second, the 3D convolutional neural network is improved so that it fully adapts to a factory environment with multiple monitoring devices: views taken from different angles are fused according to weights by a view pooling layer. Finally, a new action division method is proposed that automatically divides the continuous production actions in the video, realizing an automated workflow identification process.
The method comprises the following specific steps:
step (1), exporting a workflow video containing multiple visual angles from a data set, and acquiring the video resolution and the frame number of the workflow video at each visual angle;
step (2), initializing an inter-frame difference threshold for the workflow video of each visual angle; respectively carrying out steps (3) to (11) on the workflow video of each visual angle;
step (3), setting t to be 2;
step (4), reading three continuous video frames t-1, t and t+1, and carrying out graying and median filtering processing on the three video frames;
step (5), performing inter-frame difference operation on the previous two frames and on the next two frames respectively to obtain two inter-frame difference images;
step (6), dynamically updating an interframe difference threshold according to the two interframe difference images obtained in the step (5); the method for dynamically updating the interframe difference threshold comprises the following steps:
6.1 setting l = 1, the inter-frame difference threshold of frame t is computed from the difference image, where d_k is the pixel value of the k-th pixel in the inter-frame difference image, max{d_k} is the maximum pixel value in the inter-frame difference image, and min{d_k} is the minimum pixel value in the inter-frame difference image;
step (7), performing binarization processing on the current frame according to the inter-frame difference threshold obtained in step (6), wherein pixel points larger than the inter-frame difference threshold are set to 1 and pixel points smaller than the inter-frame difference threshold are set to 0;
step (8), performing an AND operation on the two inter-frame difference images (previous and next) to obtain a three-frame difference image, and acquiring the center coordinates of the interest points by using a blob extraction method;
step (9), segmenting the extracted interest points from the original image of the current frame;
step (10), gradually adding 1 to the value of t, and repeatedly executing the steps (4) to (9) until the value of t is 1 less than the value of the last frame of the workflow video, wherein the segmentation size of the step (9) is unchanged in the repeated process; storing the interest point images obtained in the step (9) in each repeated process as interest point videos according to the sequence, and classifying the interest point videos according to the classification rules in the data set;
step (11), randomly selecting 90% of the interest point videos obtained in the step (10) as a training set, and taking the rest as a test set;
step (12), constructing a multi-view three-dimensional convolution neural network, and initializing the number of training rounds to be 5000; the multi-view three-dimensional convolution neural network construction method comprises the following steps:
12.1 convolution and pooling operations are as follows:
initializing a four-dimensional convolution kernel with the size of 9 × 10 for the first convolution layer, wherein an activation function is sigmoid, the window size of the first pooling layer is 2, and the step length is 2;
initializing a four-dimensional convolution kernel with the size of 9 × 7 × 30 for the second convolution layer, wherein the activation function is sigmoid, the window size of the second pooling layer is 2, and the step length is 2;
initializing a four-dimensional convolution kernel with the size of 9 × 8 × 5 × 50 for the third convolution layer, wherein the activation function is sigmoid, the window size of the third pooling layer is 2, and the step length is 2;
initializing a four-dimensional convolution kernel with the size of 4 x 3 x 150 for the fourth convolution layer, wherein the activation function is sigmoid, the window size of the fourth pooling layer is 2, and the step length is 2;
12.2 initializing each feature-map weight parameter w_t1 in the weighted-average view pooling layer to a random value in [0, 1]; the weighted-average view pooling operation in the weighted-average view pooling layer is
a = Σ_t1 ( exp(w_t1) / Σ_t1 exp(w_t1) ) · f_t1
wherein a is the weighted-average feature map after the weighted-average view pooling operation, t1 is the sequence number of a pooled feature map after the convolution and pooling operations, w_t1 is the weight corresponding to sequence number t1, exp denotes the exponential function with base e, and f_t1 is the pooled feature map corresponding to sequence number t1;
12.3, respectively initializing convolution kernels of sizes 3000 × 1500 and 1500 × 750 for the first two fully-connected layers, and setting the activation function to ReLU; the weighted-average feature map after the weighted-average view pooling operation is input into the first two fully-connected layers;
12.4 initialize a 750 × 14 convolution kernel for the last fully connected layer and set the Softmax classification function.
Step (13), randomly selecting 20 videos from a training set corresponding to the workflow videos of each visual angle, inputting the 20 videos into the multi-view three-dimensional convolution neural network in the step (12) for feature training, and outputting training errors;
step (14), randomly selecting 10 videos from a training set corresponding to the workflow videos of each visual angle, inputting the 10 videos into a multi-view three-dimensional convolutional neural network for verification, and obtaining the accuracy of classification and identification of the multi-view three-dimensional convolutional neural network;
step (15), repeating steps (13) to (14), subtracting 1 from the number of training rounds each time, until the number of training rounds is 0, to obtain a trained multi-view three-dimensional convolutional neural network (a sketch of this training schedule is given after step (22) below);
step (16), testing the multi-view three-dimensional convolution neural network in the step (15) by using a test set corresponding to the workflow video of each visual angle;
step (17), acquiring the resolution and the frame number of the newly input workflow video, and initializing an interframe difference threshold; setting t to be 2;
step (18), extracting the center coordinates of the interest points of two adjacent frames according to steps (4) to (8) and calculating the distance between the two center coordinates; if the distance is greater than a set threshold T, the frame is marked as motion state S1, otherwise it is marked as relatively static state S0;
step (19), incrementing t by 1 and repeating step (18) until t is 1 less than the number of the last frame of the newly input workflow video, while counting the lengths of the consecutive runs of S0 and S1; when a run of S0 or S1 of length greater than or equal to N is detected, the target interest points in the frames of that run are segmented and stored into a frame queue, otherwise the frames of that run are discarded;
step (20), for each run of consecutive S0 or S1 frames in the frame queue, extracting consecutive key frames starting from the i-th frame of the run, where i > 5, so that the number of key frames is the same as the number of frames of each classified video segment in the data set;
Step (21), inputting the videos formed by the key frames in the step (20) according to the sequence into the multi-view three-dimensional convolution neural network trained in the step (15) to classify and recognize the staff behaviors;
step (22), comparing the behavior categories obtained in step (21) with a predefined standard workflow.
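A minimal sketch of the training schedule in steps (13)-(16) is given below. The model wrapper MultiViewC3D and its fit_batch/evaluate methods are hypothetical helper names introduced only for illustration; they are not defined by the patent.

```python
import random

def train_multiview_c3d(model, train_set, test_set, rounds=5000):
    """Training schedule sketch: 'rounds' counts down as in steps (13)-(15)."""
    for r in range(rounds):
        # Step (13): 20 randomly chosen training videos, one feature-training pass.
        train_error = model.fit_batch(random.sample(train_set, 20))   # assumed API
        # Step (14): 10 randomly chosen videos used as a validation check.
        val_acc = model.evaluate(random.sample(train_set, 10))        # assumed API
        print(f"round {r}: train_error={train_error:.4f}, val_acc={val_acc:.3f}")
    # Step (16): evaluate the trained network on the held-out 10% test split.
    return model.evaluate(test_set)
```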
The invention has the following beneficial effects:
the workflow identification method based on the three-dimensional convolution neural network mainly comprises the following functional modules: the device comprises a moving object segmentation module, a behavior identification module and an action division module.
The moving target segmentation module segments the target interest points from the image/video sequence. Because the target motion in a workflow video sequence is relatively large while the background is essentially static, the preceding and following frames can be subtracted to obtain an inter-frame difference image, and the moving target can then be segmented according to the relation between the pixel difference and the threshold. The adaptive three-frame difference method used here performs an AND operation on the inter-frame difference images obtained from the first two and the last two of three video frames to obtain a three-frame difference image, and the threshold is adjusted automatically according to the previous inter-frame difference image, which effectively suppresses the influence of noise;
The behavior recognition module performs behavior recognition on the moving target using a 3D convolutional neural network with multi-view learning capability. To achieve multi-view fusion, a view-pooling layer is used to fuse the information of all views. The multi-view 3D-CNN contains multiple independent 3D-CNNs that extract features from the image sequences of the different views; the feature descriptors extracted from the different views are then fused in the view pooling layer, where view-related features are learned; finally, a fully connected neural network (FNN) with a softmax classifier performs the final recognition;
The action division module defines two states: a motion state and a relatively static state. The center coordinate of the interest point is taken in every frame; when the interest point moves, its center coordinate moves as well. The distance between the interest-point centers of two adjacent frames therefore indicates the state of the current interest point, and dynamic/static division is achieved in this way;
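As a concrete illustration of the dynamic/static test described above, the sketch below classifies one frame from the distance between the interest-point centers of adjacent frames; the threshold value passed as T is application-dependent and not fixed by the patent.

```python
import math

def frame_state(center_prev, center_curr, T):
    """Return 'S1' (motion) if the interest-point center moved more than T pixels
    between two adjacent frames, otherwise 'S0' (relatively static)."""
    dx = center_curr[0] - center_prev[0]
    dy = center_curr[1] - center_prev[1]
    return "S1" if math.hypot(dx, dy) > T else "S0"
```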
The workflow identification method provided by the invention effectively addresses two problems of workflow identification in complex environments: first, the mutual occlusion of machines, transport vehicles, auxiliary instruments and other objects in a production workshop, the similarity between different process operations, and the influence of frequent light-intensity changes in the workshop on workflow identification; second, how to automatically divide the production processes/actions of the image sequence in a video.
Drawings
FIG. 1 is a schematic diagram of a multi-view three-dimensional convolutional neural network construction;
FIG. 2 is a schematic diagram of the division of the workflow.
Detailed Description
The invention is further illustrated by the following figures and examples.
First, concept definition and symbol description are performed:
Inter-frame difference threshold: t denotes the current frame number, l ≥ 1 denotes the recursion order, d_k is the pixel value of the k-th pixel in the inter-frame difference image, max{d_k} is the maximum pixel value in the inter-frame difference image, and min{d_k} is the minimum pixel value in the inter-frame difference image;
a: the weighted-average feature map after the weighted-average view pooling operation.
t1: sequence number of pooled feature map after convolution and pooling operations.
Secondly, the workflow identification method based on the three-dimensional convolution neural network comprises the following implementation steps:
(1) Moving object segmentation: video monitoring equipment on a production line is usually mounted high above the scene, so most of the area in the monitoring picture is factory background that is irrelevant to workflow identification; extracting feature vectors directly from the whole picture would greatly increase the difficulty of feature extraction and the computation time. Therefore, a three-frame difference method with an adaptive threshold is used to segment the moving object (interest point) from the video, reducing the workload of the later steps (a code sketch is given after sub-step (1.9) below). Specifically, the method comprises the following steps:
(1.1) exporting the multi-view workflow video from the data set, and acquiring the video resolution and the frame number of the workflow video at each view;
(1.2) initializing an inter-frame difference threshold for the workflow video of each view; setting t = 2; steps (1.3)-(1.9) are performed for the workflow video of each view respectively;
(1.3) reading a video frame t and two adjacent frames t-1 and t +1 thereof, and carrying out graying and median filtering processing on the three video frames;
(1.4) performing interframe difference operation on the first two frames and the second two frames respectively to obtain two interframe difference images;
(1.5) dynamically updating the interframe difference threshold according to the two interframe difference images obtained in the step (1.4), wherein the updating method comprises the following steps:
(1.5.1) setting l = 1, the inter-frame difference threshold of frame t is computed from the difference image, where d_k is the pixel value of the k-th pixel in the inter-frame difference image, max{d_k} is the maximum pixel value in the inter-frame difference image, and min{d_k} is the minimum pixel value in the inter-frame difference image;
(1.6) carrying out binarization processing on the current frame (namely the middle frame) according to the inter-frame differential threshold obtained in the step (1.5), wherein pixel points larger than the inter-frame differential threshold are set as 1, and pixel points smaller than the inter-frame differential threshold are set as 0;
(1.7) performing an AND operation on the two inter-frame difference images (previous and next) to obtain a three-frame difference image, and acquiring the center coordinates of the interest points by using a blob extraction method;
(1.8) segmenting the extracted interest points from the original image of the current frame;
(1.9) gradually adding 1 to the t value, and repeatedly executing the steps (1.3) - (1.8) until the t value is 1 less than the value of the last frame of the workflow video, wherein in the repeated process, the segmentation size in the step (1.8) is unchanged; storing the interest point images obtained in the step (1.8) in each repeated process as interest point videos according to the sequence, and classifying the interest point videos according to the classification rules in the data set;
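A minimal Python/OpenCV sketch of sub-steps (1.3)-(1.8) follows. The threshold rule used here (midpoint of the difference-image extrema) is an assumption standing in for the patent's adaptive-threshold formula, and the fixed crop size is illustrative only.

```python
import cv2
import numpy as np

def segment_interest_point(prev, curr, nxt, crop=64):
    """Adaptive-threshold three-frame differencing for one frame triple (t-1, t, t+1)."""
    gray = [cv2.medianBlur(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY), 3) for f in (prev, curr, nxt)]
    d1 = cv2.absdiff(gray[1], gray[0])           # difference of the first two frames
    d2 = cv2.absdiff(gray[2], gray[1])           # difference of the last two frames
    # Assumed adaptive threshold: midpoint between the extrema of the difference images.
    thr = 0.5 * (float(max(d1.max(), d2.max())) + float(min(d1.min(), d2.min())))
    b1 = (d1 > thr).astype(np.uint8)
    b2 = (d2 > thr).astype(np.uint8)
    three_frame = cv2.bitwise_and(b1, b2)        # AND of the two binary difference maps
    # Blob extraction: take the largest connected component as the interest point.
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(three_frame)
    if n < 2:
        return None, None
    biggest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    cx, cy = centroids[biggest]
    x0, y0 = int(cx) - crop // 2, int(cy) - crop // 2
    patch = curr[max(y0, 0):y0 + crop, max(x0, 0):x0 + crop]  # crop from the original frame
    return (cx, cy), patch
```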
(2) Behavior recognition based on a multi-view three-dimensional convolutional neural network: a survey of current manufacturing production lines shows that the same working scene is usually monitored synchronously and in real time by multiple cameras from different angles, in order to guarantee product quality and the safety of staff. Exploiting this characteristic, the multi-view feature extraction and fusion method effectively reduces the influence of the complex factory environment on behavior recognition and improves recognition accuracy. The specific execution steps are as follows (a network sketch is given after sub-step (2.6) below):
(2.1) selecting 90% of the interest point videos obtained in the step (1) as a training set, and taking the rest as a test set;
(2.2) constructing a multi-view three-dimensional convolutional neural network (see FIG. 1). The initial number of training rounds is 5000, and the construction method is as follows:
the operation processes of convolution and pooling are (2.2.1) - (2.2.4):
(2.2.1) initializing a four-dimensional convolution kernel of size 9 x 10 for the first convolution layer, the activation function being sigmoid, the first pooling layer window size being 2, and the step size being 2;
(2.2.2) initializing a four-dimensional convolution kernel of size 9 x 7 x 30 for the second convolutional layer, with an activation function of sigmoid, a second pooling layer window size of 2, and a step size of 2;
(2.2.3) initializing a four-dimensional convolution kernel of size 9 x 8 x 5 x 50 for the third convolutional layer, with an activation function of sigmoid, a third pooling layer window size of 2, and a step size of 2;
(2.2.4) initializing a four-dimensional convolution kernel with a size of 4 x 3 x 150 for the fourth convolution layer, with an activation function of sigmoid, a fourth pooling layer window size of 2, and a step size of 2;
(2.2.5) initializing each feature-map weight parameter w_t1 in the weighted-average view pooling layer to a random value in [0, 1]; the weighted-average view pooling layer (WAVP) computes a = Σ_t1 ( exp(w_t1) / Σ_t1 exp(w_t1) ) · f_t1, where f_t1 is the pooled feature map with sequence number t1 and w_t1 its weight;
(2.2.6) initializing convolution kernels of sizes 3000 × 1500 and 1500 × 750 for the first two fully connected layers respectively, and setting the activation function to ReLU; the weighted-average feature map after the weighted-average view pooling operation is input into the first two fully connected layers;
(2.2.7) initializing a 750 × 14 convolution kernel for the last fully-connected layer and setting the Softmax classification function, where 14 is the number of action classes.
(2.3) randomly selecting 20 videos from the training set of the workflow videos of all the visual angles, inputting the 20 videos into the multi-view three-dimensional convolution neural network in the step (2.2) for feature training, and outputting training errors;
(2.4) randomly selecting 10 videos from the training set of the workflow videos of each visual angle, inputting the 10 videos into the multi-view three-dimensional convolutional neural network for verification, and obtaining the accuracy of classification and identification of the multi-view three-dimensional convolutional neural network;
(2.5) repeating the steps (2.3) - (2.4), and subtracting 1 from the number of training rounds each time until the number of training rounds is 0 to obtain a trained multi-view three-dimensional convolutional neural network;
(2.6) testing the multi-view three-dimensional convolution neural network in the (2.5) by using a test set corresponding to the workflow video of each visual angle;
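The following PyTorch sketch illustrates the multi-view arrangement of step (2.2): one 3D-CNN branch per camera view, a softmax-weighted view pooling, and the fully connected classifier. It is a simplification under stated assumptions: the kernel sizes and channel widths are reduced, the fusion is applied to flattened branch features rather than to the pooled feature maps, and the first fully connected layer is sized lazily instead of using the 3000 × 1500 matrix listed above.

```python
import torch
import torch.nn as nn

class C3DBranch(nn.Module):
    """Simplified single-view 3D-CNN feature extractor (stand-in for sub-steps 2.2.1-2.2.4)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 10, kernel_size=3, padding=1), nn.Sigmoid(), nn.MaxPool3d(2, 2),
            nn.Conv3d(10, 30, kernel_size=3, padding=1), nn.Sigmoid(), nn.MaxPool3d(2, 2),
            nn.Conv3d(30, 50, kernel_size=3, padding=1), nn.Sigmoid(), nn.MaxPool3d(2, 2),
        )

    def forward(self, x):                      # x: (batch, 1, frames, height, width)
        return self.features(x).flatten(1)

class MultiViewC3D(nn.Module):
    """Multi-view 3D-CNN with a weighted-average view pooling (WAVP) layer."""
    def __init__(self, n_views=3, n_classes=14):
        super().__init__()
        self.branches = nn.ModuleList(C3DBranch() for _ in range(n_views))
        self.view_weights = nn.Parameter(torch.rand(n_views))    # w_t1, random in [0, 1]
        self.classifier = nn.Sequential(
            nn.LazyLinear(1500), nn.ReLU(),
            nn.Linear(1500, 750), nn.ReLU(),
            nn.Linear(750, n_classes),                            # softmax applied in the loss
        )

    def forward(self, views):                  # views: one clip tensor per camera view
        feats = torch.stack([b(v) for b, v in zip(self.branches, views)], dim=0)
        alpha = torch.softmax(self.view_weights, dim=0)           # exp(w) / sum(exp(w))
        fused = (alpha[:, None, None] * feats).sum(dim=0)         # weighted-average view pooling
        return self.classifier(fused)

# Example: three views of an 8-frame 32x32 grayscale clip, batch size 2.
model = MultiViewC3D(n_views=3)
clips = [torch.randn(2, 1, 8, 32, 32) for _ in range(3)]
print(model(clips).shape)                      # torch.Size([2, 14])
```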
(3) State-based action division: in an actual environment, the actions of workers occur continuously; to recognize them, the actions must first be divided so that each action can then be recognized separately. It has been observed that a displacement occurs between operations, for example when a worker goes from taking a part, to handling it, to placing it, or from picking up the welding tool to welding the part (see FIG. 2). The actions can therefore be divided according to the motion state of the worker. The specific execution steps are as follows (a sketch of the run-based division is given after sub-step (3.6) below):
(3.1) acquiring the resolution and the frame number of the newly input video, and initializing an interframe difference threshold; setting t to be 2;
(3.2) extracting the center coordinates of the interest points of two adjacent frames according to steps (1.3) to (1.7) and calculating the distance between the two center coordinates; if the distance is greater than a manually set threshold T, the frame is marked as motion state S1, otherwise it is marked as relatively static state S0;
(3.3) incrementing t by 1 and repeating step (3.2) until t is 1 less than the number of the last frame of the newly input video, while counting the lengths of the consecutive runs of S0 and S1; when a run of S0 or S1 of length greater than or equal to N (N > 10) is detected, the target interest points in the frames of that run are segmented by the method of (1.8) and stored into a frame queue, otherwise the frames of that run are discarded.
(3.4) for each run of consecutive S0 or S1 frames in the frame queue, extracting consecutive key frames starting from the i-th frame of the run, where i > 5, so that the number of key frames is the same as the number of frames of each classified video segment in the data set.
(3.5) inputting the video formed by the key frames in the step (3.4) in sequence into the trained multi-view three-dimensional convolution neural network in the step (2.5) to classify and recognize the staff behaviors;
and (3.6) comparing the behavior categories obtained in (3.5) with a predefined standard workflow.
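A sketch of the run counting and key-frame selection in (3.3)-(3.4) is given below, assuming the per-frame states have already been computed as in (3.2). The values chosen for N, i and clip_len only respect the stated constraints (N > 10, i > 5); they are not prescribed by the patent.

```python
def divide_actions(states, N=12, i=6, clip_len=16):
    """Group consecutive identical states into runs, keep runs of length >= N,
    and return (state, first_frame, key_frame_indices) for each kept run."""
    runs, start = [], 0
    for k in range(1, len(states) + 1):
        if k == len(states) or states[k] != states[start]:
            runs.append((states[start], start, k - start))    # (state, first frame, run length)
            start = k
    segments = []
    for state, first, length in runs:
        if length < N:
            continue                                          # discard runs shorter than N
        key_start = first + i                                 # key frames start at the i-th frame
        key_end = min(key_start + clip_len, first + length)
        segments.append((state, first, list(range(key_start, key_end))))
    return segments

# Example: a 20-frame motion run followed by a 13-frame static run.
print(divide_actions(['S1'] * 20 + ['S0'] * 13, N=12, i=6, clip_len=5))
```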
Claims (1)
1. A workflow identification method based on a three-dimensional convolution neural network is characterized by comprising the following steps: the method comprises the following specific steps:
step (1), exporting a workflow video containing multiple visual angles from a data set, and acquiring the video resolution and the frame number of the workflow video at each visual angle;
step (2), initializing an inter-frame difference threshold for the workflow video of each visual angle; respectively carrying out steps (3) to (11) on the workflow video of each visual angle;
step (3), setting t to be 2;
step (4), reading three continuous video frames t-1, t and t+1, and carrying out graying and median filtering processing on the three video frames;
step (5), performing inter-frame difference operation on the previous two frames and on the next two frames respectively to obtain two inter-frame difference images;
step (6), dynamically updating an interframe difference threshold according to the two interframe difference images obtained in the step (5); the method for dynamically updating the interframe difference threshold comprises the following steps:
6.1 setting l = 1, the inter-frame difference threshold of frame t is computed from the difference image, where d_k is the pixel value of the k-th pixel in the inter-frame difference image, max{d_k} is the maximum pixel value in the inter-frame difference image, and min{d_k} is the minimum pixel value in the inter-frame difference image;
step (7), performing binarization processing on the current frame according to the inter-frame difference threshold obtained in step (6), wherein pixel points larger than the inter-frame difference threshold are set to 1 and pixel points smaller than the inter-frame difference threshold are set to 0;
step (8), performing an AND operation on the two inter-frame difference images (previous and next) to obtain a three-frame difference image, and acquiring the center coordinates of the interest points by using a blob extraction method;
step (9), segmenting the extracted interest points from the original image of the current frame;
step (10), gradually adding 1 to the value of t, and repeatedly executing the steps (4) to (9) until the value of t is 1 less than the value of the last frame of the workflow video, wherein the segmentation size of the step (9) is unchanged in the repeated process; storing the interest point images obtained in the step (9) in each repeated process as interest point videos according to the sequence, and classifying the interest point videos according to the classification rules in the data set;
step (11), randomly selecting 90% of the interest point videos obtained in the step (10) as a training set, and taking the rest as a test set;
step (12), constructing a multi-view three-dimensional convolution neural network, and initializing the number of training rounds to be 5000; the multi-view three-dimensional convolution neural network construction method comprises the following steps:
12.1 convolution and pooling operations are as follows:
initializing a four-dimensional convolution kernel with the size of 9 × 10 for the first convolution layer, wherein an activation function is sigmoid, the window size of the first pooling layer is 2, and the step length is 2;
initializing a four-dimensional convolution kernel with the size of 9 × 7 × 30 for the second convolution layer, wherein the activation function is sigmoid, the window size of the second pooling layer is 2, and the step length is 2;
initializing a four-dimensional convolution kernel with the size of 9 × 8 × 5 × 50 for the third convolution layer, wherein the activation function is sigmoid, the window size of the third pooling layer is 2, and the step length is 2;
initializing a four-dimensional convolution kernel with the size of 4 x 3 x 150 for the fourth convolution layer, wherein the activation function is sigmoid, the window size of the fourth pooling layer is 2, and the step length is 2;
12.2 initializing each feature-map weight parameter w_t1 in the weighted-average view pooling layer to a random value in [0, 1]; the weighted-average view pooling operation in the weighted-average view pooling layer is
a = Σ_t1 ( exp(w_t1) / Σ_t1 exp(w_t1) ) · f_t1
wherein a is the weighted-average feature map after the weighted-average view pooling operation, t1 is the sequence number of a pooled feature map after the convolution and pooling operations, w_t1 is the weight corresponding to sequence number t1, exp denotes the exponential function with base e, and f_t1 is the pooled feature map corresponding to sequence number t1;
12.3, respectively initializing a convolution kernel of 3000 × 1500 and 1500 × 750 for the first two fully-connected layers, and setting an activation function as Relu; inputting the weighted average characteristic graph after the weighted average view pooling operation into the front two fully-connected layers;
12.4, initializing a 750 × 14 convolution kernel for the last fully-connected layer and setting a Softmax classification function;
step (13), randomly selecting 20 videos from the training set corresponding to the workflow videos of each visual angle, inputting the videos into the multi-view three-dimensional convolution neural network in the step (12) for feature training, and outputting training errors;
step (14), randomly selecting 10 videos from a training set corresponding to the workflow videos of each visual angle, inputting the 10 videos into a multi-view three-dimensional convolutional neural network for verification, and obtaining the accuracy of classification and identification of the multi-view three-dimensional convolutional neural network;
step (15), repeating the steps (13) to (14), and subtracting 1 from the number of training rounds each time until the number of training rounds is 0 to obtain a trained multi-view three-dimensional convolution neural network;
step (16), testing the multi-view three-dimensional convolution neural network in the step (15) by using a test set corresponding to the workflow video of each visual angle;
step (17), acquiring the resolution and the frame number of the newly input workflow video, and initializing an interframe difference threshold; setting t to be 2;
step (18), extracting the center coordinates of the interest points of two adjacent frames according to steps (4) to (8) and calculating the distance between the two center coordinates; if the distance is greater than a set threshold T, the frame is marked as motion state S1, otherwise it is marked as relatively static state S0;
step (19), incrementing t by 1 and repeating step (18) until t is 1 less than the number of the last frame of the newly input workflow video, while counting the lengths of the consecutive runs of S0 and S1; when a run of S0 or S1 of length greater than or equal to N (N > 10) is detected, the target interest points in the frames of that run are segmented and stored into a frame queue, otherwise the frames of that run are discarded;
step (20), for each run of consecutive S0 or S1 frames in the frame queue, extracting consecutive key frames starting from the i-th frame of the run, where i > 5, so that the number of key frames is the same as the number of frames of each classified video segment in the data set;
step (21), inputting the videos formed by the key frames in the step (20) according to the sequence into the multi-view three-dimensional convolution neural network trained in the step (15) to classify and recognize the staff behaviors;
step (22), comparing the behavior categories obtained in step (21) with a predefined standard workflow.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710335309.XA CN107194559B (en) | 2017-05-12 | 2017-05-12 | Workflow identification method based on three-dimensional convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710335309.XA CN107194559B (en) | 2017-05-12 | 2017-05-12 | Workflow identification method based on three-dimensional convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107194559A CN107194559A (en) | 2017-09-22 |
CN107194559B true CN107194559B (en) | 2020-06-05 |
Family
ID=59873285
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710335309.XA Active CN107194559B (en) | 2017-05-12 | 2017-05-12 | Workflow identification method based on three-dimensional convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107194559B (en) |
Families Citing this family (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10032136B1 (en) * | 2012-07-30 | 2018-07-24 | Verint Americas Inc. | System and method of scheduling work within a workflow with defined process goals |
KR102522350B1 (en) * | 2017-06-14 | 2023-04-14 | 가부시키가이샤 한도오따이 에네루기 켄큐쇼 | Imaging devices and electronic devices |
CN107798297B (en) * | 2017-09-28 | 2021-03-23 | 成都大熊智能科技有限责任公司 | Method for automatically extracting stable frame based on inter-frame difference |
CN107766292B (en) * | 2017-10-30 | 2020-12-29 | 中国科学院计算技术研究所 | Neural network processing method and processing system |
CN108875931B (en) * | 2017-12-06 | 2022-06-21 | 北京旷视科技有限公司 | Neural network training and image processing method, device and system |
CN108010538B (en) * | 2017-12-22 | 2021-08-24 | 北京奇虎科技有限公司 | Audio data processing method and device and computing equipment |
CN108447048B (en) * | 2018-02-23 | 2021-09-14 | 天津大学 | Convolutional neural network image feature processing method based on attention layer |
CN108235003B (en) * | 2018-03-19 | 2020-03-06 | 天津大学 | Three-dimensional video quality evaluation method based on 3D convolutional neural network |
CN108681690B (en) * | 2018-04-04 | 2021-09-03 | 浙江大学 | Assembly line personnel standard operation detection system based on deep learning |
CN109065165B (en) * | 2018-07-25 | 2021-08-17 | 东北大学 | Chronic obstructive pulmonary disease prediction method based on reconstructed airway tree image |
CN109068174B (en) * | 2018-09-12 | 2019-12-27 | 上海交通大学 | Video frame rate up-conversion method and system based on cyclic convolution neural network |
CN109145874B (en) * | 2018-09-28 | 2023-07-04 | 大连民族大学 | Application of measuring difference between continuous frames of video and convolution characteristic diagram in obstacle detection of vision sensing part of autonomous automobile |
CN110969217B (en) * | 2018-09-28 | 2023-11-17 | 杭州海康威视数字技术股份有限公司 | Method and device for image processing based on convolutional neural network |
CN109409294B (en) * | 2018-10-29 | 2021-06-22 | 南京邮电大学 | Object motion trajectory-based classification method and system for ball-stopping events |
CN109635843B (en) * | 2018-11-14 | 2021-06-18 | 浙江工业大学 | Three-dimensional object model classification method based on multi-view images |
CN109711454B (en) * | 2018-12-21 | 2020-07-31 | 电子科技大学 | Feature matching method based on convolutional neural network |
CN110704653A (en) * | 2019-09-09 | 2020-01-17 | 上海慧之建建设顾问有限公司 | Method for searching component by graph in BIM (building information modeling) model and graph-text searching system |
CN111160410B (en) * | 2019-12-11 | 2023-08-08 | 北京京东乾石科技有限公司 | Object detection method and device |
CN111144262B (en) * | 2019-12-20 | 2023-05-16 | 北京容联易通信息技术有限公司 | Process anomaly detection method based on monitoring video |
CN111310801B (en) * | 2020-01-20 | 2024-02-02 | 桂林航天工业学院 | Mixed dimension flow classification method and system based on convolutional neural network |
CN112116195B (en) * | 2020-07-21 | 2024-04-16 | 蓝卓数字科技有限公司 | Railway beam production procedure identification method based on example segmentation |
CN112016409B (en) * | 2020-08-11 | 2024-08-02 | 艾普工华科技(武汉)有限公司 | Deep learning-based process specification visual identification judging method and system |
CN114519841A (en) * | 2020-11-05 | 2022-05-20 | 百威雷科技控股有限公司 | Production line monitoring method and monitoring system thereof |
US12106562B2 (en) * | 2021-10-08 | 2024-10-01 | Mitsubishi Electric Research Laboratories, Inc. | System and method for anomaly detection of a scene |
CN114299128A (en) * | 2021-12-30 | 2022-04-08 | 咪咕视讯科技有限公司 | Multi-view positioning detection method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104217214A (en) * | 2014-08-21 | 2014-12-17 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Configurable convolutional neural network based red green blue-distance (RGB-D) figure behavior identification method |
CN106203283A (en) * | 2016-06-30 | 2016-12-07 | 重庆理工大学 | Based on Three dimensional convolution deep neural network and the action identification method of deep video |
CN106407903A (en) * | 2016-08-31 | 2017-02-15 | 四川瞳知科技有限公司 | Multiple dimensioned convolution neural network-based real time human body abnormal behavior identification method |
WO2017031088A1 (en) * | 2015-08-15 | 2017-02-23 | Salesforce.Com, Inc | Three-dimensional (3d) convolution with 3d batch normalization |
Also Published As
Publication number | Publication date |
---|---|
CN107194559A (en) | 2017-09-22 |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | GR01 | Patent grant | 
2022-08-31 | TR01 | Transfer of patent right | Patentee before: HANGZHOU DIANZI University (No. 2 street, Xiasha Higher Education Zone, Hangzhou, Zhejiang, 310018); Patentee after: Hangzhou Taoyi Data Technology Co.,Ltd. (Room 405, 6-8 Jiaogong Road, Xihu District, Hangzhou City, Zhejiang Province, 310013)