CN117635458A - Video prediction method based on deep stream analysis network - Google Patents
- Publication number
- CN117635458A CN117635458A CN202311659020.5A CN202311659020A CN117635458A CN 117635458 A CN117635458 A CN 117635458A CN 202311659020 A CN202311659020 A CN 202311659020A CN 117635458 A CN117635458 A CN 117635458A
- Authority
- CN
- China
- Prior art keywords
- video
- network
- prediction
- constructing
- motion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 35
- 238000004458 analytical method Methods 0.000 title claims abstract description 12
- 230000003287 optical effect Effects 0.000 claims abstract description 13
- 238000013528 artificial neural network Methods 0.000 claims abstract description 7
- 238000012549 training Methods 0.000 claims description 19
- 238000013527 convolutional neural network Methods 0.000 claims description 6
- 238000011176 pooling Methods 0.000 claims description 6
- 230000007246 mechanism Effects 0.000 claims description 5
- 238000007781 pre-processing Methods 0.000 claims description 3
- 238000005096 rolling process Methods 0.000 claims description 3
- 230000033001 locomotion Effects 0.000 abstract description 29
- 230000000007 visual effect Effects 0.000 description 4
- 230000000694 effects Effects 0.000 description 3
- 230000006872 improvement Effects 0.000 description 3
- 230000009466 transformation Effects 0.000 description 3
- 241000282412 Homo Species 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000005286 illumination Methods 0.000 description 2
- 230000004504 retinal motion Effects 0.000 description 2
- 230000003068 static effect Effects 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 230000008485 antagonism Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 230000003930 cognitive ability Effects 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000001747 exhibiting effect Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000005206 flow analysis Methods 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 210000001525 retina Anatomy 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 238000000844 transformation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a video prediction method based on a deep stream analysis network, which predicts future scenes by parsing the optical flow into a rigid flow and a residual flow, wherein the rigid flow represents the scene dynamics produced by the observer's own motion (ego-motion), and the residual flow corresponds to the motion of other objects in the scene. Specifically, the method proposes an end-to-end unsupervised deep neural network that predicts future video frames by decomposing scene motion into self-motion (camera motion) and object-centric motion. The method improves the model's ability to analyze scene dynamics and has practical and social significance.
Description
Technical Field
The invention belongs to the technical field of video analysis and prediction, and particularly relates to a video prediction method based on a deep stream analysis network.
Background
The ability to predict future conditions from current and historical observations is critical to machine decision making. The task is relatively easy for humans but very challenging for machines. In recent years, computer vision researchers have turned their attention to the video prediction task: predicting future video frames from the frames already observed.
A robust and effective video prediction method must not only fully exploit spatial semantic information but also accurately capture temporal motion patterns. Motion dynamics carry rich information about scene evolution, which is critical for understanding the environment, especially for autonomous vehicles. Existing methods almost always estimate the motion of background and foreground objects jointly, through direct optical flow or inter-frame differences. However, the motion of background and foreground objects in a scene has different origins: the former comes purely from the ego-motion of the observer's camera, while the latter is a superposition of that ego-motion and the residual motion of the object itself. Existing methods are therefore limited in their ability to distinguish static from moving objects and cannot analyze scene dynamics with high fidelity. The problem is further exacerbated in complex urban environments densely populated with dynamic objects.
Rushton et al. found that the human visual system contains a "flow parsing mechanism": the brain uses its sensitivity to optical flow to parse retinal motion into components caused by self-motion and by object-centric motion, with depth information also playing an important role in this process. The self-motion component is first estimated from the visual stimulus that the observer's movement produces on the retina, and the "true" object-centric motion is then obtained by "subtracting" the self-motion from the retinal motion. This cognitive ability helps humans solve problems systematically and adapt to new situations. Drawing inspiration from this biological flow parsing mechanism, the present method decouples background change from object-centric residual motion through scene geometry reconstruction, facilitating the inference of future frames in a video sequence.
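Written as a formula (notation introduced here purely for illustration; the patent text does not state it explicitly), the flow parsing view amounts to decomposing the overall optical flow at a pixel p of frame t into a rigid part driven by camera ego-motion and an object-centric residual part, with D_t the predicted depth, T_{t→t+1} the relative camera pose, K the camera intrinsics, π the perspective projection, and p̃ the homogeneous coordinate of p:

```latex
% Flow decomposition suggested by the flow parsing view (illustrative notation)
F_{\mathrm{total}}(p)
  = \underbrace{\pi\!\left(K \, T_{t \to t+1} \, D_t(p) \, K^{-1} \tilde{p}\right) - p}_{F_{\mathrm{rigid}}(p)\ \text{(camera ego-motion)}}
  \;+\; \underbrace{F_{\mathrm{res}}(p)}_{\text{(object-centric motion)}}
```

The rigid term corresponds to the geometric rigid flow projection unit of step S4 below, and the residual term to the residual flow network of step S5.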
Existing video prediction algorithms can be divided into deterministic and stochastic methods. Deterministic video prediction aims to minimize the reconstruction distance between the real future and the predicted result. Besides ensuring the quality of each predicted frame, it must also extract a temporal representation of the video sequence. Deterministic video prediction matters greatly for autonomous driving, robotic control, and similar applications, where predictions must be accurate enough to support safer, more reliable decisions. Among deterministic methods, direct pixel-synthesis models attempt to predict future pixel intensities frame by frame, implicitly modeling the dynamic and static content of a scene during feature extraction. Ranzato et al. use k-means to discretize video frames into image patches, assuming that non-overlapping patches fall into different bins of the discretized space. Their model is based on a recurrent neural network and performs short-term prediction at the patch level; since the whole frame is assembled from predicted patches, prediction is accurate for large, fast-moving objects, but there is still room for improvement for small, slowly moving ones. Lotter et al. propose PredNet, inspired by the neuroscience concept of predictive coding. PredNet consists of a series of repeatedly stacked modules, each of which tries to locally predict its own inputs; although it shows promising results, the temporal horizon it can predict is limited, so improving long-term prediction performance remains an important direction for follow-up work. Jin et al. use a generative adversarial network to improve the realism of predictions. Inspired by the frequency-band decomposition characteristics of the human visual system, Jin et al. further propose a video prediction method that exploits wavelet-based multi-frequency analysis for high fidelity and temporal consistency. Shouno et al. propose a hierarchical deep residual network to handle large motions, in which each layer predicts future states at a different spatial resolution; the predictions of the layers are combined through top-down connections to generate future frames. Another class of deterministic methods generates transformation matrices for video prediction, equivalent to affine transformations between adjacent frames. Vondrick et al. handle future uncertainty and past memory by learning transformations, separating memory of the past from prediction of the future.
Stochastic video prediction methods treat future prediction as a multi-modal task and generally encode uncertainty as a sequence of latent variables. Stochastic methods are typically built on generative adversarial networks, variational autoencoder structures, and the like. Babaeizadeh et al. first addressed stochastic multi-frame prediction, proposing a stochastic variational video prediction method that predicts a different possible future for each sample of the latent variables. Denton et al. propose a stochastic video generation model that combines a deterministic frame predictor with time-varying stochastic latent variables. Lee et al. propose the first approach to produce high-quality predictions by combining a variational lower bound with adversarial training.
Although existing video prediction algorithms achieve reasonable performance, their failure to decouple motion information often produces blurred predicted video sequences that lack temporal consistency, which limits how well they can perform.
Disclosure of Invention
The embodiment of the invention discloses a video prediction method based on a deep stream analysis network, which predicts future scenes by parsing the optical flow into a rigid flow and a residual flow, wherein the rigid flow represents the scene dynamics produced by the observer's own motion (ego-motion), and the residual flow corresponds to the motion of other objects in the scene. Specifically, the method proposes an end-to-end unsupervised deep neural network that predicts future video frames by decomposing scene motion into self-motion (camera motion) and object-centric motion. The method improves the model's ability to analyze scene dynamics and has practical and social significance.
The technical scheme of the invention is as follows:
a video prediction method based on a deep stream analysis network comprises the following steps:
s1, acquiring a training sample;
s2, preprocessing video data;
s3, constructing a depth and pose prediction network;
based on a convolutional neural network architecture, removing the original fully connected layer and all subsequent layers, retaining only the convolution and pooling layers, and constructing a depth and pose prediction network;
s4, constructing a geometric rigid flow projection unit and connecting it after the convolution-and-pooling backbone retained in S3;
s5, constructing a residual flow network based on a convolutional neural network, outputting a residual flow, and adding it to the rigid flow to obtain the overall optical flow;
s6, constructing an LSTM module that takes the overall optical flow as input and memorizes temporal information;
s7, constructing a decoder module and connecting it to the LSTM constructed in S6 to obtain the video prediction network model M;
s8, training a video prediction model M;
s9, calculating training loss, and updating network parameters by using a back propagation algorithm;
s10, video frame prediction is carried out on the input video sequence by utilizing the trained network.
Further, the step S1 specifically includes:
the method comprises the steps of obtaining video sequence data sets from a database, wherein the data sets comprise a KITTI data set for carrying out video prediction on automatic driving of an automobile and a Caltech Pedestrain data set, extracting a certain number of video frame sequences as input by taking one data set as a unique data set during training of a network, taking subsequent video frames as corresponding reference results, and then carrying out the same operation by taking the other data set as the unique data set.
Further, the step S2 specifically includes:
s21, scaling: the video frames are scaled by a factor θ, whose value in this embodiment ranges from 1.0 to 1.5;
s22, cropping: the original training samples are randomly cropped into 320 × 320-pixel video sequences;
s23, HSL adjustment: the hue (Hue), saturation (Saturation), and lightness (Lightness) of the cropped samples are multiplied by a random value δ ∈ [1.0, 1.2] to simulate illumination changes in natural environments.
S24, dividing the video sequence data set into a training set and a testing set;
further, step S8 specifically includes:
extracting a sequence of t successive video images X = {x_1, x_2, …, x_t} from the input video sequence in S1, and sequentially inputting the video image sequence X into the video prediction network M constructed in S7 to extract features and predict the next video frame image x̂_{t+1}.
Further, step S9 specifically includes:
The predicted video frame x̂_{t+1} is input to the video prediction network of S7 to obtain the prediction x̂_{t+2}, and so on until the k-frame predicted video sequence {x̂_{t+1}, x̂_{t+2}, …, x̂_{t+k}} is obtained. The ground-truth video sequence S = {x_1, x_2, …, x_t, x_{t+1}, x_{t+2}, …, x_{t+k}} is compared with the predicted video frame sequence, the loss is calculated, and the network model M is trained using a back propagation algorithm; the loss functions used in training are respectively as follows:
compared with the prior art, the invention has the beneficial technical effects that:
1) The invention provides a video prediction method based on a deep stream analysis network. In real scenes, the superposition of camera self-motion and object-centric motion produces complex dynamic evolution, and fully recognizing and understanding it is necessary for the video prediction task. Most previous studies focus on processing global motion and ignore the ambiguity between camera self-motion and object-centric motion, leading to an incomplete understanding of overall scene dynamics. Inspired by the flow parsing mechanism of the human visual system, the method separates background change from object-centric residual motion through scene geometry reconstruction to facilitate the inference of future frames in a video sequence. Compared with traditional video prediction methods, it can better perceive motion in the video, further improving the accuracy and stability of prediction.
2) The invention emphasizes the importance of disambiguating camera self-motion and object-centric motion for future prediction. The optical flow is parsed into a rigid optical flow related to camera motion and a residual optical flow related to object-centric motion. In addition, content information is extracted in parallel from the historical frames through a fully convolutional neural network, and a better prediction effect is achieved through a joint understanding of content and motion features.
3) The invention achieves a deep understanding of video motion by introducing a flow parsing mechanism, thereby improving the accuracy and stability of the model; the method therefore has significant application value and broad development prospects in the field of video prediction. In practical use, the predicted result sequence is obtained simply by feeding the video sequence through the generative network in a single forward pass, which performs better than traditional video prediction methods.
Drawings
FIG. 1 is a flow chart of a video prediction method of the present invention;
FIG. 2 is a diagram of an embodiment of the present invention;
fig. 3 is a schematic diagram of a video prediction network structure according to the present invention.
Detailed Description
As shown in fig. 1-3, a video prediction method based on a deep stream analysis network includes the following steps:
s1, obtaining a training sample
Video sequence data sets are obtained from a database, including the KITTI data set for video prediction in autonomous driving and the Caltech Pedestrian data set. When training the network, one data set at a time is used as the sole data set: a certain number of video frame sequences are extracted as input and the subsequent video frames serve as the corresponding reference results; the same procedure is then repeated with the other data set as the sole data set;
s2, preprocessing operation of video data
The step S2 specifically comprises the following steps:
s21, scaling: the video frames are scaled by a factor θ, whose value in this embodiment ranges from 1.0 to 1.5;
s22, cropping: the original training samples are randomly cropped into 320 × 320-pixel video sequences;
s23, HSL adjustment: the hue (Hue), saturation (Saturation), and lightness (Lightness) of the cropped samples are multiplied by a random value δ ∈ [1.0, 1.2] to simulate illumination changes in natural environments (a minimal sketch of S21-S23 is given after step S24 below).
S24, dividing the video sequence data set into a training set and a testing set;
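A minimal sketch of the preprocessing in S21-S23, assuming OpenCV-style BGR uint8 frames and that one scale factor, one crop window, and one HSL gain are drawn per clip; none of these implementation details are specified in the patent beyond the parameter ranges quoted above.

```python
import random
import cv2
import numpy as np

def preprocess_clip(frames, theta_range=(1.0, 1.5), crop=320, delta_range=(1.0, 1.2)):
    """frames: list of HxWx3 uint8 BGR images from one clip (assumed larger than the crop)."""
    theta = random.uniform(*theta_range)              # S21: one scale factor per clip
    delta = random.uniform(*delta_range)              # S23: one HSL gain per clip
    h, w = frames[0].shape[:2]
    new_h, new_w = int(h * theta), int(w * theta)
    top = random.randint(0, new_h - crop)             # S22: one random crop window per clip
    left = random.randint(0, new_w - crop)

    out = []
    for f in frames:
        f = cv2.resize(f, (new_w, new_h))                                    # S21: scaling
        f = f[top:top + crop, left:left + crop]                              # S22: 320x320 crop
        hls = cv2.cvtColor(f, cv2.COLOR_BGR2HLS).astype(np.float32) * delta  # S23: HSL jitter
        hls[..., 0] = np.clip(hls[..., 0], 0, 179)    # OpenCV hue range for 8-bit images
        hls = np.clip(hls, 0, 255).astype(np.uint8)
        out.append(cv2.cvtColor(hls, cv2.COLOR_HLS2BGR))
    return out
```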
s3, constructing a depth and pose prediction network;
based on a convolutional neural network architecture, removing the original fully connected layer and all subsequent layers, retaining only the convolution and pooling layers, and constructing a depth and pose prediction network;
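A sketch of one way to realise the depth and pose prediction network of S3, assuming a standard ResNet-18 backbone (the patent does not name the backbone or the head design): the fully connected layer and everything after it are dropped, only the convolution/pooling stages are kept, and two light heads regress a per-pixel depth map and a 6-DoF relative camera pose from a pair of consecutive frames.

```python
import torch
import torch.nn as nn
import torchvision

class DepthPoseNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # accept two stacked RGB frames (6 channels) instead of a single image
        backbone.conv1 = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # keep only the convolution/pooling stages (drop avgpool and fc): 512 channels, stride 32
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.depth_head = nn.Sequential(
            nn.Conv2d(512, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1), nn.Softplus())      # strictly positive depth
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, 6))

    def forward(self, frame_pair):
        """frame_pair: (B, 6, H, W) = frames x_t and x_{t+1} stacked along channels."""
        feat = self.encoder(frame_pair)
        depth = self.depth_head(feat)        # (B, 1, H, W) depth map for frame t
        pose = self.pose_head(feat)          # (B, 6) relative camera motion t -> t+1
        return depth, pose
```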
s4, constructing a geometric rigid flow projection unit and connecting it after the convolution-and-pooling backbone retained in S3;
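The geometric rigid flow projection unit of S4 can be sketched with the standard multi-view geometry formulation: pixels of frame t are back-projected with the predicted depth, moved by the predicted camera motion, and re-projected into frame t+1, and the rigid flow is the displacement between the projected and the original pixel grid. The camera intrinsics K and the 4×4 pose matrix are assumed inputs; the patent does not give this formulation explicitly.

```python
import torch

def rigid_flow(depth, pose_mat, K):
    """depth: (B,1,H,W); pose_mat: (B,4,4) camera motion t->t+1; K: (B,3,3) intrinsics."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=depth.dtype, device=depth.device),
                            torch.arange(W, dtype=depth.dtype, device=depth.device),
                            indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1).expand(B, -1, -1)  # (B,3,HW)

    cam = torch.inverse(K) @ pix * depth.reshape(B, 1, -1)        # back-project to 3D
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, dtype=cam.dtype, device=cam.device)], dim=1)
    cam2 = (pose_mat @ cam_h)[:, :3]                              # apply camera ego-motion
    proj = K @ cam2
    proj = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)             # re-project to pixel coords

    flow = proj - pix[:, :2]                                      # rigid displacement field
    return flow.reshape(B, 2, H, W)
```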
s5, constructing a residual flow network based on a convolutional neural network, outputting a residual flow, and adding it to the rigid flow to obtain the overall optical flow;
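A minimal sketch of S5, assuming the residual flow network is a small fully convolutional encoder-decoder that sees the two frames together with the rigid flow (the exact inputs and architecture are not specified in the patent); the overall optical flow is simply the sum of the rigid and residual flows.

```python
import torch
import torch.nn as nn

class ResidualFlowNet(nn.Module):
    def __init__(self, in_ch=8):                     # 2 RGB frames + 2-channel rigid flow
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1))    # 2-channel residual flow

    def forward(self, frame_t, frame_t1, rigid):
        residual = self.net(torch.cat([frame_t, frame_t1, rigid], dim=1))
        total_flow = rigid + residual                # S5: overall optical flow
        return total_flow, residual
```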
s6, constructing an LSTM module that takes the overall optical flow as input and memorizes temporal information;
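For S6, PyTorch has no built-in convolutional LSTM, so a minimal ConvLSTM cell is written out below; whether the patent's LSTM is convolutional or operates on flattened flow features is an assumption.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g                      # memorize temporal (motion) information
        h = o * torch.tanh(c)
        return h, (h, c)

    def init_state(self, batch, height, width, device):
        z = torch.zeros(batch, self.hid_ch, height, width, device=device)
        return (z, z.clone())
```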
s7, constructing a decoder module and connecting it to the LSTM constructed in S6 to obtain the video prediction network model M;
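A sketch of S7 and of how the pieces from S3-S6 could be chained into the video prediction network model M. The decoder widths, the hidden size of 64, and the helper pose_vec_to_mat (turning the 6-DoF pose vector into a 4×4 matrix) are illustrative assumptions, not details taken from the patent; DepthPoseNet, rigid_flow, ResidualFlowNet, and ConvLSTMCell refer to the sketches above.

```python
import torch
import torch.nn as nn

class FrameDecoder(nn.Module):
    """Turns the ConvLSTM hidden state into the next RGB frame."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())    # predicted frame in [0, 1]

    def forward(self, h):
        return self.net(h)

def predict_next(frame_t, frame_t1, nets, K, state):
    """One step of model M (sketch): parse the flow, update memory, decode the next frame."""
    depth_pose, res_net, lstm, decoder = nets
    depth, pose6 = depth_pose(torch.cat([frame_t, frame_t1], dim=1))   # S3
    pose_mat = pose_vec_to_mat(pose6)      # hypothetical helper: 6-DoF vector -> 4x4 matrix
    rigid = rigid_flow(depth, pose_mat, K)                             # S4: rigid flow
    flow, _ = res_net(frame_t, frame_t1, rigid)                        # S5: overall flow
    h, state = lstm(flow, state)                                       # S6: temporal memory
    return decoder(h), state                                           # S7: predicted frame
```

Here `nets` could be, for example, `(DepthPoseNet(), ResidualFlowNet(in_ch=8), ConvLSTMCell(in_ch=2, hid_ch=64), FrameDecoder(in_ch=64))`, with `state` initialised once per sequence via `ConvLSTMCell.init_state`.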
s8, training a video prediction model M;
the step S8 specifically comprises the following steps:
extracting a sequence of t successive video images X = {x_1, x_2, …, x_t} from the input video sequence in S1, where x_i denotes the i-th frame image, and sequentially inputting the video image sequence X into the video prediction network M constructed in S7 to extract features and predict the next video frame image x̂_{t+1}, i.e. the image frame at time t + 1.
S9, calculating training loss, and updating network parameters by using a back propagation algorithm
The step S9 specifically comprises the following steps:
The predicted video frame x̂_{t+1} is input to the video prediction network of S7 to obtain the prediction x̂_{t+2}, i.e. the image frame at time t + 2, and so on until the k-frame predicted video sequence {x̂_{t+1}, x̂_{t+2}, …, x̂_{t+k}} is obtained. The ground-truth video sequence S = {x_1, x_2, …, x_t, x_{t+1}, x_{t+2}, …, x_{t+k}} is compared with the predicted video frame sequence, the loss is calculated, and the network model M is trained using a back propagation algorithm; the loss functions used in training are respectively as follows:
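A training-step sketch for S8 and S9. The loss formulas referenced above are not reproduced in this text, so a plain L1 reconstruction loss over the k predicted frames is used purely as a placeholder; `model` is assumed to wrap the single-step prediction sketched after S7 (closing over the sub-networks, the intrinsics, and the recurrent-state initialisation) and to return the predicted frame together with the updated state.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, clip, t, k):
    """clip: (B, t+k, 3, H, W) ground-truth sequence S = {x_1, ..., x_{t+k}}; assumes t >= 2."""
    frames = [clip[:, i] for i in range(t)]            # observed frames x_1..x_t
    state, pred = None, None
    for i in range(t - 1):                             # S8: run the context through M
        pred, state = model(frames[i], frames[i + 1], state)   # last pred is x_hat_{t+1}

    preds, prev = [pred], frames[-1]
    for _ in range(k - 1):                             # S9: autoregressive rollout to x_hat_{t+k}
        pred, state = model(prev, preds[-1], state)
        prev = preds[-1]
        preds.append(pred)

    target = clip[:, t:t + k]                          # ground truth x_{t+1}..x_{t+k}
    loss = F.l1_loss(torch.stack(preds, dim=1), target)    # placeholder reconstruction loss
    optimizer.zero_grad()
    loss.backward()                                    # back-propagation parameter update
    optimizer.step()
    return loss.item()
```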
s10, video frame prediction is carried out on the input video sequence by utilizing the trained network.
The above embodiments are only illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solutions of the present invention should fall within the protection scope defined by the claims of the present invention without departing from the design spirit of the present invention.
Claims (1)
1. A video prediction method based on a deep stream analysis network is characterized by comprising the following steps:
s1, acquiring a training sample;
s2, preprocessing video data;
s3, constructing a depth and pose prediction network;
based on a convolutional neural network architecture, removing the original fully connected layer and all subsequent layers, retaining only the convolution and pooling layers, and constructing a depth and pose prediction network;
s4, constructing a geometric rigid flow projection unit and connecting it after the convolution-and-pooling backbone retained in S3;
s5, constructing a residual flow network based on a convolutional neural network, outputting a residual flow, and adding it to the rigid flow to obtain the overall optical flow;
s6, constructing an LSTM module that takes the overall optical flow as input and memorizes temporal information;
s7, constructing a decoder module and connecting it to the LSTM constructed in S6 to obtain the video prediction network model M;
s8, training a video prediction model M;
s9, calculating training loss, and updating network parameters by using a back propagation algorithm;
s10, video frame prediction is carried out on the input video sequence by utilizing the trained network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311659020.5A CN117635458A (en) | 2023-12-05 | 2023-12-05 | Video prediction method based on deep stream analysis network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311659020.5A CN117635458A (en) | 2023-12-05 | 2023-12-05 | Video prediction method based on deep stream analysis network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117635458A true CN117635458A (en) | 2024-03-01 |
Family
ID=90030215
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311659020.5A Pending CN117635458A (en) | 2023-12-05 | 2023-12-05 | Video prediction method based on deep stream analysis network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117635458A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108184128A (en) * | 2018-01-11 | 2018-06-19 | 安徽优思天成智能科技有限公司 | Video sequence lost frames prediction restoration methods based on deep neural network |
US10814815B1 (en) * | 2019-06-11 | 2020-10-27 | Tangerine Innovation Holding Inc. | System for determining occurrence of an automobile accident and characterizing the accident |
CN113156959A (en) * | 2021-04-27 | 2021-07-23 | 东莞理工学院 | Self-supervision learning and navigation method of autonomous mobile robot in complex scene |
Non-Patent Citations (2)
Title |
---|
SEOKJU LEE et al.: "Learning Residual Flow as Dynamic Motion from Stereo Videos", arXiv, 16 September 2019 (2019-09-16), pages 1-7 *
JIN Beibei et al.: "Spatio-temporal wavelet analysis video prediction algorithm based on differential attention" (in Chinese), Journal of Computer-Aided Design & Computer Graphics, 28 February 2022 (2022-02-28), pages 180-183 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Predrnn: A recurrent neural network for spatiotemporal predictive learning | |
CN110458844B (en) | Semantic segmentation method for low-illumination scene | |
CN109271933B (en) | Method for estimating three-dimensional human body posture based on video stream | |
WO2021093468A1 (en) | Video classification method and apparatus, model training method and apparatus, device and storage medium | |
Wang | Research on sports training action recognition based on deep learning | |
CN114049381A (en) | Twin cross target tracking method fusing multilayer semantic information | |
CN114550223B (en) | Person interaction detection method and device and electronic equipment | |
Jung et al. | Goal-directed behavior under variational predictive coding: Dynamic organization of visual attention and working memory | |
CN110852199A (en) | Foreground extraction method based on double-frame coding and decoding model | |
CN114612414B (en) | Image processing method, model training method, device, equipment and storage medium | |
CN115661246A (en) | Attitude estimation method based on self-supervision learning | |
CN113011320B (en) | Video processing method, device, electronic equipment and storage medium | |
CN113255514B (en) | Behavior identification method based on local scene perception graph convolutional network | |
CN113554653A (en) | Semantic segmentation method for long-tail distribution of point cloud data based on mutual information calibration | |
Xu et al. | AutoSegNet: An automated neural network for image segmentation | |
Du et al. | Adaptive visual interaction based multi-target future state prediction for autonomous driving vehicles | |
CN114842542B (en) | Facial action unit identification method and device based on self-adaptive attention and space-time correlation | |
CN116205962A (en) | Monocular depth estimation method and system based on complete context information | |
CN113971826B (en) | Dynamic emotion recognition method and system for estimating continuous titer and arousal level | |
CN117635458A (en) | Video prediction method based on deep stream analysis network | |
CN117022305A (en) | Human factor intelligent driving behavior prediction method, system, terminal equipment and storage medium | |
CN117454119A (en) | Urban rail passenger flow prediction method based on dynamic multi-graph and multidimensional attention space-time neural network | |
CN116402874A (en) | Spacecraft depth complementing method based on time sequence optical image and laser radar data | |
Lee et al. | Boundary-aware camouflaged object detection via deformable point sampling | |
CN116452472A (en) | Low-illumination image enhancement method based on semantic knowledge guidance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |