CN117635458A - Video prediction method based on deep stream analysis network - Google Patents

Video prediction method based on deep stream analysis network

Info

Publication number
CN117635458A
Authority
CN
China
Prior art keywords
video
network
prediction
constructing
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311659020.5A
Other languages
Chinese (zh)
Inventor
金贝贝 (Jin Beibei)
宋晓辉 (Song Xiaohui)
李金东 (Li Jindong)
张鹏飞 (Zhang Pengfei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Physics Henan Academy Of Sciences
Henan Academy of Sciences
Original Assignee
Institute Of Physics Henan Academy Of Sciences
Henan Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Physics Henan Academy Of Sciences, Henan Academy of Sciences filed Critical Institute Of Physics Henan Academy Of Sciences
Priority to CN202311659020.5A priority Critical patent/CN117635458A/en
Publication of CN117635458A publication Critical patent/CN117635458A/en
Pending legal-status Critical Current

Classifications

    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N 3/0442 Recurrent networks, e.g. Hopfield networks, characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/0475 Generative networks
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20221 Image fusion; Image merging
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video prediction method based on a deep stream analysis network, which predicts future scenes by parsing the optical flow into a rigid flow and a residual flow, where the rigid flow represents the scene dynamics caused by the observer's self-motion and the residual flow corresponds to the motion of other objects in the scene. Specifically, the method proposes an end-to-end unsupervised deep neural network that predicts future video frames by decomposing scene motion into self-motion (camera motion) and object-centric motion. The method improves the model's ability to parse scene dynamics and has practical significance and social value.

Description

Video prediction method based on deep stream analysis network
Technical Field
The invention belongs to the technical field of video analysis and prediction, and particularly relates to a video prediction method based on a deep stream analysis network.
Background
The ability to predict future situations from current and historical observations is critical to machine decision making. This task is relatively easy for humans but very challenging for machines. In recent years, computer vision researchers have turned their attention to the video prediction task: predicting future video frames from the frames already observed.
A robust and effective video prediction method must not only make full use of spatial semantic information but also accurately capture temporal motion patterns. Motion dynamics contain rich information about scene evolution, which is critical to understanding the environment, especially for autonomous vehicles. Existing methods almost always estimate the motion of background and foreground objects jointly, through direct optical flow or inter-frame differences; however, background and foreground motion in a scene have different origins: the former arises purely from the self-motion of the observer's camera, while the latter is the superposition of the camera's self-motion and the residual motion of the object itself. Existing methods therefore have limited ability to distinguish static from moving objects in a scene and cannot parse scene dynamics with high fidelity. This problem is further exacerbated in complex urban environments dense with dynamic objects.
Rushton et al. found that the human visual system contains a "flow parsing mechanism": the brain uses its sensitivity to optical flow to resolve retinal motion into components arising from self-motion or from object-centric motion, and depth information also plays an important role in this process. The self-motion component is first estimated from the visual stimulus that the observer's motion produces on the retina, and the "true" object-centric motion is then obtained by "subtracting" the self-motion from the retinal motion. This cognitive ability helps humans systematically solve problems and adapt to new situations. Drawing inspiration from this biological flow parsing mechanism, the present method decouples background change from object-centric residual motion through scene geometry reconstruction, thereby facilitating the inference of future frames in a video sequence.
Existing video prediction algorithms can be divided into deterministic and stochastic methods. Deterministic video prediction methods aim to minimize the reconstruction distance between the real future and the predicted result. Besides ensuring the quality of each predicted frame, they must also extract a temporal representation of the video sequence. Deterministic video prediction is of great significance for autonomous driving, robotic control, and similar applications, where sufficiently accurate predictions enable safer and more reliable decisions. Among deterministic approaches, direct pixel-synthesis models predict future pixel intensities frame by frame and implicitly model the dynamic and static content of a scene during feature extraction. Ranzato et al. use k-means to discretize video frames into a vocabulary of image patches, assuming that non-overlapping patches are distinct in the discretized space. Their method is based on a recurrent neural network and performs short-term prediction at the patch level; because the whole frame is assembled from predicted patches, predictions of large, fast-moving objects are accurate, but there is still room for improvement for small, slowly moving objects. Lotter et al. propose "PredNet", inspired by the neuroscience concept of predictive coding. PredNet consists of a series of repeatedly stacked modules that attempt to locally predict their own inputs; although it shows promising results, the temporal horizon it can predict is limited, so improving long-term prediction performance became an important focus of subsequent work. Jin et al. use generative adversarial networks to improve the realism of predictions. Inspired by the band-decomposition characteristics of the human visual system, Jin et al. further propose a video prediction method that exploits wavelet-based multi-frequency analysis for high fidelity and temporal consistency. Shouno et al. propose a hierarchical deep residual network to handle large motions, in which each layer predicts future states at a different spatial resolution; the predictions of these layers are combined through top-down connections to generate future frames. Another class of deterministic methods generates transformation matrices for video prediction, which amounts to estimating affine transformations between adjacent frames. Vondrick et al. handle future uncertainty and past memory by learning transformations, separating past memory from predictions of the future.
Stochastic video prediction methods treat future prediction as a multi-modal task and generally encode uncertainty as a sequence of latent variables. Such methods are typically based on generative adversarial networks, variational autoencoder structures, and the like. Babaeizadeh et al. first addressed stochastic multi-frame prediction, proposing a stochastic variational video prediction method that predicts a different possible future for each latent-variable sample. Denton et al. propose a stochastic video generation model that combines a deterministic frame predictor with time-varying random latent variables. Lee et al. were the first to produce high-quality predictions by combining a variational lower bound with adversarial training.
Although existing video prediction algorithms achieve reasonable performance, their lack of motion decoupling often results in blurred predicted video sequences and poor temporal consistency, which prevents them from performing well.
Disclosure of Invention
The embodiment of the invention discloses a video prediction method based on a deep stream analysis network, which predicts future scenes by parsing the optical flow into a rigid flow and a residual flow, where the rigid flow represents the scene dynamics caused by the observer's self-motion and the residual flow corresponds to the motion of other objects in the scene. Specifically, the method proposes an end-to-end unsupervised deep neural network that predicts future video frames by decomposing scene motion into self-motion (camera motion) and object-centric motion. The method improves the model's ability to parse scene dynamics and has practical significance and social value.
The technical scheme of the invention is as follows:
A video prediction method based on a deep stream analysis network comprises the following steps:
S1, acquiring a training sample;
S2, preprocessing video data;
S3, constructing a depth and pose prediction network;
based on a convolutional neural network architecture, removing the original fully connected layer and all subsequent layers and retaining only the convolution and pooling layers to construct the depth and pose prediction network;
S4, constructing a geometric rigid flow projection unit and connecting it after the convolution-and-pooling backbone retained in S3;
S5, constructing a residual flow network based on a convolutional neural network to output the residual flow, and adding the residual flow to the rigid flow to obtain the overall optical flow;
S6, constructing an LSTM module that takes the overall optical flow as input and memorizes temporal information;
S7, constructing a decoder module and connecting it to the LSTM constructed in S6 to obtain a video prediction network model M;
S8, training a video prediction model M;
S9, calculating training loss, and updating network parameters by using a back propagation algorithm;
S10, video frame prediction is carried out on the input video sequence by utilizing the trained network.
Further, the step S1 specifically includes:
Video sequence data sets are obtained from a database; the data sets include the KITTI data set, for video prediction in autonomous driving, and the Caltech Pedestrian data set. When training the network, one data set is used as the sole data set: a certain number of video frame sequences are extracted as input and the subsequent video frames are taken as the corresponding reference results; the same operation is then carried out with the other data set as the sole data set.
Further, the step S2 specifically includes:
S21, scaling: scale the video frames to θ times their original size, where θ ranges from 1.0 to 1.5 in this embodiment;
S22, cropping: randomly crop the original training samples into 320 × 320 pixel video sequences;
S23, HSL adjustment: multiply the hue (Hue), saturation (Saturation), and lightness (Lightness) of the cropped samples by a random value δ ∈ [1.0, 1.2] to simulate the illumination variation of natural environments;
S24, dividing the video sequence data set into a training set and a test set.
further, step S8 specifically includes:
A sequence of t consecutive video images X = {x₁, x₂, …, xₜ} is extracted from the input video sequence of S1; the sequence X is fed frame by frame into the video prediction network M constructed in S7 to extract features and predict the next video frame image x̂ₜ₊₁.
Further, step S9 specifically includes:
video frames to be predictedThe video prediction network inputted to S7 gets predicted +.>And so on until a k-frame video sequence to be predicted is obtained +.>The true video sequence s= { x 1 ,x 2 ,…,x t ,x t+1 ,x t+2 ,…,x t+k Video frame sequence of } and prediction +.>In contrast to this, the number of the cells,calculating loss, training a network model M by using a back propagation algorithm, wherein loss functions used in training are respectively as follows:
compared with the prior art, the invention has the beneficial technical effects that:
1) The invention provides a video prediction method based on a deep stream analysis network. In real scenes, the superposition of camera self-motion and object-centric motion produces complex dynamic evolution, and a full understanding of this evolution is necessary for the video prediction task. Previous studies have mostly focused on processing global motion, ignoring the entanglement between camera self-motion and object-centric motion, which leads to an incomplete understanding of overall scene dynamics. Inspired by the flow parsing mechanism of the human visual system, the method separates background change from object-centric residual motion through scene geometry reconstruction so as to facilitate the inference of future frames in a video sequence. Compared with traditional video prediction methods, the method perceives motion in the video better and thereby improves the accuracy and stability of prediction.
2) The present invention emphasizes the importance of disentangling camera self-motion and object-centric motion for future prediction. The optical flow is parsed into a rigid optical flow related to camera motion and a residual optical flow related to object-centric motion. In addition, content information is extracted from the historical frames in parallel by a fully convolutional neural network, and a better prediction result is achieved through a joint understanding of content and motion features.
3) The invention achieves a deep understanding of video motion by introducing a flow parsing mechanism, thereby improving the accuracy and stability of the model, so it has significant application value and broad prospects in the field of video prediction. In practical use, the predicted sequence is obtained simply by feeding the video sequence through the generation network in a single forward pass, and the method performs better than traditional video prediction methods.
Drawings
FIG. 1 is a flow chart of a video prediction method of the present invention;
FIG. 2 is a diagram of an embodiment of the present invention;
fig. 3 is a schematic diagram of a video prediction network structure according to the present invention.
Detailed Description
As shown in fig. 1-3, a video prediction method based on a deep stream parsing network includes the following steps:
s1, obtaining a training sample
Video sequence data sets are obtained from a database; the data sets include the KITTI data set, for video prediction in autonomous driving, and the Caltech Pedestrian data set. When training the network, one data set is used as the sole data set: a certain number of video frame sequences are extracted as input and the subsequent video frames are taken as the corresponding reference results; the same operation is then carried out with the other data set as the sole data set (a minimal sampling sketch under stated assumptions is given below);
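The following Python sketch illustrates one way S1 could assemble (input, reference) sample pairs from an ordered frame directory. The directory layout, the file extension, and the values of t and k are illustrative assumptions, not taken from the disclosure.

```python
# Hypothetical sketch of S1: building (observed, ground-truth) sample pairs from a
# frame-ordered video sequence. Paths, t, k and stride are illustrative only.
from pathlib import Path
from typing import List, Tuple

def build_samples(frame_dir: str, t: int = 10, k: int = 5,
                  stride: int = 1) -> List[Tuple[List[Path], List[Path]]]:
    """Slide a window over the ordered frames of one video sequence.

    Each sample is (t observed frames, k subsequent reference frames)."""
    frames = sorted(Path(frame_dir).glob("*.png"))  # e.g. extracted KITTI frames
    samples = []
    for start in range(0, len(frames) - (t + k) + 1, stride):
        inputs = frames[start:start + t]             # observed frames x1..xt
        targets = frames[start + t:start + t + k]    # reference frames x(t+1)..x(t+k)
        samples.append((inputs, targets))
    return samples

# Usage: one data set is used at a time, as the method prescribes, e.g.
# kitti_samples = build_samples("data/kitti/sequence_00", t=10, k=5)
```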
s2, preprocessing operation of video data
The step S2 specifically comprises the following steps:
S21, scaling: scale the video frames to θ times their original size, where θ ranges from 1.0 to 1.5 in this embodiment;
S22, cropping: randomly crop the original training samples into 320 × 320 pixel video sequences;
S23, HSL adjustment: multiply the hue (Hue), saturation (Saturation), and lightness (Lightness) of the cropped samples by a random value δ ∈ [1.0, 1.2] to simulate the illumination variation of natural environments;
S24, dividing the video sequence data set into a training set and a test set (a minimal preprocessing sketch under stated assumptions follows this list);
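A minimal sketch of S21 to S23, assuming OpenCV and NumPy are available and frames are RGB uint8 arrays. Drawing one random θ, δ and crop offset per clip, and clipping to OpenCV's 8-bit HLS ranges, are implementation assumptions rather than details from the disclosure.

```python
# Sketch of the S2 preprocessing: scaling by theta, random 320x320 crop, HSL jitter.
import random
from typing import List

import cv2
import numpy as np

def preprocess_clip(frames: List[np.ndarray]) -> List[np.ndarray]:
    theta = random.uniform(1.0, 1.5)           # S21: scale factor, 1.0 <= theta <= 1.5
    delta = random.uniform(1.0, 1.2)           # S23: HSL gain, 1.0 <= delta <= 1.2
    h, w = frames[0].shape[:2]
    sh, sw = int(h * theta), int(w * theta)
    y = random.randint(0, sh - 320)            # S22: one crop offset shared by the clip
    x = random.randint(0, sw - 320)
    out = []
    for f in frames:
        f = cv2.resize(f, (sw, sh))                        # S21: scaling
        f = f[y:y + 320, x:x + 320]                        # S22: 320x320 crop
        hls = cv2.cvtColor(f, cv2.COLOR_RGB2HLS).astype(np.float32)
        hls *= delta                                       # S23: jitter H, L and S
        hls[..., 0] = np.clip(hls[..., 0], 0, 179)         # OpenCV 8-bit hue range
        hls[..., 1:] = np.clip(hls[..., 1:], 0, 255)
        out.append(cv2.cvtColor(hls.astype(np.uint8), cv2.COLOR_HLS2RGB))
    return out
```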
s3, constructing a depth and pose prediction network;
Based on a convolutional neural network architecture, the original fully connected layer and all subsequent layers are removed and only the convolution and pooling layers are retained, so as to construct the depth and pose prediction network;
S4, constructing a geometric rigid flow projection unit and connecting it after the convolution-and-pooling backbone retained in S3; a sketch of the rigid-flow computation under stated assumptions is given below;
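The rigid flow can be sketched with the standard reprojection relation p′ ∼ K(R·D(p)·K⁻¹p + t), where D is the predicted depth, K the camera intrinsics, and (R, t) the relative camera pose from the pose branch of S3. The tensor shapes and this particular formulation are generic assumptions, not necessarily the patent's exact implementation.

```python
# Illustrative geometric rigid-flow projection for S4: back-project pixels with the
# predicted depth, apply the estimated ego-motion, re-project, and take the offset.
import torch

def rigid_flow(depth: torch.Tensor,      # (B, 1, H, W) predicted depth
               K: torch.Tensor,          # (B, 3, 3) camera intrinsics
               R: torch.Tensor,          # (B, 3, 3) relative rotation
               t: torch.Tensor) -> torch.Tensor:  # (B, 3, 1) relative translation
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1).expand(B, -1, -1)
    cam = torch.inverse(K) @ pix * depth.reshape(B, 1, -1)   # back-project to 3D
    cam2 = R @ cam + t                                       # apply camera ego-motion
    proj = K @ cam2                                          # re-project to the image
    proj = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)        # perspective divide
    flow = proj - pix[:, :2]                                 # rigid flow = p' - p
    return flow.reshape(B, 2, H, W)
```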
S5, constructing a residual flow network based on a convolutional neural network to output the residual flow, and adding the residual flow to the rigid flow to obtain the overall optical flow;
S6, constructing an LSTM module that takes the overall optical flow as input and memorizes temporal information;
S7, constructing a decoder module and connecting it to the LSTM constructed in S6 to obtain a video prediction network model M (a skeleton of model M under stated assumptions is sketched below);
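The skeleton below shows how the modules of S3 to S7 could be wired together. Only the overall wiring (rigid flow plus residual flow, fed through an LSTM and a decoder) follows the text; the layer sizes, module names, and the use of a convolutional LSTM (so that the temporal memory keeps spatial structure) are illustrative assumptions.

```python
# High-level, assumed skeleton of the prediction model M assembled in S3-S7.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 4 * ch, kernel_size=3, padding=1)

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class FlowParsingPredictor(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        # S5: residual-flow CNN driven by two consecutive RGB frames (6 channels)
        self.residual_net = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2, 3, padding=1))
        self.flow_enc = nn.Conv2d(2, ch, 3, padding=1)    # embed the overall flow
        self.lstm = ConvLSTMCell(ch)                      # S6: temporal memory
        self.decoder = nn.Sequential(                     # S7: frame decoder
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, frames, rigid_flows):
        # frames: (B, T, 3, H, W); rigid_flows: (B, T-1, 2, H, W) from S3/S4
        B, T, _, H, W = frames.shape
        h = frames.new_zeros(B, self.lstm.gates.in_channels // 2, H, W)
        c = torch.zeros_like(h)
        pred = None
        for i in range(T - 1):
            pair = torch.cat([frames[:, i], frames[:, i + 1]], dim=1)
            residual = self.residual_net(pair)            # object-centric motion
            overall = rigid_flows[:, i] + residual        # S5: overall optical flow
            h, c = self.lstm(self.flow_enc(overall), h, c)
            pred = self.decoder(h)                        # last output approximates x_{t+1}
        return pred
```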
s8, training a video prediction model M;
the step S8 specifically comprises the following steps:
A sequence of t consecutive video images X = {x₁, x₂, …, xₜ} is extracted from the input video sequence of S1, where xᵢ denotes the i-th frame image; the sequence X is fed frame by frame into the video prediction network M constructed in S7 to extract features and predict the next video frame image x̂ₜ₊₁, i.e. the image frame at time t+1.
S9, calculating training loss, and updating network parameters by using a back propagation algorithm
The step S9 specifically comprises the following steps:
video frames to be predictedThe video prediction network inputted to S7 gets predicted +.>I.e. the image frame at time t +2, and so on until a k-frame video sequence to be predicted is obtained>The true video sequence s= { x 1 ,x 2 ,…,x t ,x t+1 ,x t+2 ,…,x t+k And (3)Predicted video frame sequence->In contrast, the loss is calculated, the network model M is trained by using a back propagation algorithm, and loss functions used in training are respectively as follows:
s10, video frame prediction is carried out on the input video sequence by utilizing the trained network.
The above embodiments only illustrate preferred embodiments of the present invention and are not intended to limit its scope; without departing from the design spirit of the present invention, various modifications and improvements made by those skilled in the art to the technical solution of the present invention shall fall within the protection scope defined by the claims of the present invention.

Claims (1)

1. A video prediction method based on a deep stream analysis network, characterized by comprising the following steps:
s1, acquiring a training sample;
s2, preprocessing video data;
s3, constructing a depth and pose prediction network;
based on a convolutional neural network architecture, removing the original fully connected layer and all subsequent layers and retaining only the convolution and pooling layers to construct the depth and pose prediction network;
S4, constructing a geometric rigid flow projection unit and connecting it after the convolution-and-pooling backbone retained in S3;
S5, constructing a residual flow network based on a convolutional neural network to output the residual flow, and adding the residual flow to the rigid flow to obtain the overall optical flow;
S6, constructing an LSTM module that takes the overall optical flow as input and memorizes temporal information;
S7, constructing a decoder module and connecting it to the LSTM constructed in S6 to obtain a video prediction network model M;
s8, training a video prediction model M;
s9, calculating training loss, and updating network parameters by using a back propagation algorithm;
s10, video frame prediction is carried out on the input video sequence by utilizing the trained network.
CN202311659020.5A 2023-12-05 2023-12-05 Video prediction method based on deep stream analysis network Pending CN117635458A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311659020.5A CN117635458A (en) 2023-12-05 2023-12-05 Video prediction method based on deep stream analysis network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311659020.5A CN117635458A (en) 2023-12-05 2023-12-05 Video prediction method based on deep stream analysis network

Publications (1)

Publication Number Publication Date
CN117635458A true CN117635458A (en) 2024-03-01

Family

ID=90030215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311659020.5A Pending CN117635458A (en) 2023-12-05 2023-12-05 Video prediction method based on deep stream analysis network

Country Status (1)

Country Link
CN (1) CN117635458A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108184128A (en) * 2018-01-11 2018-06-19 安徽优思天成智能科技有限公司 Video sequence lost frames prediction restoration methods based on deep neural network
US10814815B1 (en) * 2019-06-11 2020-10-27 Tangerine Innovation Holding Inc. System for determining occurrence of an automobile accident and characterizing the accident
CN113156959A (en) * 2021-04-27 2021-07-23 东莞理工学院 Self-supervision learning and navigation method of autonomous mobile robot in complex scene

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SEOKJU LEE et al.: "Learning Residual Flow as Dynamic Motion from Stereo Videos", arXiv, 16 September 2019 (2019-09-16), pages 1-7 *
JIN Beibei et al.: "Spatio-temporal Wavelet Analysis Video Prediction Algorithm Based on Differential Attention" (基于差分注意力的时空小波分析视频预测算法), Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), 28 February 2022 (2022-02-28), pages 180-183 *

Similar Documents

Publication Publication Date Title
Wang et al. Predrnn: A recurrent neural network for spatiotemporal predictive learning
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN109271933B (en) Method for estimating three-dimensional human body posture based on video stream
WO2021093468A1 (en) Video classification method and apparatus, model training method and apparatus, device and storage medium
Wang Research on sports training action recognition based on deep learning
CN114049381A (en) Twin cross target tracking method fusing multilayer semantic information
CN114550223B (en) Person interaction detection method and device and electronic equipment
Jung et al. Goal-directed behavior under variational predictive coding: Dynamic organization of visual attention and working memory
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN114612414B (en) Image processing method, model training method, device, equipment and storage medium
CN115661246A (en) Attitude estimation method based on self-supervision learning
CN113011320B (en) Video processing method, device, electronic equipment and storage medium
CN113255514B (en) Behavior identification method based on local scene perception graph convolutional network
CN113554653A (en) Semantic segmentation method for long-tail distribution of point cloud data based on mutual information calibration
Xu et al. AutoSegNet: An automated neural network for image segmentation
Du et al. Adaptive visual interaction based multi-target future state prediction for autonomous driving vehicles
CN114842542B (en) Facial action unit identification method and device based on self-adaptive attention and space-time correlation
CN116205962A (en) Monocular depth estimation method and system based on complete context information
CN113971826B (en) Dynamic emotion recognition method and system for estimating continuous titer and arousal level
CN117635458A (en) Video prediction method based on deep stream analysis network
CN117022305A (en) Human factor intelligent driving behavior prediction method, system, terminal equipment and storage medium
CN117454119A (en) Urban rail passenger flow prediction method based on dynamic multi-graph and multidimensional attention space-time neural network
CN116402874A (en) Spacecraft depth complementing method based on time sequence optical image and laser radar data
Lee et al. Boundary-aware camouflaged object detection via deformable point sampling
CN116452472A (en) Low-illumination image enhancement method based on semantic knowledge guidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination