# AI Computer Vision Software for Undergrad Thesis

This repository contains code for my Undergraduate Senior Honours Thesis in the Goldreich Lab. I'm developing a computer vision program that automatically detects how long participants hold an object in videos.

## Proof of Concept Video

*The model displays the probability that the participant is touching the grey object (it was not trained on this clip).*

## Proof of Concept Multi-Trial Labelling

*Predicted contact duration using a simple threshold decision rule. Each shaded region represents the duration of a different trial (the model was not trained on this portion of the video).*

## Repository Table of Contents

### Streamlit Preprocessing Frontend

- Full frontend for preparing raw videos for training or inference
- Includes detailed instructions on how to use it
- Can create a cropped video centered on the hands (for inference)
- Can extract frames from a video so they can be labelled (for training)
- TODO: done

*Frontend preview (some lab equipment blurred)*
### Model Training and Visualization

- Trains the image classification neural network
- Creates visuals for the paper and presentation of results
- TODO: always room for more helpful visualizations
### Inference

- Functions to make predictions on a video using a pre-trained model
- Can save the video with the probability of contact overlaid on each frame (see the sketch below)
- Interactive Plotly figure to predict contact time and examine errors
- TODO: done
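A minimal sketch of what the per-frame overlay step could look like, assuming a Keras classifier that outputs a single contact probability; the function name `overlay_contact_probability`, the 224×224 input size, and the [0, 1] scaling are all assumptions, not the repository's actual API:

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

def overlay_contact_probability(video_path: str, model_path: str, out_path: str) -> None:
    """Run per-frame contact inference and write an annotated copy of the video."""
    model = load_model(model_path)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Input size (224x224), [0, 1] scaling, and a (1, 1) output shape
        # are assumptions about the trained model.
        x = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
        p = float(model.predict(x[np.newaxis], verbose=0)[0][0])
        cv2.putText(frame, f"p(contact) = {p:.2f}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        writer.write(frame)
    cap.release()
    writer.release()
```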
### Frame Extraction and Labelling

- Easy function to extract frames from a video
- Given the transition frames (i.e. the first frame of object contact or non-contact), automatically labels all remaining frames in the video
- Saves a huge amount of labelling time: the labeller carefully labels about 10 transition frames, and the script sorts out the remaining 4000+ with perfect accuracy (guaranteed because it is rule-based, not machine learning-based); see the sketch below
- TODO:
  - Add support for edge cases: first or last frame is a transition frame
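A minimal sketch of the rule-based labelling idea described above: given the indices where the contact state flips, every other frame's label follows by alternation. The function name and label strings are hypothetical:

```python
def label_frames(n_frames: int, transitions: list[int],
                 first_label: str = "no_contact") -> list[str]:
    """Label every frame given the indices where the contact state flips.

    Each entry of `transitions` is the first frame of a new state, so the
    label alternates at each transition index. Frame indices are 0-based.
    """
    flip = {"no_contact": "contact", "contact": "no_contact"}
    labels = []
    current = first_label
    remaining = sorted(transitions)
    for i in range(n_frames):
        if remaining and i == remaining[0]:
            current = flip[current]
            remaining.pop(0)
        labels.append(current)
    return labels

# Example: contact begins at frame 120 and ends at frame 480.
labels = label_frames(n_frames=600, transitions=[120, 480])
assert labels[119] == "no_contact" and labels[120] == "contact"
assert labels[480] == "no_contact"
```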

### Auto-Zoom Jupyter Notebook (superseded by Streamlit preprocess)

- Series of functions to automatically zoom videos to the desired size, with the participant's hands in the center (see the sketch below)
- Uses MediaPipe for hand detection
- If both the participant's and the experimenter's hands are in the frame, focuses on the 2 most likely hands (which will be the participant's, because the model is trained on ungloved hands)
- If the desired crop region falls outside the frame, white pixels are added to retain the desired output video dimensions
- TODO: done
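A minimal sketch of the auto-zoom idea using MediaPipe's hand detector, assuming a square output and the average of all detected hand landmarks as the crop center; `crop_on_hands` and the 512-pixel default are illustrative choices, not the notebook's actual code:

```python
import cv2
import numpy as np
import mediapipe as mp

def crop_on_hands(frame_bgr: np.ndarray, out_size: int = 512) -> np.ndarray:
    """Crop a square region centered on the detected hands, padding with white."""
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
        results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return frame_bgr  # no hands found; leave the frame unchanged
    h, w = frame_bgr.shape[:2]
    # Average all landmark coordinates to get the hands' center.
    xs = [lm.x * w for hand in results.multi_hand_landmarks for lm in hand.landmark]
    ys = [lm.y * h for hand in results.multi_hand_landmarks for lm in hand.landmark]
    cx, cy = int(np.mean(xs)), int(np.mean(ys))
    half = out_size // 2
    # Start from a white canvas so any out-of-frame region stays white.
    out = np.full((out_size, out_size, 3), 255, dtype=np.uint8)
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    x1, y1 = min(cx + half, w), min(cy + half, h)
    ox, oy = x0 - (cx - half), y0 - (cy - half)
    out[oy:oy + (y1 - y0), ox:ox + (x1 - x0)] = frame_bgr[y0:y1, x0:x1]
    return out
```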

### Auto-Split Jupyter Notebook (deprecated; multi-trial videos now in use)

- Series of functions to split long videos containing many trials into many videos, each containing only one trial
- Determines where to split based on the amount of blue in a given frame: a trial ends when the experimenter, who wears a blue glove, replaces the object for the participant to classify (see the sketch below)
- TODO: done
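A minimal sketch of the blue-detection idea, assuming an HSV hue band for "glove blue"; the exact bounds and the 5% threshold in the comment are illustrative guesses:

```python
import cv2
import numpy as np

def blue_fraction(frame_bgr: np.ndarray) -> float:
    """Fraction of pixels falling in a blue hue band (glove detection)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Hue/saturation/value bounds for "glove blue" are illustrative guesses.
    mask = cv2.inRange(hsv, np.array([100, 80, 50]), np.array([130, 255, 255]))
    return float(np.count_nonzero(mask)) / mask.size

# A trial boundary could then be flagged whenever the blue fraction spikes,
# e.g. blue_fraction(frame) > 0.05 while the gloved hand is in view.
```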

### Preprocess Pipeline (deprecated)

- .py file to automatically preprocess videos, including auto-zoom and auto-crop (as developed in the Jupyter notebooks)
- TODO: done

## TODO Before Deployment

1. Finalize the time series analysis for changepoint detection.
   - Current testing indicates that the PELT algorithm works well (see the sketch after this list).
     - PELT pros:
       - Very performant in testing (with the correct hyperparameter choice)
       - Linear time complexity to find the global optimum of the cost function
     - PELT cons:
       - Has a penalization hyperparameter which needs to be chosen
       - Assumes the number of changepoints is unknown; in situations where we do know it, the model is sub-optimal (although it is faster than methods which take a given number of changepoints)
   - Could also try a two-part Bayesian model: detect that contact begins when p(touching) jumps sharply upwards, then say the duration is determined by a prior following a geometric distribution and a likelihood based on the size of the downward probability jump. The geometric distribution prior for each participant could be determined by a hierarchical Bayesian model.
   - Formalizing the problem as a Hidden Markov Model (with the Baum–Welch algorithm for segmentation) is a possibility, but I'd need labelled data that the model wasn't trained on to estimate transition probabilities, which is probably not a great use of resources if I can avoid it.
2. Figure out how to deal with pre-zoomed videos in the wrong aspect ratio.
   - Probably slice off the top. If the aspect ratio is more extreme than some threshold, do a top slice plus some bottom slice and/or white pixels around the edges.
3. Evaluate the model. *Selective review of offline change point detection methods* provides metrics for evaluating time series changepoint detection (which are sufficient metrics for final reliability regardless of all intermediate steps).
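A minimal sketch of PELT segmentation on a toy per-frame probability signal, using the `ruptures` library; the `rbf` cost model, `min_size`, and penalty value are illustrative, untuned choices:

```python
import numpy as np
import ruptures as rpt

# Toy stand-in for the classifier's per-frame p(contact): low, high, low.
rng = np.random.default_rng(0)
signal = np.concatenate([
    rng.normal(0.05, 0.02, 200),   # before contact
    rng.normal(0.95, 0.02, 300),   # during contact
    rng.normal(0.05, 0.02, 200),   # after contact
])

# PELT finds the changepoints minimizing a penalized cost in linear time.
algo = rpt.Pelt(model="rbf", min_size=10).fit(signal)
breakpoints = algo.predict(pen=5)  # `pen` is the hyperparameter to tune
print(breakpoints)  # e.g. [200, 500, 700]; the last index is the signal length
```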

## Possible Future Directions

- Totally new direction: try action segmentation using a paradigm like the ASFormer (*ASFormer: Transformer for Action Segmentation*)
- MAYBE: Add a model which detects a clearly visible object with no contact. Then a similar time series analysis could be run on the current type of model (which detects the probability of contact in a given frame), followed by ensemble learning. Ideally each model would have good reliability on its own, and together they would be very reliable.
  - Could do this with another image classification model, or maybe with image segmentation
- MAYBE: Develop some sort of custom loss function for video duration prediction, to enable a very custom time series segmentation model, probably with a neural network. No real details worked out at this point.
  - Could run into the same dual training set problem as in the Hidden Markov Model
