# AI Computer Vision Software for Undergrad Thesis

This repository contains code for my Undergraduate Senior Honours Thesis in the Goldreich Lab. I'm developing a computer vision program that automatically detects how long participants hold an object in videos.

## Proof of Concept Video

*The model displays the probability that the participant is touching the grey object (it was not trained on this clip).*

## Proof of Concept Multi-Trial Labelling

*Predicted contact duration using a simple threshold decision rule. Each shaded region represents the duration of a different trial (the model was not trained on this portion of the video).*

## Repository Table of Contents

### Streamlit Preprocessing Frontend

- Full frontend for preparing raw videos for training or inference
- Includes detailed instructions on how to use it
- Can create a cropped video centered on the hands (for inference)
- Can extract frames from a video so they can be labelled (for training)
- TODO: done

*Frontend preview (some lab equipment blurred)*
### Model Training and Visualization

- Trains the image classification neural network
- Creates visuals for the paper and presentation of results
- TODO: always room for more helpful visualizations
### Inference

- Functions to make predictions on a video using a pre-trained model
- Can save the video with the probability of contact overlaid on each frame (see the sketch below)
- Interactive Plotly figure to predict contact time and examine errors
- TODO: done
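A minimal sketch of what the per-frame overlay step could look like, assuming a Keras classifier that outputs a single contact probability; the function name `overlay_contact_probability`, the 224×224 input size, and the [0, 1] scaling are all assumptions, not the repository's actual API:

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

def overlay_contact_probability(video_path: str, model_path: str, out_path: str) -> None:
    """Run per-frame contact inference and write an annotated copy of the video."""
    model = load_model(model_path)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Input size (224x224), [0, 1] scaling, and a (1, 1) output shape
        # are assumptions about the trained model.
        x = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
        p = float(model.predict(x[np.newaxis], verbose=0)[0][0])
        cv2.putText(frame, f"p(contact) = {p:.2f}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        writer.write(frame)
    cap.release()
    writer.release()
```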
### Frame Extraction and Labelling

- Easy function to extract frames from a video
- Given the transition frames (i.e. the first frame of object contact or non-contact), automatically labels all remaining frames in the video
- Saves a huge amount of labelling time: the labeller carefully labels about 10 transition frames, and the script sorts out the remaining 4000+ with perfect accuracy (guaranteed because it is rule-based, not machine learning-based); see the sketch below
- TODO:
  - Add support for edge cases: first or last frame is a transition frame
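A minimal sketch of the rule-based labelling idea described above: given the indices where the contact state flips, every other frame's label follows by alternation. The function name and label strings are hypothetical:

```python
def label_frames(n_frames: int, transitions: list[int],
                 first_label: str = "no_contact") -> list[str]:
    """Label every frame given the indices where the contact state flips.

    Each entry of `transitions` is the first frame of a new state, so the
    label alternates at each transition index. Frame indices are 0-based.
    """
    flip = {"no_contact": "contact", "contact": "no_contact"}
    labels = []
    current = first_label
    remaining = sorted(transitions)
    for i in range(n_frames):
        if remaining and i == remaining[0]:
            current = flip[current]
            remaining.pop(0)
        labels.append(current)
    return labels

# Example: contact begins at frame 120 and ends at frame 480.
labels = label_frames(n_frames=600, transitions=[120, 480])
assert labels[119] == "no_contact" and labels[120] == "contact"
assert labels[480] == "no_contact"
```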

### Auto-Zoom Jupyter Notebook (superseded by Streamlit preprocess)

- Series of functions to automatically zoom videos to the desired size, with the participant's hands in the center (see the sketch below)
- Uses MediaPipe for hand detection
- If both the participant's and the experimenter's hands are in the frame, focuses on the 2 most likely hands (which will be the participant's, because the model is trained on ungloved hands)
- If the desired crop region falls outside the frame, white pixels are added to retain the desired output video dimensions
- TODO: done
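A minimal sketch of the auto-zoom idea using MediaPipe's hand detector, assuming a square output and the average of all detected hand landmarks as the crop center; `crop_on_hands` and the 512-pixel default are illustrative choices, not the notebook's actual code:

```python
import cv2
import numpy as np
import mediapipe as mp

def crop_on_hands(frame_bgr: np.ndarray, out_size: int = 512) -> np.ndarray:
    """Crop a square region centered on the detected hands, padding with white."""
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
        results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return frame_bgr  # no hands found; leave the frame unchanged
    h, w = frame_bgr.shape[:2]
    # Average all landmark coordinates to get the hands' center.
    xs = [lm.x * w for hand in results.multi_hand_landmarks for lm in hand.landmark]
    ys = [lm.y * h for hand in results.multi_hand_landmarks for lm in hand.landmark]
    cx, cy = int(np.mean(xs)), int(np.mean(ys))
    half = out_size // 2
    # Start from a white canvas so any out-of-frame region stays white.
    out = np.full((out_size, out_size, 3), 255, dtype=np.uint8)
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    x1, y1 = min(cx + half, w), min(cy + half, h)
    ox, oy = x0 - (cx - half), y0 - (cy - half)
    out[oy:oy + (y1 - y0), ox:ox + (x1 - x0)] = frame_bgr[y0:y1, x0:x1]
    return out
```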

### Auto-Split Jupyter Notebook (deprecated; multi-trial videos now in use)

- Series of functions to split long videos containing many trials into many videos, each containing only one trial
- Determines where to split based on the amount of blue in a given frame: a trial ends when the experimenter, who wears a blue glove, replaces the object for the participant to classify (see the sketch below)
- TODO: done
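A minimal sketch of the blue-detection idea, assuming an HSV hue band for "glove blue"; the exact bounds and the 5% threshold in the comment are illustrative guesses:

```python
import cv2
import numpy as np

def blue_fraction(frame_bgr: np.ndarray) -> float:
    """Fraction of pixels falling in a blue hue band (glove detection)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Hue/saturation/value bounds for "glove blue" are illustrative guesses.
    mask = cv2.inRange(hsv, np.array([100, 80, 50]), np.array([130, 255, 255]))
    return float(np.count_nonzero(mask)) / mask.size

# A trial boundary could then be flagged whenever the blue fraction spikes,
# e.g. blue_fraction(frame) > 0.05 while the gloved hand is in view.
```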

### Preprocess Pipeline (deprecated)

- .py file to automatically preprocess videos, including auto-zoom and auto-crop (as developed in the Jupyter notebooks)
- TODO: done

## TODO Before Deployment

1. Finalize the time series analysis for changepoint detection.
   - Current testing indicates that the PELT algorithm works well (see the sketch after this list).
     - PELT pros:
       - Very performant in testing (with the correct hyperparameter choice)
       - Linear time complexity to find the global optimum of the cost function
     - PELT cons:
       - Has a penalization hyperparameter which needs to be chosen
       - Assumes the number of changepoints is unknown; in situations where we do know it, the model is sub-optimal (although it is faster than methods which take a given number of changepoints)
   - Could also try a two-part Bayesian model: detect that contact begins when p(touching) jumps sharply upwards, then say the duration is determined by a prior following a geometric distribution and a likelihood based on the size of the downward probability jump. The geometric distribution prior for each participant could be determined by a hierarchical Bayesian model.
   - Formalizing the problem as a Hidden Markov Model (with the Baum–Welch algorithm for segmentation) is a possibility, but I'd need labelled data that the model wasn't trained on to estimate transition probabilities, which is probably not a great use of resources if I can avoid it.
2. Figure out how to deal with pre-zoomed videos in the wrong aspect ratio.
   - Probably slice off the top. If the aspect ratio is more extreme than some threshold, do a top slice plus some bottom slice and/or white pixels around the edges.
3. Evaluate the model. *Selective review of offline change point detection methods* provides metrics for evaluating time series changepoint detection (which are sufficient metrics for final reliability regardless of all intermediate steps).
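A minimal sketch of PELT segmentation on a toy per-frame probability signal, using the `ruptures` library; the `rbf` cost model, `min_size`, and penalty value are illustrative, untuned choices:

```python
import numpy as np
import ruptures as rpt

# Toy stand-in for the classifier's per-frame p(contact): low, high, low.
rng = np.random.default_rng(0)
signal = np.concatenate([
    rng.normal(0.05, 0.02, 200),   # before contact
    rng.normal(0.95, 0.02, 300),   # during contact
    rng.normal(0.05, 0.02, 200),   # after contact
])

# PELT finds the changepoints minimizing a penalized cost in linear time.
algo = rpt.Pelt(model="rbf", min_size=10).fit(signal)
breakpoints = algo.predict(pen=5)  # `pen` is the hyperparameter to tune
print(breakpoints)  # e.g. [200, 500, 700]; the last index is the signal length
```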

## Possible Future Directions

- Totally new direction: try action segmentation using a paradigm like the ASFormer (*ASFormer: Transformer for Action Segmentation*)
- MAYBE: Add a model which detects a clearly visible object with no contact. Then a similar time series analysis could be run on the current type of model (which detects the probability of contact in a given frame), followed by ensemble learning. Ideally each model would have good reliability on its own, and together they would be very reliable.
  - Could do this with another image classification model, or maybe with image segmentation
- MAYBE: Develop some sort of custom loss function for video duration prediction, to enable a very custom time series segmentation model, probably with a neural network. No real details worked out at this point.
  - Could run into the same dual training set problem as in the Hidden Markov Model
