Pipeline to generate summaries of YouTube videos, using Whisper for transcription and BART for summarisation.

jwhogg/youtube_to_summary

youtube_to_summary

A pipeline that generates summaries of YouTube videos, using Whisper-Small for transcription and BART-LARGE-XSUM for summarisation. BART has been fine-tuned on the popular CNN/Daily Mail dataset, as that dataset lends itself to summarisation tasks. Initially, we attempted to fine-tune GPT-2 for the summarisation task, but found it performed poorly: as a decoder-only generative transformer, it produces text one token at a time conditioned only on the preceding text and is not pre-trained to condense a source document, whereas BART is an encoder-decoder model whose encoder reads the full input before the decoder writes an abstractive summary. For more info on the choice of summarisation model, see this article. We use the pipeline API of the HuggingFace Transformers library to abstract away some of the PyTorch code.
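The two stages described above can be sketched with the Transformers `pipeline` API. This is a minimal illustration, not the repository's actual code: the function names `build_pipelines` and `summarise_audio` are hypothetical, and the public `facebook/bart-large-xsum` checkpoint is assumed as a stand-in for the fine-tuned BART weights, which are not shown here. Decoding an audio file by path also assumes `ffmpeg` is installed.

```python
def build_pipelines():
    """Load both models (hypothetical helper; checkpoint names are assumptions)."""
    # Imported lazily so the rest of the module loads without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    summarizer = pipeline("summarization", model="facebook/bart-large-xsum")
    return asr, summarizer


def summarise_audio(audio_path, asr, summarizer):
    """Transcribe an audio file, then summarise the transcript."""
    # chunk_length_s lets the ASR pipeline process audio longer than
    # Whisper's 30-second window by splitting it into chunks.
    transcript = asr(audio_path, chunk_length_s=30)["text"].strip()
    # truncation=True cuts the transcript down to BART's maximum input length,
    # so very long videos are summarised from their opening portion only.
    summary = summarizer(transcript, truncation=True)[0]["summary_text"]
    return transcript, summary
```

Typical use would be `asr, summarizer = build_pipelines()` followed by `summarise_audio("audio.mp3", asr, summarizer)`. Passing the two pipelines in as arguments keeps the model loading, which is slow, separate from the per-video work.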

Warning

On some Google Colab instances, the yt-dlp package may not work. This is an issue with the package itself and cannot be resolved at this time. If you encounter it, I recommend trying the pytube package instead.
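One way to handle this is to try yt-dlp first and fall back to pytube when it fails. The sketch below is an assumption about how that could look, not code from this repository: the helper names (`download_with_ytdlp`, `download_with_pytube`, `download_audio`) and the output file naming are hypothetical.

```python
from pathlib import Path


def download_with_ytdlp(url, out_dir):
    """Download the best audio stream with yt-dlp and return the file path."""
    import yt_dlp  # imported lazily so a missing package triggers the fallback

    opts = {
        "format": "bestaudio/best",
        "outtmpl": str(Path(out_dir) / "%(id)s.%(ext)s"),
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)


def download_with_pytube(url, out_dir):
    """Download an audio-only stream with pytube and return the file path."""
    from pytube import YouTube

    stream = YouTube(url).streams.filter(only_audio=True).first()
    return stream.download(output_path=str(out_dir))


def download_audio(url, out_dir=".",
                   downloaders=(download_with_ytdlp, download_with_pytube)):
    """Try each downloader in order and return the first successful result."""
    errors = []
    for downloader in downloaders:
        try:
            return downloader(url, out_dir)
        except Exception as exc:  # includes ImportError and download failures
            errors.append(f"{downloader.__name__}: {exc}")
    raise RuntimeError("all downloaders failed: " + "; ".join(errors))
```

Calling `download_audio(video_url)` returns the path of the downloaded file, which can then be passed to the transcription step.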
