This repo uses Python to extract a YouTube channel's video URLs and bulk transcript data from those videos.
Download or clone this repo to get started:
git clone https://github.com/tpcav/mfm-transcripts.git
Or see the main.py file, which has the code for the YouTube Transcript API for Python.
- JSON Decoder
import json
- YouTube Transcript API
pip install youtube-transcript-api
- Once downloaded, start the demo by running:
main.py
- Go to the YouTube channel's Videos page
- Right-click the page and choose Inspect Element
- Go to the Console tab
- Copy this first and run it in the console
var scroll = setInterval(function(){ window.scrollBy(0, 1000)}, 1000);
This code keeps scrolling down the channel's video page. Let it run until it reaches the bottom, at the channel's first video.
- Copy this second
window.clearInterval(scroll); console.clear(); urls = $$('a'); urls.forEach(function(v,i,a){if (v.id=="video-title-link"){console.log('\t'+v.title+'\t'+v.href+'\t')}});
This code stops the scrolling, clears the console, and logs the title and URL of every video on the channel, one video per line with tab separators.
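Once the console output has been copied into a text file or string, it can be turned into title/URL pairs in Python. The following is a minimal sketch, not the code in main.py. The function name `parse_console_output` and the sample title are hypothetical, and it assumes each logged line keeps the tab-separated title-then-URL layout produced by the snippet above.

```python
def parse_console_output(text):
    """Return (title, url) pairs from tab-separated console lines."""
    videos = []
    for line in text.splitlines():
        # Each logged line looks like: TAB title TAB url TAB
        fields = [part.strip() for part in line.split("\t")]
        for i, field in enumerate(fields):
            # The title is assumed to be the field just before the URL.
            if field.startswith("http") and i > 0 and fields[i - 1]:
                videos.append((fields[i - 1], field))
                break
    return videos


# Hypothetical sample line; real input is whatever the console printed.
sample = "\tExample title\thttps://www.youtube.com/watch?v=2dOCPr355TQ\t\n"
for title, url in parse_console_output(sample):
    print(title, url)
```

Lines without a URL, such as stray console messages, are skipped rather than raising an error, which seems safer for text copied by hand.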
- This is a full URL
https://www.youtube.com/watch?v=2dOCPr355TQ
- We just want the last part
2dOCPr355TQ
This is the video ID.
- The video IDs can be used by the YouTube Transcript API to get the transcript data
See main.py
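Pulling the video ID out of a full URL can be done with the standard library alone. This is a small sketch under the assumption that URLs have the `watch?v=...` form shown above. The helper name `extract_video_id` is hypothetical and is not taken from main.py.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the value of the 'v' query parameter, or None if it is missing."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


print(extract_video_id("https://www.youtube.com/watch?v=2dOCPr355TQ"))  # 2dOCPr355TQ
```

Reading the `v` parameter instead of slicing the end of the string keeps the ID intact when a URL carries extra parameters after it. The resulting ID is the value that gets handed to the YouTube Transcript API.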
TODO: add video file