See the source code here: https://github.com/fontanads/ytbot.
This is a personal project experimenting with the YouTube API.
It was made for practice and for fun, so don't expect much from it; it's a WIP without a deadline.
Disclaimer: some of the code was written with the help of our coding buddies ChatGPT and GitHub Copilot in VS Code; experimenting with AI assistance in writing code was also part of the project. However, almost every snippet that came out of the generative AI has been modified to some extent. In general, the workflow was iterating back and forth until the desired functionality was achieved, so I'd call it more of an "AI-in-the-loop" coding style.
Current functionality includes:
- API requests to collect trending video statistics and metadata
- uploads to BigQuery tables
- a Streamlit app with a simple dashboard displaying the collected data.
Contents:

- Installing the project
- YouTube API Credentials
- Getting data for trending videos
- BigQuery API
- Streamlit Dashboard
## Installing the project

The dependencies are listed in the `pyproject.toml` file. I suggest using `poetry` to install them. Once you have poetry installed and configured, run `poetry install` in the folder where you cloned the repo. Use `poetry shell` to spin up the Python virtual environment manually and run any of the project's scripts.
## YouTube API Credentials

To instantiate the API class, you need to deal with credentials. This version requires manual authentication in the browser.

The first step is authorizing the credentials in your Google Cloud Platform (GCP) project. Follow this link for detailed instructions. Once you've been through the steps above, you'll be able to see the API listed in your GCP project: https://console.cloud.google.com/apis/dashboard?project=YOUR_PROJECT_ID.

The code reads a variable `YOUTUBE_CLIENT_SECRET_FILE` from a `.env` file (not in the repo, you need to create it yourself). This variable contains just the path to the JSON file with the API secret. The JSON file follows this format:
```json
{
  "web": {
    "client_id": "YOUR_CLIENT_ID.apps.googleusercontent.com",
    "project_id": "GCP_PROJECT_ID",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_secret": "YOUR_CLIENT_SECRET"
  }
}
```
Feel free to change this implementation by modifying the `__init__` method of the `YouTube` class in `src/core/youtube.py`.
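For reference, here is a minimal sketch of what the credential handling could look like, assuming `python-dotenv` is used to read the `.env` file and `google-auth-oauthlib` handles the browser flow; the actual `__init__` in `src/core/youtube.py` may differ.

```python
import os

from dotenv import load_dotenv
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Assumed .env content: YOUTUBE_CLIENT_SECRET_FILE=path/to/client_secret.json
load_dotenv()
secret_file = os.environ["YOUTUBE_CLIENT_SECRET_FILE"]

# Read-only scope is enough for fetching public video statistics
SCOPES = ["https://www.googleapis.com/auth/youtube.readonly"]

# Opens a browser window for the manual authentication step mentioned above
flow = InstalledAppFlow.from_client_secrets_file(secret_file, SCOPES)
credentials = flow.run_local_server(port=0)

# Build the YouTube Data API v3 client
youtube = build("youtube", "v3", credentials=credentials)
```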
## Getting data for trending videos

The script is found in `src/analytics/trending_videos_by_country.py`.

It requests data on trending videos (in descending order of view count) per region, with no specific topic query. It iterates over a list of regions (country codes) and handles pagination to retrieve the maximum number of video ids. A second request then fetches statistics and metadata in batch for the videos listed by the first search.

Be mindful of the quota limits of the YouTube API.
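As a rough illustration of that two-step flow, a sketch along these lines could work; the function names and limits below are illustrative, not the script's actual values.

```python
def get_trending_video_ids(youtube, region_code: str, max_ids: int = 200) -> list[str]:
    """First request: search video ids per region, ordered by view count, following page tokens."""
    video_ids, page_token = [], None
    while len(video_ids) < max_ids:
        response = youtube.search().list(
            part="id",
            type="video",
            order="viewCount",
            regionCode=region_code,
            maxResults=50,  # API maximum per page
            pageToken=page_token,
        ).execute()
        video_ids += [item["id"]["videoId"] for item in response.get("items", [])]
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return video_ids[:max_ids]


def get_video_stats(youtube, video_ids: list[str]) -> list[dict]:
    """Second request: batch statistics and metadata for the ids found by the first search."""
    items = []
    for i in range(0, len(video_ids), 50):  # videos.list accepts up to 50 ids per call
        response = youtube.videos().list(
            part="snippet,statistics",
            id=",".join(video_ids[i:i + 50]),
        ).execute()
        items += response.get("items", [])
    return items
```

Here `youtube` is the API client built as in the credentials sketch above; each `search().list` call is fairly expensive in quota terms, which is why the quota warning matters.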
## BigQuery API

The code for the use of the API is found in `src/cloud/bigquery.py`.

Credentials are not passed explicitly here; if you want to change that, modify the `__init__` method of the `BigQueryClient` class. To use it from your local machine without writing a new implementation to handle the credentials, configure your GCP client following the instructions here.

The method `upload_dataframe` is the one with the most implementation details. It has input flags to delete the destination table, in case it exists, and to overwrite date partitions.
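For illustration, a minimal sketch of such an upload with `google-cloud-bigquery` and implicit application-default credentials could look like this; the partition field, flag name, and function name are hypothetical, not the actual `upload_dataframe` signature.

```python
import pandas as pd
from google.cloud import bigquery


def upload_dataframe_sketch(df: pd.DataFrame, table_id: str, delete_table: bool = False) -> None:
    """Load a DataFrame into a date-partitioned BigQuery table."""
    client = bigquery.Client()  # credentials are picked up implicitly (application default)

    if delete_table:
        # Drop the destination table first, if it exists
        client.delete_table(table_id, not_found_ok=True)

    job_config = bigquery.LoadJobConfig(
        time_partitioning=bigquery.TimePartitioning(field="date"),  # hypothetical partition column
        # WRITE_TRUNCATE replaces the destination; point table_id at a partition
        # decorator (e.g. "dataset.table$20240101") to overwrite a single date partition.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.load_table_from_dataframe(df, table_id, job_config=job_config).result()
```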
## Streamlit Dashboard

The source code is in `src/dashboard/app.py`.

To run the application, run `streamlit run src/dashboard/app.py` in your terminal.

The BigQuery client has a `download_table` method that is used in the dashboard to collect the data. The helper functions in the app code use the Streamlit decorator `@st.cache_data` to cache the downloaded DataFrames while the app is running. Each region table is downloaded separately by running a filtered query. Switching between regions in the drop-down menu does not reset the cache.
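As a rough sketch of how such a cached per-region download might look (the helper name, region codes, and the `download_table` signature below are assumptions for illustration):

```python
import pandas as pd
import streamlit as st

from src.cloud.bigquery import BigQueryClient  # the project's BigQuery wrapper


@st.cache_data  # results are cached per region while the app is running
def load_region_table(region_code: str) -> pd.DataFrame:
    """Download one region's table with a filtered query; repeated calls hit the cache."""
    client = BigQueryClient()
    return client.download_table(region_code)  # hypothetical call signature


region = st.selectbox("Region", ["US", "BR", "GB"])  # switching regions does not reset the cache
st.dataframe(load_region_table(region))
```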