FarzanehSoltanzadeh/Spotify

Analyzing Spotify Data: A Comprehensive Data Analysis

Explore the rich tapestry of music through our in-depth analysis of Spotify data, unveiling hidden insights and trends at the intersection of data science and music appreciation.

PHASE ONE: Data Collection and Preparation

In the first phase, we started from a dataset sourced from Kaggle. As the exploration deepened, it became clear that the data needed enrichment, so we used the Spotify library in Python to extract artist and album information.

Please note that our dataset comprises only the top 200 songs of each day from 2017 to 2020, across 35 countries. All the analyses and insights we present are based on this dataset, so they describe music trends and patterns for that period and those regions only.

The data retrieved with the Spotify library contains the following fields:

Artists Information
├── Name
├── Type
├── Popularity
├── Genres
└── Followers
Albums Information
├── Name
├── Release Date
├── Total Tracks
├── Popularity
└── Artists
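
The field layout above can be sketched as two small record types. This is only an illustration, not the project's actual code: the class names, the helper `artist_from_payload`, and the sample input are hypothetical, and the helper assumes an input dict shaped like a Spotify Web API artist object, where the follower count is nested under `followers.total`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artist:
    """One row of the artist information listed above."""
    name: str
    type: str
    popularity: int
    genres: List[str] = field(default_factory=list)
    followers: int = 0


@dataclass
class Album:
    """One row of the album information listed above."""
    name: str
    release_date: str
    total_tracks: int
    popularity: int
    artists: List[str] = field(default_factory=list)


def artist_from_payload(payload: dict) -> Artist:
    """Build an Artist from a dict shaped like a Web API artist object (assumed shape)."""
    return Artist(
        name=payload["name"],
        type=payload.get("type", "artist"),
        popularity=int(payload.get("popularity", 0)),
        genres=list(payload.get("genres", [])),
        # The follower count is nested one level down, under "total".
        followers=int(payload.get("followers", {}).get("total", 0)),
    )
```

Flattening the nested follower count at load time keeps the later database and analysis steps working with plain columns.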

To install the Spotify library, just use this command in your command prompt:

pip install spotify

PHASE TWO: Data Cleaning and Database Design

In the second phase, we built a database from the data gathered in Phase One. We first drew an entity-relationship diagram to guide the design.

After cleaning and refining the data, we used the SQLAlchemy library in Python to create the database.

To install the SQLAlchemy library, just use this command in your command prompt:

pip install SQLAlchemy
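
As a rough sketch of what a SQLAlchemy schema for this data might look like: the table names, column names, and the one-artist-per-album relationship below are assumptions made for illustration, not the project's actual entity-relationship diagram. An in-memory SQLite engine is used so the example runs without a MySQL server.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Artist(Base):
    __tablename__ = "artists"  # hypothetical table name
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    popularity = Column(Integer)
    followers = Column(Integer)
    albums = relationship("Album", back_populates="artist")


class Album(Base):
    __tablename__ = "albums"  # hypothetical table name
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    release_date = Column(String(10))
    total_tracks = Column(Integer)
    artist_id = Column(Integer, ForeignKey("artists.id"), nullable=False)
    artist = relationship("Artist", back_populates="albums")


def build_demo_session() -> Session:
    """Create the tables in an in-memory SQLite database and insert one made-up row."""
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    session = Session(engine)
    artist = Artist(name="Example Artist", popularity=70, followers=1000)
    artist.albums.append(
        Album(name="Example Album", release_date="2019-01-01", total_tracks=12)
    )
    session.add(artist)
    session.commit()
    return session
```

To target MySQL instead, the engine URL would change to a MySQL one; the model classes would stay the same.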

PHASE THREE: Statistical Analysis

In the third phase of our project, we conducted a statistical analysis of the Spotify dataset that had been curated and refined in the preceding phases.

The aim of this phase was to uncover patterns and correlations within the dataset and to better understand the dynamics of music on the Spotify platform.

The following inquiries were systematically examined:

  • Investigation of the Top 5 Albums of Each Artist Based on Popularity
  • Examination of Popular Music Genres: Patterns and Influences
  • Identifying the Top 5 Popular Artists within Each Subgenre
  • A Comparative Study of the Top 10 Chart-Topping Songs across Distinct Musical Categories
  • Exploration of the Top 10 Most Popular Artists Segregated by Genre
  • Comparative Analysis of Noteworthy Album Releases: Unveiling Distinctive Attributes of 5 Select Albums from the Current Year
  • Scrutiny of the Ten Most Profoundly Expressive Lyricists in the Artistic Domain
  • Profiling the Annual Distribution of Artist Activity through the Aggregation of Total Song Counts
  • Delving into Musical Trends: An Appraisal of the Leading 10 Songs' Popularity, for Both Explicit and Non-Explicit Content
  • The lyrical themes and content in rap music compared to other music genres
  • Exploring the Popularity of Explicit vs. Non-Explicit Songs: A Comparative Analysis
  • Examine the relationship between popularity and followers (Hypothesis Testing)
  • Factors Influencing Song Popularity (Definition of Criteria)
  • Personalized Mood-Based Music Selection
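
Several of the questions above are "top N per group" queries (top 5 albums per artist, top 5 artists per subgenre, and so on). A minimal pandas sketch of that pattern follows; the function name and the column names passed to it are hypothetical, and the project's notebooks may compute this differently.

```python
import pandas as pd


def top_n_per_group(df: pd.DataFrame, group_col: str, score_col: str, n: int = 5) -> pd.DataFrame:
    """Return the n highest-scoring rows within each group."""
    # Sort by group, then by score descending, so head(n) keeps the best rows.
    ranked = df.sort_values([group_col, score_col], ascending=[True, False])
    return ranked.groupby(group_col).head(n).reset_index(drop=True)
```

For example, `top_n_per_group(albums, "artist", "popularity", n=5)` would answer the first question for a DataFrame `albums` that has those two columns.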

Using the Streamlit and Plotly frameworks, we developed a user-friendly web application to present our data insights visually. To install the Streamlit and Plotly libraries, just use these commands in your command prompt:

pip install streamlit
pip install plotly

Using the MySQL Connector library, we connected the application directly to the database for data retrieval. To install the mysql-connector library, just use this command in your command prompt:

pip install mysql-connector-python
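
A hedged sketch of reading from the database with MySQL Connector. The environment variable names, the `artists` table, and its columns are hypothetical; substitute the names from your own schema.

```python
import os


def open_connection():
    """Open a MySQL connection using credentials from (hypothetical) environment variables."""
    import mysql.connector  # imported lazily so the query helper can be used with any connection

    return mysql.connector.connect(
        host=os.environ.get("SPOTIFY_DB_HOST", "localhost"),
        user=os.environ["SPOTIFY_DB_USER"],
        password=os.environ["SPOTIFY_DB_PASSWORD"],
        database=os.environ.get("SPOTIFY_DB_NAME", "spotify"),
    )


def fetch_top_artists(conn, limit=10):
    """Return (name, popularity) rows for the most popular artists."""
    # %s is the parameter placeholder style used by MySQL Connector.
    query = "SELECT name, popularity FROM artists ORDER BY popularity DESC LIMIT %s"
    cursor = conn.cursor()
    try:
        cursor.execute(query, (limit,))
        return cursor.fetchall()
    finally:
        cursor.close()
```

Passing `limit` as a query parameter rather than formatting it into the SQL string avoids injection problems if the value ever comes from user input in the web application.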

statsmodels is a Python library for statistical modeling and hypothesis testing. It supports linear and generalized linear regression, time series analysis, nonparametric methods, and survival analysis, and it provides tools to estimate model parameters, assess goodness of fit, and visualize results. To install the statsmodels library, just use this command in your command prompt:

pip install statsmodels

PHASE FOUR: Predictive Modeling and Machine Learning

In this section, we leveraged a variety of machine learning techniques to gain valuable insights from Spotify data. Our approach included both supervised learning, with regression and classification methods, and unsupervised learning, particularly clustering. We employed specific models such as linear regression and random forest for the following key objectives:

  • Predicting Artist Popularity through Regression Analysis
  • Genre-Based Track Classification Analysis
  • Mood-Driven Track Classification through Lyric Analysis
  • Track Clustering using K-Means Analysis
  • Predicting Track Popularity through Regression Analysis

This comprehensive approach allowed us to extract meaningful information and enhance our understanding of the Spotify dataset.
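
The regression objectives above can be sketched with scikit-learn as follows. The feature matrix here is synthetic and the function names are hypothetical; the sketch only shows how linear regression and a random forest can be compared on the same train/test split, not the project's actual features or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_regressors(X, y, seed=0):
    """Fit both models on one split and return their R^2 scores on the held-out part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


def make_synthetic_tracks(n=300, seed=0):
    """Made-up features with a linear relationship to the target, for demonstration only."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=n)
    return X, y
```

Scoring both models on the same held-out rows keeps the comparison fair; on real data, cross-validation would give a more stable estimate than a single split.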

For our supervised learning tasks, we used the scikit-learn library, which provides the linear regression and random forest models applied to the regression and classification objectives. For unsupervised learning, we clustered tracks with the kmodes library, which implements the k-modes and k-prototypes relatives of K-Means for categorical and mixed data. To install these libraries, execute the following commands in your command prompt:

pip install scikit-learn
pip install kmodes
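
For purely numeric track features, the clustering step can be sketched with scikit-learn's `KMeans`. Note that this swaps in scikit-learn rather than the kmodes library named above, and the function name and feature matrix are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_tracks(features, n_clusters=3, seed=0):
    """Standardize numeric track features and assign each track to a K-Means cluster."""
    data = np.asarray(features, dtype=float)
    # K-Means is distance based, so features are put on a common scale first.
    scaled = StandardScaler().fit_transform(data)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(scaled)
```

The returned array holds one cluster label per track, in the same order as the input rows.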

PHASE FIVE: Visualization and Business Insights in Power BI

In the fifth and final phase, we employed Power BI to craft a comprehensive dashboard. This dashboard served as the culmination of our efforts, providing a visual representation of our findings. Through these visualizations, we aimed to enhance our understanding of the Spotify data and offer a more intuitive and insightful presentation of our results.

Please be aware that the Power BI file exceeds GitHub's upload size limit, so it has been split into a two-part RAR archive. Place both parts in the same folder and extract them together with WinRAR to recover the Power BI file and open the dashboard.

Prerequisites:

Before working with the materials in this repository, make sure the following prerequisites are in place:

  1. Python Installation: It is imperative to have the Python programming language installed on your system. The latest version is recommended to leverage the most up-to-date functionalities.

  2. Jupyter Notebook Installation: To facilitate an interactive and organized computational environment, the utilization of Jupyter Notebook is recommended. Alternatively, for users who prefer Visual Studio Code, the Jupyter Notebook extension can be integrated.

  3. MySQL Installation: In order to partake in database-related activities, the installation of MySQL is necessary. This will enable efficient management and manipulation of structured data.

  4. Power BI Installation: For advanced data visualization and exploration, Microsoft Power BI Desktop installation is recommended. This tool will empower users to derive comprehensive insights from complex datasets.

  5. Python Library Installation: To harness the full spectrum of capabilities offered by the project, ensure the installation of all the Python libraries explicitly mentioned in the project documentation.

With these prerequisites in place, you can run the notebooks, rebuild the database, and open the dashboard described above.

Additional Libraries:

Below, you'll find a list of other libraries that have been employed in this project. To install them, simply copy and execute the commands provided in your command prompt (cmd). This will ensure the proper setup of all required dependencies.

pip install patool
pip install pandas
pip install numpy
pip install matplotlib
pip install seaborn
pip install scipy
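
To confirm that the libraries are installed, a small check like the following can help. It is a sketch, not part of the project: the function name is hypothetical, and the mapping covers only packages from this README whose import names differ from or match their pip names in a well-known way.

```python
from importlib.util import find_spec

# pip package name -> top-level module name used in an import statement
REQUIRED = {
    "SQLAlchemy": "sqlalchemy",
    "streamlit": "streamlit",
    "plotly": "plotly",
    "mysql-connector-python": "mysql",
    "statsmodels": "statsmodels",
    "scikit-learn": "sklearn",
    "kmodes": "kmodes",
    "patool": "patoolib",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return sorted(
        pip_name for pip_name, module in required.items() if find_spec(module) is None
    )
```

Calling `missing_packages()` returns a list of pip names that still need to be installed; an empty list means every listed module was found.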
