The project uses Tableau for data visualization. The visualization results can be accessed via this link.
You can access the run data for this project via this link.
- Introduction
- Project Structure
- Setup and Installation
- Data Sources
- Data Processing Pipeline
- Machine Learning Models
- Visualization
- References
BNB Insight is a project focused on intelligent pricing decisions for the hospitality industry in the Catalonia region. By analyzing and processing data from multiple sources, BNB Insight aims to help hotel owners optimize their pricing strategies, attract more bookings, and increase their revenue. The project combines a data processing pipeline with machine learning models to produce actionable pricing insights and recommendations.
The project structure is as follows:
BDM-Project/
├── Project 1/
├── Project 2/
│   ├── BDM_P2_Exploitation.ipynb
│   ├── BDM_P2_Formatted_Exploitation.ipynb
│   ├── BDM_P2_Formatted.ipynb
│   └── graphs/
└── README.md
- Clone the repository:

      git clone https://github.com/yourusername/BDM-Project.git
      cd BDM-Project

- Create a virtual environment and activate it:

      python3 -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install the required packages (the path contains a space, so it must be quoted):

      pip install -r "Project 2/requirements.txt"

The project uses data from the following sources:
The data processing pipeline involves the following steps:
- Data Ingestion: Collecting raw data from various sources.
- Data Cleaning: Removing duplicates and handling missing values.
- Data Transformation: Transforming data into suitable formats for analysis.
- Data Integration: Merging different datasets to create a comprehensive view.
- Data Filtering: Filtering data based on specific criteria.
- Data Validation: Ensuring data integrity and quality.
- Data Storage: Storing cleaned and processed data in a structured format.
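The steps above can be illustrated with a minimal sketch. It uses plain Python rather than PySpark so that it runs without dependencies, and it is not the project's actual implementation. The field names (`id`, `neighbourhood`, `price`), the `visitors` lookup, the `max_price` threshold, and the function names are all hypothetical and chosen only for illustration. The storage step is left out.

```python
from typing import Optional

# Hypothetical raw listings; field names are assumptions for illustration.
raw_listings = [
    {"id": 1, "neighbourhood": "Eixample", "price": "$120.00"},
    {"id": 1, "neighbourhood": "Eixample", "price": "$120.00"},  # duplicate row
    {"id": 2, "neighbourhood": "Gràcia", "price": None},         # missing price
    {"id": 3, "neighbourhood": "Sants", "price": "$80.50"},
]

# Hypothetical second dataset, keyed by neighbourhood.
visitors_by_neighbourhood = {"Eixample": 5000, "Sants": 1200}


def parse_price(text: Optional[str]) -> Optional[float]:
    """Transformation: turn a string such as '$1,200.00' into a float."""
    if text is None:
        return None
    return float(text.replace("$", "").replace(",", ""))


def run_pipeline(listings, visitors, max_price=500.0):
    # Cleaning: keep only the first row seen for each id.
    seen = set()
    unique = []
    for row in listings:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)

    # Cleaning + transformation: drop rows without a price, parse the rest.
    transformed = []
    for row in unique:
        price = parse_price(row["price"])
        if price is not None:
            transformed.append({**row, "price": price})

    # Integration: attach the visitor count for each neighbourhood (0 if unknown).
    merged = [
        {**row, "visitors": visitors.get(row["neighbourhood"], 0)}
        for row in transformed
    ]

    # Filtering: keep listings at or below the price threshold.
    filtered = [row for row in merged if row["price"] <= max_price]

    # Validation: every remaining price must be strictly positive.
    for row in filtered:
        if row["price"] <= 0:
            raise ValueError(f"invalid price for listing {row['id']}")
    return filtered


processed = run_pipeline(raw_listings, visitors_by_neighbourhood)
```

With the sample data, the duplicate of listing 1 and the listing without a price are dropped, leaving listings 1 and 3 with numeric prices and visitor counts attached.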
We use PySpark MLlib for predictive analysis. The model pipeline includes:
- Feature Selection: Selecting relevant features for the model.
- Model Training: Training the model using Linear Regression.
- Model Evaluation: Evaluating model performance using metrics such as MSE, MAE, and R-squared.
- Model Deployment: Saving the trained model for future use.
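To make the evaluation metrics concrete, the following sketch computes MSE, MAE, and R-squared by hand in plain Python. It is an illustration of the formulas, not the project's evaluation code; in PySpark these values would typically come from `RegressionEvaluator`. The sample prices and the function name `regression_metrics` are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return (mse, mae, r2) for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error: average of squared residuals.
    mse = sum(e * e for e in errors) / n
    # Mean absolute error: average of absolute residuals.
    mae = sum(abs(e) for e in errors) / n

    # R-squared: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return mse, mae, r2


# Hypothetical nightly prices (actual vs. predicted), for illustration only.
actual = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 120.0, 130.0, 160.0]

mse, mae, r2 = regression_metrics(actual, predicted)
print(f"MSE={mse:.1f}  MAE={mae:.1f}  R2={r2:.2f}")  # MSE=50.0  MAE=5.0  R2=0.90
```

Lower MSE and MAE mean smaller prediction errors, while an R-squared closer to 1 means the model explains more of the variance in the observed prices.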
- BDM-MIRI/BDMA, *Big Data Management*. Available online at: https://raco.fib.upc.edu/home/assignatura?espai=270678, accessed June 2024.
- Neo4j, Inc., *Neo4j Documentation*. Available online at: https://neo4j.com/docs/, accessed June 2024.
- Tableau, *Tableau*. Available online at: https://www.tableau.com/, accessed June 2024.
- PySpark MLlib, *PySpark MLlib*. Available online at: https://spark.apache.org/docs/latest/mllib-guide.html, accessed June 2024.
- Inside Airbnb, *Inside Airbnb*. Available online at: https://insideairbnb.com/get-the-data/, accessed June 2024.
- Transitaeri Flightradar, *Transitaeri Flightradar*. Available online at: https://opendata-ajuntament.barcelona.cat/data/en/dataset/transitaeri_flightradar_ppal_pais, accessed June 2024.