BnBInsight - Big Data Management Project

Visualization

The project uses Tableau for data visualization. The visualization results can be accessed via this link.

Run Data

You can access the run data for this project via this link.

Table of Contents

  1. Introduction
  2. Project Structure
  3. Setup and Installation
  4. Data Sources
  5. Data Processing Pipeline
  6. Machine Learning Models
  7. Visualization
  8. References

Introduction

BnBInsight is a project focused on intelligent pricing decisions for the hospitality industry in the Catalonia region. By analyzing and processing data from multiple sources, BnBInsight aims to help hotel owners optimize their pricing strategies, attract more bookings, and increase their revenue. The project uses data processing and machine learning techniques to provide actionable insights and recommendations.

Project Structure

The project structure is as follows:

BDM-Project/
├── Project 1/
├── Project 2/
│   ├── BDM_P2_Exploitation.ipynb
│   ├── BDM_P2_Formatted_Exploitation.ipynb
│   ├── BDM_P2_Formatted.ipynb
│   └── graphs/
├── README.md

Setup and Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/BDM-Project.git
    cd BDM-Project
  2. Create a virtual environment and activate it:

    python3 -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r "Project 2/requirements.txt"

Data Sources

The project uses data from the following sources:

  1. Inside Airbnb: Barcelona Hotel Dataset
  2. Barcelona's City Hall Open Data: Transitaeri Flightradar

Data Processing Pipeline

The data processing pipeline involves the following steps (a PySpark sketch follows the list):

  1. Data Ingestion: Collecting raw data from various sources.
  2. Data Cleaning: Removing duplicates and handling missing values.
  3. Data Transformation: Transforming data into suitable formats for analysis.
  4. Data Integration: Merging different datasets to create a comprehensive view.
  5. Data Filtering: Filtering data based on specific criteria.
  6. Data Validation: Ensuring data integrity and quality.
  7. Data Storage: Storing cleaned and processed data in a structured format.
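
The steps above can be expressed as a single PySpark job. The sketch below is a minimal illustration, not the project's actual code: the file paths, column names (id, price, neighbourhood), and the join key are assumptions, and the real logic lives in the Project 2 notebooks.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("BnBInsight-pipeline").getOrCreate()

    # 1. Ingestion: read raw Airbnb listings and flight data (paths are hypothetical)
    listings = spark.read.csv("data/raw/listings.csv", header=True, inferSchema=True)
    flights = spark.read.json("data/raw/flightradar.json")

    # 2. Cleaning: drop duplicate listings and rows missing the price
    listings = listings.dropDuplicates(["id"]).dropna(subset=["price"])

    # 3. Transformation: cast price strings like "$123.00" to a numeric column
    listings = listings.withColumn(
        "price", F.regexp_replace("price", "[$,]", "").cast("double")
    )

    # 4. Integration: join the datasets on a shared key (neighbourhood is assumed)
    combined = listings.join(flights, on="neighbourhood", how="left")

    # 5. Filtering: keep only rows that match the study criteria
    combined = combined.filter(F.col("price") > 0)

    # 6. Validation: basic sanity check on the result
    assert combined.count() > 0, "pipeline produced no rows"

    # 7. Storage: persist the processed data in a structured, columnar format
    combined.write.mode("overwrite").parquet("data/processed/listings_enriched")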

Machine Learning Models

We use PySpark MLlib for predictive analysis. The model pipeline includes the following stages (a sketch follows the list):

  1. Feature Selection: Selecting relevant features for the model.
  2. Model Training: Training the model using Linear Regression.
  3. Model Evaluation: Evaluating model performance using metrics such as MSE, MAE, and R-squared.
  4. Model Deployment: Saving the trained model for future use.
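
A minimal MLlib sketch of these stages is shown below. The feature columns (accommodates, bedrooms, review_scores_rating), the price label, and the input/output paths are assumptions for illustration; the actual feature set is defined in the Project 2 notebooks.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression
    from pyspark.ml.evaluation import RegressionEvaluator

    spark = SparkSession.builder.appName("BnBInsight-ml").getOrCreate()
    data = spark.read.parquet("data/processed/listings_enriched")

    # Feature selection: assemble the chosen numeric columns into a single vector
    features = ["accommodates", "bedrooms", "review_scores_rating"]  # assumed columns
    assembler = VectorAssembler(inputCols=features, outputCol="features", handleInvalid="skip")

    # Model training: Linear Regression on the price label
    lr = LinearRegression(featuresCol="features", labelCol="price")
    train, test = data.randomSplit([0.8, 0.2], seed=42)
    model = Pipeline(stages=[assembler, lr]).fit(train)

    # Model evaluation: MSE, MAE and R-squared on the held-out split
    predictions = model.transform(test)
    for metric in ["mse", "mae", "r2"]:
        evaluator = RegressionEvaluator(labelCol="price", predictionCol="prediction", metricName=metric)
        print(metric, evaluator.evaluate(predictions))

    # Model deployment: save the fitted pipeline for future use
    model.write().overwrite().save("models/price_lr")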

References
