BnBInsight - Big Data Management Project

Visualization

The project uses Tableau for data visualization. The visualization results can be accessed via this link.

Run Data

You can access the run data for this project via this link.

Table of Contents

  1. Introduction
  2. Project Structure
  3. Setup and Installation
  4. Data Sources
  5. Data Processing Pipeline
  6. Machine Learning Models
  7. Visualization
  8. References

Introduction

BnBInsight is a project focused on intelligent pricing decisions for the hospitality industry in the Catalonia region. By analyzing and processing data from multiple sources, BnBInsight aims to help hotel owners optimize their pricing strategies, attract more bookings, and increase their revenue. The project uses data processing and machine learning techniques to provide actionable insights and recommendations.

Project Structure

The project structure is as follows:

BDM-Project/
├── Project 1/
├── Project 2/
│   ├── BDM_P2_Exploitation.ipynb
│   ├── BDM_P2_Formatted_Exploitation.ipynb
│   ├── BDM_P2_Formatted.ipynb
│   └── graphs/
├── README.md

Setup and Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/BDM-Project.git
    cd BDM-Project
  2. Create a virtual environment and activate it:

    python3 -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r "Project 2/requirements.txt"

Data Sources

The project uses data from the following sources:

  1. Inside Airbnb: Barcelona Hotel Dataset
  2. Barcelona's City Hall Open Data: Transitaeri Flightradar

Data Processing Pipeline

The data processing pipeline involves the following steps (a PySpark sketch follows the list):

  1. Data Ingestion: Collecting raw data from various sources.
  2. Data Cleaning: Removing duplicates and handling missing values.
  3. Data Transformation: Transforming data into suitable formats for analysis.
  4. Data Integration: Merging different datasets to create a comprehensive view.
  5. Data Filtering: Filtering data based on specific criteria.
  6. Data Validation: Ensuring data integrity and quality.
  7. Data Storage: Storing cleaned and processed data in a structured format.
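
The steps above can be expressed as a single PySpark job. The sketch below is a minimal illustration, not the project's actual code: the file paths, column names (id, price, neighbourhood), and the join key are assumptions, and the real logic lives in the Project 2 notebooks.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("BnBInsight-pipeline").getOrCreate()

    # 1. Ingestion: read raw Airbnb listings and flight data (paths are hypothetical)
    listings = spark.read.csv("data/raw/listings.csv", header=True, inferSchema=True)
    flights = spark.read.json("data/raw/flightradar.json")

    # 2. Cleaning: drop duplicate listings and rows missing the price
    listings = listings.dropDuplicates(["id"]).dropna(subset=["price"])

    # 3. Transformation: cast price strings like "$123.00" to a numeric column
    listings = listings.withColumn(
        "price", F.regexp_replace("price", "[$,]", "").cast("double")
    )

    # 4. Integration: join the datasets on a shared key (neighbourhood is assumed)
    combined = listings.join(flights, on="neighbourhood", how="left")

    # 5. Filtering: keep only rows that match the study criteria
    combined = combined.filter(F.col("price") > 0)

    # 6. Validation: basic sanity check on the result
    assert combined.count() > 0, "pipeline produced no rows"

    # 7. Storage: persist the processed data in a structured, columnar format
    combined.write.mode("overwrite").parquet("data/processed/listings_enriched")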

Machine Learning Models

We use PySpark MLlib for predictive analysis. The model pipeline includes the following stages (a sketch follows the list):

  1. Feature Selection: Selecting relevant features for the model.
  2. Model Training: Training the model using Linear Regression.
  3. Model Evaluation: Evaluating model performance using metrics such as MSE, MAE, and R-squared.
  4. Model Deployment: Saving the trained model for future use.
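
A minimal MLlib sketch of these stages is shown below. The feature columns (accommodates, bedrooms, review_scores_rating), the price label, and the input/output paths are assumptions for illustration; the actual feature set is defined in the Project 2 notebooks.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression
    from pyspark.ml.evaluation import RegressionEvaluator

    spark = SparkSession.builder.appName("BnBInsight-ml").getOrCreate()
    data = spark.read.parquet("data/processed/listings_enriched")

    # Feature selection: assemble the chosen numeric columns into a single vector
    features = ["accommodates", "bedrooms", "review_scores_rating"]  # assumed columns
    assembler = VectorAssembler(inputCols=features, outputCol="features", handleInvalid="skip")

    # Model training: Linear Regression on the price label
    lr = LinearRegression(featuresCol="features", labelCol="price")
    train, test = data.randomSplit([0.8, 0.2], seed=42)
    model = Pipeline(stages=[assembler, lr]).fit(train)

    # Model evaluation: MSE, MAE and R-squared on the held-out split
    predictions = model.transform(test)
    for metric in ["mse", "mae", "r2"]:
        evaluator = RegressionEvaluator(labelCol="price", predictionCol="prediction", metricName=metric)
        print(metric, evaluator.evaluate(predictions))

    # Model deployment: save the fitted pipeline for future use
    model.write().overwrite().save("models/price_lr")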

References
