A demonstration of data analysis and machine learning techniques for regression problems on tabular data, using the Ames Housing Dataset.
.
├───config # Configuration files
├───data # Data for training the model
├───model # Saved model
├───notebooks # Notebooks for analysis
├───references # Data documentation
├───src # Source files
├───static # Flask static directory
└───templates # Flask templates directory
- The root directory contains two main scripts, train_model.py and app.py, plus two analysis notebooks in the notebooks subdirectory.
- The notebooks are used to analyse the data and prototype the machine learning approach.
- The train_model.py script trains a LightGBM model and saves it in the model subdirectory. If the data is not present in the data directory, it is automatically downloaded.
- The app.py script runs a Flask application that generates a web interface for predicting house prices based on the model generated by train_model.py.
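The "download only if missing" behaviour described for train_model.py can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name ensure_data, its parameters, and the injectable fetch argument are all assumptions made for the example, and the real download URL is not stated here, so it is left to the caller.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve


def ensure_data(path: Path, url: str,
                fetch: Callable[[str, str], object] = urlretrieve) -> bool:
    """Download `url` to `path` only if `path` does not exist yet.

    Hypothetical helper: returns True if a download happened,
    False if the file was already present.
    """
    if path.exists():
        return False
    # Create the target directory (e.g. data/) on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    fetch(url, str(path))
    return True
```

Passing the fetch function as an argument keeps the helper easy to test without network access.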
You have three different options to run this software: using your local environment, using a Conda virtual environment, or using Docker.
Download and install Python if not already installed. I recommend version 3.12 (latest revision). Then open a terminal window, navigate to this project directory, and run the following command to make sure that your pip environment contains the required libraries, or compatible versions:
pip install -r requirements.txt
You're now ready to run this software.
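If you want to double-check the installation, the sketch below lists which packages named in a requirements file are not installed. It is an illustrative helper, not part of this project: the function names are made up for the example, it only checks that a package is present (not that its version is compatible), and it ignores pip options such as -r.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def requirement_names(lines):
    """Extract bare package names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks, comment-only lines and pip options
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(lines):
    """Return the required package names that are not installed."""
    missing = []
    for name in requirement_names(lines):
        try:
            version(name)  # raises if the distribution is not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, calling missing_packages on the lines of requirements.txt should return an empty list after a successful installation.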
Optional: if you also wish to run the notebooks, you need to install additional requirements:
pip install -r requirements-notebooks.txt
Once installed, you can start Jupyter with the command jupyter-lab and run the notebooks using its interface.
Download and install Conda if not already present on your system. I recommend Miniconda. During the installation, make sure to deselect the option to add Anaconda to PATH and the option to register it as the default Python, to avoid system-wide problems.
Open the Anaconda prompt and create a new environment (in this example we call it amesml, but you can name it however you like):
conda create -n amesml python=3.12
Answer y when prompted for confirmation. After completing the creation, activate the newly created environment:
conda activate amesml
Navigate to this project directory, then install the requirements:
pip install -r requirements.txt
You're now ready to run this software.
Optional: if you also wish to run the notebooks, you need to install additional requirements:
pip install -r requirements-notebooks.txt
Once installed, you can start Jupyter with the command jupyter-lab and run the notebooks using its interface.
The application can be built as a Docker image (in this example we call it amesml, but you can name it however you like):
docker build -t amesml .
Building the image for the first time will take several minutes.
The app.py script runs a server on the address 127.0.0.1:5000 (localhost:5000). While it is running, you can access it by pointing a browser at that address. The server generates a web interface for the machine learning model.
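To confirm from a script that the server is accepting connections, a plain TCP check is enough. The sketch below uses only the standard library; the function name server_is_up is an assumption for this example, and the check only tells you that something is listening on the port, not that it is this application.

```python
import socket


def server_is_up(host: str = "127.0.0.1", port: int = 5000,
                 timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the application running, server_is_up() should return True; after stopping it, False.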
Open a terminal (or Anaconda Prompt if using Conda) and navigate to this project directory.
If the model file ames_regressor.pickle is missing from the model directory, you can generate it by invoking the train_model.py script:
python train_model.py
Training takes from a few seconds to several, depending on how powerful your machine is.
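The "train only if the model is missing" step can also be automated. The following is a sketch under stated assumptions: ensure_model is a hypothetical helper, not something shipped with this project, and it assumes it is run from the project directory so that the relative paths model/ames_regressor.pickle and train_model.py resolve.

```python
import subprocess
import sys
from pathlib import Path


def ensure_model(model_path=Path("model") / "ames_regressor.pickle",
                 train_cmd=(sys.executable, "train_model.py")) -> bool:
    """Run the training command only when the model file is missing.

    Hypothetical helper: returns True if training was launched,
    False if the model file already existed.
    """
    model_path = Path(model_path)
    if model_path.exists():
        return False
    # check=True raises CalledProcessError if the training script fails.
    subprocess.run(list(train_cmd), check=True)
    if not model_path.exists():
        raise FileNotFoundError(f"training finished but {model_path} was not created")
    return True
```

Using sys.executable runs the training script with the same interpreter (and therefore the same environment) as the calling script.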
To run the main application, use the command:
python app.py
Once the image is built, run it by publishing the container's port 5000 to 127.0.0.1:5000 (localhost:5000) on the host:
docker run -p 127.0.0.1:5000:5000 amesml:latest
If you want to run the service in the background, detaching it from the terminal, you can add the -d argument:
docker run -dp 127.0.0.1:5000:5000 amesml:latest
If you used Conda or Docker, the following instructions explain how to clean up the environments.
Open Anaconda Prompt if not already open. To remove the environment, deactivate it first if it's active:
conda deactivate
You can now remove the environment with the command:
conda env remove -n amesml
Answer y when prompted for confirmation.
To remove the packages and images, use Docker's interface. To empty the cache, use the command:
docker builder prune
Answer y when prompted for confirmation.