# Machine learning lifecycle lab


A collection of notebooks for walking through the typical ML lifecycle from data cleaning through to model hosting using Amazon SageMaker.

A typical ML lifecycle will look something like...

  1. Identify a business problem or question which ML can answer
  2. Identify the data sources available to describe the problem space
  3. Acquire and cleanse the data or a sample of the data
  4. Engineer a feature set from the data or data sample so that each feature has meaning and relevance to the problem
  5. Apply this cleansing and feature engineering logic to the full data set
  6. Spot check multiple ML algorithms against a sample of the feature set to assess which algorithm is likely to give the best result
  7. Select one or more algorithms and perform hyperparameter optimization to determine the best configuration parameters, using a sample of the feature set
  8. Train a model using the best performing algorithm and hyperparameters on the full training feature set
  9. Test the model on a control or test feature set to produce a performance baseline
  10. Deploy the model for consumption by the business (Lambda, mobile device, container, etc.); a minimal SageMaker sketch of steps 8-10 follows this list
  11. Consider how future observations will be engineered in preparation for inference
  12. Monitor the model for concept drift

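As a rough illustration of steps 8-10, the snippet below shows how a training job and a hosted endpoint might be created with the SageMaker Python SDK. The algorithm, S3 paths, instance types, and hyperparameters are placeholder assumptions for this sketch, not values taken from the lab notebooks.

```python
# Illustrative sketch of steps 8-10 using the SageMaker Python SDK (v2).
# Bucket, prefixes, instance types, and hyperparameters below are
# placeholder assumptions, not values taken from the lab notebooks.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()   # assumes this runs inside a SageMaker notebook
bucket = session.default_bucket()       # hypothetical S3 bucket holding the feature set

# Step 8: train on the full feature set with a built-in algorithm container (XGBoost here)
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")
estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/ml-lifecycle/output",
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)
estimator.fit({
    "train": TrainingInput(f"s3://{bucket}/ml-lifecycle/train", content_type="text/csv"),
    "validation": TrainingInput(f"s3://{bucket}/ml-lifecycle/validation", content_type="text/csv"),
})

# Steps 9-10: score the held-out test set, then host the model behind a real-time endpoint
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```
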
For this collection of labs we will start by defining a business problem and then work through the lifecycle, from data preparation through to model deployment.

## Table of contents

  1. [Feature engineering](./01%20Feature%20engineering.ipynb) This notebook walks through acquiring the data, cleaning it, and then engineering a base feature set which can be prepared for ML training.
  2. [ML algorithm spot check](./02%20Algorithm%20spot%20check.ipynb) This notebook walks through transforming the cleansed data to assess the performance of many ML algorithms.
  3. [Hyperparameter optimization](./03%20Hyperparameter%20tuning.ipynb) This notebook walks through performing HPO on an algorithm against a subset of the feature set before launching a full-scale training job.
  4. [Training your model](./04%20Training.ipynb) This notebook walks through running a full-scale training job for your model.
  5. [Hosting and usage](./05%20Host%20and%20infer.ipynb) This notebook walks through hosting a trained model and using it to make predictions (a minimal invocation sketch follows this list).
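Once a model is hosted (notebook 5), a prediction request might look something like the following. The endpoint name and the CSV feature vector are placeholder assumptions, not values taken from the lab notebooks.

```python
# Illustrative sketch of calling a hosted SageMaker endpoint for a prediction.
# The endpoint name and the CSV feature vector are placeholder assumptions,
# not values taken from the lab notebooks.
import boto3

runtime = boto3.client("sagemaker-runtime")

sample_features = "34,1,0.52,118.0"   # hypothetical observation, serialized like the training data

response = runtime.invoke_endpoint(
    EndpointName="ml-lifecycle-demo-endpoint",   # placeholder endpoint name
    ContentType="text/csv",
    Body=sample_features,
)
print("Prediction:", response["Body"].read().decode("utf-8"))
```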

## Further reading

