Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
lfunderburk committed May 15, 2023
1 parent f09dd99 commit 8a356dd
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ The data pipeline consists of five scripts:

1. Data download and wrangling: extracts data on vehicle models from this [public API](https://open.canada.ca/data/api/action/package_show?id=98f1a129-f628-4ce4-b24d-6f16bf24dd64)
2. CO2 ratings are missing in a large proportion of fuel-based vehicles. The goal of this script is to perform supervised learning (voting classifier) to impute missing CO2 scores based on fuel-based vehicles. A model is setup and saved.
3. Model is used to complete missing values for CO2 ratings. Given there is a high correlation between CO2 ratings and smog ratings, KNNImputer is used on to complete missing smog rating scores.
3. Model is used to complete missing values for CO2 ratings. KNNImputer is used on to complete missing smog rating scores.
4. Once the data is labelled, clustering is perfomed with the purpose of uncovering patterns. Recursive feature elimination with cross-validation is used to identify key features. Once key features are selected, Agglomerative Clustering, TSNE is computed for 2 and 3 dimensions, then results are compared against labelled data.
5. Results are served via an API with two key entry points:
* Search: a natural language entry point that can ask questions about the data.
Expand Down

0 comments on commit 8a356dd

Please sign in to comment.