Repository for the Fall 2022 Peak.AI x NYU Data Science Club Datathon, Recommender Systems Challenge (Winner)
Teammates:
- Sunny Son (LinkedIn, GitHub)
- Morgan Xu (LinkedIn, GitHub)
- Shane Sun (LinkedIn)
- Sunny Yang (LinkedIn)
The goal of this datathon was to act as developers for the host company, Peak.AI, and build a recommender system for our customer, the Brazilian e-commerce company Olist, that advertises products to users based on their previous purchase history.
We then joined all necessary tables to determine all relevant orders, starting from the primary key customer_id and ending at the tables related to product_id, following the entity relationship diagrams shown below:
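As a rough illustration of the join described above, the sketch below chains two pandas merges from orders to order items to products. The toy tables, their column names other than customer_id and product_id, and the helper `build_purchase_history` are assumptions made for this example, not the code from the notebook.

```python
import pandas as pd

# Toy stand-ins for the Olist tables. Only customer_id and product_id are
# named in this README; the other columns are assumptions for illustration.
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c1", "c2"],
})
order_items = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o3"],
    "product_id": ["p1", "p2", "p1", "p3"],
    "price": [10.0, 25.0, 10.0, 40.0],
})
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3"],
    "product_category_name": ["lamp", "chair", "table"],
})


def build_purchase_history(orders, order_items, products):
    """Join orders -> order items -> products, one row per purchased item."""
    return (
        orders.merge(order_items, on="order_id", how="inner")
              .merge(products, on="product_id", how="inner")
    )


history = build_purchase_history(orders, order_items, products)
print(history[["customer_id", "product_id", "product_category_name"]])
```

Inner joins drop orders without items and items without product metadata; a left join would keep them with missing values instead, which may or may not be what the modeling step needs.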
We then followed the steps below to finalize the data for modeling:
We used a k-Nearest Neighbors model to determine a mapping between "features" (e.g. cost, location) and "labels" (e.g. Lamp), minimizing a distance metric between the labels of previously purchased items and the label of the target item. This procedure is shown below:
With this, our model is ready to generate recommendations. We accept an input parameter "k" for the number of closest products to return, and find them using a modified Euclidean distance metric based on product category:
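The lookup described above can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions, not the notebook's implementation: the function name `recommend`, the toy feature matrix, and the optional `scale` argument (a hypothetical per-product multiplier standing in for the category-based adjustment) are all introduced here for illustration.

```python
import numpy as np


def recommend(target_idx, features, k, scale=None):
    """Return the indices of the k products closest to features[target_idx].

    features: (n_products, n_features) float array.
    scale:    optional (n_products,) multiplier applied to each distance
              (hypothetical hook for a category-based adjustment).
    """
    diffs = features - features[target_idx]
    dist = np.sqrt((diffs ** 2).sum(axis=1))  # plain Euclidean distance
    if scale is not None:
        dist = dist * scale                   # "modified" distance
    dist[target_idx] = np.inf                 # never recommend the query itself
    k = min(k, len(features) - 1)
    return np.argsort(dist, kind="stable")[:k]


# Toy feature matrix: four products with two numeric features each.
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])

plain = recommend(0, features, k=2)
scaled = recommend(0, features, k=2, scale=np.array([1.0, 3.0, 1.0, 1.0]))
print(plain, scaled)
```

In the toy data, product 1 is the nearest neighbor of product 0 under plain Euclidean distance, but a multiplier of 3 on product 1 pushes it behind product 2, which is how a category penalty can reorder recommendations.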
To define the "distance" metric for our model, we scaled the k-NN distance by the cosine similarity between product categories, computed from the Global Vectors (GloVe) 50d embeddings, so that similar products are placed close together:
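The sketch below shows one plausible way to turn category similarity into a distance multiplier. It rests on several assumptions: the three-dimensional vectors are made-up stand-ins for the real 50-dimensional GloVe vectors, the helper names are hypothetical, and the scaling rule `2 - similarity` (one plus the cosine distance) is only one possible reading of "scaling by cosine similarity"; the notebook may use a different formula.

```python
import numpy as np

# Made-up 3-d vectors standing in for 50-d GloVe category embeddings.
toy_embeddings = {
    "lamp": np.array([1.0, 0.0, 0.0]),
    "lantern": np.array([0.8, 0.6, 0.0]),
    "shoes": np.array([0.0, 0.0, 1.0]),
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def category_scale(target_category, candidate_categories, embeddings):
    """Multiplier per candidate: 1.0 for an identical category direction,
    growing as the candidate's category becomes less similar.

    Assumed rule: scale = 1 + (1 - cosine similarity).
    """
    target_vec = embeddings[target_category]
    sims = np.array([
        cosine_similarity(target_vec, embeddings[c])
        for c in candidate_categories
    ])
    return 2.0 - sims


scale = category_scale("lamp", ["lamp", "lantern", "shoes"], toy_embeddings)
print(scale)
```

Under this rule a candidate in the same category keeps its original Euclidean distance, while a candidate in an unrelated category has its distance inflated, so it ranks lower among the k nearest products.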
For the full notebook, please refer to the file peak-ai-x-nyu-dsc-datathon-team-7-submission.ipynb