
Shopee Price Match

Environment setup

  1. Install the environment with conda env create -f env.yml

    Alternatively, if creation fails due to an incompatible cudatoolkit, you can edit env.yml to pin a cudatoolkit version compatible with your driver (more info here).

    Utility command to export the environment to .yml: conda env export | grep -v "^prefix: " > env.yml

If you are going to use Jupyter Notebook

  1. Activate your environment: conda activate price_match_env
  2. Install the kernel: python -m ipykernel install --user --name price_match_env --display-name "price_match_env"
  3. Spin up your Jupyter notebook as usual: jupyter notebook

Testing Your Environment

Normally, only GPU-dependent modules are problematic.

Test Tensorflow installation:

  1. Activate the environment: conda activate price_match
  2. Bring up a Python shell: python
  3. Import and check: import tensorflow as tf ; tf.test.is_gpu_available() # Should return True
  4. If you receive an error similar to Could not load dynamic library 'libcudart.so.11.0', you need to set your LD_LIBRARY_PATH environment variable to point to the folder that contains the library (likely /home/<your_username>/anaconda3/envs/price_match/lib/). If you only want this variable to be set when your conda env is active, follow the guide here.
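
For convenience, here is a consolidated check you can paste into the Python shell; tf.test.is_gpu_available is deprecated in TF 2.x, and tf.config.list_physical_devices is its current equivalent:

    import tensorflow as tf

    # Deprecated in TF 2.x, but matches the check in step 3; should print True.
    print(tf.test.is_gpu_available())

    # Preferred TF 2.x check: a non-empty list means a GPU is visible.
    print(tf.config.list_physical_devices('GPU'))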

Test Pytorch installation:

  1. Steps 1 and 2 of the TensorFlow test above
  2. Import and check: import torch ; torch.cuda.get_device_name() # Should return your NVIDIA GPU name
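
A slightly fuller check, since torch.cuda.get_device_name raises an error when CUDA is not visible at all:

    import torch

    # False here means PyTorch cannot see the CUDA runtime/driver.
    print(torch.cuda.is_available())

    if torch.cuda.is_available():
        # Model string of the default CUDA device, e.g. your NVIDIA GPU.
        print(torch.cuda.get_device_name(0))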

Test Xgboost installation:

  1. Steps 1 and 2 of the TensorFlow test above

  2. Since xgboost has no utility function to check whether a GPU is available, write some sample code to test:

    import numpy as np
    import xgboost as xgb

    n = 10_000
    m = 100
    X = np.random.randn(n, m)
    y = np.random.randn(n)
    exp_models = []

    for i in range(3):
        # As long as this runs with no problem, GPU support should be OK
        clf = xgb.XGBRegressor(
            tree_method='gpu_hist', eta=0.1, max_depth=6, verbosity=0)
        exp_models.append(clf.fit(X, y, verbose=False))

Neptune Setup

  1. Sign up for a Neptune account here.
  2. Get your Neptune API token (on your neptune.ai console, click your profile icon in the top-right corner -> Get Your API Token)
  3. Create a new project (e.g. My Shopee Price Match Project)
  4. In your local environment's root, create a .env file with the following lines:

    NEPTUNE_TOKEN="<your_api_token>"
    PROJECT_NAME="<your_neptune_username>/<your_neptune_project_name>"
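
For reference, a minimal sketch of how these values can be loaded in Python, assuming the python-dotenv package and a recent neptune client (older neptune-client versions use neptune.init instead of neptune.init_run; this wiring is illustrative, not the repo's exact code):

    import os

    import neptune
    from dotenv import load_dotenv

    load_dotenv()  # reads NEPTUNE_TOKEN and PROJECT_NAME from .env

    # Illustrative initialization; adjust to your neptune-client version.
    run = neptune.init_run(
        project=os.getenv('PROJECT_NAME'),
        api_token=os.getenv('NEPTUNE_TOKEN'),
    )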
    

Data & Model Folder Structure

model
├── efficient_net_b3
│   └── pretrained
│       └── efficientnet_b3.pth
└── indobert_lite_p2
    ├── pretrained
    │   ├── config.json
    │   ├── pytorch_model.bin
    │   ├── README.md
    │   ├── special_tokens_map.json
    │   ├── tf_model.h5
    │   ├── tokenizer_config.json
    │   └── vocab.txt
    └── tokenizer
        ├── special_tokens_map.json
        ├── tokenizer_config.json
        └── vocab.txt
data
└── raw
    ├── train_images
    │   ├── 0a0d257d1127f7d4298a7753875b372a.jpg
    │   ├── 0a1ad1756ba6219eb2359fd3ed2a7082.jpg
    │   └── 0a1c01e1b84cc6c6655dbf886fd72ead.jpg
    └── train_split_v3.csv

Training all models

  • Simply run bash train_model.sh
  • The following models will be trained:
    1. Indobert Lite P2 (NLP) is trained on all data (i.e. no validation)
    2. Efficientnet B3 (IMG) is trained with 4-fold validation (Grouped K-Fold - each fold holds out unique label groups; see the sketch below)
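
A minimal illustration of the grouped split, assuming scikit-learn; the feature matrix and group IDs below are placeholders, not the repo's actual data:

    import numpy as np
    from sklearn.model_selection import GroupKFold

    n = 1_000
    X = np.random.randn(n, 8)                   # placeholder features
    groups = np.random.randint(0, 100, size=n)  # placeholder label-group IDs

    gkf = GroupKFold(n_splits=4)
    for fold, (train_idx, val_idx) in enumerate(gkf.split(X, groups=groups)):
        # Within a fold, no label group appears in both train and validation.
        assert not set(groups[train_idx]) & set(groups[val_idx])
        print(f'fold {fold}: {len(train_idx)} train / {len(val_idx)} val')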

Kaggle Notebooks

  • Both training and inference notebooks are provided in kaggle_notebooks

Training Notebook

  • To train models in the Kaggle environment, you need to provide your GitHub and Neptune tokens using Kaggle Secrets (in a new Kaggle notebook, navigate to Add-ons -> Secrets; make sure internet access is enabled) - see the sketch below
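
Inside the notebook, the secrets can be read with Kaggle's kaggle_secrets helper; the secret labels below are assumptions - use whatever labels you set under Add-ons -> Secrets:

    from kaggle_secrets import UserSecretsClient

    secrets = UserSecretsClient()
    github_token = secrets.get_secret('GITHUB_TOKEN')    # assumed label
    neptune_token = secrets.get_secret('NEPTUNE_TOKEN')  # assumed label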

Submission / Inference Notebook

  • For submission, internet access is disabled, so you need to install the required packages (Faiss and TIMM) from the uploaded wheels (use the following or upload your own) - see the sketch below
  • Also, upload both the pretrained and trained weights of the NLP and IMG models to Kaggle as datasets, and attach them to your submission notebook (check that the paths in the notebook point to the right folders - this depends on how you named the uploaded files)
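
A sketch of the offline install in a notebook cell, assuming the wheels are attached as a Kaggle dataset; the input path below is a placeholder for whatever you uploaded:

    # Run in a Kaggle notebook cell; the input path is a placeholder.
    !pip install --no-index --find-links=/kaggle/input/<your-wheel-dataset> faiss-gpu timm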
