migai/Kag Repository for Coursera's Competitive Data Science course, taught by NRU-HSE
Team: Andreas Theodoulou and Michael Gaidis
The data files for the final competition are in the "readonly/final_project_data" directory.
template_Kaggle_Coursera_Final_Assignment.ipynb is a starter notebook for working in Google Colab. It loads all data and helper code files from this GitHub repository into the Colab environment.
Fork the template notebook and use it at your discretion, then rename your copy with a version number and store it in the "ipynb_versions" directory or in a new branch you create. Modify the "template" ipynb in the top directory only when the change is needed by every notebook we use in the competition. Otherwise, clone or fork from the modified version you have been working on.
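As a rough illustration of what loading the repository into Colab can look like, here is a minimal sketch. It is not the code in the template notebook. The function names `build_clone_command` and `load_repo`, the `repo_url` argument, and the default `dest` folder are all hypothetical; only the "helper_code" and "readonly/final_project_data" directory names come from this repository.

```python
import subprocess
import sys
from pathlib import Path


def build_clone_command(repo_url, dest):
    """Return the git command for a shallow clone of repo_url into dest."""
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]


def load_repo(repo_url, dest="repo"):
    """Clone the repository if needed and expose its helper code.

    Hypothetical helper: repo_url must be supplied by the caller.
    Returns the path of the competition data directory inside the clone.
    """
    dest = Path(dest)
    if not dest.exists():
        # Skip the clone when the folder is already present (e.g. on re-run).
        subprocess.run(build_clone_command(repo_url, dest), check=True)

    # Make modules in "helper_code" importable from the notebook.
    helper_dir = dest / "helper_code"
    if str(helper_dir) not in sys.path:
        sys.path.append(str(helper_dir))

    return dest / "readonly" / "final_project_data"
```

In a Colab cell, you would call `load_repo` with the URL of your own fork and then read the data files from the returned directory.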
The "helper_code" directory also holds a "kaggle_utils_at_mg.py" starter file. At present it is just a template for how we could store snippets of code to make our Jupyter notebooks more readable.
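To give a feel for the kind of snippet that could live in such a helper file, here is a small sketch. The `rmse` function is a hypothetical example and is not taken from "kaggle_utils_at_mg.py"; it is not meant to imply anything about the competition's scoring metric.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences.

    Hypothetical helper, shown only as an example of reusable code
    that would otherwise clutter a notebook cell.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

With the "helper_code" directory on the import path, a notebook could then use `from kaggle_utils_at_mg import rmse` instead of redefining the function in each version.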
The "data_output" directory can be used to store modified data (e.g., with extra feature columns). It can also hold compressed or pickled files that preserve anything else you'd like to keep, such as important hyperparameters.
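One possible way to combine compression and pickling for files in "data_output" is sketched below, using only the Python standard library. The names `save_artifact` and `load_artifact` and the ".pkl.gz" suffix are assumptions made for this example, not existing code in the repository.

```python
import gzip
import pickle
from pathlib import Path


def save_artifact(obj, path):
    """Pickle obj into a gzip-compressed file and return its path.

    Hypothetical helper: obj can be any picklable Python object, such
    as a dict that bundles hyperparameters with engineered features.
    """
    path = Path(path)
    # Create the output directory on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_artifact(path):
    """Load an object previously written by save_artifact."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

For example, `save_artifact({"params": {"max_depth": 8}}, "data_output/run_v1.pkl.gz")` would store the settings of a run so that a later notebook version can restore them with `load_artifact`. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.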
For ideas on useful helper code files, and on how to proceed with EDA, feature generation, and modeling in the competition, look in the "readonly/kaggletils" and "readonly/examples" directories. They hold files from our Coursera instructors and from other pioneers in this and other Kaggle competitions.