
This repository is for sharing work from a Kaggle competition that we teamed up on.


thomasthaddeus/DSComp


DS-Competition


This repository holds in-progress work.

Distribution of Work

  1. Eli - Data Collection and Preprocessing Specialist: This person would be responsible for collecting the necessary data for the project, cleaning it, and transforming it into a format that can be used for model training. They would also handle any necessary data augmentation.

  2. Thad - Feature Engineer: This person would be responsible for creating new features from the existing data that might help improve the model's performance. They would work closely with the Data Collection and Preprocessing Specialist to understand the data and come up with effective features.

  3. Model Developer 1: This person would be responsible for selecting a suitable model, training it, and tuning its parameters. They would work closely with the Feature Engineer to understand the features and how they can be used in the model.

  4. Model Developer 2: This person would also be responsible for model development. Having two people on this task allows for parallel experimentation with different models or different sets of parameters, which can speed up the process and potentially lead to better results.

  5. Validation and Testing Specialist: This person would be responsible for evaluating the model's performance using a validation set and making adjustments to the model if necessary. They would work closely with the Model Developers to understand the models and how they can be improved.

  6. Person 6 - Submission and Documentation Manager / Infrastructure Manager: This person would be responsible for submitting the team's entries to the competition, documenting the team's work, and managing the infrastructure needed for model training. This includes keeping track of the different models that were tried, the features that were used, and the performance of each model. They would also handle any necessary setup and management of cloud resources, and manage the team's code using a version control system like Git.

Users

  1. Thad - Data Collection and Preprocessing Specialist
  2. Eli - Feature Engineer
  3. Nicholas

Project Structure

  • .github: don't touch this folder
  • /data: all data should be stored here
  • /models: store machine learning models here
  • /notebooks: put all notebooks here under your folder
  • /src: any source code you need to import for your notebook to work
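
For notebooks kept in a personal folder under /notebooks, one way to import code from /src is to add that folder to `sys.path`. The following is a minimal sketch, not part of this repository: the helper names `find_repo_root` and `add_src_to_path` are hypothetical, and it assumes the repository root is the nearest parent directory that contains README.md.

```python
import sys
from pathlib import Path


def find_repo_root(start: Path, marker: str = "README.md") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")


def add_src_to_path(start: Path) -> Path:
    """Put <repo root>/src at the front of sys.path and return that directory."""
    src_dir = find_repo_root(start) / "src"
    if str(src_dir) not in sys.path:
        sys.path.insert(0, str(src_dir))
    return src_dir
```

In a notebook, calling `add_src_to_path(Path.cwd())` in the first cell should make modules under /src importable, for example `from prep import data_prep`. If a notebook folder has its own README.md, the search would stop there, so a different marker file would be needed.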

Directory Tree

.
├── data
│   ├── eval_student_summaries
│   │   ├── prompts_test.csv
│   │   ├── prompts_train.csv
│   │   ├── sample_submission.csv
│   │   ├── summaries_test.csv
│   │   └── summaries_train.csv
│   └── json
├── LICENSE
├── models
├── notebooks
│   └── sample_notebk.ipynb
├── README.md
├── requirements.txt
├── sitemap.html
├── src
│   ├── evaluation
│   ├── prep
│   │   ├── data_prep.py
│   │   └── text_prep.py
│   ├── scripts
│   └── visualize
└── tests
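
The data layout in the tree above can be captured in a small helper so that notebooks do not hard-code paths. This is a sketch under assumptions: the function names `data_paths` and `missing_data_files` are hypothetical and do not exist in /src, and the file names are copied from the tree as shown.

```python
from __future__ import annotations

from pathlib import Path

# Folder and file names copied from the directory tree above.
DATA_SUBDIR = Path("data") / "eval_student_summaries"
DATA_FILES = (
    "prompts_test.csv",
    "prompts_train.csv",
    "sample_submission.csv",
    "summaries_test.csv",
    "summaries_train.csv",
)


def data_paths(repo_root: Path) -> dict[str, Path]:
    """Map each file stem (e.g. 'summaries_train') to its expected path."""
    return {Path(name).stem: repo_root / DATA_SUBDIR / name for name in DATA_FILES}


def missing_data_files(repo_root: Path) -> list[str]:
    """Return the stems of expected data files that are not on disk, sorted."""
    return sorted(
        stem for stem, path in data_paths(repo_root).items() if not path.is_file()
    )
```

Calling `missing_data_files(Path("."))` from the repository root gives a quick check that the competition files have been downloaded into /data before a notebook tries to read them.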

Setup and Installation

Instructions for setting up and installing any necessary software or libraries.

If you want to use Weights & Biases, here is the link

Usage

Instructions for how to run the code.

How to set up the virtual environment.

License

MIT
