Create efficient datasets for training models

emma-heriot-watt/datasets

EMMA: Datasets

Python 3.9 PyTorch Poetry
pre-commit style: black wemake-python-styleguide

Continuous Integration Tests

Important

If you have any questions or find a bug, you can contact us in our organisation's discussion.


To use this package in your project, you can install it by running

poetry add git+https://github.com/emma-simbot/datasets.git

You can then import from emma_datasets, or run commands through the CLI with

python -m emma_datasets

Writing code and running things

When running commands for emma_datasets, you can append --help to get more information on the commands and any arguments available to you.
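As a minimal sketch, the same help output can be requested from a Python script, assuming the package is installed in the active environment. The helper name `help_command` is hypothetical and not part of emma_datasets, and any subcommand names you pass are whatever the CLI itself reports.

```python
import importlib.util
import subprocess
import sys


def help_command(*subcommands: str) -> list[str]:
    """Build the argv for `python -m emma_datasets [subcommands...] --help`."""
    return [sys.executable, "-m", "emma_datasets", *subcommands, "--help"]


if __name__ == "__main__":
    # Only run the command when the package can actually be found.
    if importlib.util.find_spec("emma_datasets") is None:
        print("emma_datasets is not installed in this environment")
    else:
        subprocess.run(help_command(), check=False)
```

Called with no arguments, `help_command()` targets the top-level help; passing subcommand names places `--help` after them.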

Project structure

The project is organised very similarly to the Lightning-Hydra-Template, to facilitate reproducible research code.

  • scripts: sh scripts to run experiments
  • notebooks: Jupyter notebooks for analysis and exploration
  • storage: data for training/inference (you may want to use symlinks to point to other parts of the file system)
  • tests: pytest scripts to verify the code
  • src/emma_datasets: where the main code lives
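To check that a local checkout follows this layout, something like the sketch below could be used. It is illustrative only: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the directory list is copied from the layout described above rather than read from the repository.

```python
from pathlib import Path

# Directory names taken from the project structure described above.
EXPECTED_DIRS = ("scripts", "notebooks", "storage", "tests", "src/emma_datasets")


def missing_dirs(root: Path) -> list[str]:
    """Return the expected directories that do not exist under `root`."""
    # Path.is_dir() follows symlinks, so a symlinked `storage` counts as present.
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(Path.cwd())
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it from the repository root; anything printed after `missing:` is a directory that still needs to be created or linked.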

How-to guides

For more detail on how to use this library, check out the following specific pages on: