Overview 📖

In short, orchex is a library for orchestrating data workflows, including hierarchical extraction, transformation with pseudonymisation, automated documentation, and secure sharing mechanisms.

For a closer look, explore the core module at orchex/dataextract.py, which implements the two main data classes: DataSource and DataExtract.

DataSource

This class contains several methods to facilitate data extraction from a data source and create a dataframe object.

Supported data sources:
- SQL code
- SQL file
- Table Storage database
- csv file
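
Whatever the source type, the result is the same kind of tabular object. The sketch below illustrates that idea with the standard library only: an in-memory SQLite database stands in for the real database, and plain row dictionaries stand in for the dataframe. The helper names `rows_from_csv` and `rows_from_sql` are hypothetical and are not part of orchex.

```python
import csv
import io
import sqlite3


def rows_from_csv(text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries (values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_sql(connection: sqlite3.Connection, query: str) -> list[dict]:
    """Run a SQL query and return the result as row dictionaries."""
    cursor = connection.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Assumption: in-memory SQLite replaces the real database for illustration
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE answers (user_id INTEGER, is_correct INTEGER)")
connection.executemany("INSERT INTO answers VALUES (?, ?)", [(1, 1), (2, 0)])

sql_rows = rows_from_sql(connection, "SELECT user_id, is_correct FROM answers")
csv_rows = rows_from_csv("user_id,is_correct\n1,1\n2,0\n")
```

Both calls return rows with the same column names, so later steps do not need to know where the data came from.
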

DataExtract

This class lets the user combine multiple DataSource objects into a single entity, so that the same operation (such as pseudonymisation) can be applied to all of them at once.
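
To make the idea of one operation over many sources concrete, here is a minimal sketch of keyed pseudonymisation applied to every source in a collection. It is an illustration under assumptions, not orchex's implementation: the function names, the `user_id` column, the secret, and the use of HMAC-SHA256 are all hypothetical choices.

```python
import hashlib
import hmac


def pseudonymise(value, secret: bytes) -> str:
    """Map a value to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise_sources(sources: dict, column: str, secret: bytes) -> dict:
    """Return copies of all sources with `column` pseudonymised where present."""
    result = {}
    for name, rows in sources.items():
        result[name] = [
            {**row, column: pseudonymise(row[column], secret)} if column in row else dict(row)
            for row in rows
        ]
    return result


# Hypothetical data: two sources that share the same user identifier
sources = {
    "answers": [{"user_id": 101, "is_correct": 1}],
    "lessons": [{"user_id": 101, "lesson": "fractions"}],
}
pseudonymised = pseudonymise_sources(sources, "user_id", secret=b"example-secret")
```

Because the same secret is used for every source, a given user receives the same pseudonym everywhere, so the sources can still be joined after pseudonymisation.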

The data from the DataExtract will be stored in the following file structure:
```
{name}-{YYYYmmDDHHMM}-{id}
├── {name}-{YYYYmmDDHHMM}-{id}-PRIVATE.pkl
└── {name}-{YYYYmmDDHHMM}-{id}-PUBLIC/
    ├── data
    │   └── {data_source_name}.csv
    ├── img
    └── docs
        ├── README.md
        └── img
```
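
The naming scheme above can be reproduced with `pathlib` and `strftime`. This is a sketch only: `build_extract_layout` is a hypothetical helper, how orchex generates `{id}` is not stated here so it is passed in as a plain string, and the PRIVATE pickle file is not written.

```python
from datetime import datetime
from pathlib import Path


def build_extract_layout(root: Path, name: str, created: datetime, extract_id: str) -> Path:
    """Create the PUBLIC folder tree for one extract and return the top folder."""
    # {name}-{YYYYmmDDHHMM}-{id}
    stem = f"{name}-{created.strftime('%Y%m%d%H%M')}-{extract_id}"
    extract_dir = root / stem
    public_dir = extract_dir / f"{stem}-PUBLIC"

    # data/, img/ and docs/img/ (creating docs/img also creates docs/)
    for parts in (("data",), ("img",), ("docs", "img")):
        public_dir.joinpath(*parts).mkdir(parents=True, exist_ok=True)

    # Empty placeholder for the generated report
    (public_dir / "docs" / "README.md").touch()
    return extract_dir
```

For example, `build_extract_layout(Path("data"), "demo", datetime(2024, 1, 2, 3, 4), "abc123")` creates a top-level folder named `demo-202401020304-abc123`.
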
This class allows for 3 different ways of saving the data:
- save(): saves a .pkl file of the class. Recommended for personal use, NOT for sharing data.
- export(): creates pseudonymised .csv files. This is the best way to share data.
- archive(): creates a .zip file with all the created folders and uploads them to Azure Blob Storage.
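
The three modes differ mainly in output format. The sketch below mirrors them with standard-library calls (`pickle`, `csv`, `shutil.make_archive`). The function names are hypothetical, the rows are assumed to be pseudonymised already, and the Azure Blob Storage upload is deliberately left out; this is not orchex's code.

```python
import csv
import pickle
import shutil
from pathlib import Path


def save_private(obj, path: Path) -> Path:
    """Pickle the whole object: convenient locally, unsuitable for sharing."""
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    return path


def export_csv(rows: list, path: Path) -> Path:
    """Write (already pseudonymised) row dictionaries to a CSV file.

    Assumes `rows` is non-empty; the header is taken from the first row.
    """
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def archive_folder(folder: Path) -> Path:
    """Zip the folder's contents into <folder>.zip next to the folder."""
    return Path(shutil.make_archive(str(folder), "zip", root_dir=folder))
```

Writing the archive next to the folder, rather than inside it, keeps the zip file from being included in itself.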

📝 NOTE: Both classes can create a markdown report with all the class info.

2. Setup 🧑‍🔬

2.1 Prerequisites 📋

Python 🐍

Python 3.12 or later (^3.12) is required.

ODBC Driver (if running SQL code) 💻

If you wish to create DataSources and/or DataExtracts using SQL code, ODBC drivers must be installed. Please follow the instructions for your OS on the following pages (v17+ is recommended):
- Windows
- MacOS
- Linux

.env file 📃

To extract data from the database, some Azure-specific variables must be stored in a .env file. If you don't have this information, please contact Simon.
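
Before running an extraction, it can help to check that the .env file is complete. The sketch below parses simple `KEY=value` lines with the standard library and reports which required names are missing. The helper names are hypothetical, and the README does not list the actual variable names, so the caller must supply them.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_variables(values: dict, required: list) -> list:
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]
```

For example, `missing_variables(read_env_file(Path(".env")), ["EXAMPLE_ACCOUNT", "EXAMPLE_KEY"])` returns the names still to be filled in; both names here are placeholders.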

2.2 Installation ⏬

Poetry

orchex uses poetry (do not use pip or conda). To create the environment:

Windows

poetry env use 3.12
poetry config virtualenvs.in-project true
poetry install

# to activate the env
poetry shell

MacOS

poetry env use 3.12
poetry config virtualenvs.in-project true

poetry config --local installer.no-binary pyodbc

poetry install

# to activate the env
poetry shell

Linux/ Eedi VM

export PYTHON_KEYRING_BACKEND=keyring.backends.fail.Keyring

poetry env use 3.12
poetry config virtualenvs.in-project true

poetry install

# to activate the env
poetry shell

❗ NOTE: if you get the following error:

This error originates from the build backend, and is likely not a problem with poetry but with multidict (6.0.4) not supporting PEP 517 builds. You can verify this by running 'pip wheel --use-pep517 "multidict (==6.0.4)"'.

Run:

poetry shell
pip install --upgrade pip
MULTIDICT_NO_EXTENSIONS=1 pip install multidict
poetry add inflect
poetry add pyodbc

# if the packages are not reinstalled, then run:
poetry update

Run 🏃

Example run, where foo is a function:

from orchex.dataextract import DataExtract

data_extract = DataExtract(
    name="model-agnostic-data-extract",
    description="""A model-agnostic extract of Eedi data.""",
    container_path="data",
)

topic_pathway_collection_ids = (4, 5, 6, 7, 9, 10, 11)

# foo stands for the function mentioned above; it is not defined in this snippet
answers_ds = data_extract.get_or_set_data_source(
    "answers",
    foo,
    topic_pathway_collection_ids=topic_pathway_collection_ids,
)
print(answers_ds.head())
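
The name get_or_set_data_source suggests a get-or-create pattern: reuse a source that already exists under that name, otherwise call the function with the given keyword arguments and store the result. The sketch below shows that pattern in isolation. It is an assumption based on the method name only, not orchex's implementation, and `SourceRegistry` is a hypothetical class.

```python
from typing import Any, Callable


class SourceRegistry:
    """Toy registry illustrating a get-or-create lookup by name."""

    def __init__(self) -> None:
        self._sources: dict = {}

    def get_or_set(self, name: str, loader: Callable[..., Any], **kwargs: Any) -> Any:
        # Call the loader only the first time a name is requested
        if name not in self._sources:
            self._sources[name] = loader(**kwargs)
        return self._sources[name]


calls = []


def load_answers(topic_pathway_collection_ids):
    """Hypothetical loader that records how often it is invoked."""
    calls.append(topic_pathway_collection_ids)
    return [{"collection_id": i} for i in topic_pathway_collection_ids]


registry = SourceRegistry()
first = registry.get_or_set("answers", load_answers, topic_pathway_collection_ids=(4, 5))
second = registry.get_or_set("answers", load_answers, topic_pathway_collection_ids=(4, 5))
```

In this sketch the second call returns the stored object without running the loader again, which is the main benefit of the pattern when the loader is an expensive database query.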

Using `orchex` in other repositories

Previously we would have installed the package globally using `pip install -e .`. With Poetry, you simply add a dependency on the local package.

Clone the repository:

git clone [email protected]:Eedi/orchex.git

In your other repository, add the following to the pyproject.toml:
```
orchex = {path = <path-to-orchex>, develop=true}
```
Example: orchex was cloned in the parent directory of the current project.
```
orchex = {path = "../orchex", develop = true}
```
The develop flag should mean that your installation is automatically updated when orchex is edited.
Some environment variables (.env and .sheets) are required for some components. Contact Simon for details.

You can now import this package:

from orchex.dataextract import DataExtract, DataSource
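
To confirm that the import resolves to your local clone rather than to some other installed copy, you can ask Python where it finds the package. This is a small sketch using the standard library's `importlib.util.find_spec`; the helper name `locate_package` is hypothetical.

```python
import importlib.util


def locate_package(name: str):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the path of orchex/__init__.py if the package is installed, else None
print(locate_package("orchex"))
```

With a develop install, the printed path should point inside the cloned orchex directory.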

If you then update this package, it should update automatically (if develop = true). If this does not happen, you should be able to just run `poetry update orchex`, but you may need to reinstall your Poetry environment. To do so:
- Close any IDEs (e.g. VS Code) that might be using the environment. (Otherwise the following steps will fail.)
- Run `poetry env list` to get the name of the environment.
- Remove the environment, e.g. `poetry env remove orchex-fYa19ibp-py3.12`.
- Delete the environment folder (e.g. under E:\packages\poetry\virtualenvs). This is necessary; otherwise the next step will just reinstall some cached versions.
- Reinstall with `poetry install`.
- In VS Code you may need to manually select the new environment: press Ctrl-Shift-P, then click "Enter interpreter path...".
