This repository contains tutorials, scripts, examples etc. for getting started with your machine learning / NLP project.
The information is mainly tailored to users of DFKI's PEGASUS system.
- https://code.visualstudio.com/
- Python with VSCode: https://donjayamanne.github.io/pythonVSCodeDocs/docs/python-path/
- https://www.jetbrains.com/pycharm/
One of the key features of any good IDE is its debugging support. The debugger will make it much easier to fix your code (no need for print-statements anymore).
Tutorials for debuggers:
- https://code.visualstudio.com/docs/editor/debugging
- https://www.youtube.com/watch?v=6cOsxaNC06c
- How To Debug Python Code In Visual Studio Code (VSCode) https://www.youtube.com/watch?v=oCcTiRGPogQ
- If you are a student, apply for a GitHub education account.
- Install GitHub Copilot! See VSCode extensions.
Get familiar with coding standards and best practices! This improves your code a lot and makes it much easier to maintain.
You can use automated tools to enforce coding styles:
Today's machine learning requires large computing resources that your local machine won't have. Thus, you need to connect to a remote server to run your experiments.
- How to Use SSH to Connect to a Remote Server in Linux or Windows
- https://projects.dfki.uni-kl.de/km-publications/web/ML/core/hpc-doc/docs/guidelines/getting-started/
SSH keys
- https://linuxhandbook.com/ssh-config-file/
- https://www.cyberciti.biz/faq/create-ssh-config-file-on-linux-unix/
An SSH config may contain entries like the one below. Replace <dfki_account> with your DFKI account and <pegasus_account> with your PEGASUS account.
# PEGASUS via SSH-Gate
Host pegasus.dfki # a custom hostname
User <pegasus_account>
HostName login2.pegasus.kl.dfki.de # change this to a different login node if needed
ProxyJump <dfki_account>@sshgate.sb.dfki.de
With such a config, you can simply connect to PEGASUS by typing ssh pegasus.dfki.
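The entry above can also be appended from a script. The following is a minimal sketch, not part of the original setup: the function name `write_pegasus_ssh_entry` and its optional config file argument are hypothetical, while the host names are copied from the example entry.

```shell
# Hypothetical helper: append the PEGASUS entry to an SSH config file.
# Usage: write_pegasus_ssh_entry <dfki_account> <pegasus_account> [config_file]
write_pegasus_ssh_entry() {
    dfki_account="$1"
    pegasus_account="$2"
    config_file="${3:-$HOME/.ssh/config}"

    # Make sure the target directory exists
    mkdir -p "$(dirname "$config_file")"

    # Append the entry; existing entries in the file are kept
    cat >> "$config_file" <<EOF

# PEGASUS via SSH-Gate
Host pegasus.dfki
    User ${pegasus_account}
    HostName login2.pegasus.kl.dfki.de
    ProxyJump ${dfki_account}@sshgate.sb.dfki.de
EOF

    # SSH rejects config files that other users can write to
    chmod 600 "$config_file"
}
```

Calling the function twice appends the entry twice, so check the file before running it again.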
An SSH connection can be used as a proxy to access resources from the intranet:
# replace <proxy_port> with a port number, e.g. 8001
ssh -D <proxy_port> pegasus.dfki
This creates a SOCKS proxy (see https://ma.ttias.be/socks-proxy-linux-ssh-bypass-content-filters/).
Enable this proxy via your system settings or browser settings, or use a proxy browser plugin like FoxyProxy.
Use the following settings:
- Proxy Type: SOCKS5
- Proxy IP address: localhost
- Proxy port: <proxy_port>
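The proxy can also be used from the command line without changing browser settings. This is a rough sketch under the assumption that `curl` is installed locally and that `ssh -D <proxy_port> pegasus.dfki` is already running in another terminal; the function name `curl_via_proxy` is hypothetical.

```shell
# Hypothetical helper: fetch a URL through the local SOCKS proxy.
# Usage: curl_via_proxy <proxy_port> <url>
curl_via_proxy() {
    proxy_port="$1"
    url="$2"
    # --socks5-hostname lets the proxy resolve the host name as well,
    # which matters for names that only exist inside the intranet
    curl --silent --show-error --socks5-hostname "localhost:${proxy_port}" "$url"
}
```

For example, `curl_via_proxy 8001 <intranet_url>` requests the page through the tunnel opened on port 8001.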
Your connection to a remote server might be lost. Therefore, it is important to keep sessions on the remote server independent of your own connection.
tmux provides this and many other features that will make your work on remote servers much easier.
Alternatives to tmux are: screen, ...
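A typical tmux workflow is to keep one named session per project and re-attach to it after every login. The sketch below assumes tmux is installed on the remote server; the function name `tmux_session` and the default session name are hypothetical.

```shell
# Hypothetical helper: attach to a named tmux session, creating it if needed.
# Usage: tmux_session [name]
tmux_session() {
    session_name="${1:-experiments}"
    # -A attaches if the session already exists, otherwise a new one is created
    tmux new-session -A -s "$session_name"
}
```

Inside the session, the default key binding Ctrl-b followed by d detaches without stopping running programs. After a lost connection, log in again and call `tmux_session` with the same name to continue where you left off.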
The .bashrc in your home directory is loaded every time you start a bash session.
It is a good place to define global environment variables, e.g., cache paths or login credentials. For example:
# append to "~/.bashrc"
# shortcuts
alias ll="ls -l"
# PIP cache
# https://projects.dfki.uni-kl.de/km-publications/web/ML/core/hpc-doc/posts/pypi-cache/
export PIP_INDEX_URL=https://pypi-cache/index
export PIP_TRUSTED_HOST=pypi-cache
export PIP_NO_CACHE=true
# Huggingface
export HF_LOGIN=<your huggingface login>
export HF_PASSWORD=<your huggingface API key>
export HF_DATASETS_CACHE="/netscratch/$USER/datasets/hf_datasets_cache/"
export TRANSFORMERS_CACHE="/netscratch/$USER/datasets/transformers_cache/"
# Weights & Biases
export WANDB_API_KEY=<your WANDB API key>
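Before starting a long job, it can help to verify that the variables from ~/.bashrc are actually set in the current shell. The following is a small sketch for that; the function name `check_env_vars` is hypothetical, and the variable names in the usage note are the ones from the example above.

```shell
# Hypothetical helper: report which of the given environment variables
# are unset or empty. Prints one missing name per line and returns 1
# if at least one variable is missing, 0 otherwise.
check_env_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (works in POSIX sh)
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_env_vars HF_DATASETS_CACHE TRANSFORMERS_CACHE WANDB_API_KEY` prints the names that still need to be defined.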
Read the PEGASUS documentation. It should provide all necessary information. For other questions, please use the cluster chat.
Some potentially useful commands:
# starts an interactive job with pytorch (8hrs time limit)
srun -K \
--container-image=/netscratch/enroot/nvcr.io_nvidia_pytorch_23.07-py3.sqsh \
--container-workdir="`pwd`" \
--container-mounts=/netscratch/$USER:/netscratch/$USER,/ds:/ds:ro,"`pwd`":"`pwd`" \
--time 08:00:00 --pty bash
# list your current jobs
squeue --me
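If you start interactive jobs often, the srun call above can be wrapped in a shell function. This is only a sketch: the function name `interactive_job` and its arguments are hypothetical, and the flags are copied unchanged from the example above.

```shell
# Hypothetical helper: start an interactive job in a given enroot image.
# Usage: interactive_job <image.sqsh> [time_limit]
interactive_job() {
    image="$1"
    time_limit="${2:-08:00:00}"   # default: 8 hours, as in the example above
    srun -K \
        --container-image="$image" \
        --container-workdir="$(pwd)" \
        --container-mounts="/netscratch/$USER:/netscratch/$USER,/ds:/ds:ro,$(pwd):$(pwd)" \
        --time "$time_limit" --pty bash
}
```

For example, `interactive_job /netscratch/enroot/nvcr.io_nvidia_pytorch_23.07-py3.sqsh 02:00:00` requests the same container with a two-hour limit.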
PEGASUS uses enroot for containers. If you have pre-built Docker images, you can convert them as follows:
srun -p $ANY_PARTITION \
enroot import \
-o /netscratch/$USER/enroot/malteos_eulm_latest.sqsh \
'docker://ghcr.io#malteos/eulm:latest'
Build custom images with Podman:
sbin/podman_build.sh
You can run Jupyter notebooks on PEGASUS:
# start interactive compute job
# --container-save=$EVAL_DEV_IMAGE
srun \
--container-mounts=/netscratch:/netscratch,/home/$USER:/home/$USER \
--container-image=$IMAGE \
--container-workdir=$(pwd) -p RTX6000 --time 08:00:00 --pty /bin/bash
# run in compute job
echo "Jupyter starting at ... https://${HOSTNAME}.kl.dfki.de:8880" && jupyter notebook --ip=0.0.0.0 --port=8880 \
--allow-root --no-browser --config /home/mostendorff/.jupyter/jupyter_notebook_config.json \
--notebook-dir /netscratch/mostendorff/experiments
# start with fixed token (for VSCode -> "Specify Jupyter connection")
JUPYTER_TOKEN=yoursecrettoken jupyter notebook
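Instead of a hand-written token, you can generate a random one. The sketch below assumes that /dev/urandom, od, and tr are available inside the container, which is the case on most Linux images; the function name `generate_token` is hypothetical.

```shell
# Hypothetical helper: print a random 32-character hex string for use as a token.
generate_token() {
    # Read 16 random bytes, print them as hex, and strip spaces and newlines
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}
```

Store the result first so that you can paste it into VSCode later, for example with `export JUPYTER_TOKEN="$(generate_token)"` followed by `echo "$JUPYTER_TOKEN"`, and then start `jupyter notebook` as shown above.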