End-to-end showcase of MLOps for the Elixir ecosystem.
There is a scarcity of end-to-end ModelOps projects that are openly available, both to aid understanding of the state of the art and to demonstrate how such solutions may be deployed on custom infrastructure. The Emporium was therefore built to illustrate practical implementation techniques for Elixir teams and, secondarily, to assess the end-to-end performance of such implementations.
The Emporium assumes the following canonical architecture:
- x86_64 system, macOS or Linux
- Optional Nvidia CUDA-capable GPU
For more context, check out:
The application requires the following dependencies:

- LLVM / Clang (run `./vendor/setup-clang.sh`)
- CMake (run `apt-get install cmake`)
- LibTorch (run `./vendor/setup-libtorch.sh`)
- FFmpeg (run `apt-get install ffmpeg`)
- OpenCV (run `./vendor/setup-libtorch.sh`)
- Nvidia’s CUDA Toolkit (if using GPU)
This application comes with two models:

- Image Classification via ResNet-50
- Object Detection via YOLOv5
The former is pulled automatically by Bumblebee, so no action is required.
For the latter, clone YOLOv5 and export the pre-trained model:

```
$ git clone https://github.com/ultralytics/yolov5 ultralytics-yolov5
$ cd ultralytics-yolov5
$ asdf local python 3.8.5
$ pip install -r requirements.txt
$ python3 export.py --weights yolov5s.pt --include torchscript
```
Then copy `yolov5s.torchscript` to `apps/emporium_inference_yolov5/priv/yolov5s.torchscript`.
You can also use other model sizes via configuration:

```elixir
config :emporium_runner_yolov5, model_name: "yolov5x.torchscript"
```
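For reference, the runner can then resolve the configured model at startup along these lines (a minimal sketch: the application and file names follow the snippets above, and the actual lookup in the codebase may differ):

```elixir
# Sketch: resolve the configured model file from the app's priv directory.
# Names follow the snippets above; the real lookup may differ.
model_name =
  Application.get_env(:emporium_runner_yolov5, :model_name, "yolov5s.torchscript")

model_path = Path.join(:code.priv_dir(:emporium_inference_yolov5), model_name)
```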
The system is split into the following applications:
The Emporium Environment application is responsible for foundational services, including node clustering (enabling Erlang Distribution between nodes) and setting up facilities that allow custom Horde supervisors in other applications to function correctly.
The Emporium Proxy application provides the entry point for HTTP traffic (TLS is offloaded to the load balancer) and proxies all connections to other applications, such as Emporium Web or, if required, a future Admin app.
The Emporium Web application provides the web front end and hosts the Session LiveView, which is the main interaction element.
The Emporium Nexus application provides WebRTC ingress and orchestration for inference workloads. It hosts the Membrane framework, exposes RTP endpoints, and allows WebRTC connections to be made to the application cluster.
The Emporium Inference application is a façade that holds image conversion logic (via Evision), which is used in featurisation (pre-processing).
The Emporium Inference (YOLOv5) application provides Object Detection capabilities via YOLOv5, orchestrated via Sbroker, and implemented with PyTorch using a custom C++ program.
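To illustrate the sbroker pattern mentioned above (a sketch only, not the project's actual code: the module and message names here are assumptions), a client call might look like this, with workers checking in via `:sbroker.ask_r/2` on the other side:

```elixir
defmodule EmporiumInferenceYOLOv5.Client do
  # Hypothetical sketch of the sbroker pattern: clients ask the broker for a
  # worker; when none is available within the queue's limits, the request is
  # dropped, providing back-pressure instead of an unbounded queue.
  def detect(frame) do
    case :sbroker.ask(EmporiumInferenceYOLOv5.Broker, self()) do
      {:go, _ref, worker, _relative_time, _sojourn_time} ->
        # A worker process that checked in via :sbroker.ask_r/2.
        GenServer.call(worker, {:detect, frame})

      {:drop, _sojourn_time} ->
        {:error, :overloaded}
    end
  end
end
```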
The Emporium Inference (ResNet) application provides Image Classification capabilities via Bumblebee and the `microsoft/resnet-50` model.
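As a rough sketch of what this involves (illustrative only; Emporium's actual serving setup, batching, and featurisation may differ):

```elixir
# Illustrative sketch of ResNet-50 classification via Bumblebee.
{:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

serving = Bumblebee.Vision.image_classification(model_info, featurizer, top_k: 5)

# Stand-in for a decoded video frame: an HWC, RGB Nx tensor.
image = Nx.broadcast(Nx.tensor(0, type: :u8), {224, 224, 3})

%{predictions: predictions} = Nx.Serving.run(serving, image)
```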
To prepare the environment, install the prerequisites above; if you are using an Nvidia GPU on Linux, also install the CUDA Toolkit, which you can find at CUDA Downloads.
Note that if you are using WSL 2, you should select Linux → x86_64 → WSL-Ubuntu → 2.0. It is critical not to use the normal Ubuntu packages, as those would install drivers, which are not necessary under WSL 2, where the driver is already installed on the Windows side.
To change between Object Detection and Image Classification, modify `EmporiumNexus.InferenceSink`.
At a bare minimum, you should configure `config/dev.env` based on the template at `config/dev.env.template`:

- `NGROK_SUBDOMAIN`: used when running ngrok, as specified in the `Procfile`
- `SECRET_KEY_BASE`: used for Phoenix dead & live views
- `TURN_IP`, `TURN_MOCK_IP`: should be publicly routable from your clients for TCP/UDP TURN traffic
- `TURN_PORT_UDP_FROM`, `TURN_PORT_UDP_TO`: should be a high port range, e.g. 50000-65535
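For illustration only, a filled-in `config/dev.env` might look like the following (every value below is a placeholder; `config/dev.env.template` is authoritative):

```
# Placeholder values; see config/dev.env.template for the authoritative list
NGROK_SUBDOMAIN=my-emporium
SECRET_KEY_BASE=use-output-of-mix-phx.gen.secret
TURN_IP=203.0.113.10
TURN_MOCK_IP=203.0.113.10
TURN_PORT_UDP_FROM=50000
TURN_PORT_UDP_TO=65535
```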
You may run into some problems which are documented in “Common Problems”.
- Investigate real-time display of metrics
- Investigate YOLOv8
- Investigate operational pattern for TensorRT
- Investigate deployment topology / AWS CFN
- Investigate IPv6 usage
This may be related to whether you are using HTTPS. WebRTC requires a secure context, and the fastest way to achieve full compliance is to configure and use ngrok, then use the ngrok domain for all testing purposes.
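For reference, such an invocation might look like the following (a sketch assuming ngrok v2-style flags and Phoenix on its default port 4000; the `Procfile` entry is authoritative):

```
ngrok http -subdomain=$NGROK_SUBDOMAIN 4000
```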
For local deployment, consider CloudFlare Zero Trust / CloudFlare Access.
FastTLS is a dependency of the Membrane framework. On Linux, as long as `pkg-config` exists, OpenSSL should be found automatically, but this can be a problem on macOS. It is a known problem, and the workaround is as follows:
```
export LDFLAGS="-L/usr/local/opt/openssl/lib"
export CFLAGS="-I/usr/local/opt/openssl/include/"
export CPPFLAGS="-I/usr/local/opt/openssl/include/"
export PKG_CONFIG_PATH="/usr/local/opt/openssl@3/lib/pkgconfig:$PKG_CONFIG_PATH"
mix deps.compile fast_tls
```
### Unable to compile YOLOv5 Runner
If no `CMAKE_CUDA_COMPILER` could be found, it may be due to improper configuration of the CUDA Toolkit. Consider adding the following to `~/.zshrc` or equivalent:
```
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
```
Once `nvcc` can be found, this problem resolves itself.
Membrane’s WebRTC implementation includes an integrated TURN server, so you should set both `TURN_IP` and `TURN_MOCK_IP` to publicly routable IPs:

- `TURN_IP` is the IP of the interface that the TURN server listens on
- `TURN_MOCK_IP` is the IP that is presented to the client
See the following post for more information:
This is under investigation; a workaround has been put in place (issue).
If you are not using a 40-series card, you may downgrade LibTorch to 1.13.x:

```
rm -rf ./vendor/libtorch
LIBTORCH_VERSION=1.13.0 ./vendor/setup-libtorch.sh
```