A Python gRPC worker serving a Vision Transformer model (vit-base-patch16-224), available on Hugging Face 🤗
The Vision Transformer, or ViT, is a model for image classification that employs a Transformer-like architecture over patches of the image. An image is split into fixed-size patches, each of which is then linearly embedded; position embeddings are added, and the resulting sequence of vectors is fed to a standard Transformer encoder. To perform classification, the standard approach of adding an extra learnable “classification token” to the sequence is used.
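The patching step described above can be sketched numerically. This is a minimal illustration (not the worker's actual preprocessing code), assuming the vit-base-patch16-224 configuration: a 224x224 RGB image split into 16x16 patches.

```python
import numpy as np

# Dummy 224x224 RGB image (H x W x C), as expected by vit-base-patch16-224.
image = np.random.rand(224, 224, 3)
patch = 16

# Split into non-overlapping 16x16 patches: shape (14, 16, 14, 16, 3),
# then reorder so the two patch-grid axes come first: (14, 14, 16, 16, 3).
patches = image.reshape(224 // patch, patch, 224 // patch, patch, 3)
patches = patches.transpose(0, 2, 1, 3, 4)

# Flatten each patch into a vector: 196 patches of dimension 16*16*3 = 768.
# These vectors are what the linear embedding and position embeddings act on.
tokens = patches.reshape(-1, patch * patch * 3)
print(tokens.shape)  # (196, 768)
```

The model then prepends the learnable classification token, giving a sequence of 197 vectors for the Transformer encoder.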
For additional resources, please refer to Papers with Code.
git clone [email protected]:rm-rfred/vit-worker.git
cd vit-worker
docker-compose build
docker-compose up -d
image_classification_pb2.py and image_classification_pb2_grpc.py were generated by running:
bash run_protoc.sh
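The exact contents of run_protoc.sh are not shown here, but stub generation for a proto file typically uses the grpc_tools.protoc module from the grpcio-tools package. A likely equivalent invocation (the protos/ path and proto filename are assumptions) would be:

```shell
# Regenerate the *_pb2.py and *_pb2_grpc.py stubs from the proto definition
# (paths are illustrative; adjust to match this repository's layout).
python -m grpc_tools.protoc \
    -I protos \
    --python_out=. \
    --grpc_python_out=. \
    protos/image_classification.proto
```

Rerun this whenever the .proto file changes, so the generated Python stubs stay in sync with the service definition.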
- Higher performance in microservice architectures
- Well suited for high-load APIs
- Better suited for real-time / streaming apps
Docker version 24.0.6, build ed223bc
Docker Compose version v2.23.0