Predict tags for StackOverflow posts using a multilabel classification approach.
To initialize the cluster we have to:
- Build the Dockerfile in src/redirecting_service and tag the image remla-redirecting-service:latest (this will later be replaced by an online build, but a local build is easier for testing).
- Build the ingress controller by running:
kubectl apply -f .\k8s-local-ingress-controller.yaml
- Run:
kubectl apply -f .\k8s-local-deployment.yaml
- Forward the port to the host:
kubectl port-forward --namespace=ingress-nginx service/ingress-nginx-controller 8080:80
- The redirecting service is exposed at https://remla.localdev.me
Minikube
minikube addons enable ingress
kubectl apply -f .\k8s-local-deployment.yaml
- Forward the port to the host:
kubectl port-forward --namespace=ingress-nginx service/ingress-nginx-controller 8080:80
Cleanup
To remove old model deployments:
kubectl delete all --all
- Go to https://remla.localdev.me/admin and enter the version of the model you want to deploy. The version should be >= 1.6.1.
- Wait until the model service has deployed to the cluster successfully (it is best to check this manually).
- Refresh the page. The dropdown should contain a list of models. If there is only one model, it becomes active automatically. The active model is the one in production. The remaining models are shadow models.
- Select the model you want to release in the dropdown and press Set Active.
- Go to https://remla.localdev.me. Enter a description and press predict to get tag recommendations.
- On the result page, you can write feedback on how to improve the tags in the provided input field.
- Go to https://remla.localdev.me/prometheus to check the current metrics. Model metrics can be found by filtering on 'model' under Status -> Targets.
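The relationship between the active model and the shadow models described in the steps above can be pictured with a small sketch. This is only an illustration of a common shadow-deployment pattern, not the redirecting service's actual code: the `ModelRouter` class, its method names, and the idea that shadow models receive the same request while their output is merely logged are all assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a deployed model service: takes a post title, returns tags.
Model = Callable[[str], List[str]]


class ModelRouter:
    """Toy router: the active model answers, the other models only shadow."""

    def __init__(self) -> None:
        self.models: Dict[str, Model] = {}
        self.active: Optional[str] = None
        self.shadow_log: List[Tuple[str, List[str]]] = []

    def deploy(self, version: str, model: Model) -> None:
        self.models[version] = model
        # With a single deployed model, it becomes active automatically.
        if len(self.models) == 1:
            self.active = version

    def set_active(self, version: str) -> None:
        if version not in self.models:
            raise KeyError(version)
        self.active = version

    def predict(self, title: str) -> List[str]:
        if self.active is None:
            raise RuntimeError("no model deployed")
        result = self.models[self.active](title)
        # Assumption: shadow models see the same input, but their output is only recorded.
        for version, model in self.models.items():
            if version != self.active:
                self.shadow_log.append((version, model(title)))
        return result


router = ModelRouter()
router.deploy("1.6.1", lambda title: ["python"])
router.deploy("1.6.2", lambda title: ["java"])
first_answer = router.predict("some post title")  # answered by 1.6.1
router.set_active("1.6.2")
second_answer = router.predict("some post title")  # answered by 1.6.2
```

Switching the active version changes which model answers the user, while the other deployed versions keep running in the background for comparison.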
- Dataset of post titles from StackOverflow
- Text data is transformed into numeric vectors using bag-of-words and TF-IDF.
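As a minimal sketch of the vectorization step, the snippet below uses scikit-learn's `CountVectorizer` (bag-of-words) and `TfidfVectorizer` with their default settings. The two titles are made up for illustration, and the project's real preprocessing and vectorizer parameters may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up post titles, used only for illustration.
titles = [
    "How to parse JSON in Python",
    "Python list comprehension syntax",
]

# Bag-of-words: one column per token, each cell holds a raw count.
bow_vectorizer = CountVectorizer()
bow = bow_vectorizer.fit_transform(titles)

# TF-IDF: counts are reweighted so tokens shared by many titles count less.
tfidf_vectorizer = TfidfVectorizer()
tfidf = tfidf_vectorizer.fit_transform(titles)

vocab = tfidf_vectorizer.vocabulary_  # maps token -> column index
print(bow.shape, tfidf.shape)
# "python" occurs in both titles, so it is weighted lower than "json" in the first title.
print(tfidf[0, vocab["python"]] < tfidf[0, vocab["json"]])
```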
MultiLabelBinarizer transforms the labels into binary form, so each prediction is a mask of 0s and 1s.
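The label encoding can be sketched with scikit-learn's `MultiLabelBinarizer`. The tag lists below are invented examples, not rows from the dataset.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Invented tag lists, one list per post.
tags = [["python", "json"], ["python"], ["java"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)  # one row per post, one column per known tag

print(list(mlb.classes_))  # column order: the tags in sorted order
print(y.tolist())          # each row is a 0/1 mask over those columns

# A predicted mask can be mapped back to tag names.
decoded = mlb.inverse_transform(y)
print(decoded)
```

The same fitted binarizer is used in both directions: to encode training labels, and to turn a predicted mask back into readable tags.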
Logistic Regression for Multilabel classification
- Coefficient = 10
- L2-regularization technique
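A rough sketch of this classifier setup follows. It reads "Coefficient = 10" as scikit-learn's inverse regularization strength `C=10`, and wraps the model in `OneVsRestClassifier` to get one binary classifier per tag. Both choices are assumptions, and the toy titles and tags are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Invented toy data: every tag has both positive and negative examples.
titles = [
    "parse json in python",
    "python list comprehension",
    "java string to int",
    "java parse json string",
]
tags = [["python", "json"], ["python"], ["java"], ["java", "json"]]

X = TfidfVectorizer().fit_transform(titles)
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# Assumption: "Coefficient = 10" means C=10. L2 is LogisticRegression's default penalty.
clf = OneVsRestClassifier(LogisticRegression(C=10, max_iter=1000))
clf.fit(X, y)

pred = clf.predict(X)  # 0/1 mask with one column per tag
print(mlb.inverse_transform(pred))
```

One-vs-rest trains an independent binary logistic regression per tag, which is why a single post can receive several tags at once.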
Results are evaluated using several classification metrics.
Libraries used:
- NumPy: a package for scientific computing.
- Pandas: a library providing high-performance, easy-to-use data structures and data analysis tools for Python.
- scikit-learn: a tool for data mining and data analysis.
- NLTK: a platform for working with natural language.
Everything in the data/ directory is tracked by DVC.
Dockerfiles are found in the docker folder. Note that the build context should be the root project folder, and not the folder the dockerfile is contained in.
To build the inference API:
docker build -f docker/inference-api/Dockerfile -t inference-api .
To build the redirecting service:
docker build -f docker/redirecting-service/Dockerfile -t redirecting-service .
Note: this sample project was originally created by @partoftheorigin