Multilabel classification on Stack Overflow tags

Predicts tags for Stack Overflow posts using a multilabel classification approach.

Kubernetes instructions

To initialize the cluster:

Build the Dockerfile in src/redirecting_service and tag the image remla-redirecting-service:latest (this will later be replaced by an online build, but a local build is easier for testing).
Run kubectl apply -f .\k8s-local-ingress-controller.yaml to create the ingress controller.
Run kubectl apply -f .\k8s-local-deployment.yaml
Forward the port to the host: kubectl port-forward --namespace=ingress-nginx service/ingress-nginx-controller 8080:80
The redirecting service is then exposed at https://remla.localdev.me

Minikube

minikube addons enable ingress
kubectl apply -f .\k8s-local-deployment.yaml
Forward the port to the host: kubectl port-forward --namespace=ingress-nginx service/ingress-nginx-controller 8080:80

Cleanup: to remove old model deployments, run kubectl delete all --all

Instructions for using the app

Go to https://remla.localdev.me/admin and enter the version of the model you want to deploy. The version should be >= 1.6.1.
Wait until the model service has deployed to the cluster successfully (it is best to check this manually).
Refresh the page. The dropdown should now contain a list of models. If there is only one model, it becomes active automatically. The active model is the one in production; the remaining models are shadow models.
Select the model you want to release in the dropdown and press Set Active.
Go to https://remla.localdev.me, enter a description, and press predict to get tag recommendations.
For a given result, you can write feedback on how to improve the tags in the provided input field.
Go to https://remla.localdev.me/prometheus to check current metrics. Model metrics can be found by filtering on 'model' under Status -> Targets.

Dataset

A dataset of post titles from Stack Overflow.

Transforming text to a vector

Text data is transformed into numeric vectors using bag-of-words and TF-IDF.
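As an illustration of the two representations, here is a minimal from-scratch sketch. It is not the project's code: the function names, the whitespace tokenizer, and the sample titles are assumptions made for this example, and the unsmoothed idf = ln(N / df) formula differs from library defaults (scikit-learn's TfidfVectorizer, for instance, smooths the idf and L2-normalizes each vector by default).

```python
import math
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in a lowercased text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def tfidf_vectors(texts):
    """Return (vocabulary, vectors) where each weight is tf * ln(N / df)."""
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({word for doc in tokenized for word in doc})
    n_docs = len(texts)
    # Document frequency: in how many texts does each word appear?
    df = Counter(word for doc in tokenized for word in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = max(len(doc), 1)  # avoid dividing by zero on empty texts
        vectors.append(
            [(counts[w] / length) * math.log(n_docs / df[w]) for w in vocabulary]
        )
    return vocabulary, vectors


# Hypothetical post titles, used only for illustration.
titles = ["python list sort", "java list", "python dict"]
vocab, vectors = tfidf_vectors(titles)
first_bow = bag_of_words(titles[0], vocab)
```

In this sketch a word that occurs in fewer titles ("sort") receives a larger weight than one shared across titles ("python"), which is the effect TF-IDF is meant to have.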

MultiLabel classifier

MultiLabelBinarizer transforms the labels into binary form, so each prediction is a mask of 0s and 1s.
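To show what such a binary mask looks like, the sketch below reimplements the idea in plain Python. The helper names and the sample tags are hypothetical and exist only for this example; the project itself relies on scikit-learn's MultiLabelBinarizer.

```python
def fit_classes(tag_lists):
    """Collect the sorted set of all tags seen in the training labels."""
    return sorted({tag for tags in tag_lists for tag in tags})


def binarize(tag_lists, classes):
    """Turn each list of tags into a 0/1 mask aligned with `classes`."""
    index = {tag: i for i, tag in enumerate(classes)}
    rows = []
    for tags in tag_lists:
        row = [0] * len(classes)
        for tag in tags:
            if tag in index:  # tags not seen during fitting are ignored
                row[index[tag]] = 1
        rows.append(row)
    return rows


def inverse(mask, classes):
    """Map a 0/1 mask back to the tag names it stands for."""
    return [tag for tag, bit in zip(classes, mask) if bit]


# Hypothetical tag sets for three posts.
tags = [["python", "list"], ["java"], ["python", "dict"]]
classes = fit_classes(tags)
masks = binarize(tags, classes)
```

A predicted mask can be turned back into tag names with inverse(mask, classes).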

Logistic Regression for Multilabel classification

Coefficient = 10
L2-regularization technique
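The following is a rough from-scratch sketch of how L2-regularized logistic regression can be applied to a multilabel problem. It rests on two assumptions that the text above does not confirm: that "Coefficient = 10" refers to the inverse regularization strength (the role of the C parameter in scikit-learn's LogisticRegression), and that one independent binary classifier is trained per tag (one-vs-rest). The function names, learning rate, epoch count, and toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_binary(X, y, C=10.0, lr=0.1, epochs=2000):
    """Batch gradient descent on log-loss + ||w||^2 / (2 * C).

    C is assumed to be the inverse regularization strength;
    the bias term is left unregularized.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [wj / C for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def fit_one_vs_rest(X, Y, C=10.0):
    """Train one binary classifier per label column of the mask matrix Y."""
    return [fit_binary(X, [row[k] for row in Y], C=C) for k in range(len(Y[0]))]


def predict(models, x):
    """Return a 0/1 mask: label k is on when its classifier's score is positive."""
    return [
        1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0
        for w, b in models
    ]


# Toy data (hypothetical): label k is present exactly when feature k is present.
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]
models = fit_one_vs_rest(X, Y, C=10.0)
```

A larger C in this formulation means a weaker penalty on the weights; in practice a library implementation is preferable to hand-written gradient descent.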

Evaluation

Results are evaluated using several classification metrics.

Libraries

NumPy: a package for scientific computing.
Pandas: a library providing high-performance, easy-to-use data structures and data analysis tools for Python.
scikit-learn: a tool for data mining and data analysis.
NLTK: a platform for working with natural language.

DVC

Everything in the data/ directory is tracked by DVC.

Docker

Dockerfiles are found in the docker folder. Note that the build context should be the root project folder, not the folder that contains the Dockerfile.

To build the inference API: docker build -f docker/inference-api/Dockerfile -t inference-api .

To build the redirecting service: docker build -f docker/redirecting-service/Dockerfile -t redirecting-service .

Note: this sample project was originally created by @partoftheorigin
