
Hugging Face GPT Chatbot Example


TLDR: NLP models often expect a very specific set of inputs, and Transformers models are no different. In this project, we create a small wrapper class that packages a Hugging Face model in the mlflow.pyfunc flavor.
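To make the idea concrete, here is a minimal sketch of what such a wrapper could look like. It is not the class shipped in this repo: the class name `ChatbotWrapper`, the artifact key `hf_model`, and the `text` input column are all assumptions made for illustration. The MLflow import is guarded only so the sketch can be read and tested without MLflow installed.

```python
try:
    # Real base class when MLflow is installed.
    from mlflow.pyfunc import PythonModel
except ImportError:
    # Fallback so this sketch stays importable without MLflow.
    PythonModel = object


class ChatbotWrapper(PythonModel):
    """Hypothetical pyfunc wrapper around a Hugging Face causal language model."""

    def load_context(self, context):
        # Imported lazily: only needed when the model is actually loaded.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        # "hf_model" is an assumed artifact key pointing at a saved model directory.
        model_path = context.artifacts["hf_model"]
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForCausalLM.from_pretrained(model_path)

    def predict(self, context, model_input):
        # model_input is assumed to expose a "text" column (e.g. a pandas DataFrame).
        replies = []
        for text in model_input["text"]:
            # DialoGPT expects each turn to end with the end-of-sequence token.
            input_ids = self.tokenizer.encode(
                text + self.tokenizer.eos_token, return_tensors="pt"
            )
            output_ids = self.model.generate(
                input_ids,
                max_length=1000,
                pad_token_id=self.tokenizer.eos_token_id,
            )
            # Keep only the tokens generated after the prompt.
            prompt_len = input_ids.shape[-1]
            reply_ids = output_ids[0][prompt_len:]
            replies.append(
                self.tokenizer.decode(reply_ids, skip_special_tokens=True)
            )
        return replies
```

This version handles single-turn input only. A multi-turn chat would also need to carry the conversation history between calls.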

In the inference notebook, we have an example where we:

  1. Download the pretrained model and tokenizer from Hugging Face
  2. Save the model into MLflow Model Registry
  3. Promote the model to Production
  4. Load the model
  5. Get some predictions (concretely, we're having a chat with the GPT Bot 😄)
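As a rough sketch of steps 2 to 4, the registry part of that flow could look like the code below. It is not taken from the notebook: the model name `dialogpt-chatbot` and both helper functions are hypothetical, and it assumes an MLflow version that still supports registry stages such as Production.

```python
MODEL_NAME = "dialogpt-chatbot"  # hypothetical registered model name


def model_uri(name: str, stage: str = "Production") -> str:
    """Build the registry URI that resolves a model by name and stage."""
    return f"models:/{name}/{stage}"


def register_and_promote(python_model, artifacts: dict, name: str = MODEL_NAME):
    """Log a pyfunc model, register it, promote it to Production, and load it back."""
    # Imported here so model_uri() can be used without MLflow installed.
    import mlflow
    from mlflow.tracking import MlflowClient

    # Step 2: log the wrapped model and register it in one call.
    with mlflow.start_run():
        mlflow.pyfunc.log_model(
            artifact_path="model",
            python_model=python_model,
            artifacts=artifacts,
            registered_model_name=name,
        )

    # Step 3: promote the newest registered version to Production.
    client = MlflowClient()
    versions = client.search_model_versions(f"name='{name}'")
    latest = max(versions, key=lambda v: int(v.version))
    client.transition_model_version_stage(
        name=name, version=latest.version, stage="Production"
    )

    # Step 4: load the Production model through the registry URI.
    return mlflow.pyfunc.load_model(model_uri(name))
```

The returned object exposes `predict`, so step 5 comes down to calling it with the input format the wrapper expects.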


A State-of-the-Art Large-scale Pretrained Response generation model (DialoGPT)

This repo contains a basic example of a Hugging Face model for language modeling using Transformers, specifically microsoft/DialoGPT-medium.

From their Hugging Face repo:

DialoGPT is a SOTA large-scale pretrained dialogue response generation model for multiturn conversations. The human evaluation results indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test. The model is trained on 147M multi-turn dialogue from Reddit discussion thread.

Multi-turn generation examples from an interactive environment:

| Role | Response |
|------|----------|
| User | Does money buy happiness? |
| Bot  | Depends how much money you spend on it . |
| User | What is the best way to buy happiness ? |
| Bot  | You just have to be a millionaire by your early 20s, then you can be happy . |
| User | This is so difficult ! |
| Bot  | You have no idea how hard it is to be a millionaire and happy . There is a reason the rich have a lot of money |


  1. Databricks + MLflow
  2. FastAPI
  3. NGINX
  4. ReactJS


Databricks Serverless ML Endpoints

  • Clone this repo into your Databricks workspace.
  • Make sure to use a cluster running an LTS ML runtime.
  • In the databricks folder, run the notebook. This generates the dataset used to further fine-tune our DialoGPT model.
  • Run the notebook. This fine-tunes our model on the dataset generated in the previous step.
  • Run the notebook. This registers our model in the MLflow Model Registry and generates some predictions.
  • Once the model is registered, use it to create a REST Realtime Endpoint (Model Serving V2).
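Once the endpoint is up, it can be queried over REST. The sketch below builds such a request with the Python standard library only. Three things in it are assumptions: `endpoint_url` stands for the invocation URL shown for your endpoint in the workspace, the `dataframe_records` payload is just one of the input formats MLflow scoring servers accept, and the `text` column must match whatever the registered model actually expects.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url: str, token: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a model serving endpoint."""
    # Assumed payload shape: one record with a "text" column.
    payload = {"dataframe_records": [{"text": text}]}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Databricks REST APIs accept a personal access token as a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to check the payload before calling a live endpoint.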

User Interface

  • Be sure to have Docker installed.
  • Clone this repo.
  • Build the backend container image by running make backend.
  • Build the frontend container image by running make frontend.
  • Create a Databricks personal access token (PAT) in your workspace.
  • Copy the .env.example file to .env and fill in the parameters. Use the info from your workspace and the Model Serving V2 endpoint created in the previous section.
  • Run both containers by executing make run.
  • In your browser, go to http:
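Before running the containers, it can help to check that the .env file is filled in. The sketch below parses simple KEY=VALUE lines and reports empty or missing entries. The variable names in `REQUIRED_KEYS` are hypothetical, so replace them with the ones actually listed in .env.example.

```python
from pathlib import Path

# Hypothetical variable names; use the ones listed in .env.example.
REQUIRED_KEYS = ("DATABRICKS_TOKEN", "MODEL_ENDPOINT_URL")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_keys(parse_env(env_path.read_text()))
        print("Missing values:", missing or "none")
    else:
        print("No .env file found; copy .env.example to .env first.")
```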



See the issues section
