This repo contains the source code, user evaluation data, and analysis scripts for Ruler, a data programming by demonstration system for document labeling.
Ruler synthesizes labeling functions based on your span-level annotations, allowing you to quickly and easily generate large amounts of training data for text classification, without the need to program.
Check out our demo video to see Ruler in action on a spam classification task, or try it yourself on a sentiment analysis task.
News: We have recently released TagRuler, an extension of Ruler that you can use to generate labeled data for span annotation tasks. Check it out!
- What is Ruler?
- How to Run the Source Code in This Repo
- Using Ruler: the Basics
- For Researchers
- Contact
The success of machine learning has dramatically increased the demand for high-quality labeled data, but this data is expensive to obtain, which inhibits broader use of machine learning models outside resource-rich settings. That's where data programming [1, 2] comes in. Data programming aims to address the difficulty of collecting labeled data using a programmatic approach to weak supervision, where domain (subject-matter) experts are expected to provide functions incorporating their domain knowledge to label a subset of a large training dataset.
This approach has a few drawbacks, however. Many domain experts lack programming expertise, but it would still be useful to translate their knowledge into functions. For example, training models for the medical domain requires volumes of high-accuracy training data, but the medical experts' time is very valuable, limiting the amount of time they can spend labeling. Even for domain experts who are proficient programmers, it is often difficult to convert domain knowledge to a set of rules.
In short, the accessibility of writing labeling functions is a challenge to wider adoption of data programming. To address this challenge, we introduce a new framework, Data Programming by Demonstration (DPBD), to synthesize labeling functions through user interactions.
Overview of the data programming by demonstration (DPBD) framework. Straight lines indicate the flow of domain knowledge, and dashed lines indicate the flow of data.
DPBD aims to move the burden of writing labeling functions to an intelligent synthesizer while enabling users to steer this synthesis. Ruler is an interactive tool that operationalizes data programming by demonstration for document text.
An overview of the Ruler workflow. The user iteratively annotates and labels text, selects functions from those Ruler generates, and gets feedback on the performance of the set of labeling functions they have selected.
For example, consider a sentiment classification task. A labeling function might look something like this Python code:
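    # Illustrative label constants; the values are placeholders for this example.
    POSITIVE, NEGATIVE = 1, 0

    def find_positive_adj(text):
        # Label a document POSITIVE when it contains a positive adjective.
        if "awesome" in text or "great" in text:
            return POSITIVE
        else:
            return NEGATIVE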
Instead of formalizing this function as Python code, a user can use Ruler to annotate the words "awesome" and "great" to get the same function. This is the "demonstration" part of DPBD. Ruler functions can also make use of word co-occurrence, named entities, and more.
Once the user is satisfied with the functions they've created using Ruler, the functions' outputs are aggregated using Snorkel, whose label model denoises them. With this model, the user can label as much training data as they would like and use it to train a more sophisticated supervised model.
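To make the aggregation step concrete, here is a minimal sketch of several keyword labeling functions voting on each document. It uses a plain majority vote as a simpler stand-in for Snorkel's label model, which is not reproduced here. The names `keyword_lf`, `majority_vote`, and `label_texts`, the label constants, and the keyword lists are all hypothetical and are not part of Ruler.

```python
from collections import Counter

# Hypothetical label constants; Ruler's own encoding may differ.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def keyword_lf(keywords, label):
    """Build a labeling function that votes `label` when any keyword appears."""
    def lf(text):
        lowered = text.lower()
        return label if any(word in lowered for word in keywords) else ABSTAIN
    return lf


def majority_vote(votes):
    """Return the most common non-abstain vote, or ABSTAIN on no votes or a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN).most_common(2)
    if not counts:
        return ABSTAIN
    if len(counts) == 2 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


# Illustrative functions, similar in spirit to what annotating keywords could yield.
labeling_functions = [
    keyword_lf(["awesome", "great"], POSITIVE),
    keyword_lf(["terrible", "awful"], NEGATIVE),
    keyword_lf(["love"], POSITIVE),
]


def label_texts(texts):
    """Apply every labeling function to each text and aggregate the votes."""
    return [majority_vote([lf(t) for lf in labeling_functions]) for t in texts]
```

Unlike the single function shown earlier, each function here abstains when it has no evidence, so documents that no function covers stay unlabeled instead of defaulting to a class.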
By limiting users' task to simple annotation and selection from suggested rules, we allow fast exploration over the space of labeling functions.
This allows users to focus on
✅ choosing the right generalization of observed instances
✅ capturing the tail end of their data distribution
and avoid worrying about
❌ implementation details in a programming language
❌ how to express rules in natural language
❌ how to formalize their intuition
Follow these instructions to run the system on your own, where you can plug in your own data and save the resulting labels, models, and annotations.
The server runs on Flask and can be found in `server/`.
It is strongly recommended that you use Python version 3.6.
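If you are unsure which interpreter is active, a small check like the following can be run before installing the requirements. This is only a convenience sketch; `RECOMMENDED` and `matches_recommended` are hypothetical names and are not part of the repo.

```python
import sys

# The version recommended above.
RECOMMENDED = (3, 6)


def matches_recommended(version_info=sys.version_info):
    """Return True when the major.minor version equals the recommended one."""
    return tuple(version_info[:2]) == RECOMMENDED


if not matches_recommended():
    print(
        f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}; "
        f"{RECOMMENDED[0]}.{RECOMMENDED[1]} is recommended."
    )
```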
cd server
pip install -r requirements.txt
python api/server.py
Now the engine is running. To use Ruler, you will need to run the UI as well, described below.
You can check out https://localhost:5000/api/ui to see the supported endpoints. This will display a Swagger UI page that allows you to interact directly with the API.
The user interface is implemented with the React JavaScript library. The code can be found in `ui/`.
You can download node.js here.
To confirm that you have node.js installed, run `node -v`.
cd ui
npm install
npm start
By default, the app will make calls to `localhost:5000`, assuming that you have the server running on your machine (see the instructions above).
Once you have both of these running, navigate to `localhost:3000`.
Congrats, you've got Ruler running! 🎉
When you navigate to `localhost:3000`, you will be guided through the process of initializing your project.
- Upload data. There is some example data under [server/datasets/spam_example/processed.csv](server/datasets/spam_example/processed.csv). You can also upload your own data here; just make sure it's a valid CSV file and that your text column is named `text`. If you have labels you want to use for development, these should be in a column named `labels`. Ruler will automatically split your data into training (the data you interactively label), development (the data your functions are evaluated on), and test/validation (to evaluate the end model).
- Create/load a model. If you're iterating on a model you've previously saved, you can load it here. Otherwise, enter a name for your new model, and you will define the label classes in the next step.
- Define labels. WARNING: your label classes need to match the data you've uploaded. If your dataset has labels `{0: NON-SPAM, 1: SPAM}`, then you need to add the labels in this order to make sure they're mapped correctly. If you're loading a previous model, make sure these label classes match the dataset.
- Continue to project. You should automatically be redirected to `localhost:3000/project` once your data is pre-processed.
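Before uploading your own file, it can help to check it against the requirements above. The sketch below is not part of Ruler: `check_dataset` is a hypothetical helper, and it assumes that values in the `labels` column are integer indices into the label classes in the order you plan to define them (as in the `{0: NON-SPAM, 1: SPAM}` example).

```python
import csv
import io


def check_dataset(csv_file, label_classes=None):
    """Return a list of problems found in a CSV; an empty list means it looks usable."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    if "text" not in columns:
        return ["missing required column 'text'"]
    problems = []
    has_labels = "labels" in columns
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        if not (row["text"] or "").strip():
            problems.append(f"row {row_number}: empty text")
        if has_labels and label_classes is not None:
            raw = (row["labels"] or "").strip()
            try:
                index = int(raw)
            except ValueError:
                index = -1  # non-integer labels are treated as unmatched
            if not 0 <= index < len(label_classes):
                problems.append(f"row {row_number}: label {raw!r} has no matching class")
    return problems


# Example with an in-memory file; the last row has no text and an out-of-range label.
sample = io.StringIO("text,labels\nWin a free phone now,1\nSee you at lunch,0\n,2\n")
problems = check_dataset(sample, ["NON-SPAM", "SPAM"])
for problem in problems:
    print(problem)
```

For a file on disk, pass an open file handle instead, for example `open("my_data.csv", newline="")`, where the filename is a placeholder.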
Need some ideas? Try sentiment classification on this [Amazon Review dataset](https://www.kaggle.com/bittlingmayer/amazonreviews).
Upload this dataset, create a new model, define the labels `NON-SPAM` and `SPAM`, and get labeling.
Now you're at `localhost:3000/project`, where the magic happens.
A/B Highlight parts of the text, add links between them, or create concepts to annotate the data.
C Once you select a label class, Ruler will automatically suggest functions for you. Select and submit the ones you like.
D Your label model performance will update as you go, showing changes with each addition/deletion of a function.
E If you want to evaluate a model trained on your generated labels, click the refresh icon in this panel. This will train a logistic regression model on bag of words features and report the performance. You should use this sparingly to avoid overfitting to the test set. Note that this is a very simplistic model which may not be suitable for evaluating labels for some tasks.
F Here, you can inspect individual functions' performance, and deactivate them.
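To illustrate what the end-model evaluation in panel E involves, here is a dependency-free sketch of logistic regression on bag-of-words features, trained with plain stochastic gradient descent. It is not Ruler's implementation, which is not shown in this README. The function names, the hyperparameters, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def train_logistic_regression(features, labels, epochs=200, learning_rate=0.5):
    """Fit weights and a bias with per-example gradient steps on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus target
            bias -= learning_rate * error
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights, bias


def predict(text, vocabulary, weights, bias):
    """Return 1 when the linear score is positive, otherwise 0."""
    x = bag_of_words(text, vocabulary)
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z > 0 else 0


# Toy training set standing in for data labeled by the label model (1 = SPAM).
train_texts = [
    "win a free prize now",
    "free cash click now",
    "see you at lunch",
    "meeting notes attached",
]
train_labels = [1, 1, 0, 0]

vocabulary = sorted({word for text in train_texts for word in text.split()})
features = [bag_of_words(text, vocabulary) for text in train_texts]
weights, bias = train_logistic_regression(features, train_labels)

print(predict("claim your free prize now", vocabulary, weights, bias))
print(predict("lunch meeting notes", vocabulary, weights, bias))
```

A model this simple ignores word order and context, which is one reason it may not be suitable for evaluating labels on some tasks.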
See our demo video for some example interactions.
Save your model by clicking the icon on the top right. If you decide to iterate on it more later, you can load it on the create/load project page.
Here you can find the data from our user study, along with the code to generate all of our figures and analysis.
Please see our Findings of EMNLP'20 publication for details.
If you have any problems, please feel free to create a GitHub issue.
For other inquiries, contact [email protected].