
labeltext is a simple command-line utility to annotate large amounts of text quickly for text classification tasks


soumendra/labeltext


Getting started

Workflow overview

After installing labeltext,

  1. create a TextAnnotation object (or restore one from an earlier annotation session),
  2. start annotating by calling the .annotate() method.

Install labeltext

pip install labeltext

Create a TextAnnotation object

from labeltext import TextAnnotation  # assumes the class is exported at the package top level

task = TextAnnotation(
    records=["Albert Einstein", "Stephen King", "Marie Curie"],
    labels=["male", "female"],
    output="scientists.csv"
)
print(task)

  • records: List of text records to be annotated
  • labels: List of class labels (up to 16)
  • output: The CSV file where annotations will be saved (default: annotations.csv)

In practice, you will usually read the records from a file, such as a CSV.

import pandas as pd
df = pd.read_csv("example.csv")

task = TextAnnotation(
    records=df["text"].tolist(),  # `text` is a column in df
    labels=["male", "female"],
    output="scientists.csv"
)
print(task)
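
If pandas is not available, the records can also be read with the standard library. The sketch below is only an illustration: `load_records` is a hypothetical helper (not part of labeltext), and it assumes the CSV has a header row with a `text` column, as in the example above.

```python
import csv


def load_records(path, column="text"):
    """Return the non-empty values of `column` from a CSV file with a header row."""
    # Hypothetical helper for illustration; not part of labeltext.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Skip rows where the column is missing or blank, and trim whitespace.
        return [
            row[column].strip()
            for row in reader
            if (row.get(column) or "").strip()
        ]
```

The resulting list, for example `load_records("example.csv")`, can be passed as the `records` argument in place of the pandas expression.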

Start annotating

task.annotate(user_name="@dataBiryani", update_freq=2)

This method starts an interactive annotation session.

  • user_name (optional): The name of the annotator, since a project may have several. If not provided, the user is prompted for one.
  • update_freq (optional): New annotations are not written to disk immediately. They are saved after every update_freq annotations (default: 5), when the user ends the annotation session, or when no records are left to annotate.
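
The saving behaviour described for update_freq can be pictured with a small buffer that flushes every few items. This is a minimal sketch of that idea only; `BufferedSaver` is a hypothetical class, not labeltext's actual implementation, and a plain list stands in for the file on disk.

```python
class BufferedSaver:
    """Collect annotations in memory and flush them every `update_freq` items."""

    # Hypothetical illustration; not labeltext's actual implementation.
    def __init__(self, update_freq=5):
        self.update_freq = update_freq
        self.pending = []  # annotations not yet written
        self.saved = []    # stands in for the file on disk

    def add(self, record, label):
        self.pending.append((record, label))
        # Flush once the buffer reaches update_freq annotations.
        if len(self.pending) >= self.update_freq:
            self.flush()

    def flush(self):
        # Would also be called when the session ends or no records remain.
        self.saved.extend(self.pending)
        self.pending.clear()
```

In this sketch, at most `update_freq - 1` annotations are held in memory at any time, so a smaller value trades more frequent writes for less unsaved work.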

Note: The annotations are written to a CSV file that you can feed into your modeling pipeline. The current state of the annotation task is also saved in a pickle file with the same name as the CSV file but a .pkl extension. Use the .pkl file to continue annotating in a later session.
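
To find the state file for a given output file, the naming rule in the note can be expressed with pathlib. This is a sketch of the rule as stated, not a labeltext function; `state_file_for` is a hypothetical name.

```python
from pathlib import Path


def state_file_for(output="annotations.csv"):
    """Return the pickle path that shares its base name with the output CSV."""
    # Hypothetical helper; it mirrors the naming rule described in the note.
    return Path(output).with_suffix(".pkl")
```

For the earlier example with `output="scientists.csv"`, this gives `scientists.pkl`; for the default output it gives `annotations.pkl`.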

Continue from where you left off

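# Restore the saved state. The .pkl file shares its base name with the output CSV
# (annotations.pkl corresponds to the default output, annotations.csv).
task = TextAnnotation()("annotations.pkl")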
task.annotate(user_name="@dataBiryani")
