edzq/awesome-weak-supervision

 
 

awesome-weak-supervision

A curated list of weak and distant supervision papers and tools.

Introduction

Weak supervision and distant supervision provide ways to (semi-)automatically generate training data for machine learning systems quickly and efficiently when conventional, manually labeled training data is scarce. The idea is popular and actively researched in fields such as natural language processing and computer vision. Here, we list interesting papers and tools to help newcomers from both the research and the application side try out weak supervision.

This list was started by the organizers of the WeaSuL Workshop on Weakly Supervised Learning at ICLR'21, and we welcome contributions to extend it.

Contributing

If you want to contribute to this list, just create a pull request or open a new issue. For a paper or tool, please provide all the necessary information (authors, title, conference, link, topic tags, short description). If you are unsure, feel free to open an issue to discuss it. If you encounter any typos, just let us know. Thanks!

Overview Texts

Texts that offer a quick introduction to the topic.

Surveys

Surveys give a broad overview of a field and help you quickly gain insight into current trends and open issues for future work.

Foundational Papers

Important steps on the way to the current state of the art.

Books

Libraries and Tools

Open-source libraries and tools that provide ready-made implementations to get you started quickly.

  • Cleanlab [ML, CV, NLP] "Python package for machine learning with noisy labels. cleanlab cleans labels and supports finding, quantifying, and learning with label errors in datasets."
  • Knodle [ML, CV, NLP] "Modular weakly supervised learning with PyTorch."
  • Snorkel [ML, CV, NLP] "Programmatically build and manage training data."
  • ANEA [NLP] "A tool to automatically annotate named entities in unlabeled text based on entity lists for the use as distant supervision."
  • skweak [NLP] "It provides labeling functions to automatically label documents, and aggregate their results to obtain a labeled version of the corpus."
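
Several of the tools above are built around labeling functions whose outputs are aggregated into a single weak label per example. The following is a minimal, library-free sketch of that idea and not the API of any tool listed here. The function names (`lf_contains_great`, `majority_vote`, and so on), the keyword heuristics, and the example sentences are all hypothetical, and simple majority voting stands in for the more sophisticated aggregation models that real libraries may use.

```python
from collections import Counter

# A labeling function returns ABSTAIN when its heuristic does not apply.
ABSTAIN = None


def lf_contains_great(text):
    """Hypothetical heuristic: the word 'great' suggests a positive label."""
    return "positive" if "great" in text.lower() else ABSTAIN


def lf_contains_terrible(text):
    """Hypothetical heuristic: the word 'terrible' suggests a negative label."""
    return "negative" if "terrible" in text.lower() else ABSTAIN


def lf_exclamation(text):
    """Hypothetical heuristic: a trailing '!' weakly suggests a positive label."""
    return "positive" if text.endswith("!") else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_great, lf_contains_terrible, lf_exclamation]


def majority_vote(text, labeling_functions=LABELING_FUNCTIONS):
    """Aggregate the non-abstaining votes; abstain on no votes or on a tie."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [vote for vote in votes if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]


# Toy unlabeled corpus, invented for illustration only.
corpus = [
    "This movie was great!",
    "A terrible plot.",
    "Terrible!",
    "It was fine.",
]
weak_labels = [majority_vote(text) for text in corpus]
```

Examples on which every function abstains, or on which the votes tie, stay unlabeled. A downstream model would then be trained only on the examples that received a weak label, with the understanding that those labels are noisy.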

Datasets and Benchmarks

Datasets generated through weak and distant supervision. These works offer insights into how to generate weakly supervised data and can serve as benchmarks for evaluating your learning algorithms.
