AI2, University of Washington
- https://yanaiela.github.io
- @yanaiela
Stars
Tool for interactive embeddings visualization
This is an extension of the popular 21cmFAST code that interfaces with CLASS to generate initial conditions at recombination that are consistent with the input cosmological model
Data and tools for generating and inspecting OLMo pre-training data.
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).
A latent text-to-image diffusion model
A high-throughput and memory-efficient inference and serving engine for LLMs
A Survey on Data Selection for Language Models
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
A library to manipulate font files from Python.
Microsoft.Recognizers.Text provides recognition and resolution of numbers, units, date/time, etc. in multiple languages (ZH, EN, FR, ES, PT, DE, IT, TR, HI, NL. Partial support for JA, KO, AR, SV).…
BookNLP, a natural language processing pipeline for books
Creative interactive views of any dataset.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Lexical Generalization Improves with Larger Models and Longer Training (EMNLP 2022)
Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Python script which prints out a summary of your free slots from your Google calendar(s) so you can paste into a scheduling email.
A template repo for Python packages
Open-Source Neural Machine Translation in Tensorflow
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphic…
A reading list for papers on causality for natural language processing (NLP)