:octocat:
Going deeper on NLP!

Site infrastructure for gwern.net. Custom Hakyll website with unique link archiving, popup UX, transclusions/collapses, dark+reader mode, bidirectional backlinks, and typography (sidenotes, dropcap…

Haskell 488 43 Updated Aug 28, 2024

Interpretability for sequence generation models 🐛 🔍

Python 351 37 Updated Aug 22, 2024

A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering

659 92 Updated Jul 9, 2024

Python API wrapper for Instructure's Canvas LMS. Easily manage courses, users, gradebooks, and more.

Python 554 173 Updated Aug 27, 2024

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 8,974 822 Updated Jul 1, 2024

Seed Machine Translation Data

TeX 22 3 Updated Aug 20, 2024

The FLORES+ Machine Translation Benchmark

TeX 84 14 Updated Aug 20, 2024

Anonymous Github is a proxy server to support anonymous browsing of Github repositories for open-science code and data.

TypeScript 1,370 55 Updated Aug 26, 2024

Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.

Python 84 13 Updated Aug 2, 2024

A byte-level decoder architecture that matches the performance of tokenized Transformers.

Jupyter Notebook 56 6 Updated Apr 24, 2024

MAchine Translation Evaluation Online (MATEO)

Python 15 1 Updated Mar 15, 2024

GEMBA — GPT Estimation Metric Based Assessment

Python 91 15 Updated Jul 30, 2024

Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.

Python 148 22 Updated Jun 18, 2024

BLEURT is a metric for Natural Language Generation based on transfer learning.

Python 682 84 Updated Aug 4, 2023

Sync your Outlook and Google calendars

C# 1,791 215 Updated Aug 28, 2024

Track and predict the energy consumption and carbon footprint of training deep learning models.

Python 358 28 Updated Aug 25, 2024

Ensembling Hugging Face transformers made easy

Python 59 4 Updated Dec 24, 2022

An Emacs framework for the stubborn martian hacker

Emacs Lisp 19,136 3,033 Updated Aug 29, 2024

Critical difference diagram with Wilcoxon-Holm post-hoc analysis.

Python 247 70 Updated Aug 24, 2022

AutoPrompt: Automatic Prompt Construction for Masked Language Models.

Python 578 82 Updated Aug 24, 2024

Large Language Model Text Generation Inference

Python 8,657 1,002 Updated Aug 29, 2024

Model parallel transformers in JAX and Haiku

Python 6,265 890 Updated Jan 21, 2023

C++ Library Manager for Windows, Linux, and MacOS

CMake 22,684 6,271 Updated Aug 29, 2024

Data and tools for generating and inspecting OLMo pre-training data.

Python 890 89 Updated Aug 29, 2024

premake system for use in making raylib game projects

C 179 49 Updated Jul 28, 2024

Statistics on multilingual datasets

17 2 Updated Jul 12, 2022

Citron is an experimental quote extraction system created by BBC R&D

Python 23 5 Updated Dec 14, 2021

Python extension for Visual Studio Code

TypeScript 4,280 1,168 Updated Aug 26, 2024