
A summary of existing representative text datasets for LLMs.

737 64 Updated Jun 15, 2024

[NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations".

Python 26 1 Updated Jun 8, 2023

LogiQA 2.0 dataset: logical reasoning in MRC and NLI tasks

Python 69 10 Updated Aug 11, 2023

Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.

Python 98 12 Updated Oct 21, 2023

mlr3 extension for Fairness in Machine Learning

HTML 14 2 Updated Jun 5, 2024

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python 14,398 2,551 Updated Jul 13, 2024

Source code for the ICLR'22 paper "VOS: Learning What You Don’t Know by Virtual Outlier Synthesis"

Python 303 52 Updated Oct 1, 2023

Machine Learning Bias Mitigation

Jupyter Notebook 7 6 Updated May 9, 2022

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image …

Python 1,783 238 Updated Jul 18, 2024

This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models" (EMNLP 2020).

HTML 93 24 Updated Mar 1, 2024

Data for evaluating gender bias in coreference resolution systems.

Python 63 11 Updated May 14, 2019

The machine learning toolkit for time series analysis in Python

Python 2,842 335 Updated Jul 1, 2024

Open-source simulator for autonomous driving research.

C++ 10,885 3,499 Updated Jul 17, 2024

The repository for the paper "Evaluating Open-QA Evaluation"

Python 19 Updated Apr 9, 2024

[ICML'2024] Can AI Assistants Know What They Don't Know?

Python 56 4 Updated Feb 5, 2024

TISSUE (Transcript Imputation with Spatial Single-cell Uncertainty Estimation) provides tools for estimating well-calibrated uncertainty measures for gene expression predictions in single-cell spat…

Python 25 4 Updated Mar 10, 2024
Python 74 18 Updated Jul 17, 2024

Modeling, training, eval, and inference code for OLMo

Python 4,227 399 Updated Jul 19, 2024

Code for the paper "Calibrating Deep Neural Networks using Focal Loss"

Jupyter Notebook 146 25 Updated Jan 10, 2024

Extending Conformal Prediction to LLMs

Jupyter Notebook 51 6 Updated Jun 21, 2024
Python 17 6 Updated Mar 9, 2023
Python 438 90 Updated Apr 6, 2023

FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age

404 17 Updated Sep 28, 2023

Repo for external large-scale work

Python 6,436 722 Updated Apr 27, 2024

Code for the paper "DivideMix: Learning with Noisy Labels as Semi-supervised Learning"

Python 525 81 Updated Sep 14, 2020

Awesome-LLM-Robustness: a curated list of Uncertainty, Reliability and Robustness in Large Language Models

582 42 Updated Jun 18, 2024

A curated (most recent) list of resources for Learning with Noisy Labels

640 58 Updated Feb 29, 2024