- Marseille, France
- https://mathieu.3maisons.org
- @moreymat
Stars
Backend resources for Albert. Albert is a conversational agent that uses official French data sources to answer questions from administrative agents.
Alignability testing and integration of single-cell data
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Minimalistic 3D-parallelism training for large language models
OCR, layout analysis, reading order, table recognition in 90+ languages
Repository with code & data for the publication Microbial interactions shape cheese flavour formation
Systematically learn and evaluate manifolds from high-dimensional data
Collecting archives and analysis on Jupyter's history
Continual pretraining of foundation LLM using ⚡ Lightning Fabric
Polars extension for general data science use cases
Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
The most accurate natural language detection library for Rust, suitable for short text and mixed-language text
The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Robust recipes to align language models with human and AI preferences
A package for statistically rigorous scientific discovery using machine learning. Implements prediction-powered inference.
A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.
Data cleaning and curation for unstructured text
Python programs, usually short, of considerable difficulty, to perfect particular skills.
The platform for building AI from enterprise data
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Uncertainty-aware representation learning (URL) benchmark