Starred repositories
High-performance retrieval engine for unstructured data
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Source code of the paper "ReClean: Reinforcement Learning for Automated Data Cleaning in ML Pipelines"
This is the official code for the paper "EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy"
A simple, easy-to-hack GraphRAG implementation
Data and Code for ICLR2020 Paper "TabFact: A Large-scale Dataset for Table-based Fact Verification"
AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval (NeurIPS 2024)
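Deep learning interview handbook (covering mathematics, machine learning, deep learning, computer vision, natural language processing, SLAM, and other areas)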
A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.
[ICML 2024 (Spotlight)] InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation. Paper: https://arxiv.org/abs/2406.00426.
[ICML 2024] Selecting High-Quality Data for Training Language Models
A resource for learning about machine learning and deep learning
Graph Neural Networks for Tabular Data Learning (GNN4TDL)
CVPR and NeurIPS poster examples and templates. May we have in-person poster sessions soon!
Bidirectional Representation Model for Erroneous Data Detection