Stars
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
T2Ranking: A large-scale Chinese benchmark for passage ranking.
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tuning) together for easy use. We welcome open-source enthusiasts…
ColBERT for dense retrieval, including a multi-view version, with DuReader-retrieval as an example
An Open-Source Package for Information Retrieval
A semantic search engine built on the arXiv dataset from Kaggle.
🔍 AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your da…
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
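Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Center for Cognitive Computing and Natural Language at the IDEA Research Institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.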
Awesome Machine Unlearning (A Survey of Machine Unlearning)
Source code and dataset for the ACL 2022 Findings paper "LEVEN: A Large-Scale Chinese Legal Event Detection Dataset"
Source code and checkpoints for legal pre-trained language models.
A Python package that extracts tables from a web page and processes them into high-quality tables
KDD'2022: Towards Representation Alignment and Uniformity in Collaborative Filtering
Must-read papers on prompt-based tuning for pre-trained language models.
Source code of "NeurIPS21 - Universal Graph Convolutional Networks"
Official implementation of NeurIPS'21: Implicit SVD for Graph Representation Learning