dataset
A manually curated Chinese dialogue dataset and a snippet of ChatGLM fine-tuning code
Chinese text generation (NLG) toolkit for text summarization, with corpus data. Extractive text summarization using Lead3, keyword, textrank, text teaser, word significance, LDA, LSI, NMF. (graph, feature, topic model, summarize to…
Instruction Tuning with GPT-4
BELLE: Be Everyone's Large Language model Engine (an open-source Chinese dialogue large language model)
Large Scale Chinese Corpus for NLP
CamelBell (驼铃) is a Chinese language tuning project based on LoRA. CamelBell belongs to Project Luotuo (骆驼), an open-source Chinese LLM project created by 冷子昂 @ SenseTime, 陈启源 @ Central China Normal University, and 李鲁鲁 @ SenseTime
Used for adaptive human-in-the-loop evaluation of language and embedding models.
Dataset for NAACL 2021 paper: "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization"
A wide variety of research projects developed by the SpokenNLP team of Speech Lab, Alibaba Group.