FYVictor93

Stars

tongyx361 / Awesome-LLM4Math

Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied with elaborately-written concise descriptions to help readers g…

55 2 Updated Jul 12, 2024

mangiucugna / json_repair

A python module to repair invalid JSON, commonly used to parse the output of LLMs

Python 646 37 Updated Aug 28, 2024

wasiahmad / Awesome-LLM-Synthetic-Data

A reading list on LLM based Synthetic Data Generation 🔥

73 3 Updated Aug 18, 2024

fe1ixxu / CPO_SIMPO

This repository implements the joint use of the CPO and SimPO methods for better reference-free preference learning.

Python 25 3 Updated Aug 13, 2024

princeton-nlp / LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

Jupyter Notebook 319 26 Updated Jun 29, 2024

ZubinGou / math-evaluation-harness

A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨

Python 66 7 Updated Apr 26, 2024

CrazyBoyM / llama3-Chinese-chat

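Llama3 and Llama3.1 Chinese repository (accompanying book in progress... interesting fine-tuned and modified weights from community members and vendors & tutorial videos on training, inference, evaluation, and deployment & documentation)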

Python 3,859 315 Updated Aug 16, 2024

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python 6,246 1,649 Updated Aug 28, 2024

project-numina / aimo-progress-prize

Jupyter Notebook 246 14 Updated Jul 22, 2024

meowpass / FollowComplexInstruction

Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models"

Python 29 2 Updated Jun 24, 2024

km1994 / nlp_paper_study

This repository mainly collects reading notes on top-conference papers relevant to NLP algorithm engineers

C++ 3,840 660 Updated Aug 18, 2023

NisaarAgharia / Advanced_RAG

Advanced Retrieval-Augmented Generation (RAG) through practical notebooks, using LangChain, OpenAI GPTs, Meta Llama 3, and agents.

Jupyter Notebook 148 27 Updated Apr 26, 2024

BinNong / meet-libai

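Li Bai 👤, an outstanding poet of the Tang dynasty, holds an important place in the history of Chinese literature. In recent years, with the rapid development of digital technology and artificial intelligence, the ways in which traditional culture is popularized and promoted have also faced innovation and change. Although research on Li Bai's poetry, both in China and abroad, is already quite thorough, its digital and intelligent popularization still falls short. This project therefore aims to build a Li Bai knowledge graph and combine it with a large model to train a specialized AI agent, promoting Li Bai's culture in the form of a generative conversational application.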

Python 1,095 124 Updated Jul 12, 2024

datawhalechina / so-large-lm

Large model fundamentals: the basics of large models in one article

2,414 218 Updated Aug 13, 2024

FareedKhan-dev / AI-text-to-video-model-from-scratch

In this blog, we will build a small scale text-to-video model from scratch. We will input a text prompt, and our trained model will generate a video based on that prompt.

Jupyter Notebook 110 20 Updated Jun 23, 2024

yunlong10 / Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,173 66 Updated Aug 21, 2024

OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 11,205 789 Updated Aug 25, 2024

mlfoundations / dclm

DataComp for Language Models

HTML 1,080 96 Updated Aug 19, 2024

CASIA-LM / MoDS

Python 104 11 Updated Apr 16, 2024

arlosefj / github_interest

Repositories of interest

182 42 Updated Feb 6, 2024

pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE

Python 1,439 158 Updated Aug 13, 2024

magpie-align / magpie

Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!

Python 357 35 Updated Aug 28, 2024

bilibili / Index-1.9B

A SOTA lightweight multilingual LLM

Python 798 42 Updated Jul 8, 2024

e2b-dev / awesome-ai-agents

A list of AI autonomous agents

9,428 672 Updated Jul 30, 2024

MaartenGr / KeyBERT

Minimal keyword extraction with BERT

Python 3,407 342 Updated Jul 16, 2024

Bistutu / FluentRead

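Features a context-aware AI translation engine that gives websites friendlier translations, so that everyone can enjoy a reading experience close to that of their native language.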

JavaScript 1,296 59 Updated Jun 10, 2024

huggingface / cosmopedia

Python 407 39 Updated Jul 17, 2024

huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 1,888 128 Updated Aug 28, 2024

huggingface / text-clustering

Easily embed, cluster and semantically label text datasets

Python 421 32 Updated Mar 28, 2024

esbatmop / MNBVC

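MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, aiming to match the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wikis, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.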

3,332 231 Updated Aug 19, 2024
