GitHub - percent4/embedding_model_exp: Experiments with Embedding models, including Embedding model evaluation, Embedding model fine-tuning, and Embedding model quantization.

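This project collects experiments with Embedding models, including Embedding model evaluation, ReRank model fine-tuning, Embedding model fine-tuning, and Embedding model quantization.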

1. Embedding model evaluation

Reference scripts (in the src/baseline_eval directory):

bge_base_zh_eval.py: evaluates the BGE-base-zh-v1.5 model, which serves as the baseline.

For the evaluation results, see docs/model_evaluation.md.
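Retrieval-oriented Embedding evaluations are often summarized with hit rate and mean reciprocal rank (MRR). The sketch below shows how these two metrics can be computed from ranked results. It is only an illustration: the metrics actually reported in docs/model_evaluation.md are not stated here, and the function names, query ids, and document ids are hypothetical.

```python
from typing import Dict, List


def hit_rate_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, str], k: int) -> float:
    """Fraction of queries whose relevant document appears in the top k results."""
    hits = sum(1 for qid, docs in ranked.items() if relevant[qid] in docs[:k])
    return hits / len(ranked)


def mrr(ranked: Dict[str, List[str]], relevant: Dict[str, str]) -> float:
    """Mean reciprocal rank of the relevant document (0 when it is not retrieved)."""
    total = 0.0
    for qid, docs in ranked.items():
        if relevant[qid] in docs:
            total += 1.0 / (docs.index(relevant[qid]) + 1)
    return total / len(ranked)


# Hypothetical retrieval output: query id -> document ids ordered by similarity.
ranked = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d2", "d9", "d4"],
    "q3": ["d5", "d6", "d8"],
}
# Hypothetical ground truth: one relevant document per query.
relevant = {"q1": "d1", "q2": "d2", "q3": "d0"}

print("hit rate@1:", hit_rate_at_k(ranked, relevant, k=1))
print("hit rate@3:", hit_rate_at_k(ranked, relevant, k=3))
print("MRR:", mrr(ranked, relevant))
```

Comparing these numbers before and after fine-tuning, on the same queries and corpus, is one simple way to judge whether a fine-tuned model improves on the baseline.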

2. Embedding model fine-tuning

Using sentence-transformers v3:

python src/finetune/ft_embedding.py
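For intuition about what this kind of fine-tuning optimizes: sentence-transformers provides MultipleNegativesRankingLoss for (query, positive) pairs, which treats the other positives in a batch as negatives. Whether ft_embedding.py uses this loss is an assumption. The plain-Python sketch below reproduces only the computation (cross-entropy over scaled cosine similarities); the function names and toy vectors are hypothetical.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_negatives_loss(
    queries: List[List[float]], positives: List[List[float]], scale: float = 20.0
) -> float:
    """Mean cross-entropy where positives[i] is the match for queries[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, pos) for pos in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: each query is aligned with its own positive.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]

print("aligned loss:", in_batch_negatives_loss(queries, aligned))
print("shuffled loss:", in_batch_negatives_loss(queries, shuffled))
```

The loss is close to zero when each query is most similar to its own positive and grows when another item in the batch scores higher, which is the signal that training pushes against.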

Using AutoTrain:

cd ./src/finetune
CUDA_VISIBLE_DEVICES=0 autotrain --config config.yml

Using LlamaIndex Finetune Embeddings:

See reference 5.

3. ReRank model fine-tuning

Data synthesis: src/utils/make_ft_rerank_corpus.py
Model fine-tuning: src/finetune/ft_rerank.py
Evaluation experiments: https://github.com/percent4/embedding_rerank_retrieval. For the evaluation results, see docs/model_evaluation.md.

4. Embedding model quantization

Basic test: src/quantization/basic_test.py
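As background, two common ways to quantize embeddings are binary quantization (keep one sign bit per dimension) and scalar int8 quantization (map each value linearly onto 256 levels). The sketch below illustrates both in plain Python. It is not the content of basic_test.py; the function names, value ranges, and toy vectors are assumptions made for illustration.

```python
from typing import List


def binary_quantize(vec: List[float]) -> int:
    """Pack the sign of each dimension into one integer (1 bit per dimension)."""
    bits = 0
    for value in vec:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two packed binary embeddings."""
    return bin(a ^ b).count("1")


def int8_quantize(vec: List[float], lo: float, hi: float) -> List[int]:
    """Map each value from [lo, hi] linearly onto the int8 range [-128, 127]."""
    scale = 255.0 / (hi - lo)
    return [max(-128, min(127, round((v - lo) * scale) - 128)) for v in vec]


# Toy float embeddings (hypothetical values).
query = [0.12, -0.40, 0.33, -0.05]
doc_a = [0.10, -0.35, 0.30, -0.02]   # points in a similar direction to the query
doc_b = [-0.20, 0.25, -0.31, 0.07]   # points in roughly the opposite direction

q_bits = binary_quantize(query)
print("Hamming(query, doc_a):", hamming_distance(q_bits, binary_quantize(doc_a)))
print("Hamming(query, doc_b):", hamming_distance(q_bits, binary_quantize(doc_b)))
print("int8 query:", int8_quantize(query, lo=-0.5, hi=0.5))
```

Binary embeddings are compared with Hamming distance instead of cosine similarity, trading some accuracy for much smaller storage. For int8, the range [lo, hi] would normally be estimated from a calibration set of embeddings rather than fixed by hand.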

5. Embedding Usage (applications)

Image search example: src/usage/image_search.py
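Embedding-based image search generally comes down to ranking stored image vectors by their similarity to a query vector. The sketch below shows that ranking step only, assuming images and queries have already been encoded into the same vector space. How image_search.py produces its embeddings is not described here, and the file names, vectors, and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k entries of the index most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Hypothetical precomputed image embeddings (toy 3-dimensional vectors).
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
# Hypothetical embedding of the search query.
query_vec = [0.8, 0.2, 0.1]

for name, score in top_k(query_vec, index, k=2):
    print(f"{name}: {score:.3f}")
```

A linear scan like this is fine for small collections; larger ones usually rely on an approximate nearest-neighbor index.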

References

1. Training and Finetuning Embedding Models with Sentence Transformers v3: https://huggingface.co/blog/train-sentence-transformers
2. Fine-tune Embedding models for Retrieval Augmented Generation (RAG): https://www.philschmid.de/fine-tune-embedding-model-for-rag
3. Overview of Matryoshka Embedding Models: https://huggingface.co/blog/zh/matryoshka
4. Finetune Embeddings: https://docs.llamaindex.ai/en/stable/examples/finetuning/embeddings/finetune_embedding/
5. NLP (86): Fine-tuning the Embedding model for the Retrieve stage of a RAG framework: https://mp.weixin.qq.com/s?__biz=MzU2NTYyMDk5MQ==&mid=2247486333&idx=1&sn=29d00d472647bc5d6e336bec22c88139&chksm=fcb9b2edcbce3bfb42ea149d96fb1296b10a79a60db7ad2da01b85ab223394191205426bc025&token=1376257911&lang=zh_CN#rd
6. How to Fine-Tune Custom Embedding Models Using AutoTrain: https://huggingface.co/blog/abhishek/finetune-custom-embeddings-autotrain
7. Upload a dataset to the Hub: https://huggingface.co/docs/datasets/v1.16.0/upload_dataset.html
8. Training Examples » MS MARCO: https://sbert.net/examples/training/ms_marco/cross_encoder_README.html
9. train_cross-encoder_scratch.py: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/ms_marco/train_cross-encoder_scratch.py
10. NLP (83): Evaluating Rerank algorithms in a RAG framework: https://mp.weixin.qq.com/s/ZqBbrrZxlMtn2ohttAGDIQ
11. NLP (101): Embedding model fine-tuning in practice: https://mp.weixin.qq.com/s/lJ3Mycjw1G99T08r8c7dSQ
