Add Data Leakage Assessment #1866
Labels
enhancement
New feature or request
Comments
Hi @dt-ahmed-touila, thanks for sharing this request! Since we're blackbox application-focused, we don't currently develop tooling for training/fine-tuning related evaluation. We'll re-open this if we get to that domain one day.
🚀 Feature Request
Measure data leakage based on a subset of samples/conversations from the training/fine-tuning dataset
🔈 Motivation
To apply generative models effectively to a specific domain or application (the legal field is a good example), companies turn to fine-tuning or alignment on proprietary/confidential datasets. Once fine-tuning is done, and given the nature of these models, there is a significant risk of training-data leakage at inference time. This leakage stems from two capacities of LLMs: memorization and association.
🛰 Alternatives