-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Synthetic data generation from LLMs #7
Comments
I would suggest PandasAI and Synthea as starting points for this. A vector storage with instructions may be all the is required to get similar or improved results over Synthea -- with the added benefit of being less expensive than tuning a model and allowing the use of larger base models. Another option is to take real data and generate synthetic data from it. Synthetic Data Vault is a good example of this. From, for example, 100 real records you can expand to >100. with GaussianCopula and CTGAN. It would be interesting to use this framework to add a third method of an LLM an evaluate between the three. |
I’m interested in using this approach to see if we can create accurate synthetic data from a pretrained LLM. First step would be to have an evaluation framework. Opening this issue for discussion of this use case.
The text was updated successfully, but these errors were encountered: