😎 Cocoon provides LLM agents that organize the raw data in your data warehouse so it is ready for analysis.
- 👉 Python Package: Check out the notebook that cleans tables in Snowflake/DuckDB
- 👉 Check out the 1 min demo
Screenshot: LLMs help you interactively cast columns and fix cases. The output is dbt staging SQL and YAML.
Profiling is the first step to understanding the table and identifying any anomalies.
Many small decisions require semantic understanding by LLMs. For example, an age of 100 is acceptable, but -1 is impossible!
- 👉 Online Service: Drop your CSV, and the profile will be ready in <10 min
- 👉 Python Package: Check out the notebook to interactively profile your table in Python
- (Both run the same code. The Python package requires an LLM API, but it is interactive and has no limit on table size or number of columns.)
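To make the "age of 100 is acceptable, but -1 is impossible" example concrete, here is a minimal sketch of the kind of check a profile might report. It is not Cocoon's implementation: the `PLAUSIBLE_RANGES` table and the `flag_implausible` helper are hypothetical, and the hard-coded range stands in for the judgment an LLM would make from the column's meaning.

```python
# Hypothetical plausibility bounds. In practice the LLM decides these from
# the column's semantics; they are hard-coded here only for illustration.
PLAUSIBLE_RANGES = {"age": (0, 130)}


def flag_implausible(rows, column, bounds=PLAUSIBLE_RANGES):
    """Return the rows whose value in `column` falls outside its plausible range."""
    low, high = bounds[column]
    return [
        row
        for row in rows
        if row[column] is not None and not (low <= row[column] <= high)
    ]


patients = [
    {"name": "Ann", "age": 100},  # unusual, but possible
    {"name": "Bob", "age": -1},   # impossible
    {"name": "Cy", "age": 42},
]

anomalies = flag_implausible(patients, "age")
print(anomalies)  # [{'name': 'Bob', 'age': -1}]
```

A fixed rule like this only works once someone knows the column holds a human age; recognizing that from the column name and sample values is the step that needs semantic understanding.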
Check out more profiles
Dataset Title | Profile Link |
---|---|
AQI and Latitude/Longitude of Countries | View Profile |
2020 Property Sales Data | View Profile |
AAC Shelter Cat Outcome | View Profile |
Books | View Profile |
Cancer | View Profile |
Divorces 2000-2015 | View Profile |
German Credit Data | View Profile |
K-Drama | View Profile |
Patients | View Profile |
Used Car Data | View Profile |
Cite Cocoon Profile
@article{huang2024cocoon,
title={Cocoon: Semantic Table Profiling Using Large Language Models},
author={Huang, Zezhou and Wu, Eugene},
journal={arXiv preprint arXiv:2404.12552},
year={2024}
}
Joins can be challenging when a standardized join key is missing (e.g., joining on non-standardized names).
Cocoon helps you find the related records and explains how they are related.
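For contrast, the sketch below shows a plain string-similarity baseline for joining on non-standardized names, using only Python's standard `difflib`. It is not Cocoon's method, which relies on LLMs to find and explain relations. The `normalize` and `fuzzy_join` helpers, the sample names, and the `0.7` threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop simple punctuation, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def fuzzy_join(left, right, threshold=0.7):
    """Pair each left name with its most similar right name, if similar enough."""
    matches = []
    for left_name in left:
        best_name, best_score = None, 0.0
        for right_name in right:
            score = SequenceMatcher(
                None, normalize(left_name), normalize(right_name)
            ).ratio()
            if score > best_score:
                best_name, best_score = right_name, score
        if best_score >= threshold:
            matches.append((left_name, best_name, round(best_score, 2)))
    return matches


customers = ["Acme Corp.", "Globex Corporation"]
vendors = ["ACME Corp", "Globex Corp", "Initech"]

pairs = fuzzy_join(customers, vendors)
for left_name, right_name, score in pairs:
    print(f"{left_name!r} <-> {right_name!r} (similarity {score})")
```

A character-level score like this can pair similar spellings, but it cannot say why two records are related or catch matches with little textual overlap, which is where an explanation of the relation becomes useful.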
Cite Cocoon Fuzzy Join
@article{huang2024disambiguate,
title={Disambiguate Entity Matching through Relation Discovery with Large Language Models},
author={Huang, Zezhou},
journal={arXiv preprint arXiv:2403.17344},
year={2024}
}
We are working on tools to help understand data, break silos and maintain pipelines for the data warehouse.
These will make discovering tables, generating reports, and making predictions incredibly simple.
Email [email protected] to learn more...