
Git Repo for EDW Best Practice Assets on the Lakehouse

datakey-ai/edw-best-practices

edw-best-practices

This Git project provides a framework of example notebooks that show typical data warehousing SQL users how to build pipelines and analytics on the Lakehouse. Broken out into five steps, the notebooks walk the user through a single use case that they can run in their own Databricks environment, leading them along the data maturity curve as follows:

  • Step 1 - Build a classical batch-oriented SQL pipeline with best practices on the Lakehouse
  • Step 2 - Rebuild the Step 1 pipeline in Delta Live Tables and automate all orchestration
  • Step 3 - Build and analyze summary analytics tables
  • Step 4 - Create gold views
  • Step 5 - Convert an old batch pipeline to a streaming pipeline

This Git repo also provides examples of more advanced use cases, such as using the Delta Change Data Feed.
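As a rough illustration of what consuming the Delta Change Data Feed involves, the sketch below applies change rows to an in-memory table keyed by a primary key. It is a plain-Python stand-in, not code from this repo's notebooks. On Databricks, the rows would typically come from a table with `delta.enableChangeDataFeed = true`, be read through `spark.read.format("delta")` with the `readChangeFeed` and `startingVersion` options, and be applied with `MERGE INTO`. The function name `apply_change_feed`, the `id` key column, and the dict-based target are hypothetical. The `_change_type` values (`insert`, `update_preimage`, `update_postimage`, `delete`) and the `_commit_version` and `_commit_timestamp` columns follow Delta's change feed schema.

```python
from typing import Any, Dict, Iterable, Mapping

# Metadata columns that Delta adds to change feed rows.
CDF_METADATA_COLUMNS = {"_change_type", "_commit_version", "_commit_timestamp"}


def apply_change_feed(
    target: Dict[Any, Dict[str, Any]],
    changes: Iterable[Mapping[str, Any]],
    key: str = "id",  # hypothetical primary-key column name
) -> Dict[Any, Dict[str, Any]]:
    """Return a copy of `target` with change feed rows applied in commit order."""
    result = {k: dict(v) for k, v in target.items()}
    # sorted() is stable, so rows within one commit keep their input order.
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change_type = row["_change_type"]
        # Keep only the business columns for the target row.
        data = {c: v for c, v in row.items() if c not in CDF_METADATA_COLUMNS}
        if change_type in ("insert", "update_postimage"):
            result[data[key]] = data  # upsert the new version of the row
        elif change_type == "delete":
            result.pop(data[key], None)  # drop the row if it is present
        elif change_type == "update_preimage":
            continue  # old values of an updated row; nothing to apply
        else:
            raise ValueError(f"unexpected _change_type: {change_type!r}")
    return result
```

Skipping `update_preimage` rows is safe here because the matching `update_postimage` row carries the new values. A Spark version would usually keep only the latest change per key before running `MERGE INTO`, for example with a window ordered by `_commit_version`. Delta's `MERGE` can fail when several source rows match and try to update the same target row.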

This Git repo also provides helper functions that make ETL easier in production pipelines.

About

    Releases: No releases published

    Packages: No packages published

    Languages: Python 100.0%