Xuanwo/learn-data-lake-from-storage

Learn Data Lake From Storage

Hello everyone, welcome to Learn Data Lake From Storage!

Data lakes are complex systems with many different specifications, formats, and engines. Underneath all of them, however, sits the storage layer. Observing how each project organizes metadata and data on this layer, and how it optimizes through file design, gives a clearer picture of how it works. Seen from the storage layer, data lakes become easier to understand thoroughly, and the engines built on top of them are essentially implementation details.

This project explores various data lake projects by deploying them and analyzing their storage behavior to understand their functionality and design.
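One way to observe storage behavior is to snapshot the files under a table's storage location before and after an operation, then compare the two. The sketch below is only an illustration and is not taken from this repository: it assumes the storage layer is a local directory (in practice it is often object storage), and the helper names `snapshot` and `diff` are hypothetical.

```python
import os


def snapshot(root):
    """Map each file path (relative to root) to its size in bytes."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            files[os.path.relpath(full, root)] = os.path.getsize(full)
    return files


def diff(before, after):
    """Return (added, removed, changed) relative paths between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    # A file counts as changed here only if its size differs (an assumption
    # made to keep the sketch short; a content hash would be stricter).
    changed = sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
    return added, removed, changed
```

Taking a snapshot, running a single operation such as an insert, and taking a second snapshot shows which metadata and data files that operation wrote.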

Layout

This project is organized into questions and data lake projects, each with its own directory. Every project directory includes a README.md file that describes the project and provides deployment instructions.

You can navigate directly to the directory of the project you are interested in, or follow the questions in the Questions section to explore the data lake projects step by step.

Questions

  1. How Data Lake Stores Table Metadata?
  2. How Data Lake Handles One Line Insert?

License

Licensed under the Apache License, Version 2.0: https://www.apache.org/licenses/LICENSE-2.0
