This plugin lets you run full Kedro pipelines in Snowflake. It currently supports:
- a Kedro starter to get you up to speed quickly
- automatically creating Snowflake Stored Procedures from Kedro nodes (using the Snowpark SDK)
- translating a Kedro pipeline into a Snowflake task graph
- running a Kedro pipeline entirely within Snowflake, without any external system
- using Kedro's official SnowparkTableDataSet
- automatically storing intermediate data as Transient Tables (if Snowpark DataFrames are used)
- (New!) MLflow integration with Snowflake, with examples in the Snowflights Kedro starter
For detailed documentation refer to https://kedro-snowflake.readthedocs.io/
- Install the plugin
pip install "kedro-snowflake>=0.1.0"
- Create a new project with our Kedro starter ❄️ Snowflights 🚀:
kedro new --starter=snowflights --checkout=master
Then answer the interactive prompts:
    Project Name
    ============
    Please enter a human readable name for your new project.
    Spaces, hyphens, and underscores are allowed.
     [Snowflights]:

    Snowflake Account
    =================
    Please enter the name of your Snowflake account.
    This is the part of the URL before .snowflakecomputing.com
     []: abc-123

    Snowflake User
    ==============
    Please enter the name of your Snowflake user.
     []: user2137

    Snowflake Warehouse
    ===================
    Please enter the name of your Snowflake warehouse.
     []: compute-wh

    Snowflake Database
    ==================
    Please enter the name of your Snowflake database.
     [DEMO]:

    Snowflake Schema
    ================
    Please enter the name of your Snowflake schema.
     [DEMO]:

    Snowflake Password Environment Variable
    =======================================
    Please enter the name of the environment variable that contains your Snowflake password.
    Alternatively, you can re-configure the plugin later to use Kedros credentials.yml
     [SNOWFLAKE_PASSWORD]:

    Pipeline Name Used As A Snowflake Task Prefix
    =============================================
     [default]:

    Enable Mlflow Integration (See Documentation For The Configuration Instructions)
    ================================================================================
     [False]:

    The project name 'Snowflights' has been applied to:
    - The project title in /tmp/snowflights/README.md
    - The folder created for your project in /tmp/snowflights
    - The project's python package in /tmp/snowflights/src/snowflights
- Run the project
cd snowflights
kedro snowflake run --wait-for-completion
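The starter prompts ask for the name of an environment variable that holds the Snowflake password (SNOWFLAKE_PASSWORD by default), so the run step presumably needs that variable exported in the shell. The sketch below is one way to check this up front; `require_env` is a hypothetical helper written for this example and is not part of kedro-snowflake.

```shell
# Hypothetical helper (not part of kedro-snowflake): succeeds only if the
# named variable is exported and non-empty.
require_env() {
  # printenv only sees exported variables, i.e. what a child process such as
  # kedro would inherit from this shell.
  if [ -z "$(printenv "$1")" ]; then
    echo "error: environment variable $1 is not set or not exported" >&2
    return 1
  fi
}
```

For example, `require_env SNOWFLAKE_PASSWORD && kedro snowflake run --wait-for-completion` would stop before contacting Snowflake when the variable is missing. If a different variable name was entered at the prompt, pass that name instead.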
- Install the plugin
pip install "kedro-snowflake>=0.1.0"
- Initialize the plugin
kedro snowflake init <ACCOUNT> <USER> <PASSWORD_FROM_ENV> <DATABASE> <SCHEMA> <WAREHOUSE>
- Run the project
kedro snowflake run --wait-for-completion
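To make the placeholders in the init command concrete, the sketch below fills them with the example answers from the starter prompts earlier in this README. It assumes that `<PASSWORD_FROM_ENV>` expects the name of the environment variable holding the password rather than the password itself; check the plugin documentation if unsure. `print_init_command` is a hypothetical helper that only prints the command so it can be reviewed before running.

```shell
# Example values reused from the starter prompts; replace them with your own.
ACCOUNT="abc-123"                       # part of the URL before .snowflakecomputing.com
SNOWFLAKE_USER="user2137"
PASSWORD_FROM_ENV="SNOWFLAKE_PASSWORD"  # assumed: the variable NAME, not the password
DATABASE="DEMO"
SCHEMA="DEMO"
WAREHOUSE="compute-wh"

# Hypothetical helper: prints the assembled command instead of executing it.
print_init_command() {
  printf 'kedro snowflake init %s %s %s %s %s %s\n' \
    "$ACCOUNT" "$SNOWFLAKE_USER" "$PASSWORD_FROM_ENV" \
    "$DATABASE" "$SCHEMA" "$WAREHOUSE"
}

print_init_command
```

Once the printed command looks right, run it from the root of the Kedro project, followed by the run command.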
Execution: