How to Build Metadata-Driven Data Pipelines with Delta Live Tables

17,418 views

Databricks

In this session, you will learn how to use metaprogramming to automate the creation and management of Delta Live Tables pipelines at scale. The goal is to make it easy to use DLT for large-scale migrations and other use cases that require ingesting and managing hundreds or thousands of tables, using generic code components and configuration-driven pipelines that can be reused dynamically across different projects or datasets.
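As a rough illustration of that configuration-driven approach, here is a minimal sketch using DLT's Python API; the config entries, paths, and helper function are illustrative assumptions, not code from the session:

```python
import dlt

# Illustrative metadata; in practice this would come from an onboarding
# JSON/YAML file or a configuration Delta table. (Assumed structure.)
TABLE_CONFIG = [
    {"name": "customers", "source_path": "/raw/customers", "format": "json"},
    {"name": "orders", "source_path": "/raw/orders", "format": "json"},
]

def create_bronze_table(cfg):
    """Generate one streaming bronze table from a single config entry."""

    @dlt.table(name=f"bronze_{cfg['name']}", comment=f"Raw ingest of {cfg['name']}")
    def bronze():
        # Auto Loader ingest of the raw files described by this config entry.
        # `spark` is predefined inside a DLT notebook.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", cfg["format"])
            .load(cfg["source_path"])
        )

# Metaprogramming step: one generic function, many tables.
for cfg in TABLE_CONFIG:
    create_bronze_table(cfg)
```

The same loop pattern is what lets a single pipeline handle hundreds of tables: only the configuration changes between projects or datasets.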
Talk by: Mojgan Mazouchi and Ravi Gawai
Connect with us: Website: databricks.com
Twitter: / databricks
LinkedIn: / databricks
Instagram: / databricksinc
Facebook: / databricksinc

Comments: 10
@brads2041 · 11 months ago
It's not clear how this process would handle a source query (for silver in this context, though it's probably more relevant to gold) that uses something like an aggregate, which DLT streaming doesn't support; you may have to fully materialize the table instead of streaming it.
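For context, a hedged sketch of the usual workaround the comment alludes to: declare the aggregated table as a regular (fully recomputed) live table that reads its upstream table in batch mode rather than as a streaming table. Table and column names below are made up for illustration:

```python
import dlt
from pyspark.sql import functions as F

# Aggregates over a streaming source generally can't be expressed as an
# append-only streaming table, so this table is declared as a non-streaming
# live table and is fully recomputed on each pipeline update.
@dlt.table(name="gold_orders_by_customer", comment="Order totals per customer")
def gold_orders_by_customer():
    # dlt.read() performs a batch read of another table in the same
    # pipeline, as opposed to dlt.read_stream().
    return (
        dlt.read("silver_orders")
        .groupBy("customer_id")
        .agg(
            F.sum("amount").alias("total_amount"),
            F.count("*").alias("order_count"),
        )
    )
```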
@AnjaliH-wo4hm · 6 months ago
Would appreciate it if Databricks came up with a proper explanation... neither presenter's explanation is clear.
@garethbayvel8374 · 11 months ago
Does this support Unity Catalog?
@ganeshchand · 8 months ago
Yes, the recent release supports UC.
@rishabhruwatia6201 · 1 year ago
Can we have a video on loading multiple tables using a single pipeline?
@rishabhruwatia6201 · 1 year ago
I mean something like a for-each activity.
@RaviGawai-db · 1 year ago
@rishabhruwatia6201 You can check out the dlt-meta repo and the dlt-meta-demo, or run the integration tests.
@brads2041 · 1 year ago
We tried that just recently. Depending on how you approach this, it may not work. In our case, we did not always call the DLT pipeline with the same set of tables to be processed. Any table that was processed previously but not included in a subsequent run would be removed from Unity Catalog (though the Parquet files still exist, i.e., behavior like an external table). This is of course not acceptable, so we switched to metadata-driven Structured Streaming. To put it another way: if you call the pipeline with table a, then call it again with table b, table a is dropped. You'd have to always execute the pipeline with all tables relevant to the pipeline.
@RaviGawai-db · 1 year ago
@brads2041 You reload the onboarding metadata before each run to add or remove tables from the group. So the workflow might be: onboarding (which can refresh row additions/removals for tables a and b) -> DLT pipeline.
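A minimal sketch of that workflow, assuming the onboarding step maintains a metadata table that always lists every table the pipeline owns (the metadata table name and its columns are hypothetical):

```python
import dlt

# Hypothetical metadata table refreshed by the onboarding step before each run.
ONBOARDING_TABLE = "ops.dlt_onboarding"

def load_onboarding():
    # Read the complete, refreshed list of tables for this pipeline group.
    # Declaring ALL of them on every update is what prevents previously
    # onboarded tables from being dropped by the pipeline.
    return [row.asDict() for row in spark.table(ONBOARDING_TABLE).collect()]

def define_table(cfg):
    @dlt.table(name=cfg["target_table"])
    def target():
        return spark.readStream.table(cfg["source_table"])

for cfg in load_onboarding():
    define_table(cfg)
```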