Should You Use Databricks Delta Live Tables?

Bryan Cafferky

Comments: 26
@colinsorensen7382 · 10 months ago
Omg--finally someone who can talk about the WHY and the tradeoffs in using xyz feature in Databricks instead of just soapboxing about how cool it is and/or how to implement. So helpful, thank you!!
@jkiran2020 · 4 months ago
I appreciate your straight to the point style. Well done!
@joshcaro1226 · 1 year ago
Please, do make a video on migrating workflows to Delta Live!! My colleagues and I are very interested! If it helps, I bought your book :)
@BryanCafferky · 1 year ago
It does help. Good to know I'm not the only one trying to migrate to DLT. Thanks!
@tomfontanella6585 · 3 months ago
Excellent video. Thanks for the fair and clear perspective.
@BryanCafferky · 3 months ago
You're welcome!
@rahulsood81 · 1 year ago
Hey @Bryan, can you enlighten 💡 us on what the alternatives to Delta Live Tables (DLT) are? Is it KSQL or Snowflake Streams?
@BryanCafferky · 1 year ago
DLT is a good option. Other options depend on your needs: Structured Streaming with standard Databricks Workflows can work, or perhaps Eventstreams in MS Fabric. I'm not convinced true real-time streaming is really required in some cases, so I would first confirm that something simpler and less expensive, like frequent refreshes, isn't an option.
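For anyone wondering what "Structured Streaming with standard workflows" might look like in practice, here is a minimal sketch that could run as a scheduled Databricks job instead of a DLT pipeline. The paths and table name are hypothetical placeholders, not from the video.
```python
# Minimal Structured Streaming sketch runnable as a scheduled Databricks job.
# Paths and table names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.getOrCreate()

# Incrementally read new files from a landing folder with Auto Loader.
raw = (
    spark.readStream
    .format("cloudFiles")                      # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders_schema")
    .load("/mnt/landing/orders")
)

# Simple bronze-style transformation: stamp the ingestion time.
bronze = raw.withColumn("ingested_at", current_timestamp())

# Write to a Delta table. trigger(availableNow=True) processes whatever is new
# and then stops, so the job behaves like a frequent batch refresh rather than
# an always-on stream.
(
    bronze.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders_bronze")
    .trigger(availableNow=True)
    .toTable("main.bronze.orders")
)
```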
@fb-gu2er · 7 months ago
If you need a lot of customization in your ETL, DLT is not for you. If you want to treat your ETL as a more traditional application, DLT is not for you. For example, do you want to configure a custom logger, or call APIs and other services within your pipeline? Not for you either. DLT can be great when you have a very simple pipeline and you transform using fairly simple stages. Anything non-trivial: not for you, in most cases.
@yosh_2024 · 4 months ago
Useful and objective analysis.
@JustBigdata · 1 year ago
Quick question: if the source file schema keeps changing and source files arrive once a month, would using DLT be efficient?
@BryanCafferky · 1 year ago
Both DLT and non-DLT Delta support schema evolution. See learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/dlt
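As a rough illustration of what schema evolution looks like with Auto Loader inside a DLT Python notebook, here is a minimal sketch; the source path, file format, and table name are made up for the example.
```python
# Hypothetical DLT pipeline notebook showing Auto Loader schema evolution.
# The source path and table name are illustrative only.
import dlt
from pyspark.sql.functions import current_timestamp


@dlt.table(comment="Raw monthly files ingested with schema evolution enabled.")
def monthly_files_bronze():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        # Automatically add new columns as they appear in incoming files.
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
        .load("/mnt/landing/monthly_files")
        .withColumn("ingested_at", current_timestamp())
    )
```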
@andrewrodriguez8720 · 10 months ago
Great video, thank you for what you do!
@BryanCafferky · 10 months ago
You're welcome!
@artus198 · 1 year ago
I am proficient in SQL and come from an on-premises data warehouse background... I am trying to get a grip on this whole Databricks framework, but I often get stuck on how it works! It is frustrating.
@BryanCafferky · 1 year ago
Databricks is really a data platform, so it's a lot more than a framework. Yeah, there is a lot to it and it is confusing. Watch this intro video, which is part of a series, to get a summary of what Databricks is: kzbin.info/www/bejne/j6e3q6mQnZisiqc
@JMo268 · 1 year ago
Someone told me that DLT can only have one storage location per storage account. So if you're using ADLS and have bronze/silver/gold containers, you won't be able to store DLT tables in each of those containers. Can anyone confirm whether that is true?
@BryanCafferky · 1 year ago
I'm not aware of such a limitation. Read these docs for the definitive answer:
docs.databricks.com/en/delta-live-tables/settings.html#cloud-storage-configuration
docs.databricks.com/en/delta-live-tables/sql-ref.html#create-a-delta-live-tables-materialized-view-or-streaming-table
When you create a DLT table, there is a location option that seems to let you point wherever you like.
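For reference, the DLT Python decorator exposes a path argument (the SQL syntax has an equivalent LOCATION clause). A minimal sketch follows; the abfss:// path, storage account, and table names are hypothetical.
```python
# Minimal sketch: pinning a DLT table to an explicit storage path.
# The ADLS path and table names below are hypothetical examples.
import dlt


@dlt.table(
    name="orders_silver",
    comment="Cleaned orders",
    path="abfss://silver@mystorageaccount.dfs.core.windows.net/orders_silver",
)
def orders_silver():
    return (
        dlt.read("orders_bronze")   # another table defined in the same pipeline
        .dropDuplicates(["order_id"])
    )
```
Note that, as the next reply points out, explicit per-table paths interact with Unity Catalog, where the pipeline's target catalog governs storage.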
@wcharliee · 11 months ago
I think it depends on what you mean by "storage location". DLT can support multiple storage locations per DLT pipeline. However, if you enable Unity Catalog together with DLT, then the whole DLT pipeline needs to have the same "target catalog", which in turn needs to be defined on a single location / storage path. You can of course split into different DLT pipelines depending on the target catalog, but doing so might undercut the orchestration benefit that DLT gives.
@osoucy · 3 months ago
Awesome decision tree for selecting DLT or not! If people are interested in DLT but are concerned about vendor lock-in or "stickiness", you might want to consider Laktory (kzbin.info/www/bejne/eIuuYYN7YrSln7M), which lets you declare a pipeline in a YAML configuration file and deploy it as DLT, but also as a Databricks job, or even run it locally if you want to move away from Databricks.
@BryanCafferky · 3 months ago
Looks interesting. How does it fit in and work with Databricks Asset Bundles (DABs)? Does it work with Azure DevOps? Thanks
@osoucy · 3 months ago
@BryanCafferky It can actually be used as a replacement for DABs. I started working on this project before DABs was announced :) It offers more or less the same capabilities as DABs, except for a few key differences:
- State management is more aligned with Terraform than with DABs, meaning the state is not automatically saved to your Databricks workspace. As a consequence, deployments are global by default and not "user-specific".
- Laktory supports multiple IaC backends: you can use Terraform, but you can also use Pulumi.
- Laktory supports almost any Databricks resource. You can deploy not only notebooks and jobs, but also catalogs, schemas, tables, clusters, warehouses, vector search endpoints, queries, secrets, etc.
- As per my initial comment, Laktory is an ETL framework, so you can use it to define all your data transformations through SQL statements or Spark Chain, a serialized expression of Spark commands.
In other words, Laktory is like DABs + dbt in a single framework, but with a strong focus on DataFrame and Spark transformations. I haven't used it with Azure DevOps yet, but they would definitely work nicely together using the Laktory CLI. You can find an example of a GitHub Action here: github.com/okube-ai/lakehouse-as-code/blob/main/.github/workflows/_job_laktory_deploy.yml
The syntax is a bit different than with Azure DevOps, but it would be very similar.
@ngneerin · 1 year ago
It's expensive
@BryanCafferky · 1 year ago
So are data engineers. Usually people cost more than compute.
@awadelrahman · 5 months ago
@BryanCafferky 😅😅
@ngneerin · 1 year ago
You can write a job instead.