Should You Use Databricks Delta Live Tables?

Bryan Cafferky

Comments: 26
@colinsorensen7382 · 10 months ago
Omg--finally someone who can talk about the WHY and the tradeoffs in using xyz feature in Databricks instead of just soapboxing about how cool it is and/or how to implement. So helpful, thank you!!
@jkiran2020 · 4 months ago
I appreciate your straight to the point style. Well done!
@joshcaro1226 · 1 year ago
Please, do make a video on migrating workflows to Delta Live!! My colleagues and I are very interested! If it helps, I bought your book :)
@BryanCafferky · 1 year ago
It does help. Good to know I'm not the only one trying to migrate to DLT. Thanks!
@tomfontanella6585 · 3 months ago
Excellent video. Thanks for the fair and clear perspective.
@BryanCafferky · 3 months ago
You're welcome!
@rahulsood81 · 1 year ago
Hey @Bryan, can you enlighten 💡 us on what the alternatives to Delta Live Tables (DLT) are? Is it KSQL or Snowflake Streams?
@BryanCafferky · 1 year ago
DLT is a good option. Other options depend on your needs: Structured Streaming with standard Databricks Workflows can work, or perhaps Eventstreams in MS Fabric. I'm not convinced true real-time streaming is really required in some cases, so I would first confirm that something simpler and less expensive, like frequent refreshes, isn't an option.
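For anyone wondering what "Structured Streaming with standard workflows" might look like in practice, here is a minimal sketch that could run as a scheduled Databricks job instead of a DLT pipeline. The paths and table name are hypothetical placeholders, not from the video.
```python
# Minimal Structured Streaming sketch runnable as a scheduled Databricks job.
# Paths and table names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.getOrCreate()

# Incrementally read new files from a landing folder with Auto Loader.
raw = (
    spark.readStream
    .format("cloudFiles")                      # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders_schema")
    .load("/mnt/landing/orders")
)

# Simple bronze-style transformation: stamp the ingestion time.
bronze = raw.withColumn("ingested_at", current_timestamp())

# Write to a Delta table. trigger(availableNow=True) processes whatever is new
# and then stops, so the job behaves like a frequent batch refresh rather than
# an always-on stream.
(
    bronze.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders_bronze")
    .trigger(availableNow=True)
    .toTable("main.bronze.orders")
)
```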
@fb-gu2er · 7 months ago
If you need a lot of customization in your ETL, DLT is not for you. If you want to treat your ETL as a more traditional application, DLT is not for you. For example, do you want to configure a custom logger, or call APIs and other services within your pipeline? Not for you either. DLT can be great when you have a very simple pipeline and you transform using fairly simple stages. Anything non-trivial: not for you, in most cases.
@yosh_2024 · 4 months ago
Useful and objective analysis.
@JustBigdata · 1 year ago
Quick question: if the source file schema keeps changing and source files arrive once a month, would using DLT be efficient?
@BryanCafferky · 1 year ago
Both DLT and non-DLT Delta support schema evolution. See learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/dlt
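As a rough illustration of what schema evolution looks like with Auto Loader inside a DLT Python notebook, here is a minimal sketch; the source path, file format, and table name are made up for the example.
```python
# Hypothetical DLT pipeline notebook showing Auto Loader schema evolution.
# The source path and table name are illustrative only.
import dlt
from pyspark.sql.functions import current_timestamp


@dlt.table(comment="Raw monthly files ingested with schema evolution enabled.")
def monthly_files_bronze():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        # Automatically add new columns as they appear in incoming files.
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
        .load("/mnt/landing/monthly_files")
        .withColumn("ingested_at", current_timestamp())
    )
```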
@andrewrodriguez8720 · 10 months ago
Great video, thank you for what you do!
@BryanCafferky · 10 months ago
You're welcome!
@artus198 · 1 year ago
I am proficient in SQL and come from an on-premises data warehouse background... I am trying to get a grip on this whole Databricks framework, but I often get stuck on how it works! It is frustrating.
@BryanCafferky · 1 year ago
Databricks is really a data platform, so it's a lot more than a framework. Yeah, there is a lot to it and it is confusing. Watch this intro video, which is part of a series, to get a summary of what Databricks is: kzbin.info/www/bejne/j6e3q6mQnZisiqc
@JMo268 · 1 year ago
Someone told me that DLT can only have one storage location per storage account. So if you're using ADLS and have bronze/silver/gold containers, you won't be able to store DLT tables in each of those containers. Can anyone confirm whether that is true?
@BryanCafferky · 1 year ago
I'm not aware of such a limitation. Read these docs for the definitive answer:
docs.databricks.com/en/delta-live-tables/settings.html#cloud-storage-configuration
docs.databricks.com/en/delta-live-tables/sql-ref.html#create-a-delta-live-tables-materialized-view-or-streaming-table
When you create a DLT table, there is a location option that seems to let you point wherever you like.
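For reference, the DLT Python decorator exposes a path argument (the SQL syntax has an equivalent LOCATION clause). A minimal sketch follows; the abfss:// path, storage account, and table names are hypothetical.
```python
# Minimal sketch: pinning a DLT table to an explicit storage path.
# The ADLS path and table names below are hypothetical examples.
import dlt


@dlt.table(
    name="orders_silver",
    comment="Cleaned orders",
    path="abfss://silver@mystorageaccount.dfs.core.windows.net/orders_silver",
)
def orders_silver():
    return (
        dlt.read("orders_bronze")   # another table defined in the same pipeline
        .dropDuplicates(["order_id"])
    )
```
Note that, as the next reply points out, explicit per-table paths interact with Unity Catalog, where the pipeline's target catalog governs storage.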
@wcharliee · 11 months ago
I think it depends on what you mean by "storage location". DLT can support multiple storage locations per DLT pipeline. However, if you enable Unity Catalog together with DLT, then the whole DLT pipeline needs to have the same "target catalog", which in turn needs to be defined on a single location / storage path. You can of course split into different DLT pipelines depending on the target catalog, but doing so might undercut the orchestration benefit that DLT gives.
@osoucy · 3 months ago
Awesome decision tree for selecting DLT or not! If people are interested in DLT but are concerned about vendor lock-in or "stickiness", you might want to consider Laktory (kzbin.info/www/bejne/eIuuYYN7YrSln7M), which lets you declare a pipeline in a YAML configuration file and deploy it as DLT, but also as a Databricks job, or even run it locally if you want to move away from Databricks.
@BryanCafferky · 3 months ago
Looks interesting. How does it fit in and work with Databricks Asset Bundles (DABs)? Does it work with Azure DevOps? Thanks
@osoucy · 3 months ago
@BryanCafferky It can actually be used as a replacement for DABs. I started working on this project before DABs was announced :) It offers more or less the same capabilities as DABs, except for a few key differences:
- State management is more aligned with Terraform than with DABs, meaning the state is not automatically saved to your Databricks workspace. As a consequence, deployments are global by default and not "user-specific".
- Laktory supports multiple IaC backends: you can use Terraform, but you can also use Pulumi.
- Laktory supports almost any Databricks resource. You can deploy not only notebooks and jobs, but also catalogs, schemas, tables, clusters, warehouses, vector search endpoints, queries, secrets, etc.
- As per my initial comment, Laktory is an ETL framework, so you can use it to define all your data transformations through SQL statements or Spark Chain, a serialized expression of Spark commands.
In other words, Laktory is like DABs + dbt in a single framework, but with a strong focus on DataFrame and Spark transformations. I haven't used it with Azure DevOps yet, but they would definitely work nicely together using the Laktory CLI. You can find an example of a GitHub Action here: github.com/okube-ai/lakehouse-as-code/blob/main/.github/workflows/_job_laktory_deploy.yml
The syntax is a bit different than with Azure DevOps, but it would be very similar.
@ngneerin · 1 year ago
It's expensive
@BryanCafferky · 1 year ago
So are data engineers. Usually people cost more than compute.
@awadelrahman · 5 months ago
@BryanCafferky 😅😅
@ngneerin · 1 year ago
You can write a job instead.