
What is this delta lake thing?

57,180 views

Guy in a Cube


A day ago

You may be using a data lake for your data, and it may just hold regular parquet files. In this video, Stijn joins us to explain why you should be using Delta Lake instead and how this works in Azure Synapse Analytics.
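As a rough illustration of the difference, here is a minimal sketch in a Synapse Spark notebook (where a spark session is predefined); the ADLS paths are placeholders, not from the video:

```python
# Read some raw data; the abfss:// path is a placeholder.
df = spark.read.parquet("abfss://data@mylake.dfs.core.windows.net/raw/sales/")

# Plain parquet output: just files in a folder, no transaction log.
df.write.mode("overwrite").parquet(
    "abfss://data@mylake.dfs.core.windows.net/curated/sales_parquet/")

# Delta output: the same parquet files underneath, plus a _delta_log folder
# that adds ACID transactions, time travel, and schema enforcement.
df.write.format("delta").mode("overwrite").save(
    "abfss://data@mylake.dfs.core.windows.net/curated/sales_delta/")
```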
Connect with Stijn
/ sqlstijn
What is Delta Lake
docs.microsoft...
Delta Lake Documentation
docs.delta.io/...
📢 Become a member: guyinacu.be/me...
*******************
Want to take your Power BI skills to the next level? We have training courses available to help you with your journey.
🎓 Guy in a Cube courses: guyinacu.be/co...
*******************
LET'S CONNECT!
*******************
-- / guyinacube
-- / awsaxton
-- / patrickdba
-- guyinacube.com
**Gear**
🛠 Check out my Tools page - guyinacube.com...
#AzureSynapse #DeltaLake #GuyInACube

Comments: 35
@joannapodgoetsky4382 · 2 years ago
A for Atomicity I think 😊
@yosh_2024 · 7 days ago
right 🙂
@cantTouch948 · A year ago
This video is gold - it makes Spark and Delta Lake easier to understand. Kudos!
@mohamedtarek-gh4fr · A year ago
Again, another great video from the great Azure Synapse Analytics series. Thanks a lot, guys (in the cube), you are amazing!
@VjBroodz · 4 months ago
Very clear, thanks
@PCGHigh · 2 years ago
Great video series for getting started with the topic. The next video is probably already in production, but as a follow-up to the series I can imagine it would be interesting to see how powerful Delta's functionality really is. What exactly does the time travel feature look like? For me it was impressive to see how granularly you can jump back in time and roll back changes to rows, but also structural changes to a table. If we want to look at it more from an ETL perspective, maybe a look at the change data feed would be interesting. Regardless of how you continue this series, I am very excited, because your hands-on way of approaching these things removes the hurdle for many who are beginning their journey.
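For readers curious what that time travel and rollback looks like, a hedged sketch in a Synapse notebook with pySpark - the path and version number are placeholders, and the restore call needs a Delta Lake release that ships it:

```python
from delta.tables import DeltaTable

path = "abfss://data@mylake.dfs.core.windows.net/curated/sales_delta/"

# Every commit is recorded in the _delta_log; history lists each version
# with its timestamp and operation (WRITE, MERGE, ALTER TABLE, ...).
spark.sql(f"DESCRIBE HISTORY delta.`{path}`").show(truncate=False)

# Roll the table back to an earlier version in one atomic commit
# (available via DeltaTable in newer Delta Lake releases).
DeltaTable.forPath(spark, path).restoreToVersion(3)
```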
@adamcanfield9682 · A year ago
Yeah I'd like to see this too please!
@radekou · 2 years ago
Hello, thanks for putting out great content and useful videos. Delta is certainly cool; however, after having a deeper look, Delta time travel does not seem to be a replacement for properly Type 2 SCD modelled data, since:
- there is limited data retention for the delta log (30 days by default), though it can be extended of course
- you can't leverage that time travel when using a Serverless SQL pool (which is how I'd expose Delta tables to Power BI) - or have I missed something obvious?
Furthermore, the SQL / pySpark interoperability only works to an extent; for example, Synapse Spark SQL doesn't support SQL-based time travel (SELECT * FROM table AS OF VERSION n) - this has to be done via pySpark. On the bright side, pySpark is not that hard to pick up; it takes getting used to, but it's quite powerful :) Now just add Delta support for the workspace-created Lake Database! :) Cheers
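A minimal sketch of that pySpark-side time travel (the path, version, and timestamp values are placeholders):

```python
# Read a Delta table "as of" an earlier state from pySpark, since this
# Synapse runtime's Spark SQL does not accept the AS OF syntax.
path = "abfss://data@mylake.dfs.core.windows.net/curated/dim_customer/"

v5 = spark.read.format("delta").option("versionAsOf", 5).load(path)

as_of_june = (spark.read.format("delta")
              .option("timestampAsOf", "2022-06-01 00:00:00")
              .load(path))
```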
@MDevion · 2 years ago
I'm on the same route as you, but I still do not fully understand where they want us to do the transformations. Should it be in a notebook, ADF, or where exactly? And do we store the transformed data before it gets into the silver / delta layer?
@radekou · 2 years ago
@@MDevion Re. the transformations - you can do most of those in steps using pySpark or Spark SQL on top of temp views, and only persist the data physically once all steps are applied. Dataframes are evaluated "lazily"; to me it's a bit like layering SQL CTEs on top of each other (see the sketch below). As for where to do it? Yeah, we're looking at what does it best - is it Dataflows, Pipelines (the copy activity doesn't support writing to Delta) or Notebooks. Lastly, try creating a lake table based on a Delta location - the columns / schema don't seem to be picked up properly. I used to think Databricks was a half-baked product, but Synapse in its current state (at least for Spark / Serverless) is on another level. No ACLs for folders in the Develop blade? Phew. :)
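A rough illustration of that layering in a notebook (table and path names are made up):

```python
# Layer transformations like SQL CTEs: each step only extends the query
# plan; nothing executes until the final write triggers it.
raw = spark.read.format("delta").load(
    "abfss://data@mylake.dfs.core.windows.net/bronze/orders/")
raw.createOrReplaceTempView("orders_raw")

cleaned = spark.sql("""
    SELECT order_id, customer_id, CAST(amount AS DECIMAL(18, 2)) AS amount
    FROM orders_raw
    WHERE order_id IS NOT NULL
""")
cleaned.createOrReplaceTempView("orders_cleaned")

enriched = spark.sql("""
    SELECT *, CASE WHEN amount > 1000 THEN 'large' ELSE 'small' END AS bucket
    FROM orders_cleaned
""")

# Only now does Spark run the whole layered plan and persist the result.
enriched.write.format("delta").mode("overwrite").save(
    "abfss://data@mylake.dfs.core.windows.net/silver/orders/")
```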
@MDevion · 2 years ago
@@radekou And what if I only want to apply transformations to the latest data? How does Synapse know that it should only transform the latest data? The whole lineup/process right now is just incoherent.
@ChronicSurfer · A year ago
Interesting. What is the benefit of using this vs creating incremental loading within your merge statements? Are there more costs associated with using a delta lake? Additionally, will this pick up changes from my source?
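For context: Delta and merge statements are complementary rather than alternatives - Delta tables support MERGE natively, as one atomic commit. Delta does not watch the source system itself, though; something (CDC, a watermark query, etc.) still has to land the changed rows first. A hedged sketch with made-up paths and columns:

```python
from delta.tables import DeltaTable

# Upsert the latest extract into a Delta table as one atomic commit.
target = DeltaTable.forPath(
    spark, "abfss://data@mylake.dfs.core.windows.net/silver/customers/")
updates = spark.read.parquet(
    "abfss://data@mylake.dfs.core.windows.net/landing/customers_changed/")

(target.alias("t")
 .merge(updates.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```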
@dancrowell2933 · A year ago
How do you handle changes to the source system in a Delta lake? For example, when a source table adds 3 columns and drops two?
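One way to absorb such a change, sketched with placeholder paths: Delta's mergeSchema option handles added columns, while columns dropped at the source simply come through as NULL in new rows.

```python
# Append an extract whose schema gained three columns; mergeSchema adds
# them to the Delta schema, and old rows read back NULL for them.
new_extract = spark.read.parquet(
    "abfss://data@mylake.dfs.core.windows.net/landing/source_table/")

(new_extract.write.format("delta")
 .mode("append")
 .option("mergeSchema", "true")
 .save("abfss://data@mylake.dfs.core.windows.net/silver/source_table/"))
```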
@matthiask4602 · 2 years ago
Adam looks different today...
@RamyNazier · 2 months ago
Nice video, but the audio quality makes it a bit harder to understand.
@mrnagoo2 · A year ago
ACID = "Atomicity" not "Automicity". Thanks for the video.
@eth6706 · 2 years ago
You should do videos about machine learning models in Synapse
@maxirojo7829 · A year ago
Hello! Excellent video! Is it recommended to save the data in parquet in the first (bronze) layer and in Delta in the following two? Thank you!
@crystal9543 · A year ago
Yes, explore the BOG - boots on the ground.
@user-uf7ie5pt9e · 5 months ago
To be clear: bronze layer (parquet, JSON, CSV, etc.), silver layer (Delta Lake, Iceberg, or Hudi - one open table format), and gold layer (SQL queries / views) to serve data users? Is that correct?
@szklydm · 2 years ago
PySQL should be a thing! 😁
@noahhadro8213 · 2 months ago
So is this a true statement: Delta, delta tables, and delta-parquet are all synonyms and mean the same thing?
@nagoorpashashaik8400 · A year ago
@Guy in a Cube - can we do this same thing in an ADF mapping dataflow?
@kshitizaggarwal1 · A year ago
Atomicity, not automicity.
@sid0000009 · A year ago
Can an API hosted on an App Service in any way fetch Delta table data? Thanks!
@googlegoogle1812 · 2 years ago
Do you know what the difference is between lake databases and the Delta Lake project? Both seem to have roughly the same functionality - I can use Spark to do ETL tasks, and then use Spark pools as well as serverless SQL pools to query the data.
@bal_slayer · A month ago
This is really Bananas
@martinbubenheimer6289 · 2 years ago
Previously I would not have parquet files; I would have a SQL Server. What problem does a delta lake solve compared to just using SQL Server?
@ghoser1986 · 2 years ago
When your table is in Delta format, it opens up new use cases for you: streaming, data science, and analytics from a single source of truth, in the language of your choice - SQL, Python, R. Ultimately you get more value from your data without having to store it multiple times. It's also open, and cheaper to store vs SQL Server. I'm biased, but I feel Databricks lets you get more value from it vs Synapse.
@MDevion · 2 years ago
The big advantage is that, if you are in the cloud, compute and storage are separated, which means you only pay for storage when nothing is running and only pay for processing when needed. For Serverless it's around 5 euro / 5 USD per 1 TB processed, with a 10 MB minimum invoiced per query. It's a lot cheaper than a dedicated pool in Synapse, and it's easier to scale up. In Azure SQL DB (assuming you are using this), storage and compute scale up and down together. There is more flexibility than there used to be, but they are still linked. Also, Azure SQL DB is not suited for heavy DWH workloads, mainly due to how SQL handles logging and Azure SQL DBs being locked into the FULL recovery model.
@martinbubenheimer6289 · 2 years ago
Interesting aspect. I would love to discuss with the Exasol guys when they would recommend preferring a delta lake over their Exasol SQL database for scalability, heavy workloads, or analytics use cases. Do you know what the origin of Delta Lake was? ACID compliance doesn't seem to be the most important DWH requirement for a data storage solution on the silver layer, where access can be controlled by the ETL process: source systems write to the bronze layer, destination systems read from the gold layer, and in between the ETL can be orchestrated to eliminate the need for ACID compliance. This sounds more like a transactional requirement.
@helloranjan89 · 2 years ago
Seems complex 🤔
@willi1978 · 4 months ago
Guess I'd prefer Snowflake to this.