Rounding out the lakehouse with the GOLD layer in Azure Synapse

13,815 views

1 day ago

Comments: 25

@dbalkin777 2 years ago

I love all of the Azure videos, but what is the entry point for those of us who are primarily PBI developers? Is there a road map to learning all of this? Are you planning on creating a How-To to get started in this world?

@FrostSpike 2 years ago

03:27 The data isn't "autonomous"! The "A" in ACID represents "Atomicity" rather than "Automicity". This just means that a set of changes to the data either completes fully, or not at all i.e. the "transaction" wrapping the changes is "rolled back" in its entirety on an error. Rather than changes partially occurring, the consistency of the database is maintained by reverting any mutating operations already done in the transaction prior to the error.
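The all-or-nothing behavior described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than Delta Lake or Synapse, and the accounts table and transfer helper are hypothetical names invented for this illustration only.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one atomic transaction.

    Returns True if the transfer committed, False if it was rolled back.
    """
    try:
        # `with conn` commits if the block succeeds and rolls back
        # every change made inside it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # The debit above has already run; raising here undoes it.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
```

Calling transfer(conn, "a", "b", 150) fails partway through, and the debit that had already been applied is reverted, so both balances are unchanged afterwards. That is the "reverting any mutating operations already done" part of the comment.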

@AessamL 2 years ago

@Stijn Thanks for this video series. I kind of hope that you continue writing on your blog as well; I can only see part 1 for now.

@edgards 2 years ago

Patrick's breathing very close to the mic! Awesome video regardless!

@tangerinekpopper1868 1 year ago

Noticed that too. Lol

@yogeshnikam8064 2 years ago

Great content! Will it be available in dedicated SQL pool as well?

@scottytarvan9523 2 years ago

I think that there is some confusion here (maybe on my part). You mention that the lake database is not a persistent store of data but a metadata model over a file system; agreed on that. But then you mention that it supports ACID transactions even though data is not persisted in the gold layer?

I would describe this a little differently: the silver layer is persisted via the Delta format, allowing ACID transactions over a file system. This is the main feature of Delta Lake that enables the lakehouse architecture. The gold layer is a metadata model (also called virtual tables) that queries the silver layer at runtime using the serverless SQL pool. The virtual tables have been created in such a way that they simulate facts and dimensions in a star schema.

A couple of problems:
- You created a fake SCD type 2 table in your gold layer. This needs to be implemented properly in the silver layer, which is not a trivial task.
- You would generally need to generate surrogate keys to join facts and dimensions, either by implementing your own algorithm or by using the built-in Delta identity column. Once again, this is not a trivial task when you look up foreign keys.
- You need to consider partition pruning between facts and dimensions for query performance.
- You often need a calendar dimension for date calculations.
- You need to compare performance and cost between the Synapse serverless SQL pool and the Databricks SQL endpoint (now strangely called a data warehouse). Both can be used to create virtual tables/views and be attached to Power BI.

I appreciate the video, but this is far from a real-world working application that can scale and has considered performance and cost factors.
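To make the SCD type 2 and surrogate key points concrete, here is a minimal in-memory sketch in plain Python. It is not the video's code and not a Delta Lake implementation; DimRow, apply_scd2, and the field names are hypothetical, and on a real silver layer the same logic would more likely be expressed as a merge against the Delta table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DimRow:
    surrogate_key: int          # generated key that facts would join on
    business_key: str           # natural key from the source system
    attributes: Dict[str, str]  # tracked attributes, e.g. {"city": "Oslo"}
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(
    dim: List[DimRow], incoming: Dict[str, Dict[str, str]], as_of: date
) -> List[DimRow]:
    """Apply one batch of source rows to the dimension, SCD type 2 style.

    Unchanged keys are skipped; new or changed keys get a new version
    with a fresh surrogate key, and the previous version is closed.
    """
    next_key = max((row.surrogate_key for row in dim), default=0) + 1
    current = {row.business_key: row for row in dim if row.valid_to is None}

    for business_key, attrs in incoming.items():
        existing = current.get(business_key)
        if existing is not None and existing.attributes == attrs:
            continue  # nothing changed, keep the current version open
        if existing is not None:
            existing.valid_to = as_of  # close the old version
        dim.append(DimRow(next_key, business_key, dict(attrs), as_of))
        next_key += 1
    return dim
```

A fact row loaded on a given date would then look up the surrogate key of the version whose validity interval contains that date, which is the foreign key lookup the comment calls non-trivial.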

@anonimoi4957 2 years ago

All good, but this breathing sound doesn't let me focus enough.

@gabrielmorais7312 2 years ago

True!

@bitips 1 year ago

I notice that in all videos from Guy in a Cube.

@kevinfrank7044 2 years ago

Has that SCD video been created yet? I'm looking for a good video on how to create a type 2 SCD with Databricks in Azure Synapse.

@lkassen 1 year ago

This is bananas! 🤣

@bunnihilator 1 year ago

The DROP TABLE IF EXISTS statement gives a Hive error about an illegal argument exception (null path).

@MrRittick 1 year ago

I am a big fan of your posts. However, you have been saying "Automicity/Autonomous" for the ACID properties; it should be Atomicity. It means a transaction is treated as a single unit (an "atom"): it either commits in full or rolls back. Great work! Cheers

@attapon56 2 years ago

May I ask why the Delta Lake tooling doesn't come with a GUI like other pipeline tools, or use SQL as its main language instead of Python classes? Is there a reason behind this? In my view, moving data across pools seems very common and should be simpler and more automated.

@danielmudie605 2 years ago

Maybe it's a stupid question... but if the gold layer tables aren't persistent how can I add indexes to tune them for specific queries? Or is there another "platinum" layer we'll be introduced to in another video?

@B-Luv 2 years ago

Disclaimer, I'm just learning this too. But do you need to tune it for specific queries? Is your flow not to pull all of the data from gold into the Power BI service?