No video

Understand Slowly Changing Dimensions

  Рет қаралды 20,583

Bryan Cafferky

Bryan Cafferky

Күн бұрын

Пікірлер: 58
@sheldonwayne1390
@sheldonwayne1390 2 жыл бұрын
I'm from China and it's frustrating that you can't find any video doing a good job explaining knowledges about DWH, glad I found this on KZbin. Cheers!
@thoroth_
@thoroth_ Ай бұрын
This is the best explanation of this concept that I’ve seen, thank you so much
@BryanCafferky
@BryanCafferky Ай бұрын
You're welcome!
@sandrafield9813
@sandrafield9813 2 жыл бұрын
Thanks for this description! Its better than all the other ones I found. Its more clear with extra details without being convoluted and messy.
@italomacelloneable
@italomacelloneable 3 жыл бұрын
Excellent content and very well explained. Thank you
@cluebcke
@cluebcke 3 жыл бұрын
This is really awesome, thank you so much. Just to add, one use case for Type 5 (which I had no idea was called Type 5 before watching this) is that if you have a lot of data in a record that doesn't change much, and only a few fields that change a lot, partitioning them into different dimensions can substantially reduce table bloat (as you're not creating many copies of unchanged data) and improve query performance. There are tradeoffs of course and you really have to evaluate (and test empirically) for your use case.
@BryanCafferky
@BryanCafferky 3 жыл бұрын
Thanks. Great point too.
@MuhammadMustafa-rd6vg
@MuhammadMustafa-rd6vg 2 жыл бұрын
this is a good tutorial , watched dimensional modelling & SCD tutorial, you were pretty clear , explained in detail without making it boring !
@houstonfirefox
@houstonfirefox Жыл бұрын
Great explanation Bryan w/real-world examples and analogies. New sub!
@bestherk4522
@bestherk4522 3 жыл бұрын
Good video, Clear Explanations. Thank you.
@BryanCafferky
@BryanCafferky 3 жыл бұрын
YW and thanks for watching!
@maisamabbas6872
@maisamabbas6872 6 ай бұрын
This is the best explaination for slowly changing dimensions that I've come across on KZbin. You definitely deserve my like and subscribe. Thanks for such an amazing video. I hope you make a lot more videos on Data Engineering topics.
@BryanCafferky
@BryanCafferky 6 ай бұрын
Thanks
@sauravmisra6061
@sauravmisra6061 2 жыл бұрын
Thank you sir for explaining the code EDW concepts in both the videos - Dimensional modeling and SCD, so well and precisely. I also followed your Databricks and Spark and these are the best I have found so far on these topics. Thank you , sir and really appreciate your effort.
@BryanCafferky
@BryanCafferky 2 жыл бұрын
Glad it is helpful. Thanks for your kind words.
@Jel.Awesh.M
@Jel.Awesh.M 2 жыл бұрын
Thank you for sharing! A very good resource for SCDs!
@Pasdpawn
@Pasdpawn Жыл бұрын
man I spent a fortune to get an IT degree and here i am , learning it from a youtube tutorial which is far better and easier to understand than my database lectures. thanks a lot sir for your effort.
@BryanCafferky
@BryanCafferky Жыл бұрын
YW. So glad it was helpful.
@helovesdata8483
@helovesdata8483 2 жыл бұрын
you've helped me to understand data warehouses more
@batmanbuddha
@batmanbuddha 3 жыл бұрын
Awesome Explanation, even sharing to my friend too,
@BryanCafferky
@BryanCafferky 3 жыл бұрын
Thanks. Yeah. The more that can be helped the better.
@marcosoliveira8731
@marcosoliveira8731 3 жыл бұрын
Topics like this still so relevant. I hope you have more to share about DW Architecture.
@guillermopalaciosgaray
@guillermopalaciosgaray 2 жыл бұрын
Thanks for your explanation! This is the best video I've found :)
@I677000
@I677000 3 жыл бұрын
Awesome, :) hop to get this question asked on my interview :) now I know how to answer it:)
@fatallny
@fatallny Жыл бұрын
Awesome video, now everything is clear
@BryanCafferky
@BryanCafferky Жыл бұрын
Glad to hear it. Thanks.
@shivam6565
@shivam6565 2 жыл бұрын
Thank you Bryan for this wonderful tutorial. Concept lucidly explained and I like your teaching style. Liked and subscribed.
@zerovine8025
@zerovine8025 Жыл бұрын
Thanks for the easy to understand explanation, man. (btw nice mustache)
@anirudhvyas6069
@anirudhvyas6069 10 ай бұрын
nicely presented thank you!
@BryanCafferky
@BryanCafferky 9 ай бұрын
You're welcome!
@dataofbitcoin6685
@dataofbitcoin6685 2 жыл бұрын
Amazing video, again! The best in biz.
@BryanCafferky
@BryanCafferky 2 жыл бұрын
Thanks
@a29miller
@a29miller 2 жыл бұрын
Well done. I really appreciate the detail
@AC-hh2cb
@AC-hh2cb 3 жыл бұрын
Great explanation, will help me at work :)
@rickstratton1714
@rickstratton1714 3 жыл бұрын
Good video, clear explanations and visual examples. Have to wonder what kind of cocktail parties you attend! :)
@BryanCafferky
@BryanCafferky 3 жыл бұрын
Hmmm... Come to think of it, I was not invited back. :-) Thanks
@JimRohn-u8c
@JimRohn-u8c 2 жыл бұрын
Thank you so much for this! I’d love to see more videos on this. Btw, how does this work for Snowflake Schemas? Are those schemas only used in OLTP Databases?
@BryanCafferky
@BryanCafferky 2 жыл бұрын
Snowflake schemas are to be avoided if at all possible. I think the product Snowflake is causing confusion about this due to its name,
@SwapperTheFirst
@SwapperTheFirst 2 жыл бұрын
@@BryanCafferky I guess these to be avoided only if (1) compute and storage are expensive (2) compute and storage are slow (3) you don't have MPP or vectorized calculations. In current environment, compute and storage cost much less than data engineer and data architect time, so data modeling approaches changed. What was true for Teradata or Netezza in 1999 is not true for Snowflake or Big Query in 2022. On the topic of SCD, you can find the video of Airflow original creator, where he asserts, that all these approaches with SCD are just thing of the past, and now we should create daily or weekly full snapshots of dimensional tables. I don't fully understand his approach (for example, Kimball Type 2 for me sounds very intuitive), but I don't fully grasp how to solve the SCD reporting problem, if you have (say) daily snapshots of all your dimensional tables.
@BryanCafferky
@BryanCafferky 2 жыл бұрын
@@SwapperTheFirst SCD concepts I explained are based on Kimball. It's more about the functionality required rather than the technology used. Airflow has nothing to do with this subject so not show how the Airflow creator's comments apply. You should choose the best approach for your needs. Thanks
@elinadiary9357
@elinadiary9357 2 ай бұрын
Thank you sir
@marcosoliveira8731
@marcosoliveira8731 3 жыл бұрын
THANK YOU! Real good explanation! btw: Nice R mug :)
@JimRohn-u8c
@JimRohn-u8c 2 жыл бұрын
Also why would a “NULL” be better than “9999-12-31” etc. ?
@BryanCafferky
@BryanCafferky 2 жыл бұрын
a NULL means unknown whereas '9999-12-31' is a value. If you filter where date > 2022-01-01, the later would return TRUE but usually a NULL compare would not return TRUE. Forcing vlaues is an option but technically inaccurate with repercussions. Bear in mind the 9999-12-31 would appear in groupings on reports, etc. A text value of 'NULL' is just a glitch of the extract from a database and not a true NULL.
@aaronvr_
@aaronvr_ 3 жыл бұрын
Great video, very well explained! Is there an efficient way to actually identify/locate/spot such changes in large data sets? Slowly changing dimensions help when dealing with pre-identified changes, but as to how one could locate them in the first place..I'm stuck!
@BryanCafferky
@BryanCafferky 3 жыл бұрын
Hi. SQL Server has a feature called change data capture. docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver15
@BryanCafferky
@BryanCafferky 3 жыл бұрын
Or you can use a merge statement that compares the data. Ideally, the source tables have a modified date/time that can be used to get only changes since the last ETL load. This is easier than using Change Data Capture, i.e. you just keep track of the last date/time you did a load and only pull data that comes after that.
@nameunknown007
@nameunknown007 2 жыл бұрын
Writes a book about Microsoft and sells it on Amazon like a legend 👍
@gokukanishka
@gokukanishka 2 жыл бұрын
Hey Bryan , can you please make videos on fast changing dimensions and how to handle them
@houstonfirefox
@houstonfirefox Жыл бұрын
Apparently Gender went from a static dimension to a fast changing dimension 🤣
@LAPOVETS
@LAPOVETS 2 жыл бұрын
thank you!
@javedabdool1187
@javedabdool1187 Жыл бұрын
great video boss. I have 2 questions though maybe a lil stupid :P if eveer i missed something: should we put the source key in the fact table?if not then why bring the source key then? do fact table need a fact id?
@BryanCafferky
@BryanCafferky Жыл бұрын
Thanks. The source dimension key like ProductKey should be stored in the dim table and the dim surrogate key (created by the BI ETL) is the Pk to teh Dim table and FK to the Fact table. You don't need a fact table key per se but SQL Tables perform much better with a primary key(pk) so an Identity column suffices. See my video on dim modeling kzbin.info/www/bejne/ooizmoaQm5xgq7M
@javedabdool1187
@javedabdool1187 Жыл бұрын
@@BryanCafferky cools thnx!!
@canyurt1
@canyurt1 2 жыл бұрын
Hey Bryan - I think Type 4 resembles Data Vault initiatives. Would you not agree?
@BryanCafferky
@BryanCafferky 2 жыл бұрын
I have not studied Data Vault modeling in depth but from the wiki on the subject, I would be careful equating concepts of the two methods as Data Vault modeling does not cleanse and optimize for querying per this link en.wikipedia.org/wiki/Data_vault_modeling
@usman_oz
@usman_oz 3 жыл бұрын
Thanks for the explanation. I just wanted to check if the "Dim customer" table at around 13:20 playtime shows the correct Start & End dates. Shouldn't the "Start Date" be +1 day of the previous record's End Date ?
@marcjkeppler3590
@marcjkeppler3590 11 күн бұрын
How does this video have fewer than 1k likes? 😅
Dimensional Modeling
53:54
Bryan Cafferky
Рет қаралды 167 М.
SCD: Slowly changing dimensions explained with real examples
25:43
Joker can't swim!#joker #shorts
00:46
Untitled Joker
Рет қаралды 40 МЛН
61. Databricks | Pyspark | Delta Lake : Slowly Changing Dimension (SCD Type2)
20:03
Slowly Changing Dimensions(SCD) Types with Real time examples
27:10
The Data Channel
Рет қаралды 4,5 М.
Data Architecture vs. Data Engineering Deep Dive
33:56
Bryan Cafferky
Рет қаралды 3,9 М.
Working with Slowly Changing Dimensions in Power BI
13:31
Guy in a Cube
Рет қаралды 53 М.
Dimensional Modeling - Declaring Dimensions
55:32
Pragmatic Works
Рет қаралды 24 М.
Master Dimensional Modeling Lesson 02 - The 4 Step Process
14:25
Bryan Cafferky
Рет қаралды 2,8 М.
Generative AI in a Nutshell - how to survive and thrive in the age of AI
17:57