Data Modeling Where Theory Meets Reality - How Different Companies I Worked At Modeled Their Data

  Рет қаралды 19,042

Seattle Data Guy

Seattle Data Guy

Күн бұрын

Пікірлер: 44
@SeattleDataGuy
@SeattleDataGuy 9 ай бұрын
If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/ or join the discord here discord.gg/2yRJq7Eg3k
@karszn
@karszn 9 ай бұрын
Thank you for this . Using this for my analytics engineer interview and data modeling .
@SeattleDataGuy
@SeattleDataGuy 9 ай бұрын
good luck!
@thabompala6524
@thabompala6524 3 ай бұрын
How did it go? I have one coming up
@Lunarisage
@Lunarisage 8 ай бұрын
Speaking from the analyst side of things, I was totally happy to create derived views/tables to be able to get historical data accurate if the data engineers could provide a change-record table with a certain minimal set of fields - the value (City in the earlier example), the relevant record (Customer in this case), and the date that value started applying. So if you don't have time to create something cleaner, providing at least those will give analysts enough to figure the rest out. Thanks for talking about this!
@ivanooo117
@ivanooo117 8 ай бұрын
Would be great to hear you talk more about data architecture design! Also would be nice to hear your experience around the ‘build or buy’ dilemma within data teams.
@tech-n-data
@tech-n-data 8 ай бұрын
I was just trying to figure this out for a DE portfolio project regarding an employees table, thank you!!
@sergioramos3437
@sergioramos3437 5 ай бұрын
It's so great to see this explained. I worked with an engineering team who did daily partitions but only for a week or two AND month end snapshots ... I had no idea what those were until I had to learn to query their data for reports lol But we didn't have a macro, we used where dt = current date - 2 (also on hive) cause we knew our latest refresh had a 2 day lag always And we may use subqueries to create each "latest" table and join those together Also, curious to see your thoughts on audit tables? I did application support and they had a robust history of every change made to every field and who changed it
@glstnlev
@glstnlev 8 ай бұрын
Interesting use case about SCD2 but how in practice do we create these tables? I understand the importance and how useful is it to have a new row for each change but can’t get how to model it to make it work
@sergioramos3437
@sergioramos3437 5 ай бұрын
Is there a playlist somewhere? Trying to figure out where I'm at in this series lol
@zoltantakats1146
@zoltantakats1146 8 ай бұрын
Thank you! Great and very helpful video as always. I have a question regarding the SCDT2 explanation at 7:30. Don't you have to set the end date of the first record of John to 2023-12-31? Because in your example, it seems it is overlapping if you query for the active records for 2024-01-01. You are going to have John's records twice in your result set on that day.
@raphaeldayan
@raphaeldayan 7 ай бұрын
Thanks for the video!
@SeattleDataGuy
@SeattleDataGuy 7 ай бұрын
you're welcome!
@maxonthetrack
@maxonthetrack 9 ай бұрын
As always awesome content!
@SeattleDataGuy
@SeattleDataGuy 8 ай бұрын
Glad you think so!
@lukemurphy3362
@lukemurphy3362 7 ай бұрын
What about using a surrogate key?
@piripitflaustik8499
@piripitflaustik8499 8 ай бұрын
wait.. if you capture the datacreated you do have the date when it change city, you dont really need more information, just need to query all datacreated where name=john and you will have a diagram of john
@SeattleDataGuy
@SeattleDataGuy 8 ай бұрын
Fair question, when I wrote date created my assumption is that this is the date created from the transactional database. Meaning it only represents when the user or customer was created. So even if the information is updated, it would only update the city information. That's generally how SCD type 1 would operate. Now, if instead, the date created field represented when new information was appended, so now you have two rows for john, one where the data represents when the customer lived in seattle and the other NYC you'd likely still want to know does "date created" mean effective date or does it mean when the row was created(which could be two different things). If it did happen to always mean effective date, you could in figure out the start and end date, likely by some form of partition by clause but that puts a lot of weight on analysts performing the logic correctly and it'd likely cause the logic to be in place in multiple places which wouldn't be a good long term design. In turn, having a start effective date and end effective date field would be ideal.
@piripitflaustik8499
@piripitflaustik8499 8 ай бұрын
@@SeattleDataGuy I understand, it should also be more efficient and less costly to update data with an extra field containing the information. Thank you for responding and for the DB lesson!
@SeattleDataGuy
@SeattleDataGuy 8 ай бұрын
@@piripitflaustik8499 Thanks for the question!
@otavioattuy5394
@otavioattuy5394 8 ай бұрын
Where do I find the theory behind the "types" of dimension tables?
@joa0liveira92
@joa0liveira92 7 ай бұрын
Kimball's book
@alexanderpotts8425
@alexanderpotts8425 9 ай бұрын
very cool. when I started out scds were king. every team I've worked in for the last 3 years has gone the "partition every day route". I'm still not sure I like it.
@AyushMandloi
@AyushMandloi 8 ай бұрын
Sound of transition is very loud
@SeattleDataGuy
@SeattleDataGuy 7 ай бұрын
I am reducing these moving forward
@michaeljacobs6460
@michaeljacobs6460 9 ай бұрын
What diagram tool are you using?
@chikusk7197
@chikusk7197 9 ай бұрын
It looks like Lucidchart.
@SeattleDataGuy
@SeattleDataGuy 9 ай бұрын
It is lucid chart!
@michaeljacobs6460
@michaeljacobs6460 9 ай бұрын
Thank you!
@SeattleDataGuy
@SeattleDataGuy 9 ай бұрын
thank you for all your support!
@fzrbigman
@fzrbigman 9 ай бұрын
great video!
@SeattleDataGuy
@SeattleDataGuy 9 ай бұрын
Glad you enjoyed it!
@reddixiecrat
@reddixiecrat 9 ай бұрын
Did you ever get access to Palantir Foundry to see how they model data?
@sitrakaforler8696
@sitrakaforler8696 8 ай бұрын
great video haha
@SeattleDataGuy
@SeattleDataGuy 7 ай бұрын
Glad you enjoyed it
@johnsteer596
@johnsteer596 6 ай бұрын
My name is John and I live in NYC
@SeattleDataGuy
@SeattleDataGuy 6 ай бұрын
when do you plan on moving so i can track that in my DW
@jk3587
@jk3587 9 ай бұрын
ds=''
@SeattleDataGuy
@SeattleDataGuy 9 ай бұрын
JK to the rescue! fun fact we were on the same team at FB
@mihirit7137
@mihirit7137 8 ай бұрын
so this macro just displays the current date like the CURRENT_DATE function in postgresql ? I am unsure about what macros are 😅😅
@gourabsarker9552
@gourabsarker9552 8 ай бұрын
Sir is your household income 150k dollars a year? Plz reply. Thanks a lot.
@alecryan8220
@alecryan8220 6 ай бұрын
Is this made with AI
Is it TOO LATE to Become a Data Analyst in 2025?
6:25
Lore So What
Рет қаралды 11 М.
Tuna 🍣 ​⁠@patrickzeinali ​⁠@ChefRush
00:48
albert_cancook
Рет қаралды 148 МЛН
IL'HAN - Qalqam | Official Music Video
03:17
Ilhan Ihsanov
Рет қаралды 700 М.
Which AI is Best for the Average User?
37:22
Bobology
Рет қаралды 9
Data Modeling in the Modern Data Stack
10:14
Kahan Data Solutions
Рет қаралды 116 М.
A Decade In Data Engineering - Has Anything Actually Changed?
14:28
Seattle Data Guy
Рет қаралды 10 М.
Data Modeling Tutorial: Star Schema (aka Kimball Approach)
16:34
Kahan Data Solutions
Рет қаралды 139 М.
What Tools Should Data Engineers Know - 100 Days Of Data Engineering
17:31
Tuna 🍣 ​⁠@patrickzeinali ​⁠@ChefRush
00:48
albert_cancook
Рет қаралды 148 МЛН