If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/ or join the discord here discord.gg/2yRJq7Eg3k
@karszn9 ай бұрын
Thank you for this . Using this for my analytics engineer interview and data modeling .
@SeattleDataGuy9 ай бұрын
good luck!
@thabompala65243 ай бұрын
How did it go? I have one coming up
@Lunarisage8 ай бұрын
Speaking from the analyst side of things, I was totally happy to create derived views/tables to be able to get historical data accurate if the data engineers could provide a change-record table with a certain minimal set of fields - the value (City in the earlier example), the relevant record (Customer in this case), and the date that value started applying. So if you don't have time to create something cleaner, providing at least those will give analysts enough to figure the rest out. Thanks for talking about this!
@ivanooo1178 ай бұрын
Would be great to hear you talk more about data architecture design! Also would be nice to hear your experience around the ‘build or buy’ dilemma within data teams.
@tech-n-data8 ай бұрын
I was just trying to figure this out for a DE portfolio project regarding an employees table, thank you!!
@sergioramos34375 ай бұрын
It's so great to see this explained. I worked with an engineering team who did daily partitions but only for a week or two AND month end snapshots ... I had no idea what those were until I had to learn to query their data for reports lol But we didn't have a macro, we used where dt = current date - 2 (also on hive) cause we knew our latest refresh had a 2 day lag always And we may use subqueries to create each "latest" table and join those together Also, curious to see your thoughts on audit tables? I did application support and they had a robust history of every change made to every field and who changed it
@glstnlev8 ай бұрын
Interesting use case about SCD2 but how in practice do we create these tables? I understand the importance and how useful is it to have a new row for each change but can’t get how to model it to make it work
@sergioramos34375 ай бұрын
Is there a playlist somewhere? Trying to figure out where I'm at in this series lol
@zoltantakats11468 ай бұрын
Thank you! Great and very helpful video as always. I have a question regarding the SCDT2 explanation at 7:30. Don't you have to set the end date of the first record of John to 2023-12-31? Because in your example, it seems it is overlapping if you query for the active records for 2024-01-01. You are going to have John's records twice in your result set on that day.
@raphaeldayan7 ай бұрын
Thanks for the video!
@SeattleDataGuy7 ай бұрын
you're welcome!
@maxonthetrack9 ай бұрын
As always awesome content!
@SeattleDataGuy8 ай бұрын
Glad you think so!
@lukemurphy33627 ай бұрын
What about using a surrogate key?
@piripitflaustik84998 ай бұрын
wait.. if you capture the datacreated you do have the date when it change city, you dont really need more information, just need to query all datacreated where name=john and you will have a diagram of john
@SeattleDataGuy8 ай бұрын
Fair question, when I wrote date created my assumption is that this is the date created from the transactional database. Meaning it only represents when the user or customer was created. So even if the information is updated, it would only update the city information. That's generally how SCD type 1 would operate. Now, if instead, the date created field represented when new information was appended, so now you have two rows for john, one where the data represents when the customer lived in seattle and the other NYC you'd likely still want to know does "date created" mean effective date or does it mean when the row was created(which could be two different things). If it did happen to always mean effective date, you could in figure out the start and end date, likely by some form of partition by clause but that puts a lot of weight on analysts performing the logic correctly and it'd likely cause the logic to be in place in multiple places which wouldn't be a good long term design. In turn, having a start effective date and end effective date field would be ideal.
@piripitflaustik84998 ай бұрын
@@SeattleDataGuy I understand, it should also be more efficient and less costly to update data with an extra field containing the information. Thank you for responding and for the DB lesson!
@SeattleDataGuy8 ай бұрын
@@piripitflaustik8499 Thanks for the question!
@otavioattuy53948 ай бұрын
Where do I find the theory behind the "types" of dimension tables?
@joa0liveira927 ай бұрын
Kimball's book
@alexanderpotts84259 ай бұрын
very cool. when I started out scds were king. every team I've worked in for the last 3 years has gone the "partition every day route". I'm still not sure I like it.
@AyushMandloi8 ай бұрын
Sound of transition is very loud
@SeattleDataGuy7 ай бұрын
I am reducing these moving forward
@michaeljacobs64609 ай бұрын
What diagram tool are you using?
@chikusk71979 ай бұрын
It looks like Lucidchart.
@SeattleDataGuy9 ай бұрын
It is lucid chart!
@michaeljacobs64609 ай бұрын
Thank you!
@SeattleDataGuy9 ай бұрын
thank you for all your support!
@fzrbigman9 ай бұрын
great video!
@SeattleDataGuy9 ай бұрын
Glad you enjoyed it!
@reddixiecrat9 ай бұрын
Did you ever get access to Palantir Foundry to see how they model data?
@sitrakaforler86968 ай бұрын
great video haha
@SeattleDataGuy7 ай бұрын
Glad you enjoyed it
@johnsteer5966 ай бұрын
My name is John and I live in NYC
@SeattleDataGuy6 ай бұрын
when do you plan on moving so i can track that in my DW
@jk35879 ай бұрын
ds=''
@SeattleDataGuy9 ай бұрын
JK to the rescue! fun fact we were on the same team at FB
@mihirit71378 ай бұрын
so this macro just displays the current date like the CURRENT_DATE function in postgresql ? I am unsure about what macros are 😅😅
@gourabsarker95528 ай бұрын
Sir is your household income 150k dollars a year? Plz reply. Thanks a lot.