@AdamSmith-lg2vn 8 months ago
Really, really clearly explained. I like the idea of separating out a staging view for trivial renaming/cleaning vs. business logic in the warehouse layer loads. I push for a very similar architecture, but I'm going to integrate that detail going forward.
@KahanDataSolutions 8 months ago
Glad it was helpful!
@minhtungo453 8 months ago
Straight to the point, with clear examples. It helps me a lot. Thanks, Kahan.
@KahanDataSolutions 8 months ago
Thanks for watching!
@runningwithstelvio 8 months ago
This is a three-layer approach that I like. The weakness, I believe, is the integration and data loading performance of a star schema, which is more complex than a more normalized model such as Data Vault. With the flat, wide DM tables you ensure performance when extracting data from the DWH in any case, so why not use a normalized model instead of a star schema to ensure loading performance and agile integration of the model, and then use a star schema or flat tables in the DMs to ensure extraction performance? Thanks for your video; I like the format and the way you present these data modelling topics. And luckily we still have someone who strongly believes in data modelling!
@KahanDataSolutions 8 months ago
Thanks for the comment! Star Schema/Dimensional Modeling is just one of many approaches you could take in that middle "warehouse" layer. It's personally what I'm most comfortable with but you could certainly use other techniques.
@sunilbabu588 5 months ago
9:02 Does the staging area serve only as a medium for 'mental clarity'? After all, the raw data can be transformed directly into the desired data model.
@KahanDataSolutions 5 months ago
I cover the different benefits of the Staging layer in depth in this video - kzbin.info/www/bejne/sJC3o6ehf7CLpaM
@andreseduardoquinones4790 5 months ago
Thanks for your video! Nicely explained. A question: would this approach be the same as a medallion architecture? If not, what are the differences?
@ahmedsomir 8 months ago
Simple and great, thanks for this. From my point of view, though, the staging layer is the raw data tables (incremental, one-to-one from the sources), and the enforced reading mechanism with a VIEW is perfect, with a retention period of 2-3 days for staging. What do you think?
@vishal_uk 8 months ago
Thanks a lot! I watch all your vids and I've subscribed. Could you share the layers for DEV and CI, please?
@KahanDataSolutions 8 months ago
Check out this video - kzbin.info/www/bejne/l2iraoWhr5eep9Esi=e-qIKGRHuo34dvwu
@vishal_uk 8 months ago
@KahanDataSolutions Thanks a bunch!
@StuartWeir 8 months ago
Where do you feel Entity Resolution fits into this? For example, I have multiple data sources pertaining to the same data type; does ER fit between staging and the DW in this 3 Layered Approach?
@KahanDataSolutions 8 months ago
Sounds like that would be handled in the Warehouse layer. Each table in the different sources would have its own Staging view, then combined in the Warehouse to create the single entity.
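To make that reply concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than dbt. All table names, column names, and sample rows are hypothetical, and matching on a lowercased, trimmed email with CRM taking priority is only an assumed resolution rule, not something stated in the video. Each source gets its own staging view (renames and cleaning only), and the two are combined into a single entity in the warehouse table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical raw sources describing the same kind of entity.
    CREATE TABLE crm_contacts (id INTEGER, email TEXT, full_name TEXT);
    CREATE TABLE billing_accounts (acct_no INTEGER, EMAIL_ADDR TEXT, acct_name TEXT);
    INSERT INTO crm_contacts VALUES (1, 'Ada@Example.com', 'Ada Lovelace');
    INSERT INTO billing_accounts VALUES (900, 'ada@example.com ', 'A. Lovelace');
    INSERT INTO billing_accounts VALUES (901, 'grace@example.com', 'Grace Hopper');

    -- Staging: one view per source table, renaming and cleaning only.
    CREATE VIEW stg_crm_contacts AS
    SELECT LOWER(TRIM(email)) AS email, full_name AS customer_name
    FROM crm_contacts;

    CREATE VIEW stg_billing_accounts AS
    SELECT LOWER(TRIM(EMAIL_ADDR)) AS email, acct_name AS customer_name
    FROM billing_accounts;

    -- Warehouse: combine the staged sources into one entity.
    -- Assumed rule: CRM wins; billing only adds emails CRM does not have.
    CREATE TABLE dim_customer AS
    SELECT email, customer_name FROM stg_crm_contacts
    UNION ALL
    SELECT b.email, b.customer_name
    FROM stg_billing_accounts AS b
    WHERE b.email NOT IN (SELECT email FROM stg_crm_contacts);
""")

customers = conn.execute(
    "SELECT email, customer_name FROM dim_customer ORDER BY email"
).fetchall()
print(customers)
```

Real entity resolution usually needs fuzzier matching than an exact email comparison; the point here is only where the logic lives, not how sophisticated it is.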
@retenim28 2 months ago
Hi, just a question: is what you described the same as the so-called medallion architecture, or is it something different?
@alxsbn 8 months ago
Hi @KahanDataSolutions. How do you reconcile this organization with the dbt folder structure recommendations, especially for the intermediate and marts layers/folders? Do you advise creating a new folder named warehouse? Do you advise against giving end users access to facts and dimensions? You mention OBT; is this the way to go?
@KahanDataSolutions 8 months ago
I tend to break out warehouse as a separate directory. In dbt it'll look something like this:

models/
  marts
  staging
  warehouse

For intermediate, these are more like "helper" models. So you could organize them in a separate directory right under models/ or nest it under the particular layer it's supporting. For example:

models/
  marts
  staging
  warehouse
    intermediate

As mentioned in the video, I prefer to keep end-user access limited to Marts but some organizations allow direct fact/dimension access. It really depends on the user base, their understanding & familiarity with those concepts.

Overall, there is no single "right" approach but will be a mixture based on your particular company. But this 3 layer design has worked well for me and others as a great starting point.

For more on the different "warehouse" models, check out this video - kzbin.info/www/bejne/f5WmnoChhrGpfaMsi=ZyfIVFSII7qznVBI

Hope this helps!
@sanjidnet 8 months ago
What’s the benefit of setting up staging as a view instead of a regular table?
@KahanDataSolutions 8 months ago
To avoid duplicate storage since typically there are no other joins involved. Think of it like a glorified select statement. However, if performance becomes an issue you may look to deploy as a table.
@andynelson2340 8 months ago
Views will always grab the latest data upstream, so they'll always be fresh relative to the source data.
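The two replies above can be illustrated with a small sketch in Python's built-in sqlite3 module (an in-memory stand-in, not dbt). The table raw_customers, the view stg_customers, and the sample rows are all hypothetical. The staging view only renames and trims, stores no copy of the data, and picks up a new upstream row without being rebuilt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw table with awkward column names and untrimmed text.
    CREATE TABLE raw_customers (CUST_ID INTEGER, CUST_NM TEXT);
    INSERT INTO raw_customers VALUES (1, '  Ada ');

    -- Staging as a view: a "glorified select statement" with no joins
    -- and no second copy of the data.
    CREATE VIEW stg_customers AS
    SELECT CUST_ID AS customer_id, TRIM(CUST_NM) AS customer_name
    FROM raw_customers;
""")


def staged_rows(connection):
    """Read the staging view in a stable order."""
    return connection.execute(
        "SELECT customer_id, customer_name FROM stg_customers ORDER BY customer_id"
    ).fetchall()


before = staged_rows(conn)

# A new row lands in the raw table; the view is not rebuilt or refreshed.
conn.execute("INSERT INTO raw_customers VALUES (2, 'Grace')")
after = staged_rows(conn)

print(before)
print(after)
```

The trade-off mentioned in the thread still applies: the view's query runs on every read, so materializing it as a table may be worth it if that becomes slow.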
@apamwamba 3 months ago
I think virtual staging is good for small data volumes where no heavy volumes are needed.
@dmeeuwsen4105 8 months ago
So where in this example would you implement dbt snapshotting? At the client I'm working for at the moment, the raw source data comes into the landing schema. Then, in the staging schema, the history is built (dbt snapshot, with a select * from landing.table). The snapshotted tables are then used in the models, where (among other things, of course) we rename columns and clean some data.
@KahanDataSolutions 8 months ago
I'd probably look to do something like this: source > snapshot > staging > warehouse > mart
@reiko551 8 months ago
This is an idempotent pipeline. Can you do an incremental one via dbt?
@chiragpramod308 8 months ago
Thanks a lot for the practical POV! Do you still suggest reading Kimball, or are there better, more modern and practical books on modelling? Love your content, man.
@KahanDataSolutions 8 months ago
I still like to follow a star schema for the core "warehouse" model for clarity & organizational purposes. Then create wider user-facing "marts" on top of those to leverage the capabilities of more modern cloud DBs.
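As a rough sketch of that pattern, again with Python's built-in sqlite3 module standing in for a cloud warehouse: a small star schema in the warehouse layer, and a wider, pre-joined mart built on top of it. The fct_/dim_/mart_ names, columns, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Warehouse layer: a minimal star schema (one fact, two dimensions).
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY, customer_name TEXT, region TEXT
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT
    );
    CREATE TABLE fct_orders (
        order_id INTEGER, customer_key INTEGER, product_key INTEGER, amount REAL
    );

    INSERT INTO dim_customer VALUES (1, 'Ada', 'EMEA');
    INSERT INTO dim_product VALUES (10, 'Widget');
    INSERT INTO fct_orders VALUES (100, 1, 10, 25.0);

    -- Mart layer: a wide, user-facing table with the joins already done.
    CREATE TABLE mart_orders_wide AS
    SELECT f.order_id, c.customer_name, c.region, p.product_name, f.amount
    FROM fct_orders AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_product AS p ON p.product_key = f.product_key;
""")

cursor = conn.execute("SELECT * FROM mart_orders_wide")
mart_columns = [column[0] for column in cursor.description]
mart_rows = cursor.fetchall()

print(mart_columns)
print(mart_rows)
```

End users query mart_orders_wide directly and never need to know the join keys, which is the access pattern described earlier in the thread.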
@chiragpramod308 8 months ago
@KahanDataSolutions Thanks for your insight. I heard that Lakehouse modelling is hyped, but that it only happens at larger companies. So is it important to get thoroughly in touch with the basics before heading towards such complex topics?
@KahanDataSolutions 8 months ago
@chiragpramod308 Learning the basics before moving to complex topics is always a good strategy.