How to Create a Data Modeling Pipeline (3 Layer Approach)

Views: 6,958

Kahan Data Solutions

Comments: 30
@KahanDataSolutions 8 months ago
Want to build a reliable, modern data architecture without the mess? Here’s a free checklist to help you → bit.ly/kds-checklist
@AdamSmith-lg2vn 8 months ago
Really, really clearly explained. I like the idea of separating out a staging view for trivial renaming/cleaning from the business logic in the warehouse layer loads. I push for a very similar architecture, but I'm going to integrate that detail going forward.
@KahanDataSolutions 8 months ago
Glad it was helpful!
@minhtungo453 8 months ago
Straight to the point, with clear examples. It helps me a lot. Thanks, Kahan.
@KahanDataSolutions 8 months ago
Thanks for watching!
@runningwithstelvio 8 months ago
This is a three-layer approach that I like. The weakness, I believe, is the integration and data loading performance of a star schema, which is more complex than a more normalized model such as Data Vault. With the flat, wide DM tables you already ensure performance when extracting data from the DWH. So why not use a normalized model instead of a star schema in the DWH, to ensure loading performance and agile integration, and then use a star schema or flat tables in the DMs to ensure extraction performance? Thanks for your video. I like the format and the way you present these data modeling topics. And luckily we still have someone who strongly believes in data modeling!
@KahanDataSolutions 8 months ago
Thanks for the comment! Star Schema/Dimensional Modeling is just one of many approaches you could take in that middle "warehouse" layer. It's personally what I'm most comfortable with but you could certainly use other techniques.
@sunilbabu588 5 months ago
9:02 Does the staging area serve only as a medium for 'mental clarity'? After all, the raw data can be transformed directly into the desired data model.
@KahanDataSolutions 5 months ago
I cover the different benefits of the Staging layer in depth in this video - kzbin.info/www/bejne/sJC3o6ehf7CLpaM
@andreseduardoquinones4790 5 months ago
Thanks for your video! Nicely explained. One question: is this approach the same as a medallion architecture? If not, what are the differences?
@ahmedsomir 8 months ago
Simple and great, thanks for this. From my point of view, though, the staging layer is the raw data tables (incremental, one-to-one from the sources), and enforcing reads through a VIEW works perfectly, with staging data retained for 2-3 days. What do you think?
@vishal_uk 8 months ago
Thanks a lot! I watch all your vids and I've subscribed. Could you share the layers for DEV and CI, please?
@KahanDataSolutions 8 months ago
Check out this video - kzbin.info/www/bejne/l2iraoWhr5eep9Esi=e-qIKGRHuo34dvwu
@vishal_uk 8 months ago
@KahanDataSolutions Thanks a bunch!
@StuartWeir 8 months ago
Where do you feel Entity Resolution fits into this? For example, I have multiple data sources pertaining to the same data type; does ER fit between staging and the DW in this 3 Layered Approach?
@KahanDataSolutions 8 months ago
Sounds like that would be handled in the Warehouse layer. Each table in the different sources would have its own Staging view, then combined in the Warehouse to create the single entity.
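A minimal sketch of that reply, assuming two made-up sources and matching on a cleaned email address. The table names, column names, and the email-based match rule are all hypothetical, and SQLite (via Python's `sqlite3`) stands in for a real warehouse. Each source gets its own staging view, and the warehouse table combines them into one entity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical raw sources describing the same entity (customers).
    CREATE TABLE raw_crm_customers (cust_id INTEGER, EMAIL_ADDR TEXT, full_name TEXT);
    CREATE TABLE raw_shop_users (user_id INTEGER, email TEXT, name TEXT);

    INSERT INTO raw_crm_customers VALUES
        (1, 'Ann@Example.com ', 'Ann Lee'),
        (2, 'bob@example.com', 'Bob Roy');
    INSERT INTO raw_shop_users VALUES
        (10, 'ann@example.com', 'Ann Lee'),
        (11, 'cy@example.com', 'Cy Oz');

    -- Staging: one view per source table, renaming and light cleaning only.
    CREATE VIEW stg_crm_customers AS
        SELECT cust_id AS source_id,
               lower(trim(EMAIL_ADDR)) AS email,
               full_name AS customer_name
        FROM raw_crm_customers;
    CREATE VIEW stg_shop_users AS
        SELECT user_id AS source_id,
               lower(trim(email)) AS email,
               name AS customer_name
        FROM raw_shop_users;

    -- Warehouse: combine the staging views into a single customer entity.
    -- Matching on the cleaned email is an assumption made for this sketch.
    CREATE TABLE dim_customer AS
        SELECT email,
               min(customer_name) AS customer_name,
               count(*) AS source_count
        FROM (
            SELECT email, customer_name FROM stg_crm_customers
            UNION ALL
            SELECT email, customer_name FROM stg_shop_users
        )
        GROUP BY email;
""")

dim_customer = conn.execute(
    "SELECT email, customer_name, source_count FROM dim_customer ORDER BY email"
).fetchall()
for row in dim_customer:
    print(row)
```

Real entity resolution usually needs fuzzier matching than an exact key, but the layering stays the same: cleaning happens in staging, matching and merging happen in the warehouse.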
@retenim28 2 months ago
Hi, just a question: is what you described the same as the so-called medallion architecture, or is it something different?
@alxsbn 8 months ago
Hi @KahanDataSolutions. How do you reconcile this organization with dbt's folder structure recommendations, especially for the intermediate and marts layers/folders? Do you advise creating a new folder named warehouse? Do you advise against giving end users access to facts and dimensions? You mention OBT; is this the way to go?
@KahanDataSolutions 8 months ago
I tend to break out warehouse as a separate directory. In dbt it'll look something like this:

models/
  marts
  staging
  warehouse

For intermediate, these are more like "helper" models. So you could organize them in a separate directory right under models/ or nest it under the particular layer it's supporting. For example:

models/
  marts
  staging
  warehouse
    intermediate

As mentioned in the video, I prefer to keep end-user access limited to Marts but some organizations allow direct fact/dimension access. It really depends on the user base and their understanding of and familiarity with those concepts.

Overall, there is no single "right" approach; it will be a mixture based on your particular company. But this 3 layer design has worked well for me and others as a great starting point.

For more on the different "warehouse" models, check out this video - kzbin.info/www/bejne/f5WmnoChhrGpfaMsi=ZyfIVFSII7qznVBI

Hope this helps!
@sanjidnet 8 months ago
What's the benefit of setting up staging as a view instead of a regular table?
@KahanDataSolutions 8 months ago
To avoid duplicate storage since typically there are no other joins involved. Think of it like a glorified select statement. However, if performance becomes an issue you may look to deploy as a table.
@andynelson2340 8 months ago
Views will always grab the latest data upstream, they'll always be fresh relative to the source data.
@apamwamba 3 months ago
I think virtual staging is good when data volumes are small and nothing heavy is required.
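The view-versus-table trade-off discussed in this thread can be illustrated with a small sketch. It uses SQLite through Python's `sqlite3` purely as a stand-in for a warehouse, and the `raw_orders` and `stg_orders_*` names are invented for the example. The view stores only a SELECT, so it reflects upstream changes immediately, while the table is a physical copy that stays as it was when built:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw source table.
    CREATE TABLE raw_orders (ORDER_ID INTEGER, AMT REAL);
    INSERT INTO raw_orders VALUES (1, 10.0), (2, 25.5);

    -- Staging as a view: a stored SELECT, no copy of the data.
    CREATE VIEW stg_orders_view AS
        SELECT ORDER_ID AS order_id, AMT AS amount FROM raw_orders;

    -- Staging as a table: a physical copy taken at build time.
    CREATE TABLE stg_orders_table AS
        SELECT ORDER_ID AS order_id, AMT AS amount FROM raw_orders;
""")

# New data arrives upstream after both staging objects were built.
conn.execute("INSERT INTO raw_orders VALUES (3, 7.25)")

view_count = conn.execute("SELECT count(*) FROM stg_orders_view").fetchone()[0]
table_count = conn.execute("SELECT count(*) FROM stg_orders_table").fetchone()[0]
print("rows via view:", view_count)
print("rows via table:", table_count)
```

The view sees the new row without a rebuild; the table does not until it is reloaded. The cost is that the view's SELECT runs on every read, which is where the performance concern at larger volumes comes from.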
@dmeeuwsen4105 8 months ago
So where in this example would you implement dbt snapshotting? At the client I'm working for at the moment, the raw source data comes into the landing schema. Then the history is built in the staging schema (dbt snapshot, with a select * from landing.table). The snapshotted tables are then used in the models, where (among other things, of course) we rename columns and clean some data.
@KahanDataSolutions 8 months ago
I'd probably look to do something like this: source > snapshot > staging > warehouse > mart
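To make the first steps of that ordering concrete, here is a rough sketch of snapshot-then-stage in plain Python. It is not dbt's implementation: `SnapshotRow`, `apply_snapshot`, and `stage` are hypothetical names, the `NAME` column is invented, and deleted source rows are ignored. The point is only that history is captured on the untouched source columns first, and renaming/cleaning happens afterwards:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnapshotRow:
    key: int
    attrs: dict
    valid_from: str
    valid_to: Optional[str] = None  # None marks the current version


def apply_snapshot(history, source_rows, run_ts):
    """Compare current source rows to the open history rows and record changes."""
    current = {row.key: row for row in history if row.valid_to is None}
    for key, attrs in source_rows.items():
        open_row = current.get(key)
        if open_row is None:
            # First time this key is seen: open a new version.
            history.append(SnapshotRow(key, dict(attrs), run_ts))
        elif open_row.attrs != attrs:
            # Attributes changed: close the old version, open a new one.
            open_row.valid_to = run_ts
            history.append(SnapshotRow(key, dict(attrs), run_ts))
    return history


def stage(history):
    """Staging step: rename columns and clean values, keeping the validity window."""
    return [
        {
            "customer_id": row.key,
            "customer_name": row.attrs["NAME"].strip().title(),
            "valid_from": row.valid_from,
            "valid_to": row.valid_to,
        }
        for row in history
    ]


history = []
apply_snapshot(history, {1: {"NAME": " ann lee "}}, "2024-01-01")
apply_snapshot(history, {1: {"NAME": "ann roy"}}, "2024-02-01")
staged = stage(history)
for row in staged:
    print(row)
```

Warehouse and mart models would then read from the staged rows, filtering on the validity window when they need a point-in-time view.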
@reiko551 8 months ago
This is an idempotent pipeline. Can you do an incremental one via dbt?
@chiragpramod308 8 months ago
Thanks a lot for the practical point of view! Do you still suggest reading Kimball, or are there better, more modern and practical books on modeling? Love your content, man.
@KahanDataSolutions 8 months ago
I still like to follow a star schema for the core "warehouse" model for clarity & organizational purposes. Then create wider user-facing "marts" on top of those to leverage the capabilities of more modern cloud DBs.
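As a small illustration of that pattern, the sketch below builds a tiny star schema and then a wide, pre-joined mart on top of it. It runs on SQLite through Python's `sqlite3` as a stand-in for a cloud warehouse, and every table and column name (`dim_customer`, `dim_product`, `fct_sales`, `mart_sales_wide`) is made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Warehouse layer: a small star schema (hypothetical tables).
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE fct_sales (customer_key INTEGER, product_key INTEGER, amount REAL);

    INSERT INTO dim_customer VALUES (1, 'Ann Lee', 'EU'), (2, 'Bob Roy', 'US');
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO fct_sales VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 20.0);

    -- Mart layer: one wide, pre-joined table for end users.
    CREATE TABLE mart_sales_wide AS
        SELECT c.customer_name, c.region, p.product_name, f.amount
        FROM fct_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        JOIN dim_product AS p ON p.product_key = f.product_key;
""")

# End users query the mart without writing any joins.
sales_by_region = conn.execute(
    "SELECT region, sum(amount) FROM mart_sales_wide GROUP BY region ORDER BY region"
).fetchall()
print(sales_by_region)
```

The facts and dimensions keep the model organized and reusable, while the mart trades extra storage for simpler queries on the consumer side.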
@chiragpramod308 8 months ago
@KahanDataSolutions Thanks for your insight. I've heard that Lakehouse modeling is hyped but is only used at larger companies. So is it important to get thoroughly familiar with the basics before heading toward such complex topics?
@KahanDataSolutions 8 months ago
@chiragpramod308 Learning the basics before moving to complex topics is always a good strategy.