Data Architecture 101: The Modern Data Warehouse

31,317 views

Kahan Data Solutions

1 day ago

Comments: 43
@KahanDataSolutions 1 year ago
Want to build a reliable, modern data architecture without the mess? Here’s a free checklist to help you → bit.ly/kds-checklist
@shakedm7256 1 year ago
Just discovered your channel recently and I wanted to say it is a gold mine! Keep making this kind of content!
@KahanDataSolutions 1 year ago
Appreciate it! Glad to have you here
@aliahmad1987 11 months ago
Great video! Your pace, presentation and visuals are really on point. Keep up the good work :)
@colter7 11 months ago
Best data modeling videos I've come across so far, great job!
@Austin-dm5bp 1 year ago
Really appreciated seeing the different examples, as it helped to underline how the stages remain the same, regardless of the specific tools being used.
@jayakrishna8121 1 year ago
Awesome, this is really useful. Keep making these sample architecture videos.
@shashankemani1609 1 year ago
It's really a great video for someone to understand the high-level architecture of the modern data stack. It would be great if you could start an in-depth data modelling playlist, as it plays a crucial role in designing data engineering pipelines. Thank you.
@johnflanagan6367 6 months ago
I just discovered your videos. They are excellent. Clear, concise and to the point. Great content! Thanks so much!
@KahanDataSolutions 6 months ago
Glad you like them!
@rks.siddhartha 1 year ago
These design and architecture videos are great for learning the concepts in bite-sized pieces. Looking forward to more such videos.
@KahanDataSolutions 1 year ago
Glad you liked it!
@jgianan 1 year ago
Awesome explanation and visuals! Keep it up!
@vurtgoan 3 months ago
Great stuff mate!
@navoabey6047 1 year ago
Simple and to-the-point explanation. I think it's very important to understand the concepts as well, not just the tools. Very useful for interviews too.
@KahanDataSolutions 1 year ago
Glad it was helpful!
@AlexKashie 1 year ago
Wow, great content broken down simply.... Thank you.
@thomasbrothaler4963 1 year ago
How did I not find your channel much earlier? Your videos are extremely concise, well visualized and informative. I am a Data Scientist transitioning to Data Engineering (because in gaming I am also always the healer/support 😉)
@KahanDataSolutions 1 year ago
Love that - welcome to the channel!
@pbxmy4521 3 months ago
For products like Databricks that attempt to offer a full DE and Analytics package, can these concepts be applied similarly? Using Azure Data Lake, Databricks SQL for transforming, and Delta tables for analytics?
@muftkuseng5924 11 months ago
Is it meant to add only the new data to the data lake, or a full copy? As an example, we are using Odoo as our ERP system. The SQL database has an unpacked size of 6 GB. If I copied it daily, the amount of data would get huge. On the one hand, all the data would be persisted that way and I would have more options for analysis, but is this really best practice?
@patmclaughlin107 3 months ago
Do the “Data Models” (I assume these are synonymous with data marts) physically contain data? Or, are these like database views?
@SheranneTan-n1p 6 months ago
Great content! I had a question: why would companies choose to use standalone ELT / ETL providers (e.g. Stitch, Matillion) over the native AWS Glue / Azure Data Factory? Wouldn’t it be easier to use the cloud provider's own tools, as they would be more integrated?
@sylwiasiniakowska6939 11 months ago
Hello! Where do ETL tools like Informatica or Alteryx land in a modern data architecture? Or do they not fit at all, because we have dbt / Azure Data Factory / SQL scripts?
@AdamWeisberg-y1c 1 year ago
I think it would be really cool to see how, once the data is landed in the data lake, you bring all the data together, since you won't necessarily have matching IDs from different sources to work with.
@tomastruchly9484 1 year ago
Heh, the concept you presented (collect data from various sources into a Snowflake DWH and transform it via dbt) is exactly what we do for a customer :) I used to work on-premise, where we handled everything via scripts and Jenkins, and I must say this modern approach is better in many respects :)
@KahanDataSolutions 1 year ago
Snowflake + dbt is my favorite stack as well. Doesn't have to be overly complex to be effective.
@StartDataLate 11 months ago
Can I ask where you would put HDFS in the current data architecture stack?
@sailedship6530 1 year ago
What if I want to add local data marts to that traditional flow? Would that be a bad idea? I just want to set up local data marts and connect them to a data lake (and somehow make it replace the data warehouse, or, if that's not possible, connect them to the warehouse). Can you please make a video to show us the disadvantages of this setup? Thank you in advance.
@AdamWeisberg-y1c 1 year ago
What is the use case for bringing all of the data into the data lake prior to the data warehouse? Is it possible to bring some data into your data warehouse from the source systems directly and some data in from S3 buckets?
@KahanDataSolutions 1 year ago
Keeping it in a data lake:
- Gives a historical log of all source data in its raw form (before any DW transformations)
- Allows you to load data faster, and separately from the DW transformation processes
- Provides a clear location for all source data to be landed, whereas a DW might have other processes involved
Plus, storage is less expensive nowadays, so it's less of a problem to store it all this way (to an extent). I'm probably missing other things, but that's just off the top of my head. Hope that helps!
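
The first two points in the reply above (a raw historical log, loaded separately from warehouse transformations) can be sketched roughly as follows. This is only an illustration under stated assumptions: a temporary local directory stands in for the lake (e.g. an S3 bucket), and the `land_raw_extract` and `list_history` helpers, the `erp_orders` source name, and the `load_date=` folder convention are all hypothetical, not something described in the video.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw_extract(lake_root: Path, source: str, load_date: date, records: list[dict]) -> Path:
    """Write one extract, untransformed, into its own dated folder."""
    target_dir = lake_root / source / f"load_date={load_date.isoformat()}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target_file = target_dir / "extract.jsonl"
    # One JSON object per line, exactly as received from the source system.
    with target_file.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return target_file


def list_history(lake_root: Path, source: str) -> list[str]:
    """Return the load dates available for a source, oldest first."""
    return sorted(
        p.name.split("=", 1)[1] for p in (lake_root / source).iterdir() if p.is_dir()
    )


# A temporary directory stands in for the lake (assumption for this sketch).
lake = Path(tempfile.mkdtemp())
land_raw_extract(lake, "erp_orders", date(2024, 1, 1), [{"id": 1, "status": "new"}])
land_raw_extract(lake, "erp_orders", date(2024, 1, 2), [{"id": 1, "status": "shipped"}])

history = list_history(lake, "erp_orders")
print(history)  # ['2024-01-01', '2024-01-02']
```

Because each load lands in its own folder and nothing is overwritten, earlier states of a record stay available, and the warehouse transformations can run later and independently of the landing step.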
@AdamWeisberg-y1c 1 year ago
@KahanDataSolutions Interesting, thanks for the reply! I guess my only follow-up would be around your first point about keeping a historical log. Couldn't this be done just as easily in the data warehouse, assuming you are dealing with structured data? The data could be dumped into the DW just as raw as it could be in the data lake, right?
@gatorpika 1 year ago
What are some examples of the difference between the data warehouse and data models? So like if you build a star schema data warehouse, couldn't tableau just connect directly to that rather than another layer of models? Or are you building the models to differentiate the data used by different groups (i.e. a marketing mart)? Also would you typically materialize those as OBT views or physical tables? Kind of can't wrap my head around that part. Thanks!
@KahanDataSolutions 1 year ago
Great question, and the short answer is that both approaches you're mentioning are possible. It's typically a matter of how much logic you want to hold in the database/queries vs. offloaded to Tableau or another reporting tool.
I find that a lot of companies start by going from the Data Warehouse right to the reporting tool, but then end up shifting to having a handful of custom data mart models in between that get tied to different reports. Ideally, you can then reuse the same mart for multiple reports.
The reason teams often struggle when adding a lot of logic directly in reporting tools is that, as you add more and more complex logic, it becomes really hard to track and troubleshoot. It basically gets lost within reports. It's easier in the short term but results in duplication, conflicting logic and more. You also don't have easy version control, testing, transparency, etc. like you would have if you wrote it in SQL (and with a tool like dbt) and deployed it to a DB first.
If you don't want to add an extra mart layer, it's also possible to handle a lot of that complexity within the warehouse layer. It's really up to your team and how you want to organize it.
For the second part of your question, on materialization, again there's no one-size-fits-all answer. But I find that the marts layer is typically more of an OBT (table) approach, or closer to it. For example, you can tie together a bunch of DW tables to create a common "summary" view at a granularity that can be reused for multiple reports. But as you said, it's also acceptable to create additional custom models simply to separate user groups. I've seen all of the above done, and it is often decided case by case.
This was a LONG-winded response, but hopefully it was helpful. Data strategy can be confusing, but at the end of the day it is just finding a way to organize tables/views/data in ways that work best for you.
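
The "tie together a bunch of DW tables into a summary at a reusable granularity" idea from the reply above might look something like the sketch below. It is a toy illustration only: an in-memory SQLite database stands in for the warehouse, and the table names (`dim_customer`, `fct_orders`, `mart_sales_by_region`), columns, and sample rows are all made up for the example. In practice the `SELECT` would more likely live in a version-controlled model (for instance in dbt, as the reply suggests) than in a Python string.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Warehouse layer: a small star schema with hypothetical names.
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fct_orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO fct_orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

    -- Mart layer: one wide summary table at region granularity,
    -- materialized as a physical table rather than a view.
    CREATE TABLE mart_sales_by_region AS
    SELECT c.region,
           COUNT(*)      AS order_count,
           SUM(o.amount) AS total_amount
    FROM fct_orders AS o
    JOIN dim_customer AS c ON c.customer_id = o.customer_id
    GROUP BY c.region;
    """
)

# Several reports can now read the same mart instead of repeating the join logic.
rows = conn.execute(
    "SELECT region, order_count, total_amount FROM mart_sales_by_region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 1, 75.0), ('EMEA', 2, 150.0)]
```

Swapping `CREATE TABLE ... AS` for `CREATE VIEW ... AS` gives the view-based alternative raised in the question; the join and aggregation logic stays in one place either way.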
@gatorpika 1 year ago
@KahanDataSolutions Thanks for spending the time to write that up, that was very helpful. I had not really thought of version control over the mart logic since our current BI tool sort of handles that, so that makes a lot of sense. I guess I got confused since we do all the things you mentioned within what we call the "data warehouse" layer in our architecture and probably wouldn't call that out separately on a diagram, so I was assuming there was something magical happening there that I couldn't figure out after having seen that architecture a few times. It makes sense to call it out; I guess I just wasn't bright enough to figure out why. Appreciate your content.
@kinuthiasteve4505 1 year ago
@gatorpika This is really an amazing question. I am in a situation where management wants near real-time dashboards. My manager wants to plug Tableau directly into the DB (it's AWS DynamoDB) using an ODBC driver. But my thinking is: stream the data from DynamoDB with DynamoDB Streams / Kinesis Firehose, use AWS Glue to crawl it and maybe change data types, then load it into Redshift or S3, where I can connect with Tableau. I'd much appreciate your views, thanks.
@gatorpika 1 year ago
@kinuthiasteve4505 I'm not a streaming expert but yeah, I think you are on the right track. I have not used Tableau in years, but it used to be more of an analysis platform for historical data, not a streaming platform, right? Like you have to manually refresh the data? We are working on something like that now, where we use Kafka to consume the source data. That feeds some apps that display the real-time stream, and it also feeds our data platform, where history is accumulated and accessible via a BI tool like Tableau.
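
The pattern described in the reply above (one stream feeding both a real-time display and an accumulating history) can be illustrated with a toy sketch. No Kafka client is involved here: a plain Python list stands in for the stream, and the event fields (`device`, `value`) are invented for the example. It shows only the fan-out idea, not how a real consumer would be written.

```python
from collections import defaultdict

# A plain list stands in for the stream of source events (assumption for this sketch).
events = [
    {"device": "a", "value": 1},
    {"device": "b", "value": 5},
    {"device": "a", "value": 3},
]

latest_by_device = {}        # what a real-time app would display: newest value only
history = defaultdict(list)  # what the data platform would accumulate for BI tools

for event in events:
    # Each event is delivered to both consumers.
    latest_by_device[event["device"]] = event["value"]
    history[event["device"]].append(event["value"])

print(latest_by_device)  # {'a': 3, 'b': 5}
print(dict(history))     # {'a': [1, 3], 'b': [5]}
```

The real-time side keeps only the current state, while the history side keeps every event, which is what lets a BI tool query it later.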
@Ka_Vin_Da 1 year ago
Really useful ❤
@KahanDataSolutions 1 year ago
Glad it was helpful!
@michaelhunger6160 1 year ago
I was curious coming from your "simple, small/mid-sized" data email. I expected something on efficient analytics databases like DuckDB/MotherDuck/Hydra/Firebolt. Do you plan to cover that in a future episode? Basically, the other parts of the stack would stay the same; just the processing goes from Snowflake/Synapse/BigQuery to one of these more efficient, lower-cost tools.
@KahanDataSolutions 1 year ago
Thanks for reading the email and checking out this video! I actually have not used any of those tools in depth so I can't speak much on them at this time. But perhaps in the future!
@Bravopetwal-wj3iz 9 months ago
Superb
@bananaboydan3642 1 year ago
Amazing video man. As a senior CS student and aspiring data engineer, I get none of this in school! Love the channel man. Are you on instagram / twitter?