Design a Data Warehouse | System Design

  Views: 31,134

Interview Pen


Comments: 27
@Porkductions 8 months ago
The timing could not be better. I'm about to take on a new role literally about the contents of this video so thank you so much for making this!
@interviewpen 8 months ago
Glad you liked it!
@richardmccauley9081 8 months ago
Years of experience all packed into 14 min. Thank you, Sir! As with all your videos, great work.
@interviewpen 8 months ago
Thanks for watching!
@work_lpag-t7i 25 days ago
I love this video. It's informative and really well made. I also love to read the comments on these videos as, no matter how good your architecture is, there is always something to improve. Looking forward to seeing an update on this (especially regarding the data extraction process). But once again, this is awesome content. Keep going :) :)
@MahadirAhmad 8 months ago
I generally believe this should be called a lakehouse design rather than a warehouse. Also, replicating the database into the data lake is classified as change data capture (CDC). If you push your data into both the queue and the database, it's hard to ensure consistency between the data lake and the database, e.g. in cases like a rollback or a database failure. Typically the state-of-the-art solution for this type of problem is to rely on the database journal, for instance through the binlog or WAL.
@interviewpen 8 months ago
Cool thanks
@MardiLo-l3c 3 months ago
It's indeed a lakehouse design
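The consistency concern raised in this thread (writing to both a queue and a database, versus deriving events from the database journal) can be illustrated with a toy simulation. This is only a sketch: `ToyDatabase`, `dual_write`, and `replay_log` are hypothetical names, and the in-memory list stands in for a real binlog or WAL.

```python
class ToyDatabase:
    """In-memory stand-in for a database with a commit journal (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.log = []  # append-only journal of committed changes

    def commit(self, key, value):
        self.rows[key] = value
        self.log.append((key, value))


def dual_write(db, queue, key, value, fail_db=False):
    """The application writes to the queue and the database separately."""
    queue.append((key, value))  # event is published first
    if fail_db:
        return  # simulated rollback: the queue already holds the event
    db.commit(key, value)


def replay_log(db, offset=0):
    """Log-based CDC: emit only the changes the database actually committed."""
    return db.log[offset:]


db, queue = ToyDatabase(), []
dual_write(db, queue, "order-1", "paid")
dual_write(db, queue, "order-2", "paid", fail_db=True)

dual_write_keys = [key for key, _ in queue]
cdc_keys = [key for key, _ in replay_log(db)]
print(dual_write_keys)  # contains order-2 although it was never committed
print(cdc_keys)         # contains only committed changes
```

In the dual-write path the downstream consumer sees an event for a row that does not exist in the database, while the journal-based path cannot diverge in this way because the journal is the only source of events.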
@anegyptiangod7386 3 months ago
Thanks for sharing. But there's one thing that I do not understand: what is the point of using a bus system like Kafka or Kinesis if the main goal is to process data at scheduled intervals? Wouldn't it be more cost-efficient to just use an old-school batch processing pipeline?
@interviewpen 2 months ago
Sometimes yes, sometimes no. If we have the ability to process streaming data, this results in less overall data being processed and more up to date results. Sometimes this isn’t possible though, and we have to process the data in batch.
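As a rough illustration of the "less overall data being processed" point in this reply, the sketch below compares a batch job that rescans all stored events on every run with a streaming aggregator that only touches each new event. The event data and the names `batch_totals` and `StreamingTotals` are made up for the example, and the batch side is a deliberately naive worst case (real batch jobs often process only new partitions).

```python
events = [("a", 5), ("b", 3), ("a", 2), ("b", 7)]  # (key, amount), made-up data


def batch_totals(all_events):
    """Batch style: rescan every stored event on each scheduled run."""
    totals = {}
    for key, amount in all_events:
        totals[key] = totals.get(key, 0) + amount
    return totals


class StreamingTotals:
    """Streaming style: keep running totals and process only the new event."""

    def __init__(self):
        self.totals = {}

    def on_event(self, key, amount):
        self.totals[key] = self.totals.get(key, 0) + amount


stream = StreamingTotals()
batch_reads = 0
for i, (key, amount) in enumerate(events, start=1):
    stream.on_event(key, amount)  # one event processed per arrival
    batch_reads += i              # a full batch run after arrival i rescans i events

stream_reads = len(events)
print(batch_reads, stream_reads)
```

Both approaches end with the same totals; the difference is how many event reads were needed to keep the result current after every arrival.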
@Rami_Elkady 8 months ago
In today's lesson we explain motor vehicles ... We will go over everything ... But the engine .... DW means show OLTP Schema design vs OLAP
@interviewpen 8 months ago
Our YouTube videos are usually higher level. If you're looking for more in-depth content, we have plenty on interviewpen.com :)
@Rami_Elkady 8 months ago
@interviewpen I think it is more of a watered-down, average-Joe explanation than a higher-level one. Which is OK, but it should be reflected in the title. "Data warehousing concepts for the average Joe" would be a better title for a generic video whose audience is accountants or libertarians ... If you call it "Design a data warehouse" like you did, professionals will think that you will provide what you said you would ... Which did not materialize ...
@qazyhn94 3 months ago
Which tech or concept is used to put DB changes onto the queue? Does PG natively support this? It's also scary, since the two must not become desynchronized.
@interviewpen 2 months ago
Usually we'd dump things into Kafka at the same time they're being dumped into the database. But this is highly dependent on the system: some databases support this natively, and sometimes it's easier just to use a batch job.
@gaberial3361 8 months ago
I'm wondering which app you were using for the demo.
@interviewpen 8 months ago
We use GoodNotes on an iPad
@bhanuprakashrao1460 3 months ago
Do Kafka consumers also generally scale horizontally? Or are the consumers always unique (different from each other, rather than being replicas)?
@interviewpen 3 months ago
Yes, the consumers are stateless, so scaling them just means replicating them.
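One detail worth keeping in mind when replicating consumers: within a Kafka consumer group, each partition is read by at most one consumer, so replicas beyond the partition count sit idle. The sketch below is a toy round-robin assignment, not Kafka's actual assignor implementation; `assign_partitions` and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin: spread partitions over the replicas of one consumer group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

two_replicas = assign_partitions(partitions, ["c1", "c2"])
six_replicas = assign_partitions(partitions, [f"c{n}" for n in range(1, 7)])
idle = [c for c, owned in six_replicas.items() if not owned]

print(two_replicas)
print(idle)  # replicas left without a partition
```

Under this model, adding replicas increases throughput only up to the number of partitions in the topic.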
@iamcrabgod2809 8 months ago
Hi I love your videos ❤
@interviewpen 8 months ago
Thanks!
@ryan.aquino 1 month ago
I'm still quite confused about why we prefer a queue + Flink (stream processing) instead of just ingesting data directly from the sources using Spark.
@geekwithme9449 19 days ago
Spark is primarily a batch-oriented tool; its streaming support is mainly micro-batch based. Using Flink enables you to do real-time data ingestion along with the transformations. It is widely used in fraud and anomaly detection use cases.
@decrypt_key 8 months ago
Surprised that dbt was not mentioned since we're talking about a modern approach👀. Appreciated the video otherwise
@interviewpen 8 months ago
Yep, dbt can certainly be used as an alternative to the solutions mentioned. Thanks!
@personalbranddata 3 months ago
I didn't learn how to design a data warehouse from this video. Misleading title. Bad.
@deddallama 1 month ago
Why not use CDC?