What are some common data pipeline design patterns? What is a DAG? | ETL vs ELT vs CDC (2022)

159,787 views

IT k Funde

2 years ago

#datapipeline #designpattern #etl #elt #cdc
1:01 - Data pipeline components
4:10 - ETL design pattern (Extract, Transform & Load)
7:15 - ELT design pattern (Extract, Load & Transform)
10:37 - CDC design pattern (Change Data Capture)
14:22 - EtLT design pattern (Extract, transform, Load & Transform)
Hi Friends, I am Anshul Tiwari, and welcome to your YouTube channel "IT k Funde", where we make I.T. interesting for everyone (Tech or No-Tech).
**Do check out our popular playlists**
1) Networking and Infra Concepts - • Networking & Infra Con...
2) Data Analytics & Insights - • Learn - Data Engineeri...
3) Google Cloud Platform Beginner Series - • Google Cloud Platform ...
4) Latest technology tutorial (2021) - • What is a Data Vault ?...
More about this video -
Thanks for all your love on my Data Pipeline basics video - • What is Data Pipeline ...
This video is a follow-up that covers some basic data pipeline design patterns used in data warehousing and data lake solutions. We will learn what a DAG (Directed Acyclic Graph) is and what its core components are.
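To make the DAG idea concrete, here is a minimal sketch, assuming a pipeline where each task lists the tasks it depends on. The task names are hypothetical and are not taken from the video; the standard-library `graphlib` module (Python 3.9+) is used only as a convenient way to show that a DAG always has a valid run order, while a graph with a cycle does not.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical task names; each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_join"},
}


def run_order(dag):
    """Return one valid execution order: every task comes after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def has_cycle(dag):
    """A graph with a cycle is not a DAG, so no run order exists for it."""
    try:
        run_order(dag)
        return False
    except CycleError:
        return True


order = run_order(pipeline)
print(order)
print(has_cycle({"a": {"b"}, "b": {"a"}}))
```

The two extract tasks have no dependency on each other, so a scheduler is free to run them in either order or in parallel; only the edges of the graph constrain the order.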
Then we will move on to our 3 primary design patterns and an additional bonus sub-pattern. Below are the topics we will cover in this video.
1 - Data pipeline components
2 - ETL design pattern (Extract, Transform & Load)
3 - ELT design pattern (Extract, Load & Transform)
4 - CDC design pattern (Change Data Capture)
5 - EtLT design pattern (Extract, transform, Load & Transform)
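The difference between the ETL and ELT patterns listed above is mainly where the transformation runs. The sketch below is an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, and the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount as text, country code)
SOURCE_ROWS = [(1, "10.50", "in"), (2, "4.00", "us"), (3, "7.25", "in")]


def etl(rows):
    """ETL: transform inside the pipeline, then load only the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
    cleaned = [(oid, float(amt), country.upper()) for oid, amt, country in rows]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    return conn


def elt(rows):
    """ELT: load the raw rows as-is, then transform inside the target with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE TABLE orders AS "
        "SELECT order_id, CAST(amount AS REAL) AS amount, UPPER(country) AS country "
        "FROM raw_orders"
    )
    return conn


QUERY = "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
etl_result = etl(SOURCE_ROWS).execute(QUERY).fetchall()
elt_result = elt(SOURCE_ROWS).execute(QUERY).fetchall()
print(etl_result == elt_result)
```

Both paths end with the same `orders` table. In the ELT path the raw table also stays in the target, so the transformation can be rerun or changed later without going back to the source.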
PLEASE WATCH OTHER VIDEOS FROM THE POPULAR PLAYLISTS GIVEN BELOW. EVERY SINGLE LIKE, COMMENT AND SHARE MEANS THE WORLD TO ME!
#itkfunde #keeplearning #keepsharing #keephustling
Credits & Resources -
images - pixabay.com
Research - wikipedia
**Social Channels**
YouTube - / itkfunde
Facebook - / itkfunde
Linkedin - / ansh9685
Twitter - / ansh9685
Instagram - / itkfunde
🚀🚀LAUNCHING MY 1st EVER ONLINE COURSE 🚀🚀
"Cloud 101: AWS for Dummies - Your 1st Date with Cloud !"
✨Your first steps towards a Digital Cloud Career☁️✨
🔥🔥Enroll Now via link below - www.itkfunde.net/courses/Clou...
Highlights:
✅ No Pre-requisites, No Coding needed
✅ Course Starts on 11th September 2023
✅ Career Boost: Unlock new career opportunities by mastering cloud basics and AWS fundamentals.
✅ Step-by-Step Demos: Follow along with our easy-to-follow demos that walk you through key concepts.
✅ Live QnA session and career guidance session with me
✅ 2 years Access to the course
✅ 7-Day Money-Back Guarantee: I know you'll love the course. If for any reason you're not satisfied within the first 7 days from course launch, we offer a full refund.
✅ AWS Cloud Practitioner Certification Guidance
✅ Bonus: Invitation to join my special Telegram community for Lifetime
✅ Bonus: After the course, stay personally connected to me for career guidance
With a 7-day money-back guarantee, you can enroll with confidence. Don't miss out on this opportunity to learn and grow in the cloud industry.
Enroll Now and take YOUR First Steps towards a Digital Cloud Career !!
Hurry!!
**About This Channel**
Friends, the ITkFUNDE channel wants to bring I.T.-related knowledge, information, career advice, and much more to every individual, regardless of whether they belong to I.T. or not. This channel is for everyone interested in learning something new!

Comments: 120
@patmclaughlin107 9 months ago
Love this, man! Our engineers made it so hard to understand what DAG was. I thought I was not smart enough, but now I know they were either deliberately making it hard, or maybe they didn’t understand it themselves.
@bajicdusko 1 year ago
It always amazes me how we can have knowledge like this one click away! Fantastic content, keep up the good work.
@sumit12072007 1 year ago
I was thinking the same while going through this video.
@andygarnet7191 1 year ago
Thanks man! Your explanations are so clear and straightforward. For years I spoke to so many engineers who would over-complicate pattern concepts or straight lowball the documentation to cover themselves when the pipelines blow up and impact the business. Keep up with these great videos!
@AamirAzizYouTube 1 year ago
Thanks so much, sir! This topic was a nightmare for me, you made it so simple to grasp. Keep up the good work!
@daves4026 2 years ago
Perfect. Full respect to your kindness and sharing of your knowledge
@deeptihazari3233 1 year ago
How was this amazing channel hidden till now? This is called quality content delivery 👍
@metaocloudstudio2221 2 years ago
Good point. Another advantage of ELT over ETL is the ability to create normalized tables and real-time materialized views.
@AmanGupta-yf1hj 8 months ago
Wonderful content
@wendypark3848 2 years ago
I learned a lot of networking and data pipeline knowledge from you. It's really hard to learn these from a book. Thanks a lot!
@raghurajsawant24 2 years ago
You are doing a fantastic job. Love your videos.
@francksgenlecroyant 2 years ago
perfect video about Data Pipelines 👌, thanks!
@vaidyanathashankar7441 1 year ago
Fantastic explanation, thanks for the wonderful session.
@DJEYkanjaria 2 years ago
Great Video, Simple and detailed explanation
@mohit.srivastava 1 year ago
both this and the previous connected video explained the concept really well. thanks!!
@user-fm7wc9hy3j 11 months ago
I studied Spark and read about DAGs many times, but I only understand it now that I'm watching your tutorial. Thanks.
@greenshadowooo 9 months ago
Thanks for your sharing ! 😀😀😀
@UTUBDZ 2 years ago
Great content, thank you very much sir !
@altamashjawad6691 2 years ago
Thank you so much, very nice and comprehensive video!
@sunnyj1967 1 year ago
It's a terrific presentation.
@ITkFunde 1 year ago
thanks sunny
@SeanRomberg 1 year ago
Thanks for the share - you have helped me better understand the pipeline automation software that delivers orchestration, ingestion, transformation, and activation all in one. This makes sense now.
@AnandhabalanRadhakrishnan 11 months ago
Well explained, keep sharing valuable information like this.
@federicogonzalez7673 2 years ago
I'm glad that I found your video in my feed, nice one
@sultanqureshi2766 2 years ago
Though it's not exactly related to my current profile, it makes me happy to learn more about the whole software industry from the core, and you are the best at making this understandable by keeping it simple. Understood the 4 data pipeline patterns (ETL, ELT, EtLT, CDC) at once. The video was not long at all. Thanks
@ITkFunde 2 years ago
Thanks Diwakar for your support as always 🙏☺️
@emmanuelaolaiya 1 month ago
Great job and thanks
@ASHighlights668 1 year ago
Very helpful, sir. Your videos convert my nervousness into confidence!!
@mayurarun 1 year ago
This is such a gem video. This would help me so much. Great work.
@guidodichiara2243 1 year ago
Great job. Keep going on!
@Poornima_life 1 year ago
Absolutely…I liked the video ,content and your valuable efforts….thanks
@almamun8291 1 year ago
Thank you very much, got clear concept about data pipelines
@harishb8790 1 year ago
Amazing explanation. 👏
@ITkFunde 1 year ago
Glad you liked it
@rohithsai5265 1 year ago
Great content 💯
@jishuenkam6213 2 years ago
Not exactly a backend developer or data engineer, but this video is very informational on the various data pipeline designs!
@ITkFunde 2 years ago
thanks
@mailsuresh9 2 years ago
Sir, you are great. Thank you for making IT interesting
@ITkFunde 2 years ago
Thanks Suresh ☺️☺️🙏
@ashokrajur09 2 years ago
nice one, very informative
@rahuldey1182 1 year ago
In my project, we are using CDC + EtLT design pattern for our data pipeline. All the design patterns of data pipelines are covered here. Very well presented, good job, keep going.
@ITkFunde 1 year ago
Thanks Rahul ♥️
@mohammadateef3339 1 year ago
Your entry is awesome, sir
@subhradeepshah472 2 years ago
❤️extreme top right hand corner of the whiteboard. ❤️
@sabrinafung3155 2 years ago
As a foreigner, I am curious what the text in the top right-hand corner of the whiteboard means. Is it a motivational quote?
@ITkFunde 2 years ago
Thanks Subhradeep🙏☺️
@ITkFunde 2 years ago
🙏|| ॐ गं गणपतये नमः || || Om Gan Ganpataye Namah || 🙏 Hi Sabrina, Lord Ganesha in Hinduism is called the god of wisdom and knowledge, and it is believed that any good work should start by taking his name first. Hence this mantra in Sanskrit is a prayer to him, seeking his blessings for all of us before we start our journey towards knowledge and wisdom. According to individual faiths he could be Jesus, Allah, Waheguru. For us... He is Ganesha 🙏
@kristhomas8295 2 years ago
Thank you so much for this!
@Liubov_110 1 year ago
Thank you so much for this detailed video 👍
@Lebrao09 2 years ago
great video!
@mangesh4231 5 months ago
Very detailed explanation, helpful. thanks a lot for all work and efforts.
@ashisharora9649 10 months ago
AMAZING
@javierruizdiaz8656 2 years ago
Thank you, excellent Video.
@the.abhisheksinha 1 year ago
nicely explained !
@itneka 1 year ago
Thanks for the information
@swaragupta7932 1 year ago
Easy Explanation, Detailed video
@ITkFunde 1 year ago
Thanks Swara
@mzeeshan 2 years ago
Loved the details mate!.
@ITkFunde 2 years ago
Thanks Zeeshan☺️
@hsiaoshuang 1 year ago
Very informative!
@JJ-ki2mw 1 year ago
Thank you so much the way you described it is so easy to understand
@vigneshbaskaran7931 10 months ago
Love this content, Thank you so much for all the efforts.
@ITkFunde 10 months ago
Thanks
@TheAfroKingPlay 2 years ago
Very nice video man. Thanks I need this class. Take my like.
@StaceyJ1908 1 year ago
First, your videos are amazing....I have learned so much! I am looking at our current GCP implementation and trying to identify key risks across each step in the pipeline to determine if we have the correct controls in place or gaps...what are key risks to address at each stage of the data pipeline?
@jagss3472 1 year ago
Lovely explanation and very insightful details.
@ITkFunde 1 year ago
Glad it was helpful Jaga!
@davidcamiloespitiamanrique9 2 years ago
Good one! probably, you can talk about AWS DMS and AWS GLUE
@connect_vikas 1 year ago
Love you, brother, for explaining this so beautifully.
@ITkFunde 1 year ago
Thanks Vikas
@GernPudman 6 months ago
Thanks!
@lwhieldon1 2 years ago
DAG concept is talked about a lot in data science. Can you talk about how this concept in data science correlates with the DAG design?
@GabrielJambert 4 months ago
Thank you
@arond.g1120 1 year ago
Feel like I am learning in my own language. ❤❤❤
@ITkFunde 1 year ago
Thanks Aron ♥️♥️🙏
@PiyushSharma-jq8rr 1 year ago
This was really good :-)
@augugninfin1034 1 year ago
Thank You!
@ITkFunde 1 year ago
♥️♥️♥️
@ajaykiranchundi9979 2 years ago
Thank you so much! BTW it was certainly not at all a long video.
@ITkFunde 2 years ago
Thanks Ajay ☺️❤️
@chinuamareashwar8146 2 years ago
nice explanation brother
@ganeshsrinivasannv4296 1 year ago
Thanks, it's great work. Could you share some content on data received in an XML messaging pattern and advise on how to store it?
@lastboomer6164 1 year ago
Hello, I very much appreciate the training. Would you consider a whiteboard exercise where the ETL jobs and transformations use a metadata-driven ETL? I learned that this is good practice, but one downside is that this design cannot feed a data catalog "lineage".
@victoraf4274 9 months ago
such an amazing video! not bored at all (im not joking) hehe
@Vikas.007 2 years ago
Awesome content 👍👍 Datamart video link in description plz share 🙏
@big_pants0493 2 years ago
This is amazing. IT k Funde, Can you please suggest any Books that will explain the following topics further and also provide some training?
@bluzane 2 years ago
You don't need any....this Genius Guy is the book. He is my Guru for life
@ITkFunde 2 years ago
thanks dear 🙏🙏❤❤
@masterh6868 2 years ago
Hey, your video is, as usual, full of information with crystal-clear concepts. Thanks for posting such useful videos on industry trends. Can you make a video on data pipelines that do not follow the DAG pattern, like an ML pipeline maybe (not sure)?
@ITkFunde 2 years ago
Thanks buddy for your feedback and suggestion ☺️
@jayanth1376 2 years ago
👌👌👌
@VlasTrunov 2 years ago
It's good you focus on DAGs. But for those new to the subject it might be too abstract, I guess. What I would do is show how things flow in Airflow, for example, for those who perceive information visually. This way you would spread the information in your video uniformly (like butter on bread), helping people grasp it in one pass, if you know what I mean. It's just a suggestion. But to me personally, the level of detail you give is perfect.
@ITkFunde 2 years ago
Thanks Vlas, such useful feedback helps me better my content. I will definitely take your thoughts and do something better next time.😊
@debrajpradhan5500 2 years ago
Sir, I am interested in AWS analytics. Could you please tell me which AWS data services to learn first and second?
@TheyCalledMeT 1 year ago
Would you put data cleansing/preparation in the t of the EtLT pattern, or in the T?
@prashantprashant1291 2 years ago
Your videos are full of knowledge. Are you a Data Solution Architect?
@brookster7772 8 months ago
Great video! Can you tell where a vector database fits into this model? Isn't it the case that at some point all data must be converted to embeddings/vectors and stored in a massive vector store to be used for AI similarity searches?
@arundhutinayak8221 2 years ago
Now I can put technical terms to my current task. Can you do something on API
@deepsy4786 2 years ago
I would like to discuss whether CDC should be considered a data pipeline design pattern. My understanding is that CDC is more related to a data modelling concept. You would have to build an ELT or ETL pipeline anyway. CDC relates more to a load or transformation technique than to an individual pipeline. However, all the insights shared were helpful and did help me relate my work to some of these concepts.
@ambarishhazarnis9531 2 years ago
Here CDC refers to storing the delta in a separate table. This way we don't need to read the source table again to extract the change.
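One way to picture the "delta in a separate table" idea from this reply is the sketch below. It is an illustration under assumptions, not the method shown in the video: it uses SQLite triggers to fill a hypothetical `customers_changes` table, and a consumer that reads only rows newer than the last sequence number it processed, so the source table itself is never rescanned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
-- Delta table: one row per change, in the order the changes happened.
CREATE TABLE customers_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT, id INTEGER, name TEXT
);
CREATE TRIGGER trg_ins AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('I', NEW.id, NEW.name);
END;
CREATE TRIGGER trg_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('U', NEW.id, NEW.name);
END;
CREATE TRIGGER trg_del AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('D', OLD.id, OLD.name);
END;
""")


def apply_changes(target, last_seq):
    """Apply changes newer than last_seq to the target; return the new high-water mark."""
    rows = conn.execute(
        "SELECT seq, op, id, name FROM customers_changes WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, op, cid, name in rows:
        if op == "D":
            target.pop(cid, None)   # delete: drop the key from the target copy
        else:
            target[cid] = name      # insert or update: upsert into the target copy
        last_seq = seq
    return last_seq


target = {}  # stand-in for the target table, keyed by primary key
conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO customers VALUES (2, 'Ben')")
mark = apply_changes(target, 0)
conn.execute("UPDATE customers SET name = 'Asha K' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 2")
mark = apply_changes(target, mark)
print(target, mark)
```

The second call to `apply_changes` touches only the two new change rows, which is the saving the reply describes.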
@upendrakumar-ok3tr 1 year ago
Can you please make a video on Baremetal and Hypervisor
@arijitsinha2955 1 year ago
Can you make a video on Databricks along with an example, please?
@adamjapal7370 2 years ago
do you have a reference or pdf book file of the data pipeline concept? if you do, could you take me to the link? thank you.
@anilmantri2139 4 months ago
Hi, how do we pull the source data into the EL DAG in the CDC pattern? I mean, what tech stack should be used?
@SamS-oi5pz 2 years ago
Hi how do we identify changed data from source?
@veeek8 2 years ago
'Hope you're not bored', never 😁
@lcsxwtian 2 years ago
At CDC, you had said that max() would get the latest snapshot of the data. I am assuming max() would get the maximum count of the data - correct? If that were the case, what if the last change was to DELETE some data, then I don't think max() would be right?
@AnishBhola 7 months ago
Yes, you're right! Timestamp-based CDC is generally not a good option for processing deletes. There are other types of CDC, such as log-based (the most optimal), which you can use for such situations. This video primarily talks about implementing difference-based CDC (where 2 snapshots of target systems are compared).
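The point in this reply about deletes can be shown with a small sketch. It assumes hard deletes (the row disappears from the source) and a hypothetical `updated_at` column; the snapshot contents are made up for illustration. A timestamp filter only sees rows that still exist, while a comparison of snapshots n and n+1 also reveals the row that went missing.

```python
def diff_snapshots(previous, current):
    """Difference-based CDC: compare two snapshots keyed by primary key."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    return inserts, updates, deletes


def changed_since(current, last_run):
    """Timestamp-based CDC: rows whose updated_at is newer than the last run."""
    return {k: v for k, v in current.items() if v["updated_at"] > last_run}


# Hypothetical snapshots n and n+1, keyed by primary key.
snapshot_n = {
    1: {"name": "Asha", "updated_at": 100},
    2: {"name": "Ben", "updated_at": 100},
}
snapshot_n1 = {
    1: {"name": "Asha K", "updated_at": 205},
    3: {"name": "Chen", "updated_at": 210},
}

inserts, updates, deletes = diff_snapshots(snapshot_n, snapshot_n1)
by_timestamp = changed_since(snapshot_n1, last_run=100)
print(sorted(inserts), sorted(updates), sorted(deletes), sorted(by_timestamp))
```

Row 2 shows up only in `deletes`; the timestamp filter returns rows 1 and 3 and has no way to report it, because a deleted row has no `updated_at` left to compare.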
@imnischaygowda 7 months ago
What is the purpose of the sink? Can't we store data directly in the data warehouse?
@pm4306 1 year ago
Please give some concrete business examples instead of 'n' and 'n+1'; an example will help to clarify and walk through the cases in a better way. I think you should give concrete real-life business examples for all the cases that you discuss. Actual business examples are missing in your videos.
@ernesto8738 2 years ago
and here I am with a cyclic graph problem {{{(>_
@antonfernando8409 2 years ago
Never heard of most of the terms mentioned (ETL, ELT, CDC). I guess these are specific to cloud computing; still, in terms of data pipelines, it's useful to learn, I think. Thanks
@egor.okhterov 2 years ago
No, it’s not about cloud computing. It’s about data analytics in general. When you want to build web dashboards that draw graphs of some business processes or want to analyse customer behavior, you build this data pipeline. TLDR: you cannot run SQL on your logs. You need to push your logs into MySQL in order to be able to query your logs.
@ITkFunde 2 years ago
Hi Anton, these terms are quite old but have become more prominent with new-age data management. Maybe you are not from a Data or Business Intelligence background, but it's good to learn these.
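The remark earlier in this thread about pushing logs into a database before querying them can be sketched as a tiny pipeline. Everything here is an assumption for illustration: the log format is invented, the field names are hypothetical, and an in-memory SQLite database stands in for the MySQL instance the comment mentions.

```python
import re
import sqlite3

# Hypothetical application log lines; the last one does not match the format.
LOG_LINES = [
    "2022-03-01T10:00:01 INFO checkout user=17 ms=120",
    "2022-03-01T10:00:05 ERROR checkout user=42 ms=950",
    "2022-03-01T10:00:09 INFO search user=17 ms=80",
    "malformed line that the parser skips",
]
PATTERN = re.compile(r"^(\S+) (\w+) (\w+) user=(\d+) ms=(\d+)$")


def parse(lines):
    """Extract + transform: turn raw text lines into typed tuples, skipping bad lines."""
    for line in lines:
        match = PATTERN.match(line)
        if match:
            ts, level, action, user, ms = match.groups()
            yield ts, level, action, int(user), int(ms)


# Load: write the structured rows into a table that SQL can query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (ts TEXT, level TEXT, action TEXT, user_id INTEGER, ms INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", parse(LOG_LINES))

per_action = conn.execute(
    "SELECT action, COUNT(*), AVG(ms) FROM events GROUP BY action ORDER BY action"
).fetchall()
print(per_action)
```

Once the rows are in a table, questions such as "average latency per action" become a single query, which is the kind of dashboard input the comment has in mind.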
@dylanmccullough2679 2 years ago
Question regarding the ELT pattern. You said that you should use SQL at the transformation (T) part. Could you use Spark instead of SQL at this point? For example, Data Factory Data Flows, instead of putting compute pressure on the EDW with SQL queries?
@chandrakanthotkar7262 2 years ago
Whenever we say ELT, we basically mean doing the transformation after the data has landed in the DWH or database, like BigQuery (GCP). The Spark engine, on the other hand, is basically used for transformation during the flow.
@nikhilgurram6569 1 year ago
Thanks!
@ITkFunde 1 year ago
thanks