What are some common data pipeline design patterns? What is a DAG ? | ETL vs ELT vs CDC (2022)

Views: 175,375

IT k Funde

A day ago

Comments: 132
@MasQred 7 months ago
When extracting data from an operational database, won't that affect the operational database's performance? How do you extract data without affecting it?
@ITkFunde 7 months ago
Good question. That's why, in the old days, ETL pipelines used to run overnight, when the operational systems were not under heavy use. Today there are replication tools like Attunity that can replicate the data from the source by reading its logs.
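A minimal sketch of the log-reading idea in the reply above, assuming a hypothetical change-log format (the `op`, `key`, and `row` fields are invented for illustration; real replication tools read the database's own transaction log, whose format is not shown here). The point is only that the replica is rebuilt by replaying logged changes in order, without querying the operational tables:

```python
# Hypothetical change-log entries, oldest first.
change_log = [
    {"op": "insert", "key": 1, "row": {"name": "Ann", "city": "Pune"}},
    {"op": "insert", "key": 2, "row": {"name": "Raj", "city": "Delhi"}},
    {"op": "update", "key": 1, "row": {"name": "Ann", "city": "Mumbai"}},
    {"op": "delete", "key": 2, "row": None},
]


def apply_log(replica, entries):
    """Replay change-log entries onto a replica dict, in order."""
    for entry in entries:
        if entry["op"] == "delete":
            replica.pop(entry["key"], None)
        else:
            # Insert and update both overwrite the row stored for that key.
            replica[entry["key"]] = entry["row"]
    return replica


replica = apply_log({}, change_log)
print(replica)
```

Because deletes appear in the log as explicit entries, the replica stays in step with the source even when rows disappear.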
@patmclaughlin107 1 year ago
Love this, man! Our engineers made it so hard to understand what a DAG was. I thought I was not smart enough, but now I know they were either deliberately making it hard, or maybe they didn't understand it themselves.
@nikhilgurram6569 2 years ago
Thanks!
@ITkFunde 1 year ago
thanks
@bajicdusko 2 years ago
It always amazes me how we can have knowledge like this one click away! Fantastic content, keep up the good work.
@sumit12072007 1 year ago
I was thinking the same while going through this video.
@jeganj2009 11 days ago
Excellent explanation of all the design patterns, in a simple way.
@daves4026 3 years ago
Perfect. Full respect for your kindness and for sharing your knowledge.
@andygarnet7191 1 year ago
Thanks man! Your explanations are so clear and straightforward. For years I spoke to so many engineers who would overcomplicate pattern concepts or lowball the documentation to cover themselves when the pipelines blow up and impact the business. Keep up these great videos!
@AamirAzizYouTube 2 years ago
Thanks so much, sir! This topic was a nightmare for me, you made it so simple to grasp. Keep up the good work!
@rahuldey1182 2 years ago
In my project, we are using the CDC + EtLT design pattern for our data pipeline. All the design patterns of data pipelines are covered here. Very well presented, good job, keep going.
@ITkFunde 2 years ago
Thanks Rahul ♥️
@ashishmaharana5865 3 months ago
This is a godsend. I was just trying to implement CDC in my organization. This was most helpful.
@deeptihazari3233 2 years ago
How was this amazing channel hidden until now? This is what quality content delivery looks like 👍
@sultanqureshi2766 3 years ago
Though it's not exactly related to my current profile, it makes me happy to learn more about the whole software industry from the core, and you are the best at making this understandable by keeping it simple. I understood the four data pipeline patterns (ETL, ELT, ETLT, CDC) at once. The video was not long at all. Thanks.
@ITkFunde 3 years ago
Thanks Diwakar for your support as always 🙏☺️
@ASHighlights668 2 years ago
Very helpful, sir. Your videos convert my nervousness into confidence!
@metaocloudstudio2221 2 years ago
Good point. Another advantage of ELT over ETL is that you can create normalized tables and real-time materialized views.
@SheetalKumari-lk1vv 1 month ago
Thank you, it was amazing.
@benoyeremita1359 3 months ago
Really good explanation, man. Hats off to you.
@mzeeshan 2 years ago
Loved the details, mate!
@ITkFunde 2 years ago
Thanks Zeeshan☺️
@francis191 4 months ago
Fantastic tutorial
@swaragupta7932 2 years ago
Easy explanation, detailed video
@ITkFunde 2 years ago
Thanks Swara
@harishb8790 1 year ago
Amazing explanation. 👏
@ITkFunde 1 year ago
Glad you liked it
@sunnyj1967 1 year ago
It's a terrific presentation.
@ITkFunde 1 year ago
thanks sunny
@vigneshbaskaran7931 1 year ago
Love this content. Thank you so much for all the effort.
@ITkFunde 1 year ago
Thanks
@AnandhabalanRadhakrishnan 1 year ago
Well explained, keep sharing valuable information like this.
@mangesh4231 1 year ago
Very detailed explanation, helpful. Thanks a lot for all the work and effort.
@HassanYoussoufFossi 1 year ago
I studied Spark and read about DAGs many times, but I only understand it now that I'm watching your tutorial. Thanks.
@raghurajsawant24 3 years ago
You are doing a fantastic job. Love your videos.
@Meowlah32 2 years ago
Not exactly a backend developer or data engineer, but this video is very informative on the various data pipeline designs!
@ITkFunde 2 years ago
thanks
@mayurarun 2 years ago
This video is such a gem. It will help me so much. Great work.
@rohithsai5265 2 years ago
Great content 💯
@vaidyanathashankar7441 1 year ago
Fantastic explanation, thanks for the wonderful session.
@Poornima_life 2 years ago
Absolutely. I liked the video, the content and your valuable efforts. Thanks.
@JJ-ki2mw 1 year ago
Thank you so much, the way you described it is so easy to understand.
@mohit.srivastava 2 years ago
Both this and the previous connected video explained the concept really well. Thanks!!
@Liubov_110 1 year ago
Thank you so much for this detailed video 👍
@greenshadowooo 1 year ago
Thanks for sharing! 😀😀😀
@almamun8291 2 years ago
Thank you very much, I now have a clear concept of data pipelines.
@jagss3472 1 year ago
Lovely explanation and very insightful details.
@ITkFunde 1 year ago
Glad it was helpful Jaga!
@connect_vikas 2 years ago
Love you, brother, for explaining this so beautifully.
@ITkFunde 2 years ago
Thanks Vikas
@the.abhisheksinha 2 years ago
Nicely explained!
@guidodichiara2243 2 years ago
Great job. Keep going!
@PiyushSharma-jq8rr 1 year ago
This was really good :-)
@hsiaoshuang 1 year ago
Very informative!
@federicogonzalez7673 3 years ago
I'm glad that I found your video in my feed, nice one.
@javierruizdiaz8656 2 years ago
Thank you, excellent video.
@itneka 1 year ago
Thanks for the information
@TheAfroKingPlay 3 years ago
Very nice video, man. Thanks, I needed this class. Take my like.
@emmanuelaolaiya 9 months ago
Great job and thanks
@augugninfin1034 2 years ago
Thank You!
@ITkFunde 2 years ago
♥️♥️♥️
@francksgenlecroyant 3 years ago
Perfect video about data pipelines 👌, thanks!
@AmanGupta-yf1hj 1 year ago
Wonderful content
@kristhomas8295 3 years ago
Thank you so much for this!
@jayanth1376 2 years ago
👌👌👌
@mohammadateef3339 2 years ago
Your entry is awesome, sir.
@ashokrajur09 2 years ago
Nice one, very informative
@SeanRomberg 2 years ago
Thanks for the share. You have helped me better understand the pipeline automation software that delivers orchestration, ingestion, transformation, and activation all in one. This makes sense now.
@wendypark3848 3 years ago
I learned a lot of network and data pipeline knowledge from you. It's really hard to learn these from a book. Thanks a lot!
@chinuamareashwar8146 2 years ago
Nice explanation, brother
@aditiaditi3302 7 months ago
Thanks for sharing this video :)
@altamashjawad6691 3 years ago
Thank you so much, very nice and comprehensive video!
@DepressedMonkeyGaming 3 years ago
Great video, simple and detailed explanation
@rdprasad2225 1 month ago
Thank you very much
@jay2151000 28 days ago
Thank you...
@UTUBDZ 3 years ago
Great content, thank you very much, sir!
@victoraf4274 1 year ago
Such an amazing video! Not bored at all (I'm not joking) hehe
@arond.g1120 1 year ago
Feel like I am learning in my own language. ❤❤❤
@ITkFunde 1 year ago
Thanks Aron ♥️♥️🙏
@ashisharora9649 1 year ago
AMAZING
@Vikas.007 3 years ago
Awesome content 👍👍 Please share the data mart video link in the description 🙏
@ajaykiranchundi9979 3 years ago
Thank you so much! BTW, it was certainly not a long video at all.
@ITkFunde 3 years ago
Thanks Ajay ☺️❤️
@davidcamiloespitiamanrique9 2 years ago
Good one! Perhaps you could talk about AWS DMS and AWS Glue.
@GabrielJambert 1 year ago
Thank you
@prashantprashant1291 3 years ago
Your videos are full of knowledge. Are you a Data Solution Architect?
@masterh6868 3 years ago
Hey, your video is, as usual, full of information, with crystal-clear concepts. Thanks for posting such useful videos on industry trends. Can you make a video on data pipelines that do not follow the DAG pattern, like an ML pipeline maybe (not sure)?
@ITkFunde 3 years ago
Thanks buddy for your feedback and suggestion ☺️
@Lebrao09 3 years ago
Great video!
@StaceyJ1908 1 year ago
First, your videos are amazing. I have learned so much! I am looking at our current GCP implementation and trying to identify key risks across each step in the pipeline, to determine whether we have the correct controls in place or have gaps. What are the key risks to address at each stage of the data pipeline?
@lwhieldon1 2 years ago
The DAG concept is talked about a lot in data science. Can you talk about how this concept in data science correlates with the DAG design?
@VlasTrunov 2 years ago
It's good you focus on DAGs. But for those new to the subject it might be too abstract, I guess. What I would do is show how things flow in Airflow, for example, for those who perceive information visually. This way you would spread the information (butter on the bread) uniformly through your video, letting people grasp it in one pass, if you know what I mean. It's just a suggestion. But to me personally, the level of detail you give is perfect.
@ITkFunde 2 years ago
Thanks Vlas, such useful feedback helps me improve my content. I will definitely take your thoughts on board and do something better next time. 😊
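For readers who, like the commenter above, find the DAG idea abstract, here is a small sketch using Python's standard `graphlib` module rather than Airflow itself. The task names are hypothetical. It shows the two properties that matter for a pipeline: the edges are directed, so every task runs after the tasks it depends on, and the graph is acyclic, so a valid run order exists at all.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(dag):
    """Return one valid execution order, or raise CycleError."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(pipeline)
print(order)

# Adding a back edge (extract now waits for report) creates a cycle,
# so no task can ever be scheduled first.
cyclic = dict(pipeline, extract={"report"})
try:
    run_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

Orchestrators such as Airflow apply the same dependency-ordering idea, adding scheduling, retries, and monitoring on top.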
@ganeshsrinivasannv4296 2 years ago
Thanks, it's great work. Can you share some content on data received via an XML messaging pattern, and advise on how to store it?
@sreechalasani9268 2 months ago
Thanks for the video! In the CDC model, rather than capturing all the prior versions when a new version of the data comes in, why can't the previous versions be invalidated? Wouldn't that be more efficient?
@upendrakumar-ok3tr 2 years ago
Can you please make a video on bare metal and hypervisors?
@TheyCalledMeT 2 years ago
Would you put data cleansing/preparation in the t of the EtLT pattern, or in the T?
@brookster7772 1 year ago
Great video! Can you tell me where a vector database fits into this model? Isn't it the case that at some point all data must be converted to embeddings/vectors and stored in a massive vector store to be used for AI similarity searches?
@arijitsinha2955 2 years ago
Can you please make a video on Databricks, along with an example?
@deepsy4786 3 years ago
I would like to discuss whether CDC should be considered a data pipeline design pattern. My understanding is that CDC is more of a data modelling concept. You would have to build an ELT or ETL pipeline anyway. CDC relates more to a load or transformation technique than to an individual pipeline. However, all the insights shared were helpful and did help me relate my work to some of these concepts.
@ambarishhazarnis9531 3 years ago
Here CDC refers to storing the delta in a separate table. This way we don't need to read the source table again to extract the change.
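One way to picture the "delta in a separate table" idea from the reply above is with database triggers. This is an assumption for illustration only: the thread does not say how the delta table gets populated, and the table and trigger names below are hypothetical. The sketch uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
-- Delta table: one row per change, so readers never rescan customers.
CREATE TABLE customers_delta (
    seq  INTEGER PRIMARY KEY AUTOINCREMENT,
    op   TEXT,
    id   INTEGER,
    city TEXT
);
CREATE TRIGGER trg_ins AFTER INSERT ON customers BEGIN
    INSERT INTO customers_delta (op, id, city) VALUES ('I', NEW.id, NEW.city);
END;
CREATE TRIGGER trg_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_delta (op, id, city) VALUES ('U', NEW.id, NEW.city);
END;
CREATE TRIGGER trg_del AFTER DELETE ON customers BEGIN
    INSERT INTO customers_delta (op, id, city) VALUES ('D', OLD.id, OLD.city);
END;
""")

# Ordinary changes on the source table...
conn.execute("INSERT INTO customers VALUES (1, 'Pune')")
conn.execute("UPDATE customers SET city = 'Mumbai' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

# ...are all visible in the delta table, in order, without reading customers.
delta = conn.execute(
    "SELECT op, id, city FROM customers_delta ORDER BY seq"
).fetchall()
print(delta)
```

Triggers run inside the source database, so they add some write overhead there; that trade-off is one reason log-based capture is often preferred.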
@arundhutinayak8221 3 years ago
Now I can put technical terms to my current task. Can you do something on APIs?
@lastboomer6164 2 years ago
Hello, I very much appreciate the training. Would you consider a whiteboard exercise where the ETL jobs and transformations use a metadata-driven ETL? I learned that this is a good practice, but one downside is that this design cannot feed a data catalog's "lineage".
@anilmantri2139 1 year ago
Hi, how do you pull the source data into the EL DAG in the CDC pattern? I mean, what tech stack should be used?
@imnischaygowda 1 year ago
What is the purpose of the sink? Can't we store data directly in the data warehouse?
@debrajpradhan5500 3 years ago
Sir, I am interested in AWS analytics. Can you please tell me which AWS data services to read about first and second?
@lcsxwtian 3 years ago
For CDC, you said that max() would get the latest snapshot of the data. I am assuming max() would get the maximum count of the data, correct? If that were the case, what if the last change was to DELETE some data? Then I don't think max() would be right.
@AnishBhola 1 year ago
Yes, you're right! Timestamp-based CDC is generally not a good option for processing deletes. There are other types of CDC, such as log-based (most optimal), which you can use in such situations. This video primarily talks about implementing difference-based CDC (where two snapshots of the target systems are compared).
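The difference-based approach mentioned in the reply can be sketched as a comparison of two snapshots keyed by primary key. The snapshot contents and the function name are hypothetical; the sketch only illustrates why comparing snapshot n with snapshot n+1 surfaces deletes, which a "latest timestamp" query would miss because a deleted row has no row left to carry a timestamp.

```python
def diff_snapshots(old, new):
    """Compare two {primary_key: row} snapshots and classify the changes."""
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    deletes = {k: old[k] for k in old.keys() - new.keys()}
    updates = {
        k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]
    }
    return inserts, updates, deletes


# Snapshot n and snapshot n+1 of a small customer table (id -> city).
snapshot_n = {1: "Pune", 2: "Delhi", 3: "Chennai"}
snapshot_n1 = {1: "Mumbai", 3: "Chennai", 4: "Kochi"}

inserts, updates, deletes = diff_snapshots(snapshot_n, snapshot_n1)
print("inserted:", inserts)
print("updated:", updates)
print("deleted:", deletes)
```

The cost is that both snapshots must be read in full, which is why this approach tends to suit smaller tables or batch windows.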
@chinnap8987 1 month ago
Is coding required to learn this?
@SamS-oi5pz 3 years ago
Hi, how do we identify changed data from the source?
@adamjapal7370 3 years ago
Do you have a reference or a PDF book on data pipeline concepts? If you do, could you point me to the link? Thank you.
@veeek8 2 years ago
'Hope you're not bored', never 😁
@pm4306 1 year ago
Please give some concrete business examples instead of 'n' and 'n+1'; an example will help to clarify and walk through the cases better. I think you should give concrete real-life business examples for all the cases that you discuss. Actual business examples are missing from your videos.
@antonfernando8409 3 years ago
I had never heard of most of the terms mentioned (ETL, ELT, CDC). I guess these are specific to cloud computing; still, in terms of data pipelines, I think they are useful to learn. Thanks.
@egor.okhterov 3 years ago
No, it's not about cloud computing. It's about data analytics in general. When you want to build web dashboards that draw graphs of some business processes, or want to analyse customer behavior, you build this data pipeline. TL;DR: you cannot run SQL on your logs. You need to push your logs into MySQL in order to be able to query them.
@ITkFunde 3 years ago
Hi Anton, these terms are quite old but have become more prominent with new-age data management. Maybe you are not from a data or business intelligence background, but they are good to learn.
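The "push your logs into a database so you can query them" point above can be shown in a few lines. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL, and the log lines and their "timestamp level message" layout are hypothetical.

```python
import sqlite3

# Hypothetical application log lines: "timestamp level message".
log_lines = [
    "2022-01-01T10:00:00 INFO user login",
    "2022-01-01T10:00:05 ERROR payment failed",
    "2022-01-01T10:01:00 INFO user logout",
    "2022-01-01T10:02:30 ERROR payment failed",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, level TEXT, message TEXT)")

# Extract and load: split each line into three columns and insert it.
rows = [line.split(" ", 2) for line in log_lines]
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)

# Once the logs are in a table, questions become plain SQL.
counts = conn.execute(
    "SELECT level, COUNT(*) FROM logs GROUP BY level ORDER BY level"
).fetchall()
print(counts)
```

A dashboard would run queries like the last one on a schedule and chart the results.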
@ernesto8738 3 years ago
and here I am with a cyclic graph problem {{{(>_
@dylanmccullough2679 3 years ago
Question regarding the ELT pattern. You said that you should use SQL for the (T) transformation part. Could you use Spark instead of SQL at this point? For example, Data Factory Data Flows, instead of putting compute pressure on the EDW with SQL queries?
@chandrakanthotkar7262 3 years ago
Whenever we say ELT, we basically do the transformation after the data has landed in the DWH or database, like BigQuery (GCP). The Spark engine is basically used for transformation during the flow.
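To make the ELT ordering in the reply concrete, here is a small sketch in which raw data is loaded untouched and the transformation then runs as SQL inside the "warehouse". An in-memory `sqlite3` database stands in for a real warehouse such as BigQuery, and the table names and sample rows are hypothetical.

```python
import sqlite3

# Raw extract: inconsistent casing, stray spaces, amounts stored as text.
raw_orders = [
    ("A1", " pune ", "120.50"),
    ("A2", "DELHI", "80"),
    ("A3", "Pune", "19.50"),
]

wh = sqlite3.connect(":memory:")

# E + L: land the data exactly as extracted, with no cleanup on the way in.
wh.execute("CREATE TABLE raw_orders (order_id TEXT, city TEXT, amount TEXT)")
wh.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# T: clean and aggregate afterwards, using the warehouse's own SQL engine.
wh.execute("""
    CREATE TABLE city_sales AS
    SELECT LOWER(TRIM(city)) AS city,
           SUM(CAST(amount AS REAL)) AS total
    FROM raw_orders
    GROUP BY LOWER(TRIM(city))
""")

result = wh.execute(
    "SELECT city, total FROM city_sales ORDER BY city"
).fetchall()
print(result)
```

In an ETL or EtLT design, some or all of that cleanup would instead happen before the load, for example in a Spark job, which is the trade-off the question above raises.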
@GernPudman 1 year ago
Thanks!