Why Data Engineers LOVE/HATE Airflow (FT.

41,267 views

Seattle Data Guy

Days ago

Comments: 63
@SeattleDataGuy 2 years ago
If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/ or join the discord here discord.gg/2yRJq7Eg3k
@MarcLamberti 2 years ago
Thank you for making this video. I don't want to over-promote Airflow because I'm obviously a little bit biased, but I do think a lot of people still know Airflow from version 1.10.X and haven't tried 2.X yet. Many things have been fixed (performance, DAG authoring, UI, etc.). The gap is just huge. Also, I would say the flexibility/freedom that Airflow brings is a double-edged sword: you can do a lot, you can configure many things and touch any detail to fit your needs perfectly, but the deeper you go, the steeper the learning curve. It's easy to get lost in all the features and configurable things that Airflow brings. However, it's relatively easy IMHO if you just want to run data pipelines and execute a few tasks. ❤
@SeattleDataGuy 2 years ago
Thank you, yeah I think, as you said, most people use Airflow at a very base level. Even if they are using 2.X. Also, I think a while back you may have had a comment on collabing...I feel like I never got back to you on that
@splashoui3760 2 years ago
What is the best way to learn and practise airflow?
@sreemantakesh9637 1 year ago
Hi @SeattleDataGuy. I am seeing a lot of people using Airflow to orchestrate ADF in Azure. Is it really worth using, given that we already have ADF triggers?
@anna__geller 2 years ago
Awesome video, a very balanced perspective without focusing only on strengths or weaknesses of any single tool 👍
@SeattleDataGuy 2 years ago
Glad you found it helpful! I really was trying to be balanced so I am glad you felt that way.
@rdean150 2 years ago
We've adopted Argo Workflows, which is a Cloud Native Computing Foundation project built on top of Kubernetes.
@SeattleDataGuy 2 years ago
Nice! Any pros and cons with that?
@metrocartao 1 month ago
It is very robust and language agnostic but you need to be a k8s shop
@chetansurwade 2 years ago
I for one didn't face any issues while working with XComs, especially with large datasets, using a custom backend on Azure Blob Storage. And Airflow is by design an orchestrator, so offloading computation is more sensible.
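The pattern this comment describes (keep large payloads in external storage and pass only a small reference between tasks) can be sketched in plain Python. This is a minimal illustration and not Airflow's actual XCom backend API: `BLOB_STORE`, `XCOM`, `push_large_result` and `pull_large_result` are hypothetical names, and in-memory dictionaries stand in for Azure Blob Storage and the orchestrator's metadata store.

```python
import json
import uuid

# Hypothetical stand-in for an external object store such as Azure Blob Storage.
BLOB_STORE = {}

# Hypothetical stand-in for the orchestrator's small metadata channel (XCom-like).
XCOM = {}


def push_large_result(task_id, rows):
    """Write the full payload to the blob store and record only its key."""
    key = f"{task_id}/{uuid.uuid4().hex}.json"
    BLOB_STORE[key] = json.dumps(rows).encode("utf-8")
    XCOM[task_id] = key  # only this short reference goes through the orchestrator


def pull_large_result(task_id):
    """Resolve the stored key and load the full payload from the blob store."""
    key = XCOM[task_id]
    return json.loads(BLOB_STORE[key].decode("utf-8"))


# An upstream task produces a large result; a downstream task reads it back.
rows = [{"id": i, "value": i * i} for i in range(1000)]
push_large_result("extract", rows)
restored = pull_large_result("extract")
```

The point of the design is that the orchestrator's own database only ever holds the short key, while the heavy data stays in storage built for it.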
@miguelvera9465 2 years ago
This was very interesting, glad to hear the different insights. Hope to see more collaborations in the community
@SeattleDataGuy 2 years ago
That is my goal! I really want to get more perspectives than my own.
@sevegarza 2 years ago
Do a video about Prefect!
@SeattleDataGuy 2 years ago
Not yet!
@jamespaz4333 1 year ago
X2 😂
@mehdio 2 years ago
Cool journalistic approach, glad to have others' opinions included! 👍
@SeattleDataGuy 2 years ago
Thank you so much for all your perspective on the topic!
@nashaeshire6534 9 months ago
Thanks a lot, much appreciated. I plan to use Apache Kafka in a logging system. To make my ETL more maintainable (the transform happens on Kafka, before Elasticsearch), I would like to add Airflow. But Apache Kafka Connect looks pretty good too. Of these two solutions, which would you choose for an ELK + Kafka pipeline?
@anildangol 2 years ago
I don't think there is a best ETL pipeline, and I would not bother trying to find one. Each company and team operates differently depending on their skill set, line of business and priorities. I never had a problem working in SSIS and rarely have a problem working in Data Factory either. Yes, each tool has lots of limitations, but you will find a way to overcome them. One thing I like about Azure Data Factory is its ease of use with no code, and it is extremely cheap to maintain. Yes, I like to code in Python and work in Airflow, which gives me extreme flexibility that I couldn't have in ADF, but if ADF gives me a headache then I will go with this tool anyway. I've onboarded a junior dev who had never worked with any ETL tool in a week. It's that easy. Many times we data engineers spend our time tweaking and finding the best tools possible on the market, but companies hired us to deliver results.
@SeattleDataGuy 2 years ago
That's fair. Tool-wise, I think it's always up in the air in terms of which is best. I think finding a process that works with the tools you have at hand is probably far more important...because once you switch companies it will be a completely new tool. As you said, results first, fancy tools later.
@alexischicoine2072 2 years ago
I’ve also found data factory to work well for orchestrating and the low code keeps it simple. The actual steps being orchestrated are already complex enough.
@tanmaybagul2957 2 years ago
😅😅
@mohamedyasser5285 2 years ago
Great video! I would love to hear your opinion about Apache Kedro.
@SeattleDataGuy 2 years ago
It feels like one day it could be great, but I feel like it's still early and needs a stronger community before I would adopt it.
@kevinsu2219 1 year ago
Do a video about flyte
@JimRohn-u8c 2 years ago
How do you feel about Prefect?
@SeattleDataGuy 2 years ago
I still haven't got it into production. I believe Madison has a better opinion here. madisonmae.substack.com/p/sorry-i-hate-airflow
@brettstoddard7947 5 months ago
I've only used airflow in a narrow capacity for handling scheduling & dependencies. What's the k8s drama about? I've never heard airflow and k8s used in the same sentence before
@gava5327 2 years ago
Can you review the Meta Database Engineer Professional Certificate on Coursera when it comes out?
@minthura24 2 years ago
Thanks for the video.
@SeattleDataGuy 2 years ago
glad you enjoyed it!
@peterbizik224 1 year ago
Interesting points of view, thanks for the video. As I see it, technology evolves, but the tech stacks are getting crazy complicated. In the end, it mostly comes down to budget: hire someone cheap (an overpromising data engineer) and you get a headache, can't move beyond the dev environment, and most of the data pipelines end up being SQL anyway. But I could be wrong.
@SeattleDataGuy 1 year ago
They are getting crazy, keep things as simple as possible for as long as possible
@mauludinrohman6177 2 years ago
What is the difference between Airflow and Astronomer? Can you help me, sir?
@janswee1 1 year ago
You should summarize pros and cons in the beginning
@robot01001 1 year ago
I'm halfway through this video and I still don't know wtf Airflow is. I know it has a k8s operator but I have no idea what it is or why I would use it. Maybe this video is for advanced people.
@rguez2332 2 years ago
Is Pentaho PDI used for different purposes than Airflow??
@SeattleDataGuy 2 years ago
I don't think Pentaho is that popular...but I could be wrong. Where do you use it? Have you used it a lot?
@rguez2332 2 years ago
@SeattleDataGuy It's used for ETL. But I can mention popular tools. I'm still learning, and I was wondering if you can ETL/ELT the data with tools like Fivetran/Airbyte/Stitch without using Airflow. Is Airflow used just to automate, or can you do the whole ETL process with it?
@SeattleDataGuy 2 years ago
@rguez2332 In the past Airflow was used for everything. But technically it's just an orchestrator. Nowadays people are trying to use other tools like Airbyte and dbt together with Airflow to make pipelines. But that's more for open-source style pipelines. There are so many other tools that people like out there.
@rguez2332 2 years ago
@SeattleDataGuy thx so much!
@SeattleDataGuy 2 years ago
@rguez2332 you're welcome!
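On the point in this thread that Airflow is technically just an orchestrator: the core idea is that the orchestrator decides what runs and in what order, while the actual work is handed off to other tools. Here is a rough sketch of that idea using only the Python standard library, not Airflow itself. The task names and functions are hypothetical placeholders for things like an ingestion sync or a dbt run.

```python
from graphlib import TopologicalSorter

run_log = []


def extract_load():
    # Hypothetical: in practice this would trigger an ingestion tool's sync job.
    run_log.append("extract_load")


def transform():
    # Hypothetical: in practice this would kick off a transformation job in the warehouse.
    run_log.append("transform")


def publish():
    # Hypothetical: in practice this would refresh a dashboard or notify consumers.
    run_log.append("publish")


TASKS = {"extract_load": extract_load, "transform": transform, "publish": publish}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "extract_load": set(),
    "transform": {"extract_load"},
    "publish": {"transform"},
}


def run_pipeline():
    """Run every task once, respecting the dependency order."""
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        TASKS[name]()


run_pipeline()
```

A real orchestrator adds scheduling, retries, logging and a UI on top of this, but the dependency graph is the part that stays the same whichever tool does the heavy lifting.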
@sana-sz5ue 2 years ago
What are people's thoughts on what data engineer career progression is like, given that you don't gain a qualification, only work experience?
@SeattleDataGuy 2 years ago
Do you mean like certificates?
@sana-sz5ue 2 years ago
@SeattleDataGuy Yes, like how can you keep working your way up without qualifications, or do certificates work the same way in the tech industry?
@DataPains 6 months ago
Used it for years, I also tried the later 2.x version, I still don't like it, and I think there are better ways of architecting pipelines. But yeah I was amazed when I saw Airflow the first time, and it did solve a lot of problems, but I still think, it is a tool of the past. I hope I am wrong!
@SeattleDataGuy 6 months ago
It's been a decade, so I wouldn't be surprised to see it replaced in the next 5 years. But never know, some things are hard to get rid of.
@datawitharslan 2 years ago
As a beginner with the Modern Data Stack, should I learn Prefect or Airflow? What do you recommend?
@valerianmp 2 years ago
Just pick either one of them, you can always learn the other one later when you need it
@SeattleDataGuy 2 years ago
Valerian is kind of right. Airflow is far more popular for now. But in tech you're constantly learning. What was important to know this year is old news 3 years from now.
@Emanuel-yb3qk 2 years ago
Hi, I'm a new subscriber and I just saw your video "roadmap to become a data engineer", and I wonder if you could recommend a course to learn Python. Your channel is awesome
@lucasbayout195 2 years ago
Airflow is amazing.
@TheSilpelit 2 years ago
Why can't you use the well known DevOps tools like Jenkins?
@SeattleDataGuy 2 years ago
To manage custom data pipelines? I have seen it done. It was pretty hairy though.
@AmantisAnalytics 1 year ago
Mage AI
@SeattleDataGuy 1 year ago
woo hoo!
@ASO-xh5vu 2 years ago
This is a perfect channel. My only criticism is "verbosity". Too many words...
@deepanshurathore9661 4 months ago
Airflow is shit...
@samsal073 2 years ago
Airflow sucks...it should be thrown in the trash...doesn't support multi system, all code based which violates low code/no code rules...impossible to install a cluster in standalone mode on premise without involving technologies like Docker and Kubernetes, which increases complexity...etc.
@romank7944 6 months ago
In this case, please tell me what tools you would recommend for orchestrating ETL processes in Python (on Windows). What do you prefer? Thanks
@samsal073 6 months ago
@romank7944 Look at Apache NiFi. It's a great tool that is easy to install, set up and scale. Even though it's recommended to run it on Unix-based systems, it runs fine on Windows. It's a Java-based app, but versions after 2.0 support writing Python extensions.