If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/ or join the discord here discord.gg/2yRJq7Eg3k
@MarcLamberti 2 years ago
Thank you for making this video. I don't want to over-promote Airflow because I'm obviously a little bit biased, but I do think a lot of people still know Airflow from version 1.10.X and haven't tried 2.X yet. Many things have been fixed (performance, DAG authoring, UI, etc.). The gap is just huge. Also, I would say the flexibility/freedom that Airflow brings is a double-edged sword: you can do a lot, configure many things, and tune any detail to fit your needs perfectly, but the deeper you go, the steeper the learning curve. It's easy to get lost in all the features and configurable options that Airflow brings. However, it's relatively easy IMHO if you just want to run data pipelines and execute a few tasks. ❤
@SeattleDataGuy 2 years ago
Thank you. Yeah, I think, as you said, most people use Airflow at a very basic level, even if they are using 2.X. Also, I think a while back you may have left a comment about collabing... I feel like I never got back to you on that.
@splashoui3760 2 years ago
What is the best way to learn and practise Airflow?
@sreemantakesh9637 1 year ago
Hi @SeattleDataGuy. I am seeing a lot of people using Airflow to orchestrate ADF in Azure. Is it really worth using, given that we already have ADF triggers?
@anna__geller 2 years ago
Awesome video, a very balanced perspective without focusing only on strengths or weaknesses of any single tool 👍
@SeattleDataGuy 2 years ago
Glad you found it helpful! I really was trying to be balanced so I am glad you felt that way.
@rdean150 2 years ago
We've adopted Argo Workflows, which is a Cloud Native Computing Foundation project built on top of Kubernetes.
@SeattleDataGuy 2 years ago
Nice! Any pros and cons with that?
@metrocartao 1 month ago
It is very robust and language-agnostic, but you need to be a k8s shop.
@chetansurwade 2 years ago
I, for one, didn't face any issues working with XComs, especially with large datasets, using a custom XCom backend on Azure Blob Storage. And Airflow is by design an orchestrator, so offloading computation makes more sense.
@miguelvera9465 2 years ago
This was very interesting, glad to hear the different insights. Hope to see more collaborations in the community
@SeattleDataGuy 2 years ago
That is my goal! I really want to get more perspectives than my own.
@sevegarza 2 years ago
Do a video about Prefect!
@SeattleDataGuy 2 years ago
Not yet!
@jamespaz4333 1 year ago
X2 😂
@mehdio 2 years ago
Cool journalistic approach, glad to have others' opinions included! 👍
@SeattleDataGuy 2 years ago
Thank you so much for sharing your perspective on the topic!
@nashaeshire6534 9 months ago
Thanks a lot, much appreciated. I plan to use Apache Kafka for a logging system. To make my ETL more maintainable (the transform happens on Kafka, before Elasticsearch), I would like to add Airflow. But Apache Kafka Connect looks pretty good too. Between these two solutions, which would you choose for an ELK + Kafka pipeline?
@anildangol 2 years ago
I don't think there is a best ETL pipeline, and I would not bother trying to find one. Each company and team operates differently depending on their skill set, line of business and priorities. I never had a problem while working in SSIS, and I rarely have problems working in Data Factory either. Yes, each tool has lots of limitations, but you will find a way to overcome them. One thing I like about Azure Data Factory is its no-code ease of use, and it is extremely cheap to maintain. Yes, I like to code in Python and work in Airflow, which gives me extreme flexibility that I couldn't have in ADF, but if ADF gives me a headache then I will go with this tool anyway. I've onboarded a junior dev who had never worked with any ETL tool in a week. It's that easy. Many times we, data engineers, spend our time tweaking and hunting for the best tools on the market, but companies hire us to deliver results.
@SeattleDataGuy 2 years ago
That's fair. Tool-wise, I think it's always up in the air in terms of which is best. I think finding a process that works with the tools you have at hand is probably far more important... because once you switch companies it will be a completely new tool. As you said, results first, fancy tools later.
@alexischicoine2072 2 years ago
I’ve also found data factory to work well for orchestrating and the low code keeps it simple. The actual steps being orchestrated are already complex enough.
@tanmaybagul2957 2 years ago
😅😅
@mohamedyasser5285 2 years ago
Great video! I would love to hear your opinion about Apache Kedro.
@SeattleDataGuy 2 years ago
It feels like one day it could be great, but I feel like it's still early and needs a stronger community before I would adopt it.
@kevinsu2219 1 year ago
Do a video about Flyte.
@JimRohn-u8c 2 years ago
How do you feel about Prefect?
@SeattleDataGuy 2 years ago
I still haven't got it into production. I believe Madison has a better opinion here. madisonmae.substack.com/p/sorry-i-hate-airflow
@brettstoddard7947 5 months ago
I've only used Airflow in a narrow capacity for handling scheduling & dependencies. What's the k8s drama about? I've never heard Airflow and k8s used in the same sentence before.
@gava5327 2 years ago
Can you review the Meta Database Engineer Professional Certificate on Coursera when it comes out?
@minthura24 2 years ago
Thanks for the video.
@SeattleDataGuy 2 years ago
Glad you enjoyed it!
@peterbizik224 1 year ago
Interesting points of view, thanks for the video. As I see it, technology evolves, but the tech stacks are getting crazy complicated. In the end, it mostly gets stuck on the budget: you hire someone cheap (an overpromising data engineer) and you get a headache, you can't move out of the dev environment, and most of the data pipelines are SQL in the end. But I could be wrong.
@SeattleDataGuy 1 year ago
They are getting crazy. Keep things as simple as possible for as long as possible.
@mauludinrohman6177 2 years ago
What is the difference between Airflow and Astronomer? Can you help me, sir?
@janswee1 1 year ago
You should summarize pros and cons in the beginning
@robot01001 1 year ago
I'm halfway through this video and I still don't know wtf Airflow is. I know it has a k8s operator, but I have no idea what it is or why I would use it. Maybe this video is for advanced people.
@rguez2332 2 years ago
Is Pentaho PDI used for different purposes than Airflow??
@SeattleDataGuy 2 years ago
I don't think Pentaho is that popular... but I could be wrong. Where do you use it? Have you used it a lot?
@rguez2332 2 years ago
@SeattleDataGuy It's used for ETL. But I can mention popular tools. I'm still learning, and I was wondering whether you can ETL/ELT the data with tools like Fivetran/Airbyte/Stitch without using Airflow. Is Airflow used just to automate, or can you run the whole ETL process with it?
@SeattleDataGuy 2 years ago
@rguez2332 In the past, Airflow was used for everything. But technically it's just an orchestrator. Nowadays people are trying to use other tools like Airbyte and dbt + Airflow to build pipelines. But that's more for open-source-style pipelines. There are so many other tools out there that people like.
@rguez2332 2 years ago
@SeattleDataGuy thx so much!
@SeattleDataGuy 2 years ago
@rguez2332 you're welcome!
@sana-sz5ue 2 years ago
What are people's thoughts on what data engineer career progression is like, given that you don't gain a qualification, only work experience?
@SeattleDataGuy 2 years ago
Do you mean like certificates?
@sana-sz5ue 2 years ago
@SeattleDataGuy Yes. Like, how can you keep working your way up without qualifications? Or, in the tech industry, do certificates work the same way?
@DataPains 6 months ago
I used it for years and also tried the later 2.x version. I still don't like it, and I think there are better ways of architecting pipelines. But yeah, I was amazed when I saw Airflow the first time, and it did solve a lot of problems. I still think it is a tool of the past. I hope I am wrong!
@SeattleDataGuy 6 months ago
It's been a decade, so I wouldn't be surprised to see it replaced in the next 5 years. But you never know, some things are hard to get rid of.
@datawitharslan 2 years ago
As a beginner with the Modern Data Stack, should I learn Prefect or Airflow? What do you recommend?
@valerianmp 2 years ago
Just pick either one; you can always learn the other one later when you need it.
@SeattleDataGuy 2 years ago
Valerian is kind of right. Airflow is far more popular for now. But in tech you're constantly learning. What was important to know this year is old news 3 years from now.
@Emanuel-yb3qk 2 years ago
Hi, I'm a new subscriber and I just saw your video "roadmap to become a data engineer". I wonder if you could recommend a course for learning Python. Your channel is awesome.
@lucasbayout195 2 years ago
Airflow is amazing.
@TheSilpelit 2 years ago
Why can't you use the well known DevOps tools like Jenkins?
@SeattleDataGuy 2 years ago
To manage custom data pipelines? I have seen it done. It was pretty hairy though.
@AmantisAnalytics 1 year ago
Mage AI
@SeattleDataGuy 1 year ago
woo hoo!
@ASO-xh5vu 2 years ago
This is a perfect channel. My only criticism is "verbosity". Too many words...
@deepanshurathore9661 4 months ago
Airflow is shit...
@samsal073 2 years ago
Airflow sucks... it should be thrown in the trash... it doesn't support multi-system, it's all code-based, which violates low-code/no-code rules... it's impossible to install a cluster in standalone mode on-premise without involving technologies like Docker and Kubernetes, which increases complexity... etc.
@romank7944 6 months ago
In that case, please tell me which tools you would recommend for orchestrating ETL processes in Python (on Windows). What do you prefer? Thanks.
@samsal073 6 months ago
@romank7944 Look at Apache NiFi. It's a great tool that is easy to install, set up and scale. Even though it's recommended to run it on Unix-based systems, it runs fine on Windows. It's a Java-based app, but version 2.0 and later supports writing Python extensions.