Airflow Vs. Dagster: The Full Breakdown!

8,412 views

The Data Guy

Comments: 29
@thanhbinh24 1 year ago
This video is right on point! I recently started my first job as a DE and was tasked with migrating all the cron jobs to an orchestration tool. I was looking for the best option, and now I'm pretty sure that we'll be better off with Airflow. Thank you, and keep up the good work, my man.
@thedataguygeorge 1 year ago
Thank you so much, happy this helped you make a decision!!
@jarredthedataengineer 10 months ago
This is an interesting video, but it is fairly inaccurate about Dagster. I'm sure that's not out of malice, but probably because OP is more familiar with Airflow. For example, Dagster is open source, and it is very extensible and modular. I'd also point out a pretty important difference between Dagster and Airflow: Dagster enables a local-to-production test, build, and deploy cycle, which is not really possible with Airflow. Also, Dagster comes with a ton of automation capabilities that just aren't possible with an imperative orchestrator like Airflow. This is a pretty deep subject that requires a fair amount of knowledge from the author to give a fair comparison, and that's somewhat lacking in this video.
@ricardomalla6533 11 months ago
Would Airflow be a good fit to orchestrate a couple of Python scripts that send marketing emails to our customers based on certain criteria? Is there something better for this application?
@thedataguygeorge 11 months ago
That's a great use case for Airflow! MailChimp might also be a good option for that particular use case!
@datalearningsihan 1 year ago
Thank you. I feel privileged that you made the video at my request. I know, I know, I will take all of the credit :D
@thedataguygeorge 1 year ago
hahahaha no worries man, doing it all for you!
@nixbruh 1 year ago
Awesome stuff bro. Question: is there any reason not to just use these things as schedulers and have them spin up containers that hold the code? I feel like you get tied to a specific framework and it turns into a nightmare...
@nixbruh 1 year ago
I guess the only downside is that you can't stop and restart the parts of the code that might fail, or run individual pieces manually? But idk if that trade-off is worth it... hoping people who know what they're doing can share opinions.
@thedataguygeorge 1 year ago
That is a totally valid approach, and honestly one that I think Airflow excels at. A lot of the teams I see using Airflow in production just use it to call out to other containers/services and run the jobs there, with Airflow acting as a centralized error-handling/monitoring layer on top, in addition to its scheduling capabilities.
@simondelorean 1 year ago
Thank you, that was very helpful.
@thedataguygeorge 1 year ago
You're very welcome, glad it was helpful!
@luiztauffer8513 1 year ago
Hey, thanks a lot for the insightful overview! And your channel is awesome for Airflow content. I'd love to see similar comparisons with Flyte and Kestra.
@thedataguygeorge 1 year ago
Thanks Luiz! Really appreciate the love! And will put them in the schedule, thanks for the idea!
@ofnotandi 1 year ago
Dagster is open source according to the homepage
@thedataguygeorge 1 year ago
Sorry you're right, I think it's more of an open-core since there's not much development outside of the dagster company but that's definitely up for debate!
@baja 1 year ago
Coming here as someone who uses Dagster daily and wants to know if Airflow is worth it, so I appreciate this comparison. A few things on the Dagster side:
For the first example, you can do exactly what you have in Airflow in Dagster. You can create branching logic by having an Op with multiple outputs (not all required) that only emits the single output for the current day of the week. You can wrap this branching Op and the specific day-of-the-week Ops in a graph and build this graph into one of the assets shown. If guitar lessons, family dinner, etc. produce assets, you can just make them their own assets and use a similar not-required feature so that they only fire on their specific day of the week. In the UI you can expand the assets to their Ops and Graphs to see the branching logic. I use this, for example, to train an ML model every Monday and then run predictions with it afterwards. Every other day of the week, we just use the previous model for predictions without retraining.
I don't really understand the point about testing in Dagster. You can add assertions or raise errors in Dagster Assets, and there are also hooks, which are separate functions that run after the completion of an asset (these can send messages to Slack, do quality checks, etc.; it's just a Python function), which is a nicer way to keep things separate. Most of those logs you're seeing in Dagster will be user-specified, as the logger gets passed into the Asset function. I log debug info, errors, warnings, etc.
I don't really understand the last point about the Dagster API either. You can run anything in Dagster. For example, if you want to trigger something in Fivetran or dbt Cloud, the Dagster code just hits the endpoint and polls while the computation is done elsewhere. You can set up your own APIs to do a similar thing. I don't really like how much Dagster couples compute and orchestration, but it seems like Airflow is doing a similar thing, and you don't have to use Dagster this way.
There are IO managers to manage the data passed between assets. This doesn't have to be JSON data from an API; it can be any Python variable. I run Dagster on Kubernetes, where each asset runs in its own pod, so I use S3, GCS, etc. to pickle the Python objects and pass them between pods. My understanding is that this is an advantage Dagster has, because it type-checks the data going between pods. There are other tasks where my assets just run CLI commands, one example being running scripts in R.
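The branching pattern described in the comment above (one step with several optional outputs, only one of which is emitted per run, and downstream steps that fire only when their input exists) can be sketched in plain Python. This is a toy model under stated assumptions, not Dagster's API: `choose_branch`, `train_model`, `load_previous_model`, `run_pipeline`, and `DOWNSTREAM` are hypothetical names, and the "retrain on Monday, otherwise reuse" rule is borrowed from the ML example in the comment.

```python
from datetime import date


def choose_branch(today: date) -> dict:
    """Branching step: emits exactly one of its two optional outputs."""
    if today.weekday() == 0:  # Monday
        return {"retrain": today}
    return {"reuse": today}


def train_model(day: date) -> str:
    # Hypothetical stand-in for a real training step.
    return f"model trained on {day.isoformat()}"


def load_previous_model(day: date) -> str:
    # Hypothetical stand-in for loading the last trained model.
    return "previously trained model"


# Each downstream step is keyed by the optional output it consumes.
DOWNSTREAM = {"retrain": train_model, "reuse": load_previous_model}


def run_pipeline(today: date) -> tuple:
    """Run the branch step, then only the downstream steps whose input was emitted."""
    outputs = choose_branch(today)
    ran = []
    model = None
    for name, step in DOWNSTREAM.items():
        if name in outputs:  # steps with a missing input are skipped
            model = step(outputs[name])
            ran.append(name)
    return ran, f"predictions for {today.isoformat()} using {model}"


if __name__ == "__main__":
    print(run_pipeline(date(2024, 1, 1)))  # a Monday
    print(run_pipeline(date(2024, 1, 2)))  # a Tuesday
```

In Dagster itself, per the comment, the same idea is expressed by marking an Op's outputs as not required, so the skipping of downstream steps is handled by the orchestrator rather than by a hand-written loop like the one here.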
@thedataguygeorge 1 year ago
Wow thank you so much for that breakdown, really really appreciate it! Am planning on a revised version of this video to give Dagster more credibility after learning all these things, made the video when I was still relatively new to Dagster
@baja 1 year ago
@thedataguygeorge All good, and looking forward to the new video! It did take me a lot of time using Dagster to learn a lot of these things.
@nixbruh 1 year ago
@baja One thing I have to say that sucks: let's say you want to have two ops in a job and have them run in parallel. Dagster won't let you do that if your IO managers are in memory. It will force one to wait for the other. For me that defeats the whole purpose, honestly. Maybe I'm clueless?
@baja 1 year ago
@nixbruh This shouldn't depend on the IO manager but on the executor you're using. Are you using a multiprocess executor or an in-process one? I don't have an issue using multiprocess locally, and I typically use a k8s_executor when deploying. Locally I typically use the fs_io_manager instead of the in-memory one, but again, that shouldn't matter.
@joshuasmith2814 1 year ago
Great content... (horrid audio, was your landlady vacuuming?)
@thedataguygeorge 1 year ago
Thanks Josh! And apologies, I had facade construction going on outside my window from 8 to 6 for the past couple of months that was really screwing me up. It's all done now though, so hopefully it's better in recent videos!
@christophergutknecht8683 5 months ago
Love the content! Audio could be better, squeaky chair and booming background noise are a little distracting
@thedataguygeorge 5 months ago
Thanks for the tips, hope my more recent videos are more up to snuff!
@StefanoMessina-ux2mj 7 months ago
I'm pretty sure Dagster is open source
@thedataguygeorge 7 months ago
It technically is but 90% of the dev work is from the on-staff Dagster team
@ricardomalla6533 11 months ago
genius.
@thedataguygeorge 11 months ago
Thanks man!