Airflow in Practice Stop Worrying Start Loving DAGs - Sarah Schattschneider

58,309 views

SF Python

This talk was presented at PyBay2019, the 4th annual Bay Area Regional Python conference. See pybay.com for more details about PyBay.
Description
Heard of Apache Airflow? Do you work with Airflow or want to? Ever wonder how to better test Airflow? Have you considered all the data-workflow use cases for Airflow? Come for a refresher on the key concepts, and then we will dive into Airflow's value-add, common use cases, and best practices. Use cases include Extract Transform Load (ETL) jobs, database snapshots, and ML feature extraction.
Abstract
Background
* What is Airflow? Explain cron and how it compares to Airflow
* High-level explanation of Airflow's key concepts:
  * Directed Acyclic Graph (DAG): nodes are tasks and edges are the dependency structure
  * Third-party integrations (Slack, Google Cloud Platform, AWS, etc.)
  * Airflow hooks and operators
* What is Airflow?
  * Programmatically author workflows
  * Stateful scheduling
  * Rich CLI and UI that make development easy
  * Logging, monitoring, and alerting
  * Modularity lends itself well to testability
  * Solves common problems with batch processing
  * Open-sourced by Airbnb in 2015
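Because a DAG's nodes are tasks and its edges are dependencies, any valid run order is a topological order of the graph. The sketch below uses Python's standard-library graphlib (3.9+), not Airflow's API, to show that idea on a hypothetical four-task ETL pipeline. It also shows why the graph must be acyclic: with a cycle, no valid order exists. In Airflow itself the same edges would be declared between operators, for example with `extract >> transform`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical ETL pipeline: each key is a task (a node), and each value is the
# set of upstream tasks it depends on (its incoming edges).
etl_dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_order(dag):
    """Return one valid execution order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(etl_dag)
print(order)  # ['extract', 'transform', 'load', 'notify']

# Two tasks that depend on each other form a cycle, so no run order exists.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    run_order(cyclic)
except CycleError:
    print("cycle detected: not a DAG")
```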
Evaluating Airflow
* What value does Airflow add?
  * Retries tasks elegantly, which handles transient network errors
  * Alerts on failure (email or Slack)
  * Can re-run specific tasks in a large DAG
  * Supports distributed execution
  * Great OSS community and momentum
  * Can be hosted on AWS, Azure, or GCP
  * Managed options: GCP Cloud Composer (hosted Airflow); AWS Glue and Azure Data Factory (managed alternatives for similar ETL workflows)
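In Airflow, retries and failure alerts are configured per task, for example through the `retries`, `retry_delay`, and `on_failure_callback` arguments that operators accept. The plain-Python sketch below is not Airflow's implementation; it only illustrates the behaviour. A task that fails with a transient error is re-run up to a fixed number of extra times, and an alert hook fires only once those retries are exhausted. `run_with_retries` and the task functions are hypothetical.

```python
import time


def run_with_retries(task, retries=3, retry_delay=0.0, on_failure=None):
    """Call task(); on an exception, retry up to `retries` more times.

    If every attempt fails, call on_failure(exc) (standing in for an email or
    Slack alert) and re-raise the last exception.
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception as exc:
            attempt += 1
            if attempt > retries:
                if on_failure is not None:
                    on_failure(exc)
                raise
            time.sleep(retry_delay)  # wait before the next attempt


# A hypothetical task that hits two transient network errors, then succeeds.
calls = {"n": 0}


def flaky_task():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"


result = run_with_retries(flaky_task, retries=3)
print(result, calls["n"])  # ok 3

# A task that never recovers: the alert fires once, after the final retry.
alerts = []


def broken_task():
    raise ConnectionError("service still down")


try:
    run_with_retries(broken_task, retries=2, on_failure=lambda exc: alerts.append(str(exc)))
except ConnectionError:
    pass
print(alerts)  # ['service still down']
```

As in Airflow, `retries` counts additional attempts, so `retries=2` means up to three runs in total.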
Does Airflow Have an Ugly Side? How to Overcome Challenges?
Upgrades can be more challenging when you have custom hooks and operators
Deciding between environment variables, Airflow Variables, and XComs
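One way to tell the three apart is by scope. Environment variables configure the process running Airflow, Airflow Variables are global key-value settings shared by every DAG, and XComs are small messages passed between tasks within a single DAG run (a task's return value is pushed under the key `return_value` by default). The toy in-memory model below illustrates only that scoping. `toy_xcom_push` and `toy_xcom_pull` are hypothetical stand-ins, not Airflow's methods, and the variable names are made up.

```python
import os

# 1. Environment variable: process-level config set by the deployment, not by a DAG.
#    WAREHOUSE_BUCKET is a hypothetical name.
bucket = os.environ.get("WAREHOUSE_BUCKET", "example-bucket")

# 2. Variable: a global key-value setting that every DAG and run can read.
#    A plain dict stands in for Airflow's metadata database here.
variables = {"report_recipients": "data-team@example.com"}

# 3. XCom: a small value one task hands to another, scoped to a single DAG run.
xcom_store = {}


def toy_xcom_push(dag_id, run_id, task_id, value, key="return_value"):
    xcom_store[(dag_id, run_id, task_id, key)] = value


def toy_xcom_pull(dag_id, run_id, task_id, key="return_value"):
    return xcom_store.get((dag_id, run_id, task_id, key))


# The extract task of one run shares a row count with its downstream tasks.
toy_xcom_push("daily_etl", "2019-08-19", "extract", 1250)
print(toy_xcom_pull("daily_etl", "2019-08-19", "extract"))  # 1250
# A different run of the same DAG does not see that value.
print(toy_xcom_pull("daily_etl", "2019-08-20", "extract"))  # None
```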
Common Use Cases
* Extract Transform Load (ETL) jobs
  * Airflow makes moving and transforming data very easy
  * Can create custom hooks for third-party APIs
* Efficiently snapshot databases
* Create test environments for QA
* ML feature extraction
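In Airflow terms, a hook wraps the connection to an external system so that tasks call methods instead of handling credentials and clients directly; Airflow ships hooks for many services, and its `BaseHook` class can be subclassed for a third-party API. The stdlib sketch below mimics that split with SQLite so it runs anywhere. It is not Airflow code: `OrdersHook` is a hypothetical hook-like class, and the `orders` tables and the cents-to-dollars transform are made-up examples.

```python
import sqlite3


class OrdersHook:
    """Hypothetical hook-like wrapper: owns the connection, exposes simple methods."""

    def __init__(self, conn):
        self.conn = conn

    def get_records(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()


def extract(hook):
    return hook.get_records("SELECT id, amount_cents FROM orders ORDER BY id")


def transform(rows):
    # Convert integer cents to dollars.
    return [(order_id, cents / 100) for order_id, cents in rows]


def load(conn, rows):
    conn.executemany("INSERT INTO orders_clean (id, amount_dollars) VALUES (?, ?)", rows)
    conn.commit()


# Throwaway in-memory source and target databases.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, 300)])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders_clean (id INTEGER PRIMARY KEY, amount_dollars REAL)")

load(target, transform(extract(OrdersHook(source))))
print(target.execute("SELECT * FROM orders_clean ORDER BY id").fetchall())
# [(1, 12.5), (2, 3.0)]
```

Keeping extract, transform, and load as separate functions mirrors how they would become separate tasks, so each step can be retried or re-run on its own.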
Best Practices
* Testing
  * Unit tests for lib functions
  * Acceptance tests that run list_dags
* Doc MD for the DAG
  * Include points of contact
  * What remediation/escalation steps should the on-call person take when this DAG fails?
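Both kinds of test can be sketched in plain Python. Keeping logic in library functions lets you unit-test it without a scheduler; an acceptance test then checks that every DAG file at least imports cleanly, which is what running `list_dags` (`airflow dags list` in Airflow 2) exercises. In a real project the import check is usually written against Airflow's `DagBag` and its `import_errors` attribute; the stdlib version below only approximates it, and `normalize_email` is a hypothetical library function. For the documentation practice, the Markdown with points of contact and on-call steps is assigned to the DAG's `doc_md` attribute.

```python
import importlib.util
import pathlib
import tempfile


# --- Unit test: business logic lives in a plain library function.
def normalize_email(raw):  # hypothetical lib function used by a DAG
    return raw.strip().lower()


def test_normalize_email():
    assert normalize_email("  Data@Example.COM ") == "data@example.com"


# --- Acceptance test: every DAG file in the folder must import without errors.
def find_import_errors(dag_folder):
    """Import each *.py file in dag_folder; return {filename: error} for failures."""
    errors = {}
    for path in sorted(pathlib.Path(dag_folder).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception as exc:  # SyntaxError, ImportError, NameError, etc.
            errors[path.name] = repr(exc)
    return errors


# Demo with a throwaway folder: one valid file and one with a syntax error.
with tempfile.TemporaryDirectory() as folder:
    pathlib.Path(folder, "good_dag.py").write_text("SCHEDULE = '@daily'\n")
    pathlib.Path(folder, "broken_dag.py").write_text("def oops(:\n")
    errors = find_import_errors(folder)

test_normalize_email()
print(sorted(errors))  # ['broken_dag.py']
```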
Exciting New/New(ish) Features
* Lineage
* Role-Based Access Control
* Airflow 2.0 improvements
Original slides: t.ly/xYJk9
About the speaker
Software Engineer at Blue Apron on the Data Engineering team. Work daily using Python on our data pipeline. Excited by how Python is transforming Data Engineering.
Sponsor Acknowledgement
This and other PyBay2019 videos are made possible with the help of our media partner AlphaVoice (www.alphavoice...!
#pybay #pybay2019 #python #python3 #gdb

Comments: 33
@Shambo271 (3 years ago)
Excellent presentation. It's clear that you use Airflow, know Airflow and your pacing (during the presentation) is perfect. Thank you for the help.
@ajeetis (4 years ago)
Great talk! Very well explained for all airflow beginners.
@DodaGarcia (1 year ago)
What a fantastic talk! I've been using Airflow for a few days now and this cleared up a lot of the lingering questions I still had.
@ssb26 (3 years ago)
Very useful session, serves as a good introduction to everyone who would like to learn about Apache Airflow.
@stanchen8634 (4 years ago)
16:48 Isn't it the 3rd of every month at 10:05 in the morning?
@AnandKumar-wg7jo (4 years ago)
same thing I noticed !
@airamfuentes5216 (4 years ago)
Thanks , I also noticed.
@lakshminarayanank4116 (2 years ago)
I also Noticed....!
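For readers checking the schedule discussed in this thread: a standard cron expression has five fields (minute, hour, day of month, month, day of week), so an expression such as `5 10 3 * *` means 10:05 on the 3rd of every month. The slide isn't visible from this page, so that exact expression is an assumption based on the comments. The minimal matcher below is a hypothetical helper that supports only `*` and single numbers per field (no ranges, lists, or steps) and, unlike real cron, ANDs the two day fields instead of ORing them when both are restricted.

```python
from datetime import datetime

FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe(expr):
    """Label each of the five cron fields."""
    return dict(zip(FIELDS, expr.split()))


def matches(expr, when):
    """True if `expr` fires at `when`; supports only '*' and plain integers."""
    parts = expr.split()
    if len(parts) != 5:
        raise ValueError("expected five cron fields")
    # cron numbers Sunday as 0; isoweekday() numbers Sunday as 7, so take mod 7.
    actual = (when.minute, when.hour, when.day, when.month, when.isoweekday() % 7)
    return all(p == "*" or int(p) == v for p, v in zip(parts, actual))


expr = "5 10 3 * *"
print(describe(expr))
# {'minute': '5', 'hour': '10', 'day_of_month': '3', 'month': '*', 'day_of_week': '*'}
print(matches(expr, datetime(2019, 9, 3, 10, 5)))  # True
print(matches(expr, datetime(2019, 10, 5, 10, 3)))  # False: 10:03 on the 5th
```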
@yannickpezeu3419 (2 years ago)
thanks for this perfect presentation
@imrankhakwani (3 years ago)
Very useful presentation. Thank you.
@prateekshrivastava7733 (3 years ago)
nice explanation.. Thanks.
@okechukwuudokoro8157 (4 years ago)
Awesome presentation. I got a better understanding of Airflow. The concepts were explained in simple terms
@me_buckbeak7389 (4 years ago)
Hi , do you know how to call a yml file from another Linux server to Airflow server through DAG ? The yml consists Linux service restart.
@rajanjoseph4877 (4 years ago)
Very useful 👍
@kannappansirchabesan8577 (4 years ago)
Very good talk. Thank you
@맛있는치킨-s4k (4 years ago)
It was a nice talk! Thank you for sharing your excellent story!
@upendramb7456 (3 years ago)
Can Airflow be used to run tasks on hosts outside of Kubernetes, as some apps run on dedicated hosts?
@ShashankSingh (3 years ago)
I am guessing, if such an operator doesnt exist already you can write it for yourself. but IIRC aws batch operator, python virtualenv operator exists
@ashrafalkibsi (4 years ago)
now I love Airflow even more
@hannimedable (4 years ago)
Thanks for the presentation! One part about ETL is still unclear to me; maybe someone can answer me here. If we work with a really big DB, like 10 TB, should the DAG select all the data, download it to the Airflow instance, and then load it into BigQuery or something else? Or should the DAG split the job and download the data in batches?
@brothermalcolm (1 year ago)
directed acyclic dag
@rahulthomas5 (4 years ago)
Hello, i would like to know whether there is a way airflow can use autosys job as a dependent job.
@jakerutherford5716 (3 years ago)
Didn't know Jennifer Lawrence was so knowledgeable about Airflow
@dsinghr (4 years ago)
good that she finds time to write DAGs after running around jungles in Hunger games
@konutek7716 (3 years ago)
Cool
@ThangTran-jv7mm (4 years ago)
I didn't know Jennifer Lawrence was also a talented developer as well. =D
@beixu9998 (4 years ago)
hah!!!, I thought the same!
@_Machiavel_ (4 years ago)
lot off blala
@user-pp3xq1ot8l (3 years ago)
You are looking so beautiful... flawless, very attractive n natural !!!