Data Engineering Principles - Build frameworks not pipelines - Gatis Seja

153,843 views

PyData

PyData London Meetup #54
Tuesday, March 5, 2019
Data pipelines are necessary for the flow of information from its sources to its consumers, typically data scientists, analysts and software developers. Managing data flow from many sources is a complex task, and the maintenance cost limits how far you can scale a large, reliable data warehouse. This presentation proposes a number of applied data engineering principles that can be used to build robust, easily manageable data pipelines and data products. Examples will be shown using Python on AWS.
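As a rough illustration of the "frameworks, not pipelines" idea (one reusable runner that applies the same validation to every source, instead of a hand-built pipeline per source), here is a minimal Python sketch. It is not the speaker's code and does not use any AWS service; the `Source` class, `is_valid`, `run_framework` and the sample `prices` source are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass
class Source:
    """Hypothetical description of one data source plugged into the framework."""
    name: str
    extract: Callable[[], List[Record]]  # how to fetch raw records from this source
    required_fields: Tuple[str, ...]     # fields every record must carry


def is_valid(record: Record, required: Tuple[str, ...]) -> bool:
    # Shared rule: every required field is present and not None.
    return all(record.get(field) is not None for field in required)


def run_framework(sources: List[Source]) -> Dict[str, Dict[str, List[Record]]]:
    """Apply the same extract -> validate -> route steps to every registered source."""
    results: Dict[str, Dict[str, List[Record]]] = {}
    for source in sources:
        valid: List[Record] = []
        failed: List[Record] = []
        for record in source.extract():
            (valid if is_valid(record, source.required_fields) else failed).append(record)
        results[source.name] = {"valid": valid, "failed": failed}
    return results


# Adding a source means writing a small config entry, not a new pipeline.
prices = Source(
    name="prices",
    extract=lambda: [{"id": 1, "price": 9.5}, {"id": 2, "price": None}],
    required_fields=("id", "price"),
)
print(run_framework([prices]))
# {'prices': {'valid': [{'id': 1, 'price': 9.5}], 'failed': [{'id': 2, 'price': None}]}}
```

The design choice being illustrated: validation and routing of bad records live once in the framework, so each new source only supplies how to extract and what to check.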
Sponsored & Hosted by Man AHL
****
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/numfocus/KZbinVi...

Comments: 17
@efeorikpete8774 2 years ago
Fast-forward three years: AIRFLOW now has robust documentation for authoring, scheduling and monitoring your data pipelines.
@MrKane101111 2 years ago
Great presentation, really nice analogy and very clear.
@severtone263 1 year ago
This was very helpful. That analogy is simply the best.
@AshokTak 2 years ago
00:00 Welcome
00:34 Merchant John Story
08:17 Need for standardization
22:26 Q&A
Will update it later.
@dmytrooliinyk3083 22 days ago
That's a great talk!
@boudehoucherahma8083 2 years ago
Very interesting presentation. Thanks 🙏
@TheSolbiatii 2 years ago
00:00 Welcome
00:34 Merchant John Story
08:17 Need for standardization
10:25 Traditional Pipeline vs Ideal Framework with Validations
18:02 Principles
22:26 Q&A
@jamesattwood3454 2 years ago
Great talk!
@alexgartner8187 2 years ago
Awesome
@augugninfin1034 1 year ago
great...
@horaceweatherby2910 1 year ago
To be honest, I didn't find this very helpful. I'm a project manager tasked with redesigning the whole data environment in a small enterprise; I'm technically minded but never formally studied the subject. The presenter didn't seem to make the case for the presentation's title, "Build frameworks, not pipelines": I didn't see any part where he argued against pipelines. The first 10 minutes, about the many units of measurement used across Britain as an analogy for the different technologies and systems in data, didn't reveal any insights and can safely be skipped IMO. After that, the diagram of a framework running from the data source all the way to a data warehouse reads like an explanation for beginners, but without the clarity such an explanation should have. Overall, it seemed like a poorly organized way to present a basic idea.
Still, some individual points I took away from this presentation:
- Keep the HTML files from web scraping, not just the extracted fields, so you can access the data at any time without going back to the original source.
- Maintain a layer for failed data extractions. This has been my idea for a long time, but it's good to see it articulated by an actual data engineer.
- Maintain a staging data warehouse layer before the production data warehouse.
Instead, I found this recommended video better, even though it was more complex: kzbin.info/www/bejne/eWekk6lubKlomrc
It goes more in depth into one company's challenges in designing a new data pipeline and offers insights that generalize to anyone setting up or upgrading such a pipeline.
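The three takeaways in this comment (keep the raw HTML, keep a layer for failed extractions, stage data before production) can be sketched in Python as below. This is a hedged illustration, not the speaker's code: local folders stand in for the storage layers (a real deployment might use object-storage prefixes instead, which is an assumption), and the `make_layers`, `store_raw`, `extract` and `promote` helpers, the `price` field and its regex are all hypothetical.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical layer layout on local disk; a real deployment might use object-storage prefixes.
LAYERS = ("raw", "failed", "staging", "production")

# Hypothetical field to extract: a price inside <span class="price">...</span>.
PRICE_RE = re.compile(r'<span class="price">([0-9]+(?:\.[0-9]+)?)</span>')


def make_layers(root: Path) -> Dict[str, Path]:
    """Create one folder per layer under root and return their paths."""
    paths = {name: root / name for name in LAYERS}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


def store_raw(layers: Dict[str, Path], page_id: str, html: str) -> Path:
    """Keep the full scraped HTML so fields can be re-extracted later without re-scraping."""
    path = layers["raw"] / f"{page_id}.html"
    path.write_text(html, encoding="utf-8")
    return path


def extract(layers: Dict[str, Path], raw_path: Path) -> bool:
    """Parse one raw page and route the result to staging, or to the failed layer."""
    match = PRICE_RE.search(raw_path.read_text(encoding="utf-8"))
    if match is None:
        # Record the failure instead of silently dropping the page.
        error = {"page": raw_path.name, "error": "price not found"}
        (layers["failed"] / f"{raw_path.stem}.json").write_text(json.dumps(error), encoding="utf-8")
        return False
    record = {"page": raw_path.stem, "price": float(match.group(1))}
    (layers["staging"] / f"{raw_path.stem}.json").write_text(json.dumps(record), encoding="utf-8")
    return True


def promote(layers: Dict[str, Path]) -> int:
    """Copy staged records that pass a final check into production; return how many were promoted."""
    promoted = 0
    for staged in sorted(layers["staging"].glob("*.json")):
        record = json.loads(staged.read_text(encoding="utf-8"))
        if record["price"] >= 0:  # example final check before production
            (layers["production"] / staged.name).write_text(json.dumps(record), encoding="utf-8")
            promoted += 1
    return promoted


with tempfile.TemporaryDirectory() as tmp:
    layers = make_layers(Path(tmp))
    ok = extract(layers, store_raw(layers, "p1", '<span class="price">4.20</span>'))
    bad = extract(layers, store_raw(layers, "p2", "<p>no price here</p>"))
    print(ok, bad, promote(layers))  # True False 1
```

Because the raw HTML is kept, a fix to the parser can be replayed over the raw layer, and the failed layer shows exactly which pages need that fix.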
@ooker777 10 months ago
Thanks for taking the time and effort to write a detailed review.
@mayurarun 1 year ago
Nice
@firefoxmetzger9063 2 years ago
Somehow this makes me think of XKCD's Standards comic.
@julianatlas5172 2 years ago
I like the xkcd about date formats. There is only one good date format, ISO 8601: YYYY-MM-DD, e.g. 2021-12-15.
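The ISO 8601 point is easy to demonstrate with Python's standard `datetime` module: normalise dates written in a known local format to `YYYY-MM-DD`, and note that zero-padded ISO strings sort chronologically as plain text. The `to_iso` helper and the input formats are assumptions chosen for the example.

```python
from datetime import date, datetime


def to_iso(text: str, fmt: str) -> str:
    """Parse a date written in a known local format and return it as ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, fmt).date().isoformat()


print(to_iso("15/12/2021", "%d/%m/%Y"))  # 2021-12-15  (day/month/year input)
print(to_iso("12/15/2021", "%m/%d/%Y"))  # 2021-12-15  (month/day/year input)
print(date.fromisoformat("2019-03-05"))  # 2019-03-05

# Zero-padded YYYY-MM-DD strings sort chronologically as plain text.
print(sorted(["2021-12-15", "2019-03-05", "2020-01-31"]))
# ['2019-03-05', '2020-01-31', '2021-12-15']
```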
@vansf3433 1 year ago
It's too simple: anyone can learn the process of sorting out, transforming and transmitting data without needing a good knowledge of CS.
@RedShipsofSpainAgain 11 months ago
For the first 10 minutes he talks about the different measuring units in Britain as a bad analogy for the importance of standards in modern data engineering: it has zero relevance to data engineering platforms. Really poor analogy. Just skip to 10:20.