Automated Testing For Protecting Data Pipelines from Undocumented Assumptions


Databricks


Untested, undocumented assumptions about data in data pipelines create risk, waste time, and erode trust in data products. Automated testing has been one of the biggest productivity boosters in modern software development and is essential for managing complex codebases, yet data science and data engineering have largely missed out on it. This talk introduces Great Expectations, an open-source Python framework for bringing data pipelines and products under test. Like assertions in traditional Python unit tests, Expectations provide a flexible, declarative language for describing expected behavior. Unlike traditional unit tests, Great Expectations applies Expectations to data instead of code. We strongly believe that most of the pain caused by accumulating pipeline debt is avoidable.
We built Great Expectations to make it very, very simple to:
-Set up your testing framework early
-Capture those early learnings while they’re still fresh
-Systematically validate new data against them
It’s the best tool we know of for managing the complexity that inevitably grows within data pipelines. We hope it helps you as much as it’s helped us.
Main takeaways:
This talk will teach you how to use Great Expectations to get more done with data, faster:
-Save time during data cleaning and munging.
-Accelerate ETL and data normalization.
-Streamline analyst-to-engineer handoffs.
-Monitor data quality in production data pipelines and data products.
-Simplify debugging for data pipelines if (when) they break.
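The idea of applying Expectations to data rather than to code, and of capturing them once and then validating every new batch against them, can be sketched in plain Python. This is only an illustration under stated assumptions: the helper functions, the `EXPECTATIONS` registry, the `validate` function, and the sample rows are all hypothetical and are not the Great Expectations API. Only the naming style of the two expectations is borrowed from the library.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT the
# Great Expectations API. They mimic the idea of declarative checks on data.
import json


def expect_column_values_to_not_be_null(rows, column):
    """Succeed when no row has a missing (None) value in `column`."""
    bad = [r for r in rows if r.get(column) is None]
    return {"success": not bad, "unexpected_count": len(bad)}


def expect_column_values_to_be_between(rows, column, min_value, max_value):
    """Succeed when every non-null value lies in [min_value, max_value]."""
    bad = [
        r for r in rows
        if r.get(column) is not None
        and not (min_value <= r[column] <= max_value)
    ]
    return {"success": not bad, "unexpected_count": len(bad)}


# Registry mapping expectation names to the functions that check them.
EXPECTATIONS = {
    "expect_column_values_to_not_be_null": expect_column_values_to_not_be_null,
    "expect_column_values_to_be_between": expect_column_values_to_be_between,
}


def validate(rows, suite):
    """Apply each expectation in `suite` to `rows` and collect the results."""
    results = []
    for item in suite:
        check = EXPECTATIONS[item["expectation"]]
        outcome = check(rows, **item["kwargs"])
        results.append({**item, **outcome})
    return {"success": all(r["success"] for r in results), "results": results}


# 1. Capture early learnings as a suite. It is plain data, so it can be
#    stored as JSON next to the pipeline code.
suite = [
    {"expectation": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "user_id"}},
    {"expectation": "expect_column_values_to_be_between",
     "kwargs": {"column": "age", "min_value": 0, "max_value": 120}},
]
saved_suite = json.dumps(suite)

# 2. Later, validate each new batch of data against the saved suite.
good_batch = [{"user_id": 1, "age": 34}, {"user_id": 2, "age": 51}]
bad_batch = [{"user_id": None, "age": 34}, {"user_id": 4, "age": 999}]

good_report = validate(good_batch, json.loads(saved_suite))
bad_report = validate(bad_batch, json.loads(saved_suite))

print(good_report["success"])
print(bad_report["success"])
```

Keeping the suite as data rather than as test code is the design choice that matters here: the same expectations can be reviewed by analysts, versioned, and re-run unchanged against every new batch.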

Comments: 1
@sbanerjea, 3 years ago
Loved the great intro. I wish it were a little longer and had some demos. I look forward to using Great Expectations in my next data pipeline.