Tune in to DoorDash's journey from a flaky ETL system with 24-hour data delays to a standardized CDC streaming pattern across more than 150 databases, producing near-real-time data in a scalable, configurable, and reliable manner.
Along the way, learn how we use Delta Lake to build a self-serve, read-optimized data lake with data latencies of 15, while reducing operational overhead. Also learn how certain tradeoffs, such as conceding to a non-real-time system, allow for multiple optimizations while still permitting OLTP query use cases, and the benefits this provides.
Talk by: Ivan Peng and Phani Nalluri
Here’s more to explore:
Big Book of Data Engineering: 2nd Edition: dbricks.co/3Xp...
The Data Team's Guide to the Databricks Lakehouse Platform: dbricks.co/46n...
Connect with us:
Website: databricks.com
Twitter: / databricks
LinkedIn: / databricks
Instagram: / databricksinc
Facebook: / databricksinc