Making Apache Spark™ Better with Delta Lake

174,603 views

Databricks

Join Michael Armbrust, head of the Delta Lake engineering team, to learn how his team built upon Apache Spark to bring ACID transactions and other data reliability technologies from the data warehouse world to cloud data lakes.
Apache Spark is the dominant processing framework for big data. Delta Lake adds reliability to Spark so your analytics and machine learning initiatives have ready access to quality, reliable data. This webinar covers the use of Delta Lake to enhance data reliability for Spark environments.
Topic areas include:
- The role of Apache Spark in big data processing
- Use of data lakes as an important part of the data architecture
- Data lake reliability challenges
- How Delta Lake helps provide reliable data for Spark processing
- Specific improvements that Delta Lake adds
- The ease of adopting Delta Lake for powering your data lake
See full Getting Started with Delta Lake tutorial series here:
databricks.com/getting-starte...
Get the Delta Lake: Up & Running by O’Reilly ebook preview to learn the basics of Delta Lake, the open storage format at the heart of the lakehouse architecture. Download the ebook: dbricks.co/3IIcVCg

Comments: 16
@sonagy23 2 years ago
28:32 How does Delta Lake work?
28:50 Delta On Disk
29:59 Table = result of a set of actions
31:31 Implementing Atomicity
32:48 Ensuring Serializability
33:33 Solving Conflicts Optimistically
35:08 Handling Massive Metadata
36:32 Roadmap
38:20 QnA
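The chapter titles "Table = result of a set of actions", "Implementing Atomicity", and "Solving Conflicts Optimistically" can be illustrated with a toy model. The following is a minimal sketch in plain Python, not Delta Lake's actual code or API: the function names (`commit`, `snapshot`, `commit_with_retry`), the `op`/`path` action fields, and the numbered-JSON-file layout are all assumptions made for illustration. It assumes each commit is one file that only a single writer can create, and that the table's current state is obtained by replaying all commits in order.

```python
import json
import os
import tempfile


def commit(log_dir, version, actions):
    """Try to write commit `version`; return False if it already exists."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_EXCL makes creation fail if the file exists, so at most one
        # writer can claim a given version number.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        json.dump(actions, f)
    return True


def latest_version(log_dir):
    """Return the highest committed version, or -1 for an empty log."""
    versions = [int(name.split(".")[0])
                for name in os.listdir(log_dir) if name.endswith(".json")]
    return max(versions, default=-1)


def snapshot(log_dir):
    """Replay every commit in order to get the set of live data files."""
    files = set()
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if action["op"] == "add":
                    files.add(action["path"])
                elif action["op"] == "remove":
                    files.discard(action["path"])
    return files


def commit_with_retry(log_dir, actions, max_attempts=5):
    """Optimistic loop: pick the next version, retry if another writer won."""
    for _ in range(max_attempts):
        version = latest_version(log_dir) + 1
        if commit(log_dir, version, actions):
            return version
    raise RuntimeError("too many conflicting writers")


# Usage: two commits, then a writer that loses the race for version 1.
log_dir = tempfile.mkdtemp()
v0 = commit_with_retry(log_dir, [{"op": "add", "path": "part-0.parquet"},
                                 {"op": "add", "path": "part-1.parquet"}])
v1 = commit_with_retry(log_dir, [{"op": "remove", "path": "part-0.parquet"},
                                 {"op": "add", "path": "part-2.parquet"}])
lost_race = commit(log_dir, v1, [{"op": "add", "path": "x.parquet"}])
current_files = snapshot(log_dir)
```

This toy simply retries on any collision; a real system would also have to check whether the winning commit changed anything the losing writer had read, and would need the commit file's contents to become visible atomically.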
@kbkonatham1701 1 year ago
Hi Kim, thanks for the support. Where are you from? I am from India.
@rakshithvenkatesh2773 3 years ago
I see this whole "Hierarchical Data Pipeline" strategy being talked about quite a bit these days. We established this as part of a ready-made solution we built for a manufacturing use case using Confluent Kafka + KSQL. But I believe the data lake will continue to exist as a depot for long-term retention of data, where AI/DA platforms leverage data from these data lakes for batch processing. I see this story from Databricks as a convergence of the data warehouse towards data lakes!
@meryplays8952 3 years ago
The architecture comes with a nice VLDB 2020 paper (which the presenter did not mention).
@RossittoS 3 years ago
Excellent features!!
@hidemisuzuki965 2 years ago
Where can I download the slides? Thanks!
@Sangeethsasidharanak 3 years ago
27:28 on automating data quality: isn't it the same as the quality checks we do with custom code before we save? Will there be any additional benefits?
@gustavemuhoza4212 3 years ago
It's probably the same, but I'm not sure how you could do that on a data lake consistently. As described here, Delta appears to make it easier and makes it possible to do it as if you were working on a relational database.
@moebakry3203 3 years ago
What is the best way to load data from SQL Server to Delta Lake every 5 seconds?
@NicholasGabriel04 10 months ago
Debezium
@srh80 11 months ago
Wait, people still use comcast and watch TV?
@hanssylvest8390 3 years ago
Please give all employees a better microphone for audio recording.
@jacekb4057 11 months ago
Or use some AI audio cleaner :D
@rahulpathak3161 3 years ago
Thank you, and can you please share the PPT?
@user-ni4cp7lj6s 3 years ago
www.slideshare.net/databricks/making-apache-spark-better-with-delta-lake
@hanmuster 3 years ago
@user-ni4cp7lj6s Many thanks!