Simplify and Scale Data Engineering Pipelines with Delta Lake

27,745 views

Databricks

A day ago

Comments: 12
@KoushikPaulliveandletlive 4 years ago
Wonderful demonstration and a very handy notebook. These are my assumptions:
1. Delta Lake keeps multiple versions of the data (like HBase).
2. Delta Lake takes care of atomicity for the user, showing only the latest file unless specified otherwise.
3. Delta Lake checks the schema before appending to prevent corruption of the table. This makes the developer's job easier; similar results can be achieved manually, for example by specifying the schema explicitly instead of inferring it.
4. An update always overwrites the entire table or the entire partition (DataFrames are immutable).
Questions:
1. If it keeps multiple versions, is there a default limit on the number of versions?
2. Since it keeps multiple versions, is it only meant for smaller tables? For tables in the terabytes, won't it be a waste of space?
3. In a relational DB the data is tightly coupled with the metadata/schema, so we can get the data only from the table, not from the data files. Hive and Spark are different: external tables are allowed, and the table can be recreated without access to the metadata. How is this handled in Delta Lake? Since there are multiple snapshots/versions of the same table, will someone be able to access it without the log/metadata? In Hive/Spark, multiple tables can be created on the same data with different tools (Hive, Presto, Spark). Can other tools share the same data with Delta Lake?
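Editor's note: the idea in assumption 3 above (validate the schema before an append so a bad batch cannot corrupt the table) can be sketched in plain Python. This is a toy illustration only, not Delta Lake code; `ToyTable` and `SchemaMismatchError` are hypothetical names invented for this example.

```python
# Toy illustration only: not Delta Lake code. All names here are hypothetical.


class SchemaMismatchError(Exception):
    """Raised when appended rows do not match the table schema."""


class ToyTable:
    def __init__(self, schema):
        # schema maps column name -> Python type, e.g. {"id": int, "state": str}
        self.schema = dict(schema)
        self.rows = []

    def append(self, new_rows):
        # Validate every row before writing anything, so a bad batch
        # cannot leave the table partially written.
        for row in new_rows:
            if set(row) != set(self.schema):
                raise SchemaMismatchError(
                    f"columns {sorted(row)} != {sorted(self.schema)}"
                )
            for col, expected in self.schema.items():
                if not isinstance(row[col], expected):
                    raise SchemaMismatchError(
                        f"column {col!r} expects {expected.__name__}"
                    )
        self.rows.extend(new_rows)


table = ToyTable({"id": int, "state": str})
table.append([{"id": 1, "state": "CA"}])

# The second row carries a string id, so the whole batch is rejected.
try:
    table.append([{"id": 2, "state": "WA"}, {"id": "3", "state": "NY"}])
    rejected = False
except SchemaMismatchError:
    rejected = True
```

Because validation runs before any row is written, the valid first row of the rejected batch is not added either; the table still holds only the original row.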
@vinyasshetty4042 4 years ago
For updates, it does not overwrite the entire table. It looks at the files that contain the data to be updated and creates new copies of only those files. The new files contain the updated records plus the unchanged records from the original file. To eventually clean up the older versions you have to run the vacuum command. Currently only Spark SQL works for querying the Delta location, but I believe they are working on making Presto and Hive work with it.
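Editor's note: the behaviour described in the reply above (rewrite only the files that contain matching records, keep older versions readable, and clean them up with a vacuum step) can be sketched as a small in-memory model. This is a toy under stated assumptions, not Delta Lake's implementation: `ToyDeltaTable`, its file names, and its methods are all hypothetical, and the toy `vacuum` ignores any retention period.

```python
# Toy model only: not Delta Lake's implementation. All names are hypothetical.


class ToyDeltaTable:
    def __init__(self, files):
        # files: file name -> list of (key, value) rows; stands in for data files.
        self.storage = {name: list(rows) for name, rows in files.items()}
        # Each version is the set of file names that make up the table.
        self.versions = [set(self.storage)]
        self._counter = 0

    def update(self, key, new_value):
        current = self.versions[-1]
        next_version = set(current)
        for name in sorted(current):
            rows = self.storage[name]
            if any(k == key for k, _ in rows):
                # Rewrite only this file: updated rows plus untouched rows.
                self._counter += 1
                new_name = f"{name}.v{self._counter}"
                self.storage[new_name] = [
                    (k, new_value if k == key else v) for k, v in rows
                ]
                next_version.remove(name)
                next_version.add(new_name)
        self.versions.append(next_version)

    def read(self, version=-1):
        # Read the rows of the files referenced by the chosen version.
        names = self.versions[version]
        return sorted(row for name in names for row in self.storage[name])

    def vacuum(self):
        # Drop files that the latest version no longer references.
        live = self.versions[-1]
        removed = [n for n in self.storage if n not in live]
        for n in removed:
            del self.storage[n]
        self.versions = [live]
        return sorted(removed)


table = ToyDeltaTable({"part-0": [(1, "a"), (2, "b")], "part-1": [(3, "c")]})
table.update(2, "B")

latest = table.read()             # sees the updated record
previous = table.read(version=0)  # the older version is still readable
removed = table.vacuum()          # only the superseded copy of part-0 goes away
```

In this sketch `part-1` is never rewritten because it holds no matching record, and after `vacuum` the earlier version can no longer be read.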
@jasonabhi 4 years ago
Amazing Hands On Session
@CoopmanGreg A year ago
Suppose the streaming/batch notebook you demonstrated were being run in a workflow and, let's say, 100k rows have streamed in successfully, but then an error occurs and the job fails. As I understand it, the 100k rows and all other changes that occurred in the workflow would be automatically rolled back. Is this correct?
@andyharman1771 4 years ago
Starts at 3:10
@Databricks 4 years ago
Thanks Andy, I trimmed it. The video now starts right at 0:00.
@nit46hin 4 years ago
Great demo, very useful for learning the Delta architecture.
@Databricks 4 years ago
Thanks for the feedback Nithin! Glad you enjoyed it.
@nit46hin 4 years ago
Can you share the steps for importing the notebook from the GitHub link into Databricks Community Edition?
@dennylee4934 4 years ago
Please refer to the "Importing Notebooks" section of github.com/delta-io/delta/tree/master/examples/tutorials/saiseu19#importing-notebooks for step-by-step instructions. HTH!