Simplify and Scale Data Engineering Pipelines with Delta Lake

  27,899 views

Databricks

Comments
@jasonabhi 4 years ago
Amazing Hands On Session
@KoushikPaulliveandletlive 4 years ago
Wonderful demonstration and a very handy notebook. These are my assumptions:
1. Delta Lake keeps multiple versions of the data (like HBase).
2. Delta Lake takes care of atomicity for the user, showing only the latest file unless specified otherwise.
3. Delta Lake checks the schema before appending to prevent corruption of the table. This makes the developer's job easier; similar results can be achieved manually, for example by specifying the schema explicitly instead of inferring it.
4. An update always overwrites the entire table or the entire partition (DataFrames are immutable).
Questions:
1. If it keeps multiple versions, is there a default limit on the number of versions?
2. Since it keeps multiple versions, is it only meant for smaller tables? For tables in the terabytes, won't that be a waste of space?
3. In a relational DB, data is tightly coupled with the metadata/schema, so we can get the data only from the table, not from the data files. Hive and Spark are different: external tables are allowed, and we can recreate the table without access to the metadata. How is this handled in Delta Lake? Since there are multiple snapshots/versions of the same table, will someone be able to access it without the log/metadata? In Hive/Spark, multiple tables can be created on the same data with different tools (Hive, Presto, Spark). Can other tools share the same data with Delta Lake?
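Assumptions 1 to 3 above (multiple versions, readers seeing the latest one by default, and a schema check before every append) can be pictured with a small toy model. This is a sketch in plain Python, not the Delta Lake API: the class `ToyVersionedTable` and its methods are hypothetical names made up for illustration. It also keeps a full snapshot per version for simplicity, whereas Delta Lake records file-level add/remove actions in a transaction log.

```python
# Toy model only: plain Python, NOT the Delta Lake API.
# ToyVersionedTable and its methods are hypothetical, for illustration.


class ToyVersionedTable:
    def __init__(self, schema):
        self.schema = dict(schema)   # column name -> expected Python type
        self.versions = [[]]         # version 0 is the empty table

    def append(self, rows):
        """Validate all rows first, then commit them as one new version."""
        rows = list(rows)
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"schema mismatch, got columns {sorted(row)}")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"column {col!r} expects {typ.__name__}")
        # Nothing was written before validation finished, so a rejected
        # append leaves the table exactly as it was.
        self.versions.append(self.versions[-1] + rows)
        return len(self.versions) - 1

    def read(self, version=None):
        """Return the latest version unless an older one is requested."""
        idx = len(self.versions) - 1 if version is None else version
        return list(self.versions[idx])


table = ToyVersionedTable({"id": int, "name": str})
first_version = table.append([{"id": 1, "name": "a"}])

try:
    table.append([{"id": "not-an-int", "name": "b"}])
    bad_append_rejected = False
except ValueError:
    bad_append_rejected = True

latest_rows = table.read()        # default: newest version
original_rows = table.read(0)     # explicit request for an older version
```

In this sketch the rejected append creates no new version, so readers never see a partially written or mismatched batch.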
@vinyasshetty4042 4 years ago
For updates, it does not overwrite the entire table. It looks at the files that contain the data to be updated and creates new copies of only those files. The new files contain the updated records plus the unchanged records from the original files. To eventually clean up the older versions, you have to run a vacuum command. Currently only Spark SQL works for querying the Delta location, but I believe they are working on making Presto and Hive work with it.
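The update behaviour described in this reply (rewrite only the files that contain matching rows, keep the old copies until a vacuum) can be sketched as a toy copy-on-write model. This is plain Python, not the Delta Lake API: `ToyCopyOnWriteTable`, the file names, and the `.vN` naming scheme are all hypothetical. The toy `vacuum` deletes stale files immediately, whereas the real command applies a retention threshold first.

```python
# Toy model only: plain Python, NOT the Delta Lake API.
# ToyCopyOnWriteTable, the file names and the ".vN" suffix are hypothetical.


class ToyCopyOnWriteTable:
    def __init__(self, files):
        # storage: every file ever written; live: files in the current version
        self.storage = {name: list(rows) for name, rows in files.items()}
        self.live = set(self.storage)
        self._counter = 0

    def update(self, predicate, change):
        """Rewrite only files holding at least one matching row."""
        rewritten = []
        for name in sorted(self.live):       # sorted() copies, so live can change
            rows = self.storage[name]
            if not any(predicate(r) for r in rows):
                continue                      # untouched file stays as it is
            self._counter += 1
            new_name = f"{name}.v{self._counter}"
            # The new file holds updated rows plus the unchanged rows.
            self.storage[new_name] = [
                {**r, **change} if predicate(r) else dict(r) for r in rows
            ]
            self.live.remove(name)            # old file is stale but still stored
            self.live.add(new_name)
            rewritten.append(name)
        return rewritten

    def vacuum(self):
        """Physically delete files no longer part of the current version."""
        stale = sorted(set(self.storage) - self.live)
        for name in stale:
            del self.storage[name]
        return stale


table = ToyCopyOnWriteTable({
    "part-0": [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}],
    "part-1": [{"id": 3, "status": "new"}],
})
rewritten = table.update(lambda r: r["id"] == 2, {"status": "done"})
files_before_vacuum = sorted(table.storage)
removed = table.vacuum()
files_after_vacuum = sorted(table.storage)
```

Only `part-0` is rewritten here because it is the only file containing the matching row. The stale copy stays in storage until `vacuum` runs, which is why older versions remain readable in the meantime and why they cost extra space.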
@andyharman1771 4 years ago
Starts at 3:10
@Databricks 4 years ago
Thanks Andy, I trimmed it. Video starts right at 0:00
@CoopmanGreg 2 years ago
If the streaming/batch notebook you demonstrated were run in a workflow and, let's say, 100K rows had streamed in successfully before an error occurred and the job failed, then as I understand it, the 100K rows and all other changes made in the workflow would be automatically rolled back. Is this correct?
@nit46hin 4 years ago
Great demo... very useful for learning delta architecture
@Databricks 4 years ago
Thanks for the feedback Nithin! Glad you enjoyed it.
@nit46hin 4 years ago
Could you share the steps for importing the notebook from the GitHub link into Databricks Community Edition?
@dennylee4934 4 years ago
Please refer to the "Importing Notebooks" section of github.com/delta-io/delta/tree/master/examples/tutorials/saiseu19#importing-notebooks for step-by-step instructions. HTH!