Tech Talk | Diving into Delta Lake Part 1: Unpacking the Transaction Log

34,041 views

Databricks



Comments: 21
@Databricks 4 years ago
Check out the Online Meetup playlist for video recordings of these tech talks. This one will be available later today! - dbricks.co/youtube-meetups
@Databricks 4 years ago
- Watch Part 2, Enforcing and Evolving the Schema: @
- Watch Part 3, How do DELETE, UPDATE and MERGE work: kzbin.info/www/bejne/bZbanpaap96fqaM
@machinelearninginreallife3558 3 years ago
From what you've said, I understand that data versioning is not a real component of Delta Lake. What we want is only to avoid some mistakes (such as a mistaken delete). Am I right?
@dheerajkumarsolanki5716 4 years ago
There is a default 30-day retention period for the transaction log. So, after 30 days, are the older transaction log files automatically deleted? Similarly, what happens to logically deleted data files after the deletedFileRetentionDuration period: are they deleted automatically, or do we have to delete them manually?
@no_more_free_nicks 4 years ago
No, not everybody knows what a Data Lake is, so thanks for explaining it briefly.
@machinelearninginreallife3558 3 years ago
I'm not sure I understand. Can we keep data for more than 30 days? Is it bad practice? Is it even possible?
@amitjaju3351 1 year ago
Hello Burak, I need a little help. If we perform a delete operation on a Delta table and later need to keep a record of the rows we deleted, can we get that from the transaction log folder, or is there another way to keep track of which records were deleted in case someone asks us in the future? Waiting for your response. Thanks.
@TheIceSpinner 4 years ago
You don't mention how you actually store reads and writes. Do you store them differentially, and if so, what is the unit? So, e.g., when you delete a single row in a dataframe, is only the deleted row stored in the new Parquet file, with some kind of flag, or is the whole dataset duplicated (minus that row)?
@dennyglee 4 years ago
New Parquet files are created, which is what makes time travel possible. You can see which Parquet files were created by looking in the transaction log.
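To make the reply above concrete, here is a minimal sketch of how a reader could replay the transaction log to find the Parquet files that make up a given table version. It assumes the copy-on-write behavior (no deletion vectors), where deleting one row rewrites only the file that held it. The file names and the two in-memory commits are hypothetical stand-ins for `_delta_log/<version>.json` files; only the `add` and `remove` action names and their `path` field come from the Delta log format.

```python
import json

# Hypothetical, simplified commits: each string stands in for one
# _delta_log/<version>.json file (newline-delimited JSON actions).
COMMITS = [
    # version 0: the initial write creates two Parquet files
    '{"add": {"path": "part-0000.parquet", "dataChange": true}}\n'
    '{"add": {"path": "part-0001.parquet", "dataChange": true}}',
    # version 1: deleting one row rewrites part-0001 without that row
    '{"remove": {"path": "part-0001.parquet", "dataChange": true}}\n'
    '{"add": {"path": "part-0002.parquet", "dataChange": true}}',
]


def active_files(commits, version):
    """Replay commits 0..version and return the set of live data files."""
    files = set()
    for commit in commits[: version + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


print(sorted(active_files(COMMITS, 0)))  # files visible at version 0
print(sorted(active_files(COMMITS, 1)))  # files visible at version 1
```

In this sketch the removed file is only dropped from the set of live files; it stays on disk until it is vacuumed, which is why reading version 0 still works after the delete.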
@stuckinamomentt 4 years ago
So Vacuum does not remove log files (due to GDPR), then when are the log files cleaned up to avoid growing indefinitely?
@dennylee4934 4 years ago
That's correct, VACUUM does not remove the logs - only the data (parquet) files. Note that the logs are converted from JSON to Parquet which subsequently improves the performance of reading the log.
@stuckinamomentt 4 years ago
@@dennylee4934 Thanks, and I believe delta.logRetentionDuration controls how to clean up the logs
@harikrishnasiliveri1364 4 years ago
@@dennylee4934 In that case, once we repartition, we can't achieve time travel (since the logs no longer point to the data files)? Is that correct?
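The two retention settings mentioned in this thread can be set as table properties. The sketch below only builds the SQL statement, so it runs without Spark; the table name `events` and the helper `retention_sql` are hypothetical, while the property names and their defaults (30 days for the log, 7 days for deleted data files) are the documented Delta Lake ones. As I understand it, expired log entries are cleaned up automatically when a checkpoint is written, whereas removed data files are only deleted physically when VACUUM is run.

```python
def retention_sql(table, log_days=30, deleted_file_days=7):
    """Build an ALTER TABLE statement that sets both retention properties."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = 'interval {log_days} days', "
        f"'delta.deletedFileRetentionDuration' = 'interval {deleted_file_days} days')"
    )


# "events" is a hypothetical table name used only for illustration.
stmt = retention_sql("events", log_days=60, deleted_file_days=60)
print(stmt)

# With a Delta-enabled SparkSession named `spark` (assumed, not created here):
#   spark.sql(stmt)
#   spark.sql("VACUUM events")  # deletes data files older than the retention
```

Time travel to a version needs both its log entries and its data files, so keeping the two durations aligned is a reasonable starting point if longer history matters.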
@bhanu4j 4 years ago
Can you share the link for this Python notebook? I did not find it.
@dennyglee 4 years ago
The notebook link is hiding in the description - here you go: github.com/dennyglee/databricks/blob/master/notebooks/Users/denny.lee%40databricks.com/Delta%20Lake/Diving%20Into%20Delta%20Lake:%20Unpacking%20The%20Transaction%20Log.py
@ArturSukhenko 4 years ago
my_table/date=2019-01-01. Parquet doesn't support date format :) So date is string there?
@kevingomez-yo3or 4 years ago
date='2019-01-01'.parquet
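On the question above: in a path such as my_table/date=2019-01-01, the partition value is plain text in the directory name rather than a value stored inside the Parquet file, and the Delta log records partition values as strings too. The reader converts it using the column type declared in the table schema. A minimal sketch of that conversion, where `parse_partition` is a hypothetical helper and not part of any library:

```python
from datetime import date


def parse_partition(segment, as_date=False):
    """Split a Hive-style 'column=value' path segment into (column, value)."""
    column, _, raw = segment.partition("=")
    # The raw value is always a string; typing it is the reader's job.
    value = date.fromisoformat(raw) if as_date else raw
    return column, value


col, raw = parse_partition("date=2019-01-01")
_, typed = parse_partition("date=2019-01-01", as_date=True)
print(col, repr(raw), repr(typed))
```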
@Sarmoung-Biblioteca 3 years ago
Great video! Thank you!
@kevingomez-yo3or 4 years ago
Can we have the slides?
@dennylee4934 4 years ago
Sure, you can find them in our tech-talks repo at: github.com/databricks/tech-talks/tree/master/2020-03-26%20%7C%20Diving%20into%20Delta%20Lake%20-%20Unpacking%20the%20Transaction%20Log
@irochkalviv 3 years ago
Burak Yavuz, A couple of corrections: 1. Turks arrived in Anatolia (from Central Asia), starting in the 11th century, no need to falsify history with the intention of giving some rights to occupy Anatolia and the rest of so called "Turkey" 2. 1453 is also the beginning of long five centuries of abuse, looting, oppression and massacres of the native Christian populations 3. A very important date seems to be omitted: 1915, the genocide of native Christian populations by the Turks Actually, the Turkic tribes that invaded Anatolia have many similarities with the fighters of the Islamic state ISIS. Is it why so called "Turkey" supported them?