Tech Talk | Diving into Delta Lake Part 1: Unpacking the Transaction Log

Views: 33,558

Databricks


Comments: 21
@Databricks 4 years ago
Check out the Online Meetup playlist for video recordings of these tech talks. This one will be available later today! - dbricks.co/youtube-meetups
@Databricks 4 years ago
- Watch Part 2, Enforcing and Evolving the Schema: @
- Watch Part 3, How do DELETE, UPDATE and MERGE work: kzbin.info/www/bejne/bZbanpaap96fqaM
@no_more_free_nicks 4 years ago
No, not everybody knows what a Data Lake is, so thanks for explaining it briefly.
@machinelearninginreallife3558 3 years ago
From what you've said, I understand that data versioning is not a real component of Delta Lake. All we want is to guard against certain mistakes (such as an accidental delete). Am I right?
@dheerajkumarsolanki5716 4 years ago
There is a default transaction log retention period of 30 days. So after 30 days, are the older transaction log files deleted automatically? Similarly, what happens to logically deleted data files after the deletedFileRetentionDuration period: are they deleted automatically, or do we have to delete them manually?
@amitjaju3351 10 months ago
Hello Burak, I need a little help. If we perform a delete operation on a Delta table and later need to keep track of the records we deleted, can we do that from the transaction log folder, or is there another way to keep track of which records we deleted in case someone asks us in the future? Waiting for your response. Thanks.
@machinelearninginreallife3558 3 years ago
I'm not sure I understand. Can we keep data for more than 30 days? Is that bad practice? Is it even possible?
@Sarmoung-Biblioteca 3 years ago
Great video!! Thank you!!
@TheIceSpinner 4 years ago
You don't mention how you actually store reads and writes. Do you store them differentially, and if so, what is the unit? So, e.g., when you delete a single row in a dataframe, is only the deleted row stored in the new Parquet file, with some kind of flag, or is the whole dataset duplicated (minus that row)?
@dennyglee 4 years ago
New Parquet files are created, which is what makes time travel possible. You can see which Parquet files were created in the transaction log.
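To make the reply above concrete, here is a minimal sketch of how a reader could replay the transaction log to find which Parquet files are live at a given version. It assumes each commit is newline-delimited JSON containing `add` and `remove` actions, which is how the Delta log records file changes; the file names, the two-commit history, and the `replay_log` helper are hypothetical, and real `add` actions carry more fields (size, partition values, statistics) than shown here.

```python
import json


def replay_log(commits, as_of_version=None):
    """Return the set of data file paths that are live at a given version.

    `commits` is a list of commit bodies, where index i is version i.
    This is a simplified illustration, not the Delta Lake implementation.
    """
    live = set()
    for version, commit in enumerate(commits):
        if as_of_version is not None and version > as_of_version:
            break
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                # A remove is a logical delete: the file stays on disk
                # until it is physically cleaned up later.
                live.discard(action["remove"]["path"])
    return live


# Hypothetical history: version 0 writes one file; version 1 deletes a row,
# which here rewrites the affected file without that row.
commits = [
    '{"commitInfo": {"operation": "WRITE"}}\n'
    '{"add": {"path": "part-0000-a.parquet"}}',
    '{"commitInfo": {"operation": "DELETE"}}\n'
    '{"remove": {"path": "part-0000-a.parquet"}}\n'
    '{"add": {"path": "part-0000-b.parquet"}}',
]

latest = replay_log(commits)
version_0 = replay_log(commits, as_of_version=0)
print(sorted(latest))
print(sorted(version_0))
```

In this sketch, time travel to version 0 works only as long as the older file is still on disk, since the log merely points at it.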
@stuckinamomentt 4 years ago
So Vacuum does not remove log files (due to GDPR), then when are the log files cleaned up to avoid growing indefinitely?
@dennylee4934 4 years ago
That's correct, VACUUM does not remove the logs - only the data (parquet) files. Note that the logs are converted from JSON to Parquet which subsequently improves the performance of reading the log.
@stuckinamomentt 4 years ago
@dennylee4934 Thanks, and I believe delta.logRetentionDuration controls when the logs are cleaned up.
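The two retention windows discussed in this thread can be illustrated with a small sketch. It uses the default values of `delta.deletedFileRetentionDuration` (7 days) and `delta.logRetentionDuration` (30 days); the helper functions, file names, and timestamps are hypothetical. It is deliberately simplified: real log cleanup runs when checkpoints are written and has additional safety conditions, and a real VACUUM also handles files the log never referenced.

```python
from datetime import datetime, timedelta, timezone

# Defaults of the two table properties mentioned in the thread.
DELETED_FILE_RETENTION = timedelta(days=7)   # delta.deletedFileRetentionDuration
LOG_RETENTION = timedelta(days=30)           # delta.logRetentionDuration


def vacuum_candidates(removed_files, now, retention=DELETED_FILE_RETENTION):
    """Logically removed data files old enough for VACUUM to delete."""
    cutoff = now - retention
    return sorted(p for p, removed_at in removed_files.items() if removed_at < cutoff)


def expired_log_versions(commit_times, now, retention=LOG_RETENTION):
    """Log versions older than the log retention window (simplified)."""
    cutoff = now - retention
    return sorted(v for v, committed_at in commit_times.items() if committed_at < cutoff)


# Hypothetical table state.
now = datetime(2020, 4, 30, tzinfo=timezone.utc)
removed_files = {
    "part-0000-a.parquet": now - timedelta(days=10),
    "part-0001-c.parquet": now - timedelta(days=2),
}
commit_times = {
    0: now - timedelta(days=45),
    1: now - timedelta(days=31),
    2: now - timedelta(days=3),
}

to_vacuum = vacuum_candidates(removed_files, now)
to_expire = expired_log_versions(commit_times, now)
print(to_vacuum)
print(to_expire)
```

The point of the separation is that data files and log entries age out independently: data files are removed only when VACUUM is run, while old log entries are cleaned up automatically according to the log retention setting.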
@harikrishnasiliveri1364 4 years ago
@dennylee4934 In that case, once we repartition, we can't use time travel (since the logs no longer point to the data files)? Is that correct?
@bhanu4j 4 years ago
Can you share the link for this Python notebook? I did not find it.
@dennyglee 3 years ago
The notebook link is hiding in the description - here you go: github.com/dennyglee/databricks/blob/master/notebooks/Users/denny.lee%40databricks.com/Delta%20Lake/Diving%20Into%20Delta%20Lake:%20Unpacking%20The%20Transaction%20Log.py
@ArturSukhenko 4 years ago
my_table/date=2019-01-01. Parquet doesn't support date format :) So date is string there?
@kevingomez-yo3or 4 years ago
date='2019-01-01'.parquet
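As a rough illustration of the partition question above: in a Hive-style layout such as `my_table/date=2019-01-01`, the value in the directory name is plain text, and the reader converts it to the column's declared type. The sketch below parses such a path; the `parse_partition_path` helper and the `part-0000.parquet` file name are hypothetical, and real implementations also handle escaped characters in partition values.

```python
from datetime import date


def parse_partition_path(path):
    """Extract column=value pairs from a Hive-style partitioned path.

    Values are returned as strings, exactly as they appear in the path.
    """
    values = {}
    for segment in path.split("/"):
        if "=" in segment:
            column, _, raw = segment.partition("=")
            values[column] = raw
    return values


# Hypothetical data file under the partition directory from the comment.
parts = parse_partition_path("my_table/date=2019-01-01/part-0000.parquet")

# Converting to a date is a separate step driven by the column's type.
as_date = date.fromisoformat(parts["date"])
print(parts)
print(as_date)
```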
@kevingomez-yo3or 4 years ago
Can we have the slides?
@dennylee4934 4 years ago
Sure, you can find them in our tech-talks repo at: github.com/databricks/tech-talks/tree/master/2020-03-26%20%7C%20Diving%20into%20Delta%20Lake%20-%20Unpacking%20the%20Transaction%20Log
@irochkalviv 3 years ago
Burak Yavuz, A couple of corrections: 1. Turks arrived in Anatolia (from Central Asia), starting the 11th century, no need to falsify history with the intention of giving some rights to occupy Anatolia and the rest of so called "Turkey" 2. 1453 is also the beginning of long five centuries of abuse, looting, oppression and massacres of the native Christian populations 3. A very important important date seems to be omitted: !915, the genocide of native christian populations by the Turks Actually, the Turkic tribes that invaded Anatolia have many similarities with the fighters of the Islamic state ISIS. Is it why so called "Turkey" supported them?