Partitioning vs Bucketing | Interview Question | PySpark

pysparkpulse


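Partitioning and bucketing are two ways of laying out stored data to improve query performance in PySpark. Partitioning writes a separate directory for each distinct value of the partition column, so it suits low-cardinality columns that queries filter on. Bucketing hashes a column into a fixed number of buckets, so it suits high-cardinality columns used in joins and aggregations. The choice between them depends on the use case and on the queries that will run against the data.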
Sample Data:
date        product    amount  region
01-01-2024  Product_0  0       Region_0
02-01-2024  Product_1  10      Region_1
03-01-2024  Product_2  20      Region_2
04-01-2024  Product_1  30      Region_0
05-01-2024  Product_4  40      Region_1
06-01-2024  Product_0  50      Region_2
07-01-2024  Product_1  60      Region_0
08-01-2024  Product_2  70      Region_1
09-01-2024  Product_2  80      Region_2
10-01-2024  Product_4  90      Region_0
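To make the difference concrete, here is a minimal pure-Python sketch that imitates both layouts on the sample data. It is not Spark code: in PySpark itself these layouts are requested with df.write.partitionBy(...) and with df.write.bucketBy(...) followed by saveAsTable(...). The bucket count of 4, the helper names, and the use of crc32 as the hash are assumptions made only for illustration; Spark uses its own hash function, so real bucket numbers will differ.

```python
from collections import defaultdict
from zlib import crc32

# Rows copied from the sample data: (date, product, amount, region).
ROWS = [
    ("01-01-2024", "Product_0", 0, "Region_0"),
    ("02-01-2024", "Product_1", 10, "Region_1"),
    ("03-01-2024", "Product_2", 20, "Region_2"),
    ("04-01-2024", "Product_1", 30, "Region_0"),
    ("05-01-2024", "Product_4", 40, "Region_1"),
    ("06-01-2024", "Product_0", 50, "Region_2"),
    ("07-01-2024", "Product_1", 60, "Region_0"),
    ("08-01-2024", "Product_2", 70, "Region_1"),
    ("09-01-2024", "Product_2", 80, "Region_2"),
    ("10-01-2024", "Product_4", 90, "Region_0"),
]

NUM_BUCKETS = 4  # hypothetical value, chosen only for this sketch


def partition_by_region(rows):
    """Group rows under one Hive-style directory name per distinct region."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"region={row[3]}"].append(row)
    return dict(partitions)


def bucket_by_product(rows, num_buckets=NUM_BUCKETS):
    """Assign each row to bucket hash(product) % num_buckets.

    crc32 is a deterministic stand-in for Spark's internal hash function.
    """
    buckets = defaultdict(list)
    for row in rows:
        bucket_id = crc32(row[1].encode("utf-8")) % num_buckets
        buckets[bucket_id].append(row)
    return dict(buckets)


partitions = partition_by_region(ROWS)
buckets = bucket_by_product(ROWS)

# A filter on the partition column only has to read one directory.
region_0_rows = partitions["region=Region_0"]

# Number of partitions follows the data; number of buckets is capped up front.
for name in sorted(partitions):
    print(name, len(partitions[name]), "rows")
for bucket_id in sorted(buckets):
    products = sorted({row[1] for row in buckets[bucket_id]})
    print("bucket", bucket_id, products)
```

The partition count grows with the number of distinct regions, while the bucket count stays at or below the number chosen in advance. Every row for a given product lands in the same bucket, which is what lets two tables bucketed the same way on the join key be joined bucket by bucket.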
Check out this video and let me know if you have any questions. We can connect on
LinkedIn: / priyam-jain-0946ab199
PWC interview Question:
• Question 11: PWC Inter...
• Question 10: PWC Inter...
Deloitte interview Question:
• Question 9: Deloitte I...
Subscribe to @pysparkpulse for more questions like these.
#pyspark #spark #bigdata #bigdataengineer #dataengineering #dataengineer #deloitte #pwc #mnc

Comments: 4
@rockroll28 3 months ago
Good information. Constructive criticism: you explained too fast. The chart explanation could be one video and the practical part another; two videos of 10 to 12 minutes each would have been more helpful. Best of luck 👍🏻
@pysparkpulse 3 months ago
Thank you for your feedback, I will keep this in mind ☺️
@abhishekmalvadkar206 3 months ago
Very well explained 👏
@pysparkpulse 3 months ago
Thank you Abhishek