coalesce vs repartition vs partitionBy in Spark | Interview Question Explained

5,819 views

GK Codelabs



Hi All,
In this video, I have explained the concepts of coalesce, repartition, and partitionBy in Apache Spark.
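As a rough illustration of the first two operations named above, here is a toy model in plain Python, not Spark itself. It assumes the usual description of `df.coalesce(n)` (merge whole existing partitions, no full shuffle, never more partitions than before) and `df.repartition(n)` (full shuffle that spreads rows evenly). The functions `coalesce` and `repartition` below, the neighbour-grouping rule, and the round-robin order are simplifications chosen for the sketch; Spark's real placement logic differs.

```python
from itertools import chain

def coalesce(partitions, n):
    """Toy model of coalesce(n): merge whole partitions, never split one."""
    n = min(n, len(partitions))  # coalesce cannot increase the partition count
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Assumption: neighbouring partitions are grouped together.
        merged[i * n // len(partitions)].extend(part)
    return merged

def repartition(partitions, n):
    """Toy model of repartition(n): full shuffle, rows dealt out round-robin."""
    out = [[] for _ in range(n)]
    for i, row in enumerate(chain.from_iterable(partitions)):
        out[i % n].append(row)
    return out

# Hypothetical skewed input: 4 partitions holding 6, 1, 2 and 1 rows.
parts = [[1, 2, 3, 4, 5, 6], [7], [8, 9], [10]]

coalesced = coalesce(parts, 2)
reshuffled = repartition(parts, 2)

print([len(p) for p in coalesced])   # [7, 3]  skew is carried over
print([len(p) for p in reshuffled])  # [5, 5]  rows are rebalanced
```

In this sketch the coalesced result keeps the skew of the input because whole partitions are only glued together, while the repartitioned result is balanced at the cost of moving every row.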
To become a GKCodelabs Extended plan member, check the links below and purchase the Big Data end-to-end pipeline course in your preferred language, Python or Scala.
PySpark course available at
courses.gkcode...
Spark + Scala course available at
courses.gkcode...
End-to-end pipeline introduction videos:
PySpark End to End Pipeline
• BIG DATA COMPLETE PROJ...
Spark + Scala End to End Pipeline
• BIG DATA complete PROJ...
Starter Pack available at just: ₹549 (For Indian Payments) or $9 (For non-Indian payments)
Extended Pack available at just: ₹1299 (For Indian Payments) or $19 (For non-Indian payments)
Queries? Write to us at support@gkcodelabs.com
Website: www.gkcodelabs...
In this video I have shared my day-2 experience as a Big Data Engineer, including the usual tasks, assignments, calls, and routines in my life as a Big Data engineer.

Comments: 5
@johnsonrajendran6194 3 years ago
Nice explanation!!
@NikhileshwarYanamanagandla 10 months ago
When you do repartition and then partitionBy, the data is already partitioned, and it is now partitioned again based on the partitionBy column. So why does the number of part files still depend on repartition()?
@srikanthk8261 3 years ago
Good explanation. I have a question: as you mentioned, when you partition by the age column, it creates 3 partitions because we have three age groups here. Let's assume I have 1000 unique IDs in a dataset and I partition by the Id column. How many partitions will it create, and on what basis? Could you please explain this briefly if you have time? Thanks, Srikanth kita
@GKCodelabs 3 years ago
Good catch 😊! I will try to answer this in as simple a way as possible, but it will come with some conditions 😉 (distributed computing always has a lot of givens and provideds) 😜 So for your case: it will be 1000 partitions (condition: you should have 1000+ cores on your cluster). Otherwise it will be equal to your number of cores (condition: each core can handle the amount of data it is processing). Otherwise it can be slightly more than your number of cores, in case some cores were not able to process the data given to them and processed the rest of it in the next cycle (task). Hope I was able to answer your question!
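The questions in this thread about how many part files get written can be explored with a toy model. The sketch below is plain Python, not Spark. It assumes the commonly described behaviour of `df.repartition(n).write.partitionBy(col)`: one `col=value` directory per distinct value, and inside each directory one part file for every in-memory partition that holds rows with that value. The helper `count_part_files` and the sample rows are hypothetical, and settings that split files further (such as a maximum record count per file) are ignored.

```python
from collections import defaultdict

def count_part_files(partitions, key):
    """Return {key value: number of part files} under the toy model."""
    files_per_dir = defaultdict(int)
    for part in partitions:
        # One file per distinct key value present in this partition.
        for value in {key(row) for row in part}:
            files_per_dir[value] += 1
    return dict(files_per_dir)

# Hypothetical data: 12 rows, 12 unique ids, 3 age groups.
rows = [{"id": i, "age": age} for i, age in enumerate([20, 30, 40] * 4)]

one_part = [rows]                      # stand-in for coalesce(1)
two_parts = [rows[0::2], rows[1::2]]   # stand-in for repartition(2)

by_age_1 = count_part_files(one_part, lambda r: r["age"])
by_age_2 = count_part_files(two_parts, lambda r: r["age"])
by_id_2 = count_part_files(two_parts, lambda r: r["id"])

print(sorted(by_age_1.items()))  # [(20, 1), (30, 1), (40, 1)]
print(sorted(by_age_2.items()))  # [(20, 2), (30, 2), (40, 2)]
print(len(by_id_2), sum(by_id_2.values()))  # 12 12
```

Under these assumptions the number of directories follows the number of distinct values in the column (3 for age, 12 for id here), while the repartition count only bounds how many part files can land in each directory. That would be one way to read why the file count still depends on repartition() after partitionBy.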
@MiRayalaseemaPillakai 3 years ago
1st view