Spark Performance Tuning | Handling DATA Skewness | Interview Question

24,641 views

TechWithViresh

Comments: 34
@sahiba2227 3 years ago
At 7:24: repartition does a full shuffle and hence creates equal-sized partitions, i.e. it always guarantees equal-sized partitions.
@desparadoking8209 3 years ago
Very informative video 👍🙂
@brothermalcolm 2 years ago
Great video, like the pace, like the presentation.
@TechWithViresh 2 years ago
Glad you liked it!
@ajaywade9418 1 year ago
From 11:30 in the video: we are adding a random key to the existing tower ID key. For example, with tower id 101 and salt key 67, 101 + 67 = 168, and the hash of 168 would be the final value, right? What happens if the partition column is a string datatype?
@TechWithViresh 1 year ago
In the case of strings, we can add surrogate keys based on the string column values and then do the salting.
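A minimal sketch of salting a string key, in plain Python rather than Spark. It assumes the salt is appended as a suffix (`A` becomes `A_3`) instead of being added numerically, which is a common way to salt non-numeric keys and also avoids collisions such as 101 + 67 and 100 + 68 both giving 168. `partition_of`, `salt_key`, `unsalt_key`, the partition count, and the salt range are hypothetical names and values chosen for illustration, and CRC32 only stands in for Spark's own hash function.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count
NUM_SALTS = 8       # hypothetical number of distinct salt values


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic stand-in for a hash partitioner (CRC32, not Spark's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salt_key(key: str, rng: random.Random, num_salts: int = NUM_SALTS) -> str:
    """Append a random salt as a suffix, e.g. 'A' -> 'A_5'."""
    return f"{key}_{rng.randrange(num_salts)}"


def unsalt_key(salted: str) -> str:
    """Recover the original key by dropping the trailing salt suffix."""
    return salted.rsplit("_", 1)[0]


rng = random.Random(42)
# Tower 'A' is the hot key: 90% of all records.
records = ["A"] * 9000 + ["B"] * 500 + ["C"] * 500

plain = Counter(partition_of(k) for k in records)
salted = Counter(partition_of(salt_key(k, rng)) for k in records)

print("records per partition, plain keys :", dict(plain))
print("records per partition, salted keys:", dict(salted))
```

With plain keys every `A` record lands in one partition; with salted keys the `A` records are typically spread over several. After the heavy work is done per salted key, `unsalt_key` gives back the original key for a final, much smaller aggregation.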
@Dsmehra379 4 years ago
Thank you so much for this video, but I am not able to find the second part. Can you please post the link to the part 2 video?
@vijeandran 4 years ago
Hi Viresh, thanks for the video. How can we implement the salting technique in PySpark?
@AdityaKommu1 4 years ago
Hello, your videos are very good. Can you please do a video on incremental data load and full data load, using an example?
@udaynayak4788 2 years ago
Hi Viresh, can you please share the link for part 2?
@ramkumarananthapalli7151 3 years ago
Thank you for making this video. Could you tell us on which column the mean, median, and mode are calculated?
@bentchow 1 year ago
The columns are the ones the data is shuffled by, such as the join columns or group-by columns. There is data skew when the values in those columns are not evenly distributed.
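One way to read this reply is that the statistics are taken over the number of records per key in the shuffle column. A small sketch under that assumption, in plain Python: `skew_report` and the sample tower IDs are hypothetical, and the ratio of the largest key to the median key is just one possible indicator of skew, not a Spark metric.

```python
from collections import Counter
from statistics import mean, median


def skew_report(keys):
    """Summarise how unevenly records are spread over the shuffle key."""
    counts = Counter(keys)            # records per key
    sizes = sorted(counts.values())
    return {
        "mean": mean(sizes),
        "median": median(sizes),
        "max": max(sizes),
        "max_over_median": max(sizes) / median(sizes),
        "hottest_key": counts.most_common(1)[0][0],
    }


# Hypothetical join/group-by column: tower 101 dominates.
tower_ids = [101] * 9000 + [102] * 400 + [103] * 300 + [104] * 300
report = skew_report(tower_ids)
print(report)
```

A mean far above the median, or a very large max-to-median ratio, suggests that a few keys carry most of the records and the tasks handling them will run much longer than the rest.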
@vivekpuurkayastha1580 2 years ago
If the partition key is non-numeric, how do we perform salting? Your tower IDs were numeric, but what if instead of 1, 2, ... they were A, B, ...?
@Gecasomx 3 years ago
Thanks for the video, no part 2 tho?
@SpiritOfIndiaaa 4 years ago
Thank you so much, really good. What is the difference between isolated salting and salting? And what is the difference between an isolated map join and a map join?
@rishigc 4 years ago
@TechWithViresh I simply love your videos. I have watched your other tutorial videos too. They are awesome. I am interested in knowing how to do Iterative Broadcast Join with the SQL API. Any help is highly appreciated. Can you pls advise.
@ayanbizz 4 years ago
Nice explanation. A couple of questions: 1) Doesn't repartitioning ensure that the data distribution is not skewed (unlike coalesce)? 2) You said repartitioning uses the hash value to distribute the data; are you talking about bucketing?
@TechWithViresh 4 years ago
Spark provides two partitioners: 1. the hash partitioner and 2. the range partitioner. The default is the hash partitioner.
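A rough sketch of how the two strategies differ, in plain Python. For integer keys, a hash partitioner places a record at the key's hash modulo the number of partitions, while a range partitioner places it by comparing the key against sorted boundaries. `hash_partition`, `range_partition`, and the boundaries `[20, 100]` are hypothetical; Spark's range partitioner chooses its boundaries by sampling the data, which this sketch does not do.

```python
import bisect


def hash_partition(key: int, num_partitions: int) -> int:
    """Hash partitioning: non-negative modulo of the key's hash."""
    return hash(key) % num_partitions


def range_partition(key: int, upper_bounds: list) -> int:
    """Range partitioning: keys <= upper_bounds[i] go to partition i,
    keys above the last bound go to the final partition."""
    return bisect.bisect_left(upper_bounds, key)


keys = [5, 17, 42, 101, 250]
by_hash = {k: hash_partition(k, 4) for k in keys}
by_range = {k: range_partition(k, [20, 100]) for k in keys}

print("hash :", by_hash)
print("range:", by_range)
```

In both cases every record with the same key goes to the same partition, so neither strategy on its own relieves a single hot key.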
@harshvardhansolanki1466 4 years ago
If you repartition on a column, you can still get skewed data. If you repartition by a number of partitions only, the distribution may be almost equal.
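The difference described here can be simulated in a few lines of plain Python. The sketch assumes that repartitioning by a partition count deals records out in round-robin fashion regardless of key, and that repartitioning on a column places each record by a hash of that column (here simply the integer key modulo the partition count). `round_robin_sizes`, `hash_by_key_sizes`, and the sample records are hypothetical.

```python
from collections import Counter


def round_robin_sizes(records, num_partitions):
    """Partition sizes when records are dealt out one by one, ignoring keys."""
    return Counter(i % num_partitions for i, _ in enumerate(records))


def hash_by_key_sizes(records, num_partitions):
    """Partition sizes when each record is placed by its key."""
    return Counter(key % num_partitions for key, _ in records)


# Tower 101 is the hot key.
records = [(101, "call")] * 9000 + [(102, "call")] * 500 + [(103, "call")] * 500

by_count = round_robin_sizes(records, 4)
by_column = hash_by_key_sizes(records, 4)

print("repartition by count :", dict(by_count))
print("repartition by column:", dict(by_column))
```

By count, each of the four partitions receives 2,500 records. By column, one partition receives all 9,000 records of tower 101 and one partition stays empty, so the skew survives the shuffle.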
@harshvardhansolanki1466 4 years ago
Thank you so much for the video. I would like some clarification, though. In your example you used mapPartitions, meaning that within each partition you updated the key with a salt. But the records still remained in their original partitions. How will those records be shuffled across partitions for an equal distribution?
@TechWithViresh 4 years ago
The partition will change when the key changes, as it is now essentially the hash code of key + salt.
@harshvardhansolanki1466 4 years ago
@TechWithViresh I tried it, and I believe a new DataFrame has to be created and repartitioned again for the records to be shuffled by the updated salted keys. Updating the key inside a mapPartitions function will not trigger a shuffle on its own. Only that makes sense.
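The point of this exchange can be illustrated with a small plain-Python simulation: rewriting keys inside each partition leaves every record where it was, and only a later shuffle on the salted key moves them. This is a sketch, not Spark code. `initial_partitions`, `salt_in_place`, `reshuffle`, the partition count, and the salt range are hypothetical, and the placement rule `(tower_id + salt) % NUM_PARTITIONS` follows the "key + salt" idea from the thread.

```python
import random

NUM_PARTITIONS = 4  # hypothetical partition count
NUM_SALTS = 4       # hypothetical number of salt values


def initial_partitions(records):
    """Place records by tower_id, as a shuffle on the unsalted key would."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for tower_id, value in records:
        parts[tower_id % NUM_PARTITIONS].append((tower_id, value))
    return parts


def salt_in_place(parts, rng):
    """Rewrite keys partition by partition; no record changes partition."""
    return [
        [((tower_id, rng.randrange(NUM_SALTS)), value) for tower_id, value in part]
        for part in parts
    ]


def reshuffle(parts):
    """Explicit shuffle on the salted key: placement uses tower_id + salt."""
    out = [[] for _ in range(NUM_PARTITIONS)]
    for part in parts:
        for (tower_id, salt), value in part:
            out[(tower_id + salt) % NUM_PARTITIONS].append(((tower_id, salt), value))
    return out


records = [(101, 1)] * 9000 + [(102, 1)] * 500 + [(103, 1)] * 500
before = initial_partitions(records)
salted = salt_in_place(before, random.Random(7))
after = reshuffle(salted)

print("before salting :", [len(p) for p in before])
print("after salting  :", [len(p) for p in salted])
print("after reshuffle:", [len(p) for p in after])
```

The first two lines of output are identical, which matches the observation that salting alone moves nothing; the sizes only even out after the explicit reshuffle.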
@jaiharsad7121 4 years ago
Hi sir, please upload the Spark interview question videos that were available earlier. I am not able to find them in your playlist.
@TechWithViresh 4 years ago
All the videos are uploaded, please check :)
@chilukapavan6344 5 years ago
Awesome video 🙏 Can you please share the part 2 video link?
@TechWithViresh 4 years ago
Coming soon! Thanks :)
@pareshpal3533 3 years ago
@TechWithViresh When?
@thanoojbharateeyudu3786 4 years ago
We could lose our join key by salting it with random numbers. If we want to join on the same key, that is a problem: the join key on the other side may no longer match the salted column.
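A common way around this concern is to salt only the skewed side at random and to replicate each row of the other side once per possible salt value, then join on the pair (key, salt). The plain-Python sketch below checks that such a salted join returns the same rows as the unsalted one. It is an illustration under those assumptions, not the video's code; `salt_large_side`, `explode_small_side`, `hash_join`, and the sample tower names are hypothetical.

```python
import random
from collections import defaultdict

NUM_SALTS = 3  # hypothetical number of salt values


def salt_large_side(rows, rng):
    """Skewed side: attach one random salt to each row's key."""
    return [((key, rng.randrange(NUM_SALTS)), value) for key, value in rows]


def explode_small_side(rows):
    """Other side: replicate each row once for every possible salt."""
    return [((key, salt), value) for key, value in rows for salt in range(NUM_SALTS)]


def hash_join(left, right):
    """Simple inner join on the first element of each tuple."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


calls = [(101, f"call-{i}") for i in range(6)] + [(102, "call-x")]
towers = [(101, "Downtown"), (102, "Airport")]

plain = sorted(hash_join(calls, towers))
salted_rows = hash_join(salt_large_side(calls, random.Random(1)),
                        explode_small_side(towers))
# Drop the salt again so both results can be compared on the original key.
salted = sorted((key, lv, rv) for (key, _salt), lv, rv in salted_rows)

print(plain == salted)
```

Each salted call row matches exactly one replicated tower row, so no join partner is lost. The cost is that the replicated side grows by a factor of `NUM_SALTS`, which is why this is usually applied when that side is small.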
@aneksingh4496 4 years ago
Have you uploaded part 2 of this?
@TechWithViresh 4 years ago
Check out other videos in the playlist for performance optimization and executor tuning.
@rishigc 4 years ago
Where is part 2?
@Hk-eo5yr 5 years ago
Can you share the part 2 video?
@TechWithViresh 4 years ago
Coming soon! Thanks :)
@bhuneshwarsingh630 5 years ago
Please give a solid coding example with an explanation.