34. Databricks - Spark: Data Skew Optimization

28,506 views

Raja's Data Engineering

Comments: 33
@Prashanth-os5he 1 year ago
This is by far the best Databricks and Spark tutorial series on YouTube. Great job, Raja.
@rajasdataengineering7585 1 year ago
Glad you think so! Thanks for your comment.
@sumanmondal8836 2 years ago
Thanks, Raja, your explanations are really good. Can you please make a video on salting techniques with an example? It would be very helpful.
@rajasdataengineering7585 2 years ago
Thank you, Suman. Sure, I will make a video on salting.
@abhinavsingh1173 1 year ago
Your course is the best. But the problem with your course is that you are not attaching a GitHub link for your sample data and code. As one of your audience, I request that you please do this. Thanks.
@srinubathina7191 1 year ago
Awesome content. Thank you so much, Sir.
@rajasdataengineering7585 1 year ago
Glad you liked it.
@skasifali4457 2 years ago
Thanks, Raja. Your video is really useful. Can you please create a video on debugging techniques, and on how we can use the Spark UI to debug and find the bottleneck, using use cases? Thanks a lot again.
@rajasdataengineering7585 2 years ago
Sure, Asif, I will post a video on debugging.
@swapnilgosawi 1 month ago
Do you have a document with all these details? If yes, it would be great if you could share it on Git. Really great explanation. Thank you!
@joyo2122 2 years ago
You are the best Raja 🙌
@rajunaik8803 1 year ago
Hi Raja, quick question: does AQE take care of the salting and skew hint techniques automatically when the data is skewed, or do we have to apply them explicitly?
@rajasdataengineering7585 1 year ago
Yes, AQE handles data skew automatically. It is available from Spark 3.0 and enabled by default from Spark 3.2. On Spark 3.0 and 3.1, we just need to enable AQE through the Spark config settings.
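
For readers who want to see what "enable AQE through the Spark config settings" might look like, here is a minimal sketch. The two keys are the documented Spark SQL settings for AQE and its skew-join handling; the helper name `enable_aqe_skew_handling` and the dictionary name are hypothetical, and the `spark` argument is assumed to be an existing SparkSession. Note that AQE's skew handling targets skewed partitions in joins; it does not add salt columns to your data.

```python
# Documented Spark SQL configuration keys (Spark 3.0+).
AQE_SKEW_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",           # turns AQE on
    "spark.sql.adaptive.skewJoin.enabled": "true",  # lets AQE split skewed join partitions
}


def enable_aqe_skew_handling(spark, settings=AQE_SKEW_SETTINGS):
    """Apply each setting to a running session (hypothetical helper).

    `spark` is assumed to be an existing SparkSession; only its
    `conf.set(key, value)` method is used here.
    """
    for key, value in settings.items():
        spark.conf.set(key, value)
    return settings
```

In a Databricks notebook, where `spark` is already defined, the call would simply be `enable_aqe_skew_handling(spark)`.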
@rajunaik8803 1 year ago
@rajasdataengineering7585 Thanks a lot for your response. Do you have a Telegram channel? And may I know your LinkedIn ID, please?
@iamkiri_ 11 months ago
Thanks for the video. I have a question: is the salting technique applied while reading the data from the source, or during intermediate processing in the application?
@rajasdataengineering7585 11 months ago
It is applied during the transformation stage, not at data extraction.
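
Since several comments ask for salting code, here is a minimal plain-Python sketch of the idea (not Spark code, and not the code from the video). The salt is added as a transformation on data that has already been read, the aggregation runs on the salted key, and a second step strips the salt and combines the partial results. The dataset, `NUM_SALTS`, and the helper names are all assumptions for illustration. Note that a hot key is split into at most `NUM_SALTS` sub-keys, not one key per record.

```python
import random
from collections import Counter

NUM_SALTS = 4  # assumed value; in practice tuned to the degree of skew


def salt_key(key, rng, num_salts=NUM_SALTS):
    """Append a random suffix so one hot key becomes up to num_salts keys."""
    return f"{key}_{rng.randrange(num_salts)}"


def unsalt_key(salted_key):
    """Drop the suffix added by salt_key to recover the original key."""
    return salted_key.rsplit("_", 1)[0]


# Hypothetical skewed dataset: 80% of the rows belong to Germany.
rows = ["Germany"] * 80 + ["India"] * 10 + ["France"] * 10

rng = random.Random(42)  # fixed seed so the sketch is repeatable
salted_rows = [salt_key(country, rng) for country in rows]

# Stage 1: partial counts per salted key (Germany is now spread over sub-keys).
partial_counts = Counter(salted_rows)

# Stage 2: strip the salt and combine the partial counts per original key.
final_counts = Counter()
for salted_key, count in partial_counts.items():
    final_counts[unsalt_key(salted_key)] += count
```

The final counts match a direct count on the original key; the difference is that no single key carries all 80 Germany rows during stage 1.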
@iamkiri_ 11 months ago
Thanks, bro.
@VishalSharma-hv6ks 2 years ago
You mainly focus on theory. It would be great if you wrote the code for salting as well.
@rajasdataengineering7585 2 years ago
Sure, I will post another video with a coding example.
@naveenkumarsingh3829 4 months ago
Why can't we set maxPartitionBytes to get equal-sized partitions and handle data skew that way?
@Personalcomments 2 years ago
Your videos are very informative. Can you please post a video on client mode vs. cluster mode vs. local mode?
@rajasdataengineering7585 2 years ago
Sure, Merin, I will post a video on this topic.
@balakrishna61 5 months ago
@rajasdataengineering7585 Please explain salting in detail. It's not clear how you partition the Germany_1, _2 and so on. Each record will become one partition in this case, correct?
@SaurabhDestiny18 1 year ago
Hi, thank you for such useful videos. I have one question: I am still confused about the executor boundary versus the core/task boundary. In your first video you mentioned that an executor can have many cores and RAM, and in this video you mention that an executor runs in its own JVM process. Does that mean all the cores/tasks run under one JVM process? Or are there further JVM processes running under that parent JVM process, one per core/task?
@sanskarsuman9340 2 years ago
I have a doubt: when you say the data is partitioned on country and there are five different countries, of which, let's say, Germany has 80% of the data, how can we say that the Germany data is in a single partition only? A partition is determined by the block size, and 1 partition = 128 MB, so depending on its size, couldn't the Germany data be split into multiple partitions automatically?
@ndbweurt34485 1 year ago
I had the same question.
@supriyakoura7755 2 months ago
Same question.
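
A note on the question in this thread, offered as a sketch and not as the author's answer: the 128 MB limit applies when files are split for reading. Once a wide operation such as a join or groupBy on country triggers a shuffle, rows are assigned to partitions by hashing the key, so every Germany row lands in the same partition however large that partition becomes. The plain-Python simulation below illustrates this; `zlib.crc32` is only a stand-in for Spark's own hash function, and the dataset and names are assumptions.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed number of shuffle partitions


def shuffle_partition(key, num_partitions=NUM_PARTITIONS):
    """Stand-in for a hash partitioner: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical dataset: five countries, Germany holds 80% of the rows.
rows = (["Germany"] * 80 + ["India"] * 5 + ["France"] * 5
        + ["Spain"] * 5 + ["Italy"] * 5)

# Row count per partition after the simulated shuffle.
rows_per_partition = Counter(shuffle_partition(country) for country in rows)

# All partitions that received at least one Germany row (always exactly one).
germany_partitions = {shuffle_partition(c) for c in rows if c == "Germany"}
```

Because the partition is a function of the key alone, changing the read-time split size does not spread the Germany rows after the shuffle; that is the gap salting is meant to fill.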
@sravankumar1767 2 years ago
Superb
@rajasdataengineering7585 2 years ago
Thank you
@prathapganesh7021 7 months ago
Thank you
@rajasdataengineering7585 7 months ago
Welcome!
@tanushreenagar3116 2 years ago
Nice
@rajasdataengineering7585 2 years ago
Thanks