Spark Performance Tuning | EXECUTOR Tuning | Interview Question

  32,631 views

TechWithViresh

Days ago

Comments: 40
@RohanKumar-mh3pt 1 year ago
Very nice and clear explanation. Before this video I was very confused about the executor tuning part; now it is crystal clear.
@TheFaso1964 3 years ago
Dude, I feel like I knew nothing about Spark before I got my hands dirty with your performance improvement solutions. Appreciate it a lot, you got my subscription. Cheers from Germany!
@TechWithViresh 3 years ago
Thanks a lot :)
@sankarn6016 3 years ago
Nice explanation!! Can we use this approach for tuning/triggering multiple jobs in a cluster?
@nivedita5639 4 years ago
Very very helpful. Thanks
@fahad_ishaqwala 4 years ago
Excellent videos, brother. Much appreciated. Can you do a video on performance tuning for Spark Structured Streaming jobs as well?
@TechWithViresh 3 years ago
Surely, working on a video for the same.
@aneksingh4496 4 years ago
As always, the best!!! Please include some real simulation examples.
@whatever-genuine7945 2 years ago
How to allocate executors, cores and memory if there are multiple jobs running on the cluster?
@giyama 4 years ago
This calculation is for just one job; what would the calculation be for multiple jobs running simultaneously? And how do you calculate based on the data volume? (Great job btw, thanks!)
@SidharthanPV 4 years ago
Dynamic allocation is currently supported. You can set the max limit, and YARN takes care of managing it when multiple instances run in parallel.
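A minimal sketch of the dynamic-allocation setup described in that comment, assuming a YARN cluster with the external shuffle service running; the property keys are standard Spark settings, while the executor counts and sizes below are illustrative only:

import org.apache.spark.sql.SparkSession

// Dynamic allocation: YARN grows and shrinks the executor pool between a floor
// and a cap, instead of a fixed --num-executors (all numbers here are illustrative).
val spark = SparkSession.builder()
  .appName("dynamic-allocation-sketch")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "20")  // the "max limit" mentioned above
  .config("spark.shuffle.service.enabled", "true")       // needed so executors can be released safely
  .config("spark.executor.cores", "5")
  .config("spark.executor.memory", "19g")
  .getOrCreate()

Submitted through spark-submit on YARN, this lets several such jobs share the cluster, with requests beyond the available capacity simply waiting in the queue.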
@umeshkatighar3635 1 year ago
What if each node has only 8 cores? How does Spark allocate 5 cores per JVM?
@KNOW-HOW-HUB 2 years ago
What would be the best approach to follow to process 1 TB of data?
@ranju184 4 years ago
Excellent explanation. Thanks.
@Dipanki-c7k 1 year ago
What if I have multiple Spark jobs running in parallel in one Spark session?
@DilipDiwakarAricent 4 years ago
If not configured, what will be the default number chosen by Spark?
@inferno9004 4 years ago
@5:10 Can you explain how 20 GB + 7% of 20 GB is 23 GB and not 21.4 GB?
@rockngelement 4 years ago
Calculation mistake, bhai; anyway, it doesn't affect the info in this video.
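For reference, a small worked version of the arithmetic being questioned here, using the max(384 MB, 7% of executor memory) overhead heuristic the video refers to (the exact default overhead factor varies by Spark version); 20 GB is just the illustrative figure from the question:

// Overhead heuristic: max(384 MB, 7% of the executor heap). Numbers are illustrative.
val executorHeapGb = 20.0
val overheadGb     = math.max(384.0 / 1024, 0.07 * executorHeapGb) // 1.4 GB
val containerGb    = executorHeapGb + overheadGb                   // 21.4 GB, not 23 GB
println(f"overhead = $overheadGb%.2f GB, container = $containerGb%.1f GB")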
@manisekhar4446 4 years ago
According to your example, how many GB of data can be processed by the Spark job?
@mdmoniruzzaman703 1 year ago
Hi, does "10 nodes" mean including the master node? I have a configuration like this:
"Instances": {
  "InstanceGroups": [
    {
      "Name": "Master nodes",
      "Market": "SPOT",
      "InstanceRole": "MASTER",
      "InstanceType": "m5.4xlarge",
      "InstanceCount": 1
    },
    {
      "Name": "Worker nodes",
      "Market": "SPOT",
      "InstanceRole": "CORE",
      "InstanceType": "m5.4xlarge",
      "InstanceCount": 9
    }
  ],
  "KeepJobFlowAliveWhenNoSteps": false,
  "TerminationProtected": false
},
@sivavulli7487 3 years ago
Hi Sir, thank you for your nice explanation, but it is most meaningful and understandable when only one job is running on the cluster. What if there are many jobs running on the same cluster?
@TechWithViresh 3 years ago
The executor params passed for each job define the container boundaries, i.e. the running scope, for that job. If there are not enough resources available to allocate, the job(s) wait in the queue.
@sivavulli7487 3 years ago
@@TechWithViresh So an executor core can run only one task at a time. In that case, in your example, if there are 2 jobs on the same cluster, should we take half of the resources mentioned in the video, or is it better to take what you mentioned, let the first job run to completion, and only then pick up the second job (i.e. the second stays in the queue until the first completes)? Could you please suggest the best approach? Altogether, before setting Spark resource configurations for any job, is looking at the cluster configuration enough, or do we also need to look at how many other jobs are running on the same cluster?
@TechWithViresh 3 years ago
@@sivavulli7487 Yes, we should take into account how many concurrent jobs need to run. A better approach followed these days is to have a separate interactive cluster for each job.
@sivavulli7487 3 years ago
@@TechWithViresh Okay, thank you sir. If possible, please make a video on how to allocate resources when there are multiple concurrent jobs running on the same cluster.
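Until such a video exists, a rough sketch of the static-split idea from this thread: with two concurrent jobs on the same cluster, each job asks for roughly half of the executors a single job would otherwise take, so neither starves the other. The counts below are illustrative, and capping each job with spark.dynamicAllocation.maxExecutors instead (so YARN arbitrates) is the alternative.

import org.apache.spark.sql.SparkSession

// Each of two concurrent jobs requests roughly half of the executors a single
// job would otherwise use (illustrative numbers, not a recommendation).
val spark = SparkSession.builder()
  .appName("job-a-static-half-share")
  .config("spark.executor.instances", "14")
  .config("spark.executor.cores", "5")
  .config("spark.executor.memory", "19g")
  .getOrCreate()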
@anusha0504 4 years ago
What are advanced Spark technologies?
@SpiritOfIndiaaa 4 years ago
Thanks bro, really wonderful explanation. Bro, can you make a video on how to analyze stages, physical plans, etc. in the Spark UI, and based on that how to fix optimization issues? It's always very confusing to interpret these SQL explain plans.
@TechWithViresh 4 years ago
Thanks very much, check out the video on stage details.
@SpiritOfIndiaaa 4 years ago
@@TechWithViresh I can't find it, any URL please?
@snehakavinkar2240 4 years ago
How to decide these configurations for a certain volume of data? Thank you.
@TechWithViresh 4 years ago
The idea is to keep at most 5 tasks (cores) per executor, and to make sure the partition size fits within the memory allocated to the executor.
@snehakavinkar2240 4 years ago
Is there any upper or lower limit to the amount of memory per executor?
@TechWithViresh 4 years ago
It depends on the total memory available in your cluster.
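To make that rule of thumb concrete, here is a sketch of the usual sizing arithmetic, assuming an illustrative cluster of 10 nodes with 16 cores and 64 GB RAM each (the exact figures in the video may differ slightly):

// Rough executor-sizing arithmetic for the rule of thumb above.
val nodes            = 10
val coresPerNode     = 16
val memPerNodeGb     = 64

val coresForOs       = 1   // leave 1 core per node for OS / Hadoop daemons
val memForOsGb       = 1   // leave ~1 GB per node for OS / Hadoop daemons
val coresPerExecutor = 5   // the "max 5 tasks per executor" heuristic

val executorsPerNode = (coresPerNode - coresForOs) / coresPerExecutor // (16 - 1) / 5 = 3
val totalExecutors   = nodes * executorsPerNode - 1                   // 30 - 1 = 29 (one slot for the YARN AM)
val memPerExecutorGb = (memPerNodeGb - memForOsGb) / executorsPerNode // (64 - 1) / 3 = 21
val overheadGb       = math.ceil(0.07 * memPerExecutorGb).toInt       // ~2 GB off-heap overhead
val executorHeapGb   = memPerExecutorGb - overheadGb                  // ~19 GB for --executor-memory

println(s"--num-executors $totalExecutors --executor-cores $coresPerExecutor --executor-memory ${executorHeapGb}g")

This keeps each executor at 5 concurrent tasks while leaving room for the OS, the Hadoop daemons, the YARN application master, and the memory overhead per container.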
@rikuntri 4 years ago
If one executor has four cores, can it handle one task at a time or 4?
@the_high_flyer 4 years ago
Number of cores = number of parallel tasks.
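A tiny illustration of that equivalence, with made-up numbers:

// With spark.executor.cores = 4, one executor works on up to 4 tasks at the same
// time, so a 4-core executor handles 4 tasks concurrently, not 1 (numbers illustrative).
val executors        = 10
val coresPerExecutor = 4
val maxParallelTasks = executors * coresPerExecutor // 40 tasks can run simultaneously
println(s"Cluster-wide parallelism: up to $maxParallelTasks concurrent tasks")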
@girijapanda1306 3 years ago
7% of 21 GB ≈ 1.4 GB; am I missing something here?
@RAB-fu4rw 3 years ago
7% of 21 GB is 1.47 GB, so how did you arrive at 3 GB?
@komalkarnam1429 3 years ago
Yes, I had the same question.
@divyar7991 2 years ago
For YARN you can choose between 6 and 10 per
@KiranKumar-cg3yg 2 years ago
Means what I know is nothing.
Spark Performance Tuning | Handling DATA Skewness | Interview Question
16:08
Spark Interview Questions | Spark Context Vs Spark Session
9:26
TechWithViresh
19K views
How to Read Spark DAGs | Rock the JVM
21:12
Rock the JVM
24K views
Spark Interview Question | Bucketing | Spark SQL
12:06
TechWithViresh
14K views
Performance Tuning in Spark
14:13
CloudFitness
7K views
Spark Scenario Interview Question | Persistence Vs Broadcast
8:20
TechWithViresh
13K views
Spark Join Without Shuffle | Spark Interview Question
10:42
TechWithViresh
21K views
Apache Spark Memory Management
23:09
Afaque Ahmad
15K views
Advancing Spark - Understanding the Spark UI
30:19
Advancing Analytics
55K views