Spark Interview Question | Speculative Execution in Spark | With Demo | LearntoSpark

17,313 views

Azarudeen Shahul

A day ago

Comments: 25
@jagadeeshkoduri9660 3 years ago
You are better; you really explain things based on scenarios.
@sarjfud 4 years ago
Nice demo and very well explained. Please continue uploading such nice videos on Spark.
@AzarudeenShahul 4 years ago
Thanks for your support :)
@barathy.m5589 4 years ago
Good explanation. Please put up videos on handling duplicates and null values using Spark.
@NishaKumari-op2ek 4 years ago
Can you upload a video on Spark optimization techniques from an interview perspective? This has been asked multiple times. Thank you.
@kondareddyp4144 2 years ago
Hi bro, can you please do videos on clusters and their types, worker nodes, and how a cluster works in Databricks?
@sumitkhattar9849 3 years ago
Great videos
@gurunagendra9690 3 years ago
Hi, what happens if the re-initiated task T2 completes at the same time on both worker nodes? Which task will be considered: T2 on worker node 1 or T2 on worker node 2?
@rushipradhan4704 A year ago
What if I don't set the configs for speculative execution before I begin my Spark application? Once I encounter a long-running task, can I simply open the config file and set the speculative execution configs, or do I need to kill the application and then set them?
@AzarudeenShahul A year ago
Changing the config file while a job is active will not push the changes into that job's configuration. You have two options: either kill the job and retrigger it, or add an inline config step to the long-running job with an if/else check along the lines of: if job_time > threshold, kill the current run and rerun it with spark.conf.set("spark.speculation", "true") applied before the rerun.
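For reference, a minimal PySpark sketch of what that retrigger could look like, assuming the job is restarted through a fresh SparkSession; the app name is hypothetical, and the interval, multiplier, and quantile values shown are the standard Spark speculation properties, to be tuned for the cluster:

    # Hedged sketch: speculation is a core Spark property, so it is normally set
    # when the session (and its SparkContext) is created, not changed mid-job.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("speculation-demo")                    # hypothetical app name
        .config("spark.speculation", "true")            # enable speculative execution
        .config("spark.speculation.interval", "100ms")  # how often Spark checks for slow tasks
        .config("spark.speculation.multiplier", "1.5")  # a task must be 1.5x slower than the median
        .config("spark.speculation.quantile", "0.75")   # check only after 75% of tasks have finished
        .getOrCreate()
    )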
@yaniv54 4 years ago
Very well articulated and explained. Can we consider speculative execution one of the Spark optimization techniques?
@AzarudeenShahul 4 years ago
Spark itself takes care of this. But before enabling it in the config, we should be aware of the cluster's resources. If only a few resources are available in the cluster, it might degrade performance instead.
@yaniv54 4 years ago
@@AzarudeenShahul Thanks for the reply. Can you upload a video on Spark optimization techniques from an interview perspective?
@AzarudeenShahul 4 years ago
Sure, we can plan a series on optimization.
@thesadanand6599 A year ago
Is speculative execution applied at the job level or the stage level?
@pavithranpavi3198 3 years ago
Can a data skew issue be handled by speculative execution?
@nitinagrawal6637 4 years ago
Good one, but I just wonder: if one task is taking time on one node, how can another node complete the same task earlier?
@AzarudeenShahul 4 years ago
Thanks for your support :) It is not about a single long-running task; it can apply to up to 25% of the overall pending tasks.
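The "25%" in the reply lines up with the default spark.speculation.quantile of 0.75: Spark only starts considering backup tasks once 75% of a stage's tasks have finished, i.e. when at most 25% are still pending. A small sketch, assuming an active SparkSession named spark, to inspect those knobs:

    # Hedged sketch: print the effective speculation settings of a running session.
    # The keys are standard Spark properties; the comments show their defaults.
    for key in (
        "spark.speculation",             # default: false
        "spark.speculation.quantile",    # default: 0.75 -> wait until 75% of tasks finish
        "spark.speculation.multiplier",  # default: 1.5  -> task must run 1.5x longer than the median
    ):
        print(key, spark.sparkContext.getConf().get(key, "unset"))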
@nitinagrawal6637 4 years ago
@@AzarudeenShahul Right, but I think that 25% or 30% is what we configure to decide when to trigger speculative execution. So if 25% of the tasks are still pending and taking time, what makes us believe that starting them on another node will get them completed sooner? That is my doubt.
@AzarudeenShahul 4 years ago
There can be many reasons for a task slowing down on one node. A few are: 1. hardware degradation, 2. software misconfiguration, 3. slow data transfer, 4. network traffic, 5. data availability. It can be difficult to detect the cause of the slowness because the task still completes successfully. Spark does not try to diagnose and fix the slow-running task; instead, it detects it and runs a backup task for it. Hope this answers your question.
@nitinagrawal6637 4 years ago
@@AzarudeenShahul Thanks for the response, but I still feel a gap. We start a task on a node only when that node has capacity for it. So in speculative execution we engage two nodes for the same task without being sure about the cause, and we do not cancel the task on the first node either. There is then a good chance the task will be completed by the original node first, even though it was also started on the new node. In your example too, the task was completed by the first node and the task on the new node was then cancelled. So doesn't that point to a wrong understanding of the infrastructure or a wrong system configuration? I am trying to close the gap between reality, expectations, and my understanding.
@AzarudeenShahul 4 years ago
It is not about the capacity of the node; it is the slowness, as I mentioned above. The more you read the application logs, the more you understand; it comes with experience. Also, my example is not a real project: I simulated a slow-running task using sleep, which obviously makes node 1 complete first.
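A minimal PySpark sketch of that kind of sleep-based simulation; the partition index, sleep duration, and data are assumptions for illustration, not the video's exact demo, and spark.speculation is assumed to be enabled on the session:

    # Hedged sketch: artificially slow down one partition so speculation
    # has a straggler task to launch a backup copy for.
    import time

    def slow_partition(index, rows):
        if index == 0:          # pretend partition 0 landed on a degraded node
            time.sleep(60)      # simulate a slow-running task
        return rows

    rdd = spark.sparkContext.parallelize(range(1000), numSlices=8)
    print(rdd.mapPartitionsWithIndex(slow_partition).count())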
@ENTERTAINMENTSERVICE 3 years ago
Thanks man