Spark Interview Question | Speculative Execution in Spark | With Demo | LearntoSpark

17,313 views

Azarudeen Shahul

A day ago

Comments: 25
@jagadeeshkoduri9660 3 years ago
You are better; you really explain things based on scenarios.
@sarjfud 4 years ago
Nice demo and very well explained. Please continue uploading such nice videos on Spark.
@AzarudeenShahul 4 years ago
Thanks for your support :)
@barathy.m5589 4 years ago
Good explanation. Please put up videos on handling duplicates and null values using Spark.
@NishaKumari-op2ek 4 years ago
Can you upload a video on Spark optimization techniques from an interview perspective? This has been asked multiple times. Thank you.
@kondareddyp4144 2 years ago
Hi bro, can you please do videos on clusters and their types, worker nodes, and how a cluster works in Databricks?
@sumitkhattar9849 3 years ago
Great videos
@gurunagendra9690 3 years ago
Hi, what happens if the re-initiated task T2 completes at the same time on both worker nodes? Which task will be considered: T2 on worker node 1 or T2 on worker node 2?
@rushipradhan4704 A year ago
What if I don't set the configs for speculative execution before I begin my Spark application? Once I encounter a long-running task, can I simply open the config file and set the speculative execution configs, or do I need to kill the application and then set them?
@AzarudeenShahul A year ago
Changing the config file while a job is active will not push the changes into that job's configuration. You have two options: either kill the job and retrigger it, or add an inline config step to the long-running job with an if/else check along the lines of: if job_time > threshold, kill the current run and rerun it with spark.conf.set("spark.speculation", "true") applied before the rerun.
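For reference, a minimal PySpark sketch of what that retrigger could look like, assuming the job is restarted through a fresh SparkSession; the app name is hypothetical, and the interval, multiplier, and quantile values shown are the standard Spark speculation properties, to be tuned for the cluster:

    # Hedged sketch: speculation is a core Spark property, so it is normally set
    # when the session (and its SparkContext) is created, not changed mid-job.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("speculation-demo")                    # hypothetical app name
        .config("spark.speculation", "true")            # enable speculative execution
        .config("spark.speculation.interval", "100ms")  # how often Spark checks for slow tasks
        .config("spark.speculation.multiplier", "1.5")  # a task must be 1.5x slower than the median
        .config("spark.speculation.quantile", "0.75")   # check only after 75% of tasks have finished
        .getOrCreate()
    )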
@yaniv54 4 years ago
Very well articulated and explained. Can we consider speculative execution one of the Spark optimization techniques?
@AzarudeenShahul 4 years ago
Spark itself takes care of this. But before enabling it in the config, we should be aware of the cluster's resources. If only a few resources are available in the cluster, it might degrade performance instead.
@yaniv54 4 years ago
@@AzarudeenShahul Thanks for the reply. Can you upload a video on Spark optimization techniques from an interview perspective?
@AzarudeenShahul 4 years ago
Sure, we can plan a series on optimization.
@thesadanand6599 A year ago
Is speculative execution applied at the job level or the stage level?
@pavithranpavi3198 3 years ago
Can a data skew issue be handled by speculative execution?
@nitinagrawal6637 4 years ago
Good one, but I just wonder: if one task is taking time on one node, how can another node complete the same task earlier?
@AzarudeenShahul 4 years ago
Thanks for your support :) It is not about a single long-running task; it can apply to up to 25% of the overall pending tasks.
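The "25%" in the reply lines up with the default spark.speculation.quantile of 0.75: Spark only starts considering backup tasks once 75% of a stage's tasks have finished, i.e. when at most 25% are still pending. A small sketch, assuming an active SparkSession named spark, to inspect those knobs:

    # Hedged sketch: print the effective speculation settings of a running session.
    # The keys are standard Spark properties; the comments show their defaults.
    for key in (
        "spark.speculation",             # default: false
        "spark.speculation.quantile",    # default: 0.75 -> wait until 75% of tasks finish
        "spark.speculation.multiplier",  # default: 1.5  -> task must run 1.5x longer than the median
    ):
        print(key, spark.sparkContext.getConf().get(key, "unset"))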
@nitinagrawal6637 4 years ago
@@AzarudeenShahul Right, but I think that 25% or 30% is what we configure to decide when to trigger speculative execution. So if 25% of the tasks are still pending and taking time, what makes us believe that starting them on another node will get them completed sooner? That is my doubt.
@AzarudeenShahul 4 years ago
There can be many reasons for a task slowing down on one node. A few are: 1. hardware degradation, 2. software misconfiguration, 3. slow data transfer, 4. network traffic, 5. data availability. It can be difficult to detect the cause of the slowness because the task still completes successfully. Spark does not try to diagnose and fix the slow-running task; instead, it detects it and runs a backup task for it. Hope this answers your question.
@nitinagrawal6637 4 years ago
@@AzarudeenShahul Thanks for the response, but I still feel a gap. We start a task on a node only when that node has capacity for it. So in speculative execution we engage two nodes for the same task without being sure about the cause, and we do not cancel the task on the first node either. There is then a good chance the task will be completed by the original node first, even though it was also started on the new node. In your example too, the task was completed by the first node and the task on the new node was then cancelled. So doesn't that point to a wrong understanding of the infrastructure or a wrong system configuration? I am trying to close the gap between reality, expectations, and my understanding.
@AzarudeenShahul 4 years ago
It is not about the capacity of the node; it is the slowness, as I mentioned above. The more you read the application logs, the more you understand; it comes with experience. Also, my example is not a real project: I simulated a slow-running task using sleep, which obviously makes node 1 complete first.
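A minimal PySpark sketch of that kind of sleep-based simulation; the partition index, sleep duration, and data are assumptions for illustration, not the video's exact demo, and spark.speculation is assumed to be enabled on the session:

    # Hedged sketch: artificially slow down one partition so speculation
    # has a straggler task to launch a backup copy for.
    import time

    def slow_partition(index, rows):
        if index == 0:          # pretend partition 0 landed on a degraded node
            time.sleep(60)      # simulate a slow-running task
        return rows

    rdd = spark.sparkContext.parallelize(range(1000), numSlices=8)
    print(rdd.mapPartitionsWithIndex(slow_partition).count())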
@ENTERTAINMENTSERVICE 3 years ago
Thanks man