The best explanation I've found so far on KZbin... easily explained.
@Rafian1924 2 years ago
You are the best trainer on KZbin bro. Keep up the good work.
@rajasekaranm1198 2 months ago
Beautiful explanation... thank you.
@anujasebastian8034 3 years ago
I've been looking at so many videos... It is only now that I got the concept. Thanks so much for the explanation.
@learnwithfunandenjoy3143 2 years ago
Excellent explanation... Great video to learn the concept in such a simple way. Please make more videos so that we can learn all such concepts easily. Thanks.
@ashutoshranghar2952 6 years ago
Bro, best explanation, wow! Also, do you have a video explaining the entire spark-submit command: how the worker nodes are created and how data is distributed across multiple partitions, tasks, and jobs? It would be really helpful.
@rajeshguddati210 2 years ago
Thank you sir, for the simple example.
@svcc7773 6 years ago
It's a clear and nice explanation. This is one of the best videos so far on this concept, thanks.
@VivekKBangaru 1 year ago
Clear explanation, thanks buddy.
@abhishekfulzele3148 2 years ago
In addition to the Resilient Distributed Dataset (RDD) interface, the second kind of low-level API in Spark is two types of “distributed shared variables”: broadcast variables and accumulators. These are variables you can use in your user-defined functions (e.g., in a map function on an RDD or a DataFrame) that have special properties when running on a cluster. Specifically, accumulators let you add together data from all the tasks into a shared result (e.g., to implement a counter so you can see how many of your job’s input records failed to parse), while broadcast variables let you save a large value on all the worker nodes and reuse it across many Spark actions without re-sending it to the cluster.
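A minimal PySpark sketch of the two shared variables described above, assuming a local Spark installation. The lookup table, the input records, and the helper names (`parse`, `run_demo`) are invented for illustration. The Spark import sits inside `run_demo` so that the pure parsing helper can be used on its own.

```python
# Hypothetical lookup table and input records, invented for illustration.
COUNTRY_NAMES = {"USA": "United States", "IND": "India"}
RECORDS = ["USA,10", "IND,20", "bad record", "USA,5"]


def parse(line, names):
    """Return (country_name, amount), or None if the line does not parse."""
    parts = line.split(",")
    if len(parts) != 2 or parts[0] not in names or not parts[1].isdigit():
        return None
    return names[parts[0]], int(parts[1])


def run_demo():
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]").appName("shared-vars").getOrCreate()
    )
    sc = spark.sparkContext

    names = sc.broadcast(COUNTRY_NAMES)  # read-only value shipped to the executors
    failed = sc.accumulator(0)           # tasks add to it; the driver reads it

    def parse_and_count(line):
        parsed = parse(line, names.value)  # .value reads the broadcast data
        if parsed is None:
            failed.add(1)                  # counts records that failed to parse
            return []
        return [parsed]

    good = sc.parallelize(RECORDS, 2).flatMap(parse_and_count).collect()
    print(good, "failed:", failed.value)   # failed.value is read on the driver
    spark.stop()
```

Calling `run_demo()` runs the job. Because the accumulator here is updated inside a transformation, Spark may apply a task's update more than once if that task is re-executed; the exactly-once guarantee covers only updates made inside actions.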
@ca20215 3 years ago
Excellent explanation.
@kurakularajesh4617 3 years ago
super bayya, nice explanation
@arunasingh8617 2 years ago
It's informative. Can you also let us know in what situations accumulators are useful?
@afaque67 4 years ago
Hi, many people have questions about how the accumulator gets updated. The accumulator variable on each worker node is a local copy, and there is a global copy on the driver node that can be read only by the driver process. Hence each worker node returns its count of blank lines to the driver process, and the driver process accumulates them and updates the global copy.
@svcc7773 3 years ago
Exactly
@architsoni89 3 years ago
Yes true, this explanation is half cooked
@Shubhaarti2501 3 years ago
Excellent Teaching
@mangeshpatil714 3 years ago
Nice explanation sir.. 👌👌👍👍
@rajatsaha891 3 years ago
Awesome explanation
@kishorekumar2769 6 years ago
Excellent video bro. Great explanation and very thorough.
@bharathkumar-eg3gc 6 years ago
You said that the accumulator value is updated on each worker node. Does worker node 2 wait until worker node 1 has finished updating its empty-line count, since both are updating the same value? As a Spark job runs in parallel, how can it get updated sequentially?
@hiItsEshikahere 5 years ago
I have the same question as well.
@airesearch8057 4 years ago
@hiItsEshikahere I think each worker has its own version of the accumulator (a local accumulator). Each worker updates the state of its own local accumulator, and when the workers finish processing, the local accumulators are sent back to the driver, which aggregates them all into the global accumulator.
@harshadborkar2550 1 year ago
@airesearch8057 This is the correct answer. Workers keep their own local copies, and once the work is done they send the results back to the driver node, where they get merged.
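The model described in this thread (a private counter per task, merged on the driver once the task finishes) can be sketched in plain Python. This is only an illustration of the idea, not Spark's implementation; the partitions and the names `run_task` and `driver` are invented, and threads stand in for parallel tasks.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions of a text file; "" stands for a blank line.
partitions = [
    ["a", "", "b"],
    ["", "", "c"],
    ["d"],
]


def run_task(lines):
    """One task: count blank lines in its own partition using a local counter."""
    local_count = 0          # private to this task, so no locking is needed
    for line in lines:
        if line.strip() == "":
            local_count += 1
    return local_count       # sent back to the driver when the task finishes


def driver(parts):
    """Run the tasks in parallel, then merge their local counts on the driver."""
    total = 0
    with ThreadPoolExecutor(max_workers=3) as pool:
        for local_count in pool.map(run_task, parts):
            total += local_count   # merging happens only here, one result at a time
    return total


blank_lines = driver(partitions)
```

Since no task ever touches another task's counter, tasks do not wait for each other, and blank lines found "at the same time" on different workers cannot overwrite one another: they are summed on the driver.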
@drdee94 5 years ago
Excellent explanation!
@atheerabdullatif7557 3 years ago
amazing!
@dhananjayreddy9998 2 years ago
When the data is analyzed in parallel, how do the accumulators get incremented? For example, if partition 1 has one blank line and partition 2 has one blank line, and the two are processed simultaneously, both partitions could update the accumulator to 1, right? Could you please clarify?
@prabuchandrasekar3437 5 years ago
Thanks for the clear explanation
@adarshnigam75 5 years ago
Awesome explanation..!!
@soutammandal8839 5 years ago
Bro, you're a champ, nice explaining.
@BetterLifePhilosophies 5 years ago
Yes, thank you. My question is: how is the situation handled if we encounter blank lines at the same time on three worker nodes?
@mayankvijay3436 5 years ago
I don't think the broadcast variable example you showed, where w1 contains only USA and w2 only IND, is correct. Data is distributed in random fashion, and the code map can be used as a lookup within each worker. Please correct me if my understanding is wrong.
@chetan30081991 3 years ago
I think since the broadcast variable is small, the complete code map is shared with all workers without splitting it up.
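The point made in these two comments (every worker gets the whole code map, independent of how the data is partitioned) can be modeled in plain Python. This is a sketch of the idea rather than Spark itself; the code map, the partitions, and `run_task` are invented, and `MappingProxyType` stands in for the read-only contract of a broadcast value.

```python
from types import MappingProxyType

# Hypothetical code map and partitions, invented for illustration.
code_map = MappingProxyType({"USA": "United States", "IND": "India"})
partitions = [["USA", "IND", "USA"], ["IND", "USA"]]


def run_task(codes, lookup):
    """One task: translate every code in its partition using the full lookup."""
    return [lookup[code] for code in codes]


# Each task receives the same complete, read-only map, whatever its partition holds.
results = [run_task(part, code_map) for part in partitions]
```

Both partitions contain a mix of codes, and both tasks resolve them against the same full map.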
@bhavaniv1721 4 years ago
Thanks for sharing such a nice video. Can you please share Spark Scala training videos with me?
@merimihelmi8626 5 years ago
Thanks for this explanation.
@kashishshah8417 4 years ago
Can I have the accumulator variable pass its value to a broadcast variable? For example, some worker nodes update the accumulator variable, which is copied to a broadcast variable and in turn read by some other worker nodes.
@haveafuninlife 3 years ago
A broadcast variable is immutable. Once you broadcast it from the driver node, the value of the variable is sent to all the worker nodes. Workers can only read the value.
@shreyash18 2 years ago
Timestamp 3.55, spark-submit... You didn't mention the cluster manager's role in the spark-submit background process. As you mentioned, the driver program starts and connects to the workers... yet the driver connects to the cluster manager, and the cluster manager connects to the workers.
@svcc7773 3 years ago
Didn't mention how to retrieve a record from a broadcast variable.
@architsoni89 3 years ago
This is not the correct explanation for Accumulator variables from the start. Kindly edit the video to add factual information
@ssssssssssss5026 4 years ago
This guy said the driver will create worker nodes. I think he should review his video before posting. Everyone just wants to make money by starting their own channel but doesn't want to spend time making quality videos.