Spark Scenario Interview Question | Persistence Vs Broadcast

12,947 views

TechWithViresh

1 day ago

Comments: 28
@mrkrish501 4 years ago
This scenario was not clear when I went through other videos, but after your explanation I understood the difference. Excellent.
@rameshthamizhselvan2458 4 years ago
Excellent... Billion-dollar video...
@sumitgandhi628 3 years ago
Thanks for explaining it in an easy way :)
@srinubathina7191 1 year ago
Super content, thank you
@umamahesh2307 4 years ago
Data in memory and data on disk will not occupy the same space. So if it is 12 GB on disk, in memory it can be 18 GB.
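The point above can be modeled with simple arithmetic. A minimal sketch, assuming the commenter's illustrative figures (12 GB on disk growing to ~18 GB deserialized in memory, i.e. a 1.5x expansion factor); real ratios vary with schema, compression, and serialization format:

```python
# Illustrative model of on-disk vs in-memory size for a persisted DataFrame.
# The 1.5x expansion factor is only the figure from the comment above
# (12 GB on disk -> ~18 GB in memory); it is not a Spark constant.
def in_memory_size_gb(on_disk_gb: float, expansion_factor: float = 1.5) -> float:
    return on_disk_gb * expansion_factor

print(in_memory_size_gb(12))  # 18.0
```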
@gemini_537 3 years ago
Is it approximately 12 GB after persisting? Is there any significant overhead when the data is in memory?
@Jayaditya87 3 years ago
If we persist with serialization, we will have CPU overhead; otherwise there is no such overhead.
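A plain-Python analogy (not Spark API) for the trade-off in this reply: storing data serialized is more compact, but every access pays a deserialization cost in CPU, much like Spark's `MEMORY_ONLY_SER` storage level versus `MEMORY_ONLY`:

```python
import pickle

# Keep data two ways: raw Python objects (fast to read, larger footprint)
# vs a pickled byte string (compact, but each read pays a deserialization
# cost) -- analogous to MEMORY_ONLY vs MEMORY_ONLY_SER in Spark.
rows = [(i, f"user_{i}") for i in range(1000)]

serialized = pickle.dumps(rows)      # one-time CPU cost to serialize
restored = pickle.loads(serialized)  # CPU cost again on every access

assert restored == rows              # same data either way
print(len(serialized), "bytes when serialized")
```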
@HemanthKumar-cm9lv 4 years ago
Hi Viresh, thanks for the video. Can you confirm the following statement: in persist, each executor saves its partitions of the data frame in memory, while in broadcast each executor saves the entire data frame in memory?
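The distinction in this question can be sketched with the thread's numbers (assumed here: a 12 GB DataFrame spread over 3 executors):

```python
# Per-executor memory under the two approaches, using the thread's figures
# (assumed: 12 GB DataFrame, 3 executors, evenly spread partitions).
df_size_gb = 12
num_executors = 3

# persist: each executor holds only its own share of the partitions
persist_per_executor = df_size_gb / num_executors   # 4.0 GB each

# broadcast: each executor holds a full copy of the DataFrame
broadcast_per_executor = df_size_gb                 # 12 GB each

print(persist_per_executor, broadcast_per_executor)
```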
@mateen161 4 years ago
Hi Viresh, thanks for this nice video. I believe a broadcast variable is used to broadcast a small table and join it with a huge table, which avoids shuffling. What happens if we broadcast a table with a larger number of columns into the executors? Assume the broadcast table is larger in size because it has more columns.
@dipanjansaha6824 4 years ago
You would not be able to leverage the benefit of a broadcast join in that case. Even if you forcefully enable broadcast, you would not notice much improvement.
@mateen161 4 years ago
@@dipanjansaha6824 Sounds good... Thanks
@TechWithViresh 4 years ago
This would fill up the memory quickly and can result in an out-of-memory issue; that's why the default size for auto broadcast is 10 MB. Also, as discussed in a separate video, the memory footprint of a broadcast becomes 4 times the actual size. Thanks
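The arithmetic behind this reply, as a sketch: Spark's auto-broadcast cutoff is controlled by `spark.sql.autoBroadcastJoinThreshold` (default 10 MB), and the reply's "4 times the actual size" figure means even a table at the threshold can occupy far more memory once broadcast:

```python
# Footprint check per the reply above: the in-memory footprint of a
# broadcast is taken as ~4x the table's actual size (the video's figure),
# and Spark's auto-broadcast default (spark.sql.autoBroadcastJoinThreshold)
# is 10 MB.
AUTO_BROADCAST_THRESHOLD_MB = 10

def broadcast_footprint_mb(table_mb: float, blowup: float = 4.0) -> float:
    return table_mb * blowup

# A 10 MB table can end up ~40 MB in memory once broadcast:
print(broadcast_footprint_mb(10))               # 40.0
# A wide 25 MB table is over the threshold, so it won't be auto-broadcast:
print(25 > AUTO_BROADCAST_THRESHOLD_MB)         # True
```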
@mateen161 4 years ago
Thanks Viresh for the information.
@hiteshpatil5906 2 years ago
Why are you taking only 3 executors here?
@mahendramaurya3208 4 years ago
Why is the number of partitions three in the case of a broadcast join? Can we keep it lower, like 1 or 2, and why not keep all the partitions in a single executor?
@TechWithViresh 4 years ago
It would throw an out-of-memory error in that case for sure.
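A quick sketch of why a single executor can't hold everything, assuming the figures mentioned elsewhere in this thread (4 GB of memory per executor, a 12 GB DataFrame):

```python
# Why forcing all partitions onto one executor fails: with the thread's
# assumed figures (4 GB per executor, 12 GB DataFrame), one executor
# would need 3x its available memory.
executor_memory_gb = 4
df_size_gb = 12

fits = df_size_gb <= executor_memory_gb
print(fits)  # False -> out of memory if forced onto a single executor
```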
@gemini_537 3 years ago
In the case of broadcast, why do we have to include the 12 GB of the existing DF? I feel it is unfair to compare persist with broadcast. Is it possible to avoid the 12 GB?
@Jayaditya87 3 years ago
It is the primary/parent data from which we created the copies shared with the executors. Until the application ends, this parent data is still part of the program.
@siddhantpathak6289 4 years ago
Hi Viresh, I am new to Spark, so cut me some slack for asking newbie questions.
In persistence, the data frame is held either in memory or on disk. Suppose the data frame from the data lake was held in executor memory, which in the given case is 4 GB, and it is completely occupied. If I now want to read another data frame, how will the executor deal with it, since its memory is already occupied by the previously persisted data frame?
In broadcast, the memory footprint is said to be 4 times the data frame's size. Where does this memory come from, since each executor has only 4 GB? I also read somewhere that after garbage collection only 3 times is left. Why is that?
In persistence, the data is said to be stored in memory. Is that just the executor memory, which is 4 GB, or the entire system memory? Thanks in advance.
@suryasatish5581 4 years ago
A broadcast variable is one copy per node, right? Why will it be 36 GB?
@MrVivekc 4 years ago
What I understood: in the case of a broadcast variable, the 12 GB of data is copied to all nodes, 12 GB per node, thus adding up to 36 GB.
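The cluster-wide totals in this clarification can be sketched directly (assuming, per the thread, a 12 GB DataFrame and 3 nodes):

```python
# Cluster-wide memory per the clarification above: persist spreads the
# partitions (one 12 GB copy total), while broadcast places a full 12 GB
# copy on each of the 3 nodes.
df_size_gb = 12
num_nodes = 3

persist_total = df_size_gb                # partitions spread: 12 GB total
broadcast_total = df_size_gb * num_nodes  # full copy per node: 36 GB total

print(persist_total, broadcast_total)  # 12 36
```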
@aneksingh4496 4 years ago
Hey Viresh, where do you study these concepts from? Please share resources.
@anantababa 4 years ago
Not clear... you have only explained the persistence part.
@ITEraTech 3 years ago
No voice clarity.
@vkd9442 4 years ago
Not clearly explained.
@TechWithViresh 4 years ago
Please post your doubt/question and we will try to answer it. Thanks for the feedback.
@skms31 4 years ago
@Technical Tutorials: Please be specific when giving feedback so people can help you! Be as specific as possible, please.