Spark Memory Management | How to calculate the cluster Memory in Spark

Рет қаралды 13,035

SparklingFuture

Күн бұрын

Пікірлер: 26

@venukumargadiparthy8233 6 ай бұрын

Much needed for interviews, Thanks for sharing sravana.

@Sharath_NK98 9 ай бұрын

Well explained Laxmi.

@aratithakare8016 2 жыл бұрын

very good explanation of concepts

@sravanalakshmipisupati6533 2 жыл бұрын

Thanks a lot.

@hamidkureshi6722 Жыл бұрын

A great explanation! Thank you so much.

@sravanalakshmipisupati6533 Жыл бұрын

Thanks a lot.

@heenagirdher6443 2 жыл бұрын

Thank you so much for this video. Very well explained. Can you please make more videos related to interview questions on this topic.

@sravanalakshmipisupati6533 2 жыл бұрын

Thank you. Sure Heena.

@leedsshri 2 жыл бұрын

-num executor should be 15 right ? We need to give number of executors to be used in total and not per node. Pls correct me if I am wrong.

@sravanalakshmipisupati6533 2 жыл бұрын

total num of executors will be calculated using the formulae - total cores/number of cores per executor. Then this number will be divided by total nodes in the cluster to get the number of executors per node. Here 15 is for total number of executors.

@universaltv6798 Жыл бұрын

say if we want to process 1 tb data with a given cluster capacity in your example 1. when we may get OOM (executory) issue 2. when we will not get OOM issue 3. how spark can do sort merge shuffle join (500gb per df, 2 dfs) 4. briefly explain, how come spark handles big data without OOM issues and when it may get OOM with examples along with code

@sarukavi1007 3 жыл бұрын

👏

@RaXiUs007 3 ай бұрын

a small correction i think --num-executors is across all nodes in cluster so it should not be 3 it should be 3 * 5 = 15

@HollyJollyTolly 2 жыл бұрын

Even if we have 1GB input data shall I consider same parameters

@sravanalakshmipisupati6533 2 жыл бұрын

Hi Mahesh, the memory parameters are set at cluster level. Please check this video for processing large files kzbin.info/www/bejne/qHPbg3WhZ7-ejM0

@udaynayak4788 2 жыл бұрын

can you please name those tools which handle the memory optimization?

@sravanalakshmipisupati6533 2 жыл бұрын

Spark internal mechanism performs all the memory optimizations. We can update the configuration for explicitly applying memory configurations which will inturn does optimizations.

@manjunathbn9513 Жыл бұрын

In Spark 3.0 onwards, these calculations are done by default by spark right?

@sravanalakshmipisupati6533 Жыл бұрын

Hi, in any version , Spark is capable of calculating memory as per the tasks list. If we want to customize based on data then we need to provide these custom calculated values. Even in Spark 3, it provides default values, but its always better to check the UI, analyse the jobs to see that there si no data skewness, no delays etc and we can use the custom configuration in spark-submit.

@antonyjesu7698 3 жыл бұрын

Kindly can u say different between application master and driver? Y r we giving less memory(2gb) to driver?

@sravanalakshmipisupati6533 3 жыл бұрын

Thank you for watching the video. Driver is the spark program where you have created the spark context or spark session. Application master is the process which negotiates the resources with resource manager about the executors. The driver memory mentioned as example. You can provide the driver memory as per your requirement. In some cases, driver memory will be kept as same as executor memory also. Its depend on the cluster memory and the resource sharing.

@RajuKundu-c3j Жыл бұрын

if we request a container of 4gb then we are actually requesting 4gb(heap memory) + max(384,10% of 4gb)[off heap memory] out off the 4gb (total heap memory) 300mb reserved for running executers. 4096-300=3796(3.7gb) out of this 3.7 gb, 60% of it goes to unified( storage+ execution memory ). 2.3 gb is for (storage+execution) remaining 40 % of 3.7gb goes to user memory(i.e. 1.4 gb) I am not able relate with your calculation kindly help me. I have checked multiple video but still not able to understand. how do I calculate cluster memory. kindly help

@sravanalakshmipisupati6533 Жыл бұрын

Yes, 4gb will be divided for execution+ cache+ overhead memory. As 4 GB is very small in number we see less memory allocation. Think from production cluster perspective, where we will have memory in TBs. In this case, we can see that enough memory is available for job execution and cache memory to Store the intermediate results.

@antonyjesu7698 3 жыл бұрын

Sorry to ask..again.... suppose the question is raw process data is 10gb. Based on raw data 10 gb how to calculate memory? Plz

@sravanalakshmipisupati6533 3 жыл бұрын

No problem, thank you for asking the questions. When you are looking based on the input data, you need to think about parallelism. Hdfs block size, and no of partitions to be counted. So for calculating the executor memory, leave 1 core abs 1gb per node to yarn. In the remaining memory, calculate the per executor memory. Executor cores are 5 per cpu. So calculate the memory as, how much per code you can divide and then multiply that with 10(file memory). Example, if you have 5 nodes and each 15 cores. So total you will have 85 cores and let's say 64gb memory. So for 5 nodes total memory is - 64*5=320gb. Leave 1 core and 2 gb to yarn. So we will have 70 cores and 310gb memory (for all 5 nodes). So the remaining resources will be - 310/70=4.42. We will have 5 executor cores, so 5*4.42 = 22.14 gb executor memory. I put data will be divided as partitions so we can calculate executor memory in this way.

@antonyjesu7698 3 жыл бұрын

Thanks.