Spark Executor Tuning | Decide Number Of Executors and Memory | Spark Tutorial Interview Questions

  92,534 views

Data Savvy


6 years ago

As part of our Spark interview question series, we want to help you prepare for your Spark interviews. We will discuss various topics in Spark, such as lineage, reduceByKey vs groupByKey, YARN client mode vs YARN cluster mode, etc. In this video we cover
the role of the Spark context.
Please subscribe to our channel.
Here is a link to other Spark interview questions
• 2.5 Transformations Vs...
Here is a link to other Hadoop interview questions
• 1.1 Why Spark is Faste...

Comments: 184
@rishi190 5 years ago
Number of cores = number of concurrent tasks an executor can run. So we might think that more concurrent tasks per executor will give better performance. But research shows that any application running more than 5 concurrent tasks per executor tends to perform badly, so stick to 5. This number comes from the executor's ability to run tasks, not from how many cores the system has, so it stays at 5 even if you have double (32) the cores in the CPU.
@DataSavvy 5 years ago
Thanks for the input... useful for the community.
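The rule of thumb above can be sketched numerically (an illustration of the commenter's point, not an official Spark formula): with cores per executor pinned at 5 and one core per machine reserved for the OS, doubling the machine's cores doubles the executor count rather than the cores per executor.

```python
# Executors per machine when cores-per-executor is pinned at 5
# (1 core per machine reserved for OS/Hadoop daemons, as in the video).
CORES_PER_EXECUTOR = 5

def executors_per_machine(machine_cores: int) -> int:
    usable = machine_cores - 1              # leave 1 core for the OS
    return usable // CORES_PER_EXECUTOR

print(executors_per_machine(16))  # 16-core machine -> 3 executors
print(executors_per_machine(32))  # 32-core machine -> 6 executors, still 5 cores each
```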
@shaiksubhani9802 5 years ago
Really appreciate your efforts.. thanks for sharing your knowledge.
@manieshsh 5 years ago
You are doing exceptional work. Love 👍🏻
@DataSavvy 5 years ago
Thanks Manish :)
@ananthasubramanian7355 3 years ago
Basic but important concept w.r.t Spark. Well explained...!! Thank you for this video..!!
@DataSavvy 3 years ago
Thanks Avantha :)
@baritos4 4 years ago
Hi - I have a question regarding the reserved memory (300 MB) that is not available for any Spark allocation. During your calculation of 64 GB per node, why wasn't this 300 MB deducted? You deducted 1 GB for OS operations (did that include this reserved 300 MB)? Also, if we enable spark.memory.offHeap, should we deduct the off-heap memory size from the 64 GB as well?
@sandeepmaheshuni 5 years ago
Can you elaborate on how many cores and how much memory need to be allocated individually in the spark-submit configuration, since we have driver/executor memory and driver/executor cores? This is a much-expected interview question.
@abhinavsingh9333 5 years ago
Nice video, well explained.
@karanrewari6482 3 years ago
Thanks for the great video... it really helped me understand things at a fundamental level. One question: given the same context, how would you fit in the driver's memory requirements, assuming you need 4 GB of driver memory? How about the cores for it as well? The driver memory only needs to be on 1 machine, while the others can play the role of workers (how is this configured?).
@Sunkarakrwutarth9389 5 years ago
Thanks for your video. How do we decide the number of workers for a Spark cluster in the Databricks environment?
@remoram8141 5 years ago
Hi, we are running HDFS/Spark on the same machines and have 7 nodes in total in the cluster, each node having 94 GB RAM and 16 cores. How much resource do we need to give to the Spark daemons and HDFS? We are getting heavy memory usage alerts from the Spark workers. Can you please help me with this?
@amruthpesari7922 3 years ago
Hi Sir, could you please elaborate on "contention between threads for IO from HDFS", which is the answer to "why is 5 the number of cores per executor"?
@arryanderson 3 years ago
I have a question. If 1 core is assigned for OS operations on each machine, doesn't that mean 4 GB is assigned for the OS? I'm a little confused about how we arrived at 1 GB for the OS per machine. Also, if 21 GB per executor is divided among 5 cores, does that mean each core has 4.2 GB at its disposal?
@sjdee13 5 years ago
Please make a video on debugging Spark code in an IDE like Eclipse. Thanks.
@zillala4553 3 years ago
Great... it's such a clear explanation 👍👍
@sundaramoorthysivasamy9784 4 years ago
Hi, if it is a shared cluster and we are provided with a resource queue of 1000 cores and 4000 GB memory, and we have 10 hourly jobs that need to run in that queue, how do we set the executor cores and memory for each job? Some jobs might read TB volumes, some GBs.
@lavanyareddy310 3 years ago
Great video sir, thank you so much for all your videos.
@DataSavvy 3 years ago
Thanks Lavanya :)
@nimishbajaj6247 3 years ago
Can there be scenarios where 5 cores per executor is a bad design, say when we need more memory per executor? In that case, it might be better to have 3 executors in total, giving each executor all the cores and memory we can. (Does this scenario occur when the data is not properly partitioned / cannot be equally partitioned?)
@study-channel6301 2 years ago
What if there are a lot of Spark jobs running in the same cluster being developed by different developers with the same machine configurations? What considerations should be taken in that case?
@ibrarahmed8563 5 years ago
I get the concept of the least and most granular approaches, but in both approaches we have no resources left. The points you made for the most granular approach can also be made for the least granular approach (the first approach), because we have no resources left there either. Please explain!
@phanikumar4915 5 years ago
Hi, is it possible to decide which executor/machine our job should run on?
@mubasshirali1986 4 years ago
Thank you so much for the informative video, Sir. You said there is a formula for allocating the YARN overhead memory, and as a kind reminder, you said at 8:27 that you would share it in the comments/description... Thanks in advance.
@shubhamshingi4657 3 years ago
Many people keep it at 7% of executor memory, but nowadays 10% is also used.
@mubasshirali1986 3 years ago
@@shubhamshingi4657 Thanks for the knowledge Sir 🙏
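For reference, the YARN-side overhead discussed here is commonly described as max(384 MB, a fraction of executor memory); the 7-10% factor mentioned above is cluster-dependent, so treat this sketch as an assumption rather than the video's exact formula.

```python
# Commonly cited overhead rule: max(384 MB, factor * executor memory).
# The factor (7-10%) depends on your cluster's defaults.
def memory_overhead_mb(executor_memory_mb: int, factor: float = 0.10) -> int:
    return max(384, int(executor_memory_mb * factor))

print(memory_overhead_mb(21 * 1024))       # 21 GB executor -> 2150 MB (~2 GB)
print(memory_overhead_mb(2 * 1024, 0.07))  # small executor -> the 384 MB floor kicks in
```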
@chiranjeevikatta8116 3 years ago
So if we allocate all resources to a single Spark job, what happens when I have to run other jobs in parallel?
@saurabhpant44 5 years ago
Hello, thanks for a very clear explanation of the concept. I have a question though. You mentioned that if we use a lot of cores for a single executor, all the cores will compete with each other to read data from HDFS and cause IO contention. But how is that different from having 5 cores each for 3 executors? Those 3 executors, and hence 15 cores, will still try to read data from HDFS in parallel. Isn't it, or am I missing something here?
@sachinyadahalli5745 5 years ago
Yes, I had the same question in my mind. Could anyone comment on this?
@peace-ye2lm 2 years ago
We can request many parallel tasks for an executor, but Spark decides internally that only a certain number of tasks will run in parallel at any point in time; the rest wait until the running tasks complete. Once a task completes, one that was waiting starts running.
@raghuram9130 2 years ago
Hi, you decided the number of executor cores to be 5. Can you please confirm how you arrived at that value?
@sunilsambarekar3144 5 years ago
Thank you, this is very helpful
@DataSavvy 5 years ago
Thanks Sunil... please provide feedback on the other videos too.
@udaykumarreddygajjala7242 4 years ago
I have a doubt: if a Spark job is running, how do we check how much memory we allocated to that job, apart from the YARN UI?
@lazyengineer1994 3 years ago
$ yarn top
@rameshbabuy9254 3 years ago
Can you configure the below properties based on the example cluster in the video?
spark.executor.memory =
spark.executor.cores =
spark.executor.instances =
spark.yarn.executor.memoryOverhead =
spark.default.parallelism =
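One hypothetical way those properties could be filled in from the video's per-node figures (16 cores, 64 GB RAM). The 4-node cluster size here is an assumption made for illustration, not something stated in the video.

```python
# Hypothetical settings derived from the video's per-node figures.
# NODES = 4 is an assumed cluster size, not from the video.
NODES = 4
cores_per_node, ram_per_node_gb = 16, 64

executors_per_node = (cores_per_node - 1) // 5                      # 1 core for OS -> 3
mem_per_executor_gb = (ram_per_node_gb - 1) // executors_per_node   # 1 GB for OS -> 21
overhead_gb = 2                                                     # ~7-10% of 21 GB

conf = {
    "spark.executor.cores": 5,
    "spark.executor.memory": f"{mem_per_executor_gb - overhead_gb}g",  # 19g heap
    "spark.executor.instances": executors_per_node * NODES - 1,        # 1 left for driver
    "spark.yarn.executor.memoryOverhead": f"{overhead_gb}g",
    "spark.default.parallelism": (executors_per_node * NODES - 1) * 5 * 2,
}
print(conf)
```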
@prakashmudliyar4834 3 years ago
Can dynamic resource allocation take care of all this?
@ankush_sagane 3 years ago
Can you please cover dynamic memory allocation as well?
@shyamsundar8665 3 years ago
Please do create a video explaining 1) YARN memory overhead (on- and off-heap): what it is and its significance, and 2) garbage collection: what this memory is and how to resolve the GC overhead limit exceeded error.
@tejasriin 5 years ago
Hi, could you please give a good suggestion for my cluster: 12 nodes with 256 GB memory, 56 cores, and 14 TB of disk each?
@DataSavvy 5 years ago
You need to understand the nature of your job to say precisely what configuration it requires... Your job can be CPU-heavy or IO-heavy; assign resources based on that.
@maheshkumarpurushothaman6167 5 years ago
It is a really helpful video. If I run a Spark job with shuffle partitions, would that need to be considered for additional cores/memory for the executors?
@DataSavvy 5 years ago
You should always try to reduce shuffle... Increasing cores will not help here. Increasing the memory overhead for the shuffle service will help, but shuffle is still an expensive operation.
@Raghav54321 5 years ago
Please share the formula, as you mentioned in the video.
@user-he9nw7th8u 4 years ago
Spark, Kafka and Scala: how to describe the roles and responsibilities?
@sachinyadahalli5745 5 years ago
Don't we have to decide cores or memory for the driver, or is that handled by Spark itself? I have seen many jobs where we specify --driver-memory 10G etc. How about the cores for the driver?
@DataSavvy 5 years ago
You are right, driver memory can also be set for your job... It is generally used when you collect big data in your job or your driver is very memory-intensive. It is not considered a good thing: if you are trying to do your processing in the driver, it means you are not using the power of Spark properly.
@ankitsrivastava2914 2 years ago
Interview question: if a job is running for 1 hour in production, what will you do?
@santoshkumar-jd9ns 3 years ago
Is there any formula behind taking the executor cores to be 5?
@RAVIKUMAR74 5 months ago
How do we decide/calculate the number of executors and memory when multiple jobs are run by multiple processes in the cluster? What is the best way to tune our process? Is it a good idea to give 50 executors with 16 GB each? Could you please explain with an example?
@bhaswatirout5764 6 years ago
I have been asked this question twice in interviews: assume the table has 1 TB (some given amount of) data; how do you decide the number of executors in spark-submit? Is there any formula for this, or can we only go by past experience and trial and error?
@DataSavvy 5 years ago
I will create a video on this
@ajaypratap4025 5 years ago
Data Savvy, when?
@rikuntri 5 years ago
@@DataSavvy Is this video available?
@sashanksah4864 5 years ago
@Data savvy: is the video available? Also, I couldn't find the YARN overhead formula... please provide it to us.
@ayushjain139 3 years ago
@@DataSavvy 1 TB / 128 MB (HDFS block size) would be the number of tasks, and hence the cores you would want to have. With the 5-cores-per-executor ideology, that comes down to roughly ~1562 executors for max parallelism. Of course, this would be limited by the available processing power. Please correct this answer if someone knows better.
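A quick check of that arithmetic (a sketch of the reasoning, not a sizing guarantee): dividing 1 TB by the 128 MB block size gives the task count, and 5 concurrent tasks per executor gives the executor count. The commenter's ~1562 uses a decimal terabyte (10^6 MB); a binary terabyte gives ~1639.

```python
import math

BLOCK_MB = 128
CORES_PER_EXECUTOR = 5

def executors_for_input(input_mb: int) -> int:
    tasks = math.ceil(input_mb / BLOCK_MB)        # one task per HDFS block
    return math.ceil(tasks / CORES_PER_EXECUTOR)  # 5 concurrent tasks per executor

print(executors_for_input(10**6))    # decimal 1 TB -> 1563
print(executors_for_input(1024**2))  # binary 1 TB -> 1639
```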
@harekrushnamishra691 3 years ago
Hi Sir... could you please explain how many executors we should go with if we have 100 GB of input data and a 4-node cluster with 16 GB RAM and 8 cores each? Asked in one of my interviews.
@neelbanerjee7875 1 year ago
I had a similar question too.
@nehabansal677 5 years ago
Very helpful for interviews
@DataSavvy 5 years ago
Thanks
@DataSavvy 5 years ago
Please watch the full Spark interview series.
@TheRn35 3 years ago
1. So this says what can best be assigned if only one job is running, right? What if we have multiple independent batch jobs scheduled in parallel?
2. The calculation only factors in the cluster configuration. Shouldn't the data volume and the type of operations being performed also be deciding factors?
3. Also, why bother about these when dynamic allocation is available?
@SpiritOfIndiaaa 5 years ago
Thanks a lot sir, this gives a good idea of how to allocate resources in Spark. I am working on an ETL tool but having real problems tuning my environment... can you please help me?
@DataSavvy 5 years ago
Hi... Thank you... I would love to help with this... Which ETL tool are you using? I have knowledge about big data related things, not much about Informatica etc.
@sangamchoubey2772 3 years ago
How can we decide on the number of executors and memory based on the input data size, like 1 TB?
@DataSavvy 3 years ago
You should check how many partitions you have, the approximate size of each partition, and, if you are doing any joins or aggregations, how the number of partitions changes after them... This will give you good hints about how many executors and executor cores you should use.
@abhishekkrishna9757 6 years ago
Thanks for sharing. If possible, could you please share some Scala interview questions?
@DataSavvy 6 years ago
Sure, will create a series on Scala interview questions... Please subscribe to our channel 😊
@abhishekkrishna9757 6 years ago
Thanks for your reply. Already subscribed :)
@user-he9nw7th8u 4 years ago
Good job. Please use a paint board.
@lifefactsfun 5 years ago
Can you make a video on the complete life cycle of a Spark project, including how unit testing is done by developers?
@DataSavvy 5 years ago
Sure I will create a video on this
@riyasmohammad9234 2 years ago
Can you make a playlist for Spark from scratch?
@JS-gg4px 3 years ago
I get so confused when you mention cores and executors back and forth; which one is which? lol
@bhargavhr1891 6 years ago
Very good content. I would like to know how we will increase the number of cores as part of a cluster upgrade. Also, every time cores are increased in the cluster, does the formula change?
@DataSavvy 5 years ago
As you add more machines, more CPUs are added, hence more cores are added... Sorry for the delay in responding.
@4ukcs2004 5 years ago
Even when a cluster upgrade happens, the maximum cores to be used per executor is 5, as per the Hortonworks documentation.
@aneksingh4496 5 years ago
Please share some real-time problems, like a job getting stuck, or some use case... that will be useful.
@IrfanShaikh-jt6yu 2 years ago
Awesome, sir. Please make the same for YARN tuning, with at least 10 machines.
@sumitkrs2 5 years ago
Very well explained. I have a question here: we decide the number of executors for each Spark job, but what will happen when we submit multiple jobs, each with the same number of executors?
@DataSavvy 5 years ago
If that amount of resources is available, all jobs will get resources... If not, job priority and job submission time come into the picture.
@DataSavvy 5 years ago
I am happy that you liked the video :)
@gouthamanush 3 years ago
Hi, I have been following your content for quite some time. Really good content. In this example, you have assumed that we have to utilise all the existing resources. But what if the data is just 200 MB? Would you still use the same configuration? Shouldn't the decision be based on how much input data there is, and then decided accordingly? I am confused here.
@ratnawaliparshetti904 2 years ago
Hi... even I have the same question. Did you receive an answer?
@DataSavvy 2 years ago
Hi... You are right. In my example, I assumed that we have huge data and have to use all the resources. But when the data size is small, we need to allocate based on the amount of data we process and what its in-memory footprint will be... Spot on :)
@AmitPrasadbangalore 2 years ago
Hi, you are doing a great job, bro. I have a question though: at 7:17, how did we decide on 5 cores per executor?
@mohitmanna7308 1 year ago
Bro, do you have the answer now?
@krishnarupeshnagubandi596 5 years ago
Can you please elaborate?
@mdmoshiuzzaman 2 years ago
Hi, first of all, thanks for your great efforts! :) Now coming to my doubt (this may sound very childish to others, but I am curious): how do we lose parallelism in the tiny-executor approach? Can't different tasks run on different executors in parallel?
@suvasishudgata7326 1 year ago
Hi, yes, your question is valid... you will not lose parallelism in the first case. A better statement would be that you will not be using parallelism to its maximum potential: in the first case you have 1 core/executor, and in the second you have 3 executors/machine with 5 cores for each executor.
@gouravchoubey860 4 years ago
How did we derive 5 cores per executor? Was it a mere guess? Also, how did we reach the number of executors?
@DataSavvy 4 years ago
5 cores is usually the standard when you are defining the number of cores... Having more than 5 threads (executor cores) will increase contention among threads for resources. However, there is no hard and fast rule; you can try 4 or 6 if that gives you better performance.
@amarnathkal6467 3 years ago
What happens if one of the machines/worker nodes goes down in the middle of data processing? Can someone help, please?
@DataSavvy 3 years ago
If a machine fails, executors on that machine will fail. They will be restarted on another machine
@amarnathkal6467 3 years ago
@@DataSavvy Thanks Bro👍👍👍
@bhargavhr1891 6 years ago
Hi Harjeet, could you please share the formula for finding the exact cores and memory for a cluster whose configuration is different from the one you explained here?
@DataSavvy 6 years ago
Sure, Bhargav... Will put it together and post here
@Raghav54321 5 years ago
@@DataSavvy please share...still waiting
@harjeetkumar4632 5 years ago
Great quality of content... Can you make more videos on this topic?
@DataSavvy 5 years ago
Thank you
@lifefactsfun 5 years ago
How do we resolve a Spark job that is stuck for a long time at runtime?
@DataSavvy 5 years ago
There can be multiple reasons for a job to be stuck... The most common reason is excessive shuffle. Others are an executor failing again and again, or the job waiting to acquire resources because the cluster's resources are not free.
@ShubhamJain-qx4hx 5 years ago
@@DataSavvy What should be the approach to solving this?
@rameshthamizhselvan2458 5 years ago
I have had one doubt for a long time: how did the number of executor cores suddenly become 5? Is this an assumption?
@DataSavvy 5 years ago
Hi Ramesh... Number of cores = number of concurrent tasks an executor can run. So we might think that more concurrent tasks per executor will give better performance. But research shows that any application running more than 5 concurrent tasks per executor tends to perform badly, so stick to 5. This number comes from the executor's ability to run tasks, not from how many cores a system has, so it stays at 5 even if you have double (32) the cores in the CPU.
@DataSavvy 5 years ago
You may decrease it from 5 to 3, while the number of executors on the same machine may increase... More than 5 cores creates contention for resources within the executor and reduces write throughput.
@chaithanyakrishnaravindrak7032 6 years ago
Hi, please share the formula described in your video.
@DataSavvy 5 years ago
Will try to put this down
@kaladharnaidusompalyam851 3 years ago
Hi, why is Spark in-memory? Expecting a clear explanation from you.
@DataSavvy 3 years ago
Hi... Spark processing is not always in memory. You can choose not to cache data and still do processing. Caching should be used only when you are doing operations on the same data again and again.
@rajudakoju 5 years ago
Please make videos about Databricks.
@neelbanerjee7875 1 year ago
Please make a video on optimization based on data size: how to choose the executor count, memory, and cores depending on file size.
@rohithgorthy971 3 years ago
Thin vs fat executors: optimum selection!
@ranganath031 3 years ago
Why is the number of executor cores 5?
@areyoufreetolisten 4 years ago
So number of executors per machine = (no. of cores - 1)/5, and memory per executor = ((total memory - 1 GB)/no. of executors) - 2 GB overhead?
@DataSavvy 4 years ago
Yup... that's the maximum on a machine... In practice you will need fewer executors on a single machine.
@ayushjain139 3 years ago
@@DataSavvy So, will --executor-memory be ((total mem - 1 GB)/no. of executors), or ((total mem - 1 GB)/no. of executors) - 2 GB overhead?
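The formula in this thread can be written out as code (a sketch assuming 1 core and 1 GB reserved for the OS, plus a flat 2 GB overhead per executor as in the video's example):

```python
# Per-machine maximums following the thread's formula:
# executors = (cores - 1) // 5, memory = (total - 1 GB) / executors - overhead.
def max_executors_per_machine(cores: int) -> int:
    return (cores - 1) // 5

def executor_memory_gb(total_mem_gb: int, n_executors: int, overhead_gb: int = 2) -> int:
    return (total_mem_gb - 1) // n_executors - overhead_gb

n = max_executors_per_machine(16)    # -> 3
print(n, executor_memory_gb(64, n))  # -> 3 19  (63/3 = 21, minus 2 GB overhead)
```

On the convention used in the video, --executor-memory would be the figure after subtracting the overhead (19 GB here), since YARN grants the overhead on top of the requested executor memory.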
@venkatraju6553 6 years ago
Thanks for the content... the audio was low... Please increase it in future videos!
@DataSavvy 6 years ago
Yes Venkat... The audio quality goes down when I upload it to YouTube... I am trying to find a solution for this.
@vishal7598 6 years ago
Hi... your videos are great to watch. Could you please tell me how you decided on 5 for the number of executor cores?
@DataSavvy 6 years ago
Good question, my friend... This number comes more from experience. If you give more than 5 cores to an executor, it is observed that there is contention between threads for IO from HDFS, so your data reads and writes become slow. 5 is the generally accepted number. Hope this helps.
@vishal7598 6 years ago
Thanks for your reply. So it totally depends on hit and trial to get to know the performance.
@DataSavvy 6 years ago
VISHAL ANAND, not exactly... only the 5 cores per executor is an outcome of experience... the rest is still math.
@maheshrichhariya6731 5 years ago
@@DataSavvy I think we eliminate one executor for YARN.
@Rahulwagh234 5 years ago
How did you arrive at 5 for the number of executor cores?
@DataSavvy 5 years ago
That number is arrived at by hit and trial... If your job is IO-intensive, it may change.
@rikuntri 5 years ago
He explained that if we assign fewer cores, we will not utilise the JVM fully, and if too many, the IO response time will increase. So executor cores should be in the mid range, neither too granular nor too high.
@DataSavvy 5 years ago
The number 5 is also driven by the executor's capability to handle parallelism...
@SpiritOfIndiaaa 5 years ago
Can you please elaborate on "number of executor cores: 5" and "per machine: 3"? Really confusing.
@DataSavvy 5 years ago
Hi... I could not understand this question... Please elaborate.
@sandeepmaheshuni 5 years ago
After calculation, Out of 5 cores, 3 will be allocated to avoid I/O contention in HDFS.
@rikuntri 5 years ago
@@sandeepmaheshuni No. Kindly go through the video again, or the basic architecture of Spark. The thing you missed is executor cores vs executors.
@surabhiyadav7673 5 years ago
Hi, can you please share the formula to decide the number of executors?
@DataSavvy 5 years ago
Sure... Will post that
@sandeeppandita1 5 years ago
Does each executor run on a separate JVM?
@31bikashdash 4 years ago
yes
@31bikashdash 4 years ago
A different JVM instance.
@lavanyareddy310 3 years ago
How much salary can we expect for a Spark developer with 3 years of experience?
@DataSavvy 3 years ago
That depends on your skill, your current salary, and the role and company you have applied for... It would be misleading to give you a single answer.
@inkuban 4 years ago
Nice work, sir! But where is the formula for the overhead memory allocation? Please provide code recipes as well. If you make Udemy videos, let me know.
@31bikashdash 4 years ago
8% of the memory allocated to each executor.
@jeevithat6038 5 years ago
Poor audio quality; you speak in a low voice. Can you please improve this? Apart from that, your videos are informative.
@adityapratapsingh7649 3 years ago
True.
@insaniyat_1512 2 years ago
Please turn the audio up, it is too quiet. Thank you for your videos.
@mubasshirali1986 4 years ago
Another interview question: how to distribute resources between YARN, Impala and Spark?
@DataSavvy 4 years ago
That's a good question, Mubasshir. This question is usually asked for admin roles... I will plan a video on this.
@mubasshirali1986 4 years ago
@@DataSavvy Thank you so much, Sir. Actually, I'm looking for an admin job profile. I will be waiting for your video on my request. Gratitude.
@mubasshirali1986 3 years ago
@@DataSavvy Sir, hope you're doing great. A kind reminder of my requested query: how to decide/share resources between YARN, Impala and Spark. Thanks in advance.
@anusha0504 4 years ago
What are advanced Spark technologies?
@DataSavvy 4 years ago
Can you elaborate on your question? Do you mean to ask what the new features of Spark are?
@anusha0504 4 years ago
@@DataSavvy Yes please
@DataSavvy 4 years ago
There are a lot... Delta Lake, AQE, dynamic partition pruning, auto coalesce, Structured Streaming, etc.
@anusha0504 4 years ago
@@DataSavvy Thanks for the information
@mohitmanna7308 1 year ago
In simple words:
1. Leave 1 core and 1 GB for the OS per machine. So 16 - 1 = 15 cores/machine and 64 - 1 = 63 GB RAM/machine.
2. Each executor should have 4 or 5 cores. With fewer cores per executor, the cluster is under-utilised (the JVM's multithreading is not used to its fullest); with more cores per executor, there are too many IOs on HDFS and thus HDFS IO contention, i.e. slow reads and writes. So let's fix it at 5 in this case. 5 cores/executor means 3 executors/machine.
3. RAM: we are left with 63 GB RAM and 3 executors to divide it into. Each gets an equal share: 21 GB RAM/executor.
Each executor also needs some YARN memory overhead, where it stores variables, objects and strings; this space is also used for shuffling/optimization. Generally it should be 7-10%, so for a 21 GB executor it would be roughly 2 GB. The memory left for the "actual" executor is therefore 19 GB.
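The steps above can be condensed into one small function (a sketch of the same arithmetic, with the 7-10% overhead rounded to whole gigabytes):

```python
def tune(cores: int, ram_gb: int, cores_per_exec: int = 5, overhead_frac: float = 0.10):
    """Per-machine executor sizing following the steps above."""
    usable_cores = cores - 1                    # 1 core reserved for the OS
    usable_ram = ram_gb - 1                     # 1 GB reserved for the OS
    n_exec = usable_cores // cores_per_exec     # executors per machine
    ram_per_exec = usable_ram // n_exec         # equal RAM share per executor
    overhead = max(1, round(ram_per_exec * overhead_frac))
    return n_exec, ram_per_exec - overhead, overhead

print(tune(16, 64))  # -> (3, 19, 2): 3 executors, 19 GB heap, ~2 GB overhead each
```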
@adityatavde406 5 years ago
Why are you taking the number of executor cores to be 5? Please make a video on this.
@nafisaslam4605 5 years ago
The number of concurrent tasks an executor can process is directly proportional to the number of cores. More than five tasks per executor degrades performance (it's a tried and tested number). That's why 5 cores have been selected per executor here.
@adityatavde406 5 years ago
@@nafisaslam4605 Do you have any example for what you said: "degrades the performance (it's the tried and tested number)"?
@nafisaslam4605 5 years ago
@@adityatavde406 I can't share a screenshot due to our client's data security policy, but I suggest you change the number of cores in your job and check the metrics in the Spark web UI.
@SpiritOfIndiaaa 4 years ago
It would have been better explained with a pictorial diagram...
@DataSavvy 4 years ago
Good suggestion SHA... I will create a better version of this
@snehakavinkar2240 3 years ago
Does it mean that if we have 10 jobs running in a cluster, all 10 would use the same configuration? Thank you.
@DataSavvy 3 years ago
It does not mean that. Based on how much data your job is processing, how many partitions you have, and whether your job is IO-intensive or compute-intensive, you have to tweak these parameters.
@neelbanerjee7875 1 year ago
@@DataSavvy Please make a video on this... it would be helpful.
@vishalaaa1 1 year ago
Are you telling a secret in a low voice????
@DataSavvy 1 year ago
Yes :)
@srinidhisridharan9676 2 years ago
Your audio level is a bit low; kindly raise your decibels in future videos.
@insaniyat_1512 2 years ago
Please increase the audio level.
@DataSavvy 2 years ago
Yes Manish... I need to record this video again... I will fix this in new video
@realMujeeb 2 years ago
Barely able to hear you, Sir!
@shivamgarg9698 3 years ago
The disadvantages of small and large executors weren't properly explained.
@Rarchit 1 year ago
Here is the formula. Consider this AWS EC2 machine: r7g.4xlarge, with 16 vCPU cores, 128 GB RAM, and 5 nodes.
=========================
Tiny-big approach --> 5 cores per executor
# spark.executor.cores = 5 (standard)
# number of executors per node = (total vCPU - 1 [Hadoop daemons]) / spark.executor.cores = 15/5 = 3
# spark.executor.memory = total RAM per instance / number of executors per instance = 128 GB / 3 = 42 GB; 42 * 0.9 = 37 GB (round down), i.e. 90% of the total goes to executor memory
# spark.yarn.executor.memoryOverhead = 10% of total memory = 42 GB * 0.1 = 5 GB (round up)
# spark.driver.memory = spark.executor.memory
# spark.driver.cores = spark.executor.cores
# spark.executor.instances = (number of executors per instance * number of core instances) minus 1 for the driver = 3 * 5 - 1 = 14
# spark.default.parallelism = spark.executor.instances * spark.executor.cores * 2 = 14 * 5 * 2 = 140
=========================
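The same recipe, executed as code (a sketch that reproduces the commenter's arithmetic for the assumed r7g.4xlarge figures; the 0.9 and 0.1 factors and the rounding directions are the commenter's choices, not Spark defaults):

```python
import math

vcpu, ram_gb, nodes = 16, 128, 5  # r7g.4xlarge figures from the comment

executor_cores = 5
executors_per_node = (vcpu - 1) // executor_cores              # 15 // 5 = 3
raw_mem = ram_gb // executors_per_node                         # 128 // 3 = 42
executor_memory = math.floor(raw_mem * 0.9)                    # 90% heap -> 37
memory_overhead = math.ceil(raw_mem * 0.1)                     # 10% -> 5
executor_instances = executors_per_node * nodes - 1            # minus 1 for driver -> 14
default_parallelism = executor_instances * executor_cores * 2  # -> 140

print(executors_per_node, executor_memory, memory_overhead,
      executor_instances, default_parallelism)
```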
@adityatavde406 5 years ago
Great video series!!! One suggestion: please prepare a written script for the explanation, because the fillers (e.g. "ee", "aa", "uu") are really annoying.
@tripathi123shailesh 4 years ago
Please speak a little louder.
@DataSavvy 4 years ago
Ok Shailesh