Brother, I haven't found this much information even in any paid course so far. Thank you so much.
@DivyaSharma-ux4mo7 ай бұрын
This is so true. I admire your hard work!
@hankeepankee53614 ай бұрын
18:55 Trade-off between CPU usage (shuffle sort-merge join) and memory usage (shuffle hash join)
@hankeepankee53616 ай бұрын
Good work bro... In a hash join, building the hash table takes O(N), N being the number of rows on the side that is hashed. So a hash join takes roughly O(N), versus a sort-merge join, which is O(N log N) because of the sort.
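To make the complexity comparison above concrete, here is a minimal plain-Python sketch of the two join algorithms on small lists of `(key, value)` pairs. It is not Spark's implementation; the function names `hash_join` and `sort_merge_join` are hypothetical and only illustrate where the O(N) build/probe cost and the O(N log N) sort cost come from.

```python
from collections import defaultdict

def hash_join(left, right):
    """Inner join: build a hash table on `left`, then probe it with `right`."""
    table = defaultdict(list)
    for key, value in left:            # build phase: one pass over left
        table[key].append(value)
    out = []
    for key, value in right:           # probe phase: one pass over right
        for left_value in table.get(key, []):
            out.append((key, left_value, value))
    return out

def sort_merge_join(left, right):
    """Inner join: sort both sides by key, then merge with two cursors."""
    left = sorted(left, key=lambda kv: kv[0])    # O(N log N)
    right = sorted(right, key=lambda kv: kv[0])  # O(M log M)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Only the matching runs are combined pairwise.
            for _, left_value in left[i:i_end]:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left_value, right_value))
            i, j = i_end, j_end
    return out
```

Both functions return the same matches; duplicate keys are paired only within the run of equal keys, so the nested loop does not scan the whole input.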
@AnimeOverload15Ай бұрын
I have not come across a more detailed video than this in my entire career.
@jagannathsahoo829728 күн бұрын
Excellent explanation, and free of cost too ☺
@mrinalraj48017 ай бұрын
Great in-depth concepts. I really enjoyed it. You are a genius. Thanks a lot. Keep up the great work you're doing for the community.
@anuragdwivedi18044 күн бұрын
Bro, I have never seen a video as detailed as this.
@mprtech3158 ай бұрын
I follow both of your Spark series. They are really valuable for me 🎉 Thanks.
@sambitmohanty1758 Жыл бұрын
Hi Manish, your content is amazing, keep it up.
@manojkaransingh5848 Жыл бұрын
Amazing video, brother!
@udittiwari842010 ай бұрын
Thank you sir for the detailed series! Your clear explanations have been incredibly helpful in my learning journey.
@prabhatsingh7391 Жыл бұрын
Hi Manish Bhaiya, here we perform the join on key = id, which is an integer, so id % 200 gives the partition number the data goes to. But if the key is a string, how does it work? In that case, does Spark internally create a key for each column?
@manish_kumar_1 Жыл бұрын
Murmur3 hashing is applied for strings. If you want to know more, check how Murmur3 works.
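The idea in the reply above can be sketched in plain Python: hash the key to an integer, then take the remainder by the number of shuffle partitions. This is only an illustration. Python's standard library has no Murmur3, so `zlib.crc32` is used as a stand-in hash, which means the partition numbers below will not match what Spark actually produces. The helper name `partition_for` is hypothetical.

```python
import zlib

# 200 is the default value of spark.sql.shuffle.partitions.
NUM_SHUFFLE_PARTITIONS = 200

def partition_for(key, num_partitions=NUM_SHUFFLE_PARTITIONS):
    """Map any key (string or number) to a partition index in [0, num_partitions)."""
    # Stand-in hash (assumption): CRC32 over the UTF-8 bytes of the key.
    # Spark uses Murmur3 here, so real partition numbers differ.
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % num_partitions

for name in ("Manish", "Divya", "Manish"):
    print(name, "->", partition_for(name))
```

The property that matters for a join is visible here: the same key always lands in the same partition, whichever DataFrame it comes from.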
@divyanshusingh39662 ай бұрын
Thank you bro for providing quality content for free
@voice69054 ай бұрын
Countless salutations to you, Guru ji! Please bring playlists on Apache Airflow and Apache Kafka. I'm sure they would be the best resource on KZbin.
@venkatmunna8918Ай бұрын
Thank you so much for the detailed explanation. However, I am confused about one point. Could you please clarify? Let's say we don't have the color coding as blue and red. Now, executor-1 has 200 partitions and executor-2 also has 200 partitions. If we consider id=102, then 102/200 = 102. How does Spark determine whether record 102 should go to executor-1 or executor-2? This is discussed at the 10:56 timestamp. Thanks!
@rohitsharma-mg7hdАй бұрын
Brother, first tell me how 102/200 = 102. Do you know maths?
@shivaog007Ай бұрын
@@rohitsharma-mg7hd we are taking the remainder
@rohitsharma-mg7hdАй бұрын
@@shivaog007 Yes brother, he explained it. I had misunderstood, hehe.
@Daily_Code_Challenge13 күн бұрын
Executor 1 takes partitions 1 to 100 and executor 2 takes 101 to 200. The colours show the two different tables (df1, df2) held on each executor.
@adityakvs3529Ай бұрын
Brother, in a shuffle hash join, is the hash table created per partition or for the entire DataFrame?
@younevano19 күн бұрын
Partition level
@rajnandinipadhy2533 Жыл бұрын
So if an interviewer asks what kind of join you are performing, should we say that we first need to analyze the data to decide which join is appropriate, or should we say that Spark will do the optimization internally?
@manish_kumar_1 Жыл бұрын
You can talk about the types of join strategies and then compare two of them using some example DataFrame sizes. Explain in detail only if the interviewer asks further.
@anuragdwivedi18043 күн бұрын
Bro, can you please tell me which book you follow for Spark?
@abhigyanprakash56034 күн бұрын
One doubt: You explained joining on the basis of the id column, where you showed that 1/200 gives a remainder of 1, so you placed the record in executor 1 in P1. Similarly, 109/200 gives a remainder of 109, so you placed the record in P109. But now assume that instead of joining on an integer column, we are joining on a string column (CHAR or VARCHAR datatype). How will this work then?
@anish_bhateja Жыл бұрын
Hi Manish, Excellent explanation. Thanks for the informative video.
@maurifkhan3029 Жыл бұрын
Is every DataFrame split into 200 partitions before shuffling (based on the number of shuffle partitions set)? Or, if we have 2 DataFrames to join, does each get only 100 shuffle partitions?
@manish_kumar_1 Жыл бұрын
No, it's not that every DataFrame gets 100. Based on the join condition, 200 partitions are created. You can think of them as 200 buckets, where each bucket holds the records with the same join keys. Say id 5 from df1 is in box 5; then id 5 from df2 also goes to box 5, so box 5 is self-sufficient to join.
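The bucket picture described above can be sketched in plain Python. This is a toy model, not Spark code: it assumes integer keys and the simplified `key % 200` rule from the video, and the names `shuffle` and `bucketed_join` are hypothetical.

```python
from collections import defaultdict

def shuffle(rows, num_partitions):
    """Place each (key, value) row into the bucket given by key % num_partitions."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[key % num_partitions].append((key, value))
    return buckets

def bucketed_join(df1, df2, num_partitions=200):
    """Inner join in which every bucket is joined independently of the others."""
    buckets1 = shuffle(df1, num_partitions)
    buckets2 = shuffle(df2, num_partitions)
    out = []
    # Only buckets that hold rows from both sides can produce matches.
    for p in sorted(buckets1.keys() & buckets2.keys()):
        lookup = defaultdict(list)          # per-bucket hash table
        for key, value in buckets1[p]:
            lookup[key].append(value)
        for key, value in buckets2[p]:
            for value1 in lookup.get(key, []):
                out.append((key, value1, value))
    return out

employees = [(5, "emp5"), (205, "emp205"), (7, "emp7")]
salaries = [(5, 1000), (205, 2000), (9, 3000)]
print(bucketed_join(employees, salaries))
```

Ids 5 and 205 both land in bucket 5 on each side, so that bucket can be joined without looking at any other bucket.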
@HanuamnthReddy10 ай бұрын
Really exemplary 🎉
@khurshidhasankhan4700 Жыл бұрын
Could you please make a video on class and case class? It is being asked in most interviews.
@lakshya1375 Жыл бұрын
Brother, have you covered optimization techniques in any of your videos?
@Nomanqureshi22048 ай бұрын
Sir, please make a video on Spark Streaming.
@sachindubey4315 Жыл бұрын
How are these 200 partitions split across 2 executors? And if there are 3 or 4 executors, how will the 200 partitions be split?
@manish_kumar_1 Жыл бұрын
Then the partitions will be distributed over the 4 executors.
@prathapganesh7021 Жыл бұрын
Hi, you said there are 100 partitions in each executor, but in your demonstration one executor holds both blue and red partitions, which adds up to 200. Could you please elaborate on that? Thank you.
@diksha.chaudhary Жыл бұрын
Hey Manish, your videos are amazing!! 👏 Love the way you explain each and every detail. Thank you for sharing your knowledge, and keep it up. ✨️
@vikashroy58825 ай бұрын
Hi Manish, if we follow the approach mentioned at timestamp 9:28, which partition will the data go to if the remainder is 0? For example, if the id is 200 or a multiple of 200.
@Daily_Code_Challenge13 күн бұрын
2nd
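For the remainder question in this thread, a tiny sketch of the arithmetic may help. It assumes the simplified `id % 200` rule used in the video (not Spark's actual hash), and `simple_partition` is a hypothetical helper name. Under that rule a remainder of 0 maps to partition index 0, since partition indices start at 0.

```python
def simple_partition(record_id, num_partitions=200):
    """Simplified rule from the video: partition index = id modulo partition count."""
    return record_id % num_partitions

for record_id in (1, 102, 109, 200, 400):
    print(record_id, "->", simple_partition(record_id))
```

So 102 stays 102 because it is the remainder, not the quotient, and 200 and 400 both wrap around to index 0.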
@prathapganesh7021 Жыл бұрын
Thank you great explanation 🙏
@vishaljoshi1752 Жыл бұрын
Hi Manish, as you said, sorting is O(n log n). But what about combining the data? Suppose p1 of one table has id 1 and p2 has ids 1, 1. To combine them, two nested for loops are required, so the complexity is O(n^2). Is it performed the same way?
@rishavsharma57322 ай бұрын
Your hair got messed up.. xD, nice work btw. These videos are really helpful.
@aashishraja-k7u3 ай бұрын
well explained
@nityabajpai2022 Жыл бұрын
Hi Manish, I have a few questions: 1. We are applying the join on partitions and not on the DFs, right? Because the DFs are already divided into 4 partitions each. 2. Each join will make 200 new partitions, so if we join RP1 and BP3, will it create 200 more partitions in total? And if we join each partition in Red with every partition in Blue this way, will we have 3200 partitions in total? 3. In the video you said there are not 200 partitions per executor, but the executor does have 200 partitions: 100 for Red and 100 for Blue.
@akhiladevangamath12776 ай бұрын
Hey, this is my understanding; my answers might help you. 1. We are applying the join on the DFs. Yes, we have 4 partitions for each DF; when we apply the join, those 4 partitions are reshuffled into 200 partitions. 3. There are 200 partitions for each DF, so each executor has 100 partitions of DF1 and 100 partitions of DF2.
@ManishSharma-fi2vr6 ай бұрын
Thanks Manish Bhai!!
@homeactfun Жыл бұрын
Amazing video
@ajaypatil1881 Жыл бұрын
Will you please make a video on O(n^2)? What actually is it?
@vishaljoshi1752 Жыл бұрын
Hi Manish, one more question. You say the hash table is in-memory, but as we know, data is first loaded into executor memory and then the logical operations are performed, so in a shuffle sort-merge join everything is also performed in memory. Why don't we call the shuffle sort-merge join in-memory too, since both partitions for the same key have to be loaded into memory before the join is performed?
@Daily_Code_Challenge13 күн бұрын
We can't say that, because a shuffle sort-merge join can also use disk, while a shuffle hash join relies heavily on the hash table fitting entirely in memory.
@rohanchoudhary67210 ай бұрын
Nice video sir, but please use the modulus operation; division is a little confusing.
@manish_kumar_110 ай бұрын
Look at how the modulus operator works.
@rohanchoudhary67210 ай бұрын
@@manish_kumar_1 But you are just taking the remainder on dividing by 200.
@quiet86917 ай бұрын
Your intro sounds to me like "Namaskar, main Ravish Kumar" 👍👌🔥
@sreelakshmang72756 ай бұрын
How do we find out the size of a DataFrame?
@raajnghani Жыл бұрын
I am working as an Operations Executive in a warehouse, but I started learning Sqoop, Hive, MySQL, MongoDB, HBase, NiFi, Kafka, Spark, and AWS services. My job is completely non-IT, and I cleared two interviews. How do I get an experience certificate for working on the above technologies?
@manish_kumar_1 Жыл бұрын
Tell them that you don't have experience and that you have done all the projects on your own. If you cleared the interview, it means you are a good fit for the role.
@raajnghani Жыл бұрын
@@manish_kumar_1 Recruiters need experience even after the L2 discussion is cleared.
@adityakvs3529Ай бұрын
Brother, which join is better, shuffle hash or sort-merge, and how does Spark decide which join to use?
@KaranSingh-hx8dh Жыл бұрын
Thank you for explaining.
@Amarjeet-fb3lk6 ай бұрын
200 partitions will be created, which means 200 cores are also needed, right? Only then can 200 partitions be created. What if there aren't 200 cores?
@manish_kumar_16 ай бұрын
It will still work. The whole point of distributed computing is to run your job even with fewer resources. To understand Spark fully, you will have to watch the whole playlist.
@younevano18 күн бұрын
It will run in about 200/n rounds, where n = number of cores!
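The 200/n estimate in this reply can be written as a one-line calculation. The sketch below assumes one core per task and roughly equal task durations, so it gives only a rough count of scheduling rounds; `task_waves` is a hypothetical helper name.

```python
import math

def task_waves(num_partitions, total_cores):
    """Rough number of rounds needed to run one task per partition."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    # Each core runs one task at a time, so round up the ratio.
    return math.ceil(num_partitions / total_cores)

for cores in (8, 16, 200):
    print(cores, "cores ->", task_waves(200, cores), "round(s)")
```

With fewer cores than partitions the job still completes; the tasks are simply processed in more rounds.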
@sanooosai8 ай бұрын
great sir thank you
@RohitKumar-kd5fj2 ай бұрын
Is it really division? I think it should be modulus.
@Daily_Code_Challenge13 күн бұрын
Yes, it is modulus.
@RajeshKumar-re8tj6 ай бұрын
Which memory pool is used to create the hash table during a shuffle hash join?
@younevano19 күн бұрын
The memory of the executors that those partitions are on after shuffling?
@mhdakram4 ай бұрын
An executor can have only one partition at a time...is this not correct?
@akumar2575.7 ай бұрын
day 4 done👍
@mayanksinghsoniАй бұрын
what if the id is not numeric?
@mdasif24116 ай бұрын
When the salary table is less than 10 MB and the first table is so much larger, how will both get the same number of partitions?
@rameshbayanavenkata1305 Жыл бұрын
Hi Manish, I am following all your videos. Thanks for your great contribution in explaining everything in detail. As you said, records are placed in each partition according to the remainder we get from dividing the id value by 200 partitions. What if the join is done on the name column instead of id? How does the division work to place name values in each partition? Please clarify.
@amritranjannayak27059 ай бұрын
I also have the same question. Please answer this.
@younevano18 күн бұрын
@@amritranjannayak2705 He replied to another comment with the same question: Murmur3 hashing is done when joining on strings!
@kartikgupta22994 ай бұрын
It looks like 200 partitions are created per executor in your video, but you are saying that 200 are not created per executor and that 200 partitions are created in total. Please explain this part, and also why 200 are created by default.
@mohammadfurquan241 Жыл бұрын
Sir, I have done Python, basic SQL, Linux commands, and all DBMS concepts. Can I learn Spark now, or is there any prerequisite for Spark?