This man is singlehandedly responsible for spawning data scientists in the industry.
@MSuriyaPrakaashJL Жыл бұрын
I am happy that I completed this video in one sitting
@baneous18 Жыл бұрын
42:17 Here 'Missing values' only replaces values in the 'Name' column, nowhere else. Even if I specify the column names as 'age' or 'experience', it doesn't replace the null values in those columns.
@Star.22lofd Жыл бұрын
Lemme know if you get the answer
@WhoForgot2Flush3 ай бұрын
Because they are not strings. If you cast the other columns to strings it will work as you expect, but I wouldn't do that; just keep them as ints.
I have to say, it is nice and clear. The pace is really good as well. There are many tutorials online that are either too fast or too slow.
@shritishaw75103 жыл бұрын
Sir Krish Naik is an amazing tutor, learned a lot about statistics and data science from his channel
@candicerusser90953 жыл бұрын
Uploaded at the right time. I was looking for this course. Thank you so much.
@farees962 жыл бұрын
Thanks!
@arturo.gonzalex2 жыл бұрын
IMPORTANT NOTICE: the na.fill() method now works only on subsets with matching datatypes, e.g. if the value is a string and the subset contains a non-string column, then the non-string column is simply ignored. So it is now impossible to replace the NaN values of columns with different datatypes in one call. Another important question is: how come the values in his csv file are treated as strings, if he has set inferSchema=True?
@kinghezzy2 жыл бұрын
This observation is true.
@aadilrashidnajar94682 жыл бұрын
Indeed, I also observed the same issue. Now don't set inferSchema=True while reading the csv; then .na.fill() will work fine.
@sathishp3180 Жыл бұрын
Yes, I found the same. Fill won't work if the data type of the filling value is different from the columns we are filling. So it's preferable to fill 'na' in each column using a dictionary, as below: df_pyspark.na.fill({'Name' : 'Missing Names', 'age' : 0, 'Experience' : 0}).show()
@aruna5472 Жыл бұрын
Correct. Even if we give values using a dictionary like @Sathish P, if those data types are not string it will ignore the value; once again, we need to read the csv without inferSchema=True. Maybe the instructor missed saying that filling missing values works only for string columns (look at 43:03, all strings ;-) ). But this is good material to follow; I appreciate the good help!
@gunjankum Жыл бұрын
Yes i found the same thing
@alireza22953 ай бұрын
This video provides an excellent starting point for the journey: clear, concise, and incredibly efficient. Great job!
@MiguelPerez-nv2yw2 жыл бұрын
I just love how he says “Very very simple guys” And it turns out to be simple xD
@nagarjunp233 жыл бұрын
You guys are literally reading everyone's mind. Just yesterday I searched for pyspark tutorial and today it's here. Thank you so much. ❤️
@centershopgaming76553 жыл бұрын
Same thing
@Mathandcodingsimplified3 жыл бұрын
Your phone is being tracked... It's no coincidence... All our online activities are recorded.
Dear Mr Beau, thank you so much for amazing courses on this channel. I am really grateful how such invaluable courses are available for free.
@sunny105282 жыл бұрын
Please thank Mr Krish Naik
@mohandev73853 жыл бұрын
I didn't expect krish.... Amazingly explained
@anikinskywalker71273 жыл бұрын
Why are u uploading the good stuff during my exams bro
@awwtawnoo3 жыл бұрын
HaHa
@settarapramod4463 жыл бұрын
Exactly
@neillunavat3 жыл бұрын
EVEN MY EXAMS GOIN ON
@subramanianchenniappan40592 жыл бұрын
Can't you watch it later🤣🤣
@lakshyapratapsigh35183 жыл бұрын
VERY HAPPY TO SEE MY FAVORITE TEACHER COLLABORATING WITH FREECODECAMP
@dataisfun49642 жыл бұрын
Hi Krish Naik, all I can say is: just beautiful. I followed from start to finish and you were amazing. I was most interested in the transformation and cleaning aspects, and you did them justice. I realized some lines of code didn't work the same as yours, but thanks to Google for the rescue. This is a great resource for an introduction to PySpark; keep up the good work.
@arjitsrivastav555Ай бұрын
Krish Naik has pretty much nailed it in this video. Loved it👏
@IvanSedov-i7f2 жыл бұрын
A wonderful video and a wonderful manner of presenting the material. Thank you very much!
@MöbiusuiböM5 ай бұрын
15:20 - lesson 2
31:35 - lesson 3
@vivekadithyamohankumar61343 жыл бұрын
I ran into an issue while importing pyspark (ImportError) in my notebook even after installing it within the environment. After doing some research, I found that the kernel used by the notebook would be the default kernel, even if the notebook resides within the virtual env. We need to create a new kernel within the virtual env and select that kernel in the notebook. Steps:
1. Activate the env by executing "source bin/activate" inside the environment directory
2. From within the environment, execute "pip install ipykernel" to install IPyKernel
3. Create a new kernel by executing "ipython kernel install --user --name=projectname"
4. Launch jupyter notebook
5. In the notebook, go to Kernel > Change kernel and pick the new kernel you created.
Hope this helps! :)
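The steps above as copy-pasteable commands (the kernel name `projectname` is a placeholder; substitute your own):

```shell
# 1. Activate the virtual environment (run inside the env directory)
source bin/activate
# 2. Install IPyKernel inside the environment
pip install ipykernel
# 3. Register a kernel for this environment
ipython kernel install --user --name=projectname
# 4. Launch Jupyter, then pick the kernel via Kernel > Change kernel
jupyter notebook
```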
@yashdhaga70473 ай бұрын
Thank you so much!
@yashbhawsar08723 жыл бұрын
@Krish Naik Sir, just to clarify: at 26:33 I think the Name column min/max is decided by lexicographic order, not by index number.
@shankiiz Жыл бұрын
yep, you are right!
@JackSparrow-bj5ul9 ай бұрын
Thank you so much @Krish Naik for bringing this amazing content. The tutorial has really helped me clear a few concepts, with a really thoughtful hands-on explanation. Hats off to the FCC team. Looking forward to your channel @Krish.
@sharanphadke49543 жыл бұрын
Biggest crossover : Krish Naik sir teaching for free code camp
@alanhenry98503 жыл бұрын
At last, Krish Naik sir on freeCodeCamp 😍
@SporteeGamer3 жыл бұрын
Thank you so much for giving us these types of courses for free.
@SameelJabir2 жыл бұрын
Such an amazing explanation. For a beginner, the 1.5 hours are really worth it... You nailed it with very simple examples in a highly professional way. Huge hats off.
@bhatt_nikhil9 ай бұрын
Really good compilation to get started with PySpark.
@khangnguyendac7184 Жыл бұрын
42:15 PySpark has now updated na.fill(): it only fills values whose type matches the column type. For example, in the video the instructor could replace all 4 columns only because all 4 column types were string, the same as "Missing value". This is explained at 43:02.
@adekunleshittu5697 ай бұрын
You have to loop through the columns
@ccuny13 жыл бұрын
Yet another excellent offering. Thank you so much.
@carlosrobertomoralessanche36322 жыл бұрын
You dropped this king 👑
@tech-n-data9 ай бұрын
42:11 As of 3/9/24, na.fill or fillna will not fill integer columns with a string. 51:31 also df_pyspark.filter('Salary<=15000')
@TheBarkali2 жыл бұрын
Dear Krish. This is only W.O.N.D.E.R.F.U.L.L 😉. Thanks so Much and thanks to professor Hayth.... who showed me the link to your training. Cheers to both of U guys
@lavanyaballem5085 Жыл бұрын
Such an Amazing Explanation! you Nailed it KrishNaik
@graenathan2 жыл бұрын
Thanks
@tradeking30783 жыл бұрын
At 26:37, the min and max values from a column of string data type are not based on the index where they are placed, but on the ASCII values of the words and the order of the characters within them, where '0' < '9' < 'A' < 'Z' < 'a' < 'z'. Min is the word that sorts first and max is the word that sorts last; if two characters match, comparison moves to the next character, and so on.
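The ordering rule described above can be checked with plain Python, since Spark's min/max on a string column uses the same character-by-character lexicographic comparison (the names here are made up for illustration):

```python
names = ["Krish", "Sudhanshu", "Sunny", "Paul"]

# 'K' (0x4B) sorts before 'P' and 'S', so "Krish" is the minimum.
print(min(names))  # Krish

# "Sunny" vs "Sudhanshu": 'S' and 'u' tie, then 'n' > 'd', so "Sunny" is the maximum.
print(max(names))  # Sunny
```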
@patrickbateman76652 жыл бұрын
True
@zesky66545 ай бұрын
42:11 - Note: the na.fill function only replaces values of the same type as the replacement. So the code on the screen will only replace the NULL values in the 'Name' column.
@cherishpotluri9573 жыл бұрын
Krish Naik on FCC🤯🔥🔥
@sivakumarrajabather11409 ай бұрын
The session is really great and awesome. Excellent presentation. Thank you.
@RossittoS3 жыл бұрын
Great content! Thanks! Regards from Brazil!!!
@nagarajannethi3 жыл бұрын
🥺🥺🙌🙌❣️❣️❤️❤️❤️ This is what we need
@ludovicgardy Жыл бұрын
Really great, complete and straight forward course. Thank you for this, amazing job
@simileoluwaaluko75822 жыл бұрын
Great man. Great! 👍🏼👍🏼👍🏼👍🏼
@mariaakpoduado Жыл бұрын
what an amazing tutorial!
@ujjawalhanda47482 жыл бұрын
There is an update in na.fill(): an integer value inside fill will replace nulls only in columns with integer data types, and the same holds for a string value.
@harshaleo43732 жыл бұрын
Yeah. If we are trying to fill with a string, it is filling only the Name column nulls.
@austinchettiar67842 жыл бұрын
@@harshaleo4373 so what's the exact keyword to replace all null values?
@akashk28243 жыл бұрын
Thank you so much sir, 100 % satisfied with your tutorial. Loved it.
@siddhantbhagat72162 жыл бұрын
I am very happy to see krish sir on this channel.
@Dr.indole Жыл бұрын
This video is pretty much amazing 😂
@barzhikevil68733 жыл бұрын
For the filling exercise at around minute 42:00, I cannot do it with integer-type data; I had to use string data like you did. But then in the next exercise, around minute 44:00, the function won't run unless you use integer data for the columns you are trying to fill.
@Richard-DE3 жыл бұрын
@@caferacerkid You can try to read with and without inferSchema=True and check the schema; you will see the difference. Try reading again for the Imputer.
@ammadniazi2906 Жыл бұрын
Where you are setting up the environment variables for spark and Hadoop.
@spoorthydevineni822 Жыл бұрын
extraordinary content
@Pg11001 Жыл бұрын
At 42:23 a function called 'fill' is used, and it only replaces string-type values with other strings. So if you are facing the issue of data being replaced in only one or two places, go up a cell in your Python notebook (.ipynb) and set inferSchema=False at read time, so that integer-like columns containing NULLs are read as strings. Thanks for the video.
@LifeOnTwoWheels36910 ай бұрын
Thank you
@simple_bihari_babua Жыл бұрын
This feels like it starts in the middle. Was there a previous video that explained the installation and other setup?
@saiajaygundepalli3 жыл бұрын
Krish naik sir is teaching wow👍👍
@estelle9819 Жыл бұрын
Thank you so much, this is incredibly helpful.
@raghavsrivastava29103 жыл бұрын
Surprised to see Krish Naik sir here ❤️
@subhajeetchakraborty17913 жыл бұрын
sameee me tooo 🤩
@ronakronu3 жыл бұрын
nice to meet you krish sir😍
@dipakkuchhadiya93333 жыл бұрын
I like it 👌🏻 We request you to make a video on blockchain programming.
@DuongTran-zh6td2 жыл бұрын
thank you from Vietnam
@renadhc68 Жыл бұрын
Brilliant project based tutorial
@aliyusifov54812 жыл бұрын
Thank you so much for an amazing tutorial session! Easy to follow
@bhanu2426295 ай бұрын
Excellent explanation Bro... :)
@larsybarz3 ай бұрын
Thanks so much man. This is awesome
@RaviKiran_Me Жыл бұрын
At 1:01:09, the maximum salary you found is the maximum salary of each person among the departments he/she works in; it's not the maximum total salary of each person.
@thecaptain2000 Жыл бұрын
In your example, df_pyspark.na.fill('missing value').show() replaces null values with "missing value" only in the "Name" column.
@javierpatino4142 Жыл бұрын
Good video brother.
@arulmouzhiezhilarasan85183 жыл бұрын
Impeccable Teaching! Thanks!
@sukurcf Жыл бұрын
26:34 I don't think it's based on index. I just tried changing the indices for the min and max string values; it looks like it's using lexicographic order.
@DonnieDidi19823 жыл бұрын
I was very much looking for this. Great work, thank you!
@sushilkamble83793 жыл бұрын
10:00 | Whoever is getting the "Exception: Java gateway process exited before sending the driver its port number" error: install Java SE 8 (Oracle) and the error will be solved.
@kazekagetech9883 жыл бұрын
did you solve bro? im facing it now
@vitazamb33752 жыл бұрын
Me too. Did you manage to solve this problem?
@hariharan1992292 жыл бұрын
Thanks a ton for this wonderful Masterpiece. It helped me a lot!
@bansal02 Жыл бұрын
Really thankful for the video.
@innovationscode99093 жыл бұрын
Massive. This is a GREAT piece. Well done. Keep going
@saurabhdakshprajapati14997 ай бұрын
Good tutorial, thanks
@konstantingorskiy57162 жыл бұрын
Used this video to prepare for the tech interview, hope it will help)))
@michasikorski66712 жыл бұрын
Is this enough to say that you know Spark/Databricks?
@johanrodriguez2413 жыл бұрын
Finished!. But i still want to see the power of this tool.
@brown_bread3 жыл бұрын
One can do slicing in PySpark, though not exactly the way it is done in Pandas. E.g.:
Syntax: df_pys.collect()[2:6]
Output: [Row(Name='C', Age=42), Row(Name='A2', Age=43), Row(Name='B2', Age=15), Row(Name='C2', Age=78)]
@programming_duck31222 жыл бұрын
Thank you, really useful.
@rajatbhatheja3562 жыл бұрын
However, take precaution while using collect: collect is an action and will execute your DAG.
@Jschmuck8987 Жыл бұрын
Great video. Pretty much simple.
@ChaeWookKim-vd7uy3 жыл бұрын
I love this pyspark course!
@critiquessanscomplaisance83532 жыл бұрын
This being free is literally charity! Thanks a lot!!!
@programming_duck31222 жыл бұрын
At 58:23, to show the maximum salary shouldn't you use max instead of sum? sum happens to work because Name is unique, but I found this a bit misleading.
@amitkumarsaha24242 жыл бұрын
Amazing content
@sanjaygstark3 жыл бұрын
It's quite impressive 💫✨
@ofranceable Жыл бұрын
Excellent Video.
@Nari_Nizar3 жыл бұрын
At 1:09:00, when I try to add the independent feature I get the below error:
Py4JJavaError Traceback (most recent call last)
      1 output = featureassembler.transform(trainning)
----> 2 output.show()
C:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\dataframe.py in show(self, n, truncate, vertical)
    492
    493 if isinstance(truncate, bool) and truncate:
--> 494     print(self._jdf.showString(n, 20, vertical))
    495 else:
    496     try:
@crazynikhil38113 жыл бұрын
Indians are the best teachers in the world. Thank you :)
@porvitor2 жыл бұрын
Thank you so much for an amazing tutorial session!🚀🚀🚀
@juanviola58255 ай бұрын
you are the best! thanks!
@jorge18693 жыл бұрын
The full installation of PySpark was omitted in this course.
@praveenkumare21573 жыл бұрын
At last I found a precious one.
@HariEaswaran983 жыл бұрын
Thanks!
@anassrtimi30152 жыл бұрын
Thank you for this course
@PallabM-bi5uo2 жыл бұрын
Hi, thanks for this tutorial. If my dataset has 20 columns, why is the describe output not showing in a nice table like the one above? It comes out all distorted. Is there a way to get a nice tabular format like that for a large dataset?
@quanoan982310 ай бұрын
Sorry, how can you see the description of a method? Do you have keyboard shortcuts? For example at 40:26.
@venkatkondragunta97042 жыл бұрын
Hey Krish, Thank you so much for your efforts.. this is really helpful..
@soundcollective22402 жыл бұрын
This is pretty much a very useful video ;) thanks
@rmehta268 ай бұрын
How do you get the description of a function inline? eg: at timestamp 36:59, while explaining the features under drop
@santoshturamari2 жыл бұрын
At 1:22:32, was it not running just because it was not attached to any cluster?
@doreyedahmed2 жыл бұрын
Thank you so much, very nice explanation. If we use PySpark, it means we're dealing with Apache Spark.