It was a good video, I like how he didn't cut out the part when he's stuck at some problem. ⭐⭐⭐⭐⭐
@normalperson1130 3 years ago
Thank you Aakash for giving a raw walk-through. Apart from the usual documentation stuff, I think the ability to google and find answers to problems is a much more important skill in data science, along with, of course, mathematical understanding. This walkthrough actually gave me more confidence in using pandas without worrying about typical syntax pitfalls.
@jovianhq 3 years ago
Glad it was helpful!
@raihanhosain3374 1 year ago
Best project tutorial on any YouTube channel. Everyone makes videos and cuts the portions where they get stuck. You, on the other hand, show the real scenario and show us how to find a solution using Google as well. We want similar videos on data science projects. Love from Bangladesh.
@shivsharma9153 2 years ago
Do you know why I loved this video? You kept it raw and real. You clearly portray how a data analyst thinks and does the project, which I believe is more important than fancy coding... syntax you can get easily, but analytical thinking requires real effort.
@snehaldamkondwar618 2 years ago
Hi Shiv, I'm from a non-technical background. I was doing the given project, but once we close all the tabs, how do we get back to the same notebook?
@shivsharma9153 2 years ago
@@snehaldamkondwar618 Hi, where did you save it locally? Which folder?
@snehaldamkondwar618 2 years ago
@@shivsharma9153 Do I need to save it??
@snehaldamkondwar618 2 years ago
@@shivsharma9153 I opened the notebook through Jovian and ran it on Colab with the Kaggle dataset. I wrote some lines of code in Colab, then closed all the tabs. Now how do I get back to the file where I wrote my code?
@shivsharma9153 2 years ago
@@snehaldamkondwar618 Try searching locally for the name of the notebook; you may be able to locate it.
@rajivgarg9480 2 years ago
I have seen only half the video. Couldn't stop myself from appreciating the good work. Couldn't have been done any better. Way better than the paid education platforms.
@sabchillhai802 3 years ago
Great work Jovian, we need more sessions like this. Thanks a lot.
@jovianhq 3 years ago
Glad you liked it!
@MuhammadAkbarAttamimi 3 years ago
55:52 This dataset does contain New York accident data; there are around 10,000 records. I checked it using df[df['City'] == 'New York']
@garvitpoddar6947 3 years ago
Yes
@AakashNS 3 years ago
Thanks! Not sure why I missed it. Maybe I was using a different version of the dataset.
@claudiolb8552 3 years ago
@@AakashNS Not sure why, but "New York" in df.City always returns False. Try it with another city, it just doesn't work.
@tirthhihoriya690 3 years ago
@@claudiolb8552 Use: a. >>> 'NY' in list(df.State) or b. >>> 'New York' in list(df.City) or c. >>> cities_by_accident['New York']
@shailjamishra9423 2 years ago
Yes, New York City is there in the dataset; the state which is missing is 'Alaska'.
@aashisethiya4653 1 year ago
Aakash, you are one of the best teachers I have come across. Coming from a hard-core medical background and pivoting into data analytics, I came across your pandas courses while preparing my foundation in Python before starting a master's in analytics in the US this fall. Hands down, you have given beginners like me a lot of handholding with your courses and videos!
@jovianhq 1 year ago
Thanks, I'm glad you found our course helpful! 😊 - Aakash
@aashisethiya4653 1 year ago
@@jovianhq I went through many teachers on YouTube and DataCamp, but truth be told, most are ludicrously formal in their teaching methods and have a slower, theoretical pace. Is there any possibility to connect with Aakash to get some roadmap tips for a beginner who plans to venture into the US health business analytics domain?
@sandeepmesa 2 years ago
I like the way you google for help. Appreciate your time. Learnt new things on how to articulate our work. Thanks.
@jovianhq 2 years ago
Glad it was helpful!
@shreyaskulkarni7612 3 years ago
The current dataset is updated. A high percentage of accidents occur between 3 pm and 6 pm (probably people in a hurry to get home). The next highest percentage is 6 am to 9 am. Over 1100 cities have reported just one accident (need to investigate). On Sundays, the peak occurs between 11 am and 6 pm, unlike weekdays.
@jovianhq 3 years ago
Interesting analysis and insights Shreyas!
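Editor's note: the hour-of-day breakdown described above can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the commenter's actual code; the column name `Start_Time` is an assumption about the dataset and does not appear in the thread.

```python
import pandas as pd

# Made-up timestamps for illustration; `Start_Time` is an assumed column name.
df = pd.DataFrame({"Start_Time": pd.to_datetime([
    "2020-03-02 07:30:00",  # Monday
    "2020-03-02 16:10:00",  # Monday
    "2020-03-03 17:45:00",  # Tuesday
    "2020-03-08 12:00:00",  # Sunday
    "2020-03-08 15:20:00",  # Sunday
])})

# Share of accidents per hour of the day, ordered by hour.
hour_share = df["Start_Time"].dt.hour.value_counts(normalize=True).sort_index()

# Same breakdown restricted to Sundays (dayofweek: Monday=0 ... Sunday=6).
sundays = df[df["Start_Time"].dt.dayofweek == 6]
sunday_hours = sundays["Start_Time"].dt.hour.value_counts().sort_index()

print(hour_share)
print(sunday_hours)
```

On the real data, plotting `hour_share` as a bar chart would make the afternoon and morning peaks easier to compare.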
@freehappymeal 2 years ago
Thank you for teaching us how to problem solve and the whole EDA process!
@jovianhq 2 years ago
Happy to help!
@outinthebeach 3 years ago
Great course Aakash - this and everything else you have put here. Thanks for your generosity to teach this the way you have done it. Brilliant!!!
@prisri5953 1 year ago
NY is in the state list. The missing states are AK (Alaska) and HI (Hawaii). It also counts DC as a state.
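Editor's note: one way to check which state codes are absent is a set difference against the full list of codes. The sketch below uses a toy `State` column built to mirror the comment above (every code except AK and HI, plus DC); it is an illustration, not a result computed from the real dataset.

```python
import pandas as pd

# The 50 US state postal codes plus DC.
ALL_CODES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
    "DC",
}

# Toy stand-in for df.State: all codes except the two reported as missing.
df = pd.DataFrame({"State": sorted(ALL_CODES - {"AK", "HI"})})

# Codes that never occur in the column.
missing = sorted(ALL_CODES - set(df["State"].unique()))
print(missing)
```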
@anwoybarua8213 3 years ago
One of the best YouTube channels for data analysis learners❤️❤️
@eyesofdoriss 1 year ago
Great sharing. I've been looking for a full guide like this one for a while. Thank you!
@jovianhq 1 year ago
Glad you enjoyed it!
@jeetthakkar2297 2 years ago
Sir, New York data is actually present in the given dataset. We get False if we use: 'New York' in df['City'] And we get True if we use: 'New York' in df['City'].unique()
@jovianhq 2 years ago
Yes Jeet, you are correct. We found it later but didn't update the video, to show that this type of error can happen at any time while working on a project. Great work on finding it!
@harshucore 1 year ago
I used - 'New York' in df.values and got True
@bvvsr89 3 years ago
Watching the master is how you learn...Thanks a lot for this...
@muralikumaar9456 2 years ago
Great session on EDA. We need more such sessions on different datasets.
@jovianhq 2 years ago
This is just an example, we hope the viewers will be able to make better EDA projects on different datasets after watching this video.
@sivaramaguhans4002 3 years ago
I can't see an EDA explanation clearly in other videos... awesome 🎉
@piyushkumar-kb2jc 3 years ago
Concept is crystal clear by Anuj bhaiya.
@jovianhq 1 year ago
Thanks!
@ankitlakshya450 2 years ago
Bro, you were my senior in intermediate at Ascent Junior College, Vizag. Got clarity on EDA, btw.
@jovianhq 1 year ago
Glad you liked our tutorials!
@tiwarirr 3 years ago
Best teacher for data science!
@bane2256 1 year ago
This was excellent. I hope for more of these in 2023
@jovianhq 1 year ago
Definitely!! Stay tuned, more interesting videos coming soon.
@bane2256 1 year ago
@@jovianhq Is this the type of project that is sufficient to be included in an analytics portfolio? Or does it need to be something more extensive?
@deepasarojam4425 3 years ago
This is the best video on EDA I have ever watched! Thanks Aakash :)
@jovianhq 3 years ago
Thanks for the kind feedback!
@Phoenix_Bro1 1 year ago
This was a superb explanation of how to do EDA. Extremely helpful, Aakash!
@nikunjdeeep 2 months ago
This EDA is so motivating to me... we all search on Google... I used to wonder why I can't recall all of those pandas functions...
@Mlksgf 1 year ago
What a great tutorial! The df has obviously been updated and I cannot find the 'New York' value in Cities, BUT there is data in cities_by_accident: cities_by_accident['New York'] is equal to 7068.
@moymaya 3 years ago
Thank you Aakash. Really helpful. Liked the way we committed mistakes and even learnt something new from them.
@jovianhq 3 years ago
Glad you liked it
@igordemetriusalencar5861 3 years ago
Good class on Python pandas, but in R, exploratory and statistical analysis is way easier compared to Python. Example: data_frame %>% filter(City == "New York") bam!! dataset filtered. Summarize numeric data => data_frame %>% summary() !! bam!! Done!! in a totally functional way.
@jovianhq 1 year ago
Both R and Python are great, you can use either one. Python is gaining more traction because it also has great packages for machine learning & deep learning.
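Editor's note: for comparison, the pandas counterparts of the two R pipelines quoted above are about as short. The sketch below runs on a made-up frame; the `Severity` column is only there to give `describe()` something numeric to summarise.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "City": ["New York", "Dublin", "New York"],
    "Severity": [2, 3, 4],
})

# Counterpart of: data_frame %>% filter(City == "New York")
new_york = df[df["City"] == "New York"]
new_york_q = df.query("City == 'New York'")  # same filter, string-expression form

# Counterpart of: data_frame %>% summary()
summary = df.describe()

print(new_york)
print(summary)
```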
@neelajguhaneogi8348 1 month ago
New York data is there. I manually checked the whole dataset to find the city, because the method you showed at 58 minutes mostly doesn't work due to spaces: some values contain unnecessary spaces, and that creates a problem.
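Editor's note: if some city names do carry stray spaces, as the comment above reports, stripping whitespace before comparing avoids the manual check. This is a sketch on made-up values; whether the real dataset contains such values is the commenter's observation and is not verified here.

```python
import pandas as pd

# Made-up values; two of the "New York" entries have stray spaces.
df = pd.DataFrame({"City": ["New York ", " New York", "Dublin", "New York"]})

# Exact comparison only matches the clean value.
exact_matches = int((df["City"] == "New York").sum())

# Strip leading/trailing whitespace first, then compare.
cleaned = df["City"].str.strip()
stripped_matches = int((cleaned == "New York").sum())

print(exact_matches, stripped_matches)
```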
@sajjadabdullah 2 years ago
Perfect video. I was looking for such a video. Thank you Sir
@raminirakeshkumar8287 3 years ago
Thank you Aakash, great work
@unpatel1 2 years ago
This is a great project and I really enjoyed it. After finishing this video yesterday, I am working on other parameters to expand my analysis. I would love to see more projects from Aakash. Thank you.
@snehaldamkondwar618 2 years ago
Hi, do you know how to work on it again once we close all the tabs?
@user-zj9pq5xc7x 5 months ago
Loved your freeCodeCamp course. Thank you so much.
@architnangalia3426 3 years ago
56:02 The dataset does contain 'New York'. cities_by_accident['New York'] gives us the output 10255.
@shailjamishra9423 2 years ago
Yes, but it does not show the values, just the count... strange!!
@jovianhq 1 year ago
Yes, the dataset does contain New York now.
@scapri1000 3 years ago
Thank you. This is one of the best videos on EDA.
@SeunOnSet 1 year ago
Thank you for sharing this! It was really insightful to see the analysis process from start to finish. It also answered a few questions I had.
@jovianhq 1 year ago
Glad it was helpful!
@shubhamtalks9718 3 years ago
Very educational video. Please keep posting such videos.
@raghvendrasingh8037 2 years ago
Nice video, simple explanation, and the best part was that it started from scratch. Loved it.
@beatmarsgo6972 1 month ago
Didn't think I would see you here after the freeCodeCamp course.
@gunngunn6763 3 years ago
Thank You... looking forward to your upcoming videos
@jawedkhan8602 3 years ago
You are doing a great job. Thank you
@jovianhq 3 years ago
Thank you!
@TheHasanjafreee 1 year ago
This was great! Thank you for the video
@theforester_ 2 years ago
awesome video! big shout out from Brazil
@jovianhq 2 years ago
Hey Mauricio👋, thanks!
@sarzilhossain5977 2 years ago
"New York" in df.City returns False, but "New York" in df.City.unique() returns True (which I have no explanation for). In fact, there are 4220 accident cases in the dataset that occurred in New York (the dataset could have been updated recently). I don't know if it has been updated, but since the accident records fall between 2016 and 2020, it would seem weird if new rows got added later on.
@jovianhq 2 years ago
You are correct, "New York" was in the dataset before as well. df.City returns a Series, and the "in" operator on a Series searches the index labels rather than the values. df.City.unique(), on the other hand, returns an array of the values, so "New York" is searched within those values and you were able to find it.
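Editor's note: the behaviour explained above is easy to reproduce on a toy frame. The rows below are made up and use the default integer index; the last three checks are common ways to test the values rather than the index.

```python
import pandas as pd

# Toy stand-in for the accidents DataFrame (default index 0..3).
df = pd.DataFrame({"City": ["New York", "Dublin", "New York", "Houston"]})

# `in` on a Series tests the index labels, not the values.
in_series = "New York" in df.City   # False: "New York" is not an index label
label_hit = 0 in df.City            # True: 0 is an index label

# Ways to test the values instead.
in_values = "New York" in df.City.values     # membership in the underlying array
in_unique = "New York" in df.City.unique()   # membership in the distinct values
any_match = bool((df.City == "New York").any())  # vectorised comparison
n_rows = int((df.City == "New York").sum())      # number of matching rows

print(in_series, label_hit, in_values, in_unique, any_match, n_rows)
```

`df[df.City == "New York"]` then returns the matching rows themselves, which is the check several commenters used.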
@InsaneRealityLeak 3 years ago
Thank you so much. Definitely a very useful video. ✌🏽
@jovianhq 3 years ago
Glad it was helpful!
@UCEAbhishekLokhande 1 year ago
Thank you very much, learnt lots of things through this session.
@jovianhq 1 year ago
Glad to hear that!
@Griffindor21 1 year ago
Really great video! Any chance I can get a copy of the Jupyter file?
@abhisarshrivastava4667 2 years ago
This is really helpful thank you Jovian
@jovianhq 2 years ago
Glad you liked it!
@pandabear6095 3 years ago
Thank you very much ! This video was useful and easy to understand.
@moeid9935 3 years ago
I liked your naturalness.
@ShelloSongz 2 years ago
Wow, thank you for your concise explanations.
@jovianhq 2 years ago
Glad it was helpful!
@SarcasmWEB 2 years ago
Thank you so much! It was very educational
@muhammadshoaibfareed2577 3 years ago
A great session indeed
@jovianhq 3 years ago
Glad you liked it!
@amanpreetsinghgulati2475 2 years ago
Hi, at around 49:17, when you are checking whether we have 'New York' data or not: 'New York' in df.values returns True, 'New York' in df.City returns False, and 'Dublin' in df.City also returns False (as for all the other cities). So I think we need to use df.values (it checks the whole dataset; yes, it might take time and require extra computation as well). Please help us improve this part. Thanks.
@jovianhq 2 years ago
Yes @Aman, you are correct, New York is indeed present in the dataframe. We've purposefully kept the video in its raw format instead of editing it. This shows that it's very common to hit errors like these while working on a project; one has to be very careful before drawing a conclusion.
@amanpreetsinghgulati2475 2 years ago
@@jovianhq Yes sir, thanks for the session, learnt a lot from it. For "how" to do it there are ample resources available, but "what" to do in EDA is hard to find. Thanks for that.
@jyothiramesh3450 1 year ago
Hey, I am getting an error while installing packages: "You may need to restart the kernel to use updated packages"
@NiviudPu 10 months ago
Shall I do this as my mini project???
@bikrammajhi3020 2 years ago
Thank you so much Sir !!
@navyaagarwal5918 2 years ago
Among the top 100 cities by number of accidents, which states do they belong to most frequently? How do we solve this question?
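Editor's note: one possible approach to the question above, sketched on made-up rows. It counts accidents per (City, State) pair, keeps the top N pairs, and then counts how often each state appears among them. Grouping by both columns is a design choice so that same-named cities in different states stay separate; `top_n` is a hypothetical parameter (3 here, 100 on the full data).

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "City":  ["Houston", "Houston", "Dallas", "Miami", "Miami", "Miami", "Austin"],
    "State": ["TX",      "TX",      "TX",     "FL",    "FL",    "FL",    "TX"],
})

top_n = 3  # hypothetical; use 100 on the full dataset

# Accidents per (City, State), largest first, top N pairs only.
top_cities = (
    df.groupby(["City", "State"])
      .size()
      .sort_values(ascending=False)
      .head(top_n)
)

# How many of the top cities fall in each state.
states_of_top = top_cities.reset_index()["State"].value_counts()
print(states_of_top)
```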
@ytg6663 3 years ago
Big thank you for being here 👍👍
@jovianhq 3 years ago
Glad you liked it!
@imdadood5705 3 years ago
@36:30 We can also do df.describe().shape[1]. @54:40 I got the results for New York. I did cities_by_accident.loc["New York"]
@jovianhq 1 year ago
Yes, the dataset now contains information about New York
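Editor's note: the two expressions from the comment above, shown on a toy frame. `describe()` summarises only numeric columns by default, so its column count equals the number of numeric columns. `cities_by_accident` is assumed here to be `df["City"].value_counts()`, and `Start_Lat` is an assumed column name; neither is confirmed by the thread.

```python
import pandas as pd

# Made-up rows; column names other than City are assumptions.
df = pd.DataFrame({
    "City": ["New York", "Dublin", "New York"],
    "Severity": [2, 3, 2],
    "Start_Lat": [40.7, 40.1, 40.8],
})

# Number of numeric columns, two equivalent ways.
n_numeric_describe = df.describe().shape[1]
n_numeric_dtypes = len(df.select_dtypes(include="number").columns)

# Accident count per city, then a label lookup for one city.
cities_by_accident = df["City"].value_counts()
new_york_count = int(cities_by_accident.loc["New York"])

print(n_numeric_describe, n_numeric_dtypes, new_york_count)
```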
@rohan30497 1 year ago
For personal use:- 1:17:19
@sandipansarkar9211 2 years ago
Finished watching.
@mansigaikwad9 1 year ago
Idk if they have updated the dataset, but I just tried to find whether New York is there or not and, if yes, the number of records (referring to 56:00). I used this code: len(df[df['City']=='New York']) and got the answer. So New York is there in the dataset, and the number of accidents is 7068.
@jovianhq 1 year ago
You are correct! New York was indeed present in the dataset, but in the live session it got missed due to a mistake in the code.
@raghavverma120 2 years ago
I did read your exploratory analysis file for crop production analysis, and all the group by queries that you had run were wrong. Please look into it and rectify them.
@hrittickdebnath35 3 years ago
You did a fantastic job buddy
@jovianhq 3 years ago
Glad you liked it!
@825sohambharambe9 2 years ago
In my case, when I read the file, the Jupyter notebook takes way too long. What can I do?
@dilaraesmer 1 year ago
Thank you so much for all your efforts :)
@jovianhq 1 year ago
Thank you for the comment! Glad you like the videos
@rishabgupta2733 1 year ago
At the data preparation step, my data frame keeps crashing. What should I do now?
@NSASANAPURIKAVYASRI 2 years ago
It is asking for permission to use those datasets. What should I do?
@PinaColada65 2 years ago
Tysm for this. This tutorial is a blessing.
@jovianhq 2 years ago
You're so welcome!
@sharkk2979 3 years ago
Aakash is as knowledgeable as the sky.
@jovianhq 3 years ago
Can't agree more! - Jovian Team
@hydemi83 2 years ago
Great video 👏 Congrats for this awesome job
@jovianhq 2 years ago
Thank you very much!
@lakhanpatel2702 3 years ago
Sir, I tried this code and it shows True for 'New York'. First I looked at df.values, which shows all my data values in array form. Then I wrote 'New York' in df.values, and this line of code shows True as the output.
@kshitizprajapati694 3 years ago
I have completed the Zero to Pandas course. Can you please create content about an SQL integration project?
@jovianhq 3 years ago
Sure, we will definitely consider the topic for our upcoming courses.
@akshayshukla4358 1 month ago
My Colab crashes every time I try to read this dataset. It runs for 2-3 minutes and then crashes. Can anyone help me with this?
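Editor's note: the thread does not diagnose this crash. If the cause is memory pressure from loading a large CSV in one go (an assumption), two standard `pandas.read_csv` options may help: `usecols` to load only the columns you need, and `chunksize` to process the file piece by piece. The sketch below uses an in-memory CSV with made-up rows so that it runs anywhere; on Colab you would pass the path of the downloaded file instead.

```python
import io
import pandas as pd

# In-memory stand-in for the large CSV file (made-up rows).
csv_text = (
    "ID,City,Severity,Description\n"
    "1,Houston,2,a\n"
    "2,Miami,3,b\n"
    "3,Houston,4,c\n"
    "4,Dallas,1,d\n"
)

# Option 1: load only the columns needed for the analysis.
small = pd.read_csv(io.StringIO(csv_text), usecols=["City", "Severity"])

# Option 2: read in chunks and combine per-chunk results.
parts = [
    chunk["City"].value_counts()
    for chunk in pd.read_csv(io.StringIO(csv_text), usecols=["City"], chunksize=2)
]
counts = pd.concat(parts).groupby(level=0).sum().sort_values(ascending=False)

print(small.shape)
print(counts)
```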
@datayogi_ 2 years ago
After excluding the Bing data, wasn't there a need to recreate the graphs and insights produced before finding that the Bing data is faulty?
@jovianhq 2 years ago
Yup, you are correct, we should always do more research before concluding something.
@datayogi_ 2 years ago
@@jovianhq Okay 😊, thanks for the reply.
@Carworld-s5l 2 years ago
Previously I felt I had to remember all the pandas methods, but you made me confident. Thank you Bhai❤❤
@jovianhq 2 years ago
Glad it was helpful! Check our other courses at jovian.ai/learn
@atifshaik1156 2 years ago
Is it fine to google something while working on a project, like you did in the video??
@jovianhq 2 years ago
Yes, it's absolutely fine, you're not expected to know everything, and even if you do know, there can be a better way of implementing the same thing. So it's totally fine to google things.
@yashdhangar3261 1 year ago
Which algorithm is used?
@vishwaslad1810 2 years ago
Great Video
@abhishekkumar-qi3is 3 years ago
Please make a video on feature engineering and selection, and thanks for this content.
@jovianhq 3 years ago
Hey, have you tried our Machine Learning course? We have covered feature engineering/selection and lots of other interesting topics in that course. View the course here -> zerotogbms.com
@atharvaparanjape9585 2 years ago
At 37:59, how did we get a plot without importing matplotlib??
@aryanrana5658 1 year ago
It's a good video, but the dataset you uploaded is the updated one. We also want the raw, messy dataset which you used while handling missing values.
@jovianhq 1 year ago
Thank you. Unfortunately the dataset was updated on Kaggle; we don't have access to the previous version of the dataset.
@vikasmishra4385 3 years ago
Hi, I have an issue: when I try to run a histplot in seaborn, it shows the error "module 'seaborn' has no attribute 'histplot'". I am confused about what the reason might be. I tried updating pip itself, but to no use. Can you suggest a possible solution?
@jovianhq 3 years ago
Try updating the seaborn library using the following command `pip install -U seaborn`
@gajanansawadadkar5003 1 year ago
Good session
@bhushanwagh7192 3 years ago
Awesome sir
@debojitmandal8670 3 years ago
Sir, here you have a column called Severity, and it tells the severity of the accident. What I am asking is: to find the cities with the highest number of accidents, can I use the groupby function and group based on the city and severity? I.e. df.groupby('City').Severity.sum().sort_values(ascending=False) Because I feel this is a better approach than using unique values. Please reply back.
@jovianhq 3 years ago
Yes, you can do that, but the code should be like this: df.groupby('City')["Severity"].count().sort_values(ascending=False). Here the Severity values do not matter for getting the total number of accidents, so we are just counting the number of rows for each city instead of using sum() on Severity. For better assistance, post your question in the community: jovian.ai/forum
@debojitmandal8670 3 years ago
@@jovianhq But why doesn't it matter? If you read that column's description, it says it is the severity of the accidents.
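Editor's note: a small made-up example shows why `sum()` and `count()` answer different questions. Summing Severity adds up the severity scores, so a city with a few severe accidents can outrank a city with more accidents; counting rows gives the number of accidents regardless of how severe each one was.

```python
import pandas as pd

# Made-up rows: Houston has 2 severe accidents, Miami has 3 minor ones.
df = pd.DataFrame({
    "City": ["Houston", "Houston", "Miami", "Miami", "Miami"],
    "Severity": [4, 4, 1, 2, 1],
})

# Total severity per city: Houston 8, Miami 4.
severity_sum = df.groupby("City")["Severity"].sum().sort_values(ascending=False)

# Number of accidents per city: Miami 3, Houston 2.
accident_count = df.groupby("City")["Severity"].count().sort_values(ascending=False)

# Same row counts via value_counts.
same_counts = df["City"].value_counts()

print(severity_sum)
print(accident_count)
```

One caveat: `count()` skips rows where Severity is missing, whereas `groupby("City").size()` and `value_counts()` on City count every row for the city.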
@dc4617 2 years ago
thank you🙂
@whatdidilearntoday6369 3 years ago
Hey Aakash, I tried to run the Jovian notebook via Colab but there was a commit error. Can you help me with it?
@jovianhq 3 years ago
Hey, can you please post your question in the Jovian Forum? Forum link: jovian.ai/forum
@anupriyasharma9282 3 years ago
Hello Sir,
1. Can you please tell me how to handle missing observations for the following features?
FEATURE              SUM
Precipitation(in)    510527
Wind_Chill(F)        449288
Wind_Speed(mph)      128852
Humidity(%)          45506
Visibility(mi)       44206
Weather_Condition    44001
Temperature(F)       43030
Wind_Direction       41857
Pressure(in)         36270
Weather_Timestamp    30263
Airport_Code         4248
Timezone             2302
Zipcode              935
dtype: int64
I have removed the "Number" feature, as 70% of the data in that column was missing. Can we use mean/median/mode, or is there any other technique?
2. For the univariate analysis, wouldn't it be very lengthy and time-consuming to study 47 features?
@karthikbs8457 2 years ago
I have seen people fill the empty cells with median values.
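Editor's note: a rough sketch of the options mentioned in this exchange, on made-up values. It computes the missing fraction per column, drops columns above a threshold (0.7, mirroring the 70% figure in the question), fills numeric gaps with the median, and fills categorical gaps with the most frequent value. Which strategy is appropriate depends on the column and the analysis; this is one simple baseline, not a recommendation from the video.

```python
import numpy as np
import pandas as pd

# Made-up values; column names follow the ones listed in the question.
df = pd.DataFrame({
    "Temperature(F)": [70.0, np.nan, 50.0, 90.0],
    "Weather_Condition": ["Clear", None, "Clear", "Rain"],
    "Number": [np.nan, np.nan, np.nan, 12.0],
})

# Fraction of missing values per column, largest first.
missing_fraction = df.isna().mean().sort_values(ascending=False)

# Drop columns that are mostly empty.
threshold = 0.7
too_sparse = missing_fraction[missing_fraction > threshold].index
filled = df.drop(columns=too_sparse)

# Numeric columns: fill with the column median.
num_cols = filled.select_dtypes(include="number").columns
filled[num_cols] = filled[num_cols].fillna(filled[num_cols].median())

# Remaining (categorical) columns: fill with the most frequent value.
for col in filled.columns.difference(num_cols):
    filled[col] = filled[col].fillna(filled[col].mode().iloc[0])

print(missing_fraction)
print(filled)
```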
@milanms4593 3 years ago
Thanks, I got the idea of doing EDA. Can you teach us about web scraping?
@theo_riveroooo 3 years ago
Corey Schafer has posted some great videos about that.
@jovianhq 3 years ago
Hi Milan, we are doing a workshop on web scraping next Thursday (April 15th) at 9 PM IST on our YouTube channel. kzbin.info/www/bejne/iHzWfX99Ysete7s
@ganeshr3297 2 years ago
At 21:03, I couldn't load the data. What should I do?
@krishnaepili1228 3 years ago
@Team: Will sampling impact the analysis, as we are taking 10 percent of the data to process it faster? If not, how is the 10 percent taken from the 3.2 billion records in this use case?
@jovianhq 1 year ago
Yes, it will impact the analysis, but if the dataset is large, the results will be approximately correct.
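Editor's note: a sketch of taking a 10 percent random sample with `DataFrame.sample` and comparing category shares between the full data and the sample. The rows are made up and `random_state=42` is an arbitrary seed chosen for reproducibility; the exact sampling call used in the video is not shown in this thread.

```python
import pandas as pd

# Made-up data: 1000 rows across three cities.
df = pd.DataFrame({"City": ["Houston"] * 600 + ["Miami"] * 300 + ["Dallas"] * 100})

# Random 10 percent sample of the rows.
sample_df = df.sample(frac=0.1, random_state=42)

# Compare the share of each city in the full data and in the sample.
full_share = df["City"].value_counts(normalize=True)
sample_share = sample_df["City"].value_counts(normalize=True)
comparison = pd.DataFrame({"full": full_share, "sample": sample_share})

print(len(sample_df))
print(comparison)
```

The sample shares will generally be close to, but not identical to, the full shares, and rare categories are the ones most likely to be distorted or dropped by sampling.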
@u_39_siddhantsingh14 3 years ago
I only know Python and a little bit of NumPy. Will I be able to understand this video? Is this video helpful for me?
@jovianhq 3 years ago
Yes you will, it's a complete step-by-step guide. Also, you can enroll in our pandas course to get a better idea of NumPy, pandas and data analysis. Here's the link: zerotopandas.com
@u_39_siddhantsingh14 3 years ago
@@jovianhq Thank you.
@PratapO7O1 3 years ago
Why did you choose Google Colab over Kaggle? I mean, it would have been very easy and we could have saved 26 min.
@AakashNS 3 years ago
You can use either Google Colab or Kaggle notebooks, whichever you find more convenient!
@jovianhq 1 year ago
We're now working on a Kaggle integration that will make it possible to run notebooks directly from Kaggle.
@gitasaheru2386 2 years ago
Please sir, build a neural network algorithm with manual coding, without Keras, and use a case study.
@siddharthpunn10 3 years ago
Great session
@jovianhq 3 years ago
Glad you liked it!
@abhaytyagi7093 3 years ago
Hey, I'm working on a Colab notebook via the Jovian platform, but if my screen sleeps for some time, all my data is lost. Is there a way to keep all my cells intact even after my laptop goes into sleep mode? Eagerly waiting for a reply to fix it, and thanks in advance.
@AakashNS 3 years ago
Colab shuts down your notebook after some period of inactivity. Execute jovian.commit() from time to time to save a snapshot of your notebook to Jovian. You can then run your notebook on Colab again using the "Run on Colab" option.
@abhaytyagi7093 3 years ago
@@AakashNS Thank you so much for this. But I tried it and I'm getting an API error when I try to execute jovian.commit, even though I'm entering the right credentials. I even checked on Stack Overflow; there are other people facing the same issue too. Please help with this too.
@jovianhq 1 year ago
We have improved our Colab integration, please check it out now.
@krazyhorse004 3 years ago
df[df.City == 'New York'] ^ shows New York in the dataset
@jovianhq 1 year ago
Yes, the dataset seems to have changed to include New York now