Let's Build an Exploratory Data Analysis Project from Scratch

Let's Build an Exploratory Data Analysis Project from Scratch | Python, Numpy, Pandas

221,212 views

Jovian

1 day ago

Comments: 267

@decodewithabhishek 3 years ago

It was a good video. I like how he didn't cut out the parts where he gets stuck on a problem. ⭐⭐⭐⭐⭐

@normalperson1130 3 years ago

Thank you Aakash for giving a raw walk-through. Apart from the usual documentation stuff, I think the ability to google and find answers to problems is a much more important skill in Data Science, apart from, of course, mathematical understanding. This walkthrough actually gave me more confidence in using pandas without worrying about typical syntax pitfalls.

@jovianhq 3 years ago

Glad it was helpful!

@raihanhosain3374 1 year ago

Best project tutorial on any YouTube channel. Everyone else makes videos and cuts out the portions where they get stuck. You, on the other hand, show the real scenario and show us how to find solutions using Google as well. We want similar videos on data science projects. Love from Bangladesh.

@shivsharma9153 2 years ago

Do you know why I loved this video? You kept it raw and real. You clearly portray how a data analyst thinks and does the project, which I believe is more important than fancy coding. Syntax you can get easily, but analytical thinking requires real effort.

@snehaldamkondwar618 2 years ago

Hi Shiv, I'm from a non-technical background. I was doing the given project, but once we close all tabs, how do we get back to the same notebook?

@shivsharma9153 2 years ago

@@snehaldamkondwar618 Hi, where did you save it locally? Which folder?

@snehaldamkondwar618 2 years ago

@@shivsharma9153 Do I need to save it?

@snehaldamkondwar618 2 years ago

@@shivsharma9153 I opened the notebook through Jovian and ran it on Colab with the help of the Kaggle dataset. I wrote some lines of code in Colab, then I closed all the tabs. Now how do I get back to the file where I wrote my code?

@shivsharma9153 2 years ago

@@snehaldamkondwar618 Try searching locally for the name of the notebook; you may be able to locate it.

@rajivgarg9480 2 years ago

I have seen only half the video. Couldn't stop myself from appreciating the good work. It couldn't have been done any better. Way better than the paid education platforms.

@sabchillhai802 3 years ago

Great work Jovian, we need more sessions like this. Thanks a lot.

@jovianhq 3 years ago

Glad you liked it!

@MuhammadAkbarAttamimi 3 years ago

55:52 This dataset does contain New York accident data; there are around 10,000 records. I checked it using df[df['City'] == 'New York']

@garvitpoddar6947 3 years ago

Yes

@AakashNS 3 years ago

Thanks! Not sure why I missed it. Maybe I was using a different version of the dataset.

@claudiolb8552 3 years ago

@@AakashNS Not sure why, but "New York" in df.City always returns False. Try it with another city; it just doesn't work.

@tirthhihoriya690 3 years ago

@@claudiolb8552 Use: a. >>> 'NY' in list(df.State) or b. >>> 'New York' in list(df.City) or c. >>> cities_by_accident['New York']

@shailjamishra9423 2 years ago

Yes, New York City is in the dataset; the state that is missing is 'Alaska'.

@aashisethiya4653 1 year ago

Aakash, you are one of the best teachers I have come across. Coming from a hard-core medical background and pivoting into data analytics, I came across your pandas courses while preparing my foundation in Python before starting a master's in analytics in the US this Fall. Hands down, you have given beginners like me a lot of handholding with your courses and videos!

@jovianhq 1 year ago

Thanks, I'm glad you found our course helpful! 😊 - Aakash

@aashisethiya4653 1 year ago

@@jovianhq I went through many teachers on YouTube and DataCamp, but truth be told, most are ludicrously formal in their teaching methods and have a slower, theoretical pace. Is there any possibility of connecting with Aakash to get some roadmap tips for a beginner who plans to venture into the US Health Business Analytics domain?

@sandeepmesa 2 years ago

I like the way you google for help. Appreciate your time. Learnt new things about how to articulate our work. Thanks.

@jovianhq 2 years ago

Glad it was helpful!

@shreyaskulkarni7612 3 years ago

The current dataset is updated. A high percentage of accidents occur between 3 pm and 6 pm (probably people in a hurry to get home). The next highest percentage is between 6 am and 9 am. Over 1,100 cities have reported just one accident (need to investigate). On Sundays, the peak occurs between 11 am and 6 pm, unlike weekdays.

@jovianhq 3 years ago

Interesting analysis and insights Shreyas!
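
For anyone who wants to reproduce this kind of hour-of-day breakdown, here is a minimal sketch. It assumes the accident timestamps sit in a datetime column called Start_Time (that column name and the four sample rows are assumptions for illustration, not taken from the thread).

```python
import pandas as pd

# Tiny made-up sample; the real dataset is far larger.
# "Start_Time" is an assumed column name.
df = pd.DataFrame({
    "Start_Time": pd.to_datetime([
        "2020-01-06 16:30",  # Monday
        "2020-01-06 17:10",  # Monday
        "2020-01-06 08:05",  # Monday
        "2020-01-05 12:00",  # Sunday
    ])
})

# Share of accidents per hour of day across all days.
hour_share = df["Start_Time"].dt.hour.value_counts(normalize=True).sort_index()

# Same breakdown restricted to Sundays (dayofweek: Monday=0 ... Sunday=6).
sundays = df[df["Start_Time"].dt.dayofweek == 6]
sunday_hour_share = sundays["Start_Time"].dt.hour.value_counts(normalize=True).sort_index()

print(hour_share)
print(sunday_hour_share)
```

Comparing hour_share with sunday_hour_share is one way to check whether the weekend pattern really differs from weekdays.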

@freehappymeal 2 years ago

Thank you for teaching us how to problem solve and the whole EDA process!

@jovianhq 2 years ago

Happy to help!

@outinthebeach 3 years ago

Great course Aakash - this and everything else you have put here. Thanks for your generosity to teach this the way you have done it. Brilliant!!!

@prisri5953 1 year ago

NY is in the state list. The missing states are AK (Alaska) and HI (Hawaii). It also counts DC as a state.

@anwoybarua8213 3 years ago

One of the best YouTube channels for data analysis learners❤️❤️

@eyesofdoriss 1 year ago

Great sharing. I've been looking for a full guide like this one for a while. Thank you!

@jovianhq 1 year ago

Glad you enjoyed it!

@jeetthakkar2297 2 years ago

Sir, New York data is actually present in the given dataset. We get False if we use: 'New York' in df['City']. And we get True if we use: 'New York' in df['City'].unique()

@jovianhq 2 years ago

Yes Jeet, you are correct. We found it later but didn't update the video, to show that this type of error can happen at any time while working on a project. Great work finding it!

@harshucore 1 year ago

I used 'New York' in df.values and got True

@bvvsr89 3 years ago

Watching the master is how you learn...Thanks a lot for this...

@muralikumaar9456 2 years ago

Great session on EDA. We need more such sessions on different datasets.

@jovianhq 2 years ago

This is just an example; we hope viewers will be able to make better EDA projects on different datasets after watching this video.

@sivaramaguhans4002 3 years ago

I couldn't find such a clear EDA explanation in other videos... awesome 🎉

@piyushkumar-kb2jc 3 years ago

concept is crystal clear by anuj bhyia.

@jovianhq 1 year ago

Thanks!

@ankitlakshya450 2 years ago

Bro, you were my senior in intermediate at Ascent Junior College, Vizag. Got clarity on EDA, by the way.

@jovianhq 1 year ago

Glad you liked our tutorials!

@tiwarirr 3 years ago

Best teacher for Data Science!

@bane2256 1 year ago

This was excellent. I hope for more of these in 2023

@jovianhq 1 year ago

Definitely!! Stay tuned, more interesting videos coming soon.

@bane2256 1 year ago

@@jovianhq Is this the type of project that is sufficient to be included in an analytics portfolio, or does it need to be something more extensive?

@deepasarojam4425 3 years ago

This is the best video on EDA I have ever watched! Thanks Aakash :)

@jovianhq 3 years ago

Thanks for the kind feedback!

@Phoenix_Bro1 1 year ago

This was a superb explanation of how to do EDA. Extremely helpful, Aakash!

@nikunjdeeep 2 months ago

This EDA is so motivating to me... we all search on Google... I used to wonder why I can't recall all of those pandas functions...

@Mlksgf 1 year ago

What a great tutorial! The df has obviously been updated and I cannot find the 'New York' value in cities, BUT there is data in cities_by_accident: cities_by_accident['New York'] is equal to 7068.

@moymaya 3 years ago

Thank you Aakash. Really helpful. Liked the way we committed mistakes and even learnt something new from them.

@jovianhq 3 years ago

Glad you liked it

@igordemetriusalencar5861 3 years ago

Good class on Python pandas, but in R, exploratory and statistical analysis is way easier than in Python. Example: data_frame %>% filter(City == "New York") and bam, the dataset is filtered. To summarize numeric data: data_frame %>% summary() and bam, done, in a totally functional way.

@jovianhq 1 year ago

Both R and Python are great; you can use either one. Python is gaining more traction because it also has great packages for machine learning and deep learning.
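
For comparison, the two R one-liners above have fairly direct pandas counterparts. This is only a sketch on a made-up three-row frame (the data and the variable names are assumptions for illustration).

```python
import pandas as pd

# Made-up miniature frame standing in for the accidents data.
df = pd.DataFrame({
    "City": ["New York", "Boston", "New York"],
    "Severity": [2, 3, 4],
})

# Rough equivalent of: data_frame %>% filter(City == "New York")
new_york = df[df["City"] == "New York"]

# The same filter written with query(), which reads closer to the R version.
new_york_q = df.query("City == 'New York'")

# Rough equivalent of: data_frame %>% summary() for numeric columns.
summary = df.describe()

print(new_york)
print(summary)
```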

@neelajguhaneogi8348 1 month ago

New York data is there. I manually checked the whole dataset to find the city, because the method you showed at 58 minutes mostly doesn't work due to spaces: some values contain unnecessary spaces, and that creates a problem.
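
If stray spaces really are present in some values (this is the commenter's report and has not been verified against the dataset here), stripping whitespace before comparing is a common remedy. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up values: two of the three "New York" entries carry stray spaces.
cities = pd.Series(["New York", " New York", "New York ", "Boston"])

# An exact comparison only matches the clean entry.
exact_matches = int((cities == "New York").sum())

# Strip leading/trailing whitespace first, then compare.
cleaned = cities.str.strip()
clean_matches = int((cleaned == "New York").sum())

print(exact_matches, clean_matches)
```

On the real data, the cleaned column could be assigned back to the City column so that later filters and counts see consistent values.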

@sajjadabdullah 2 years ago

Perfect video. I was looking for a video like this. Thank you Sir

@raminirakeshkumar8287 3 years ago

Thank you Aakash, great work

@unpatel1 2 years ago

This is a great project and I really enjoyed it. After finishing this video yesterday, I am working on other parameters to expand my analysis. I would love to see more projects from Aakash. Thank you.

@snehaldamkondwar618 2 years ago

Hi, do you know how to work on it again once we close all tabs?

@user-zj9pq5xc7x 5 months ago

Loved your freeCodeCamp course. Thank you so much

@architnangalia3426 3 years ago

56:02 The dataset does contain 'New York'. cities_by_accident['New York'] gives us the output 10255

@shailjamishra9423 2 years ago

Yes, but it does not show the values, just the count... strange!!

@jovianhq 1 year ago

Yes, the dataset does contain New York now.

@scapri1000 3 years ago

Thank you. This is one of the best videos on EDA.

@SeunOnSet 1 year ago

Thank you for sharing this! It was really insightful to see the analysis process from start to finish. It also answered a few questions I had.

@jovianhq 1 year ago

Glad it was helpful!

@shubhamtalks9718 3 years ago

Very educational video. Please keep posting such videos.

@raghvendrasingh8037 2 years ago

Nice video, simple explanation, and the best part was that it was from scratch. Loved it

@beatmarsgo6972 1 month ago

Didn't think I would see you here after the freeCodeCamp course

@gunngunn6763 3 years ago

Thank You... looking forward to your upcoming videos

@jawedkhan8602 3 years ago

You are doing a great job. Thank you

@jovianhq 3 years ago

Thank you!

@TheHasanjafreee 1 year ago

This was great! Thank you for the video

@theforester_ 2 years ago

awesome video! big shout out from Brazil

@jovianhq 2 years ago

Hey Mauricio👋, thanks!

@sarzilhossain5977 2 years ago

"New York" in df.City returns False, but "New York" in df.City.unique() returns True (which I have no explanation for). In fact, there are 4,220 accident cases in the dataset that occurred in New York (the dataset could have been updated recently; I don't know whether it has). But since the accident records fall between 2016 and 2020, it would seem weird if new rows got added later on.

@jovianhq 2 years ago

You are correct, "New York" was in the dataset before as well. df.City returns a Series, and the "in" operator on a Series searches the index labels rather than the values. df.City.unique(), on the other hand, returns an array of the distinct values, so "New York" is searched among the values themselves, which is why you were able to find it.
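
The behaviour described in this reply can be seen on a tiny frame. The sketch below uses a made-up four-row DataFrame (the data is an assumption for illustration); the membership checks themselves are standard pandas.

```python
import pandas as pd

# Made-up miniature stand-in for the accidents DataFrame.
df = pd.DataFrame({"City": ["New York", "Dublin", "New York", "Houston"]})

# `in` on a Series tests the index labels (0, 1, 2, 3 here), not the values.
in_series = "New York" in df["City"]
label_in_series = 0 in df["City"]  # 0 is an index label

# Ways to test the values instead.
in_unique = "New York" in df["City"].unique()
any_match = bool(df["City"].eq("New York").any())
row_count = int((df["City"] == "New York").sum())

print(in_series, label_in_series, in_unique, any_match, row_count)
```

Of these, the eq(...).any() form states the intent most directly, and the boolean mask behind row_count also gives the number of matching rows.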

@InsaneRealityLeak 3 years ago

Thank you so much. Definitely a very useful video. ✌🏽

@jovianhq 3 years ago

Glad it was helpful!

@UCEAbhishekLokhande 1 year ago

Thank you very much. Learnt lots of things through this session

@jovianhq 1 year ago

Glad to hear that!

@Griffindor21 1 year ago

Really great video! Any chance I can get a copy of the Jupyter file?

@abhisarshrivastava4667 2 years ago

This is really helpful, thank you Jovian

@jovianhq 2 years ago

Glad you liked it!

@pandabear6095 3 years ago

Thank you very much ! This video was useful and easy to understand.

@moeid9935 3 years ago

I liked your naturalness

@ShelloSongz 2 years ago

Wow, thank you for your concise explanations.

@jovianhq 2 years ago

Glad it was helpful!

@SarcasmWEB 2 years ago

Thank you so much! It was very educational

@muhammadshoaibfareed2577 3 years ago

A great session indeed

@jovianhq 3 years ago

Glad you liked it!

@amanpreetsinghgulati2475 2 years ago

Hi, at around 49:17, when you check whether we have 'New York' data or not: 'New York' in df.values returns True, 'New York' in df.City returns False, and 'Dublin' in df.City also returns False (the same goes for all the other cities). So I would prefer df.values (it checks the whole dataset, which might be time-consuming and use unnecessary computation). Please help us improve this part. Thanks

@jovianhq 2 years ago

Yes @Aman, you are correct, New York is indeed present in the dataframe. We've purposefully kept the video in its raw format instead of editing it. This shows that it's very common to get errors like these while working on your project; one has to be very careful before drawing a conclusion.

@amanpreetsinghgulati2475 2 years ago

@@jovianhq Yes sir, thanks for the session, I learnt a lot from it. For "how" to do things there are ample resources available, but "what" to do in EDA is hard to find. Thanks for that

@jyothiramesh3450 1 year ago

Hey, I am getting an error while installing packages: "You may need to restart the kernel to use updated packages"

@NiviudPu 10 months ago

Can I do this as my mini project?

@bikrammajhi3020 2 years ago

Thank you so much Sir !!

@navyaagarwal5918 2 years ago

Among the top 100 cities by number of accidents, which states do they belong to most frequently? How do we solve this question?

@ytg6663 3 years ago

Big thank you for being Here 👍👍

@jovianhq 3 years ago

Glad you liked it!

@imdadood5705 3 years ago

@36:30 We can also do df.describe().shape[1]. @54:40, I got the results for New York. I did cities_by_accident.loc["New York"]

@jovianhq 1 year ago

Yes, the dataset now contains information about New York

@rohan30497 1 year ago

For personal use: 1:17:19

@sandipansarkar9211 2 years ago

finished watching

@mansigaikwad9 1 year ago

I don't know if they have updated the dataset, but I just tried to find whether New York is there and, if so, the number of records (referring to 56:00). I used this code: len(df[df['City']=='New York']) and got the answer. So New York is in the dataset, and the number of accidents is 7068

@jovianhq 1 year ago

You are correct! New York was indeed present in the dataset, but in the live session it got skipped due to a mistake in the code.

@raghavverma120 2 years ago

I read your exploratory analysis file for the crop production analysis, and all the group-by queries you ran were wrong. Please look into it and rectify them

@hrittickdebnath35 3 years ago

You did a fantastic job buddy

@jovianhq 3 years ago

Glad you liked it!

@825sohambharambe9 2 years ago

In my case, when I read the file, the Jupyter notebook takes way too long. What can I do?

@dilaraesmer 1 year ago

Thank you so much for all your efforts :)

@jovianhq 1 year ago

Thank you for the comment! Glad you like the videos

@rishabgupta2733 1 year ago

At the data preparation step, my data frame keeps crashing. What should I do now?

@NSASANAPURIKAVYASRI 2 years ago

It is asking for permissions to use those datasets. What should I do?

@PinaColada65 2 years ago

tysm for this. this tutorial is a blessing

@jovianhq 2 years ago

You're so welcome!

@sharkk2979 3 years ago

Aakash is as knowledgeable as the sky.

@jovianhq 3 years ago

Can't agree more! - Jovian Team

@hydemi83 2 years ago

Great video 👏 Congrats for this awesome job

@jovianhq 2 years ago

Thank you very much!

@lakhanpatel2702 3 years ago

Sir, I tried this code and it shows True for the city 'New York'. First I looked at df.values, which shows all my data values in array form. Then I wrote 'New York' in df.values, and this line of code gives True as output.

@kshitizprajapati694 3 years ago

I have completed the Zero to Pandas course. Can you please create content about an SQL integration project?

@jovianhq 3 years ago

Sure, we will definitely consider the topic for our upcoming courses.

@akshayshukla4358 1 month ago

My Colab crashes every time I try to read this dataset. It runs for 2-3 minutes and then crashes. Can anyone help me with this?

@datayogi_ 2 years ago

After excluding the Bing data, wasn't there a need to recreate the graphs and insights produced before finding that the Bing data is faulty?

@jovianhq 2 years ago

Yup, you are correct, we should always do more research before concluding something

@datayogi_ 2 years ago

@@jovianhq okay 😊, thanks for the reply

@Carworld-s5l 2 years ago

Previously I felt I had to remember all the pandas methods, but you made me confident. Thank you Bhai❤❤

@jovianhq 2 years ago

Glad it was helpful! Check our other courses at jovian.ai/learn

@atifshaik1156 2 years ago

Is it fine to google something while working on a project, like you did in the video?

@jovianhq 2 years ago

Yes, it's absolutely fine. You're not expected to know everything, and even if you do know, there can be a better way of implementing the same thing. So it's totally fine to google something.

@yashdhangar3261 1 year ago

Which algorithm is used?

@vishwaslad1810 2 years ago

Great Video

@abhishekkumar-qi3is 3 years ago

Please make a video on feature engineering and selection, and thanks for this content

@jovianhq 3 years ago

Hey, have you tried our Machine Learning course? We have covered feature engineering/selection and lots of other interesting topics in that course. View the course from here -> zerotogbms.com

@atharvaparanjape9585 2 years ago

At 37:59, how did we get a plot without importing matplotlib?

@aryanrana5658 1 year ago

It's a good video, but the dataset you uploaded is the updated one. We also want the raw, messy dataset that you used while handling missing values

@jovianhq 1 year ago

Thank you. Unfortunately, the dataset was updated on Kaggle, and we don't have access to the previous version of the dataset.

@vikasmishra4385 3 years ago

Hi, I have an issue: when I try to run a histplot in seaborn, it shows the error "module 'seaborn' has no attribute 'histplot'". I am confused about what the reason might be. I tried updating pip itself, but to no avail. Can you suggest a possible solution?

@jovianhq 3 years ago

Try updating the seaborn library using the following command `pip install -U seaborn`

@gajanansawadadkar5003 1 year ago

Good session

@bhushanwagh7192 3 years ago

Awesome sir

@debojitmandal8670 3 years ago

Sir, here you have a column called Severity, and it tells the severity of the accident. What I am asking is: to find the cities with the highest number of accidents, can I use the groupby function and group based on the city and severity, i.e. df.groupby('City').Severity.sum().sort_values(ascending=False)? I feel this is a better approach than using unique values. Please reply back

@jovianhq 3 years ago

Yes, you can do that, but the code should be like this: df.groupby('City')["Severity"].count().sort_values(ascending=False). Here the Severity column does not matter for getting the total number of accidents, so we are just counting the total number of rows in each city instead of using sum() on Severity. For better assistance, post your question in the community. jovian.ai/forum

@debojitmandal8670 3 years ago

@@jovianhq But why doesn't it matter? If you read that column's description, it says it is the severity of the accidents.
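
A small made-up example may help show why count() and sum() answer different questions. In the sketch below (the five rows are invented for illustration), city A has more accidents while city B has the larger Severity total, so the two rankings disagree.

```python
import pandas as pd

# Made-up data: city A has three mild accidents, city B has two severe ones.
df = pd.DataFrame({
    "City": ["A", "A", "A", "B", "B"],
    "Severity": [1, 1, 1, 4, 4],
})

# Number of accidents per city: one row is one accident, so count the rows.
accidents_per_city = df.groupby("City")["Severity"].count().sort_values(ascending=False)

# Summing Severity measures total severity, which is a different quantity.
severity_sum = df.groupby("City")["Severity"].sum().sort_values(ascending=False)

# value_counts() on the City column gives the same counts directly.
by_value_counts = df["City"].value_counts()

print(accidents_per_city)
print(severity_sum)
```

Severity still matters for other questions, such as which cities have the most serious accidents; it just isn't needed when the question is how many accidents happened.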

@dc4617 2 years ago

thank you🙂

@whatdidilearntoday6369 3 years ago

Hey Aakash, I tried to run the Jovian notebook via Colab but there was a commit error. Can you help me with it?

@jovianhq 3 years ago

Hey, can you please post your question in the Jovian Forum? Forum link: jovian.ai/forum

@anupriyasharma9282 3 years ago

Hello Sir,
1. Can you please tell me how to handle missing observations for the following features?

FEATURE              SUM
Precipitation(in)    510527
Wind_Chill(F)        449288
Wind_Speed(mph)      128852
Humidity(%)          45506
Visibility(mi)       44206
Weather_Condition    44001
Temperature(F)       43030
Wind_Direction       41857
Pressure(in)         36270
Weather_Timestamp    30263
Airport_Code         4248
Timezone             2302
Zipcode              935
dtype: int64

I have removed the "number" feature, as 70% of the data in that column was missing. Can we use mean/median/mode, or is there any other technique?
2. For the univariate analysis, wouldn't it be very lengthy and time-consuming to study 47 features?

@karthikbs8457 2 years ago

I have seen people filling in median values in the empty cells
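
A minimal sketch of the median idea mentioned above, on a made-up frame that borrows two column names from the list in the question. Filling the categorical column with its most frequent value is an added assumption for illustration, not something suggested in the thread, and whether any imputation is appropriate depends on the analysis.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the column names come from the question above.
df = pd.DataFrame({
    "Temperature(F)": [70.0, np.nan, 50.0, 90.0],
    "Weather_Condition": ["Clear", "Rain", None, "Clear"],
})

# Fraction of missing values per column, largest first.
missing_fraction = df.isna().mean().sort_values(ascending=False)

filled = df.copy()

# Numeric column: fill gaps with the median of the observed values.
filled["Temperature(F)"] = filled["Temperature(F)"].fillna(
    filled["Temperature(F)"].median()
)

# Categorical column: fill gaps with the most frequent value (an assumption).
filled["Weather_Condition"] = filled["Weather_Condition"].fillna(
    filled["Weather_Condition"].mode()[0]
)

print(missing_fraction)
print(filled)
```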

@milanms4593 3 years ago

Thanks, I got the idea of doing EDA. Can you teach us about web scraping?

@theo_riveroooo 3 years ago

Corey Schafer has posted some great videos about that

@jovianhq 3 years ago

Hi Milan, we are doing a workshop on web scraping next Thursday (April 15th) at 9 PM IST on our YouTube channel. kzbin.info/www/bejne/iHzWfX99Ysete7s

@ganeshr3297 2 years ago

At 21:03, I couldn't load the data. What should I do?

@krishnaepili1228 3 years ago

@Team: Will sampling impact the analysis, since we are taking 10 percent of the data to process it faster? If not, how is the 10 percent taken from the 3.2 billion records in this use case?

@jovianhq 1 year ago

Yes, it will impact the analysis, but if the dataset is large, the results will be approximately correct.
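
To illustrate the "approximately correct" point, here is a sketch that draws a random 10 percent sample with DataFrame.sample and compares a simple distribution between the full data and the sample. The data is synthetic (random Severity values generated for illustration), so the exact numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 100,000 rows with a random Severity between 1 and 4.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Severity": rng.integers(1, 5, size=100_000)})

# Random 10% sample; random_state makes the draw repeatable.
sample_df = df.sample(frac=0.1, random_state=42)

# Compare the share of each Severity level in the full data and the sample.
full_share = df["Severity"].value_counts(normalize=True).sort_index()
sample_share = sample_df["Severity"].value_counts(normalize=True).sort_index()
max_gap = float((full_share - sample_share).abs().max())

print(len(sample_df), max_gap)
```

With a sample of this size the shares usually differ only slightly, though rare categories (for example, cities with a single accident) are much more sensitive to sampling.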

@u_39_siddhantsingh14 3 years ago

I only know Python and a little bit of NumPy. Will I be able to understand this video? Is it helpful for me?

@jovianhq 3 years ago

Yes you will, it's a complete step-by-step guide. Also, you can enroll in our Pandas course to get a better idea of NumPy, pandas and data analysis. Here's the link: zerotopandas.com

@u_39_siddhantsingh14 3 years ago

@@jovianhq Thank you

@PratapO7O1 3 years ago

Why did you choose Google Colab over Kaggle? I mean, it would have been very easy and we could have saved 26 minutes.

@AakashNS 3 years ago

You can use either Google Colab or Kaggle notebooks, whichever you find more convenient!

@jovianhq 1 year ago

We're now working on a Kaggle integration that will make it possible to run notebooks directly from Kaggle.

@gitasaheru2386 2 years ago

Please, sir, build a neural network algorithm with manual coding, without Keras, and use a case study

@siddharthpunn10 3 years ago

Great session

@jovianhq 3 years ago

Glad you liked it!

@abhaytyagi7093 3 years ago

Hey, I'm working on a Colab notebook via the Jovian platform, but if my screen sleeps for some time, all my data is lost. Is there a way to keep all my cells intact even after my laptop goes into sleep mode? Eagerly waiting for a reply to fix it, and thanks in advance

@AakashNS 3 years ago

Colab shuts down your notebook after some period of inactivity. Execute jovian.commit() from time to time to save a snapshot of your notebook to Jovian. You can then run your notebook on Colab again using the "Run on Colab" option.

@abhaytyagi7093 3 years ago

@@AakashNS Thank you so much for this, but I tried it and I'm getting an API error when I try to execute jovian.commit, even though I'm entering the right credentials. I even checked Stack Overflow; other people are facing the same issue too. Please help with this too.

@jovianhq 1 year ago

We have improved our Colab integration, please check it out now.