Complete Exploratory Data Analysis And Feature Engineering In 3 Hours| Krish Naik

Views: 213,804

Krish Naik

Days ago

Comments: 124

@krishnaik06 2 years ago

Give this video 1,000 likes and I will start 7 days of live NLP community sessions for everyone. Happy learning!!

@vinayakdumbre2828 2 years ago

Wow, it should be end to end, not just basic RNN. That would be awesome.

@photogenicglint239 2 years ago

Hi Krish, collaborate with Sumit Mittal (Trendytech) on a big data course. He teaches in depth but offers his course at a high price. If he collaborates with iNeuron, he could offer the course at an affordable price.

@vivekpandey8438 2 years ago

Thanks. Please start the NLP common file and also upload statistics in one video.

@aryansheth7369 2 years ago

666

@faraazmohammed3693 2 years ago

992..close

@mainlykanchan8740 2 years ago

Sir, please do data analysis in SQL with advanced queries for a portfolio project. A full-length video like this one, please 🙏🏼

@najiibrashiidabdi5014 2 years ago

My name is Najiib and I am from a country called Somaliland, which is in Somalia. I really enjoyed this project and I hope you will upload more topics about machine learning. Thank you, Krish Naik. Najiib from Somaliland

@krishan9739 1 month ago

I have to say, this actually teaches you how to think about EDA, so you actually learn something.

@mayowaolowolaiyemo1606 2 years ago

Thanks for this teaching Krish, your approach is simple and easy.

@VyomKumaraes 2 years ago

47:50 This also works: df[df['Aggregate rating'] == 0]['Country'].unique()

@thedinaaesh 4 months ago

01:30:08 - Based on my experience, when we encounter null values, we typically reach out to the upstream data source to verify if there are any missing values. In most data science projects, data is provided by a data engineering team, so collaboration with them is essential. If they confirm that the data indeed contains null values after their validation, we can then handle those nulls accordingly.

@adityagoyal6527 4 months ago

At 47:54, the same can be done using final_df.Country[final_df['Aggregate rating']==0].value_counts()
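A small sketch of the two variants mentioned in these comments, run on a made-up stand-in for the merged Zomato frame (the rows below are invented for illustration; only the column names `Country` and `Aggregate rating` come from the comments). `.unique()` returns just the country names, while `.value_counts()` also shows how many zero-rated rows each country has.

```python
import pandas as pd

# Invented sample rows standing in for final_df from the video.
final_df = pd.DataFrame({
    "Country": ["India", "India", "Brazil", "United States", "United Kingdom"],
    "Aggregate rating": [0.0, 0.0, 0.0, 4.1, 0.0],
})

# Keep only the rows whose rating is exactly zero.
zero_rated = final_df[final_df["Aggregate rating"] == 0]

# Variant 1: distinct country names only.
countries = zero_rated["Country"].unique()

# Variant 2: country names plus the number of zero-rated rows per country.
counts = zero_rated["Country"].value_counts()

print(countries)
print(counts)
```

If nothing is printed in a notebook, one thing worth checking is whether the expression is the last line of the cell; otherwise it needs an explicit `print`.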

@shashwatgoswami6994 2 years ago

Very informative video. I would like to add a point regarding the UTF-8 error: if you save the Excel sheet in the "CSV UTF-8 (Comma delimited)" format, there is no need to pass an encoding.

@NikhilSingh-gv5ne 2 years ago

Mind-blowing explanation, bro. Keep it up!

@loserianlaizer4945 8 months ago

Thanks Krish, it has been an enlightening session. I have watched the entire 2:48-hour session. Be blessed.

@swapnilpalsapure9781 9 months ago

Really helpful Sir..

@raosajid6578 1 year ago

Great work, sir. Subscribed from my side.

@snehkansagara3356 15 days ago

The DataFrame.append method is no longer available in pandas 2.0 and later.
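A minimal sketch of the usual replacement, `pd.concat`, assuming the notebook stacks a training frame on top of a test frame (the frame names and values below are invented for illustration).

```python
import pandas as pd

# Invented stand-ins for a train frame (with target) and a test frame (without).
train = pd.DataFrame({"Gender": ["F", "M"], "Purchase": [8370.0, 15200.0]})
test = pd.DataFrame({"Gender": ["M"]})

# Older code: df = train.append(test)  -> AttributeError on pandas >= 2.0
# Replacement: concatenate a list of frames; missing columns become NaN.
df = pd.concat([train, test], ignore_index=True)

print(df)
```

`ignore_index=True` renumbers the rows from 0, which avoids duplicate index labels after stacking.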

@aishwaryapattnaik3082 2 years ago

LabelEncoder should be used only for target labels (i.e. y), not for input features. This is stated clearly on the sklearn LabelEncoder page. For nominal and ordinal variables, we should use OneHotEncoder and OrdinalEncoder respectively. Preferably, all of this should be done within a Pipeline and ColumnTransformer for hassle-free coding.
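A hedged sketch of what this comment describes, using scikit-learn's `ColumnTransformer` with `OneHotEncoder` for a nominal column and `OrdinalEncoder` for an ordinal one. The sample rows, the choice of `Airline` as nominal and `Total_Stops` as ordinal, and the stop ordering are assumptions made for illustration, not taken from the video's notebook.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Invented sample of input features.
X = pd.DataFrame({
    "Airline": ["IndiGo", "Air India", "IndiGo", "SpiceJet"],
    "Total_Stops": ["non-stop", "1 stop", "2 stops", "non-stop"],
})

# Explicit order for the ordinal column (assumed): fewer stops -> smaller code.
stop_order = [["non-stop", "1 stop", "2 stops"]]

encoder = ColumnTransformer(
    transformers=[
        # Nominal: one 0/1 column per airline, no implied ranking.
        ("nominal", OneHotEncoder(handle_unknown="ignore"), ["Airline"]),
        # Ordinal: a single integer-coded column that follows stop_order.
        ("ordinal", OrdinalEncoder(categories=stop_order), ["Total_Stops"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

encoded = encoder.fit_transform(X)
print(encoded)
```

The transformer can then be placed as the first step of a `Pipeline` so the same encoding is applied at fit and predict time.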

@rajkundra5005 2 years ago

Yes, same doubt.

@prayashdash1815 2 years ago

@@rajkundra5005 Brother, please share the link.

@yumatinikhar7858 11 months ago

Thanks. It's really helpful.

@yogeshmane9973 2 years ago

you are doing excellent work sir

@knowledgedoctor3849 2 years ago

Great Sir❣️

@celebrationsthecelebschoic575 2 years ago

When you plot the top 3 countries' percentages in the pie chart, the percentages are calculated over only those three countries. Calculating over all the transactions would make more sense. The percentage of transactions from India should mean India's share among all transactions. But in our case, the chart covers only India, the USA and the UK.
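A small sketch of the difference this comment points out, on invented counts (the numbers are made up; only the idea of a top-3 pie chart comes from the video). A pie chart normalizes whatever values it is given, so passing only the top 3 counts yields shares within the top 3, not shares of the whole dataset.

```python
import pandas as pd

# Invented country column: 100 rows in total.
country = pd.Series(
    ["India"] * 90 + ["United States"] * 5 + ["United Kingdom"] * 3 + ["Brazil"] * 2
)

counts = country.value_counts()
top3 = counts.head(3)

# What a pie chart of only the top 3 shows: shares that sum to 100%.
share_within_top3 = top3 / top3.sum() * 100

# Share of each top-3 country among ALL rows: sums to less than 100%.
share_of_all = top3 / counts.sum() * 100

print(share_within_top3.round(2))
print(share_of_all.round(2))
```

One way to keep the chart honest is to add an "Other" slice holding `counts.sum() - top3.sum()` before plotting.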

@Agros92 2 years ago

Thanks Krish, you are the best! A question related to the "second session" about Product_Category_(1,2,3): I understand your explanation that for NaN values in a categorical feature you can use the mode to replace them. But for this particular case I think it is important to understand the data before doing that, since Product_Category_(1,2,3) indicates that a product can belong to multiple categories. For example, a movie can be categorized as "Drama, Action, Suspense". So for this case it may be better to create dummies for Product_Category_(1,2,3) and then sum them. It would be complex to implement, but you would keep the real information in your data, since Product_1 could be encoded as (0,1,0,0,1,0,1) if that product has 3 categories. Cheers!
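One possible way to build the multi-hot encoding suggested here, sketched on invented rows (the `Product_Category_*` column names follow the comment; the values and the melt/crosstab approach are assumptions for illustration). A NaN is treated as "no further category" instead of being filled with the mode.

```python
import pandas as pd

# Invented sample: each product has between one and three categories.
df = pd.DataFrame({
    "Product_ID": ["P1", "P2", "P3"],
    "Product_Category_1": [3, 1, 12],
    "Product_Category_2": [8.0, 6.0, None],
    "Product_Category_3": [None, 14.0, None],
})

cat_cols = ["Product_Category_1", "Product_Category_2", "Product_Category_3"]

# Long format: one row per (product, category); drop the missing slots.
long = df.melt(id_vars="Product_ID", value_vars=cat_cols, value_name="category")
long = long.dropna(subset=["category"])
long["category"] = long["category"].astype(int)

# One column per category code, 1 if the product carries that category.
multi_hot = pd.crosstab(long["Product_ID"], long["category"]).clip(upper=1)

print(multi_hot)
```

Each row of `multi_hot` is the kind of vector the comment describes, with as many ones as the product has categories.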

@dikshantakumarbharadwaj6052 2 years ago

God-Father of Data-Science

@rishi4307 6 months ago

# Function to convert duration to minutes
def convert_to_minutes(Duration):
    hours = 0
    minutes = 0
    Duration = str(Duration)  # Ensure the duration is treated as a string
    if 'h' in Duration:
        hours = int(Duration.split('h')[0])
        Duration = Duration.split('h')[1]
    if 'm' in Duration:
        minutes = int(Duration.split('m')[0])
    return hours * 60 + minutes
# Apply the function to the 'Duration' column
final_df['duration_minutes'] = final_df['Duration'].apply(convert_to_minutes)
final_df.head()

@arbiiimesh 2 months ago

Hi Sir, I'm an aspiring data analyst and IT assistant, and I have now become a member of your channel. 🙏 Please guide me on what needs to be learned first for a data analyst or data science career. Hope you will reply. Best regards

@learner8053 2 years ago

Please post the EDA video on your Hindi channel also.

@shivamkumar-rn2ve 2 years ago

There are two types of categorical variables: nominal and ordinal. For ordinal variables you can use label encoding, but for nominal variables you have to use one-hot encoding. If you use label encoding on a nominal variable, the machine learning model will treat it as ordinal, so you can't use it.

@aishwaryapattnaik3082 2 years ago

@shivamkumar-rn2ve 2 years ago

Yeah, you are right about LabelEncoder; you should only use it for the target variable.

@praveentanikella4078 2 years ago

Nice one. One doubt: is the main work of a data analyst only finding insights? Is the ML part not needed? Is ML work the data scientist's job?

@Schadenfreude596 7 days ago

1:59:00 made me laugh actually lol. At that time Krish Naik not come yes 😂😂

@SACHINKUMAR-px8kq 2 years ago

thank you so much sir

@kar2194 2 years ago

Hi Krish, do you have videos on data cleaning, EDA, and feature engineering for unsupervised ML? (For both principal component methods (PCA, CA, MCA, etc.) and clustering techniques, including partitioning, hierarchical, DBSCAN, etc.) By the way, are there differences in data cleaning and feature engineering between predictive regression and inferential regression? Thank you!

@HiralPrajapati-j5x 1 year ago

Query for the flight price prediction dataset's Duration column:
df['Duration_hour'] = df['Duration'].str.split('h').str[0].str.split('m').str[0]
df['Duration_hour'] = df['Duration_hour'].astype(int)
It works for me.
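One caveat with the chained split above: for a value that has minutes but no hours, such as "45m", it returns 45 as the hour. A hedged alternative sketch using `str.extract` with regular expressions, on invented sample values (the `Duration` column name comes from the comment; `Duration_minutes` is a name chosen here for illustration):

```python
import pandas as pd

# Invented sample durations, including hour-only and minute-only values.
df = pd.DataFrame({"Duration": ["2h 50m", "19h", "45m", "5h 5m"]})

# Pull the digits that precede 'h' and 'm'; a missing part becomes "0".
hours = df["Duration"].str.extract(r"(\d+)\s*h", expand=False).fillna("0").astype(int)
minutes = df["Duration"].str.extract(r"(\d+)\s*m", expand=False).fillna("0").astype(int)

# Total duration in minutes as a single numeric feature.
df["Duration_minutes"] = hours * 60 + minutes

print(df)
```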

@GuruprasadP-s6s 10 months ago

We could have used the Product ID to fill the product category column.
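A minimal sketch of that idea, assuming the same product can appear on several rows and that at least one of those rows carries the category (the rows and the `_filled` column name are invented for illustration). Whether this helps on the real dataset depends on whether a product's category is missing on every one of its rows.

```python
import pandas as pd

# Invented sample: P1 and P2 each have one known and one missing category.
df = pd.DataFrame({
    "Product_ID": ["P1", "P1", "P2", "P2", "P3"],
    "Product_Category_2": [8.0, None, None, 6.0, None],
})

# First non-null category per product, broadcast back onto every row.
per_product = df.groupby("Product_ID")["Product_Category_2"].transform("first")

# Fill only the gaps; products with no known category (P3) stay NaN.
df["Product_Category_2_filled"] = df["Product_Category_2"].fillna(per_product)

print(df)
```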

@ayushsharma5640 2 years ago

Thanks sir

@vikasvs5755 1 year ago

super

@rajkumarkurra9031 7 days ago

Bro, are you participating in Kaggle?

@MujahidDeen-x6o 6 months ago

Make another video on exploratory data analysis (EDA).

@Amansharma-he9qg 2 years ago

First comment. Sir, how do I make a SQL project for my portfolio? Please reply.

@mainlykanchan8740 2 years ago

Yes..

@RajaSirOpsc 2 years ago

Where is the Black Friday dataset?

@ninad8880 2 years ago

Sir, please turn off your notification sound!

@praveentanikella4078 2 years ago

For data analyst work, does the dataset come from a database or in the form of Excel or CSV files?

@jagadeeshct7083 2 years ago

Please share the Black Friday dataset. There is no Black Friday dataset at the given link.

@kirankumar9934 2 years ago

Even I'm not able to find black_friday dataset

@yasmeenkarachiwala9612 5 months ago

Hello Sir! Thank you. At 43:00, why is the maximum number of ratings observed in the 2.5 to 3.4 range?

@narayanbabubharali9846 2 years ago

Nice

@vijayramapple 2 years ago

53:15 / 2:48:54

@dibyanikshetry3775 1 year ago

I couldn't do the part where we have to show the names of the countries that gave a 0 rating. It's not showing any output.

@garimabatra2658 2 months ago

How do I download the dataset? Please help.

@sangramshinde9262 2 years ago

I don't understand replacing the NA values of Product_Category_2 and Product_Category_3 with the mode; we just manipulated the data.

@usmanriaz6157 2 years ago

Sir, Airline is a nominal feature, and you said that for a nominal feature we can do OHE or mean encoding. Why are you using LabelEncoder?

@jececdept.9548 1 year ago

Is this a regression problem?

@vijaysharma7677 2 years ago

Please explain how to find the location of a CSV, or how to get the Jupyter notebook to read the file location automatically inside a folder. I am getting an error while reading the file.

@ManishKumar-qh2ql 2 years ago

Open it with the full path, and use \\ instead of \

@madhupincha7898 2 years ago

pwd()

@Agros92 2 years ago

You can put the CSV file in the same folder as the Jupyter notebook file. To read it, use pd.read_csv("data_name.csv"). If you put the data in another folder that sits inside the notebook's folder, use pd.read_csv("Folder_Name\\data_name.csv").
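A short sketch of a path-handling alternative that avoids hand-typed backslashes, using `pathlib` from the standard library. The folder and file names below are placeholders, and the sketch writes a throwaway CSV into a temporary folder so that it runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a throwaway folder and CSV so the example is self-contained.
base = Path(tempfile.mkdtemp())
(base / "Folder_Name").mkdir()
csv_path = base / "Folder_Name" / "data_name.csv"
csv_path.write_text("Country Code,Country\n1,India\n")

# Where relative paths are resolved from (the notebook's working directory).
print(Path.cwd())

# Check the file is really there before reading; a typo shows up as False.
print(csv_path.exists())

# Path objects use the right separator on Windows, macOS and Linux.
df = pd.read_csv(csv_path)
print(df)
```

Printing `Path.cwd()` first is often the quickest way to see why a relative path such as "data_name.csv" is not being found.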

@Rijuldhungana 1 year ago

This also works: data[data['Aggregate rating']==0]['Country'].value_counts()

@ShikhaJain-u7y 5 months ago

I can't find the train.csv file for the second part of the video in your GitHub.

@himadrikar4664 4 months ago

The link to the data set is given in the first line of the Python notebook. Download the dataset from that link.

@abhisheksinghmahra446 2 years ago

Sir, how do I deal with the UTF-8 encoding error?

@Srushti_Mane 11 months ago

Use encoding='latin-1'
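A minimal sketch of why the `encoding` argument of `pd.read_csv` matters, using an invented one-row file that is deliberately saved as Latin-1 (the file name and its contents are made up for illustration). Reading it with the default UTF-8 decoder fails on the accented character, while passing `encoding="latin-1"` succeeds.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write a tiny CSV whose bytes are Latin-1, not UTF-8 (note the "é").
path = Path(tempfile.mkdtemp()) / "zomato_sample.csv"
path.write_bytes("Restaurant Name,City\nCafé Lota,New Delhi\n".encode("latin-1"))

# Default decoding is UTF-8, which cannot decode the Latin-1 byte for "é".
try:
    pd.read_csv(path)
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Telling pandas the real encoding reads the file correctly.
df = pd.read_csv(path, encoding="latin-1")

print(utf8_ok)
print(df)
```

Latin-1 maps every possible byte to some character, so it never raises a decode error; that is convenient, but if the file is actually in another encoding the text can come out silently wrong.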

@bestofmusicc__ 11 months ago

Hi, why did you combine on the country code? Please explain this.

@A3dull 11 months ago

The first dataset only includes the country code, while the second dataset contains both the country code and the country name. When merging them together, the country name column was populated using the information from the second dataset.
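The merge described in this reply can be sketched as follows, on invented rows (the `Country Code` and `Country` column names follow the comments; the frame names and values are assumptions for illustration).

```python
import pandas as pd

# Invented stand-in for the restaurant data: only a numeric country code.
df = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C"],
    "Country Code": [1, 216, 1],
})

# Invented stand-in for the lookup table: code -> country name.
df_country = pd.DataFrame({
    "Country Code": [1, 216],
    "Country": ["India", "United States"],
})

# A left join keeps every restaurant row and attaches the matching name.
final_df = pd.merge(df, df_country, on="Country Code", how="left")

print(final_df)
```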

@bestofmusicc__ 11 months ago

@@A3dull Yeah, thanks man 👍💪

@pratik5692 2 years ago

Feature engineering in one video

@moghalkarishma2378 1 year ago

Is it necessary to handle missing values in data analysis?

@_k_kd 1 year ago

Yes.

@prafulaggarwal9683 1 year ago

Where can I find the Black Friday dataset?

@swetamishra3580 11 months ago

Did you find it?

@himadrikar4664 4 months ago

The link to the data set is given in the first line of the Python notebook. Download the dataset from that link.

@tanumoyhazra6055 2 years ago

Where can I get your code for this video?

@oseikofi4953 1 year ago

I can't find the Black Friday dataset on your GitHub page.

@himadrikar4664 4 months ago

The link to the data set is given in the first line of the Python notebook. Download the dataset from that link.

@vanshsrivastava6551 2 years ago

Is this enough to mention on a resume?

@srirama8275 2 years ago

What are the prerequisites to learn this, sir?

@krishnaik06 2 years ago

Python

@srirama8275 2 years ago

@@krishnaik06 Thank you sir

@anonymous_12155 1 year ago

I am getting NaN values when I try to replace F with 0 and M with 1 in the Black Friday EDA. How do I resolve it?

@pavankumarjammala9262 1 year ago

Before running that particular cell, run all the cells once from the top and you will get it.
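A likely cause of the NaN values, sketched on an invented `Gender` column (this assumes the notebook uses `Series.map` with a dictionary, which is an assumption about the code, not something confirmed in the comments): `map` turns any value that is not a key of the dictionary into NaN, so running the same cell a second time, when the column already holds 0 and 1, wipes the column out.

```python
import pandas as pd

# Invented sample of the raw column as loaded from the CSV.
df = pd.DataFrame({"Gender": ["F", "M", "M"]})

mapping = {"F": 0, "M": 1}

# First run: every value is a key of the mapping, so it works.
first = df["Gender"].map(mapping)

# Second run on the already-mapped values: 0 and 1 are not keys -> all NaN.
second = first.map(mapping)

print(first.tolist())
print(second.tolist())
```

Reloading the data and running the cells once, in order, restores the original "F"/"M" values before the mapping is applied.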

@azamiqbal8792 1 year ago

Can you share the file for practice?

@adamassrkfan 5 months ago

2:36:35

@Abhi-qn4xv 2 years ago

Can anyone explain when we use one-hot encoding and when we use LabelEncoder (ordinal encoding), since they both do the same job in different ways? One-hot creates multiple new features, while label encoding does all the work in one feature. In this case, wouldn't it be better to use LabelEncoder for the Additional info feature, since one-hot will create multiple new sparse features, which might increase the workload of the model? Or am I missing some point here?

@sanjaysanjay862 2 years ago

One-hot encoding is used only for independent variables (features), but LabelEncoder is used for the target variable. And they don't do the same task: one-hot encoding gives a separate column for each category. That is my understanding. Reply if I'm wrong.

@Abhi-qn4xv 2 years ago

@@sanjaysanjay862 Well, you are correct. I did some reading on this topic and found that although a label encoder can be used on independent variables too, it usually isn't. On independent variables, one-hot is better than label encoding, as label encoding might confuse the model into learning the feature as a rank. So instead of learning 1 as a numerical representation of a word, the model will treat 1 as a rank. Hope you understand my point.

@sanjaysanjay862 2 years ago

@@Abhi-qn4xv Yes, I agree with that.

@adeshinaibrahim9641 2 years ago

In simple terms, use one-hot encoding when you have a limited number of categories; otherwise don't.

@elahehkhazaei4855 1 month ago

Please give us the source of your data.

@siddhantgaurav7053 2 years ago

Feature engineering in one video

@ashishsaha6904 2 years ago

Why latin-1?

@ramdasprajapati7884 2 years ago

Find the top 10 cuisines (food items) for the Zomato dataset. Is this code correct? final_df.Cuisines[:10].value_counts()

@Pyrometin 8 months ago

Guys, how do I find the top 10 cuisines in the data? Help me.

@Pyrometin 8 months ago

I got it, use this code: final["Cuisines"].value_counts()[:10]

@aakashpal0777 2 years ago

@PradeepSahu-kh8vr 1 year ago

I'm not getting the Zomato CSV file. Can anyone help?

@pavankumarjammala9262 1 year ago

Yeah, bro, same problem on my side too.

@SachinModi9 2 years ago

How to find the top 10 cuisines:
final_df = final_df.replace(np.nan, 'Dummy')  # convert NaN to 'Dummy'
one_string = ','.join(final_df['Cuisines'].tolist())  # join the Cuisines column into one string
one_list = one_string.replace(" ", "").split(',')  # remove spaces, then split on commas
pd.value_counts(one_list)[:10]  # top 10 values
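A hedged alternative sketch for the same question, counting individual cuisines with `str.split` and `explode` on invented rows (only the `Cuisines` column name comes from the comments). Two details motivate it: removing every space also alters multi-word names such as "North Indian", and the top-level `pd.value_counts` function is deprecated in recent pandas releases in favour of the Series method.

```python
import pandas as pd

# Invented sample: each cell holds a comma-separated list of cuisines.
final_df = pd.DataFrame({
    "Cuisines": [
        "North Indian, Chinese",
        "Chinese",
        None,
        "Cafe, North Indian",
        "North Indian",
    ]
})

top_cuisines = (
    final_df["Cuisines"]
    .dropna()          # skip restaurants with no cuisine listed
    .str.split(",")    # "A, B" -> ["A", " B"]
    .explode()         # one row per individual cuisine
    .str.strip()       # trim the space left after each comma
    .value_counts()    # count each cuisine across all restaurants
    .head(10)          # keep the 10 most frequent
)

print(top_cuisines)
```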

@thedinaaesh 4 months ago

01:41:40 - probably few men are buying on behalf of the women 😂

@sonukumar-yp6vs 2 years ago

11:00

@prashantgupta2172 2 years ago

Please make the video in Hindi.

@jedits7835 1 year ago

After doing this project, can we add it to our resume?

@NooBGamer-fd4ln 5 months ago

he said fucked instead of fixed 1:51:00 😆

@naveenojha8377 1 year ago

If it were in Hindi, we would surely have learned something 😓😓😓😓

@MLMinute 11 months ago

Everything is perfect except the pronunciation. Haha

@muhammadzakiahmad8069 2 years ago

Zomato dataset assignment:
(With respect to value counts)
cus_values = final_df["Cuisines"].value_counts().values
cus_labels = final_df["Cuisines"].value_counts().index
plt.pie(cus_values[:10], labels=cus_labels[:10], autopct='%1.2f%%')
(With respect to Aggregate rating)
final_df[['Aggregate rating','Cuisines']].groupby(['Aggregate rating','Cuisines']).size().reset_index().tail(10)
Please correct me if I did it wrong.

@FaizanSharif-k8g 4 months ago

final_df = final_df.replace(np.nan, 'Dummy')  # convert NaN to 'Dummy'
one_string = ','.join(final_df['Cuisines'].tolist())  # join the Cuisines column into one string
one_list = one_string.replace(" ", "").split(',')  # remove spaces, then split on commas
pd.value_counts(one_list)[:10]  # top 10 values