One of the very few members in the Data Science community who provides quality content.
@akhileshlekurwale364 4 years ago
And that too selflessly
@divyamarora4903 4 years ago
I can't believe that all this is free. This level of practical content is awesome and rare to find
@arpitakar3384 2 months ago
Multivariate Imputation by Chained Equations (MICE), or iterative imputation from scratch, just got its back scratched here... Great video, sir... Love from your own country. WRITE DOWN THE DIFFERENCE: SCIKIT-LEARN ITERATIVE IMPUTER
@arpitakar3384 2 months ago
Iterative Imputer: this estimator is still experimental for now; the predictions and the API might change without any deprecation cycle. To use it, you need to explicitly import enable_iterative_imputer:
# explicitly require this experimental feature
from sklearn.experimental import enable_iterative_imputer  # noqa
# now you can import normally from sklearn.impute
from sklearn.impute import IterativeImputer
@ravi_krishna_reddy 3 years ago
Very good content and awesome explanation. Thank you so much.
@jayaraghavendra9025 4 years ago
Awesome, and expecting more info like this
@raviirla459 4 years ago
Wow-moment videos.. you have nailed it... Your videos are fun to watch, with great content. Looking forward to more videos on visualization, feature engineering and data interpretation.
@abhisheksolet8494 4 years ago
Amazing tutorial, sir. Thank you so much for providing such great learning material. Looking forward to many more.
@madhukerbillapati3944 4 years ago
Good one. Worth reading; wish to see more videos
@blue_sapphire8650 3 years ago
Simple and neat. I goddamn love the way you covered the concepts in this video. In fact, I have been searching for content like this for a while and now I've found it here 😊. Thanks a lot, sir.
@datatales1063 3 years ago
@19:36 - If we look at the graph, the fitted line appears to cross the y-axis above 0. But in the equation the intercept is negative, -924.8180. Why is that?
@MrSmarthunky 4 years ago
Very good video, Srivatsan sir. Happy if you can make more videos on such foundational knowledge.
@AIEngineeringLife 4 years ago
🙏
@muralikrishna9499 4 years ago
Your videos are making me more and more inspired!
@chidiedim3166 4 years ago
great one sir
@rakeshkedar4096 4 years ago
Thanks for this video. I have a question that was even asked in one of my interviews: how can we evaluate our imputation strategy without applying any machine learning model? For example, suppose I had replaced Total Charges with the mean/median and did not have the actual values to compare against, as you had in this case. In that case, what are the various statistical approaches to check how good our imputation strategy is?
@AIEngineeringLife 4 years ago
You can still evaluate the regression model output by creating a split within the data points for which you have values. Mean and median can be a good strategy when your data points are close together and not spread out. Another option, I would say, is that rather than imputing, you use models that can handle missing values.
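A minimal sketch of the hold-out idea in the reply above, assuming only the Python standard library: hide some of the values you do know, impute them, and measure the error. The function name `holdout_rmse` and the sample numbers are hypothetical and are not taken from the video's notebook.

```python
import math
import random
import statistics

def holdout_rmse(values, impute_fn, mask_fraction=0.2, seed=0):
    """Hide a fraction of the known values, impute them, and return the RMSE."""
    rng = random.Random(seed)
    known = [v for v in values if v is not None]
    n_hidden = max(1, int(len(known) * mask_fraction))
    hidden_idx = set(rng.sample(range(len(known)), n_hidden))
    visible = [v for i, v in enumerate(known) if i not in hidden_idx]
    hidden = [v for i, v in enumerate(known) if i in hidden_idx]
    fill = impute_fn(visible)  # one constant fill value, e.g. mean or median
    return math.sqrt(sum((v - fill) ** 2 for v in hidden) / len(hidden))

# Hypothetical TotalCharges-like column; None marks a missing entry.
total_charges = [29.85, 1889.5, 108.15, None, 1840.75, 151.65,
                 820.5, None, 1949.4, 301.9, 3046.05, 3487.95]

rmse_mean = holdout_rmse(total_charges, statistics.mean)
rmse_median = holdout_rmse(total_charges, statistics.median)
print(f"mean fill RMSE: {rmse_mean:.2f}, median fill RMSE: {rmse_median:.2f}")
```

Repeating this over several seeds gives a steadier comparison between strategies than a single split does.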
@rakeshkedar4096 4 years ago
@@AIEngineeringLife Yes, I agree about the intuitive part of using the mean/median strategy and using models that can handle missing values, but I am curious to know whether there is any statistical test to evaluate whether mean/median imputation works for our case.
@TravelWithIndoCanadian 4 years ago
Very well explained.
@hardikraja 4 years ago
Awesome...
@midhileshelavazhagan2541 4 years ago
Why does imputing very high values work with the gradient boosting method, as mentioned at 8:15?
@AIEngineeringLife 4 years ago
Sorry for the confusion. I think I did not articulate it well. In models like xgboost I can just makes null values as larger numbers or high negative numbers (in this case since 0 can be valid values, default is 0). Since GBMs work on splits, the model might create a separate split for these values. You can check for sparsity-aware splitting in the doc below: arxiv.org/pdf/1603.02754v3.pdf
@arianaquek6036 4 years ago
@AIEngineering Hello sir, thank you for the insightful video! Just to clarify a few points: what does 'makes null values as larger numbers or high negative numbers (in this case since 0 can be valid values, default is 0)', which you wrote in a comment, mean? Does the '0' in 'default is 0' represent a missing value, or a value that you impute in the dataset? As I understood from the paper, xgboost is capable of taking a dataset with missing values, imputing them by splitting them into different directions and then choosing the best route to impute. There is a short passage where it says 'The same algorithm can also be applied when the non-presence corresponds to a user specified value by limiting the enumeration only to consistent solutions.' I assume that this is what you meant by 'makes null values as larger numbers or high negative numbers', but what do larger/high negative numbers mean?
@AIEngineeringLife 4 years ago
Ariana.. There are 2 things in xgboost. You can set your own missing value in params, like the example below: xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#dealing-with-missing-values
Now, what I meant is that if a continuous value being zero is normal for your business and does not really mean missing, then keeping the default value is not right. Say a retailer has an offer to give a product free if someone buys another product. Then the value can be zero for the free product and cannot be treated as a missing value. So in these cases we typically impute with a high negative number.
Now, in many cases, if the number is substantially an outlier, then XGBoost will sometimes create a separate split for it, and it can be covered in an alternate tree path during the split even without setting the missing value. This can be viewed by visualizing xgboost trees.
I hope it makes sense. Will try to cover it in one of my future videos, where I will be visualizing and interpreting trees
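The "default direction" idea from the paper linked earlier in this thread can be sketched in a few lines. This is a simplified illustration only: it assumes a single fixed threshold and uses plain squared error, whereas XGBoost itself scores splits with a gradient-based gain. The function names and the data are hypothetical.

```python
def sse(ys):
    """Sum of squared errors around the mean; 0.0 for an empty group."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_default_direction(xs, ys, threshold):
    """Try sending missing rows (x is None) left, then right; keep the lower-error choice."""
    left = [y for x, y in zip(xs, ys) if x is not None and x < threshold]
    right = [y for x, y in zip(xs, ys) if x is not None and x >= threshold]
    missing = [y for x, y in zip(xs, ys) if x is None]
    error_if_left = sse(left + missing) + sse(right)
    error_if_right = sse(left) + sse(right + missing)
    if error_if_left <= error_if_right:
        return "left", error_if_left
    return "right", error_if_right

# Hypothetical data: 0.0 is a valid price (a free product), None is truly missing.
prices = [0.0, 0.0, 5.0, 6.0, None, None]
target = [1.0, 1.2, 3.0, 3.1, 3.2, 2.9]

direction, error = best_default_direction(prices, target, threshold=2.5)
print(direction, round(error, 3))
```

Here the missing rows behave like the higher-priced rows, so they are routed right. Had the missing entries been stored as 0.0, they would have been forced left along with the genuinely free products, which is the situation the reply warns about.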
@arianaquek6036 4 years ago
@@AIEngineeringLife Thank you for your reply, sir! I am looking forward to more of your videos!
@manassharma869 4 years ago
Awesome explanation. I hope more parts are coming, thanks
@AIEngineeringLife 4 years ago
Yes, framing the problem is difficult :).. I need to see good problems to provide solutions. Suggestions are welcome as well
@mukeshkund4465 4 years ago
Worth watching.
@udaysai2647 4 years ago
Srivatsan- I am very excited for this series of videos. It is a great elucidation of how we can impute missing values. To generalize your point, we need to check how the column containing NaNs varies with the target variable and how the other independent variables influence the column with NaNs, and then figure out the best way to impute. This is awesome, but I am curious about the case where other independent variables might also contain NaNs when applying such a technique. For example, if the 'MonthlyCharges' column contains NaNs or 'tenure' contains NaNs, how will we implement the 'lmmodel' technique from this example?
@AIEngineeringLife 4 years ago
Uday.. Nice questions. The first thing in the DS process is to understand the source of the data and why nulls are populated. Is null an unavailability scenario or an exception scenario? Say monthly charges is null due to an error in capturing: I will first try imputing it if that field cannot be dropped. Say, can I impute it by contract type and services? Below is the original dataset: github.com/srivatsan88/YouTubeLI/blob/master/dataset/WA_Fn-UseC_-Telco-Customer-Churn.csv
It has the individual services each customer has, so can I run KNN based on similarity? Then, if I am able to approximate it, can I use it to get to TotalCharges? Again, as I said, these are probabilistic and there is no one solution to it :)
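A rough sketch of the similarity idea in the reply above, assuming each customer's services are reduced to 0/1 flags and similarity is measured by Hamming distance. The flag layout, the function names and the sample rows are hypothetical; they do not come from the Telco dataset or the video's notebook.

```python
def hamming(a, b):
    """Count the positions where two equal-length service-flag tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_impute(rows, target_flags, k=2):
    """Average the known charge of the k rows whose service flags are most similar."""
    known = [(flags, charge) for flags, charge in rows if charge is not None]
    nearest = sorted(known, key=lambda row: hamming(row[0], target_flags))[:k]
    return sum(charge for _, charge in nearest) / len(nearest)

# Hypothetical rows: (phone, internet, streaming) flags and a MonthlyCharges value.
customers = [
    ((1, 0, 0), 20.0),
    ((1, 1, 0), 55.0),
    ((1, 1, 1), 80.0),
    ((1, 1, 1), 84.0),
    ((0, 1, 0), 40.0),
]

# Estimate MonthlyCharges for a customer who has all three services.
estimate = knn_impute(customers, target_flags=(1, 1, 1), k=2)
print(estimate)
```

Once MonthlyCharges has been approximated this way, it can feed whatever TotalCharges model is in use, as the reply suggests.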
@udaysai2647 4 years ago
Srivatsan- Thank you for the explanation. I think the very first line answers my question. I will try to apply this perspective when dealing with a dataset and try to figure out the reason that is causing the NaNs. As always, you add value to your videos with these suggestions :). Thank you once again
@mohdhammadkhan5570 3 years ago
This content is so rare.
@kachrooabhishek 2 years ago
How did we switch from "Monthly Charges" to "Tenure" at 19:34? Sir, was that a random guess to check what the R-squared and standard error would be with it?
@rushikeshbulbule8120 4 years ago
Comprehensive ✌ How to get a normal distribution by transformation... expecting that ahead...
@vaibhavbhatia4641 4 years ago
Great video, sir. Can you please also share a link to the notebook in the description? Thank you.
@AIEngineeringLife 4 years ago
Notebook is in my git repo here - github.com/srivatsan88/YouTubeLI/tree/master/statistics
@vaibhavbhatia4641 4 years ago
@@AIEngineeringLife thanks a lot.
@anishnama2091 4 years ago
Thanks for this informative video. How do we impute categorical missing values using statistical thinking?
@AIEngineeringLife 4 years ago
Anish.. It depends on the distribution of categories. In most cases you can tag missing values as "others" and train the model, or impute with the most frequent category. A similar approach can also be followed to see if we can find the category from another variable, but this is applicable in very few cases
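The two simple options in the reply above can be sketched with the standard library alone. The helper names and the sample column are hypothetical, and `None` is assumed to mark a missing category.

```python
from collections import Counter

def impute_mode(values):
    """Replace None with the most frequent observed category."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def impute_other(values, label="Other"):
    """Replace None with an explicit placeholder category."""
    return [label if v is None else v for v in values]

# Hypothetical PaymentMethod-like column; None marks a missing entry.
payment = ["Electronic check", "Mailed check", None, "Electronic check", None]

by_mode = impute_mode(payment)
by_other = impute_other(payment)
print(by_mode)
print(by_other)
```

The placeholder keeps "was missing" visible to the model as its own level, while mode imputation hides it, so the choice depends on whether missingness itself is likely to be informative.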
@anishnama2091 4 years ago
@@AIEngineeringLife Thanks
@sachingalugade8092 4 years ago
Thanks for the video, sir. Can you please make a video showing various ways to impute values for categorical variables?
@AIEngineeringLife 4 years ago
Categorical is typically simple. We can go with the mode of the data or use logistic regression to impute it, depending on the data and business need
@bharathjc4700 4 years ago
We can use MICE to impute missing continuous data. Is this technique better than MICE? What are the gaps? Please share your insights, sir
@AIEngineeringLife 4 years ago
Bharath.. The video highlights how to analyze data and use statistical techniques for it. MICE internally uses the same technique, but if you already have knowledge of the data, it is better to use that knowledge instead of having MICE do the wrong thing
@antoniushka 4 years ago
Hi guys! Great job! For some reason I got the same values for both columns, "TotChargeNew" and "TotChargesAct". Where could the mistake be?
@AIEngineeringLife 4 years ago
Antonio, that's interesting. Looking at the data, it is very rare to not have any standard error, though you may get values different from mine due to some randomness. Is your data for the TotChargeNew column the same before and after the pandas concat? You can check my notebook below to compare with yours: colab.research.google.com/drive/1fzf5bm_HvbtAQS_2jxR8UoQsCliDr5fa
@antoniushka 4 years ago
@@AIEngineeringLife Thank you! I'll check it out!
@devpratap 4 years ago
First of all, thanks for sharing all this. Sir, I executed the notebook code after typing it myself to get a better understanding. TotalCharges has only 11 NA values, but yours had 28. Also, when I loaded the merged data, the values in the actual Total Charges were empty. Did I do something wrong, or have you made changes to the dataset?
@AIEngineeringLife 4 years ago
Devpratap.. My bad. I think I overwrote the file by mistake. Check now; I created a new one. It has 27 though, but it should work as expected. Let me know if you still have a problem
@devpratap 4 years ago
@@AIEngineeringLife I checked using my notebook. It works fine now. Thanks.
@ashirbaddas2573 4 years ago
Hello sir. Couldn't we simply check all the correlation values and then go for the best-fit pair? Then we could easily find the line function for predicting the missing values. Please correct me if I am wrong. Thank you for all this.
@AIEngineeringLife 4 years ago
You can, but this is a simple dataset for demo. Think of sparse data: a more correlated value can introduce bias as well. It is like how we do feature selection for models; even when imputing, the analysis cycle helps
@arulsebastian6338 4 years ago
Thanks for the post. What is the github url for this code?
@AIEngineeringLife 4 years ago
Here you go - github.com/srivatsan88/YouTubeLI/tree/master/statistics
@ragulshan6490 4 years ago
Sir, please do make more videos on the different kinds of t-test using Python, and please elaborate more on the different types of normality test.
@AIEngineeringLife 4 years ago
Ragul.. Will do as and when I get time. I have too many in the backlog and am finding little free time, so bear with me please
@ragulshan6490 4 years ago
@@AIEngineeringLife Take your time, sir. I'll be waiting for that!
@rajeshk1739 4 years ago
Thanks a lot for your efforts. Request you to please share the ipynb file.
@AIEngineeringLife 4 years ago
It is in my git repo - github.com/srivatsan88/YouTubeLI/tree/master/statistics
@rajeshvenaganti6797 3 years ago
Where can I find this code?
@username42 4 years ago
any github links for jupyter notebooks?
@AIEngineeringLife 4 years ago
Here it is - github.com/srivatsan88/YouTubeLI/blob/master/statistics/Statistical_Thinking_Imputing_Missing_Value.ipynb
@username42 4 years ago
@@AIEngineeringLife thanks :)
@sumanthreddy1542 4 years ago
Why are we imputing values of TotalCharges with MonthlyCharges where tenure is zero? Why can't we put zero?
@AIEngineeringLife 4 years ago
Sumanth.. we can. I am just assuming they might have to pay for the first month anyway. If they are on a contract, they get penalized for breaking the contract. But you can put zero as well. I was just showing the thinking to differentiate user personas
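Both choices from this exchange fit in one small helper. This is only a sketch on plain dictionaries: the function name and the `use_first_month` flag are hypothetical, and the column names follow the ones used in this thread.

```python
def impute_total_charges(row, use_first_month=True):
    """Fill a missing TotalCharges for a zero-tenure customer; leave other rows alone."""
    if row["TotalCharges"] is not None or row["tenure"] != 0:
        return row
    # Assumption from the reply: a new customer still owes the first month.
    fill = row["MonthlyCharges"] if use_first_month else 0.0
    return {**row, "TotalCharges": fill}

# Hypothetical row using the column names discussed in the thread.
new_customer = {"tenure": 0, "MonthlyCharges": 52.55, "TotalCharges": None}

first_month = impute_total_charges(new_customer)
zero_fill = impute_total_charges(new_customer, use_first_month=False)
print(first_month["TotalCharges"], zero_fill["TotalCharges"])
```

Which fill is right depends on the billing rules behind the data, which is the point the reply makes about differentiating user personas.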
@sumanthreddy1542 4 years ago
@@AIEngineeringLife Thank you for your response. I very much appreciate your kind effort to share your knowledge.
@raghumarusu4019 4 years ago
Sir, you are doing a great job. Could you also please tell me: if I have any doubts in data science, can I reach you by email?
@AIEngineeringLife 4 years ago
Thank you. You can message me on LinkedIn or post as video comments as well
@valerysalov8208 4 years ago
please update your github on these video series
@AIEngineeringLife 4 years ago
I thought I did already. Did you check the statistics folder in my git? Will check later and update if not
@shivankumar9060 4 years ago
Sir, please provide the dataset
@AIEngineeringLife 4 years ago
Shivan.. It is in my git repo at the link below: github.com/srivatsan88/YouTubeLI/blob/master/dataset/churn_data_st.csv